| yieldReduce {Rsamtools} | R Documentation |
Rsamtools files can be created with a ‘yieldSize’ argument that
influences the number of records (chunk size) input at one time (see,
e.g., BamFile). yieldReduce iterates through the
file, processing each chunk and reducing it with previously input
chunks. This is a memory-efficient way to process large data files,
especially when the final result fits in memory.
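The chunk-then-reduce pattern described above can be illustrated with a small base-R analogy. This is only a sketch and does not use Rsamtools: a text connection stands in for the file, readLines(con, n = yield_size) stands in for reading one chunk, and the names con, yield_size and the "empty chunk" stopping rule are assumptions made for the illustration.

```r
## Base-R analogy only: the text connection plays the role of a BamFile
## opened with a yieldSize, and each readLines() call inputs one chunk.
con <- textConnection(c("ACGT", "GGCA", "TTAC", "CAGT", "AC"))
yield_size <- 2L                     # plays the role of yieldSize

MAP <- function(con, n) {
    chunk <- readLines(con, n = n)   # character(0) once input is exhausted
    sum(nchar(chunk))                # per-chunk summary: total characters
}
REDUCE <- `+`                        # combine the running total with a new chunk

total <- 0L                          # plays the role of 'init'
repeat {
    value <- MAP(con, yield_size)
    if (value == 0L) break           # assumed stopping rule: an empty chunk
    total <- REDUCE(total, value)
}
close(con)
total
```

Only one chunk and the running total are held in memory at any time, which is the property the description refers to.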
yieldReduce(X, MAP, REDUCE, DONE, ..., init, ITERATE = TRUE)
X |
A |
MAP |
A function of one or more arguments, |
REDUCE |
A function of one ( |
DONE |
A function of one argument, the |
... |
Additional arguments, passed to |
init |
(Optional) Initial value used for |
ITERATE |
logical(1) determining whether the call to
|
When ITERATE=TRUE, REDUCE is initially invoked with
either the init value and the value of the first call to
MAP or, if init is missing, the values of the first two
calls to MAP.
When ITERATE=FALSE, REDUCE is invoked with a list
containing a list with as many elements as there were calls to
MAP. Each element is the result of an invocation of MAP.
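The difference between the two modes can be sketched in base R with Reduce(). This is a simplified analogy, not the Rsamtools implementation: the list mapped holds made-up values standing in for the results of four calls to MAP, and REDUCE_pairwise and REDUCE_all are hypothetical names.

```r
## Made-up values, standing in for the results of four successive MAP calls.
mapped <- list(c(A = 3, C = 1), c(A = 2, C = 4), c(A = 0, C = 5), c(A = 1, C = 1))

## ITERATE = TRUE analogue: REDUCE combines two values at a time, so only
## the running result and the newest chunk are needed at once.
REDUCE_pairwise <- `+`
iter_with_init <- Reduce(REDUCE_pairwise, mapped, c(A = 0, C = 0))  # starts from init
iter_no_init   <- Reduce(REDUCE_pairwise, mapped)   # starts from the first two values

## ITERATE = FALSE analogue: REDUCE is called once, on all MAP results.
REDUCE_all <- function(values) colSums(do.call(rbind, values))
all_at_once <- REDUCE_all(mapped)
```

In this sketch all three results agree; the modes differ in how much has to be held in memory before REDUCE runs.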
The return value is the value returned by the final invocation of
REDUCE, or init if provided and no data were yielded,
or list() if init is missing and no data were yielded.
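The three return-value cases can be written out as a small stand-in function. The function sketch_result is hypothetical and is not part of Rsamtools; it takes a list of values already produced by MAP (empty when no data were yielded) and assumes pairwise reduction.

```r
## Hypothetical stand-in for the return-value rules only.
sketch_result <- function(mapped, REDUCE, init) {
    if (length(mapped) == 0L)                 # no data were yielded
        return(if (missing(init)) list() else init)
    if (missing(init))
        Reduce(REDUCE, mapped)                # result of the final REDUCE call
    else
        Reduce(REDUCE, mapped, init)
}

no_data_no_init <- sketch_result(list(), `+`)
no_data_init    <- sketch_result(list(), `+`, init = 0L)
with_data       <- sketch_result(list(1L, 2L, 3L), `+`)
```

Code that consumes the result may want to supply init so that the empty case has the same type as the non-empty one.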
Martin Morgan mtmorgan@fhcrc.org
BamFile, TabixFile, RsamtoolsFile.
fl <- system.file(package="Rsamtools", "extdata", "ex1.bam")
## nucleotide frequency of mapped reads
bf <- BamFile(fl, yieldSize=500) ## typically, yieldSize=1e6
param <- ScanBamParam(
    flag=scanBamFlag(isUnmappedQuery=FALSE),
    what="seq")
MAP <- function(X, param) {
    value <- scanBam(X, param=param)[[1]][["seq"]]
    if (length(value))
        alphabetFrequency(value, collapse=TRUE)
    else value # will be integer(0)
}
REDUCE <- `+` # add successive alphabetFrequency vectors (collapse=TRUE)
yieldReduce(bf, MAP, REDUCE, param=param)
## coverage
if (require(GenomicAlignments)) {
    MAP <- function(X)
        coverage(readGAlignments(X))
    REDUCE <- `+`
    DONE <- function(VALUE)
        ## coverage() on zero GAlignments returns an RleList,
        ## each element of which has 0 coverage
        sum(sum(VALUE)) == 0L
    yieldReduce(bf, MAP, REDUCE, DONE)
}