| BamFile {Rsamtools} | R Documentation |
Use BamFile() to create a reference to a BAM file (and
optionally its index). The reference remains open across calls to
methods, avoiding costly index re-loading.
BamFileList() provides a convenient way of managing a list of
BamFile instances.
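As a minimal sketch of the list container, assuming the Rsamtools package and its bundled ex1.bam example file are installed, a BamFileList can be built from a file path and its elements extracted as BamFile references. The variable names below are illustrative only.

```r
library(Rsamtools)

## ex1.bam is the small example file shipped with Rsamtools
fl <- system.file("extdata", "ex1.bam", package="Rsamtools",
                  mustWork=TRUE)

## a one-element list; more paths could be supplied in the same call
bfl <- BamFileList(fl)
n_files <- length(bfl)

## each element is a BamFile reference
first <- bfl[[1]]
is_bamfile <- is(first, "BamFile")
```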
## Constructors
BamFile(file, index=file, ..., yieldSize=NA_integer_, obeyQname=FALSE)
BamFileList(..., yieldSize=NA_integer_, obeyQname=FALSE)
## Opening / closing
## S3 method for class 'BamFile'
open(con, ...)
## S3 method for class 'BamFile'
close(con, ...)
## accessors; also path(), index(), yieldSize(), obeyQname()
## S4 method for signature 'BamFile'
isOpen(con, rw="")
## actions
## S4 method for signature 'BamFile'
scanBamHeader(files, ...)
## S4 method for signature 'BamFile'
seqinfo(x)
## S4 method for signature 'BamFile'
scanBam(file, index=file, ..., param=ScanBamParam(what=scanBamWhat()))
## S4 method for signature 'BamFile'
countBam(file, index=file, ..., param=ScanBamParam())
## S4 method for signature 'BamFileList'
countBam(file, index=file, ..., param=ScanBamParam())
## S4 method for signature 'BamFile'
filterBam(file, destination, index=file, ...,
indexDestination=TRUE, param=ScanBamParam(what=scanBamWhat()))
## S4 method for signature 'BamFile'
indexBam(files, ...)
## S4 method for signature 'BamFile'
sortBam(file, destination, ..., byQname=FALSE, maxMemory=512)
## S4 method for signature 'BamFileList'
mergeBam(files, destination, ...)
## S4 method for signature 'BamFile'
readBamGappedAlignments(file, index=file, ..., use.names=FALSE, param=NULL)
## S4 method for signature 'BamFile'
readBamGappedReads(file, index=file, use.names=FALSE, param=NULL)
## S4 method for signature 'BamFile'
readBamGappedAlignmentPairs(file, index=file, use.names=FALSE, param=NULL)
## S4 method for signature 'BamFile'
readBamGAlignmentsList(file, index=file, ...,
use.names=FALSE, param=ScanBamParam(), asProperPairs=TRUE)
## counting
## S4 method for signature 'GRanges,BamFileList'
summarizeOverlaps(features, reads, mode, ignore.strand=FALSE, ...,
singleEnd=TRUE, param=ScanBamParam())
## S4 method for signature 'GRangesList,BamFileList'
summarizeOverlaps(features, reads, mode, ignore.strand=FALSE, ...,
singleEnd=TRUE, param=ScanBamParam())
## S4 method for signature 'character,ANY'
findSpliceOverlaps(query, subject, ignore.strand=FALSE, ...,
param=ScanBamParam(), pairedEnd=FALSE)
## S4 method for signature 'BamFile,ANY'
findSpliceOverlaps(query, subject, ignore.strand=FALSE, ...,
param=ScanBamParam(), pairedEnd=FALSE)
## S4 method for signature 'BamFile'
coverage(x, shift=0L, width=NULL, weight=1L, ..., param = ScanBamParam())
## S4 method for signature 'BamFile'
quickCountBam(file, ..., param=ScanBamParam(), mainGroupsOnly=FALSE)
... |
Additional arguments. For |
con |
An instance of |
x, file, files |
A character vector of BAM file paths (for
|
index |
character(1); the BAM index file path (for
|
yieldSize |
Number of records to yield each time the file is read
from using |
destination |
character(1) file path to write filtered reads to. |
indexDestination |
logical(1) indicating whether the destination file should also be indexed. |
byQname, maxMemory |
See |
obeyQname |
A logical(1) indicating whether the file is
sorted by |
param |
An optional |
use.names |
Construct the names of the returned object from the query template names (QNAME field)? If not (the default), then the returned object has no names. |
rw |
Mode of file; ignored. |
reads |
A |
features |
A GRanges or a GRangesList object of genomic regions of
interest. When a GRanges is supplied, each row is considered a
feature. When a GRangesList is supplied, each higher list-level is
considered a feature. This distinction is important when defining an overlap
between a read and a feature. See ? |
mode |
A function that defines the method to be used when a read overlaps more than one feature. Pre-defined options are "Union", "IntersectionStrict", or "IntersectionNotEmpty" and are designed after the counting modes available in the HTSeq package by Simon Anders (see references).
|
ignore.strand |
A logical value indicating whether strand should be ignored when matching. |
singleEnd |
A logical value indicating whether reads are single-end (TRUE) or paired-end (FALSE). |
pairedEnd |
A logical value indicating whether reads are paired-end (TRUE) or single-end (FALSE). |
query |
Paired-end reads can be supplied in a BAM file or GappedAlignmentPairs object. Single-end reads may be supplied in a BAM file, GappedAlignments or GRanges object. |
subject |
A TranscriptDb, or GRangesList containing the annotations. |
shift, width, weight |
See |
mainGroupsOnly |
See |
asProperPairs |
A logical indicating if the records should be filtered
such that only proper pairs are returned. Applies to
|
Objects are created by calls of the form BamFile().
The BamFile class inherits fields from the
RsamtoolsFile class.
BamFileList inherits methods from
RsamtoolsFileList and SimpleList.
Opening / closing:
open.BamFile: Opens the (local or remote) path and index files
(the index only if index is not character(0)).
Returns a BamFile instance.
close.BamFile: Closes the BamFile con, returning
(invisibly) the updated BamFile. The instance may be
re-opened with open.BamFile.
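The open / close cycle described above can be sketched as follows. This is an illustration under the assumption that Rsamtools and its bundled ex1.bam file are installed; the variable names are placeholders chosen for the example.

```r
library(Rsamtools)

fl <- system.file("extdata", "ex1.bam", package="Rsamtools",
                  mustWork=TRUE)

bf <- BamFile(fl)
was_open <- isOpen(bf)       # a newly created reference is not yet open

bf <- open(bf)               # opens the file (and index, when available)
now_open <- isOpen(bf)

bf <- close(bf)              # close() returns the updated BamFile invisibly
closed_again <- isOpen(bf)
```

Keeping the reference open is only worthwhile when the same file is visited repeatedly; for a single call, passing the unopened BamFile (or the path) is simpler.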
Accessors:
path: Returns a character(1) vector of BAM path names.
index: Returns a character(1) vector of BAM index path names.
yieldSize, yieldSize<-: Return or set an integer(1) vector indicating yield size.
obeyQname, obeyQname<-: Return or set a logical(1) indicating if the file was sorted by qname.
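A short sketch of the accessors, again assuming Rsamtools and its bundled ex1.bam file are available. The yield sizes used here are arbitrary values picked for illustration.

```r
library(Rsamtools)

fl <- system.file("extdata", "ex1.bam", package="Rsamtools",
                  mustWork=TRUE)
bf <- BamFile(fl, yieldSize=1000)

bam_path <- path(bf)           # path to the BAM file
bam_index <- index(bf)         # path used to locate the index
ys_before <- yieldSize(bf)     # value supplied to the constructor

yieldSize(bf) <- 500L          # replacement form updates the reference
ys_after <- yieldSize(bf)

by_qname <- obeyQname(bf)      # FALSE unless set in the constructor
```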
Methods:
scanBamHeader: Visit the path in path(file), returning
the information contained in the file header; see
scanBamHeader.
seqinfo: Visit the path in path(file), returning
a Seqinfo instance containing information on
the lengths of each sequence.
scanBam: Visit the path in path(file), returning the
result of scanBam applied to the specified path.
countBam: Visit the path(s) in path(file), returning
the result of countBam applied to the specified
path.
filterBam: Visit the path in path(file), returning
the result of filterBam applied to the specified
path.
indexBam: Visit the path in path(file), returning
the result of indexBam applied to the specified
path.
sortBam: Visit the path in path(file), returning the
result of sortBam applied to the specified path.
mergeBam: Merge several BAM files into a single BAM file. See
mergeBam for details; additional arguments supported
by mergeBam,character-method are also available for
BamFileList.
readBamGappedAlignments, readBamGappedReads, readBamGappedAlignmentPairs:
Visit the path in path(file), returning the result of
readBamGappedAlignments, readBamGappedReads,
or readBamGappedAlignmentPairs applied to the specified path.
See readBamGappedAlignments.
readBamGAlignmentsList: Visit the BAM file in path(file). The file
must be sorted by qname; see ?sortBam. When a yieldSize is set on
the BamFile, data are read in chunks; to read the complete file, a
while or similar loop construct must be used. When
asProperPairs=TRUE, only proper pairs are returned.
See the ?GappedAlignmentPairs man page for details of the
proper pairs filtering.
The return value from readBamGAlignmentsList is a
GAlignmentsList where each list element contains all records
with the same id (QNAME in the SAM/BAM file). When asProperPairs is
TRUE, each list element has exactly 2 records; these are the
same data as returned by readBamGappedAlignmentPairs, only
the return class differs. When asProperPairs is FALSE,
no QC is performed, resulting in 1 or more records per element. List
elements containing singletons, unpaired reads or single fragments have
a length of 1, while paired-end reads or those with multiple fragments
have a length of 2 or greater.
(NOTE: asProperPairs=TRUE not yet implemented)
show: Compactly display the object.
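Several of the methods above can be applied directly to a BamFile reference. The following is a sketch only, assuming Rsamtools and its bundled ex1.bam file are installed; it covers the header, sequence information and record count, and the variable names are illustrative.

```r
library(Rsamtools)

fl <- system.file("extdata", "ex1.bam", package="Rsamtools",
                  mustWork=TRUE)
bf <- BamFile(fl)

## header information; 'targets' holds the reference sequence lengths
hdr <- scanBamHeader(bf)
target_names <- names(hdr$targets)

## the same sequence information as a Seqinfo instance
si <- seqinfo(bf)
seq_lengths <- seqlengths(si)

## number of records in the whole file (default ScanBamParam())
cnt <- countBam(bf)
n_records <- cnt$records
```

An unopened BamFile is acceptable here; explicitly opening it first mainly pays off when the same file is queried many times.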
Martin Morgan and Marc Carlson
The GenomicRanges package is where the summarizeOverlaps
method originates.
fl <- system.file("extdata", "ex1.bam", package="Rsamtools",
mustWork=TRUE)
length(scanBam(fl)[[1]][[1]]) # all records
bf <- open(BamFile(fl)) # implicit index
bf
identical(scanBam(bf), scanBam(fl))
close(bf)
## chunks of size 1000
bf <- open(BamFile(fl, yieldSize=1000))
while (nrec <- length(scanBam(bf)[[1]][[1]]))
    cat("records:", nrec, "\n")
close(bf)
rng <- GRanges(c("seq1", "seq2"), IRanges(1, c(1575, 1584)))
## repeatedly visit 'bf'
bf <- open(BamFile(fl))
sapply(seq_along(rng), function(i, bamFile, rng) {
    ## restrict the scan to one range and retrieve only the sequences
    param <- ScanBamParam(which=rng[i], what="seq")
    bam <- scanBam(bamFile, param=param)[[1]]
    alphabetFrequency(bam[["seq"]], baseOnly=TRUE, collapse=TRUE)
}, bf, rng)
close(bf)
##------------------------------------------------------------------------
## summarizeOverlaps with BamFileList
##
library(pasillaBamSubset)
library("TxDb.Dmelanogaster.UCSC.dm3.ensGene")
exbygene <- exonsBy(TxDb.Dmelanogaster.UCSC.dm3.ensGene, "gene")
## single-end:
## When 'yieldSize' is specified the file is processed by chunks.
## Otherwise the complete file is read into memory.
fl <- untreated1_chr4()
bfl <- BamFileList(fl, yieldSize=50000)
se1 <- summarizeOverlaps(exbygene, bfl, singleEnd=TRUE)
counts1 <- assays(se1)$counts
## paired-end sorted by qname:
## Set 'singleEnd' to 'FALSE'. A BAM file sorted by qname
## can be read in chunks with 'yieldSize'.
fl <- untreated3_chr4()
sortfl <- sortBam(fl, tempfile(), byQname=TRUE)
bf2 <- BamFileList(sortfl, index=character(0),
yieldSize=50000, obeyQname=TRUE)
se2 <- summarizeOverlaps(exbygene, bf2, singleEnd=FALSE)
counts2 <- assays(se2)$counts
## paired-end not sorted:
## If the file is not sorted by qname, all records are read
## into memory for sorting and to determine proper pairs.
## Any 'yieldSize' set on the BamFile will be ignored.
fl <- untreated3_chr4()
bf3 <- BamFileList(fl)
se3 <- summarizeOverlaps(exbygene, bf3, singleEnd=FALSE)
counts3 <- assays(se3)$counts
identical(as.vector(counts2), as.vector(counts3))
##------------------------------------------------------------------------
## findSpliceOverlaps
##
## See ?'findSpliceOverlaps' for examples