findOverlaps-methods {IRanges}    R Documentation
Various methods for finding/counting interval overlaps between two "range-based" objects: a query and a subject.
NOTE: This man page describes the methods that operate on a query and a subject that are both either a Ranges, Views, RangesList, ViewsList, or RangedData object. (In addition, if the query is a Ranges object, the subject can be an IntervalTree object; if the query is a RangesList object, the subject can be an IntervalForest object. And if the subject is a Ranges object, the query can be an integer vector.)
See ?`findOverlaps,GenomicRanges,GenomicRanges-method`
in the GenomicRanges package for methods that operate on
GRanges or GRangesList
objects. See also the ?`GIntervalTree` class and the
?`findOverlaps,GenomicRanges,GIntervalTree-method` method
for finding overlaps with persistent IntervalForest objects.
findOverlaps(query, subject, maxgap=0L, minoverlap=1L,
             type=c("any", "start", "end", "within", "equal"),
             select=c("all", "first", "last", "arbitrary"), ...)
countOverlaps(query, subject, maxgap=0L, minoverlap=1L,
              type=c("any", "start", "end", "within", "equal"), ...)
overlapsAny(query, subject, maxgap=0L, minoverlap=1L,
            type=c("any", "start", "end", "within", "equal"), ...)
query %over% subject
query %within% subject
query %outside% subject
subsetByOverlaps(query, subject, maxgap=0L, minoverlap=1L,
                 type=c("any", "start", "end", "within", "equal"), ...)
## S4 method for signature 'Hits'
ranges(x, query, subject)
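query, subject |
Each of them can be a Ranges, Views, RangesList,
ViewsList, or RangedData object.
In addition, if query is a Ranges object, subject can be an
IntervalTree object; if query is a RangesList object, subject can
be an IntervalForest object; and if subject is a Ranges object,
query can be an integer vector, each element of which is treated as
a single-position range.
When query and subject are both list-like (RangesList, ViewsList,
or RangedData objects), their elements are paired first: if both
lists have names, each element from subject is paired with the
element from query with the matching name, if any; otherwise,
elements are paired by position. The overlap is then computed
between the pairs as described below.
findOverlaps also accepts a missing subject, in which case query
is compared with itself.
maxgap, minoverlap |
Intervals with a separation of maxgap or less and a minimum of
minoverlap overlapping positions, allowing for maxgap, are considered
to be overlapping. maxgap should be a non-negative integer scalar and
minoverlap a positive integer scalar. Separation is counted so that
adjacent ranges (such as 2-3 and 4-7) are 1 apart: with the default
maxgap=0L the ranges must actually share positions, and maxgap=1L is
needed to also report adjacent ranges (see the FIXME note in the
examples).
type |
By default (type="any"), any overlap is accepted. By specifying the
type parameter, one can select for specific types of overlap, which
correspond to operations in Allen's Interval Algebra (see the
references). If type is "start" or "end", the intervals must have
matching starts or matching ends, respectively; "equal" requires both
to match; and "within" requires the query interval to be wholly
contained in the subject interval. All matches must additionally
satisfy the minoverlap constraint. For "start", "end", and "equal",
maxgap gives the maximum allowed difference between the starts, the
ends, or both, respectively.
select |
When select is "all" (the default), the results are returned as a
Hits object. Otherwise, the result is an integer vector parallel to
query (i.e., of the same length) holding the index of the first, last,
or an arbitrary overlapping interval in subject, with NA for query
intervals that overlap nothing in subject. For a RangesList query, a
HitsList or an IntegerList is returned instead, respectively.
... |
Further arguments to be passed to or from other methods.
x |
A Hits object returned by findOverlaps.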
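To make the type semantics concrete, here is a naive all-pairs reference written in base R. It is a sketch under stated assumptions, not how IRanges computes overlaps: naive_overlaps() is a hypothetical helper, ranges are closed integer intervals of width 1 or more given as start/end vectors, and only the defaults maxgap=0L and minoverlap=1L are modelled. The data reproduce the "alternative overlap types" example.

```r
## Hypothetical reference helper (not part of IRanges): naive all-pairs
## overlap test for closed integer intervals [start, end], assuming the
## defaults maxgap = 0L and minoverlap = 1L and widths of at least 1.
naive_overlaps <- function(qstart, qend, sstart, send,
                           type = c("any", "start", "end", "within", "equal")) {
  type <- match.arg(type)
  ## hit[i, j] is TRUE when query i and subject j satisfy 'type'
  hit <- switch(type,
    any    = outer(qstart, send, "<=") & outer(qend, sstart, ">="),
    start  = outer(qstart, sstart, "=="),
    end    = outer(qend, send, "=="),
    within = outer(qstart, sstart, ">=") & outer(qend, send, "<="),
    equal  = outer(qstart, sstart, "==") & outer(qend, send, "=="))
  idx <- which(hit, arr.ind = TRUE)
  ## sort hits by query index, then by subject index
  idx <- idx[order(idx[, 1], idx[, 2]), , drop = FALSE]
  data.frame(queryHits = as.integer(idx[, 1]),
             subjectHits = as.integer(idx[, 2]))
}

## Same ranges as the "alternative overlap types" example:
## query   = [1,2] [5,6] [3,6] [4,9]
## subject = [1,4] [3,6] [5,9] [6,9]
qstart <- c(1, 5, 3, 4); qend <- qstart + c(2, 2, 4, 6) - 1
sstart <- c(1, 3, 5, 6); send <- sstart + c(4, 4, 5, 4) - 1

naive_overlaps(qstart, qend, sstart, send, type = "start")
naive_overlaps(qstart, qend, sstart, send, type = "within")
```

Comparing every query with every subject costs time proportional to length(query) * length(subject); the output columns are named after the two sides of a Hits object only for readability.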
A common type of query that arises when working with intervals is finding which intervals in one set overlap those in another.
The simplest approach is to call the findOverlaps function
on a Ranges or other object with range information (aka
"range-based object").
An IntervalTree object is a derivative of Ranges and
stores its ranges as a tree that is optimized for overlap queries.
Thus, for repeated queries against the same subject, it is more
efficient to create an IntervalTree once for the subject
using the IntervalTree constructor and then perform
the queries against the IntervalTree instance. An IntervalForest
object is a derivative of RangesList and stores its ranges
as a set of trees optimized for partitioned overlap queries.
Again, for repeated queries against the same subject list, it is more
efficient to create an IntervalForest once and then perform
the queries against the IntervalForest instance.
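The pairing rule for list-like arguments (match elements by name when both lists are named, otherwise by position, then compute overlaps within each pair) can be sketched in base R, with lists of data frames standing in for RangesList objects. pair_spaces(), count_any(), and count_overlaps_by_space() are hypothetical helpers, not IRanges functions, and only type="any" with the default maxgap and minoverlap is modelled. The data reproduce partitions "a" and "b" of the IntervalForest example, so, assuming the semantics described on this page, the per-partition counts should match what countOverlaps(qlist, forest) reports there.

```r
## Hypothetical helpers (not part of IRanges), base R only.
## A "space" is a data frame of closed intervals with columns start, end.

## For each query element, return the matching subject element: by name
## when both lists are named, otherwise by position (NULL if none).
pair_spaces <- function(qlist, slist) {
  use_names <- !is.null(names(qlist)) && !is.null(names(slist))
  lapply(seq_along(qlist), function(i) {
    if (use_names) {
      slist[[names(qlist)[i]]]   # NULL when the name is absent
    } else if (i <= length(slist)) {
      slist[[i]]
    } else {
      NULL
    }
  })
}

## Number of subject intervals overlapping each query interval
## (type "any", maxgap = 0, minoverlap = 1).
count_any <- function(q, s) {
  if (is.null(s)) return(integer(nrow(q)))
  vapply(seq_len(nrow(q)), function(i)
    sum(q$start[i] <= s$end & q$end[i] >= s$start), integer(1))
}

count_overlaps_by_space <- function(qlist, slist) {
  out <- Map(count_any, qlist, pair_spaces(qlist, slist))
  names(out) <- names(qlist)
  out
}

q <- data.frame(start = c(1, 5, 3, 4), end = c(2, 6, 6, 9))
s <- data.frame(start = c(1, 3, 5, 6), end = c(4, 6, 9, 9))
qlist <- split(q, factor(c("a", "a", "b", "b")))
slist <- split(s, factor(c("a", "a", "b", "b")))
count_overlaps_by_space(qlist, slist)
```

Reversing slist changes the result only when the lists are unnamed, since named elements are matched by name rather than by position.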
findOverlaps returns either a Hits object when
select="all" (the default), or an integer vector when
select is not "all". For RangesList objects
it returns a HitsList object when select="all", or
an IntegerList when select is not "all".
When subject is an IntervalForest object,
it returns a CompressedHitsList or CompressedIntegerList
respectively.
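With select other than "all", the many-to-many hits are collapsed into one subject index per query range. A base-R sketch of that reduction, assuming "first" and "last" mean the lowest and highest overlapping subject index; select_hits() is a hypothetical helper, and the hit pairs are written out by hand for the first example (query 1-5, 4-7, 9-10 against subject 2-2, 2-3, 10-12) rather than computed by IRanges.

```r
## Hypothetical helper (not part of IRanges): collapse all query/subject
## hit pairs into one subject index per query range, in the manner of
## select = "first" or select = "last"; NA marks query ranges with no hit.
select_hits <- function(queryHits, subjectHits, nquery,
                        select = c("first", "last")) {
  select <- match.arg(select)
  pick <- if (select == "first") min else max
  out <- rep(NA_integer_, nquery)
  if (length(queryHits) > 0L) {
    best <- tapply(subjectHits, queryHits, pick)   # one value per hit query
    out[as.integer(names(best))] <- as.integer(best)
  }
  out
}

## Hits of the first example: query 1 overlaps subjects 1 and 2,
## query 2 overlaps nothing, query 3 overlaps subject 3.
qh <- c(1L, 1L, 3L)
sh <- c(1L, 2L, 3L)
select_hits(qh, sh, nquery = 3L, select = "first")
select_hits(qh, sh, nquery = 3L, select = "last")
```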
countOverlaps returns the overlap hit count for each range
in query using the specified findOverlaps parameters.
For RangesList objects, it returns an IntegerList object.
When subject is an IntervalForest it returns a
CompressedIntegerList.
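Conceptually, the counts are how often each query index occurs on the query side of the select="all" hits. A base-R sketch using tabulate(), with the query indices of the hits written out by hand for the first example (query 1 hits two subject ranges, query 3 hits one); it illustrates the relationship and is not the IRanges implementation.

```r
## Query side of the hits of the first example, written out by hand.
qh <- c(1L, 1L, 3L)
nquery <- 3L                      # number of ranges in the query
## One count per query range, including zeros for ranges without hits.
counts <- tabulate(qh, nbins = nquery)
counts
```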
overlapsAny finds the ranges in query that overlap any
of the ranges in subject. For Ranges or Views
objects, it returns a logical vector of length equal to the number of
ranges in query. For RangesList, RangedData, or
ViewsList objects, it returns a LogicalList object,
where each element of the result corresponds to a space in query.
When subject is an IntervalForest object, it returns
a CompressedLogicalList object.
%over% and %within% are convenience wrappers for the
2 most common use cases. They are currently defined as
`%over%` <- function(query, subject) overlapsAny(query, subject)
and
`%within%` <- function(query, subject)
    overlapsAny(query, subject, type="within")
%outside% is simply the inverse of %over%.
subsetByOverlaps returns the subset of query that
has an overlap hit with a range in subject using the specified
findOverlaps parameters.
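A base-R sketch of what subsetByOverlaps computes for type="any" with the default maxgap and minoverlap, using data frames of closed intervals (hypothetical stand-ins for IRanges objects) that hold the ranges of the first example:

```r
## Hypothetical stand-ins for IRanges objects: closed intervals [start, end].
query <- data.frame(start = c(1, 4, 9), end = c(5, 7, 10))
subject <- data.frame(start = c(2, 2, 10), end = c(2, 3, 12))
## TRUE for each query interval overlapping at least one subject interval.
keep <- vapply(seq_len(nrow(query)), function(i)
  any(query$start[i] <= subject$end & query$end[i] >= subject$start),
  logical(1))
## Keep only the overlapping query intervals, in their original order.
query[keep, ]
```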
ranges(x, query, subject) returns a Ranges of the same
length as Hits object x holding the regions of intersection
between the overlapping ranges in objects query and subject,
which should be the same query and subject used in the call to
findOverlaps that generated x.
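For truly overlapping pairs, the region of intersection runs from the larger of the two starts to the smaller of the two ends. A base-R sketch of that computation for the first example, with the hit pairs written out by hand; it assumes maxgap=0L, so every pair shares at least one position, and does not cover pairs reported only because of a positive maxgap.

```r
## Ranges of the first example as start/end vectors (closed intervals).
qstart <- c(1, 4, 9);  qend <- c(5, 7, 10)
sstart <- c(2, 2, 10); send <- c(2, 3, 12)
## Hit pairs (query index, subject index), written out by hand.
qh <- c(1L, 1L, 3L); sh <- c(1L, 2L, 3L)
## Intersection of each pair: larger start to smaller end.
r <- data.frame(start = pmax(qstart[qh], sstart[sh]),
                end   = pmin(qend[qh],   send[sh]))
r
```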
Michael Lawrence with contributions by Hector Corrada Bravo
Allen's Interval Algebra: James F. Allen. Maintaining knowledge about temporal intervals. Communications of the ACM, 26(11):832-843, 1983. ACM Press. ISSN 0001-0782.
The Hits and HitsList classes for representing a set of hits between 2 vector-like objects.
findOverlaps,GenomicRanges,GenomicRanges-method in the GenomicRanges package for methods that operate on GRanges or GRangesList objects.
findOverlaps,GenomicRanges,GIntervalTree-method in the GenomicRanges package for methods that use IntervalForest objects to find overlaps.
The IntervalTree class and constructor.
The IntervalForest class and constructor.
The Ranges, Views, RangesList, ViewsList, and RangedData classes.
The IntegerList and LogicalList classes.
query <- IRanges(c(1, 4, 9), c(5, 7, 10))
subject <- IRanges(c(2, 2, 10), c(2, 3, 12))
tree <- IntervalTree(subject)
## ---------------------------------------------------------------------
## findOverlaps()
## ---------------------------------------------------------------------
## at most one hit per query
findOverlaps(query, tree, select = "first")
findOverlaps(query, tree, select = "last")
findOverlaps(query, tree, select = "arbitrary")
## overlap even if adjacent only
## (FIXME: the gap between 2 adjacent ranges should be still considered
## 0. So either we have an argument naming problem, or we should modify
## the handling of the 'maxgap' argument so that the user would need to
## specify maxgap = 0L to obtain the result below.)
findOverlaps(query, tree, maxgap = 1L)
## shortcut
findOverlaps(query, subject)
query <- IRanges(c(1, 4, 9), c(5, 7, 10))
subject <- IRanges(c(2, 2), c(5, 4))
tree <- IntervalTree(subject)
## one Ranges with itself
findOverlaps(query)
## single points as query
subject <- IRanges(c(1, 6, 13), c(4, 9, 14))
findOverlaps(c(3L, 7L, 10L), subject, select = "first")
## alternative overlap types
query <- IRanges(c(1, 5, 3, 4), width=c(2, 2, 4, 6))
subject <- IRanges(c(1, 3, 5, 6), width=c(4, 4, 5, 4))
findOverlaps(query, subject, type = "start")
findOverlaps(query, subject, type = "start", maxgap = 1L)
findOverlaps(query, subject, type = "end", select = "first")
ov <- findOverlaps(query, subject, type = "within", maxgap = 1L)
ov
## ---------------------------------------------------------------------
## overlapsAny()
## ---------------------------------------------------------------------
overlapsAny(query, subject, type="start")
overlapsAny(query, subject, type="end")
query %over% subject # same as overlapsAny(query, subject)
query %within% subject  # same as overlapsAny(query, subject, type="within")
## ---------------------------------------------------------------------
## "ranges" METHOD FOR Hits OBJECTS
## ---------------------------------------------------------------------
## extract the regions of intersection between the overlapping ranges
ranges(ov, query, subject)
## ---------------------------------------------------------------------
## using IntervalForest objects
## ---------------------------------------------------------------------
query <- IRanges(c(1, 4, 9), c(5, 7, 10))
qpartition <- factor(c("a","a","b"))
qlist <- split(query, qpartition)
subject <- IRanges(c(2, 2, 10), c(2, 3, 12))
spartition <- factor(c("a","a","b"))
slist <- split(subject, spartition)
forest <- IntervalForest(slist)
## at most one hit per query
findOverlaps(qlist, forest, select = "first")
findOverlaps(qlist, forest, select = "last")
findOverlaps(qlist, forest, select = "arbitrary")
query <- IRanges(c(1, 5, 3, 4), width=c(2, 2, 4, 6))
qpartition <- factor(c("a","a","b","b"))
qlist <- split(query, qpartition)
subject <- IRanges(c(1, 3, 5, 6), width=c(4, 4, 5, 4))
spartition <- factor(c("a","a","b","b"))
slist <- split(subject, spartition)
forest <- IntervalForest(slist)
overlapsAny(qlist, forest, type="start")
overlapsAny(qlist, forest, type="end")
qlist
subsetByOverlaps(qlist, forest)
countOverlaps(qlist, forest)