getSegments

Return table of non-overlapping segments from GTFAnnotation object

Description

segments = getSegments(AnnotObj) returns segments, a table of non-overlapping segments of nucleotide sequences built by flattening the transcripts in AnnotObj. If an exon boundary differs between two or more transcripts of a gene, the function splits the overlapping exons into two or more non-overlapping segments that together cover all exons in those transcripts.

[segments,transcriptIDs] = getSegments(AnnotObj) returns transcriptIDs, a cell array of character vectors containing all unique transcript IDs in AnnotObj.

[___] = getSegments(AnnotObj,"Reference",R) returns the segments that belong to one or more references specified by R.

[___] = getSegments(AnnotObj,"Gene",G) returns the segments that belong to one or more genes specified by G.

[___] = getSegments(AnnotObj,"Transcript",T) returns the segments that belong to one or more transcripts specified by T.
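
The flattening described above can be illustrated on toy data. The following is a minimal sketch, not the implementation used by getSegments: the exon coordinates are made up, the variable names (exonStart, exonStop, breaks, covered) are hypothetical, and coordinates are assumed to be inclusive. It splits overlapping exons at every exon boundary and records which exon covers each resulting segment, in the spirit of the ExonIndicator variable.

```matlab
% Toy exons with inclusive coordinates (made-up values for illustration).
exonStart = [100; 100; 180];
exonStop  = [200; 150; 250];

% Every exon start, and the position just after every exon stop,
% is a breakpoint where a new segment may begin.
breaks = unique([exonStart; exonStop + 1]);

% Candidate segments lie between consecutive breakpoints.
segStart = breaks(1:end-1);
segStop  = breaks(2:end) - 1;

% covered(i,j) is true if segment i lies entirely inside exon j.
% This line relies on implicit expansion (R2016b or later).
covered = segStart >= exonStart.' & segStop <= exonStop.';

% Drop candidate segments that fall in a gap no exon covers.
keep = any(covered, 2);
segStart = segStart(keep);
segStop  = segStop(keep);
exonIndicator = sparse(covered(keep, :));

disp(table(segStart, segStop))
```

With these toy exons, the exon spanning 100 to 200 is split into three segments because the other two exons begin or end inside it.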

Examples

Create a GTFAnnotation object from a GTF-formatted file.

obj = GTFAnnotation('hum37_2_1M.gtf');

Retrieve unique reference names. In this case, there is only one reference sequence, which is chromosome 2 (chr2).

ref = getReferenceNames(obj)
ref = 1x1 cell array
    {'chr2'}

Get a table of all non-overlapping segments of nucleotide sequences which belong to chr2.

segments = getSegments(obj,"Reference",ref);
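
The "Gene" and "Transcript" options can be used in the same way. The sketch below assumes the same GTF file as in the example above and picks the first gene and the first transcript purely for illustration. getGeneNames and getTranscriptNames are other GTFAnnotation object functions that return the names stored in the object.

```matlab
obj = GTFAnnotation('hum37_2_1M.gtf');

% Segments of a single gene (the first gene name, chosen arbitrarily).
geneNames = getGeneNames(obj);
geneSegments = getSegments(obj,"Gene",geneNames(1));

% Segments of a single transcript (the first transcript name, chosen arbitrarily).
transcriptNames = getTranscriptNames(obj);
transcriptSegments = getSegments(obj,"Transcript",transcriptNames(1));
```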

Input Arguments

AnnotObj: GTF annotation, specified as a GTFAnnotation object.

R: Names of reference sequences, specified as a character vector, string, string vector, cell array of character vectors, or categorical array.

The names must come from the Reference field of AnnotObj. If a name does not exist, the function issues a warning and ignores the name.

Data Types: char | string | cell | categorical

G: Names of genes, specified as a character vector, string, string vector, cell array of character vectors, or categorical array.

The names must come from the Gene field of AnnotObj. If a name does not exist, the function issues a warning and ignores the name.

Data Types: char | string | cell | categorical

T: Names of transcripts, specified as a character vector, string, string vector, cell array of character vectors, or categorical array.

The names must come from the Transcript field of AnnotObj. If a name does not exist, the function issues a warning and ignores the name.

Data Types: char | string | cell | categorical

Output Arguments

segments: Non-overlapping segments, returned as a table. The table contains the following variables for each segment.

Variable Name          Description
Start                  Start location of each segment.
Stop                   Stop location of each segment.
Reference              Categorical array representing the names of reference sequences to which the segments belong, obtained from the Reference field of AnnotObj.
ExonIndicator          Logical sparse matrix of segment versus exon. The rows represent segments and the columns represent exons. The element at position (i,j) is 1 if the ith segment is part of the jth exon, and 0 otherwise.
TranscriptIndicator    Logical sparse matrix of segment versus transcript. The rows represent segments and the columns represent transcripts. The element at position (i,j) is 1 if the ith segment is part of the jth transcript, and 0 otherwise.

transcriptIDs: Unique transcript IDs, returned as a cell array of character vectors. The transcript IDs correspond to the columns of the TranscriptIndicator variable of segments. For instance, the first element of transcriptIDs is the ID of the transcript represented by the first column of the TranscriptIndicator matrix.
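
The correspondence between transcriptIDs and the columns of TranscriptIndicator can be used to pull out the segments of one transcript. The following is a small sketch on a hand-built table rather than real getSegments output: the coordinates and the IDs 'T1' and 'T2' are made up, and only the Start, Stop, and TranscriptIndicator variables are mimicked.

```matlab
% Hand-built stand-in for the outputs of getSegments (made-up values).
Start = [100; 151; 201];
Stop  = [150; 200; 300];
TranscriptIndicator = sparse(logical([1 1; 1 0; 0 1]));
segments = table(Start, Stop, TranscriptIndicator);
transcriptIDs = {'T1', 'T2'};

% Column j of TranscriptIndicator corresponds to transcriptIDs{j}.
j = find(strcmp(transcriptIDs, 'T2'));

% Rows whose entry in column j is true are the segments of that transcript.
rows = full(segments.TranscriptIndicator(:, j));
t2Segments = segments(rows, {'Start', 'Stop'});

disp(t2Segments)
```

The same indexing pattern applies to ExonIndicator, with exons in place of transcripts.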

Version History

Introduced in R2014b