getData

Create structure containing subset of data from GTFAnnotation or GFFAnnotation object

Description

AnnotStruct = getData(AnnotObj) returns AnnotStruct, an array of structures containing data from all elements in AnnotObj. The fields of the returned structures are the same as the elements of the FieldNames property of AnnotObj.

AnnotStruct = getData(AnnotObj,StartPos,EndPos) returns AnnotStruct, an array of structures containing data from the subset of elements in AnnotObj that fall within the range specified by StartPos and EndPos. The range is applied to each reference sequence in AnnotObj.

AnnotStruct = getData(AnnotObj,Subset) returns AnnotStruct, an array of structures containing the subset of data from AnnotObj specified by Subset, a vector of positive integers that index the entries in AnnotObj.

AnnotStruct = getData(___,Name,Value) returns AnnotStruct, an array of structures, using any of the input arguments in the previous syntaxes and additional options specified by one or more Name,Value pair arguments.

Examples

Construct a GTFAnnotation object using a GTF-formatted file that is provided with Bioinformatics Toolbox™.

GTFAnnotObj = GTFAnnotation('hum37_2_1M.gtf');

Extract the annotation data for positions 668,000 through 680,000 from the reference sequence.

AnnotStruct1 = getData(GTFAnnotObj,668000,680000)
AnnotStruct1=18×1 struct array with fields:
    Reference
    Start
    Stop
    Feature
    Gene
    Transcript
    Source
    Score
    Strand
    Frame
    Attributes

Extract the first five annotations from the object.

AnnotStruct2 = getData(GTFAnnotObj,1:5)
AnnotStruct2=5×1 struct array with fields:
    Reference
    Start
    Stop
    Feature
    Gene
    Transcript
    Source
    Score
    Strand
    Frame
    Attributes
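
Once the data is in a structure array, field values from many annotations can be gathered in one step. The following is a minimal sketch that repeats the range query above and collects fields with comma-separated list expansion. The variable names starts, stops, lengths, and features are illustrative, and the length calculation assumes that Start and Stop are inclusive numeric positions.

```matlab
% Rebuild the object and repeat the range query from the example above.
GTFAnnotObj = GTFAnnotation('hum37_2_1M.gtf');
AnnotStruct1 = getData(GTFAnnotObj,668000,680000);

% Gather numeric fields from every element into row vectors.
starts = [AnnotStruct1.Start];
stops  = [AnnotStruct1.Stop];

% Assumption: Start and Stop are inclusive, so length = Stop - Start + 1.
lengths = stops - starts + 1;

% Gather a text field into a cell array and list the distinct feature types.
features = {AnnotStruct1.Feature};
uniqueFeatures = unique(features)
```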

Construct a GFFAnnotation object using a GFF-formatted file that is provided with Bioinformatics Toolbox™.

GFFAnnotObj = GFFAnnotation('tair8_1.gff');

Extract annotations for positions 10,000 through 20,000 from the reference sequence.

AnnotStruct1 = getData(GFFAnnotObj,10000,20000)
AnnotStruct1=9×1 struct array with fields:
    Reference
    Start
    Stop
    Feature
    Source
    Score
    Strand
    Frame
    Attributes

Extract the first five annotations from the object.

AnnotStruct2 = getData(GFFAnnotObj,1:5)
AnnotStruct2=5×1 struct array with fields:
    Reference
    Start
    Stop
    Feature
    Source
    Score
    Strand
    Frame
    Attributes
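
For browsing, a structure array can be converted to a table. This is a sketch only: it relies on the general-purpose MATLAB function struct2table, which is not part of this page, and the variable name T is illustrative.

```matlab
% Rebuild the object and take the first five annotations, as above.
GFFAnnotObj = GFFAnnotation('tair8_1.gff');
AnnotStruct2 = getData(GFFAnnotObj,1:5);

% Convert the 5-by-1 structure array to a table with one row per annotation.
T = struct2table(AnnotStruct2);

% Show only the columns that locate each feature.
T(:,{'Reference','Feature','Start','Stop'})
```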

Input Arguments

AnnotObj: Feature annotations, specified as a GTFAnnotation or GFFAnnotation object.

StartPos: Start of a range in each reference sequence in AnnotObj, specified as a nonnegative integer less than or equal to EndPos.

Data Types: double

EndPos: End of a range in each reference sequence in AnnotObj, specified as a nonnegative integer greater than or equal to StartPos.

Data Types: double
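
One way to choose valid range bounds is to first ask the object for the extent that its annotations cover. The sketch below assumes the companion method getRange, which is not described on this page and is taken here to return a 1-by-2 vector of the minimum and maximum annotated positions. The 10,000-position window and the variable names are illustrative.

```matlab
GFFAnnotObj = GFFAnnotation('tair8_1.gff');

% Assumption: getRange returns [minimum start, maximum stop] over all annotations.
range = double(getRange(GFFAnnotObj));

% Query a window of at most 10,000 positions starting at the first annotated position.
StartPos = range(1);
EndPos   = min(range(1) + 9999, range(2));   % keep EndPos >= StartPos and inside the extent

windowStruct = getData(GFFAnnotObj,StartPos,EndPos);
```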

Subset: Subset of data from AnnotObj to retrieve, specified as a vector of positive integers. Each integer must be less than or equal to the number of entries in the object.

Data Types: double
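
Because every index in Subset must not exceed the number of entries, it can help to build the vector from the entry count. The sketch below assumes the NumEntries property of the annotation object, which is not described on this page. The variable names are illustrative.

```matlab
GTFAnnotObj = GTFAnnotation('hum37_2_1M.gtf');

% Assumption: NumEntries holds the number of annotations in the object.
n = GTFAnnotObj.NumEntries;

% Last five annotations (or fewer, if the object holds fewer than five).
lastFive = getData(GTFAnnotObj,max(1,n-4):n);

% Every tenth annotation, starting from the first.
everyTenth = getData(GTFAnnotObj,1:10:n);
```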

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: AnnotStruct = getData(AnnotObj,"Reference","Chr")

Reference: One or more reference sequences in AnnotObj, specified as a character vector, string, string vector, or cell array of character vectors. Only annotations whose reference field matches one of the character vectors or strings are included in AnnotStruct.

Data Types: char | string | cell

Feature: One or more features in AnnotObj, specified as a character vector, string, string vector, or cell array of character vectors. Only annotations whose feature field matches one of the character vectors or strings are included in AnnotStruct.

Data Types: char | string | cell
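
Reference and feature names must match values that actually occur in the object, so a query can take them from the object itself rather than hard-coding them. The sketch below assumes the companion methods getReferenceNames and getFeatureNames, which are not described on this page. It picks the first name from each list only for illustration.

```matlab
GFFAnnotObj = GFFAnnotation('tair8_1.gff');

% Assumption: these methods list the distinct reference and feature names in the object.
refNames  = getReferenceNames(GFFAnnotObj);
featNames = getFeatureNames(GFFAnnotObj);

% Keep only annotations whose feature field matches the first listed feature type.
byFeature = getData(GFFAnnotObj,"Feature",featNames(1));

% Combine both filters; the result can be empty if no annotation satisfies both.
byBoth = getData(GFFAnnotObj,"Reference",refNames(1),"Feature",featNames(1));
```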

Gene: One or more genes in AnnotObj of type GTFAnnotation, specified as a character vector, string, string vector, or cell array of character vectors. Only annotations whose gene field matches one of the character vectors or strings are included in AnnotStruct.

Data Types: char | string | cell

Transcript: One or more transcripts in AnnotObj of type GTFAnnotation, specified as a character vector, string, string vector, or cell array of character vectors. Only annotations whose transcript field matches one of the character vectors or strings are included in AnnotStruct.

Data Types: char | string | cell
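
The gene and transcript filters apply only to GTFAnnotation objects. The sketch below assumes the companion method getGeneNames, which is not described on this page, and selects the first listed gene purely for illustration.

```matlab
GTFAnnotObj = GTFAnnotation('hum37_2_1M.gtf');

% Assumption: getGeneNames lists the distinct gene names in the object.
geneNames = getGeneNames(GTFAnnotObj);

% Retrieve every annotation that belongs to the first listed gene.
geneStruct = getData(GTFAnnotObj,"Gene",geneNames(1));

% List the distinct transcripts recorded for that gene.
transcriptsOfGene = unique({geneStruct.Transcript})
```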

Overlap: Minimum number of base positions that an annotation must overlap the range to be included in AnnotStruct, specified as a positive integer, "full", or "start". Use "full" to include only annotations that are fully contained in the range. Use "start" to include only annotations whose start position lies within the range.

Data Types: double | char | string
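
The effect of the overlap setting is easiest to see by running the same range query with different values and comparing the number of annotations returned. The sketch below assumes the argument name is Overlap. The threshold of 100 base positions is arbitrary, and the counts depend on the file, so no expected output is shown.

```matlab
GTFAnnotObj = GTFAnnotation('hum37_2_1M.gtf');

% Same range each time; only the overlap requirement changes.
defaultOverlap = getData(GTFAnnotObj,668000,680000);                    % overlap left at its default
fullOnly       = getData(GTFAnnotObj,668000,680000,"Overlap","full");   % fully contained in the range
startOnly      = getData(GTFAnnotObj,668000,680000,"Overlap","start");  % start position inside the range
atLeast100     = getData(GTFAnnotObj,668000,680000,"Overlap",100);      % at least 100 positions overlap

% Report how many annotations each setting keeps.
fprintf('default: %d, full: %d, start: %d, 100 or more positions: %d\n', ...
    numel(defaultOverlap),numel(fullOnly),numel(startOnly),numel(atLeast100));
```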

Output Arguments

AnnotStruct: Data from elements in AnnotObj, returned as a structure array with these fields:

  • Reference

  • Start

  • Stop

  • Feature

  • Gene (for AnnotObj of type GTFAnnotation)

  • Transcript (for AnnotObj of type GTFAnnotation)

  • Source

  • Score

  • Strand

  • Frame

  • Attributes

The fields are the same as the elements in the FieldNames property of AnnotObj. See GTF2.2: A Gene Annotation Format.
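
The correspondence between the structure fields and the FieldNames property can be checked directly. This is a minimal sketch: the comparison sorts both lists so that it does not depend on their order or orientation, and the variable name sameFields is illustrative.

```matlab
GTFAnnotObj = GTFAnnotation('hum37_2_1M.gtf');
s = getData(GTFAnnotObj,1:5);

% Compare the field names of the structure with the object's FieldNames property.
structFields = sort(fieldnames(s));
objectFields = sort(cellstr(GTFAnnotObj.FieldNames(:)));
sameFields = isequal(structFields,objectFields)
```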

Tips

getData returns the annotation data as a structure array, which gives more direct access to the data than the annotation object does:

  • You can access all field values in a structure.

  • You can extract, assign, and delete field values.

  • You can use linear indexing to access field values of specific annotations. For example, you can access the start value of only the fifth annotation.
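
The points above can be sketched with standard structure-array operations. The new Feature value 'custom_label' is a hypothetical label used only for illustration. These operations change the structure array only; the annotation object is not modified.

```matlab
GTFAnnotObj = GTFAnnotation('hum37_2_1M.gtf');
AnnotStruct = getData(GTFAnnotObj,1:5);

% Extract: the start value of only the fifth annotation.
fifthStart = AnnotStruct(5).Start;

% Assign: overwrite a field value in the structure (hypothetical label).
AnnotStruct(5).Feature = 'custom_label';

% Delete: remove a field from every element, then remove the first element.
AnnotStruct = rmfield(AnnotStruct,'Attributes');
AnnotStruct(1) = [];
```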

Version History

Introduced in R2013a