aacount

Count amino acids in sequence

    Description

    countStruct = aacount(SeqAA) counts the number of each type of amino acid in SeqAA, an amino acid sequence, and returns the counts in countStruct, a 1-by-1 MATLAB® structure containing fields for the standard 20 amino acids (A, R, N, D, C, Q, E, G, H, I, L, K, M, F, P, S, T, W, Y, and V).
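
    As a minimal sketch of the basic syntax, the following counts the residues in a short peptide. The peptide is the first ten residues of the p53 sequence used in the example on this page, and the variable names are illustrative only.

```matlab
% Count residues in a short peptide given as a character vector.
seq = 'MEEPQSDPSV';
counts = aacount(seq);

% Each field of the returned structure holds the count for one
% single-letter amino acid code; residues that do not occur have count 0.
fprintf('E: %d, P: %d, S: %d, W: %d\n', counts.E, counts.P, counts.S, counts.W);
% prints "E: 2, P: 2, S: 2, W: 0"
```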

    countStruct = aacount(SeqAA,Name=Value) uses additional options specified by one or more name-value arguments. For example, countStruct = aacount(SeqAA,Chart="pie") creates a pie chart showing relative proportions of the amino acids.

    Examples

    Use the fastaread function to load the sequence of the human p53 tumor protein.

    p53 = fastaread('p53aa.txt')
    p53 = struct with fields:
          Header: 'gi|8400738|ref|NP_000537.2| tumor protein p53 [Homo sapiens]'
        Sequence: 'MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGPDEAPRMPEAAPRVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKMFCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVVRRCPHHERCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFRHSVVVPYEPPEVGSDCTTIHYNYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKKGEPHHELPPGSTKRALPNNTSSSPQPKKKPLDGEYFTLQIRGRERFEMFRELNEALELKDAQAGKEPGGSRAHSSHLKSKKGQSTSRHKKLMFKTEGPDSD'
    
    

    Count the amino acids in the sequence, and display the results in a pie chart.

    count = aacount(p53,'Chart','pie');

    [Figure: pie chart showing the relative proportions of the amino acids A, R, N, D, C, Q, E, G, H, I, L, K, M, F, P, S, T, W, Y, and V in the p53 sequence.]

    Input Arguments

    SeqAA: Amino acid sequence, specified as one of the following.

    • Character vector or string scalar consisting of single-letter codes of an amino acid sequence. For valid letter codes, see the table Mapping Amino Acid Letter Codes to Integers.

      • Unknown characters are mapped to 0.

      • Ambiguous amino acid characters (B, Z, or X), gaps indicated by hyphens (-), and end terminators (*) are ignored by default.

      • Unrecognized characters are ignored and cause the following warning message.

        Warning: Unknown symbols appear in the sequence. These will be ignored.

    • Row vector of integers specifying an amino acid sequence. For valid integers, see the table Mapping Amino Acid Integers to Letter Codes.

    • Structure that contains an amino acid sequence in the Sequence field. The fastaread, getgenpept, genpeptread, getpdb, and pdbread functions return structures with a Sequence field.

    Example: "ARN"

    Data Types: char | string
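
    The three input forms listed above can be compared with a short sketch. It assumes that aa2int (the Bioinformatics Toolbox function that converts letter codes to integers) produces the integer form described in the mapping table. The variable names are illustrative.

```matlab
% The same sequence in three forms: letters, integers, and a structure.
seqChar   = 'ARNDAR';
seqInt    = aa2int(seqChar);                 % row vector of integer codes
seqStruct = struct('Sequence', seqChar);     % structure with a Sequence field

cChar   = aacount(seqChar);
cInt    = aacount(seqInt);
cStruct = aacount(seqStruct);

% If the three forms are equivalent, all three count structures match.
sameCounts = isequal(cChar, cInt, cStruct)
```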

    Name-Value Arguments

    Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

    Example: aa_count = aacount(seq,Ambiguous="ignore",Gaps=true,Chart="pie")

    Ambiguous: Approach to treat ambiguous amino acid characters (B, Z, or X), specified as one of the following:

    Ambiguous Value    Description
    "ignore"           Skips ambiguous characters.
    "bundle"           Counts ambiguous characters and reports the total count in the Ambiguous field.
    "prorate"          Counts ambiguous characters and distributes them proportionately in the appropriate fields. For example, the counts for the character B are distributed evenly between the D and N fields.
    "individual"       Counts ambiguous characters and reports them in individual fields.
    "warn"             Skips ambiguous characters and displays a warning.

    Data Types: char | string
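
    To illustrate how the Ambiguous options differ, here is a small sketch on a made-up sequence containing two B residues. The values noted in the comments follow from the table above rather than from a recorded run, and the Name=Value syntax requires R2021a or later (use aacount(seq,'Ambiguous','bundle') in earlier releases).

```matlab
% Hypothetical test sequence: one A, two ambiguous B residues, one D.
seq = 'ABBD';

cIgnore  = aacount(seq, Ambiguous="ignore");    % B residues are skipped
cBundle  = aacount(seq, Ambiguous="bundle");    % B residues go to the Ambiguous field
cProrate = aacount(seq, Ambiguous="prorate");   % each B is split evenly between D and N

% Per the table: cBundle.Ambiguous should be 2, and prorating should add
% 0.5 per B to both D and N, giving D = 2 and N = 1.
disp(cIgnore.D)
disp(cBundle.Ambiguous)
disp([cProrate.D, cProrate.N])
```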

    Gaps: Flag to count gaps (true) or ignore them (false). Each gap is indicated by a hyphen (-).

    Data Types: logical
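
    A quick way to see the effect of this flag is to compare the output structures with and without gap counting. This sketch makes no assumption about how the gap count is reported; it simply lists any fields that appear only when gaps are counted. The sequence is made up.

```matlab
% Hypothetical gapped sequence, such as one row of an alignment.
seq = 'AR--N';

cNoGaps = aacount(seq, Gaps=false);   % hyphens are ignored
cGaps   = aacount(seq, Gaps=true);    % hyphens are counted

% List the fields present only when gaps are counted.
extraFields = setdiff(fieldnames(cGaps), fieldnames(cNoGaps))
```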

    Chart: Chart type, specified as "pie" or "bar". The pie or bar chart shows the relative proportions of the amino acids.

    Data Types: char | string

    Output Arguments

    countStruct: Amino acid counts, returned as a 1-by-1 structure. The structure contains one field for each of the standard 20 amino acids (A, R, N, D, C, Q, E, G, H, I, L, K, M, F, P, S, T, W, Y, and V), and each field holds the count for that amino acid.
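
    Because the output is an ordinary structure, it can be post-processed with standard MATLAB functions. The following sketch, which is one possible approach rather than part of aacount itself, converts the counts to relative frequencies and lists the most common residues. The table and variable names are illustrative.

```matlab
% Count residues, then unpack the structure into names and values.
counts = aacount('MEEPQSDPSV');
names  = fieldnames(counts);               % cell array of field names
values = cell2mat(struct2cell(counts));    % column vector of counts

% Convert counts to fractions of the total number of counted residues.
fractions = values / sum(values);

% Collect the results in a table and sort by count, highest first.
T = table(names, values, fractions, ...
    'VariableNames', {'Residue', 'Count', 'Fraction'});
T = sortrows(T, 'Count', 'descend');
disp(T(1:3, :))
```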

    Version History

    Introduced before R2006a