seq2regexp
Convert sequence with ambiguous characters to regular expression
Syntax
RegExp = seq2regexp(Seq)
RegExp = seq2regexp(Seq, ...'Alphabet', AlphabetValue, ...)
RegExp = seq2regexp(Seq, ...'Ambiguous', AmbiguousValue, ...)
Input Arguments
Seq | Either of the following:
|
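AlphabetValue | Character vector or string specifying the sequence alphabet. Choices are 'NT' (default) for nucleotide sequences or 'AA' for amino acid sequences. |
AmbiguousValue | Controls whether ambiguous characters are included in RegExp, the regular expression return value. Choices are true (default) or false. |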
Output Arguments
RegExp | Character vector of codes specifying an amino acid or nucleotide sequence in regular expression format using IUB/IUPAC codes. |
Description
RegExp = seq2regexp(Seq) converts ambiguous amino acid or nucleotide symbols in a sequence to a regular expression format using IUB/IUPAC codes.
RegExp = seq2regexp(Seq, ...'PropertyName', PropertyValue, ...) calls seq2regexp with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:
RegExp = seq2regexp(Seq, ...'Alphabet', AlphabetValue, ...) specifies the sequence alphabet. AlphabetValue can be either 'NT' for nucleotide sequences or 'AA' for amino acid sequences. Default is 'NT'.
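The alphabet matters because some codes appear in both conversion tables with different meanings: B is "Not A" in the nucleotide table but "Asparagine or Aspartic acid" in the amino acid table. The following is a minimal sketch of that contrast, assuming the Bioinformatics Toolbox is installed; the input 'ABA' and the variable names are made up for illustration, and the exact returned text is not shown because it depends on the Ambiguous setting.

```matlab
% Hypothetical three-letter input whose middle code, B, is ambiguous
% in both alphabets.
seq = 'ABA';

% Default alphabet ('NT'): B is read as the nucleotide code "Not A".
ntPattern = seq2regexp(seq);

% Amino acid alphabet: B is read as "Asparagine or Aspartic acid".
aaPattern = seq2regexp(seq, 'Alphabet', 'AA');

% The two calls interpret the same letters differently, so the
% resulting patterns are expected to differ.
disp(ntPattern)
disp(aaPattern)
```

Setting 'Alphabet' explicitly for protein input avoids silently applying the nucleotide interpretation of shared codes.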
RegExp = seq2regexp(Seq, ...'Ambiguous', AmbiguousValue, ...) controls whether ambiguous characters are included in RegExp, the regular expression return value. Choices are true (default) or false. For example:
If Seq = 'ACGTK', and AmbiguousValue is true, the MATLAB® software returns ACGT[GTK] with the unambiguous characters G and T and the ambiguous character K.
If Seq = 'ACGTK', and AmbiguousValue is false, the MATLAB software returns ACGT[GT] with only the unambiguous characters.
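The two settings can be compared side by side. This sketch assumes the Bioinformatics Toolbox is installed and uses the documented results for 'ACGTK'; the target sequence and variable names are illustrative assumptions. It suggests one reason to keep the default: a target that itself contains the code K matches only the pattern that includes ambiguous characters.

```matlab
seq = 'ACGTK';

% Documented results for this input:
withCodes = seq2regexp(seq, 'Ambiguous', true);   % ACGT[GTK]
strict    = seq2regexp(seq, 'Ambiguous', false);  % ACGT[GT]

% Hypothetical target that still carries the ambiguity code K.
target = 'ACGTK';

% regexp with 'once' returns the start index of the first match, or [].
matchesWithCodes = ~isempty(regexp(target, withCodes, 'once'));  % true
matchesStrict    = ~isempty(regexp(target, strict, 'once'));     % false
```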
Nucleotide Conversion
Nucleotide Code | Nucleotide | Conversion |
---|---|---|
A | Adenosine | A |
C | Cytosine | C |
G | Guanine | G |
T | Thymidine | T |
U | Uridine | U |
R | Purine | [AG] |
Y | Pyrimidine | [TC] |
K | Keto | [GT] |
M | Amino | [AC] |
S | Strong interaction (3 H bonds) | [GC] |
W | Weak interaction (2 H bonds) | [AT] |
B | Not A | [CGT] |
D | Not C | [AGT] |
H | Not G | [ACT] |
V | Not T or U | [ACG] |
N | Any nucleotide | [ACGT] |
- | Gap of indeterminate length | - |
? | Unknown | ? |
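To make the table concrete, the sketch below applies it by hand using only base MATLAB. It is an illustrative lookup, not the toolbox implementation; the names codes, sets, and lookup are hypothetical, and it covers only the case where ambiguous characters are left out of the result.

```matlab
% Illustrative lookup built from the Nucleotide Conversion table
% (not toolbox code).
codes = {'R','Y','K','M','S','W','B','D','H','V','N'};
sets  = {'[AG]','[TC]','[GT]','[AC]','[GC]','[AT]', ...
         '[CGT]','[AGT]','[ACT]','[ACG]','[ACGT]'};
lookup = containers.Map(codes, sets);

seq = 'ACWTMAN';
pattern = '';
for ch = seq                                 % one character per iteration
    if isKey(lookup, ch)
        pattern = [pattern lookup(ch)];      %#ok<AGROW> expand ambiguous code
    else
        pattern = [pattern ch];              %#ok<AGROW> A, C, G, T, U, -, ? pass through
    end
end
disp(pattern)   % AC[AT]T[AC]A[ACGT]
```

The result agrees with the output shown in the Examples section for 'ACWTMAN' with 'ambiguous' set to false.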
Amino Acid Conversion
Amino Acid Code | Amino Acid | Conversion |
---|---|---|
B | Asparagine or Aspartic acid (Aspartate) | [DN] |
Z | Glutamine or Glutamic acid (Glutamate) | [EQ] |
X | Any amino acid | [A R N D C Q E G H I L K M F P S T W Y V] |
Examples
Convert a nucleotide sequence to a regular expression.
seq2regexp('ACWTMAN')
ans =
AC[ATW]T[ACM]A[ACGTRYKMSWBDHVN]
Convert the same nucleotide sequence, but remove ambiguous characters from the regular expression.
seq2regexp('ACWTMAN', 'ambiguous', false)
ans =
AC[AT]T[AC]A[ACGT]
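The returned expression can be passed to regexp (listed under See Also) to search another sequence. The sketch below hard-codes the pattern from the second example so that it runs in base MATLAB; the target sequence and variable names are made up for illustration.

```matlab
% Pattern copied from the second example (ambiguous characters removed).
pattern = 'AC[AT]T[AC]A[ACGT]';

% Hypothetical target sequence, invented for this illustration.
target = 'TTACATCAGGACTTAATT';

% Request the start positions and the matched text, in that order.
[startIdx, hits] = regexp(target, pattern, 'start', 'match');
% startIdx is [3 11]
% hits is {'ACATCAG', 'ACTTAAT'}
```

Use regexpi instead of regexp if the target sequence may be in lowercase.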
Version History
Introduced before R2006a
See Also
restrict | seqwordcount | regexp | regexpi