- preallocate x and assign into it by index instead of growing it inside the loop
- size x() based on the length of the string rather than hardcoding the loop count
I want to convert a character sequence into a numerical sequence using a for loop
I have a character sequence stored in the variable DNA_SEQS = 'AGGTAT.....'. The sequence consists of four types of character, 'A', 'C', 'T' and 'G', so I have used a switch/case to generate the numerical sequence. The code I have written is:
seqs = fastaread('AF0071891.fasta');
DNA_SEQS = seqs.Sequence;
len = length(DNA_SEQS);
for j = 1:5
x = [];
a = DNA_SEQS(j);
switch a
case 'A'
v = 0;
case 'C'
v = 1;
case 'G'
v = 2;
case 'T'
v = 3;
end
x(j+1) = [x(j) v];
end
I expected this code to give me a numerical array like [0,2,2,3,0], but instead I got the error: Index exceeds matrix dimensions.
Please help
Accepted Answer
dpb
on 8 Jun 2022
Edited: dpb
on 8 Jun 2022
for j = 1:5
x = [];
a = DNA_SEQS(j);
...
Resetting x = [] at the top of the loop wipes out whatever you put in x on every pass...don't do that!!! :) Initialize x once, before the loop:
x = [];
for j = 1:5
a = DNA_SEQS(j);
...
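instead. The loop body also has to change: on the first pass x is empty, so the x(j) in x(j+1) = [x(j) v] does not exist (that is the "Index exceeds matrix dimensions" error), and a two-element vector cannot be stored in the single element x(j+1) anyway; assign x(j) = v. Better still, preallocate x and size the loop from the data rather than hardcoding 5: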
N=strlength(DNA_SEQS);
x=zeros(1,N);
for j = 1:N
a = DNA_SEQS(j);
...
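Putting those pieces together, here is a minimal sketch of the complete loop with the body filled in. It hardcodes a short sample string in place of the fastaread() call (so it runs without the FASTA file) and uses numel() rather than strlength(), which is enough for a char row vector. The otherwise branch that returns NaN for unexpected characters is an added assumption, not part of the question.

```matlab
% Sketch: corrected loop with preallocation and per-element assignment.
% Assumption: a short sample string stands in for seqs.Sequence from fastaread().
DNA_SEQS = 'AGGTAT';
N = numel(DNA_SEQS);          % number of characters in the char row vector
x = zeros(1, N);              % preallocate once, outside the loop
for j = 1:N
    switch DNA_SEQS(j)
        case 'A'
            v = 0;
        case 'C'
            v = 1;
        case 'G'
            v = 2;
        case 'T'
            v = 3;
        otherwise
            v = NaN;          % assumption: flag anything that is not A/C/G/T
    end
    x(j) = v;                 % store one value per character, no growing
end
disp(x)
```

For 'AGGTAT' this leaves x equal to [0 2 2 3 0 3].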
However, in MATLAB you don't need a loop; use a lookup table instead. One way (not necessarily the fastest, but pretty easy to code) would be
DNA_VALS=interp1(double('ACGT'),0:3,double(DNA_SEQS));
For your sample sequence above, this returns...
>> DNA_SEQS = 'AGGTAT';
DNA_VALS=interp1(double('ACGT'),0:3,double(DNA_SEQS))
DNA_VALS =
0 2 2 3 0 3
>>
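For comparison, a minimal sketch of an explicit lookup table indexed by character code; this is an alternative to the interp1 call above, not what the answer used. The 256-entry table assumes the sequence holds only 8-bit character codes, and using NaN for unlisted characters is an assumption. One reason you might prefer it: interp1 interpolates linearly between the codes in 'ACGT', so a character such as 'N' (code 78, between 'G' = 71 and 'T' = 84) would come back as a fraction (about 2.54) rather than being flagged.

```matlab
% Sketch: direct lookup table indexed by character code.
% Assumption: all characters have codes 1..256; unknown ones map to NaN.
DNA_SEQS = 'AGGTAT';
lut = NaN(1, 256);            % one slot per character code, default "unknown"
lut(double('ACGT')) = 0:3;    % A->0, C->1, G->2, T->3
DNA_VALS = lut(double(DNA_SEQS));
disp(DNA_VALS)
```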
More Answers (1)
DGM
on 8 Jun 2022
You can use ismember():
thisstr = 'AGGATATC';
charmap = 'ACGT';
[~,idx] = ismember(thisstr,charmap);
idx = idx-1
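A short sketch of what happens with characters that are not in charmap: ismember() returns a location of 0 for them, so idx-1 turns them into -1. The sample string containing an 'N' and the choice to replace those positions with NaN are assumptions for illustration.

```matlab
% Sketch: handling characters that are not found in charmap.
thisstr = 'AGGNTC';           % hypothetical input with an unknown base 'N'
charmap = 'ACGT';
[found, idx] = ismember(thisstr, charmap);  % idx is 0 wherever found is false
vals = idx - 1;               % 0-based codes; unknown characters become -1
vals(~found) = NaN;           % assumption: mark unknown characters with NaN instead
disp(vals)
```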
4 Comments
dpb
on 8 Jun 2022
For exactly the reason I outlined above as a possibility -- it isn't a char() array --
>> DNA_SEQS='AGGTAT'; % assign as char() string (and array of char())
>> N=strlength(DNA_SEQS) % strlength() is the same as size(x,2) here...
N =
6
>> for i=1:N,disp(DNA_SEQS(i));end % works fine for a char() array with () addressing
A
G
G
T
A
T
>> DNA_SEQS = cellstr('AGGTAT'); % redefine as a cellstr() instead...
>> N=strlength(DNA_SEQS) % strlength knows about what is in the cell
N =
6
>> for i=1:N,disp(DNA_SEQS(i));end % but it fails as you see...
{'AGGTAT'}
Index exceeds the number of array elements (1).
>>
WHY!!!???
>> size(DNA_SEQS) % because now the cellstr is a 1x1 CELL array, NOT 1x6 char() array...
ans =
1 1
>>
How to make it work???
"Use the curlies, Luke!!!"
>> for i=1:N,disp(DNA_SEQS{1}(i));end
A
G
G
T
A
T
>>
NB: above, {1} "dereferences" the cell array to get back the char() array inside it; the subsequent "smooth" parentheses (i) then pick the ith element of that vector, just as they did directly when it was "only" a char() array rather than a char() array in a cell.
String arrays behave similarly to cellstr(): you have to use {} (the "curlies") to reach inside a string array element to the individual characters that make it up.
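A minimal sketch of pulling the char data out once, so the rest of the code can index it exactly as in the question. The variable names are illustrative, and the string() part assumes MATLAB R2016b or later, where the string class exists.

```matlab
% Sketch: extract the char array from a cellstr or a string before indexing.
C = cellstr('AGGTAT');        % 1x1 cell containing a 1x6 char array
S = string('AGGTAT');         % 1x1 string array (R2016b or later)
c1 = C{1};                    % curly braces return the char array inside the cell
c2 = char(S);                 % char() converts a scalar string to a char row vector
disp(size(C))                 % the cell itself is 1x1
disp(size(c1))                % the extracted char array is 1x6
disp(S{1}(2))                 % {} also reaches inside a string element: 'G'
```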