getting the nth term out of a sequence

Question

SANGBIN LEE on 29 Feb 2024

Direct link to this question

https://au.mathworks.com/matlabcentral/answers/2088736-getting-the-nth-term-out-of-a-sequence

Edited: John D'Errico on 29 Feb 2024

% Define the input and output file names
inputFileName = 'KIF11.txt';
outputFileName = 'CDS.txt';
% Read the sequence from the input file
fid = fopen(inputFileName, 'r');
sequence = fscanf(fid, '%c');
fclose(fid);
% Define the start and end positions of the CDS
cdsStart = 155;
cdsEnd = 3358;
% Extract the CDS from the sequence
cdsSequence = sequence(cdsStart:cdsEnd);
% Write the CDS sequence to a new file
fid = fopen(outputFileName, 'w');
fprintf(fid, '%s', cdsSequence);
fclose(fid);

I have the code above, which is supposed to pull out the 155th through 3358th terms of the text file that I have. For some reason, when I run the code, it gives me the 153rd through 3356th terms instead. Is something wrong with the code?

3 Comments

SANGBIN LEE on 29 Feb 2024

KIF11.txt

thank you

Walter Roberson on 29 Feb 2024

sequence = fscanf(fid, '%c');

Beware: the characters returned in sequence will include any end-of-line characters present in the file (possibly carriage return and line feed). Linear indexing into that result is unreliable, because you cannot be sure which end-of-line characters are present or how many.
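One way to check for this is to count the end-of-line characters in whatever was read. The snippet below is only a sketch on a made-up string (raw is a stand-in, not the contents of KIF11.txt); with the real data you would run the same two counts on sequence.

```matlab
% Hypothetical stand-in for the text returned by fscanf(fid, '%c'):
% two short "lines", the first ended by CR, the second by CR + LF.
raw = ['ACGT' char(13) 'TTGA' char(13) char(10)];

% Count carriage returns (char 13) and line feeds (char 10).
nCR = sum(raw == char(13));
nLF = sum(raw == char(10));

fprintf('CR: %d, LF: %d, total length: %d\n', nCR, nLF, numel(raw));
```

If either count is nonzero, positions in the raw text no longer correspond one-to-one to positions in the sequence itself.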

Answer 1

Dyuman Joshi on 29 Feb 2024

Direct link to this answer

https://au.mathworks.com/matlabcentral/answers/2088736-getting-the-nth-term-out-of-a-sequence#answer_1419076

Edited: Dyuman Joshi on 29 Feb 2024

KIF11.txt

As @Walter has warned, a carriage return character (\r) is being read along with the data:

% Define the input and output file names
inputFileName = 'KIF11.txt';
outputFileName = 'CDS.txt';
% Read the sequence from the input file
fid = fopen(inputFileName, 'r');
sequence = fscanf(fid, '%c');
fclose(fid);
size(sequence)
ans = 1×2
           1        3736
% Expected: the last character of the 1st line and the first character of the 2nd line
% The output does not match that
y = sequence(70:71)
y = 
    'T
     '
double(y)
ans = 1×2
    84    13

Alternatively, you can use textscan here, which in this case drops the line-ending characters:

Fid = fopen(inputFileName, 'r');
out = textscan(Fid, '%c')
out = 1×1 cell array
    {3682×1 char}
seq = out{1};
y = seq(70:71)
y = 2×1 char array
    'T'
    'G'

% Define the start and end positions of the CDS
cdsStart = 155;
cdsEnd = 3358;
% Extract the CDS from the textscan result (seq), which has no line-ending characters
cdsSequence = seq(cdsStart:cdsEnd);
% Write the CDS sequence to a new file
fid = fopen(outputFileName, 'w');
fprintf(fid, '%s', cdsSequence);
fclose(fid);

1 Comment

John D'Errico on 29 Feb 2024

Edited: John D'Errico on 29 Feb 2024

+1. I was going to point this out:

find(~ismember(sequence,'CAGT'))
ans =
Columns 1 through 8
71         142         213         284         355         426         497         568
Columns 9 through 16
639         710         781         852         923         994        1065        1136
Columns 17 through 24
1207        1278        1349        1420        1491        1562        1633        1704
Columns 25 through 32
1775        1846        1917        1988        2059        2130        2201        2272
Columns 33 through 40
2343        2414        2485        2556        2627        2698        2769        2840
Columns 41 through 48
2911        2982        3053        3124        3195        3266        3337        3408
Columns 49 through 54
3479        3550        3621        3692        3735        3736

So there are two invisible characters in there before index 155, at positions 71 and 142. They fall exactly where the carriage return characters lie, one after each 70-character line. That explains why it looks like the sequence was read exactly 2 characters off.
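To make the arithmetic explicit, here is a small sketch of how a position counted in bases maps to a position in the raw text. It assumes what the find output suggests, namely 70 bases per line followed by exactly one extra character; the names lineLen, n, rawIdx, and offset are just for illustration.

```matlab
lineLen = 70;          % bases per line, inferred from the spacing of 71 above
n = [155 3358];        % 1-based positions counted in bases only

% Each complete line before position n contributes one extra character.
rawIdx = n + floor((n - 1) / lineLen);
offset = rawIdx - n;

disp([n; rawIdx; offset]);
```

Under that assumption, base 155 sits at raw index 157 and base 3358 at raw index 3405, so the shift is not a constant 2 but grows along the sequence. Correcting the start index by hand would therefore not be enough on its own.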

So if you delete those elements first, an index into the repaired string will work.
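A minimal sketch of that repair, again on a made-up string rather than the real file (raw, clean, and the slice names are hypothetical):

```matlab
% Synthetic stand-in for the file contents: three 4-base "lines",
% each followed by a carriage return (char 13).
raw = ['ACGT' char(13) 'TTGA' char(13) 'CCAG' char(13)];

% Keep only the nucleotide letters; every other character is dropped.
clean = raw(ismember(raw, 'ACGT'));

% The same index range, before and after the repair.
rawSlice   = raw(4:6);     % straddles a line ending, so it contains char 13
cleanSlice = clean(4:6);   % counts bases only

disp(double(rawSlice));
disp(cleanSlice);
```

Note that ismember(raw, 'ACGT') also discards lowercase letters and ambiguity codes such as N. If the file might contain those, raw(~isspace(raw)) would be a gentler filter that removes only whitespace, including line endings.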
