How can I find the symbol given the gene id?

ID and name conversion is one of the common tasks in Bioinformatics. In this problem, you will write a function symbol=geneidtosymbol(id,filename) that will return the symbol of a gene, given its GeneID. The GeneID to symbol conversion should be looked up from a file named "gene_info.txt". Each line in this file contains tab-delimited information for a gene. The first line of the file specifies what type of information is available in each column. Download and use the file available from http://sacan.biomed.drexel.edu/ftp/bmes201/final.20123/gene_info.txt (which contains the first 100 lines of the file available from ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz).
If filename is not given, use gene_info.txt. If it is given (may be different than gene_info.txt), use the filename provided as input.
Here is what I have so far:
function out = geneidtosymbol(x)
fid=fopen('gene_info.txt','r'); %open file
if fid<0
    fprintf('I am not able to open the pdb file');
    out=[];
    return;
end
symbol=[];
if ~feof(fid)
    line=fgetl(fid);
    str2num(line(3:10)) = x;
    line=strsplit(line);
    symbol=line{3};
end
out = symbol;
  2 Comments
Geoff Hayes on 27 Nov 2014
S - why does your code read only the first line from the file? Don't you get an error with the line str2num(line(3:10)) = x? Please describe what you are attempting with these lines of code.
S on 29 Nov 2014
Here is what I have changed so far:
function symbol = geneidtosymbol(x)
fid=fopen('gene_info.txt','r'); %open file
if fid<0
    fprintf('I am not able to open the file');
    symbol=[];
    return;
end
symbol=[];
while ~feof(fid)
    line=fgetl(fid);
    lookfor(x,line)= id;
    line=strsplit(line);
    symbol=line{3};
end
I added a while loop to read each line, and I am trying to use the lookfor command to search for the input in the given file. The commands after it should then return the corresponding symbol. However, I keep getting an error on line 11, which is
lookfor(x,line)= id;
What am I doing wrong?


Accepted Answer

Geoff Hayes on 29 Nov 2014
S - lookfor is used to search for a keyword in all help entries, not to search for a substring within another string. Your line of code
lookfor(x,line)= id;
is probably generating the error "Undefined function or variable 'id'" because you are using the variable id before it has been defined. Even if id were defined, MATLAB would treat this line as an indexed assignment into a variable named lookfor rather than a call to the function, so it is unclear what the assignment is meant to do. What is the intent of this line?
Since you want to find a string within another string, you should use strfind instead:
while ~feof(fid)
    % get the next line of the file
    line = fgetl(fid);
    % does this line contain the gene id?
    if ~isempty(strfind(line,x))
        % split on the empty spaces
        line=strsplit(line);
        % third element is symbol
        symbol=line{3};
        % since symbol found, exit
        break;
    end
end
% close the file
fclose(fid);
Note that we assume there is only one symbol per gene id, so once the symbol is found we break out of the while loop and close the file.
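One caveat with strfind is that it matches the id anywhere in the line, so an id such as '123' would also match a line whose GeneID is 12345. Here is a minimal sketch of a stricter check on a single line. It assumes that the GeneID is the second tab-delimited column and the symbol the third, and that x is a character vector. The sample line and its values are made up for illustration.

```matlab
% Hypothetical sample line: tax_id <TAB> GeneID <TAB> Symbol <TAB> ...
line = sprintf('9606\t12345\tABC1\t-');
x = '12345';                      % gene id as a character vector

% split on tabs only, keeping empty fields, so the columns stay aligned
fields = strsplit(line, sprintf('\t'), 'CollapseDelimiters', false);

% compare the whole GeneID field instead of searching for a substring
symbol = [];
if numel(fields) >= 3 && strcmp(fields{2}, x)
    symbol = fields{3};
end
```

With x = '123' the strcmp test fails for this line, whereas strfind would have reported a match.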
Make sure you adjust your code to handle a different data file, as per the instruction "If filename is not given, use gene_info.txt. If it is given (may be different than gene_info.txt), use the filename provided as input." So you will need to add the input parameter filename.
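As a rough sketch of how the pieces could fit together, the function below uses nargin to fall back to gene_info.txt when no filename is passed. It is one possible arrangement, not the required solution. It assumes the column order tax_id, GeneID, Symbol, and it compares the whole GeneID column rather than searching the line, which is a stricter test than the strfind approach.

```matlab
function symbol = geneidtosymbol(id, filename)
% GENEIDTOSYMBOL  Look up the symbol for a GeneID in a gene_info-style file.
% Sketch only: assumes tab-delimited columns tax_id, GeneID, Symbol, ...

    % fall back to the default file when no filename is supplied
    if nargin < 2 || isempty(filename)
        filename = 'gene_info.txt';
    end

    % accept the id either as a number or as text
    if isnumeric(id)
        id = num2str(id);
    end

    symbol = [];
    fid = fopen(filename, 'r');
    if fid < 0
        fprintf('Unable to open %s\n', filename);
        return;
    end

    tab = sprintf('\t');
    while true
        line = fgetl(fid);
        if ~ischar(line)          % fgetl returns -1 at end of file
            break;
        end
        fields = strsplit(line, tab, 'CollapseDelimiters', false);
        % match the whole GeneID column, not a substring of the line
        if numel(fields) >= 3 && strcmp(fields{2}, id)
            symbol = fields{3};
            break;
        end
    end
    fclose(fid);
end
```

Calling geneidtosymbol(id) reads gene_info.txt from the current folder, while geneidtosymbol(id, 'other_file.txt') reads the named file instead. If no line matches, the function returns an empty array.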
