I keep getting an empty cell, how can I get a cell with a value in it for this problem?

ChIP-seq is used to identify genomic segments bound by transcription factors. In a ChIP-seq experiment, you obtain the chromosome regions that the transcription factor FoxA is binding. Write a Matlab function locationtogenes(chr,start,finish,genefile) that takes a chromosome name, the start and finish of the chromosome region in base pairs, and a gene-location file; and determines which genes are located in that region. The gene-location file is a tab-delimited text file where each line contains chromosome-start-finish-genename. Find the genes whose position overlaps with the input range. Return these genes as a cell array. If no genes are found in an input range, return an empty cell array. An example gene-location file can be downloaded from http://sacan.biomed.drexel.edu/ftp/bmeprog/genelocs_sample.txt You should not download any files from the web in your code; you can assume that your function will be given a filename that already exists.
>> disp(locationtogenes('chr1',59000000,59247000,'genelocs_sample.txt'))
'JUN'
>> disp(locationtogenes('chr1',50000000,100000000,'genelocs_sample.txt'))
'RP4-784A16.2' 'MRPL37' 'JUN' 'LRRC8C'
Here is my code. I don't know where I am going wrong in this code.
function [ gene ] = locationtogenes( chr,start,finish,genefile )
%UNTITLED Summary of this function goes here
% Detailed explanation goes here
f=fopen(genefile,'r'); %open file
gene=[];
while ~feof(f)
line = fgetl(f);
locs=find(line==sprintf('\t'));
if numel(locs)<3; continue; end
linechr=line(1:locs(1)-1);
linestart=line(locs(1)+1:locs(2)-1);
linefinish=line(locs(2)+1:locs(3)-1);
linegenename=line(locs(3)+1:end);
%linestart=str2double(linestart);
%linefinish=str2double(linefinish);
if strcmpi(chr,linechr)==1 && strcmpi (start, linestart)==1 && strcmpi (finish, linefinish)==1
gene=linegenename;
end
end
end
  1 Comment
Star Strider on 6 Dec 2014
Edited: Star Strider on 6 Dec 2014
With due respect, ‘I keep getting an empty cell’ really doesn’t tell us a lot. What cell is empty, what is it supposed to contain, and what is supposed to fill it?
What is ‘genefile’? Did fopen return a value for ‘f’ >= 3?
What is in your ‘line’ variable?
What does your ‘locs’ variable contain (or what is it supposed to contain)?
[A minor observation: line is a MATLAB plotting function. It’s best to not name your variables the same as built-in function names, since that can cause unpleasant surprises.]
Lastly, is there any documentation on the gene location files? It might help if we had access to them to see what they contain and how their authors best suggest using them.
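The checks asked about above could be run at the command line roughly as follows. This is only a sketch: it assumes the sample file has been saved as genelocs_sample.txt in the current folder, and the variable names fid, msg, firstline and tabs are chosen here for illustration.

```matlab
% Try to open the file; fopen returns -1 on failure and an id >= 3 on success.
[fid, msg] = fopen('genelocs_sample.txt', 'r');
if fid < 0
    disp(msg)                                % reason the file could not be opened
else
    firstline = fgetl(fid);                  % first line of the file, as text
    disp(firstline)
    tabs = find(firstline == sprintf('\t')); % positions of the tab characters
    disp(tabs)
    fclose(fid);
end
```

If the file follows the chromosome-start-finish-genename layout described in the question, three tab positions should be reported for each line.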

Answers (1)

Geoff Hayes on 6 Dec 2014
Have you stepped through the code to see what is wrong? I think that you are supposed to check whether the start and finish of the gene are within, or overlap with, the input range. If the former, then the condition would be something like
str2num(linestart)>=start && str2num(linefinish)<=finish
In your case, you are using strcmpi to compare numbers (start, finish) with strings (linestart, linefinish). That comparison never evaluates to true, so gene is never assigned and you get back an empty result.
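To make the difference between "within" and "overlaps" concrete, here is a minimal sketch. The coordinates are made up for illustration and are not taken from the sample file. It also assumes that overlap is meant with inclusive endpoints, which the assignment does not spell out.

```matlab
% Hypothetical values: fields read from the file arrive as text,
% while the function arguments are numbers.
linestart  = '59246000';
linefinish = '59249785';
start  = 59000000;
finish = 59247000;

% Convert the text fields to numbers before comparing.
genestart  = str2double(linestart);
genefinish = str2double(linefinish);

% Gene lies entirely inside the input range.
iscontained = genestart >= start && genefinish <= finish;

% Gene shares at least one position with the input range.
isoverlap = genestart <= finish && genefinish >= start;

disp([iscontained isoverlap])
```

With these numbers the gene extends past finish, so iscontained is false while isoverlap is true. Since the problem statement asks for genes whose position overlaps the input range, the second condition is probably the one to use.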
Also, consider using strsplit to break apart your line, using the tab character as the delimiter.
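Putting these suggestions together, one possible version of the function is sketched below. It is not necessarily what the assignment's author intended. The name locationtogenes_sketch and the variable names are chosen here for illustration, chromosome names are matched case-insensitively as in the original code, and overlap is assumed to include the endpoints.

```matlab
function genes = locationtogenes_sketch(chr, start, finish, genefile)
% Sketch: return a cell array of gene names that overlap [start, finish] on chr.
genes = {};                               % stays an empty cell array if nothing matches
fid = fopen(genefile, 'r');
if fid < 0
    error('Could not open %s', genefile);
end
cleaner = onCleanup(@() fclose(fid));     % close the file even if an error occurs
while true
    tline = fgetl(fid);                   % "tline" avoids shadowing the built-in line
    if ~ischar(tline), break; end         % fgetl returns -1 at end of file
    parts = strsplit(tline, sprintf('\t'));
    if numel(parts) < 4, continue; end    % skip lines without four fields
    genestart  = str2double(parts{2});    % NaN for non-numeric text, so such lines never match
    genefinish = str2double(parts{3});
    if strcmpi(chr, parts{1}) && genestart <= finish && genefinish >= start
        genes{end+1} = strtrim(parts{4}); %#ok<AGROW> append the gene name
    end
end
end
```

Saved as locationtogenes_sketch.m, it would be called the same way as in the question, for example locationtogenes_sketch('chr1',59000000,59247000,'genelocs_sample.txt'). Appending with genes{end+1} rather than assigning gene=linegenename keeps every match instead of only the last one.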
