I am given the following pseudo code for this problem. How can I convert this to Matlab?

ChIP-seq is used to identify genomic segments bound by transcription factors. In a ChIP-seq experiment, you obtain the chromosome regions that the transcription factor FoxA is binding. Write a Matlab function locationtogenes(chr,start,finish,genefile) that takes a chromosome name, the start and finish of the chromosome region in base pairs, and a gene-location file, and determines which genes are located in that region. The gene-location file is a tab-delimited text file in which each line contains four fields: chromosome, start, finish, and gene name. Find the genes whose positions overlap the input range and return them as a cell array. If no genes are found in the input range, return an empty cell array. An example gene-location file can be downloaded from http://sacan.biomed.drexel.edu/ftp/bmeprog/genelocs_sample.txt. You should not download any files from the web in your code; you can assume that your function will be given the name of a file that already exists.
>> disp(locationtogenes('chr1',59000000,59247000,'genelocs_sample.txt'))
'JUN'
>> disp(locationtogenes('chr1',50000000,100000000,'genelocs_sample.txt'))
'RP4-784A16.2' 'MRPL37' 'JUN' 'LRRC8C'
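The overlap condition in the problem statement can be written as a single test. The following is a minimal sketch, assuming both the gene and the query region are closed intervals in base pairs; the name `overlaps` and the coordinates are hypothetical and chosen only for illustration.

```matlab
% Two closed intervals overlap when each one starts before the other ends.
% overlaps is a hypothetical helper; the numbers below are made-up coordinates.
overlaps = @(geneStart, geneEnd, qStart, qEnd) ...
    geneStart <= qEnd && geneEnd >= qStart;

disp(overlaps(100, 200, 150, 600))   % gene partially inside the query: displays 1
disp(overlaps(100, 200, 300, 600))   % gene entirely before the query: displays 0
disp(overlaps(100, 900, 300, 600))   % gene spans the whole query: displays 1
```

Note that this is looser than requiring the gene to lie entirely inside the query range, and much looser than requiring the start and finish values to be equal.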
This is the pseudo code:
Open file.
Initialize the variable to be returned to an empty cell array.
while it is not the end of file:
    Read the next line from file.
    If it failed to read the next line, stop the loop.
    Find the positions of tab characters.
    (a single tab character can be specified by sprintf('\t'))
    Using these positions determine the chromosome name,
    start and finish locations (use str2double), and gene name.
    If the chromosome name matches the requested chromosome name and the
    start-finish locations overlap the requested start-finish range, add
    this gene name to the end of the return variable.
Close the file before returning from the function.
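The field-splitting step of the pseudo code can be tried on a single line before writing the full loop. This is only a sketch: the sample line and the variable names (`rec`, `chrName`, `geneStart`, `geneEnd`, `geneName`) are hypothetical, and the line is assumed to hold exactly four tab-separated fields.

```matlab
tab = sprintf('\t');                                % a single tab character
rec = ['chr1' tab '100' tab '200' tab 'GENEA'];     % hypothetical line of the file

locs = find(rec == tab);                            % positions of the three tabs

chrName   = rec(1 : locs(1)-1);                     % text before the first tab
geneStart = str2double(rec(locs(1)+1 : locs(2)-1)); % between tabs 1 and 2, as a number
geneEnd   = str2double(rec(locs(2)+1 : locs(3)-1)); % between tabs 2 and 3, as a number
geneName  = rec(locs(3)+1 : end);                   % text after the last tab

fprintf('%s %d %d %s\n', chrName, geneStart, geneEnd, geneName);
```

With four fields there are only three tabs, so the first field starts at position 1 and the last field runs to the end of the line.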
Here is my code:
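function out = locationtogenes(chr, start, finish, genefile)
%LOCATIONTOGENES Return the genes that overlap a chromosome region.
%   Each line of genefile is tab-delimited: chromosome, start, finish, gene name.
f = fopen(genefile, 'r');                  % open file
if f < 0; error('Could not open file %s.', genefile); end
out = {};                                  % empty cell array to return
while ~feof(f)
    line = fgetl(f);
    if ~ischar(line); break; end           % failed to read a line: stop the loop
    locs = find(line == sprintf('\t'));    % positions of the tab characters
    if numel(locs) < 3; continue; end      % skip lines without four fields
    chr1     = line(1 : locs(1)-1);
    start1   = str2double(line(locs(1)+1 : locs(2)-1));
    finish1  = str2double(line(locs(2)+1 : locs(3)-1));
    genename = strtrim(line(locs(3)+1 : end));
    % keep the gene if it is on the requested chromosome and overlaps the range
    if strcmp(chr1, chr) && start1 <= finish && finish1 >= start
        out{end+1} = genename;
    end
end
fclose(f);                                 % close the file before returning
end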
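As an alternative to reading the file line by line, the whole file can be read in one call with textscan and filtered with a logical mask. This departs from the pseudo code given in the assignment, so treat it as a sketch for comparison only. The temporary file and its contents are hypothetical sample data, not taken from genelocs_sample.txt.

```matlab
% Write a small hypothetical gene-location file (made-up names and coordinates).
tmpfile = [tempname '.txt'];
f = fopen(tmpfile, 'w');
fprintf(f, 'chr1\t100\t200\tGENEA\n');
fprintf(f, 'chr1\t500\t900\tGENEB\n');
fprintf(f, 'chr2\t150\t250\tGENEC\n');
fclose(f);

% Read all four tab-delimited columns at once.
f = fopen(tmpfile, 'r');
C = textscan(f, '%s %f %f %s', 'Delimiter', '\t');
fclose(f);
delete(tmpfile);

% Query region (hypothetical values).
chr = 'chr1'; start = 150; finish = 600;

% Same chromosome, and the gene interval overlaps the query interval.
mask  = strcmp(C{1}, chr) & C{2} <= finish & C{3} >= start;
genes = C{4}(mask)';      % row cell array of matching gene names
disp(genes)
```

Here C{1} and C{4} are cell arrays of strings and C{2} and C{3} are numeric columns, so the three conditions combine element-wise with the & operator.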
