Strip comments from code read from file identifier

I need to write code that strips comments from code read through a file identifier. The problem is that it has to handle all kinds of comments (comment syntax from different programming languages, for example /*, %, //, etc.), and the comments can appear not only at the beginning of a line but also in the middle or at the end of it. Moreover, it has to be written using only basic commands.
This is the code I have. It only handles comments that start with '//' at the beginning of a line, because it drops every line that contains '//'.
fid = fopen(fullfile(path,file), 'r');
file_content = textscan(fid,'%s','delimiter','\n');
fclose(fid);
fileID = fopen(fullfile(path,'result.txt'),'w');
text = file_content{1};
for i = 1 : numel(text)
    % keep only lines with no '//' at all (a matching line is dropped whole)
    if isempty(strfind(text{i},'//'))
        fprintf(fileID,'%s\n',text{i});
    end
end
fclose(fileID);
What needs to be changed?
I would greatly appreciate any kind of help
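A minimal sketch of the direct change the question asks about: instead of dropping every line that contains a marker, cut each line at the first marker it contains. The marker list and the sample lines are assumptions for illustration, and the loop collects results in a cell array `out` instead of writing to `fileID` so it runs on its own. As the answer below explains, this cannot be correct in general because it knows nothing about strings or block comments.

```matlab
% Naive sketch: cut each line at the first comment marker it contains.
% Assumptions: the marker list below is illustrative only; strings and block
% comments are NOT understood, so markers inside quotes are cut as well.
text = {'x = 1; // set x'; '% whole-line comment'; 'y = 2;'; 'z = 3; # shell style'};
markers = {'//', '%', '#'};
out = {};
for i = 1:numel(text)
    s = text{i};
    cut = numel(s) + 1;                 % index where the comment starts
    for m = 1:numel(markers)
        k = strfind(s, markers{m});
        if ~isempty(k)
            cut = min(cut, k(1));       % the earliest marker wins
        end
    end
    code = deblank(s(1:cut-1));         % drop the comment and trailing blanks
    if ~isempty(code)                   % skip lines that were only a comment
        out{end+1, 1} = code;           % in the question's loop: fprintf(fileID,'%s\n',code)
    end
end
```

For example, applied to the question's own line `fprintf(fileID,'%s\n',text{i});` it would keep only `fprintf(fileID,'`, because the `%` inside the format string looks like a MATLAB comment.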

Answers (1)

Guillaume on 19 Nov 2019
The only way to do this is to write a parser that can parse whichever language you're planning to support. At the very least, your parser must be able to tell where strings start and end (according to the rules of whichever language).
Doing this with just "basic commands" (whatever that means) is not possible.
Even for the MATLAB language alone, consider the following:
str = ["%this is not a comment", ...this is a comment
'...neither is this...', ...this is a comment
sprintf('%d no comment', 123)]; %this is a comment
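To make "tell where strings start and end" concrete for MATLAB alone, here is a sketch of a per-line scanner run on the three lines above. It tracks whether it is inside a char vector ('...') or a string ("..."), and only outside them treats `%` as the start of a comment and the text after `...` as ignored. Resetting the state on every line assumes that char vectors and strings cannot span lines; the transpose rule is a heuristic (labeled in the code), not MATLAB's full grammar, and block comments are not handled here.

```matlab
% Sketch: strip MATLAB line comments while respecting char vectors and strings.
% Assumption: a quote right after a letter, digit, '_', ')', ']', '}', '.' or
% another quote is the transpose operator (a heuristic, not the full grammar).
src = {'str = ["%this is not a comment", ...this is a comment'
       '''...neither is this...'', ...this is a comment'
       'sprintf(''%d no comment'', 123)]; %this is a comment'};
out = cell(size(src));
for k = 1:numel(src)
    s = src{k};
    n = numel(s);
    kept = '';
    state = 'code';                % 'code', or 'quote' inside '...' or "..."
    q = '';                        % the quote character that opened the literal
    i = 1;
    while i <= n
        c = s(i);
        if strcmp(state, 'code')
            if c == '%'
                break                          % rest of the line is a comment
            elseif i + 2 <= n && strcmp(s(i:i+2), '...')
                kept = [kept '...'];           % keep the continuation itself
                break                          % MATLAB ignores the text after it
            elseif c == '"'
                state = 'quote'; q = c;        % a string starts
            elseif c == ''''
                prev = ' ';
                if i > 1
                    prev = s(i-1);
                end
                if ~(isstrprop(prev, 'alphanum') || any(prev == '_)]}.'''))
                    state = 'quote'; q = c;    % not a transpose: a char vector starts
                end
            end
        elseif c == q                          % inside a literal, at a quote
            if i < n && s(i+1) == q
                kept = [kept c];               % doubled quote is an escaped quote
                i = i + 1;
            else
                state = 'code';                % closing quote
            end
        end
        kept = [kept c];
        i = i + 1;
    end
    out{k} = deblank(kept);
end
```

The first line comes out as `str = ["%this is not a comment", ...`, so the `%` inside the string and the continuation both survive while the trailing text is removed.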
If you're planning to support C or C++, it's even more complicated: depending on compiler options and the version of C++,
// An innocuous comment ??/
runme();
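the runme() line may or may not be a comment. It isn't in C++17, which removed trigraphs; it is in earlier versions with trigraph support turned on (which may be the default on some compilers), because ??/ is the trigraph for a backslash, and a backslash at the end of a line joins the next line onto the comment.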
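For C or C++, a comparable scanner has to run over the whole file text rather than line by line, because /* ... */ comments and backslash-newline splices cross line boundaries. The sketch below (still MATLAB, working on one char vector such as the output of `fileread`) handles //, /* */, string and character literals with backslash escapes, and a backslash that continues a // comment. It assumes no trigraphs, no raw string literals and no splices outside comments, so for the runme() example above it gives the C++17 answer and keeps runme(); as code.

```matlab
% Sketch: strip C/C++ comments from a whole file held in one char vector.
% Handles // and /* */ comments, "..." and '...' literals with backslash
% escapes, and a backslash-newline that continues a // comment.
% Assumptions: no trigraphs (e.g. ??/), no raw strings, no splices elsewhere.
src = sprintf(['int a = 1; // one\n' ...
               'char *s = "// not a comment";\n' ...
               'char q = ''"''; // quote\n' ...
               'int b = /* two */ 2;\n' ...
               '// spliced \\\n' ...
               'int hidden;\n' ...
               'int c = 3;']);
LF = char(10);
out = '';
state = 'code';                  % 'code', 'line', 'block' or 'quote'
q = '';                          % quote character that opened a literal
i = 1;
n = numel(src);
while i <= n
    c = src(i);
    nxt = char(0);
    if i < n
        nxt = src(i+1);
    end
    switch state
        case 'code'
            if c == '/' && nxt == '/'
                state = 'line';  i = i + 2;  continue
            elseif c == '/' && nxt == '*'
                state = 'block'; i = i + 2;  continue
            elseif c == '"' || c == ''''
                state = 'quote'; q = c;  % a string or character literal starts
            end
            out(end+1) = c;
        case 'line'
            if c == '\' && nxt == LF
                i = i + 2;  continue     % spliced line: still inside the comment
            elseif c == LF
                state = 'code';
                out(end+1) = c;          % keep the line break
            end
        case 'block'
            if c == '*' && nxt == '/'
                state = 'code';
                out(end+1) = ' ';        % a comment is replaced by one space
                i = i + 2;  continue
            end
        case 'quote'
            out(end+1) = c;
            if c == '\' && i < n
                out(end+1) = nxt;        % copy the escaped character verbatim
                i = i + 2;  continue
            elseif c == q
                state = 'code';          % closing quote
            end
    end
    i = i + 1;
end
```

The // inside the string literal and the '"' character literal survive, while `int hidden;` disappears because the backslash at the end of the preceding comment splices it into that comment.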
Steven Lord on 19 Nov 2019
Don't forget block comments in MATLAB code. If you run this in a session of MATLAB where none of A, B, C, D, E, and F are already defined, only A and F will be created. The lines defining B through E would be executed were they not between block comment operators. This means it's not sufficient to look at each line in isolation.
A = [1 2 3];
%{
B = magic(4);
C = 1:100;
D = plot(1:10);
E = exp(1);
%}
F = [4 5 6];
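Block comments need state that carries from one line to the next. Here is a sketch that drops them from the example above, assuming MATLAB's rule that %{ and %} only act as block markers when nothing else is on their line; a depth counter is used because MATLAB allows block comments to be nested. Line comments on the remaining lines still need a separate per-line pass.

```matlab
% Sketch: drop MATLAB block comments, whose state must carry across lines.
% Assumption (MATLAB's rule): '%{' and '%}' only count when they are the only
% text on their line; blocks may nest, hence a depth counter.
src = {'A = [1 2 3];'
       '%{'
       'B = magic(4);'
       'C = 1:100;'
       'D = plot(1:10);'
       'E = exp(1);'
       '%}'
       'F = [4 5 6];'};
out = {};
depth = 0;                          % how many block comments we are inside
for k = 1:numel(src)
    t = strtrim(src{k});
    if strcmp(t, '%{')
        depth = depth + 1;          % opening marker (possibly nested)
    elseif strcmp(t, '%}') && depth > 0
        depth = depth - 1;          % closing marker for the innermost block
    elseif depth == 0
        out{end+1, 1} = src{k};     % ordinary line outside any block comment
    end
end
```

Only the lines defining A and F remain, matching what MATLAB itself executes.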
