How to remove punctuation from an Arabic text file

Hello, I have an Arabic string and want to discard all punctuation. I want to keep only the text and the white space between words. For example, this is my string: str='سلام. دوست خوب من!' (roughly "Hello. My good friend!"). Can I change the code below to do it?
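A minimal sketch of the goal applied to the example string. It assumes MATLAB, where each char element is a UTF-16 code unit, so every Arabic letter here is one element. It uses a different technique from the question's code: instead of whitelisting letters, it blacklists punctuation with ismember. The Arabic/Persian punctuation list (comma U+060C, semicolon U+061B, question mark U+061F, percent sign U+066A, full stop U+06D4, and the guillemets « ») is an assumption; extend it for any other symbols your files contain.

```matlab
% Minimal sketch (assumes MATLAB: one char element per UTF-16 code unit).
% The example string 'سلام. دوست خوب من!' is built from its code points so
% the snippet does not depend on how the .m file itself is encoded.
str = char([1587 1604 1575 1605 46 32 1583 1608 1587 1578 32 1582 1608 1576 32 1605 1606 33]);

% All 32 printable ASCII punctuation characters ('' is an escaped quote) ...
asciiPunct = '!"#$%&''()*+,-./:;<=>?@[\]^_`{|}~';
% ... plus assumed Arabic/Persian punctuation: comma U+060C, semicolon U+061B,
% question mark U+061F, percent U+066A, full stop U+06D4, guillemets U+00AB/U+00BB
arabicPunct = char([1548 1563 1567 1642 1748 171 187]);
punctSet = [asciiPunct, arabicPunct];

% Drop every character that appears in the punctuation set
clean = str(~ismember(str, punctSet));
% Collapse runs of whitespace to single spaces and trim both ends
clean = strtrim(regexprep(clean, '\s+', ' '));
% clean is now 'سلام دوست خوب من'
```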
str = fileread('D:/docc111.txt');
str1 = regexprep(str,'\s+',' '); % collapse every whitespace run (spaces, tabs, CR/LF) into one space
%or str1 = regexprep(str,'[\n\r]+',' ')   % replace only line breaks
%str1 = 'Hello, I need 1 MATLAB code to discard all punctuation, and signs from 9 text files.'
Lstr1 = length(str1);
str_space = '\s';     % pattern: any whitespace character
str_caps  = '[A-Z]';  % pattern: ASCII upper-case letters
str_ch    = '[a-z]';  % pattern: ASCII lower-case letters
str_nums  = '[0-9]';  % pattern: ASCII digits
% regexp returns the start index of each match; every pattern matches one
% character, so these are the positions of the characters to keep
ind_space = regexp(str1,str_space);
ind_caps  = regexp(str1,str_caps);
ind_chrs  = regexp(str1,str_ch);
ind_nums  = regexp(str1,str_nums);
mask = [ind_space ind_caps ind_chrs ind_nums];
% remove the kept positions from 1:Lstr1, leaving the positions to delete
% (only ASCII letters/digits are whitelisted, so Arabic letters are deleted too)
num_str2 = 1:Lstr1;
num_str2(mask) = [];
str3 = str1;
str3(num_str2) = [];
chars = str3;
%insert a space before the first character and after the last character
charsWithWhitespace = [' ', chars, ' '];
fid = fopen('myySE1.txt','w');
fprintf(fid, '%s',charsWithWhitespace);
fclose(fid);
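The code above whitelists only ASCII ranges, which is why Arabic letters disappear along with the punctuation. One way to adapt it is sketched here under assumptions not stated in the question: the input file is UTF-8, and "text" means whitespace, ASCII letters/digits, and the Arabic-script code-point ranges listed in the hypothetical variable arabicRanges. The temp-file names and the demo input are illustrative only; swap in your own paths. Bytes are decoded explicitly with native2unicode and re-encoded with unicode2native so the result does not depend on the platform's default encoding.

```matlab
% Sketch: the question's whitelist idea, widened from ASCII to Arabic script.
% Assumptions: UTF-8 input; "text" = whitespace, ASCII letters/digits, and
% the ranges in arabicRanges. File names are hypothetical demo paths.
inFile  = fullfile(tempdir, 'arabic_demo_in.txt');
outFile = fullfile(tempdir, 'arabic_demo_out.txt');

% Write a small UTF-8 demo input: 'سلام. دوست' + newline + 'خوب من!'
% (replace this part with your own file, e.g. the question's D:/docc111.txt)
demo = char([1587 1604 1575 1605 46 32 1583 1608 1587 1578 10 1582 1608 1576 32 1605 1606 33]);
fid = fopen(inFile, 'w');
fwrite(fid, unicode2native(demo, 'UTF-8'), 'uint8');
fclose(fid);

% Read the raw bytes and decode them explicitly as UTF-8
fid = fopen(inFile, 'r');
bytes = fread(fid, Inf, 'uint8=>uint8')';
fclose(fid);
str = native2unicode(bytes, 'UTF-8');

% Same first step as the question: every whitespace run (incl. CR/LF) becomes one space
str1 = regexprep(str, '\s+', ' ');

% Code-point ranges treated as "text" (an assumption; adjust for your data)
arabicRanges = [hex2dec('0621') hex2dec('0669');   % letters, harakat, Arabic-Indic digits
                hex2dec('066E') hex2dec('06D3');   % extended letters, incl. Persian peh/tcheh/jeh/keheh/gaf/yeh
                hex2dec('06F0') hex2dec('06F9')];  % Extended Arabic-Indic (Persian) digits

% Whitelist mask: whitespace, ASCII letters/digits, or any Arabic range above
code = double(str1);
keep = isspace(str1) | (code >= 65 & code <= 90) | (code >= 97 & code <= 122) ...
     | (code >= 48 & code <= 57);
for k = 1:size(arabicRanges, 1)
    keep = keep | (code >= arabicRanges(k,1) & code <= arabicRanges(k,2));
end
cleaned = strtrim(regexprep(str1(keep), '\s+', ' '));

% Write the result back out as UTF-8 bytes
fid = fopen(outFile, 'w');
fwrite(fid, unicode2native(cleaned, 'UTF-8'), 'uint8');
fclose(fid);
```

A whitelist keeps only what you list, so Arabic-script letters outside the chosen ranges are dropped; a punctuation blacklist fails the other way, letting through any symbol you did not list. Which trade-off is safer depends on what your files contain.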

Answers (1)

Walter Roberson on 31 Jul 2016
