Emoji display and handling problem

Recently I have been processing txt files that contain emojis, and I have run into problems reading them.
For demonstration, I put some random emojis in test.txt and used fopen with fscanf (and later fgetl and fread) to read its contents.
The txt file is saved as UTF-8 with CRLF line endings.
I use
fin = fopen('test.txt', 'rb', 'n', 'UTF-8');
content = fscanf(fin, '%c');
fclose(fin);
disp(content);
Only some of the emojis display correctly.
I tried changing the read option from 'rb' to 'r',
or
fin = fopen('test.txt', 'rb', 'n', 'UTF-8');
while ~feof(fin)
content = fgetl(fin);
disp(content);
end
fclose(fin);
or
fin = fopen('test.txt', 'rb', 'n', 'UTF-8');
content = fread(fin, '*char')';
fclose(fin);
disp(content);
None of these work, so I can't go on to write, save, or otherwise process 'content'.
fprintf('%s',content);
does not display correctly either.
I also tried saving test.txt as UTF-8 with BOM or CL, but it made no difference.
Another test:
fid = fopen('test.txt','w','n','UTF-8');
contents = "🆘🅾️🆚✴️🈹☸️✡️♈⛎🈷️🫠📊🌚😈Ⓜ️🎬➡️⬇️";
fprintf(fid,"%s",contents);
fclose(fid);
fid = fopen('test.txt','r','n','UTF-8');
content = fscanf(fid, '%c');
fclose(fid);
disp(content);
The above code handles writing correctly but fails in reading.
It does not seem to be related to disp, fprintf, or sprintf; the content has already been changed by the read operation.
I would greatly appreciate any help.
How can I read and store these characters properly?
Thanks again.
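One way to check that the read step (and not the display) changes the data is a byte-level round trip. The following is only a sketch: the file name is hypothetical, and the sample is a single emoji (U+1F608) written as raw UTF-8 bytes so that the bytes on disk are known exactly.

```matlab
% Hypothetical file name, used only for this sketch.
fname = 'emoji_roundtrip_test.txt';

% Write the known UTF-8 bytes of U+1F608 (a 4-byte sequence) straight to disk.
expectedBytes = uint8([240 159 152 136]);   % F0 9F 98 88
fid = fopen(fname, 'w');
fwrite(fid, expectedBytes, 'uint8');
fclose(fid);

% Read the file as text, the same way as in the question.
fid = fopen(fname, 'r', 'n', 'UTF-8');
content = fscanf(fid, '%c');
fclose(fid);

% Re-encode what was read and compare it with the bytes on disk.
reencoded = unicode2native(content, 'UTF-8');
readIsLossless = isequal(reencoded(:)', expectedBytes);

% Show the UTF-16 code units that ended up in memory.
fprintf('code units read: %s\n', mat2str(double(content)));
fprintf('lossless read: %d\n', readIsLossless);
delete(fname);
```

If readIsLossless comes out false, the text was altered before disp or fprintf ever saw it, which would match what I observe.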
P.S. Here is my feature('locale') output:
ctype: 'zh_CN.UTF-8'
collate: 'zh_CN.UTF-8'
time: 'zh_CN.UTF-8'
numeric: 'en_US_POSIX.UTF-8'
monetary: 'zh_CN.UTF-8'
messages: 'zh_CN.UTF-8'
encoding: 'UTF-8'
terminalEncoding: 'GBK'
jvmEncoding: 'UTF-8'
status: 'MathWorks locale management system initialized.'
warning: ''

Accepted Answer

Tushar on 3 Jun 2023
Hi Nathan,
Please check the following points. They may help in resolving your issue:
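  1. The issue you're experiencing with reading emojis from a text file in MATLAB is likely due to the way MATLAB handles Unicode characters during file reading and displaying. MATLAB stores char data as 16-bit UTF-16 code units, so any emoji outside the Basic Multilingual Plane occupies two char elements (a surrogate pair). A read path that does not build those pairs correctly changes the text before it is ever displayed. Since you already pass 'UTF-8' to fopen and your locale reports UTF-8, the encoding setting itself is probably not the cause.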
  2. Additionally, make sure that your MATLAB display settings are properly configured to handle Unicode characters. You can set the MATLAB Command Window font to a Unicode-compatible font that supports emojis: go to the "Home" tab, click "Preferences", select "Command Window", and choose a suitable font.
  3. Keep in mind that the correct display of emojis also depends on your operating system and the fonts installed on your computer. If you're still encountering issues, consider using MATLAB's graphical capabilities (e.g., using the imshow function) or exporting the content to a different format (e.g., saving it as an image or HTML file) for proper emoji rendering.
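If the text read through fscanf, fgetl, or fread(..., '*char') is already altered, one workaround worth trying is to read the file as raw bytes and decode the whole buffer in one call. This is a minimal sketch, not a confirmed fix for your release: it assumes native2unicode turns 4-byte UTF-8 sequences into surrogate pairs on your version, and the file name and sample bytes are made up for the example.

```matlab
% Hypothetical file name and sample data, used only for this sketch.
fname = 'emoji_bytes_test.txt';
utf8Bytes = uint8([240 159 152 136 226 153 136]);  % U+1F608 followed by U+2648

% Create the sample file from raw bytes so its content is known exactly.
fid = fopen(fname, 'w');
fwrite(fid, utf8Bytes, 'uint8');
fclose(fid);

% Read the file as raw bytes; no character conversion happens here.
fid = fopen(fname, 'r');
rawBytes = fread(fid, Inf, '*uint8')';
fclose(fid);

% Decode the whole buffer at once into a MATLAB char vector.
content = native2unicode(rawBytes, 'UTF-8');

% Inspect the UTF-16 code units and re-encode to check the round trip.
codeUnits = double(content);
roundTrip = unicode2native(content, 'UTF-8');
fprintf('code units: %s\n', mat2str(codeUnits));
fprintf('round trip matches file: %d\n', isequal(roundTrip(:)', rawBytes));
delete(fname);
```

If the round trip matches, content holds the emojis intact and can be written back with fopen(..., 'w', 'n', 'UTF-8') and fprintf, which you report already works. Whether the Command Window then renders them is a separate question that depends on the font, as noted in points 2 and 3.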

Release

R2022a
