Histogram of letters of a text

Hi guys,
I want to create a histogram that shows how often each letter is used in a text, but I have no idea how to count the letters or how to plot the histogram so that each letter appears on the x-axis.
Does anyone have an idea how I can do this?
  1 Comment
Rik on 2 May 2020
You might also be interested in the Text Analytics Toolbox.


Accepted Answer

Walter Roberson on 2 May 2020
Compare the current input character against the first possible letter that you want to count. If you get a match, increment the counter associated with that letter. Otherwise compare against the second possible letter, and if there is a match, increment the counter associated with that letter. And so on. Eventually move on to the next input character.
OR
Compare all of the input characters against the first letter you want to count. Set the counter associated with that letter to the number of matches you got; do the same thing for the second letter you want to count, and so on.
Hint: you can create a vector of the letters you want to count, and do the counting in a loop.
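The second approach, with the hinted loop over a vector of letters, could look roughly like the sketch below. The sample text and the variable names (txt, letters, counts) are assumptions made for illustration, and folding the text to lowercase is an extra choice that the answer does not prescribe.

```matlab
% Hypothetical sample text; replace it with your own char vector.
txt = 'The quick brown fox jumps over the lazy dog';

letters = 'a':'z';                    % the letters we want to count
counts  = zeros(1, numel(letters));   % one counter per letter

txt = lower(txt);                     % assumption: 'T' and 't' count as the same letter
for k = 1:numel(letters)
    % Compare every input character against one letter and count the matches.
    counts(k) = sum(txt == letters(k));
end

% One bar per letter, with the letter itself as the tick label.
bar(1:numel(letters), counts)
set(gca, 'XTick', 1:numel(letters), 'XTickLabel', num2cell(letters))
xlabel('Letter'), ylabel('Count')
```

Characters that are not in letters (spaces, punctuation, digits) never match and are simply ignored.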
  2 Comments
Julozert on 2 May 2020
I have found a way to create my histogram, but is there a way to compare the letter counts of two texts in one histogram?
Let's say I have counted the letter 'A' 1000 times in text 1 and 2000 times in text 2, and I want both in the same figure, next to each other, to compare them. How could I do that? I tried using hold on, but it looked odd and the bars were stacked on top of each other.
Walter Roberson on 2 May 2020
You can use bar() with the 'grouped' option.
Use one column (important that it be column!) in Y for each bar-in-a-group that you want drawn.
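As a minimal sketch of that layout, assuming two short sample texts (text1 and text2 are hypothetical, as is the matrix name Y), each row of Y holds one letter and each column holds one text, so bar() draws the two counts for a letter side by side:

```matlab
% Hypothetical sample texts; replace them with your own char vectors.
text1 = 'abracadabra';
text2 = 'banana bread';

letters = 'a':'z';
Y = zeros(numel(letters), 2);         % rows: letters, columns: texts
for k = 1:numel(letters)
    Y(k, 1) = sum(lower(text1) == letters(k));
    Y(k, 2) = sum(lower(text2) == letters(k));
end

bar(Y, 'grouped')                     % one group per row, one bar per column
set(gca, 'XTick', 1:numel(letters), 'XTickLabel', num2cell(letters))
legend('text1', 'text2')
```

If the two texts differ a lot in length, dividing each column by its sum before plotting compares relative frequencies instead of raw counts.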


More Answers (1)

Rik on 2 May 2020
Once you have the text in a MATLAB char array, it is stored as numeric character codes, so you can use the normal numeric tools.
lorem='Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.';
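% Every distinct character in the text (this includes spaces and punctuation)
letters=unique(lorem);
% One bin per character code between the smallest and largest code that occurs
letter_counts=histcounts(double(lorem),max(letters)-min(letters)+1);
% Drop the codes that never occur, so the counts line up with letters
letter_counts(letter_counts==0)=[];
% One bar per character, labeled with the character itself
bar(1:numel(letters),letter_counts)
set(gca,'XTick',1:numel(letters))
set(gca,'XTickLabels',num2cell(letters))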
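
If only letters should be counted, a possible variation is sketched below. It is not part of the answer above: it swaps histcounts for the third output of unique combined with accumarray, and it assumes that case should be ignored. The sample text is hypothetical.

```matlab
% Hypothetical sample text; replace it with your own char vector.
txt = 'Hello, World!';

txt = lower(txt(isletter(txt)));          % keep letters only, ignore case
[letters, ~, idx] = unique(txt);          % idx maps each character to its entry in letters
letter_counts = accumarray(idx(:), 1).';  % number of occurrences of each distinct letter

bar(1:numel(letters), letter_counts)
set(gca, 'XTick', 1:numel(letters), 'XTickLabel', num2cell(letters))
```

Because the counts come straight from the index vector, there are no empty bins to remove afterwards.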

Release

R2020a
