Create Word Cloud from String Arrays

This example shows how to create a word cloud from plain text by reading it into a string array, preprocessing it, and passing it to the wordcloud function. If you have Text Analytics Toolbox™ installed, then you can create word clouds directly from string arrays. For more information, see wordcloud (Text Analytics Toolbox).

Read the text from Shakespeare's Sonnets with the fileread function.

sonnets = fileread('sonnets.txt');
sonnets(1:135)
ans = 
    'THE SONNETS
     
     by William Shakespeare
     
     
     
     
       I
     
       From fairest creatures we desire increase,
       That thereby beauty's rose might never die,'

Convert the text to a string using the string function. Then, split it on newline characters using the splitlines function.

sonnets = string(sonnets);
sonnets = splitlines(sonnets);
sonnets(10:14)
ans = 5x1 string
    "  From fairest creatures we desire increase,"
    "  That thereby beauty's rose might never die,"
    "  But as the riper should by time decease,"
    "  His tender heir might bear his memory:"
    "  But thou, contracted to thine own bright eyes,"

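The same conversion can be tried on a small made-up input. The following is a minimal sketch, not part of the original example: the sample text and the variable names txt, str, and txtLines are hypothetical.

```matlab
% Hypothetical sample text (not from sonnets.txt); newline is the line-feed character.
txt = ['first line' newline 'second line' newline newline 'fourth line'];

str = string(txt);            % character vector -> 1-by-1 string
txtLines = splitlines(str);   % one element per line; empty lines are kept

numel(txtLines)               % number of lines, including the empty one
txtLines == ""                % logical mask that marks the empty line
```

Empty lines survive as zero-length strings, which is why the sonnet text above does not start until element 10.
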
Replace some punctuation characters with spaces.

p = ["." "?" "!" "," ";" ":"];
sonnets = replace(sonnets,p," ");
sonnets(10:14)
ans = 5x1 string
    "  From fairest creatures we desire increase "
    "  That thereby beauty's rose might never die "
    "  But as the riper should by time decease "
    "  His tender heir might bear his memory "
    "  But thou  contracted to thine own bright eyes "

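To see what this replacement does and does not change, here is a small sketch on a single made-up string. The variables raw and cleaned are hypothetical names introduced only for this illustration.

```matlab
% Same punctuation list as in the example above.
p = ["." "?" "!" "," ";" ":"];

% Hypothetical one-line input.
raw = "Rose, beauty's rose: never die!";

% Each listed character becomes one space, so the length is unchanged.
cleaned = replace(raw, p, " ");

strlength(raw) == strlength(cleaned)   % lengths match
contains(cleaned, "'")                 % the apostrophe is not in p, so it is kept
```

Because the apostrophe is not in the list, possessives such as "beauty's" stay intact as single words.
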
Split sonnets into a string array whose elements contain individual words. To do this, join all the string elements into a 1-by-1 string using the join function, and then split it at whitespace characters using the split function.

sonnets = join(sonnets);
sonnets = split(sonnets);
sonnets(7:12)
ans = 6x1 string
    "From"
    "fairest"
    "creatures"
    "we"
    "desire"
    "increase"

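The join-then-split step can be sketched on a two-line input. The array verses and the other variable names are hypothetical. The extra spaces in the second element are there to show that, as the output above suggests, split with no delimiter argument treats a run of whitespace as a single delimiter.

```matlab
% Hypothetical two-element string array with a run of spaces in the second element.
verses = ["From fairest creatures"; "we desire   increase"];

joined = join(verses);   % 1-by-1 string; elements are joined with a space
words = split(joined);   % column vector with one word per element

size(joined)
words
```
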
Remove words with fewer than five characters.

sonnets(strlength(sonnets)<5) = [];
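
The deletion uses logical indexing. As a minimal sketch on a made-up word list (the names words and tooShort are hypothetical), the mask can be inspected before the words are removed:

```matlab
% Hypothetical word list.
words = ["From"; "fairest"; "creatures"; "we"; "desire"; "increase"];

tooShort = strlength(words) < 5;   % true for "From" (4 characters) and "we" (2)
words(tooShort) = [];              % delete the flagged elements

words
```

Only the four words with five or more characters remain.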

Convert sonnets to a categorical array and then plot using wordcloud. The function plots the unique elements of C with sizes corresponding to their frequency counts.

C = categorical(sonnets);
figure
wordcloud(C);
title("Sonnets Word Cloud")
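
To inspect the frequency counts that determine the word sizes, one option is to tabulate the categories directly. The following is a sketch on a made-up word list, not part of the original example: the variable names are hypothetical, and categories and countcats are the standard functions for categorical arrays.

```matlab
% Hypothetical word list with repeated entries.
words = ["beauty"; "summer"; "beauty"; "sweet"; "beauty"; "summer"];

C = categorical(words);
names = categories(C);    % unique words as a cell array of character vectors
counts = countcats(C);    % number of occurrences of each category

% Sort from most to least frequent and show the result as a table.
[~, idx] = sort(counts, "descend");
topWords = table(string(names(idx)), counts(idx), ...
    'VariableNames', {'Word', 'Count'})
```

The word list and counts can also be passed to the wordcloud(words,sizeData) syntax, which is useful when the counts are computed or filtered beforehand.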
