Combine multiple bag-of-words or bag-of-n-grams models



newBag = join(bag) combines the elements in the array bag by merging the frequency counts. The function combines the elements along the first dimension not equal to 1.

newBag = join(bag,dim) combines the elements in the array bag along the dimension dim.


Create an array of two bags-of-words models from tokenized documents.

str = [ ...
    "an example of a short sentence"
    "a second short sentence"];
documents = tokenizedDocument(str);
bag(1) = bagOfWords(documents(1));
bag(2) = bagOfWords(documents(2))
bag=1×2 object
  1x2 bagOfWords array with properties:


Combine the bag-of-words models using join.

bag = join(bag)
bag = 
  bagOfWords with properties:

          Counts: [2x7 double]
      Vocabulary: ["an"    "example"    "of"    "a"    "short"    ...    ]
        NumWords: 7
    NumDocuments: 2

If your text data is contained in multiple files in a folder, then you can import the text data and create a bag-of-words model in parallel using parfor. If you have Parallel Computing Toolbox™ installed, then the parfor loop runs in parallel. Otherwise, it runs in serial. Use join to combine an array of bag-of-words models into one model.

Create a bag-of-words model from a collection of files. The example sonnets have file names "exampleSonnetN.txt", where N is the number of the sonnet. Get a list of the files and their locations using dir.

fileLocation = fullfile(matlabroot,'examples','textanalytics','data','exampleSonnet*.txt');
fileInfo = dir(fileLocation);

Initialize an empty bag-of-words model, then loop over the files to create an array of bag-of-words models, one per file.

bag = bagOfWords;

numFiles = numel(fileInfo);
parfor i = 1:numFiles
    f = fileInfo(i);
    filename = fullfile(f.folder,f.name);
    textData = extractFileText(filename);
    document = tokenizedDocument(textData);
    bag(i) = bagOfWords(document);
end
Starting parallel pool (parpool) using the 'Processes' profile ...
Connected to the parallel pool (number of workers: 4).

Combine the bag-of-words models using join.

bag = join(bag)
bag = 
  bagOfWords with properties:

          Counts: [4x276 double]
      Vocabulary: ["From"    "fairest"    "creatures"    "we"    ...    ]
        NumWords: 276
    NumDocuments: 4

Input Arguments

Array of bag-of-words or bag-of-n-grams models, specified as a bagOfWords array or a bagOfNgrams array. If bag is a bagOfNgrams array, then each element to be joined must have the same value for the NgramLengths property.
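
The NgramLengths requirement above can be illustrated with a minimal sketch, assuming Text Analytics Toolbox is installed. The variable names ngramBags and newNgramBag are illustrative only. Both models are created with the same n-gram length, so they can be joined.

```matlab
% Two short documents, reused from the bag-of-words example.
documents = tokenizedDocument([ ...
    "an example of a short sentence"
    "a second short sentence"]);

% Build one bag-of-n-grams model per document.
% Both use bigrams, so their NgramLengths values match.
ngramBags(1) = bagOfNgrams(documents(1),'NgramLengths',2);
ngramBags(2) = bagOfNgrams(documents(2),'NgramLengths',2);

% Merge the two models into a single bag-of-n-grams model.
newNgramBag = join(ngramBags);

% Inspect the merged model.
newNgramBag.NgramLengths
newNgramBag.NumDocuments
```

If the models were built with different NgramLengths values, the description above implies that they cannot be joined as they are. Recreating them with a common value is one way to proceed.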

Dimension along which to join models, specified as a positive integer. If dim is not specified, then the default is the first dimension with a size that does not equal 1.
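
As a small sketch of the dim argument, assuming Text Analytics Toolbox is installed, the following joins a 1-by-2 array along its second dimension. The names bags, explicitBag, and defaultBag are illustrative. Because the first dimension of the array has size 1, the default call is expected to pick dimension 2 as well.

```matlab
% Build a 1-by-2 array of bag-of-words models.
documents = tokenizedDocument([ ...
    "an example of a short sentence"
    "a second short sentence"]);
bags(1) = bagOfWords(documents(1));
bags(2) = bagOfWords(documents(2));

% Join explicitly along dimension 2 (the columns of the array).
explicitBag = join(bags,2);

% Join along the default dimension, which is the first
% dimension of bags whose size is not 1.
defaultBag = join(bags);

% Compare the two results.
isequal(explicitBag.Vocabulary,defaultBag.Vocabulary)
size(explicitBag.Counts)
```

Specifying dim mainly matters for arrays with more than one non-singleton dimension, where the default would always pick the first one.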

Output Arguments

Output model, returned as a bagOfWords object or a bagOfNgrams object. newBag has the same type as bag and has a size of 1 along the dimension being joined.

Version History

Introduced in R2018a