Extract word matrix and context matrix from output of trainWordEmbedding / word2vec

When I train a word embedding on a set of documents with trainWordEmbedding, I get an object "emb" that I can pass to word2vec. word2vec then returns a vector for each word, which I can process further.
However, I would also like to get the underlying word matrix and context matrix, as well as the value of the training loss. Does anyone know how I can access these data?
Christopher Creutzig on 26 Nov 2018
What exactly do you mean by “word matrix” and “context matrix”?
I guess the “context matrix” is what (some) other people call the cooccurrence matrix in the skip-gram model? We do not currently have a way to compute that.

Answers (1)

Jayanti on 14 Feb 2025 at 14:21
Hi Daniel,
By “word matrix”, I assume you mean the unique words in the documents. When you use “trainWordEmbedding” to train a word embedding model on a set of documents, it returns a word embedding object (called “emb” here). This object has a property named “Vocabulary”, which contains the unique words in the model as a string vector. You can access these words with the following code:
emb = trainWordEmbedding(filename);
words = emb.Vocabulary;
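If “word matrix” instead means the learned word vectors themselves, one option is to pass the whole vocabulary to “word2vec”, which returns one row per word. The following is only a minimal sketch: the toy corpus and the option values ('Dimension' of 10, 'MinCount' of 1) are assumptions chosen to keep the example small, not recommended settings.

```matlab
% Toy corpus (hypothetical data); requires Text Analytics Toolbox.
documents = tokenizedDocument([
    "the cat sat on the mat"
    "the dog sat on the log"]);

% MinCount = 1 keeps every word; with the default threshold, the rare
% words in a corpus this small would be discarded.
emb = trainWordEmbedding(documents, ...
    'Dimension', 10, 'MinCount', 1, 'Verbose', 0);

words = emb.Vocabulary;      % unique words, as a string vector
M = word2vec(emb, words);    % one row per word, emb.Dimension columns
```

Row i of M is the vector for words(i). As far as I can tell, the embedding object documents only the “Dimension” and “Vocabulary” properties, so the output (context) weights and the training loss do not appear to be exposed this way.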
By “context matrix”, I assume you mean the co-occurrence matrix. However, I couldn't find any documentation on accessing a co-occurrence matrix directly through “trainWordEmbedding” or “word2vec”.
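If co-occurrence counts are what you need, they can be computed by hand from the tokens. The sketch below is not a toolbox function and not what “trainWordEmbedding” does internally. It assumes a symmetric window of one token on each side, and the variable names (docs, windowSize, C) and the toy corpus are hypothetical.

```matlab
% Toy corpus: each document is a string array of tokens (hypothetical data).
docs = {["the" "cat" "sat"], ["the" "dog" "sat"]};
windowSize = 1;              % assumed window: one token on each side

vocab = unique([docs{:}]);   % sorted unique tokens, 1-by-V string array
V = numel(vocab);
C = zeros(V);                % C(i,j): how often vocab(j) occurs near vocab(i)

for d = 1:numel(docs)
    tokens = docs{d};
    [~, idx] = ismember(tokens, vocab);   % token -> vocabulary index
    n = numel(idx);
    for t = 1:n
        lo = max(1, t - windowSize);
        hi = min(n, t + windowSize);
        for c = [lo:t-1, t+1:hi]          % context positions, center skipped
            C(idx(t), idx(c)) = C(idx(t), idx(c)) + 1;
        end
    end
end
```

C is indexed in the order of vocab, so C(vocab == "cat", vocab == "the") gives the count for that pair. For a large vocabulary, a sparse matrix would be a better fit than zeros(V).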
Hope this will be helpful!

Release

R2017b
