Text Analytics Toolbox

Analyze and model text data

Text Analytics Toolbox provides algorithms and visualizations for preprocessing, analyzing, and modeling text data. Models created with the toolbox can be used in applications such as sentiment analysis, predictive maintenance, and topic modeling.

Text Analytics Toolbox includes tools for processing raw text from sources such as equipment logs, news feeds, surveys, operator reports, and social media. You can extract text from popular file formats, preprocess raw text, extract individual words, convert text into numerical representations, and build statistical models.

Using machine learning techniques such as latent semantic analysis (LSA), latent Dirichlet allocation (LDA), and word embeddings, you can find clusters and create features from high-dimensional text data sets. Features created with Text Analytics Toolbox can be combined with features from other data sources to build machine learning models that take advantage of textual, numeric, and other types of data.

MATLAB code that extracts text data from Microsoft Word documents into a datastore.

Import and Visualize Text

Import text data into MATLAB from single files or large collections of files, including PDF, HTML, and Microsoft® Word files. Visually explore text data sets using word clouds and text scatter plots.
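As a rough illustration of the import step, the sketch below writes a small text file to a temporary folder, reads it back with extractFileText, and draws a word cloud. The file name and its contents are hypothetical placeholders; the same function is documented to read PDF, HTML, and Microsoft Word files.

```matlab
% Write a small text file so the sketch is self-contained.
% The file name and sentence are made up for illustration.
filename = fullfile(tempdir, "sample_report.txt");
fid = fopen(filename, "w");
fprintf(fid, "The coolant pump is leaking. The pump was replaced on Monday.\n");
fclose(fid);

% Read the file contents into a string.
str = extractFileText(filename);

% Visualize the most frequent words.
figure
wordcloud(str);
```

For a large collection of files, the same read function can be wrapped in a datastore rather than called file by file.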

Extract Text Data from PDF, HTML, Microsoft Word, Microsoft Excel, and CSV Files

Screenshot of the Preprocess Text Data Live Editor task with results displayed as a word cloud.

Clean and Preprocess Text

Apply high-level filtering functions to remove extraneous content, such as URLs, HTML tags, and punctuation. Correct spelling, filter stop words, and normalize words to root form.
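One possible preprocessing pipeline is sketched below. The input strings are invented for illustration, and the order of the steps is one reasonable choice rather than a prescribed recipe.

```matlab
% Two made-up log entries containing HTML tags, a URL, and punctuation.
str = [
    "<p>Pump P-101 is leaking again! See https://example.com/log/42</p>"
    "<p>The coolant pumps were checked and they are running normally.</p>"];

% Remove markup and URLs from the raw strings.
str = eraseTags(str);
str = eraseURLs(str);

% Split the text into tokens, then clean the tokens.
documents = tokenizedDocument(str);
documents = lower(documents);             % convert to lowercase
documents = erasePunctuation(documents);  % strip punctuation
documents = removeStopWords(documents);   % drop words such as "the" and "is"
documents = normalizeWords(documents);    % reduce words to a root form (stemming by default)
```

If dictionary root forms are preferred over stems, normalizeWords also accepts a 'Style','lemma' option, which works best after part-of-speech details have been added to the documents.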

Clean and Preprocess Text Data in Live Editor

MATLAB code for creating a scatter plot and the created word embedding t-SNE plot.

Convert Text to Structured Format

Extract linguistic features by using a tokenization algorithm, calculate word frequency statistics to represent text data numerically, and train word embedding models such as word2vec, using either the skip-gram or the continuous bag-of-words (CBOW) architecture.
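The word frequency step can be sketched with a bag-of-words model, as below. The three short documents are invented for illustration; real data would normally be preprocessed first.

```matlab
% Tokenize three made-up documents.
documents = tokenizedDocument([
    "the pump is leaking oil"
    "the pump was replaced"
    "oil pressure is low"]);

% Count how often each vocabulary word occurs in each document.
bag = bagOfWords(documents);
counts = full(bag.Counts);    % rows are documents, columns are vocabulary words

% Weight the counts by term frequency-inverse document frequency.
M = full(tfidf(bag));
```

Either matrix can serve as numeric input to a clustering or classification model. The vocabulary that labels the columns is available in bag.Vocabulary.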

Explore and Visualize Word Embeddings

Workflow for performing transfer learning with FinBERT transformer model on text data to identify positive and negative attitudes.

Apply AI to Text Analytics

Fit a machine learning or deep learning model, such as LSA, LDA, or a long short-term memory (LSTM) network, to text data. Use transformer models such as BERT, FinBERT, and GPT-2 to perform transfer learning with text data.

Train BERT Document Classifier

Large Language Models

Connect MATLAB to the OpenAI™ Chat Completions API. Leverage the natural language processing capabilities of GPT models within your MATLAB environment, for tasks such as text summarization and chatting.

Large Language Models (LLMs) with MATLAB

Illustration of cleaning text data for natural language processing. On the left: word cloud of raw data. On the right: word cloud of cleaned data.

Text Analytics for Engineers

Develop predictive maintenance schedules based on sensor data and text logs. Automate requirement formalization and compliance checking.

Information Retrieval with Work Orders Data

Use text analytics to summarize multiple documents into one document.

Document Analysis

Analyze text with topic modeling to discover and visualize underlying patterns, trends, and complex relationships. Summarize documents, extract keywords, and evaluate document importance and similarity.
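A minimal topic modeling sketch using LDA is shown below. The four documents and the choice of two topics are assumptions made for illustration; a corpus this small only demonstrates the calls, not meaningful topics.

```matlab
rng("default")   % make the random initialization repeatable

% Four made-up documents covering two loose themes.
documents = tokenizedDocument([
    "pump leak oil seal pump"
    "oil leak seal gasket"
    "invoice payment customer refund"
    "customer refund invoice billing"]);
bag = bagOfWords(documents);

% Fit an LDA model with an assumed number of topics.
numTopics = 2;
mdl = fitlda(bag, numTopics, "Verbose", 0);

% Inspect the three highest-scoring words of the first topic.
topWords = topkwords(mdl, 3, 1);

% Each row gives the topic mixture of one document.
topicMixtures = mdl.DocumentTopicProbabilities;
```

Passing the model and a topic index to wordcloud is one way to visualize what each topic contains.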

Classify Text Data Using Convolutional Neural Network

Word clouds separated into positive and negative words.

Sentiment Analysis

Identify the attitudes and opinions expressed in text data to categorize statements as being positive, neutral, or negative. Build models that can predict sentiment in real time.
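As a simple rule-based starting point, the sketch below scores two invented statements with the VADER algorithm and maps the scores to labels. The 0.05 cutoffs are an assumption borrowed from common VADER practice, not a toolbox default.

```matlab
% Two made-up statements to score.
documents = tokenizedDocument([
    "The new pump works great and the team is very happy."
    "Terrible service, the replacement part failed again."]);

% Compound sentiment scores range from -1 (negative) to 1 (positive).
scores = vaderSentimentScores(documents);

% Map scores to labels using assumed thresholds.
labels = repmat("neutral", size(scores));
labels(scores >= 0.05) = "positive";
labels(scores <= -0.05) = "negative";
```

A trained classifier may be a better fit when domain-specific vocabulary carries sentiment that a general-purpose lexicon does not capture.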

Sentiment Analysis in MATLAB
