Text Analytics Toolbox and MeCab
Shuichi Obuchi on 25 Dec 2019
Commented: Shuichi Obuchi on 10 Mar 2020
I would like to add some words to the MeCab dictionary, which I assume is used behind the scenes by MATLAB's Text Analytics Toolbox.
The tokenization splits some words into pieces that are too short.
If you have any ideas for solving this problem, they would be appreciated.
Christopher Creutzig on 9 Mar 2020
Text Analytics Toolbox does not ship the tooling to compile an extended MeCab dictionary. But if you have one for your field (I know there are such compiled dictionaries for medical purposes, for example), you can use mecabOptions to have tokenizedDocument use it.
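As a rough sketch of the first option, assuming you already have a compiled MeCab model on disk: the folder path below is hypothetical, the sample string is simply the example phrase from this answer joined into one string, and the "Model" and "TokenizeMethod" names reflect my reading of the mecabOptions and tokenizedDocument documentation, so please check them against your release.

```matlab
% Hypothetical path: replace with the folder that holds your compiled MeCab model.
modelDir = "C:\dictionaries\my-mecab-model";

% Sample text (assumed): the example phrase from this thread as one string.
str = "日本睡眠学会のガイドライン";

if isfolder(modelDir)
    % Point the tokenizer at the custom model instead of the built-in one.
    options = mecabOptions("Model", modelDir);
    documents = tokenizedDocument(str, "TokenizeMethod", options);
else
    % Fall back to the default tokenizer when the model folder is missing.
    documents = tokenizedDocument(str);
end

% Leaving off the semicolon prints the "N tokens: ..." summary.
documents
```

The guard only keeps the snippet runnable on a machine without the custom model; in real use you would call mecabOptions directly.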
Alternatively, if you only have a handful of words you want to preserve, and are not worried about inflections, you can use "CustomTokens" to pass them to the tokenizer:
Without "CustomTokens": 5 tokens: 日本 睡眠 学会 の ガイドライン
With "CustomTokens": 3 tokens: 日本睡眠学会 の ガイドライン
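The calls that produced the two results above did not survive in this thread, so the following is only a sketch of what they plausibly looked like. The input string is an assumption (the tokens shown above joined back together), and the explicit "Language" setting is added here just to avoid relying on automatic language detection.

```matlab
% Assumed input: the sample tokens joined back into one string.
str = "日本睡眠学会のガイドライン";

% Default tokenization: the compound noun is split into its parts.
docDefault = tokenizedDocument(str, "Language", "ja")

% Listing the compound under "CustomTokens" keeps it as a single token.
docCustom = tokenizedDocument(str, "Language", "ja", ...
    "CustomTokens", "日本睡眠学会")
```

"CustomTokens" accepts a string array, so a handful of terms can be passed in one call. As noted above, it matches the listed forms literally and does not cover inflected variants.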