Reading mathematical formulas in a PDF with MATLAB is inconsistent; how can I generalize this?
Hi all,
I'm trying to extract certain pieces of text (the 4.50% and the 22.50% in picture 1) from a PDF file with MATLAB, using the pdfRead function. To make the text as generic as possible, I remove line breaks, double spaces, tabs and indents, and convert all text to uppercase. When reading the file, I run into the following problems:
- Some text in the file seems to be in math mode (see picture 1, paying special attention to the two occurrences of "Notional Amount").
- This math-mode text is not read consistently by pdfRead (see picture 2, again paying special attention to the two occurrences of "Notional Amount"). For readability, picture 2 shows the text before line breaks, double spaces etc. are removed; the problem is the same afterwards.
- The spaces within the words "Notional Amount" are in a different spot in every PDF file, so I cannot use one piece of MATLAB code for multiple PDF files, which is what I need.
- In addition, when I copy and paste this part into my Command Window, it looks different from how it appears in the text (see picture 3).

My question has two parts:
- Why isn't this part read as plain text, and how can I make it appear as plain text?
- How can I make this part generic, so that I can read multiple PDF files with the same code?
Solutions I tried:
- Removing all spaces
- Saving it as a txt file and trying to change the font (the formula part didn't change)
- Using Python to try to adjust the file
Thanks in advance! 
Answers (1)
Pranav Verma on 20 May 2021
Edited: Pranav Verma on 21 May 2021
Hi Milan,
The pdfRead function you mention does not appear to be part of the official MATLAB product. However, I see a similar function in one of the MATLAB File Exchange submissions: "Read text from a PDF document".
"Read text from a PDF document" is one of several submissions on the MATLAB File Exchange on MATLAB Central, a forum where our product users interact and exchange information and knowledge without MathWorks' involvement. Feel free to contact the author of this submission directly with specific questions about the implementation.
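Independent of which reader is used, one possible way to cope with spaces that land in a different spot in every file is to match the phrase with a pattern that tolerates whitespace between any two letters. The snippet below is only a rough sketch under assumptions: the sample string is made up and stands in for the pdfRead output, and it assumes the percentage appears shortly before "Notional Amount" and that the extracted letters are plain ASCII. The variable names are illustrative and not part of any submission.

```matlab
% Made-up sample standing in for the text returned by the PDF reader.
% The stray spaces inside "Notional Amount" imitate the inconsistent spacing.
raw = sprintf('4.50%% of the N otio nal Amo unt\n22.50%%  of the Not ional A mount');

% Quick check: any code unit above 127 means the reader returned non-ASCII
% characters, which a plain-letter pattern like the one below will not match.
hasNonAscii = any(double(raw) > 127);

% Normalise as described in the question: collapse whitespace, uppercase.
txt = upper(strtrim(regexprep(raw, '\s+', ' ')));

% Build 'N\s*O\s*T\s*...\s*T': optional whitespace between every letter.
phrase   = 'NOTIONALAMOUNT';
tolerant = [sprintf('%c\\s*', phrase(1:end-1)), phrase(end)];

% Assumed layout: a number, a percent sign, then at most 40 other
% characters before the phrase. Adjust this to the real document layout.
pattern = ['(\d+(?:\.\d+)?)\s*%[^%]{0,40}?', tolerant];

% Each match contributes one token holding the number as text.
tokens = regexp(txt, pattern, 'tokens');
pct    = str2double([tokens{:}]);   % numeric row vector of percentages
disp(pct)
```

If hasNonAscii comes back true for the real file, the letters are probably not ordinary characters, and that would need to be handled before any pattern on plain letters can match. As a side note, if Text Analytics Toolbox is available, extractFileText is a MathWorks function that reads text from PDF files; whether it handles this particular formula text any better is not something this sketch can confirm.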
1 Comment
Walter Roberson on 21 May 2021