Matching the first two words

Hi,
I have two cell arrays, A and B. A(1) is 'American Express Corporation', and B contains cells with the string 'AMERICAN EXPRESS CO'. The two companies are the same, but they won't match unless I match on the first two words only (with case-insensitive matching). How can I find the cells in B that match A(1) based on the first two words only? Best, Wesso

 Accepted Answer

A={'American Express Corporation','American1 Express Corporation','American2 Express Corporation'}
B={'AMERICAN EXPRESS CO','AMERICAN3 EXPRESS CO', 'AMERICAN1 EXPRESS CO' }
a=regexp(A,'\s+','split')
b=regexp(B,'\s+','split')
a1=upper(cellfun(@(x) [x{1},x{2}],a,'un',0))
b1=upper(cellfun(@(x) [x{1},x{2}],b,'un',0))
Now you can compare your companies by comparing a1 and b1, for example with strcmp or ismember.
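As a minimal sketch of that comparison step, assuming ismember and strcmp are acceptable for exact matching of the keys; the variable names tf, loc, and matchesForA1 are illustrative and not part of the original answer:

```matlab
A = {'American Express Corporation','American1 Express Corporation','American2 Express Corporation'};
B = {'AMERICAN EXPRESS CO','AMERICAN3 EXPRESS CO','AMERICAN1 EXPRESS CO'};

% Split each name on whitespace and build an upper-case key from the first two words
a = regexp(A,'\s+','split');
b = regexp(B,'\s+','split');
a1 = upper(cellfun(@(x) [x{1},x{2}],a,'UniformOutput',false));
b1 = upper(cellfun(@(x) [x{1},x{2}],b,'UniformOutput',false));

% tf(k) is true when A{k} has a match in B; loc(k) is the index in B (0 if none)
[tf,loc] = ismember(a1,b1);

% Cells of B whose key equals the key of A{1}
matchesForA1 = B(strcmp(b1,a1{1}));
```

With the sample data above, only the first two entries of A find a partner in B, and matchesForA1 holds the single cell 'AMERICAN EXPRESS CO'.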

4 Comments

If you have
A={'abc defg'}
B={'abcd efg'}
this method will not work: both names concatenate to 'ABCDEFG', so they match even though the words differ.
We can improve the code by adding a separator character between the two words, like this:
A={'American Express Corporation','American1 Express Corporation','American2 Express Corporation'}
B={'AMERICAN EXPRESS CO','AMERICAN3 EXPRESS CO', 'AMERICAN1 EXPRESS CO' }
a=cellfun(@strsplit,A,'un',0)
b=cellfun(@strsplit,B,'un',0)
a1=upper(cellfun(@(x) [x{1},'?' x{2}],a,'un',0))
b1=upper(cellfun(@(x) [x{1},'?' x{2}],b,'un',0))
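To illustrate what the separator changes, here is a small sketch using two made-up names whose first two words concatenate to the same text. The names and the variables noSep and withSep are hypothetical, chosen only for this demonstration:

```matlab
A = {'abc defg'};
B = {'abcd efg'};

% Split each name into words (strsplit splits on whitespace by default)
a = cellfun(@strsplit,A,'UniformOutput',false);
b = cellfun(@strsplit,B,'UniformOutput',false);

% Without a separator both keys become 'ABCDEFG', which is a false match
noSep = strcmp(upper([a{1}{1},a{1}{2}]), upper([b{1}{1},b{1}{2}]));

% With '?' between the words the keys are 'ABC?DEFG' and 'ABCD?EFG'
withSep = strcmp(upper([a{1}{1},'?',a{1}{2}]), upper([b{1}{1},'?',b{1}{2}]));
```

Here noSep is true and withSep is false. Any character that cannot appear inside a word would serve as the separator.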

I am facing another type of problem: "Bay Banks," and "BAY BANKS" didn't match because they led to 'BAYBANKS,' vs 'BayBanks'. How can I get rid of commas and dots that come after a word?

regexp(str, '^[A-Za-z]+', 'match')
will return only the beginning of a string up to the first non-letter.
However, rather than making the string matching algorithm more and more complicated, I would rather focus my efforts on making sure the inputs are properly formatted in the first place.
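For the trailing commas and dots asked about above, one possible cleanup (a sketch only, not the commenter's method) is to strip them with regexprep before building the keys. The variables names, cleaned, keys, and isSame are illustrative:

```matlab
names = {'Bay Banks,','BAY BANKS'};

% Drop any run of commas or periods that directly follows a word character,
% keeping that character via the $1 token
cleaned = regexprep(names,'(\w)[.,]+','$1');

% Build upper-case keys from the first two words, joined with a separator
words = regexp(strtrim(cleaned),'\s+','split');
keys = upper(cellfun(@(w) [w{1},'?',w{2}],words,'UniformOutput',false));

% Compare the two cleaned keys
isSame = strcmp(keys{1},keys{2});
```

Both keys come out as 'BAY?BANKS', so isSame is true. Cleaning the inputs once, up front, keeps the matching step itself simple.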
