Extract dates from Yahoo Finance - please help!

I am trying to "scrape" stock data from yahoo finance. Although I am successful in extracting the stock prices, I am having difficulties in extracting the dates. I am using the following command without success:
dateField=regexp(historicalPriceFile, '<td class="yfnc_tabledata1" nowrap align="right">([\d\w-]+)</td>', 'tokens');
Please help!
Thank you in advance,
Chris

7 Comments

Could you give an example line to be matched?
Side note: \w includes digits, so [\d\w-] can be simplified to [\w-]
Thanks! I am going to try that although I am not sure it will work as I think I tried it already. Essentially, I need to capture the dates from something like this:
align="right">193.85</td><td class="yfnc_tabledata1" nowrap align="right">Dec 18, 2012</td><td class="yfnc_tabledata1"
Capturing all the prices per share, like 193.85, is easy; I use: (\d\\.,). Capturing one date like Dec 18, 2012 is also easy, as I simply put it in parentheses: (Dec 18, 2012). But I need to capture several of them, as they correspond to the different prices per share.
Again, below is what I use to capture all the prices per share, which works beautifully:
numField=regexp(historicalPriceFile, '<td class="yfnc_tabledata1" align="right">([\d\.,]+)</td>', 'tokens'); % extracting the numbers field to a cell array of cells
and this is what I am trying to use to capture the dates without success:
dateField=regexp(historicalPriceFile, '<td class="yfnc_tabledata1" nowrap align="right">([\d\w-]+)</td>', 'tokens'); % extracting the dates field to a cell array of cells
Thank you again for your advice.
Chris
The date you show has spaces and commas, which [\d\w-] does not include. The date does not appear to have any dash. Perhaps [\w\s,]
..I meant to say I use: (\d\.,).
I did try [\w-] but it did not work (it did not capture the dates) .. Any advice?? please help!
[\w\s,] works for me.
Your sample date has no '-' in it, so no point having the '-' in the []. Your sample date has a space and comma in it, which are not matched by [\w-]
Thanks Walter for the advice, that might work! I am going to try it. The reason is that Yahoo Finance used to present dates with a "-" and has now changed the format. In fact I did try "\w" with and without "-" but had no success.
Thanks!
it does work! thanks a lot!
chris


Answers (1)

dateField=regexp(historicalPriceFile, '<td class="yfnc_tabledata1" nowrap align="right">([\w\s,]+)</td>', 'tokens');
... as discussed above.
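
For reference, here is a minimal sketch of the pattern applied to a small sample, with the tokens then converted to date numbers. The sample string is hypothetical: it is modeled on the line quoted in the comments, and the second row (Dec 17, 2012 and 191.76) is made up purely for illustration. The format string 'mmm dd, yyyy' is an assumption based on the quoted date, and month-name parsing may depend on your locale.

```matlab
% Hypothetical page source modeled on the line quoted in the comments;
% the second date/price pair is invented for illustration only.
historicalPriceFile = [ ...
    '<td class="yfnc_tabledata1" nowrap align="right">Dec 18, 2012</td>' ...
    '<td class="yfnc_tabledata1" align="right">193.85</td>' ...
    '<td class="yfnc_tabledata1" nowrap align="right">Dec 17, 2012</td>' ...
    '<td class="yfnc_tabledata1" align="right">191.76</td>'];

% [\w\s,]+ accepts letters, digits, whitespace and commas, which covers "Dec 18, 2012".
datePattern = '<td class="yfnc_tabledata1" nowrap align="right">([\w\s,]+)</td>';
dateField = regexp(historicalPriceFile, datePattern, 'tokens');

% With 'tokens', each match is itself a 1x1 cell; unwrap to a flat cell array of strings.
dateStrings = cellfun(@(c) c{1}, dateField, 'UniformOutput', false);

% Convert the strings to serial date numbers (format assumed from the sample date).
dateNums = datenum(dateStrings, 'mmm dd, yyyy');
```

The same unwrapping step works for numField; str2double can then convert the price strings to numbers, provided any thousands separators are removed first.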

Asked: 24 Dec 2012
