Separating Data into columns based on certain text variable in that data

Question


I have a list of daily stock returns for each stock in the S&P 500. The data are in long format, with all rows for a given stock grouped together, like this:

1/1/2013 AAPL .005%

1/2/2013 AAPL -.1%

....

1/1/2013 GOOG .5%

1/2/2013 GOOG .25%

How would I rearrange this so that each stock becomes a column name with its returns listed below it?

Thanks!

1 Comment

Bob Thompson on 2 Apr 2018

Is the string set up so that a space separates the ticker letters from the trailing percentage? If so, you could split each string at its two spaces (I can't recall the exact commands off the top of my head, but something like strsplit() or regexp()), and then sort or index on the middle piece, which holds the ticker code.
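As a rough illustration of the splitting idea in this comment, here is a minimal sketch. It assumes each record is a single character vector with exactly two spaces, as in the question; the sample line and the variable names are made up for the example.

```matlab
% Hypothetical sample record in the layout shown in the question
txt = '1/2/2013 AAPL -.1%';

parts  = strsplit(txt, ' ');                         % split at the two spaces
d      = datetime(parts{1}, 'InputFormat', 'M/d/yyyy');
ticker = parts{2};                                   % middle piece is the ticker
ret    = str2double(strrep(parts{3}, '%', ''));      % drop '%', value stays in percent
```

For a whole file, reading directly into a table (as in the answer below) is likely simpler than splitting line by line.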


Answer 1

dpb on 2 Apr 2018



>> data=readtable('jack.dat','format','%{MM/dd/yyyy}D %s %f%%')
data = 
     Var1        Var2     Var3 
  __________    ______    _____
  01/01/2013    'AAPL'    0.005
  01/02/2013    'AAPL'     -0.1
  01/01/2013    'GOOG'      0.5
  01/02/2013    'GOOG'     0.25
>> [u,~,ic]=unique(data.Var2);
>> stks=table(data.Var1(ic==1));   % get the dates first; must all be same
>> for i=1:length(u)               % get each individual stock
     stks(:,i+1)=table(data.Var3(ic==i));end
>> stks.Properties.VariableNames=[{'Date'}, u.']  % add useful names
stks = 
     Date       AAPL     GOOG
  __________    _____    ____
  01/01/2013    0.005     0.5
  01/02/2013     -0.1    0.25
>> clear data                      % done with it...

The code above assumes that every stock has an identical set of timestamps. If any stock is missing a date, the column lengths will not match and the assignment in the loop will fail.
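If the timestamps are not identical across stocks, one possible alternative (not part of the original answer) is the built-in unstack function, which pivots a long table to wide form and fills missing combinations with NaN. The sketch below uses a small made-up table; the variable names Date, Ticker, and Return are assumptions chosen for the example.

```matlab
% Small made-up long-format table; GOOG is deliberately missing 01/02/2013
Date   = datetime({'01/01/2013'; '01/02/2013'; '01/01/2013'}, ...
                  'InputFormat', 'MM/dd/yyyy');
Ticker = categorical({'AAPL'; 'AAPL'; 'GOOG'});
Return = [0.005; -0.1; 0.5];
data   = table(Date, Ticker, Return);

% One column per unique Ticker, one row per unique Date;
% combinations with no observation are filled with NaN
wide = unstack(data, 'Return', 'Ticker');
```

With the table read in the answer above, the equivalent call would presumably be unstack(data,'Var3','Var2') after converting Var2 to categorical.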

4 Comments

Ulrik Græsvik on 16 Feb 2022

Edited: dpb on 16 Feb 2022

Could you elaborate more on the preprocessing? I'm struggling with the same issue with the same three columns (date, ticker and return).

All companies/tickers will have different timestamps as we are looking into IPOs and therefore they will start at different times.

dpb on 16 Feb 2022

Edited: dpb on 16 Feb 2022

Specific code would depend greatly on how you have the data available to start from and whether you know the overall first/last dates a priori or have to determine that from a collection of files as well.

Probably the simplest would be to first create a datetime array of the overall length from first to last and use it to build the date column in the output table. Then, when you read each dataset, convert its date field to datetime and use logical indexing to insert into the proper locations in the table with a missing value indicator elsewhere.
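The approach described above might look roughly like the following. This is only a sketch under stated assumptions: the tickers, dates, and returns are made-up stand-ins for data read from files, NaN is used as the missing value indicator, and ismember is used here to locate each stock's dates within the overall date vector.

```matlab
% Made-up per-ticker data with different start dates (as with IPOs)
tickers = {'AAA', 'BBB'};                              % hypothetical names
dates   = {datetime(2013,1,1):datetime(2013,1,4), ...
           datetime(2013,1,3):datetime(2013,1,4)};
rets    = {[0.1 0.2 0.3 0.4], [1.0 2.0]};

% Date column spanning the overall first to last observation
allDates = (min([dates{:}]):max([dates{:}])).';
out = table(allDates, 'VariableNames', {'Date'});

for k = 1:numel(tickers)
    col = NaN(numel(allDates), 1);            % NaN marks "no observation"
    [tf, loc] = ismember(dates{k}, allDates); % where each date falls in allDates
    col(loc(tf)) = rets{k}(tf);               % place returns on matching rows
    out.(tickers{k}) = col;                   % add the column under the ticker name
end
```

Note that the colon operator on datetime values steps by calendar day, so weekends and holidays would appear as rows containing only NaN unless they are filtered out afterwards.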

If you would attach a representative input file, I'd at least look at it and see if it gives me any further hints. I'm really busy right now, so I can't promise I can actually write code until the weekend at the earliest.
