Analyzing asymmetric data sets

Dear all,
I have some data on prices for a specific good across time and across countries. The first problem is that the start and end dates differ from country to country. For example:
Austria      Belgium
"2/11/08"    "07/12/08"
"30/11/08"   "04/01/09"
"28/12/08"   "01/02/09"
"25/01/09"   "01/03/09"
"22/02/09"   "29/03/09"
The second problem is that the time span for each country is different. For example, the data for France are available for 39 periods of 4 weeks (or 28 days), while the data for Belgium are available for 36 periods of 4 weeks.
The third problem is that I have jumps, which means that in some cases the next date does not come exactly 4 weeks later. Put differently, the distance between two successive dates is not always 28 days; in some cases it is 29, 27 or 34.
Is there anything I can do (any function perhaps?) to solve these 3 problems? If I do not solve these problems I will not be able to use the data set for analysis. Please be as specific as you can.
Thanks

10 Comments

You have not said anything about how you intend to analyze the data, so it is difficult for us to make suggestions.
thank you Walter
To be more specific, I have data on prices for 12 goods. These prices evolve over time across regions. So for each product category I want to run a regression of prices on some other variables.
So my regression for one product category will be p_{ij,t} = a + regressors + error term,
where p_{ij,t} is the absolute difference in price between two locations i and j at time t. Also, i and j can be locations in the same country or locations in different countries.
My goal is to construct, for a specific t, a vector with all the pairs of locations. Put differently, I want to see how the price difference evolves over time for each pair of cities. But as I said, this is a bit of a challenge as I have the 3 problems I mentioned above.
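A minimal sketch of how the vector of pairwise absolute differences at one time t could be built, assuming the prices for all locations at that t are already in one column vector. The variable names and all numbers are made up for illustration; NaN is used here as an assumed marker for "no observation at t".

```matlab
% Hypothetical prices at one time t for four locations (made-up numbers).
% NaN marks a location with no observation at t.
prices = [10.0; 12.5; NaN; 11.0];

% All unordered pairs (i, j) with i < j.
pairs = nchoosek(1:numel(prices), 2);

% Absolute price difference for each pair; the result is NaN whenever
% either location of the pair has no observation at t.
absDiff = abs(prices(pairs(:, 1)) - prices(pairs(:, 2)));

% Keep only the pairs for which both prices exist.
valid = ~isnan(absDiff);
pairsAtT = pairs(valid, :);
diffAtT = absDiff(valid);
```

Repeating this for each t gives one vector per period; pairs that are missing at a given t simply do not contribute an observation for that period.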
For a particular time, t, if the price exists in one location, but the price in another location starts after t or ends before t, what would you _like_ to have happen?
The jumps can be taken care of (approximately at least) by using interp1(), so the main thing you need to define is what you want done when data has not yet started or is already finished in another country.
Thank you, Walter. Regarding your main question, I do not know what I would like to happen if the price exists in one location but the price in another location starts after t or ends before t. The reason is that I do not know which alternatives are available, so I cannot decide which of them to choose. I would be grateful if you could let me know about these alternatives, or which is the best in your opinion, as I am totally clueless. This step is very important to me. If I do not find an approach the whole project will fail. Thank you again.
You can extrapolate from the values that do exist for that one country, or you can return a constant result such as 0 or -inf or NaN at those locations, or you can just not produce a value for those locations, or you could extrapolate based upon the values that exist in that time frame over all of the countries.
Caution: extrapolation usually has quite a wide margin of error !!
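As a rough illustration of three of those options with interp1() (NaN outside the observed range, a constant outside the range, or linear extrapolation), here is a small sketch. The day numbers and prices are made up; only the interp1() call forms are the point.

```matlab
% Hypothetical observations for one country (made-up numbers).
t = [0 28 56 84];     % days since the first observation
p = [10 11 13 12];    % prices on those days

% Query days; the first and last fall outside the observed range [0, 84].
tq = [-28 14 84 112];

% Default linear interpolation: NaN outside the observed range.
pNaN = interp1(t, p, tq);

% Same interpolation, but a constant (here 0) outside the range.
pConst = interp1(t, p, tq, 'linear', 0);

% Linear extrapolation beyond the first and last observations.
pExtrap = interp1(t, p, tq, 'linear', 'extrap');
```

With NaN as the fill value, any pairwise difference involving a country that has no data at t also comes out as NaN, so those pairs are easy to drop later.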
So this means that a more symmetric sample, with similar start and end dates, is preferable to the solutions you suggested. Am I right?
thanks
The most preferable is up to you, depending on your needs.
Generally speaking, in locations where there is no data, you need to refrain from performing a meaningful calculation there, or else you need to calculate new data there based upon existing data. Your situation is one in which there is no reasonable mathematical model to predict the past or future behavior with accuracy (there are too many factors, too much psychology and politics involved in the prices.)
I see your point and thank you for this. I want to ask you something regarding interpolation
Suppose that we have the following table
Country A      Country B
'23-11-2008'   '23-11-2008'
'28-12-2008'   '21-12-2008'
'25-01-2009'   '18-01-2009'
'22-02-2009'   '15-02-2009'
'29-03-2009'   '15-03-2009'
As you can see, the start date is the same for both countries. In the second column we have no jumps: each date is obtained by adding 28 days to the previous date.
In the first column, by contrast, we have jumps. Specifically, adding 35 days to '23-11-2008' takes us to '28-12-2008'. So the interpolation method will tell us approximately what the price on '21-12-2008' would be. In other words, for country A we erase the date '28-12-2008' and replace it with '21-12-2008', for which we know the price via interpolation. Should I do the same for the date '25-01-2009'? That is, should I replace it with '18-01-2009', calculating the price on that date by interpolation?
What I want to say is that the jump from '23-11-2008' to '28-12-2008' distorts all the dates from that time point onwards (compare the first with the second column). So does this mean that I have to apply interpolation for all the dates following '23-11-2008' for country A?
interp1() will interpolate for all dates based on the existing data points that you provide, i.e. it will NOT interpolate the first date and THEN interpolate the others from the previously interpolated value.
Check the graph in the documentation: http://www.mathworks.co.uk/help/matlab/ref/interp1.html
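For the Country A / Country B table above, a minimal sketch might look like the following. The dates come from the table; the prices in pA are made up purely for illustration, and the variable names are hypothetical.

```matlab
% Observation dates for Country A, taken from the table above.
datesA = {'23-11-2008', '28-12-2008', '25-01-2009', '22-02-2009', '29-03-2009'};
tA = datenum(datesA, 'dd-mm-yyyy');
tA = tA(:);                       % make sure it is a column vector

% Hypothetical prices for Country A (made-up numbers, one per date).
pA = [10; 17; 21; 25; 32];

% Gaps between successive dates, in days, to spot the jumps.
gaps = diff(tA);

% Regular 28-day grid from the common start date (Country B's dates).
tGrid = tA(1) + 28 * (0:4)';

% One call interpolates every grid date directly from the original
% observations; no interpolated value is reused as input.
pGrid = interp1(tA, pA, tGrid);

% Grid dates as text, one row per date, for checking against Country B.
gridDates = datestr(tGrid, 'dd-mm-yyyy');
```

So yes, every date after the first jump is re-evaluated, but each one is computed from the two original observations that bracket it, not from the previously interpolated price.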
thank you oleg

Asked:

on 4 Jun 2012
