Is there a fast way to solve this problem?

Hi, is there a fast way to deal with this problem?
I have 17770 files, each with this structure:
id_user (integer), rate (integer), date
Within a file each id_user is unique, but the same id_user may also appear in other files.
What I want is for each unique id_user to be one row in an array, such as: id_user1: rate1 date1 rate3 date3 ... raten daten; id_user2: rate3 date3 ... rate20 date20; and so on.
Here rate1, for example, is the rating of user 1 in file 1, and rate3 is the rating in file 3, and so on. That is, id_user1 appears in file 1 and file 3, and id_user2 appears in file 3 and file 20.
I wrote code that works correctly but is very slow: it needs three hours to create the array for just 50 id_users (reading all 17770 files). I have 480000 user ids, and I have to look up their ratings in all 17770 files. I placed my files in a folder and read them one after another. I collected the unique user ids, then used the find function across all 17770 files. Each pass looks up the ids of, for example, 30 users in all files and records their ratings in an array, but this takes a long time: accumulating the data for 30 users takes three hours.
Any suggestions or advice would help.
Thanks in advance.

Answers (1)

Are you preallocating? Use the profiler to find where the code spends the most time:
profile on
% run your code
profile viewer
Run on a subset of your files for testing - then expand to include more files.
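
A minimal sketch of that workflow, profiling only a small subset and listing the most expensive functions without opening the viewer. The names processFile and nSubset are hypothetical stand-ins for the real per-file code and subset size, not anything from the original program.

```matlab
% Hypothetical stand-in for the real per-file work (an assumption).
processFile = @(k) sum(sqrt(1:1e4)) + k;

nSubset = 20;                 % profile a small subset first
profile on
total = 0;
for k = 1:nSubset
    total = total + processFile(k);
end
profile off

% Retrieve the collected statistics as a struct.
stats = profile('info');

% Print up to five functions with the largest total time.
[~, order] = sort([stats.FunctionTable.TotalTime], 'descend');
for k = order(1:min(5, numel(order)))
    f = stats.FunctionTable(k);
    fprintf('%-40s calls=%6d total=%8.3f s\n', ...
            f.FunctionName, f.NumCalls, f.TotalTime);
end
```

Once the subset run looks reasonable, nSubset can be raised step by step toward the full file count.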

9 Comments

Thanks,
I did what you suggested and got the table below. The only function I recognize is datenum, which I use in my code. What are the other functions? It seems that datenum takes a long time. If so, can I use another function instead of datenum?
I got this table:
Function Name                               Calls    Total Time     Self Time*
datenum                                      1507    1178.451 s       0.843 s
timefun\private\dtstr2dtnummx (MEX-file)     1507    1171.309 s    1171.309 s
fullfile                                    35541       7.279 s       6.530 s
iscellstr                                    1507       4.767 s       4.767 s
timefun\private\cnv2icudf                    1507       1.532 s       1.253 s
filesep                                     35541       0.455 s       0.455 s
ispc                                        35541       0.294 s       0.294 s
@(matched,replace)replace.(matched)          1507       0.171 s       0.171 s
...te@(matched,replace)replace.(matched)     1507       0.108 s       0.108 s
timefun\private\dtstr2dtnummx (MEX-file)     1507    1171.309 s    1171.309 s
see: http://undocumentedmatlab.com/blog/datenum-performance/
Thanks,
I downloaded this function from
http://www.mathworks.com/matlabcentral/fileexchange/28093-datestr2num
and tried the example from the link you sent:
dateNum=dtstr2dtnummx({'2010-12-12 12:21:12.123'},'yyyy-MM-dd HH:mm:ss');
I got this error:
??? Undefined function or method 'dtstr2dtnummx' for input
arguments of type 'cell'.
Note that my dates look like this:
'2005-09-06'
i.e. they have no hours, minutes, or seconds.
Does this function work with a date only?
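
For date-only strings in the fixed form 'yyyy-mm-dd', one option that avoids string-based datenum parsing altogether is to read the numbers with sscanf and pass them to the numeric form of datenum. This is a sketch under the assumption that every string has exactly that layout; the variable names are illustrative.

```matlab
% Example date strings in the fixed 'yyyy-mm-dd' layout (assumed format).
dates = {'2005-09-06'; '2005-12-31'; '2004-02-29'};

% Join all strings into one char row, separated by spaces, and parse
% year, month and day in a single sscanf call.
joined = sprintf('%s ', dates{:});
ymd = sscanf(joined, '%d-%d-%d', [3 Inf]).';   % one row per date

% The numeric form of datenum skips the slow string-parsing path.
dn = datenum(ymd(:,1), ymd(:,2), ymd(:,3));
```

The result should match datenum(dates, 'yyyy-mm-dd'); it is worth checking this on a few real files before relying on it.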
Hi,
I compared datenum and dtstr2dtnummx, and the difference in total time is just 7 s.
I don't think that difference is significant, and it will hardly affect the running time of the program.
My code takes three hours to run, so saving 7 s does not help.
Any other suggestions?
Thanks
The other functions are called by the functions you wrote. As well as the total time, look at the functions that are called the most: any small saving there will have a much larger effect.
Is your table above saying that you have 1507 calls to datenum, and that they take 1178 seconds!? That doesn't look right to me... or am I misinterpreting the table?
Also check that you are pre-allocating any arrays.
Yes,
there are 1507 calls and the total time is 1178 s,
but I ran it for just 1000 files.
Regarding pre-allocation, I can allocate only the number of rows, because I cannot guess the number of columns.
What about dtstr2dtnummx?
Why did I get the above error when I used it?
Pre-allocating the rows only is not good enough: you have to preallocate the full matrix. Why can't you pre-allocate the columns?
As for the error, the function is not being found by MATLAB. Can you find it manually?
I cannot know the number of columns because the code itself accumulates the information from all files.
See:
u1 id-mv1 id-mv2.......unknown how many id-mvs
u2 id-mv1 id-mv2 id-mv3............
The ids for each user are unknown, and I also cannot guess the largest number of id-mvs.
Regarding dtstr2dtnummx, I got it from the link:
http://www.mathworks.com/matlabcentral/fileexchange/28093-datestr2num, and added it manually.
Thanks
Either preallocate to bigger than you need and check that it does not grow, or do a pre-loop to determine the size. It will speed things up.
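
A minimal sketch of the pre-loop idea, under stated assumptions: the cell array files is a hypothetical in-memory stand-in for the files on disk, with rows [id_user, rate, date as a serial number], and the data is kept in long format (one row per rating) so the number of columns is fixed. Grouping by sorting once, instead of calling find per user, is a swapped-in technique and not the original poster's method.

```matlab
% Hypothetical stand-in for the files: one cell per file,
% each row is [id_user, rate, date]. Real code would read from disk.
files = { [1 5 732000; 2 3 732001], ...
          [2 4 732010], ...
          [1 2 732020; 3 1 732021; 2 5 732022] };

% Pre-loop: count rows so the full matrix can be allocated once.
nRows = 0;
for k = 1:numel(files)
    nRows = nRows + size(files{k}, 1);
end

% Long format: [id_user, file index, rate, date], allocated up front.
allData = zeros(nRows, 4);
pos = 0;
for k = 1:numel(files)
    n = size(files{k}, 1);
    allData(pos+1:pos+n, :) = [files{k}(:,1), repmat(k, n, 1), files{k}(:,2:3)];
    pos = pos + n;
end

% Sort by user (then file) so each user's ratings are contiguous.
allData = sortrows(allData, [1 2]);
[users, firstIdx] = unique(allData(:,1), 'first');
lastIdx = [firstIdx(2:end) - 1; nRows];

% One cell per user, holding [file index, rate, date] rows.
perUser = cell(numel(users), 1);
for u = 1:numel(users)
    perUser{u} = allData(firstIdx(u):lastIdx(u), 2:4);
end
```

Here perUser{u} has a different number of rows for each user, which sidesteps the unknown column count. Whether all ratings fit in memory at once is an assumption to check against the real data.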

This question is closed.

Asked: 9 Jan 2012
Closed: 20 Aug 2021
