Order of files pulled from Datastore

I have created a datastore with around 1000 CSV files, labeled filename_1, filename_2, ..., filename_1000. When I try to read the data from the datastore into a new table, though, it reads the files in a weird order:
How can I get it to read the files in the typical 1,2,3,etc order?
Here is the rest of the code for reference:
Thanks! Austin

Accepted Answer

Walter Roberson on 5 Oct 2023
Edited: Walter Roberson on 5 Oct 2023
The files in a datastore are processed in the order listed in the Files property.
When you call datastore() with a wildcard name or one or more directory names, the order in which the Files property is populated is not defined.
  1 Comment
Walter Roberson on 5 Oct 2023
You have a few possibilities:
  1. Somehow construct an explicit list of files in the order you want, and datastore() that list instead of passing in a directory or wildcard; or
  2. after the original datastore is constructed, extract the Files property, do something to get the list sorted in the order you want, and set the results back as the Files property; or
  3. change your expectations that there is a "wrong" order to process the files in.
The File Exchange contribution natsortfiles might help you with sorting.
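Option 2 could look something like the sketch below. It assumes the files sit in a hypothetical folder named data, that every file name ends in a number followed by .csv, and that the datastore type in use (for example a tabular text datastore) allows its Files property to be reassigned.

```matlab
% Hypothetical folder and name pattern; adjust to the real location.
ds = datastore(fullfile('data', 'filename_*.csv'));

files = ds.Files;   % cell array of full paths, in whatever order datastore chose

% Pull out the digits that sit immediately before ".csv" in each path.
numText = regexp(files, '\d+(?=\.csv$)', 'match', 'once');
fileNum = str2double(numText);   % NaN for any name that did not match

% Sort numerically and write the reordered list back to the datastore.
[~, order] = sort(fileNum);
ds.Files = files(order);
```

A natural-order sort function from the File Exchange could replace the regexp and sort steps if the names are less regular than this.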
datastore() should not be expected to guess that you want the files to be processed in some particular order.
For example, if you pass a list of file extensions to datastore(), then which of these should the order be?
  1. "Process all directories in the order given, looking for the first file extension, then process all of the directories again in the order given, looking for the second file extension"; or
  2. "process each directory in order; within each directory, process all files for the first file extension, then all files for the second file extension"; or
  3. "process each directory in order; for any particular file "base" name, look for the base name with each of the given file extensions in the order passed"; or
  4. "process each directory in order; for any particular name, if the file extension matches any of the passed file extensions, add it to the list".
If nested directories are provided, then should the complete parent folder be processed without descending into any subfolders, and then each subfolder descended in order? Or should each subfolder be processed as it is encountered alphabetically? Or should all subfolders of a folder be processed before the parent folder is processed?
If order is important, then use whatever facilities are needed to create an ordered list of files, and pass the ordered list to datastore().
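As a minimal sketch of building the ordered list explicitly (option 1 above), assuming a hypothetical folder named data and files named filename_1.csv through filename_1000.csv with no gaps in the numbering:

```matlab
% Hypothetical folder name and file count; adjust both to match the real data.
folder = 'data';
N = 1000;

% Build the names in numeric order: filename_1.csv, filename_2.csv, ...
names = compose('filename_%d.csv', (1:N).');   % N-by-1 cell array of char vectors
files = fullfile(folder, names);               % prepend the folder to every name

% Passing an explicit list makes datastore use exactly this order.
ds = datastore(files);
```

Because the list is generated rather than discovered, a missing file will cause an error when the datastore is created, which may or may not be what you want.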


More Answers (1)

dpb on 5 Oct 2023
"...around 1000 csv files, labeled filename_1, filename_2,...filename_1000"
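The list is sorted in ASCII (character-by-character) order, so names whose number begins with 0 come first, then those beginning with 1, and so on; for example, filename_10 sorts before filename_2. You should have used zero-padded numbers: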
N=1000;
fnames=compose('filename_%04d.csv',0:N).';
fnames([1:5 end-4:end])
ans = 10×1 cell array
    {'filename_0000.csv'}
    {'filename_0001.csv'}
    {'filename_0002.csv'}
    {'filename_0003.csv'}
    {'filename_0004.csv'}
    {'filename_0996.csv'}
    {'filename_0997.csv'}
    {'filename_0998.csv'}
    {'filename_0999.csv'}
    {'filename_1000.csv'}
Or, there is sort_nat on the File Exchange, which will make up for the original oversight... :)
