How to write a large .dat file in a loop?

Question

Hello,

I am trying to convert a large amount of data from one format (64 separate Neuralynx .ncs files) into another (a single .dat file with all 64 channels combined, in int16 format) for use with the Kilosort package. Kilosort then uses this huge file in some clever way. The .dat file should be 2D, with 64 channels x number of samples per channel (about 3 hours at 20 kHz).

Because my data is so large, I am unable to load it all into one matrix and then write it to a file (the total filesize for all 64 inputs is at least 30GB).

So, I have been trying to read each of my input files separately, convert the data type and then write it to a file. In the code below I managed to make it work for a huge .mat file using the matfile function, but I need .dat.

I then tried to do a similar thing using fwrite, where I tried to append each new .ncs readout. All the data gets written to file, but in an enormous 1-D list of numbers, and not in a 64-row array.

I tried to transpose my input in the hope that I would obtain separated rows that way, but that makes no difference.

How can I control how fwrite appends data to my file? Is it at all possible to make a 2D output like this?

Thanks,

Susan

% make .mat file
m = matfile([OutFolder,'\',RatID,'_',RecDate,'.mat'],'Writable',true);
% read .ncs files for all traces and write to .mat and .dat file
for ch = 1:64
    
    %load data
    InFile = [InFolder,'\','CSC',num2str(ch),'.ncs'];
    [~,~,samples] = readEegDataForKilosort(InFile); % gives 1D array of type double 
    int_samples = int16(samples); clear samples;
    
    %write to .mat file (not actually useful)
    m.WholeRec(1:length(int_samples),ch)=int_samples;
    
    %write to .dat file
    % 'w' creates the file for the 1st channel, 'a' appends for all next channels
    if ch == 1
        fileID = fopen([OutFolder,'\',RatID,'_',RecDate,'.dat'],'w');
    else
        fileID = fopen([OutFolder,'\',RatID,'_',RecDate,'.dat'],'a');
    end
    % the precision must match the data: 'uint16' cannot represent negative int16 samples
    % (transposing a vector does not change the order in which its values are written)
    fwrite(fileID,int_samples','int16');
    fclose(fileID);
    clear int_samples
end

4 Comments

Susan Leemburg on 30 Jan 2021

Edited: dpb on 30 Jan 2021

I'm not sure I understand completely, so I apologize if I'm being a bit obtuse (I also accidentally deleted my earlier reply).

Are you saying that instead of trying to write one file, and then another etc, I should rather read e.g. the first 30min of all my channels (or whatever fits in memory), write that to my .dat file, followed by the next section and so on?

My goal is to make a file that I can use as input for Kilosort2 https://github.com/MouseLand/Kilosort. It should be a .dat file in int16, with all data from all channels (Nchannels x Nsamples). Kilosort reads this data in sections, but requires a single input file.

My data is brain activity recorded with a Neuralynx system on 64 channels simultaneously, which is saved as one .ncs file per channel. NCS is a proprietary file format and I use the import function provided by Neuralynx to read it into MATLAB (https://neuralynx.com/software/category/matlab-netcom-utilities). This gives me my data for each channel as doubles. I don't load all the channels at once, because my PC can't deal with >30GB in memory (the MATLAB doubles seem to be much larger than the original ncs).

I then reshape the data a bit so that it is simply a list of subsequent datapoints (the original import is in 512-sample long columns). This is what my readEegDataForKilosort function does. After this, I have a long 1-D array with doubles. However, if I save all my data as doubles, the files become very large, so I converted to int16 before saving using matfile. The resulting .mat file doesn't turn out to be very useful, so I don't think I will be doing that going forward.

dpb on 30 Jan 2021

Edited: dpb on 30 Jan 2021

I'm trying to figure out for sure what the input file you need actually looks like, specifically.

I didn't find a description of that file format at the link; it's probably there, but isn't clear where that is.

Let's talk about something small in size instead. If you had four channels and three observations, there would be twelve values. Are these to be arranged as a sequence of three (3) 4-vectors, sequentially in time as

Ch1O1 Ch2O1 Ch3O1 Ch4O1
Ch1O2 Ch2O2 Ch3O2 Ch4O2
Ch1O3 Ch2O3 Ch3O3 Ch4O3

? or as

Ch1O1 Ch1O2 Ch1O3
Ch2O1 Ch2O2 Ch2O3
Ch3O1 Ch3O2 Ch3O3
Ch4O1 Ch4O2 Ch4O3

?

In both cases I've introduced phantom records that would not be in a stream file simply to aid in readability.

The first writes each timestep for all channels; the second writes all timesteps (observations "O") for each channel sequentially.

Or, does the input processor have, by any chance, the ability to tell it which order the data are in?

The second above is what you have written; a stream file will be just a sequence of bytes; to write in the order by timestep/observation you will have to have those data in memory for all channels for each timestep as it is written.
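
To make the two orderings concrete, here is a minimal sketch using the four-channel, three-observation toy case from above. The file names and the value encoding (10*channel + observation, so 23 stands for Ch2O3) are made up for illustration. It relies on fwrite walking a matrix column by column.

```matlab
% Toy data: 4 channels x 3 observations, one row per channel.
% Value 10*channel + observation, so 23 means Ch2O3 (illustrative encoding).
nCh  = 4;
nObs = 3;
data = int16(10*(1:nCh)' + (1:nObs));   % implicit expansion, R2016b or later

% Hypothetical scratch files for the demonstration.
f1 = fullfile(tempdir, 'by_timestep.dat');
f2 = fullfile(tempdir, 'by_channel.dat');

% Writing the nCh-by-nObs matrix as-is gives the first ordering:
% all channels for observation 1, then all channels for observation 2, ...
fid = fopen(f1, 'w'); fwrite(fid, data, 'int16'); fclose(fid);

% Writing the transpose gives the second ordering:
% all observations for channel 1, then all observations for channel 2, ...
fid = fopen(f2, 'w'); fwrite(fid, data.', 'int16'); fclose(fid);

% Read both files back as flat streams to see the order on disk.
fid = fopen(f1, 'r'); s1 = fread(fid, Inf, 'int16').'; fclose(fid);
fid = fopen(f2, 'r'); s2 = fread(fid, Inf, 'int16').'; fclose(fid);
disp(s1)   % 11 21 31 41 12 22 32 42 13 23 33 43
disp(s2)   % 11 12 13 21 22 23 31 32 33 41 42 43
```

Appending one whole channel after another, as in the loop in the question, produces the second stream.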

Susan Leemburg on 1 Feb 2021

I've checked with the people from Kilosort, and they told me that I need the data to be intermingled: first sample 1 for all channels, then sample 2... like in the first example.

I also enter my sampling rate and the number of channels in Kilosort, and I'm pretty sure that the individual traces are reconstructed based on those.

I should be able to get the correct kind of output by reading a portion of each channel, building that into a matrix (one channel per column), writing the matrix to my .dat file and then repeating and appending until I've written all my data, right?
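
A minimal sketch of that chunked approach, under some assumptions: readChunk is a hypothetical stand-in for whatever reader can return a range of samples from one channel (it is not a Neuralynx function), and the sizes and file name are toy values. One detail matters here: fwrite goes through a matrix column by column, so the block below holds one channel per row, which puts each time step (all channels) together in the file. With one channel per column, the block would need to be transposed before writing.

```matlab
% Sketch: write channel-interleaved int16 data to a .dat file chunk by chunk.
nCh      = 4;    % 64 in the real recording
nSamples = 10;   % samples per channel (toy value)
chunkLen = 3;    % samples per channel per chunk; choose what fits in memory

% Hypothetical reader: returns samples first:last of channel ch as a row
% vector of doubles. Replace with your own .ncs import code.
readChunk = @(ch, first, last) 100*ch + (first:last);

outFile = fullfile(tempdir, 'interleaved_demo.dat');   % illustrative path
fid = fopen(outFile, 'w');                             % open once, write many times
assert(fid ~= -1, 'Could not open %s for writing', outFile);

for first = 1:chunkLen:nSamples
    last  = min(first + chunkLen - 1, nSamples);       % last chunk may be shorter
    block = zeros(nCh, last - first + 1, 'int16');     % one row per channel
    for ch = 1:nCh
        block(ch, :) = int16(readChunk(ch, first, last));
    end
    % Column-major write: each column (one time step, all channels)
    % is stored contiguously, which is the interleaved layout.
    fwrite(fid, block, 'int16');
end
fclose(fid);
```

Keeping the file open across the loop means every fwrite simply continues where the previous one stopped, so no separate append mode is needed.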

I also think that a lot of my confusion comes from initially misunderstanding how these particular files work. I thought that, just like for e.g. .mat files and text files, the structure I write into the files with fwrite will just come out in the same shape when I read that file back. But that is clearly not totally the case. Not without some extra instructions anyway.
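
For illustration, the "extra instructions" amount to telling fread the shape and precision, since the file itself stores only the values. The sketch below uses toy sizes and a made-up file name. It also shows how a reader that knows the channel count can jump straight to a given sample, which is presumably similar to how a program can read such a file in sections.

```matlab
% Write a small interleaved file to read back (toy sizes, illustrative path).
nCh      = 4;
nSamples = 6;
data     = int16(10*(1:nCh)' + (1:nSamples));   % one row per channel
datFile  = fullfile(tempdir, 'shape_demo.dat');
fid = fopen(datFile, 'w'); fwrite(fid, data, 'int16'); fclose(fid);

fid = fopen(datFile, 'r');

% The size argument [nCh Inf] restores the 2D shape: nCh rows, as many
% columns as the file holds. 'int16=>int16' keeps the values as int16.
whole = fread(fid, [nCh Inf], 'int16=>int16');

% Read only samples 3 and 4: skip (firstSample-1) time steps,
% each of which is nCh values of 2 bytes.
firstSample = 3;
nWanted     = 2;
fseek(fid, (firstSample - 1) * nCh * 2, 'bof');
part = fread(fid, [nCh nWanted], 'int16=>int16');

fclose(fid);
```

Here whole equals data, and part equals data(:, 3:4).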
