Create CSV files by extracting data from a starting CSV file

Hi everyone. I am not familiar with MATLAB and do not know its commands, so I need your help to create a script.
I have a large CSV file that cannot be opened with Excel. Its lines contain both numbers and letters, and each line includes a code made of letters and numbers (e.g. UPRE 14234) that identifies the element the line belongs to. In total there are 21 different UPRE codes.
I would like to split the initial CSV file into 21 CSV files, each containing only the lines for one of those UPRE codes.
How can I do this? Can anyone tell me which functions to use and how to write the script?

Accepted Answer

Walter Roberson on 30 Oct 2020
I suggest you work with lower level utilities:
  • grep or egrep if you are using Mac or Linux
  • findstr or select-string if you are using MS Windows
These will be much more efficient than the equivalent MATLAB code.
Another option is to make use of the fact that Perl is shipped with MATLAB.
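
For the grep route on Mac or Linux, the call could be driven from MATLAB roughly as follows. This is a minimal, untested-on-the-real-data sketch: the file name, output folder, and the two example codes are borrowed from elsewhere in this thread, while the `build_cmd` helper and the choice of `-F` (fixed-string matching) are assumptions made for illustration.

```matlab
% Sketch for Mac/Linux only: let grep do the filtering, one call per code.
% Assumed names: infile, outdir and the codes come from this thread;
% build_cmd is a hypothetical helper introduced for this example.
infile = 'exp_misure_prelievo.txt';
outdir = 'UPRE_files';
UPREs  = {'UPRE_S14SPLO_901', 'UPRE_S14LCRN_901'};   % extend to all 21 codes

% -F treats the pattern as a fixed string; the pattern |"CODE"| matches the
% code only when it is a complete quoted field between two delimiters.
build_cmd = @(code, in, out) sprintf('grep -F ''|"%s"|'' "%s" > "%s"', code, in, out);

if ~exist(outdir, 'dir'); mkdir(outdir); end
for K = 1 : numel(UPREs)
    outfile = fullfile(outdir, [UPREs{K} '.txt']);
    [status, result] = system(build_cmd(UPREs{K}, infile, outfile));
    % grep exits with 1 when nothing matched and with 2 or more on an error
    if status > 1
        fprintf('Problem processing %s: %s\n', UPREs{K}, result);
    end
end
```

The single quotes around the pattern keep the Unix shell from interpreting the pipe characters and the double quotes, which is what makes the quoting simpler here than on Windows.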
  18 Comments
Walter Roberson on 2 Nov 2020
I managed to upgrade my Windows to the latest release and finish all the other update housekeeping tasks (except that I didn't finish defragmenting the drive). I installed macOS Catalina and used it to download all of the Mac and Windows install files and to install R2020b on Catalina, which gives me the files I need to switch over to Windows and install R2020b there. So I have everything ready.
... but by then it was close to 4 am, so I started playing a game that I had been waiting on for the last year; it needed Catalina, and I had always been too busy answering questions to install Catalina before.
Walter Roberson on 3 Nov 2020
The below is tested.
Please note that you will need to add all strings that you want searched for into the UPREs cell array, and you must use exact matches.
The command that is created is quite sensitive to which quotes are used and how many are used. The rules for quoting strings in MS Windows are not well documented, and are quite different from Unix. The rules for handling double-quotes are particularly strange.
This code expects a text file, not a .xlsx file.
infile = 'exp_misure_prelievo.txt';   % text file to search
outdir = 'UPRE_files';                % folder for the per-code output files
% List every code to extract here; matches are exact.
UPREs = {'UPRE_S14SPLO_901', 'UPRE_S14LCRN_901'};
if ~exist(outdir, 'dir'); mkdir(outdir); end
for K = 1 : length(UPREs)
    this_UPRE = UPREs{K};
    outfile = fullfile(outdir, [this_UPRE '.txt']);
    % Build the FINDSTR command; the quoting is deliberate, do not simplify it.
    cmd = sprintf('FINDSTR "|"""%s"""|" "%s" > "%s"', this_UPRE, infile, outfile);
    [status, result] = system(cmd);
    if status ~= 0
        fprintf('Problem processing %s, output was:\n', this_UPRE);
        fprintf('%s\n', result);
    end
end
fprintf('Done\n');
I had to do a lot of work to get to the point of being able to debug this problem. I only boot Windows every few months, and it turned out to be a truly remarkable amount of work to get my Mac to share some files with Windows.
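
To see what the sprintf format in the code above hands to the Windows shell, the search-string part can be printed on its own. This small sketch only shows the resulting text; how cmd.exe and FINDSTR interpret the tripled double-quotes is not something it can confirm. Judging from the data format, the apparent intent is to match the code as a complete quoted field, that is, the literal text |"UPRE_S14SPLO_901"| in a line.

```matlab
% Print the FINDSTR search string for one code, exactly as sprintf builds it.
% 'pattern' is a name introduced for this illustration only.
this_UPRE = 'UPRE_S14SPLO_901';
pattern = sprintf('"|"""%s"""|"', this_UPRE);
disp(pattern)   % prints "|"""UPRE_S14SPLO_901"""|"
```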


More Answers (2)

Jon on 30 Oct 2020
Without knowing all the details of what is in your files, it is hard to give a specific answer. As general advice, however, I would suggest the MATLAB function readcell, which reads the entire file into a cell array. You can then work with the data in that cell array fairly easily by indexing rows and columns. You may be able to do the processing you want on this one cell array, in which case creating many smaller CSV files is unnecessary unless you need them for some other task.
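
A rough sketch of the readcell idea, under several assumptions: the whole file fits in memory, fields are separated by '|', and the code sits in the 9th column (CODICE_UP in the header posted in this thread). The small sample file is made up so that the script runs on its own (its second data row is hypothetical), and the output written by writecell may be quoted differently from the original lines.

```matlab
% Sketch only: readcell loads the whole file, so it must fit in memory.
% The sample file is illustrative; point infile at the real file instead.
workdir = tempname;  mkdir(workdir);
infile  = fullfile(workdir, 'sample.txt');
outdir  = fullfile(workdir, 'UPRE_files');  mkdir(outdir);
fid = fopen(infile, 'w');
fprintf(fid, '%s\n', ...
    'ANNO|"MESE"|"GIORNO_H"|"ID_ELEMENTO"|"TIPO_ELEMENTO"|"VERSIONE"|"DATA_VAL_SAPR"|"COD_IMPIANTO"|"CODICE_UP"|"EEA"|"EUA"|"EEI"|"EUI"|"EEC"|"EUC"', ...
    '2017|1|29-GEN-17 19:45:00|"PVP_S14SPLO_901"|"PVP"|1|19-MAG-17 10:59:40|"S14SPLO"|"UPRE_S14SPLO_901"|0|1476|0|60|0|12', ...
    '2017|1|29-GEN-17 19:45:00|"PVP_S14LCRN_901"|"PVP"|1|19-MAG-17 10:59:40|"S14LCRN"|"UPRE_S14LCRN_901"|0|0|0|0|0|0');
fclose(fid);

C = readcell(infile, 'Delimiter', '|');          % one cell per field
keycol = C(:, 9);                                % assumed CODICE_UP column
istext = cellfun(@ischar, keycol);
codes  = strings(size(keycol));
codes(istext) = strrep(string(keycol(istext)), '"', '');   % drop any quotes
wanted = unique(codes(startsWith(codes, "UPRE_")));        % codes found in the file
for K = 1 : numel(wanted)
    outfile = fullfile(outdir, wanted(K) + ".txt");
    writecell(C(codes == wanted(K), :), outfile, 'Delimiter', '|');
end
```

Because the codes are collected from the file itself, there is no need to type all 21 of them in; the header row is left out of the output files since its 9th field does not start with UPRE_.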
  1 Comment
Giuseppe D'Amico on 30 Oct 2020
The first line looks like this: ANNO|"MESE"|"GIORNO_H"|"ID_ELEMENTO"|"TIPO_ELEMENTO"|"VERSIONE"|"DATA_VAL_SAPR"|"COD_IMPIANTO"|"CODICE_UP"|"EEA"|"EUA"|"EEI"|"EUI"|"EEC"|"EUC"
From the second line on, they look like this:
2017|1|29-GEN-17 19:45:00|"PVP_S14SPLO_901"|"PVP"|1|19-MAG-17 10:59:40|"S14SPLO"|"UPRE_S14SPLO_901"|0|1476|0|60|0|12
UPRE_S14SPLO_901 is the code that interests me.
I need the 21 files so that I can rework them later.
I am attaching an image of the starting CSV file.
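
Given that line format, a pure-MATLAB alternative is to stream the file line by line and copy each line, unchanged, into the file for the UPRE code it contains. The sketch below is one possible way to do it, not a tested solution for the real file: the regular expression assumes the code always appears as a quoted field starting with UPRE_, the names `pat`, `outs`, `row1`, and `row2` are introduced for illustration, and `row2` is a made-up second data line so the script can run on its own.

```matlab
% Sample input: row1 is the line quoted in this thread, row2 is hypothetical.
row1 = '2017|1|29-GEN-17 19:45:00|"PVP_S14SPLO_901"|"PVP"|1|19-MAG-17 10:59:40|"S14SPLO"|"UPRE_S14SPLO_901"|0|1476|0|60|0|12';
row2 = '2017|1|29-GEN-17 19:45:00|"PVP_S14LCRN_901"|"PVP"|1|19-MAG-17 10:59:40|"S14LCRN"|"UPRE_S14LCRN_901"|0|0|0|0|0|0';
workdir = tempname;  mkdir(workdir);
infile  = fullfile(workdir, 'sample.txt');      % replace with the real file
outdir  = fullfile(workdir, 'UPRE_files');  mkdir(outdir);
fid = fopen(infile, 'w');  fprintf(fid, '%s\n', row1, row2);  fclose(fid);

pat  = '\|"(UPRE_[^"|]+)"\|';                   % a quoted field starting with UPRE_
outs = containers.Map('KeyType', 'char', 'ValueType', 'double');  % code -> file id
fin  = fopen(infile, 'r');
assert(fin ~= -1, 'Could not open %s', infile);
ln = fgetl(fin);
while ischar(ln)                                % fgetl returns -1 at end of file
    tok = regexp(ln, pat, 'tokens', 'once');
    if ~isempty(tok)
        code = tok{1};
        if ~isKey(outs, code)                   % first time this code is seen
            outs(code) = fopen(fullfile(outdir, [code '.txt']), 'w');
        end
        fprintf(outs(code), '%s\n', ln);        % copy the line unchanged
    end
    ln = fgetl(fin);
end
fclose(fin);
cellfun(@fclose, values(outs));                 % close every output file
```

Only one line is held in memory at a time, so the size of the input should not matter, and lines without a UPRE code (such as the header) are simply skipped. This is likely to be slower than the grep or FINDSTR route on a very large file.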



Giuseppe D'Amico on 2 Nov 2020
OK, I'll wait.
