MATLAB Answers

How to read and plot a CSV file and delete infinity values from it

Khalil on 20 Feb 2020
Edited: per isakson on 1 Mar 2020
Hi
I want to read a CSV file and then plot it using some formula.
I used csvread, but it doesn't work because the file contains text (column titles) and infinity values, which MATLAB reads as ∞.
I tried readtable, which can read the file (c = readtable('sensor.csv');), but again I can't delete the infinity values. When I try c(isnan(c)) = 0; or c(~any(isinf(c))) = 0; I always get the error
"Undefined function 'isinf' or 'isnan' for input arguments of type 'table'"
Can anyone help me read a large CSV file (xlsread doesn't work; the file is larger than 1 GB) and delete the infinity values, which MATLAB reads as ∞?
A small section of the large file is attached: sensor.csv
Using MATLAB R2017b
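For context, the error arises because isnan and isinf accept numeric arrays, not tables. A minimal sketch, assuming every column of the table is numeric (the small table below is a hypothetical stand-in for readtable('sensor.csv')), is to pull the contents out with table2array, clean them, and optionally write them back:

```matlab
% Hypothetical stand-in for c = readtable('sensor.csv'); all columns numeric
c = table([0; 4e-06; 8e-06], [Inf; 1.9911; NaN], [1.8947; 1.9105; 1.8947], ...
    'VariableNames', {'Time', 'SensorA', 'SensorB'});

A = table2array(c);            % numeric matrix: isnan/isinf work on this
A(isinf(A) | isnan(A)) = 0;    % replace Inf and NaN entries by zero
c{:,:} = A;                    % write the cleaned values back into the table
disp(c)
```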

Accepted Answer

per isakson on 1 Mar 2020
Edited: per isakson on 1 Mar 2020
MATLAB provides many ways to read your CSV files. Here are two variants based on textscan (Read formatted data from text file or string).
sensor.csv contains "???", which I assume is your "infinity value". AFAIK, there is no way to make textscan() or any other reading function directly convert "???" to the numerical value, Inf.
In script R1, textscan() reads "???" as NaN, and a second step replaces the NaNs with Inf (or with zero, which is the active line in the code). That is fine as long as the file has no genuinely missing values, since those would be replaced as well.
In script R2, the entire file is read into a character array, chr. Next, every '???' is replaced by 'Inf' (or '0'), and finally textscan() parses chr. R2 needs more memory, because the whole file is held in chr at once; MATLAB stores each character in 2 bytes, so a 1 GB file needs about 2 GB for chr.
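%% R1: textscan reads ??? as NaN, then the NaNs are replaced
fid = fopen( 'sensor.csv', 'r' );
assert( fid >= 0, 'Cannot open sensor.csv' )
cac = textscan( fid, '%f%f%f%f%f' ...           % five numeric columns
, 'HeaderLines',3, 'CollectOutput',true ...    % skip 3 header lines; return one matrix
, 'Delimiter',',', 'TreatAsEmpty','???' );     % ??? is read as empty, i.e. NaN
[~] = fclose( fid );
num = cac{1};
% num(isnan(num)) = inf; % why not just keep the NaNs
num(isnan(num)) = 0; % or replace NaNs by zero
display( num )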
%% R2: read the whole file as text and replace ??? before parsing
chr = fileread( 'sensor.csv' );
% chr = strrep( chr, '???', 'inf' );
chr = strrep( chr, '???', '0' ); % or replace '???' by '0'
cac = textscan( chr, '%f%f%f%f%f' ...
, 'HeaderLines',3, 'CollectOutput',true ...
, 'Delimiter',',' );
num = cac{1};
display( num )
Output of both scripts (produced with the Inf replacements active):
num =
0 Inf 1.8947 0.19107 -0.015871
4e-06 1.9911 1.9105 0.19107 0
8e-06 1.9911 1.8947 0.19107 0
1.2e-05 1.9751 1.9105 0.19107 0
1.6e-05 1.9911 1.9421 0.17514 0
2e-05 1.9751 1.9262 0.17514 0
2.4e-05 1.9911 1.8947 0.17514 0
2.8e-05 1.9911 1.8947 0.17514 0
3.2e-05 Inf 1.9105 0.19107 0.015871
3.6e-05 1.9911 1.9262 0.19107 0
4e-05 1.9911 1.9105 0.19107 0
4.4e-05 1.9911 1.9262 0.19107 0
4.8e-05 1.9911 1.8947 0.19107 0
That's before I realised that you don't want Inf in your matrix.
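If even the numeric result of a file larger than 1 GB is more than you want to hold at once, R1 can also be run block by block by passing textscan a repeat count, so each block is cleaned (and possibly reduced) before the next one is read. A minimal sketch, assuming the same layout as in R1 (three header lines, five numeric columns, ??? for the bad values); the test file it writes, its first two header lines, and blockSize are hypothetical:

```matlab
% Hypothetical test file standing in for sensor.csv: 3 header lines, then
% rows of five comma-separated values, with ??? marking the bad values
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'header line 1\nheader line 2\nTime,Sensor A,Sensor B,sensor C,Sensor D\n');
fprintf(fid, '0,???,1.8947,0.19107,-0.015871\n');
fprintf(fid, '4e-06,1.9911,1.9105,0.19107,0\n');
fprintf(fid, '8e-06,1.9911,1.8947,0.19107,0\n');
fprintf(fid, '3.2e-05,???,1.9105,0.19107,0.015871\n');
fclose(fid);

fmt = '%f%f%f%f%f';      % five numeric columns
blockSize = 2;           % rows per textscan call (hypothetical; e.g. 1e6 for a large file)
num = zeros(0, 5);       % cleaned result

fid = fopen(fname, 'r');
assert(fid >= 0, 'Cannot open the test file');
% First block: skip the 3 header lines; ??? is read as empty, i.e. NaN
cac = textscan(fid, fmt, blockSize, 'HeaderLines', 3, 'CollectOutput', true, ...
    'Delimiter', ',', 'TreatAsEmpty', '???');
while ~isempty(cac{1})
    block = cac{1};
    block(isnan(block)) = 0;   % replace the former ??? by zero
    % Keeping every block is fine for a demo; for a >1 GB file you might
    % reduce each block here instead (e.g. keep every k-th row for plotting)
    num = [num; block]; %#ok<AGROW>
    % Next block: textscan continues from the current file position
    cac = textscan(fid, fmt, blockSize, 'CollectOutput', true, ...
        'Delimiter', ',', 'TreatAsEmpty', '???');
end
fclose(fid);
delete(fname);
disp(num)
```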


More Answers (3)

Bhaskar R on 20 Feb 2020
opt = detectImportOptions('sensor.csv', 'NumHeaderLines', 2);
opt.MissingRule = 'fill';   % MissingRule is a property of the options object, not a name-value argument
T = readtable('sensor.csv', opt);
header = {'Time', 'Sensor_A', 'Sensor_B', 'sensor_C', 'Sensor_D'};
T.Properties.VariableNames = header;

Khalil on 20 Feb 2020
The first line gives me this error message:
"Error using detectImportOptions
'MissingRule' is not a recognized parameter. For a list of valid name-value pair arguments, see the documentation for
detectImportOptions."

the cyclist on 20 Feb 2020
Edited: the cyclist on 20 Feb 2020
I find that sometimes with these finicky imports it can be helpful to use the Import Data Tool.
I used that as a basis to make the following import script:
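% Body of an import function generated with the Import Data Tool; it expects the
% inputs filename and (optionally) dataLines, so it must run as a function, not a script
% If dataLines is not specified, define defaults
if nargin < 2
    dataLines = [4, Inf];
end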
% Set up the Import Options and import the data
opts = delimitedTextImportOptions("NumVariables", 5);
% Specify range and delimiter
opts.DataLines = dataLines;
opts.Delimiter = ",";
% Specify column names and types
opts.PreserveVariableNames = true;
opts.VariableNames = ["Time", "Sensor A", "Sensor B", "sensor C", "Sensor D"];
opts.VariableTypes = ["double", "double", "double", "double", "double"];
% Specify file level properties
opts.ExtraColumnsRule = "ignore";
opts.EmptyLineRule = "read";
% Import the data
sensorDataTable = readtable(filename, opts);
% Convert to numeric
sensorDataArray = table2array(sensorDataTable);

the cyclist on 20 Feb 2020
That script will put NaN where the input file has ???.
You could then do
sensorDataArray(isnan(sensorDataArray)) = Inf;
to convert those entries to Inf.
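Whichever reader produces the numeric array, plotting each sensor column against time might look like the sketch below. The matrix is a hypothetical stand-in built from the first rows of the output in the accepted answer (with NaN where that output shows Inf), the legend names are the column names used in the import script, and since the "some formula" from the question is not known, the raw values are plotted:

```matlab
% Hypothetical stand-in for sensorDataArray (first rows of the accepted answer's output)
sensorDataArray = [0       NaN    1.8947 0.19107 -0.015871
                   4e-06   1.9911 1.9105 0.19107  0
                   8e-06   1.9911 1.8947 0.19107  0
                   1.2e-05 1.9751 1.9105 0.19107  0];
sensorDataArray(isnan(sensorDataArray)) = 0;   % or Inf, as described above

t = sensorDataArray(:, 1);                     % column 1: Time
figure;
plot(t, sensorDataArray(:, 2:end));            % one line per sensor column
xlabel('Time');
legend('Sensor A', 'Sensor B', 'sensor C', 'Sensor D');
```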
Walter Roberson on 25 Feb 2020
Putting an explicit function header on it:
function sensorDataArray = ReadSensorTable(filename, dataLines)
% If dataLines is not specified, define defaults
if nargin < 2
dataLines = [4, Inf];
end
% Set up the Import Options and import the data
opts = delimitedTextImportOptions("NumVariables", 5);
% Specify range and delimiter
opts.DataLines = dataLines;
opts.Delimiter = ",";
% Specify column names and types
opts.PreserveVariableNames = true;
opts.VariableNames = ["Time", "Sensor A", "Sensor B", "sensor C", "Sensor D"];
opts.VariableTypes = ["double", "double", "double", "double", "double"];
% Specify file level properties
opts.ExtraColumnsRule = "ignore";
opts.EmptyLineRule = "read";
% Import the data
sensorDataTable = readtable(filename, opts);
% Convert to numeric
sensorDataArray = table2array(sensorDataTable);
end

Walter Roberson on 21 Feb 2020
rmmissing (https://www.mathworks.com/help/matlab/ref/rmmissing.html) can be used since R2016b. That is, use readtable() and let each ??? be replaced by NaN; rmmissing then removes every row that contains a NaN, which seems to be what you are asking for.
There is also fillmissing(), which replaces each NaN with a value computed by a method you choose, such as a constant, the previous or next value, or linear interpolation.
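A minimal sketch of both functions on a small hypothetical table, standing in for readtable('sensor.csv') after each ??? was read as NaN; 'next' and 'constant' are just two of the fill methods fillmissing offers:

```matlab
% Hypothetical stand-in for T = readtable('sensor.csv') after each ??? became NaN
T = table([0; 4e-06; 8e-06; 1.2e-05], [NaN; 1.9911; 1.9911; 1.9751], ...
    'VariableNames', {'Time', 'SensorA'});

Tdrop = rmmissing(T);                   % remove every row that contains a NaN
Tnext = fillmissing(T, 'next');         % fill each NaN with the next non-missing value
Tzero = fillmissing(T, 'constant', 0);  % or replace each NaN by zero
disp(Tdrop)
```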

Walter Roberson on 25 Feb 2020
The import-options technique shown by the cyclist should translate each ??? to NaN. You would call rmmissing after that.
Khalil on 25 Feb 2020
It gives this error: "You can only call nargin/nargout from within a MATLAB function."
