MATLAB Answers

Need help transforming data from a table

4 views (last 30 days)
Zac Lingen
Zac Lingen on 11 Apr 2019
Answered: Peter Perkins on 16 Apr 2019
I had transaction data in a .CSV where each instance of a sale is a row of data with sku, quantity, and date are columns.
I want to create a new table with the same three columns, but so that the quantity is the sum of all instances for sales of the specific sku for that specific day. For instance, right now we would see two seperate rows for two sales of sku 1234 on March 13. I want the table to have only one row for that sku while the quantity column reads "2" as it is the sum of all sales for that sku on that day.
Thank you

  0 Comments

Sign in to comment.

Accepted Answer

Peter Perkins
Peter Perkins on 16 Apr 2019
Zac, if I understand correctly what you want to do, it's way simpler than all that.
I assume you know that your data file is kind of non-standard. It's hard to tell what you intend as what, but I took a guess. I recommend you take a look at detectimportoptions to deal with it, what I've shjown here is quick and dirty but not the most flexible way of reading that file.
>> fmt = '%f%f%s%s%s%s%f%f%{dd-MMM-yy}D%{dd-MMM-yy}D%{dd-MMM-yy}D%s%s%s';
>> t = readtable('sampleDataSet.csv','Format',fmt,'Delimiter','~','ReadVariableNames',false);
>> t = t(:,[3 7 9]);
>> t.Properties.VariableNames = {'SKU' 'Quantity' 'Date'};
>> t.SKU = categorical(t.SKU)
t =
20×3 table
SKU Quantity Date
_______ ________ _________
9303W 1 18-Mar-13
2031727 1 30-Apr-13
2030117 1 30-Apr-13
2038036 1 30-Apr-13
2038036 1 30-Apr-13
2032445 1 30-Apr-13
2035549 1 30-Apr-13
2046881 2 30-Apr-13
38H5 1 26-Apr-13
52C2 1 26-Apr-13
53Y8K 1 26-Apr-13
43Z38 1 26-Apr-13
9595 1 26-Apr-13
9595 1 26-Apr-13
2031510 1 26-Apr-13
53Y8 15 26-Apr-13
53Y8K 15 26-Apr-13
0001 6 26-Apr-13
1940 1 26-Apr-13
4001 1 26-Apr-13
Once you're read the file, there are several ways to do the grouped calculation you want, including findgroups/splitapply and groupsummary. The following uses varfun. It's one line:
>> varfun(@sum,t,'GroupingVariables',{'SKU' 'Date'})
ans =
17×4 table
SKU Date GroupCount sum_Quantity
_______ _________ __________ ____________
0001 26-Apr-13 1 6
1940 26-Apr-13 1 1
2030117 30-Apr-13 1 1
2031510 26-Apr-13 1 1
2031727 30-Apr-13 1 1
2032445 30-Apr-13 1 1
2035549 30-Apr-13 1 1
2038036 30-Apr-13 2 2
2046881 30-Apr-13 1 2
38H5 26-Apr-13 1 1
4001 26-Apr-13 1 1
43Z38 26-Apr-13 1 1
52C2 26-Apr-13 1 1
53Y8 26-Apr-13 1 15
53Y8K 26-Apr-13 2 16
9303W 18-Mar-13 1 1
9595 26-Apr-13 2 2

  0 Comments

Sign in to comment.

More Answers (1)

Mark Sherstan
Mark Sherstan on 11 Apr 2019
Give this a try! I am sure there are more effcient ways to achieve your outcome in terms of coding or other alternatives (e.g. a pivot table in Excel and then importing the data into MATLAB). I tried it with a few sets of numbers but give it a try on your data to validate that it makes sense and everything is functioning properly.
clear all
clc
% Import using CSV function into table
A = importfile('Book1.csv');
% Find the unique dates
idDate = unique(A.DATE);
% Loop through the unique dates
count = 1;
for ii = 1:length(idDate)
% Look at one specific day
tempDateIdx = find(idDate(ii) == A.DATE);
B = [A.SKU(tempDateIdx) A.QTY(tempDateIdx)];
% Find the unique SKU's
SKUs = unique(B(:,1));
% Loop through skus and store the sum
for jj = 1:length(SKUs)
idxSKUs = find(SKUs(jj) == B(:,1));
out{count,1} = SKUs(jj);
out{count,2} = sum(B(idxSKUs,2));
out{count,3} = idDate(ii);
count = count + 1;
end
end
% Output cell array
out
Autogenerated function:
function Book1 = importfile(filename, startRow, endRow)
%IMPORTFILE Import numeric data from a text file as a matrix.
% BOOK1 = IMPORTFILE(FILENAME) Reads data from text file FILENAME for the
% default selection.
%
% BOOK1 = IMPORTFILE(FILENAME, STARTROW, ENDROW) Reads data from rows
% STARTROW through ENDROW of text file FILENAME.
%
% Example:
% Book1 = importfile('Book1.csv', 2, 8);
%
% See also TEXTSCAN.
% Auto-generated by MATLAB on 2019/04/11 11:20:41
%% Initialize variables.
delimiter = ',';
if nargin<=2
startRow = 2;
endRow = inf;
end
%% Read columns of data as text:
% For more information, see the TEXTSCAN documentation.
formatSpec = '%s%s%s%[^\n\r]';
%% Open the text file.
fileID = fopen(filename,'r');
%% Read columns of data according to the format.
% This call is based on the structure of the file used to generate this
% code. If an error occurs for a different file, try regenerating the code
% from the Import Tool.
dataArray = textscan(fileID, formatSpec, endRow(1)-startRow(1)+1, 'Delimiter', delimiter, 'TextType', 'string', 'HeaderLines', startRow(1)-1, 'ReturnOnError', false, 'EndOfLine', '\r\n');
for block=2:length(startRow)
frewind(fileID);
dataArrayBlock = textscan(fileID, formatSpec, endRow(block)-startRow(block)+1, 'Delimiter', delimiter, 'TextType', 'string', 'HeaderLines', startRow(block)-1, 'ReturnOnError', false, 'EndOfLine', '\r\n');
for col=1:length(dataArray)
dataArray{col} = [dataArray{col};dataArrayBlock{col}];
end
end
%% Close the text file.
fclose(fileID);
%% Convert the contents of columns containing numeric text to numbers.
% Replace non-numeric text with NaN.
raw = repmat({''},length(dataArray{1}),length(dataArray)-1);
for col=1:length(dataArray)-1
raw(1:length(dataArray{col}),col) = mat2cell(dataArray{col}, ones(length(dataArray{col}), 1));
end
numericData = NaN(size(dataArray{1},1),size(dataArray,2));
for col=[1,2]
% Converts text in the input cell array to numbers. Replaced non-numeric
% text with NaN.
rawData = dataArray{col};
for row=1:size(rawData, 1)
% Create a regular expression to detect and remove non-numeric prefixes and
% suffixes.
regexstr = '(?<prefix>.*?)(?<numbers>([-]*(\d+[\,]*)+[\.]{0,1}\d*[eEdD]{0,1}[-+]*\d*[i]{0,1})|([-]*(\d+[\,]*)*[\.]{1,1}\d+[eEdD]{0,1}[-+]*\d*[i]{0,1}))(?<suffix>.*)';
try
result = regexp(rawData(row), regexstr, 'names');
numbers = result.numbers;
% Detected commas in non-thousand locations.
invalidThousandsSeparator = false;
if numbers.contains(',')
thousandsRegExp = '^[-/+]*\d+?(\,\d{3})*\.{0,1}\d*$';
if isempty(regexp(numbers, thousandsRegExp, 'once'))
numbers = NaN;
invalidThousandsSeparator = true;
end
end
% Convert numeric text to numbers.
if ~invalidThousandsSeparator
numbers = textscan(char(strrep(numbers, ',', '')), '%f');
numericData(row, col) = numbers{1};
raw{row, col} = numbers{1};
end
catch
raw{row, col} = rawData{row};
end
end
end
% Convert the contents of columns with dates to MATLAB datetimes using the
% specified date format.
try
dates{3} = datetime(dataArray{3}, 'Format', 'yyyy-MM-dd', 'InputFormat', 'yyyy-MM-dd');
catch
try
% Handle dates surrounded by quotes
dataArray{3} = cellfun(@(x) x(2:end-1), dataArray{3}, 'UniformOutput', false);
dates{3} = datetime(dataArray{3}, 'Format', 'yyyy-MM-dd', 'InputFormat', 'yyyy-MM-dd');
catch
dates{3} = repmat(datetime([NaN NaN NaN]), size(dataArray{3}));
end
end
dates = dates(:,3);
%% Split data into numeric and string columns.
rawNumericColumns = raw(:, [1,2]);
%% Create output variable
Book1 = table;
Book1.SKU = cell2mat(rawNumericColumns(:, 1));
Book1.QTY = cell2mat(rawNumericColumns(:, 2));
Book1.DATE = dates{:, 1};
% For code requiring serial dates (datenum) instead of datetime, uncomment
% the following line(s) below to return the imported dates as datenum(s).
% Book1.DATE=datenum(Book1.DATE);

  3 Comments

Zac Lingen
Zac Lingen on 11 Apr 2019
Thanks for your answer! I went ahead and gave it a shot and got this error.
shipTable = readTable('orders.csv');
%Declaring names for column and cutting out unneeded data
Sku = shipTable(:,'Var3');
QuantityShipped = shipTable(:,'Var7');
DateOrder = shipTable (:,'Var9');
skuTable = table(Sku,QuantityShipped,DateOrder);
disp(skuTable)
% Now begins my implementation of your code:
% Finding unique dates
idDate = unique(skuTable.DateOrder);
% Looping through unique dates
count = 1;
for i = 1:length(idDate)
tempDateIdx = find(idDate(i) == skuTable.DateOrder);
% Creating new table to hold values
refactoredTable = [skuTable.Sku(tempDateIdx) skuTable.QuantityShipped(tempDateIdx)];
% Find Unique skus
uniqueSkus = unique(refactoredTable(:,1));
% Loop through skus and store sum
for j = 1:length(uniqueSkus)
idxSKUs = find(uniqueSkus(j) == refactoredTable(:,1));
out{count,1} = uniqueSkus(j);
out{count,2} = sum(refactoredTable(idxSKUs,2));
out{count,3} = idDate(i);
count = count + 1;
end
end
% Outputs cell array
out
At the beginning of the first loop I am getting the following error: Error using tabular/length (line 189). Unedfined function 'length" for input arguments of type 'table'. Use the HEIGHT, WIDTH or SIZE functions instead.
Rewrote it to call height function rather than length and am returning the following error: Subscripting a table using linear indexing (one subscript) or multidimensional indexing (three or more subscripts is not supported. Use a row subscript and a variable subscript.
Thank you again for taking the time to review this.
Mark Sherstan
Mark Sherstan on 12 Apr 2019
I think I see whats going on but would you be able to share or make up a subset of your data for me to test the code on? Less than 10 entries should be more than enough.
Zac Lingen
Zac Lingen on 12 Apr 2019
Commenting again as my last one was flagged. Use ~ as a delimiter. Only care about columns 3 (sku), 7 (Quantity Shipped), and 9 (Date)

Sign in to comment.

Sign in to answer this question.