Extracting survey data from individual files and combining

1 view (last 30 days)
I have folder containing thousands of individuals files. In each file is text (representing a survey response) of the following format:
2022_1_29_14_29_51_913,0, 0
2022_1_29_14_29_56_761,1, 0
2022_1_29_14_30_4_268,2, 3
2022_1_29_14_30_8_6,4, 1
2022_1_29_14_30_11_87,5, 1
2022_1_29_14_30_16_668,6, 0
2022_1_29_14_30_19_470,7, 1
2022_1_29_14_30_23_586,8, 3
2022_1_29_14_30_32_501,16, 4
Where the 2nd to last number indicates the question number on a survey and the last number is the answer value for that question.
I need to combine all of these files into a single table, where each row represents one file (survey) and each answer (last number) is placed in a column that corresponds to the question number (2nd to last number).
I have no problem reading in the files. I do not know how to access just the last 2 values from each line, and use the 2nd to last number as the location of the last number in a table (with NaN values where there are skipped questions).
Any ideas would be very useful. Thank you.
  2 Comments
Erik J
Erik J on 18 Oct 2022
There are 16 possible questions, but not all files have all 16 answers because of skip logic in the survey. So, I need to use the 2nd to last number to place the last number in that column, and NaN elsewhere.
I have attached a couple files.

Sign in to comment.

Accepted Answer

Karim
Karim on 18 Oct 2022
Edited: Karim on 19 Oct 2022
I used the text files from your comment to update my answer, to (hopfully) better fit it to what you are looking for.
I was not sure what to do with question no "0", hence i added a column for that one as well. I added comments in the code to explain the idea.
Hope it helps :)
% list the files
MyFiles = ["https://www.mathworks.com/matlabcentral/answers/uploaded_files/1161118/003_003_0_2022_1_29_14_29_40_741_SA.txt";
"https://www.mathworks.com/matlabcentral/answers/uploaded_files/1161123/003_003_0_2022_2_6_12_31_6_722_SA.txt"];
MyFileNames = ["003_003_0_2022_1_29_14_29_40_741_SA";
"003_003_0_2022_2_6_12_31_6_722_SA"];
% allocate a table
sz = [numel(MyFiles) 18]; % 16 questions + survey name
varTypes = ["string",repmat("double",1,sz(2)-1)];
varNames = ["Survey","Q0","Q1","Q2","Q3","Q4","Q5","Q6","Q7","Q8","Q9","Q10","Q11","Q12","Q13","Q14","Q15","Q16"];
MyTable = table('Size',sz,'VariableTypes',varTypes,'VariableNames',varNames);
% set all answers tot nan
MyTable{:, 2:end} = NaN;
% have a look at the empty table
MyTable
MyTable = 2×18 table
Survey Q0 Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16 _________ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ <missing> NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN <missing> NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
for i = 1:numel(MyFiles)
% read new file
currTable = readtable(MyFiles(i),'Format','%s %f %f');
% use file name as survey name
MyTable{i, 1 } = MyFileNames(i);
% append the new data into the large table
Q_num = currTable{:,2}+2; % -> determine the column number in the table
Q_val = currTable{:,3}; % -> values for the answers
MyTable{i, Q_num } = Q_val.'; % -> note the transpose to switch from column to row vector
end
% have a look at the final table
MyTable
MyTable = 2×18 table
Survey Q0 Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16 _____________________________________ __ __ __ ___ ___ __ __ __ __ ___ ___ ___ ___ ___ ___ ___ ___ "003_003_0_2022_1_29_14_29_40_741_SA" 0 0 3 NaN 1 1 0 1 3 NaN NaN NaN NaN NaN NaN NaN 4 "003_003_0_2022_2_6_12_31_6_722_SA" 0 0 1 1 NaN 1 1 1 3 2 0 4 0 4 2 4 4

More Answers (0)

Tags

Products


Release

R2022b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!