How to extract rows of data based on cells containing specific words in MATLAB

Hi all,
I am stuck on a problem. I am trying to extract certain variables from large Excel files that classify organisms across multiple years, so that I can process them in MATLAB. I want all columns from A to L, and the rows I need run from 657828 to 1048576. I tried the filter function in Excel, but it doesn't work, so I am doing it in MATLAB. I want to filter on column J, called object_annotation_hierachy, and the taxa I am trying to select are the following:
Arthropoda_Crustacea_Maxillopoda_Copepoda_Calanoida_Calanidae
Arthropoda_Crustacea_Maxillopoda_Copepoda_Calanoida_Metridinidae
Arthropoda_Crustacea_Maxillopoda_Copepoda_Calanoida_Candaciidae
Arthropoda_Crustacea_Maxillopoda_Copepoda_Calanoida_Heterorhabdidae
Arthropoda_Crustacea_Maxillopoda_Copepoda_Calanoida_Euchaetidae
Arthropoda_Crustacea_Maxillopoda_Copepoda_Cyclopoida_Oithonidae
Arthropoda_Crustacea_Maxillopoda_Copepoda_Calanoida_Acartiidae
Arthropoda_Crustacea_Maxillopoda_Copepoda_Calanoida_Temoridae
All the other species are variations of these, but I want to include every row with 'Copepoda' in the name.
I also want to extract by year. The year is in the first column, called object_id, with names running from 'cruise2012' up to 2016.
My code so far looks like this, but it does not work:
C=csvread('cruise_2004_2016_ZooScan_dataset.csv');
%R657828, C1048576 (just used in first line of code to show location)
copepods= contains(C.object_id=="cruise2012")&(C.object_annotation_hierachy,"Copepoda");
C1=C(copepods,:);
Any help would be much appreciated!
Sophia on 30 May 2023
Okay, I have uploaded it with the contents replaced by dummy text; the real file is a lot longer. I have also split the taxonomy and converted it to numeric codes, and the codes are in the sheet in case there is an easier way to do it. Because I have converted the 1st and 10th columns to numbers, I used this code, but it still doesn't seem to be working.
opts=detectImportOptions("Zoocam.xlsx");
opts.VariableTypes(2)={'double'};
opts.VariableTypes(19)={'double'};
opts.VariableTypes(20)={'double'};
C=readtable('Zoocam.xlsx', opts);
C.index=(C.object==2017 & C.object_annotation_hierarchy<18);
C_new=C(C.index==1,{'object','object_lat', 'object_lon','object_annotation_hierarchy', 'object_area'});
writetable(C_new,'2017_data.csv');


Accepted Answer

Matt J on 27 May 2023
Edited: Matt J on 30 May 2023
copepods= contains(C.object_id,"cruise2012") & ...
contains(C.object_annotation_hierarchy,"Copepoda");
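
To show how that mask might fit into the whole task, here is a minimal sketch, not a tested solution for the actual file. It assumes the data are held in a table with text columns (csvread handles numeric data only, so readtable is the likelier way in), and that every object_id contains a substring such as "cruise2012". The table below is made up purely for illustration, and the object_id values in it are hypothetical.

```matlab
% Small made-up table standing in for the real file (values are illustrative only).
object_id = ["cruise2011_a"; "cruise2012_a"; "cruise2014_b"; "cruise2016_c"; "cruise2013_d"];
object_annotation_hierarchy = [ ...
    "Arthropoda_Crustacea_Maxillopoda_Copepoda_Calanoida_Calanidae"; ...
    "Arthropoda_Crustacea_Maxillopoda_Copepoda_Cyclopoida_Oithonidae"; ...
    "Mollusca_Gastropoda"; ...
    "Arthropoda_Crustacea_Maxillopoda_Copepoda_Calanoida_Temoridae"; ...
    "Arthropoda_Crustacea_Maxillopoda_Copepoda_Calanoida_Acartiidae"];
C = table(object_id, object_annotation_hierarchy);

% With the real data, the table would presumably come from readtable, e.g.:
% C = readtable('cruise_2004_2016_ZooScan_dataset.csv', 'TextType', 'string');

% contains accepts several patterns and returns true where any of them matches.
years    = "cruise" + (2012:2016);            % "cruise2012" ... "cruise2016"
inYears  = contains(C.object_id, years);      % rows from the wanted cruises
copepods = contains(C.object_annotation_hierarchy, "Copepoda");

% Keep the rows that satisfy both conditions, with all columns.
C1 = C(inYears & copepods, :);
```

Passing the whole list of year strings to contains avoids writing one condition per year. If the real object_id values encode the year differently, the patterns in years would need adjusting.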

Release

R2022a
