Importing multiple datasets from one text file
Below is a text file that contains three separate data sets with differing numbers of columns and rows. Is there a way to read in each table as a separate structure so that I can convert each column to a variable for later calculation? importdata only retrieves the first table and ignores everything that follows, since the rest of the data is not in the same format. The text file came from the web in this configuration, and I don't want the user to have to manually copy each table, paste it into a new .txt file, and then ingest each file separately with importdata. Is it worth writing extra code just to keep all three tables consolidated in one file? Any suggestions are welcome.
Thanks,
ALT  DIR   SPD  SHR  TEMP  DPT   PRESS   RH ABHUM DENSITY I/R V/S  VPS  PW
GEOMFT DEG   KTS /SEC DEG C DEG C   MBS   PCT  G/M3   G/M3   N  KTS  MBS  MM
2372 259  18.1 .000  23.8   0.2  932.20  21  4.52 1090.86 270 673  6.19  0
2500 272  17.0 .054  23.4  -0.1  928.13  21  4.43 1087.43 269 672  6.06  0
3000 273  29.2 .041  20.5  -1.3  911.86  23  4.09 1079.29 265 669  5.55  1
3500 275  26.3 .010  19.3  -1.2  895.87  25  4.15 1064.64 262 667  5.60  1
4000 267  22.5 .017  17.9  -2.3  880.15  25  3.83 1051.01 257 666  5.14  2
4500 258  20.7 .013  16.4  -3.1  864.51  26  3.64 1037.75 253 664  4.86  3
5000 259  23.4 .009  15.0  -3.8  849.12  27  3.46 1024.46 249 662  4.61  3
5500 265  26.7 .014  13.6  -4.5  833.95  28  3.30 1011.15 245 661  4.36  4
6000 261  20.9 .021  13.1  -5.9  818.96  26  2.97  994.87 240 660  3.92  4
6500 258  20.2 .004  11.9  -6.0  804.23  28  2.96  981.21 237 659  3.89  5
7000 254  20.6 .005  11.8  -8.8  789.72  23  2.38  964.02 229 659  3.13  5
7500 278  21.1 .028  11.2 -14.5  775.42  15  1.52  949.19 221 658  1.99  5
8000 277  21.3 .001  10.5 -21.0  761.39   9  0.87  934.58 214 657  1.14  6
8500 271  20.9 .008   9.4 -22.2  747.54   9  0.79  921.19 210 655  1.03  6
9000 260  18.4 .015   8.4 -25.4  733.86   7  0.59  907.66 206 654  0.77  6
9500 250  15.7 .013   7.4 -27.3  720.44   6  0.50  894.29 202 653  0.65  6
10000 242  16.3 .008   6.7 -35.4  707.21   3  0.23  880.22 198 652  0.29  6
10500 238  18.0 .007   5.6 -30.8  694.24   5  0.36  867.41 195 651  0.46  6
11000 236  19.4 .005   4.5 -31.8  681.39   5  0.33  854.70 192 650  0.42  6
11500 249  19.9 .015   3.3 -36.5  668.69   3  0.21  842.65 189 648  0.26  6
12000 249  15.6 .015   2.2 -38.6  656.20   3  0.17  830.23 186 647  0.21  6
12500 244  16.2 .005   1.0 -30.9  643.91   7  0.36  817.95 185 646  0.46  6
13000 241  19.3 .011  -0.2 -26.1  631.82  12  0.57  806.04 183 644  0.72  6
13500 243  21.1 .006  -1.4 -23.9  619.87  16  0.70  794.23 181 643  0.88  6
14000 233  21.6 .013  -2.1 -26.8  608.15  13  0.54  781.38 178 642  0.68  6
14500 242  19.3 .013  -3.0 -25.9  596.57  15  0.59  768.94 175 641  0.73  6
15000 248  21.4 .010  -4.0 -27.5  585.18  14  0.51  757.22 172 640  0.63  7
16000 251  22.5 .003  -6.4 -27.4  562.98  17  0.52  734.92 167 637  0.64  7
17000 249  25.8 .006  -8.8 -29.3  541.35  17  0.44  713.04 162 634  0.54  7
18000 246  25.9 .003 -11.2 -30.7  520.46  18  0.39  691.92 157 631  0.47  7
19000 262  27.4 .013 -13.6 -34.0  500.12  16  0.28  671.20 151 628  0.34  7
20000 262  32.9 .009 -15.3 -48.3  480.41   4  0.06  649.02 145 626  0.07  7
21000 252  36.4 .012 -17.6 -43.8  461.40   8  0.10  628.91 141 623  0.12  7
22000 249  39.0 .005 -20.3 -43.9  442.92  10  0.10  610.17 137 620  0.12  7
23000 239  37.4 .012 -22.9 -43.6  424.99  13  0.11  591.56 133 617  0.13  7
24000 238  35.9 .003 -25.6 -41.0  407.63  22  0.15  573.54 129 613  0.17  7
25000 248  38.2 .011 -28.6 -39.5  390.81  34  0.17  556.55 125 610  0.19  7
26000 251  38.1 .004 -31.4 -40.5  374.47  40  0.16  539.62 121 606  0.17  7
27000 254  37.8 .003 -34.2 -46.3  358.72  28  0.08  522.92 117 603  0.09  7
28000 255  40.1 .004 -36.9 -50.3  343.38  23  0.05  506.30 113 599  0.06  7
29000 253  40.1 .002 -39.6 -46.2  328.51  49  0.09  489.95 110 596  0.09  7
30000 262  42.5 .012 -41.3 -55.7  314.23  19  0.03  472.12 105 594  0.03  7
31000 270  44.6 .011 -43.6 -56.2  300.40  23  0.03  455.87 102 591  0.03  7
32000 271  47.9 .006 -46.0 -54.5  287.00  37  0.03  440.13  98 588  0.04  7
33000 271  52.4 .008 -48.3 -56.0  274.19  40  0.03  424.80  95 585  0.03  7
34000 270  54.5 .004 -50.9 -59.0  261.77  37  0.02  410.29  92 581  0.02  7
35000 268  55.5 .004 -53.3 -61.2  249.69  37  0.01  395.66  88 578  0.02  7
36000 265  55.7 .005 -55.7 -63.8  238.17  35  0.01  381.60  85 575  0.01  7
37000 263  55.0 .003 -58.1 -66.8  227.02  31  0.01  367.75  82 572  0.01  7
38000 264  51.4 .006 -60.7 -69.2  216.27  31  0.01  354.67  79 568  0.00  7
39000 268  54.5 .008 -62.4 -70.8  205.94  31  0.00  340.47  76 566  0.00  7
40000 265  44.6 .017 -61.2 -72.9  196.04  19  0.00  322.22  72 568  0.00  7
41000 257  46.6 .011 -58.4 -74.7  186.74  10  0.00  302.88  67 571  0.00  7
42000 255  39.9 .012 -56.5 -80.7  177.95   3  0.00  286.14  64 574  0.00  7
43000 251  42.2 .006 -56.7 -83.3  169.65   2  0.00  273.03  61 573  0.00  7
44000 245  46.2 .010 -57.0 -83.5  161.69   2  0.00  260.60  58 573  0.00  7
45000 249  51.6 .011 -58.2 -84.4  154.13   2  0.00  249.84  56 571  0.00  7
TERMINATION       45718 GEOPFT  13935 GEOPM  147.9 MBS
TROPOPAUSE   38621  FEET   209.80 MB  -62.4 C  -70.7 C
MANDATORY LEVELS GEOPFT DIR KTS TEMP DPT PRESS RH
2592 273  21  22.0  -0.7  925.0  22 
4967 259  23  15.0  -3.8  850.0  27 
10262 249  16   6.1 -30.7  700.0   5 
18971 262  27 -13.7 -34.0  500.0  16 
24398 240  38 -27.0 -40.3  400.0  27 
30958 270  44 -43.7 -56.3  300.0  23 
34888 268  55 -53.3 -61.1  250.0  37 
39483 266  47 -61.8 -71.9  200.0  24 
45419 247  51 -58.8 -84.8  150.0   2 
SIGNIFICANT LEVELS
GEOMFT DIR KTS  TEMP   DPT PRESS   IR  RH
2372 259  18  23.8   0.2  932.2 270  21 
2390 272  11  25.0   0.5  931.7 269  20 
2426 272  14  23.8  -0.5  930.5 268  20 
2586 273  21  22.0  -0.7  925.3 268  22 
5105 261  24  14.6  -3.7  845.9 249  28 
5358 265  26  14.0  -3.7  838.2 247  29 
5556 265  26  13.5  -4.6  832.3 245  28 
5786 259  22  13.7  -5.4  825.3 242  26 
6718 251  17  11.1  -6.7  797.9 235  28 
6806 247  19  11.3  -6.5  795.3 234  28 
6910 248  20  11.8  -7.6  792.3 232  25 
7181 268  18  12.0 -14.6  784.5 222  14 
7446 278  20  11.1 -14.5  777.0 221  15 
7669 277  21  11.3 -18.1  770.7 217  11 
8923 262  18   8.3 -24.0  736.0 207   8 
9721 241  17   7.3 -32.1  714.6 200   4 
9885 238  17   6.9 -35.3  710.3 198   3 
12291 245  14   1.6 -38.9  649.1 184   3 
12944 241  19  -0.1 -27.0  633.2 183  11 
13692 243  23  -1.9 -22.4  615.4 181  19 
14083 233  20  -2.2 -27.7  606.2 177  12 
14166 235  20  -2.2 -27.7  604.3 176  12 
15717 253  20  -6.0 -27.7  569.2 169  16 
16233 255  22  -7.0 -26.7  557.9 166  19 
16465 254  23  -7.5 -32.8  552.8 163  11 
16660 253  23  -8.0 -28.7  548.6 164  17 
18140 246  25 -11.3 -30.8  517.6 156  18 
18477 252  26 -12.3 -31.7  510.7 154  18 
19235 263  27 -14.3 -34.5  495.4 150  16 
19437 269  25 -14.8 -35.6  491.4 149  15 
19576 267  25 -15.2 -37.3  488.7 148  13 
19691 264  27 -14.8 -42.9  486.4 147   7 
19708 264  27 -14.9 -44.4  486.1 147   6 
21380 251  37 -18.5 -44.5  454.3 139   8 
23390 236  36 -23.9 -45.9  418.2 131  11 
23802 237  35 -25.0 -42.3  411.1 129  18 
25167 250  37 -29.1 -39.6  388.1 125  35 
25230 249  37 -29.2 -39.2  387.0 124  37 
25867 250  37 -31.1 -40.2  376.6 122  40 
26472 254  38 -32.7 -45.3  367.0 119  27 
27312 255  38 -35.2 -47.8  353.8 116  26 
27632 256  38 -36.0 -49.9  348.9 115  22 
28124 255  40 -37.2 -50.6  341.5 113  23 
28568 253  40 -38.4 -50.0  334.9 111  28 
29089 253  40 -39.9 -45.9  327.2 110  52 
29119 253  40 -39.9 -45.8  326.8 109  53 
29297 253  40 -40.5 -45.8  324.2 109  56 
29754 261  42 -40.7 -49.4  317.7 107  38 
38551 262  50 -62.3 -70.6  210.5  78  31 
38621 262  50 -62.4 -70.7  209.8  77  31 
38935 267  54 -62.5 -71.0  206.6  76  30 
40463 258  45 -61.0 -73.9  191.7  70  16 
41687 256  39 -56.8 -77.8  180.7  65   5 
45858 247  53 -59.1 -85.0  147.9  54   2
Accepted Answer
dpb on 2 Apr 2015 (Edited: dpb on 3 Apr 2015)
textscan will do it fine with some effort. Basically, three separate calls with 'headerlines',2, returning the array from each call as a separate variable, would be my first try. If you're lucky, the failure of the '%f' format at the end of one block will leave the file pointer at the right location for the next call, and then again for the one after that.
Presuming the number of columns is fixed but the number of rows is variable, write a format string for each as
fmt=repmat('%f',1,N);
where N is 14, 7 and 8, respectively for the three sections. If you need to parse the numeric values from the header/trailer of the intermediate case, then that'll have to be done specifically for the format of the text, of course, instead of treating those lines also as headerlines.
ADDENDUM
OK, I pasted the text into a file. Ignoring the trailing data within the footer which can be parsed if desired, the block data can be read pretty easily...
>> N=[14,7,8];  % the number of columns for each section
>> H=[2,5,3];   % the number of header lines (note: trailer lines of the preceding section are included here)
>> fid=fopen('jeff.txt');
>> for i=1:3
     fmt=repmat('%f',1,N(i));  % build the format string for the number columns
     a(i)=textscan(fid,fmt,'headerlines',H(i),'collectoutput',1); % read section
   end
>> fid=fclose(fid);
>> a
a = 
  [57x14 double]    [9x7 double]    [53x8 double]
>>
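Since the original question asks about turning each column into a variable, here is a minimal sketch of one way to do that once a block has been read. It assumes the first block is available as a plain numeric matrix (two rows are copied from the posted data so the snippet is self-contained). The column names are taken from the file header, except that 'IR' and 'VS' are stand-ins chosen here because 'I/R' and 'V/S' are not valid MATLAB variable names.

```matlab
% Two rows copied from the first block of the posted file.
% In practice this would be a{1} from the textscan loop above.
block = [2372 259 18.1 0.000 23.8  0.2 932.20 21 4.52 1090.86 270 673 6.19 0; ...
         2500 272 17.0 0.054 23.4 -0.1 928.13 21 4.43 1087.43 269 672 6.06 0];

% Column names from the file header; 'IR' and 'VS' are renamed stand-ins
% (assumption) for 'I/R' and 'V/S'.
names = {'ALT','DIR','SPD','SHR','TEMP','DPT','PRESS','RH', ...
         'ABHUM','DENSITY','IR','VS','VPS','PW'};

% Wrap the matrix in a table so each column can be addressed by name.
T = array2table(block, 'VariableNames', names);

% Each column is now available as T.<name> for later calculations.
meanTemp = mean(T.TEMP);
```

A table keeps the columns together under one variable, which is usually easier to manage than creating fourteen separate workspace variables.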
3 Comments
kelian dascher-cousineau on 13 Apr 2017
Would there be a way to automatically identify the number of data blocks, the number of headers and columns within the text file?
dpb on 13 Apr 2017
Depends on what you mean by "automatic". No, there is no magic bullet that will return that information for any arbitrary file structure. importdata lets you look at a file manually, which may be all you need for a one-off, but a one-liner that works for any file you choose from any source is pretty much an infinite problem space.
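For files shaped like the one in the question, a narrower heuristic can get most of the way there. The sketch below is not a general solution: it assumes that a data line consists only of whitespace-separated numbers, that anything else is a header or trailer, and that a block ends when a non-numeric line appears or the column count changes. The function name splitNumericBlocks is a hypothetical helper made up for this illustration.

```matlab
function blocks = splitNumericBlocks(filename)
% splitNumericBlocks  Hypothetical helper: group consecutive all-numeric
% lines of a text file into separate matrices, one per block.
txt     = fileread(filename);
lines   = regexp(txt, '\r?\n', 'split');   % split on LF or CRLF line endings
blocks  = {};
current = [];                              % rows of the block being built
for k = 1:numel(lines)
    % A line counts as data only if sscanf converts the whole line as numbers.
    [vals, ~, errmsg] = sscanf(lines{k}, '%f');
    isData = ~isempty(vals) && isempty(errmsg);
    % A change in column count also closes the current block.
    if isData && ~isempty(current) && numel(vals) ~= size(current, 2)
        blocks{end+1} = current; %#ok<AGROW>
        current = [];
    end
    if isData
        current(end+1, :) = vals.'; %#ok<AGROW>
    elseif ~isempty(current)
        % Header, trailer, or blank line: close the block in progress.
        blocks{end+1} = current; %#ok<AGROW>
        current = [];
    end
end
if ~isempty(current)
    blocks{end+1} = current;               % flush the last block
end
end
```

Calling blocks = splitNumericBlocks('jeff.txt') would then give numel(blocks) as the number of blocks and size(blocks{i}, 2) as the column count of each. Header lines are simply skipped rather than counted, and a header made up entirely of numbers (or of tokens such as NaN or Inf) would be misread as data.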
More Answers (1)
Konstantinos Sofos on 2 Apr 2015
Hi,
I assume that dlmread will do the job you need, since all of your data (as far as I can see) are numeric.
M = dlmread(filename)
Regards
2 Comments
dpb on 4 Sep 2019
If it is just written sequentially into the file, then, sure. You'll get back one array of the full size, though, so you'll have to know a priori how many records belong in each section, or, if one column is some sort of time stamp, process it to find the beginning of the next data section (presuming it starts over from zero or similar, not just clock time; of course, the latter might show a larger gap between sections).
We really can't do more than guess without specifics of file in detail...
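As a rough illustration of the time-stamp idea, the sketch below splits one combined array wherever the first column drops back instead of increasing. The matrix M is made-up data standing in for what dlmread would return, and the assumption that column 1 restarts at each new section is exactly the guess described above, so it may not hold for a real file.

```matlab
% Made-up combined array: column 1 is a counter that restarts per section.
M = [0 1.1; 1 1.3; 2 1.2; ...
     0 2.0; 1 2.4; ...
     0 3.3; 1 3.1; 2 3.6];

% A new section starts wherever column 1 decreases from one row to the next.
starts = [1; find(diff(M(:,1)) < 0) + 1];
stops  = [starts(2:end) - 1; size(M, 1)];

% Cut the array into one matrix per section.
sections = arrayfun(@(s, e) M(s:e, :), starts, stops, 'UniformOutput', false);
```

For clock-time stamps that never reset, the same pattern could test for an unusually large gap, for example diff(M(:,1)) > someThreshold, where the threshold is a value you would have to choose for your data.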