custom byte swapping of binary file
As far as I know fread won't solve this problem (easily) because it applies a single remapping to the entire file. I have a binary file with fields of different byte lengths and types (2 & 4 byte integers as well as floats) that needs to be byte swapped.
My only idea right now is to use fread and set the file position for each entry (breaking wherever the byte-swapping type changes). While this could work, it's not very desirable for the project because of the length of each data file and the number of file identifiers that would be floating around; they would also need to be created dynamically, depending on the amount of data written into a given multiplexed file.
The project is trying to convert data to SEGY from a custom multiplexing structure written by a postdoc who has probably retired by now, and no one can find the description of the multiplexing structure. I'm most familiar with MATLAB (which is why I'm trying to do it here) but would like to port the final script to C. The alternative is Unix scripting via some package. I haven't found a nice package for this, so if you know of something down this route I'm all ears.
Running 2011b on OS X 10.6.
Thanks, Peter
1 Comment
Geoff
on 21 May 2012
Yep, I would do this in C. But you could do it in MATLAB if you wanted: just read the entire file as binary and massage accordingly. When you say "byte swapped", I'm assuming you mean changing the "endian-ness".
The best way to infer the structure of a binary file format is by examining it with a hex editor. There ought to be a nice free hex editor for Mac about one Google away.
If you have real example data to match with your stored data, this makes the job a lot easier.
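If you'd rather stay inside MATLAB than hunt for a hex editor, a rough hex view is easy to fake. This is only a sketch: the byte values are made up, and the known value 1.5 stands in for whatever real sample value you can match against the file.

```matlab
% Stand-in for the first bytes of a file. With a real file you could use:
%   fid = fopen(name, 'r'); bytes = fread(fid, 16, '*uint8')'; fclose(fid);
bytes = uint8([0 0 1 2 63 192 0 0]);

% Two hex digits per byte, like one row of a hex editor.
hexrow = strtrim(sprintf('%02X ', bytes));   % '00 00 01 02 3F C0 00 00'

% Byte pattern of a known value (1.5 as a 32-bit float) in the machine's
% native order and in the reversed order, to compare against the dump.
native   = typecast(single(1.5), 'uint8');
reversed = fliplr(native);
```

Whichever of native or reversed shows up in the dump tells you the byte order the file was written in.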
Answers (2)
Geoff
on 21 May 2012
Hey Peter, do you actually know the format of your data? I was under the impression that you didn't know what the binary structure was. Look, if that's the case then this should be simpler than you think.
Don't do multiple file operations.. Just do one. It's called a slurp.
FID = fopen( 'mydata.dat', 'rb' );
data = fread(FID, Inf, '*uint8');   % whole file as a uint8 column, one element per byte
fclose(FID);
Now, if you have a particular structure that repeats in the file, make a mapping of how the bytes in this structure should be modified:
bytemap = [1,3,2,7,6,5,4];
Then reshape your data bytes to line up with this structure (assuming the file size is a multiple of the structure size); each column will represent one instance of the struct:
data = reshape(data, numel(bytemap), []);
Now, remap the rows based on bytemap:
data = data(bytemap,:);
And, if you like, reshape it back to a vector... or whatever you wanna do:
data = data(:); % <-- optional, really...
FID = fopen( 'remapped.dat', 'wb' );
fwrite( FID, data );
fclose(FID);
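For the mixed 2- and 4-byte fields described in the question, the byte map can be generated from the field widths instead of typed by hand, and the remapped bytes can then be decoded with typecast. This is a minimal sketch, assuming a hypothetical 10-byte record of one int16, one int32 and one single; the layout, the variable names and the in-memory sample data are made up for illustration:

```matlab
% Hypothetical record layout: int16, int32, single (2 + 4 + 4 = 10 bytes).
widths  = [2 4 4];                       % field widths in bytes
offsets = cumsum([0 widths(1:end-1)]);   % zero-based start of each field

% Build the byte map by reversing the byte order inside each field.
bytemap = zeros(1, sum(widths));
for k = 1:numel(widths)
    idx = offsets(k) + (1:widths(k));
    bytemap(idx) = fliplr(idx);
end
% bytemap is now [2 1 6 5 4 3 10 9 8 7]

% Two sample records with every field stored in the non-native byte order.
rec = @(a, b, c) [fliplr(typecast(int16(a),  'uint8')), ...
                  fliplr(typecast(int32(b),  'uint8')), ...
                  fliplr(typecast(single(c), 'uint8'))];
data = [rec(1, 100000, 1.5), rec(-2, -7, -0.25)]';   % uint8 column, as from fread

% Same reshape-and-remap steps as above: one record per column.
data = reshape(data, numel(bytemap), []);
data = data(bytemap, :);

% Decode each field from its rows; typecast reinterprets the raw bytes.
f1 = typecast(reshape(data(1:2,  :), [], 1), 'int16');
f2 = typecast(reshape(data(3:6,  :), [], 1), 'int32');
f3 = typecast(reshape(data(7:10, :), [], 1), 'single');
```

Decoding with typecast only gives sensible values once the bytes are in the machine's native order, which is what the remapping step is for.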
3 Comments
Geoff
on 21 May 2012
Cool. Well, I figured your problem would not be as simple as I described, but exploiting MATLAB's matrix functions is a reasonable foundation. Like you say, you just need to throw it at the right sections of the data, and MATLAB is helpful that way too. Find your channel start/end ranges and just index those bytes directly out. Searching for recognisable patterns is perfectly valid, and you can use regexp() or strfind() for that. If you get false hits it'll be pretty obvious.
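Something like this is what I mean by searching for a pattern. It's only a sketch: the marker bytes and the sample stream are hypothetical, and the bytes are viewed as chars so that strfind can be used on them.

```matlab
% Stand-in byte stream; in practice this is the slurped file as uint8.
data   = uint8([9 9 255 254 1 2 3 255 254 4 5 6 7]);
marker = uint8([255 254]);   % hypothetical start-of-channel marker

% strfind works on character vectors, so view the bytes as chars.
starts = strfind(char(data), char(marker));   % start index of each marker

% Each section runs up to the next marker (or to the end of the data).
stops = [starts(2:end) - 1, numel(data)];

% Bytes of the first section, with the marker itself skipped.
first = data(starts(1) + numel(marker) : stops(1));
```

A short marker can also occur by chance inside the payload, so check that the section lengths look plausible before trusting the hits.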
The other way of course is to decipher the basic structure of the rest of the file... There's only a handful of options for storing arbitrary binary data... You either store the size of a structure/chunk as you go (eg AVI files), have a hard-coded structure size (eg BMP files), or have a specific pattern that denotes the end of data and/or beginning of new data. If you get stuck on reverse engineering parts of the file format, get in touch with me. I've had a bit of experience with this kind of thing and might be able to help.
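As an illustration of the first option (size stored as you go), here is a rough sketch of walking such a file. The layout is an assumption made up for the example: each chunk is a 2-byte big-endian length followed by that many payload bytes.

```matlab
% Stand-in stream: [length hi, length lo, payload...] repeated.
data = uint8([0 3 10 11 12, 0 1 42, 0 2 7 8]);

pos    = 1;
chunks = {};   % number of chunks is unknown up front, so collect in a cell
while pos + 1 <= numel(data)
    % Assemble the 2-byte big-endian length by hand.
    len = double(data(pos)) * 256 + double(data(pos + 1));
    chunks{end + 1} = data(pos + 2 : pos + 1 + len);   % payload only
    pos = pos + 2 + len;                               % jump to next header
end
```

If the lengths come out absurdly large, the size field is probably in the other byte order, or is not a size at all.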
Jan
on 21 May 2012
I do not understand the question. Do you want to read an IEEE-LE ordered file on an IEEE-BE machine? Then you can specify the ordering in fopen. If you want to define the ordering for a specific element only, you can use:
FREAD(FID, SIZE, PRECISION, MACHINEFORMAT)
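A small sketch of how this could look with a single file identifier and a different machine format per field. The temporary file and the field layout (int16, int32, single) are assumptions for the example only; the skip argument is passed explicitly as 0 so that the machine format sits in its documented position.

```matlab
% Write a small test file with fields in different byte orders.
fname = tempname;
fid = fopen(fname, 'w');
fwrite(fid, 258,      'int16',  0, 'ieee-be');   % bytes 01 02
fwrite(fid, 16909060, 'int32',  0, 'ieee-le');   % bytes 04 03 02 01
fwrite(fid, 1.5,      'single', 0, 'ieee-be');
fclose(fid);

% Read the fields back in sequence, one fread call per field.
fid = fopen(fname, 'r');
a = fread(fid, 1, 'int16',  0, 'ieee-be');
b = fread(fid, 1, 'int32',  0, 'ieee-le');
c = fread(fid, 1, 'single', 0, 'ieee-be');
fclose(fid);
delete(fname);
```

Each call continues where the previous one stopped, so no extra file identifiers or explicit positioning are needed for consecutive fields.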
2 Comments
Jan
on 21 May 2012
1. I'd omit the 'b' in the FREAD and add it to the FOPEN.
2. I expect that it is not FREAD that limits the speed, but "data=[data, temp]". Letting a variable grow repeatedly is a bad idea. You can search this forum for "pre-allocation" to learn more about this.
3. In "fread(FID, 4:7)" you read a [4x5x6x7] array. Is this intended?
4. I still do not understand why the swapping of the bytes is required.
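To illustrate point 2, here is a minimal comparison of growing versus pre-allocating. The block count, the block length and the stand-in for the FREAD call are hypothetical:

```matlab
nBlocks  = 1000;   % hypothetical number of reads
blockLen = 4;      % hypothetical number of values per read

% Growing: the array is reallocated and copied on every pass.
grown = [];
for k = 1:nBlocks
    temp  = k * ones(1, blockLen);   % stand-in for fread(FID, ...)
    grown = [grown, temp];
end

% Pre-allocated: allocate once, then fill in place.
data = zeros(1, nBlocks * blockLen);
for k = 1:nBlocks
    temp = k * ones(1, blockLen);
    data((k - 1) * blockLen + (1:blockLen)) = temp;
end
```

Both loops produce the same vector; the second avoids the repeated copying, which matters more and more as the file gets longer.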