What is a byte buf in MATLAB, and how do I pass it to read from the GZIP_INPUT_STREAM to retrieve a fixed number of bytes?
I'm suffering from the same problem. I have large files (12+ GB across 4 GZ files) that I need to stream in dynamically, in chunks whose size depends on the next header and is sometimes as large as 64 MB. I'm aware I could gunzip all the files first, but reading them in that way is VERY slow.
Using the Java objects:
FILE_INPUT_STREAM = javaObject( 'java.io.FileInputStream', DATA_PATH_CHAR );
GZIP_INPUT_STREAM = javaObject( 'java.util.zip.GZIPInputStream', FILE_INPUT_STREAM );
I can use the standard read method one byte at a time. This is insignificantly faster than reading the files from the hard drive.
DATA_BYTES_UINT8 = zeros( DATA_LENGTH, 1, 'uint8' );
DATA_BYTES_UINT8(:) = arrayfun( @(b)read(GZIP_INPUT_STREAM), DATA_BYTES_UINT8 );
The fastest method, however, requires the most memory, which isn't feasible for these files.
BYTE_ARRAY_OUTPUT_STREAM = javaObject( 'java.io.ByteArrayOutputStream' );
INTERRUPTIBLE_STREAM_COPIER = com.mathworks.mlwidgets.io.InterruptibleStreamCopier.getInterruptibleStreamCopier();
copyStream( INTERRUPTIBLE_STREAM_COPIER, GZIP_INPUT_STREAM, BYTE_ARRAY_OUTPUT_STREAM );
DATA_BYTES_INT8 = toByteArray( BYTE_ARRAY_OUTPUT_STREAM );
The Oracle documentation says that the read method of a GZIPInputStream accepts a byte array (buf), an integer offset, and an integer number of bytes to read. What is a byte buf in MATLAB, and how do I pass it to read from the GZIP_INPUT_STREAM to retrieve a fixed number of bytes? I'll document my failed attempts, in the hope that it draws more searches toward an eventual solution.
- I've tried MATLAB Java arrays of bytes.
JAVA_ARRAY = javaArray( 'java.lang.Byte', DATA_LENGTH, 1 );
BYTES_READ = read( GZIP_INPUT_STREAM, JAVA_ARRAY, 0, DATA_LENGTH );
- I've tried Java Byte Buffers.
BYTE_BUFFER = javaMethod( 'allocate', 'java.nio.ByteBuffer', DATA_LENGTH );
BYTES_READ = read( GZIP_INPUT_STREAM, BYTE_BUFFER, 0, DATA_LENGTH );
- I've tried passing a MATLAB array, knowing this would not work. Oddly enough, BYTES_READ was NOT the same as DATA_LENGTH, nor was it the current position in the GZIP file. Someone correct me, but I believe the property that describes the MATLAB array here would be "immutable". Is there a way to make a (u)int8 array not immutable?
DATA_IN = zeros(DATA_LENGTH,1,'uint8');
BYTES_READ = read( GZIP_INPUT_STREAM, DATA_IN, 0, DATA_LENGTH );
- I've tried passing in a pointer.
DATA_PTR = libpointer( 'uint8Ptr', DATA_IN );
BYTES_READ = javaMethod( 'read', GZIP_INPUT_STREAM, DATA_PTR, 0, DATA_LENGTH );
- I've tried wrapping the stream in another buffered input stream and using the interruptible stream copier. While I received ONLY the remaining data, it was not bounded to DATA_LENGTH.
BUFFERED_INPUT_STREAM = javaObject( 'java.io.BufferedInputStream', GZIP_INPUT_STREAM, DATA_LENGTH );
BYTE_ARRAY_OUTPUT_STREAM = javaObject( 'java.io.ByteArrayOutputStream', DATA_LENGTH );
copyStream( INTERRUPTIBLE_STREAM_COPIER, BUFFERED_INPUT_STREAM, BYTE_ARRAY_OUTPUT_STREAM );
DATA_IN(:) = toByteArray( BYTE_ARRAY_OUTPUT_STREAM );
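For anyone landing here from a search, here is a minimal sketch of one possible workaround, not a confirmed solution. MATLAB passes its own arrays to Java methods by value, so read( GZIP_INPUT_STREAM, DATA_IN, 0, DATA_LENGTH ) fills a temporary copy, whereas a Java object such as a ByteBuffer is passed by reference. A single read may also return fewer bytes than requested, which would explain BYTES_READ differing from DATA_LENGTH. The sketch assumes the stream can be wrapped with java.nio.channels.Channels.newChannel. The temporary test file, its 5000-byte payload, and the names RAW_PATH_CHAR, GZ_PATH_CELL and CHANNEL are made up for illustration.

```matlab
% Hypothetical test data: 5000 known bytes written to a temp file, then gzipped.
RAW_PATH_CHAR = [ tempname, '.bin' ];
FID = fopen( RAW_PATH_CHAR, 'w' );
fwrite( FID, uint8( mod( 0:4999, 256 ) ), 'uint8' );
fclose( FID );
GZ_PATH_CELL = gzip( RAW_PATH_CHAR );      % gzip returns a cell array of paths
DATA_PATH_CHAR = GZ_PATH_CELL{1};
DATA_LENGTH = 4096;                        % number of decompressed bytes wanted

FILE_INPUT_STREAM = javaObject( 'java.io.FileInputStream', DATA_PATH_CHAR );
GZIP_INPUT_STREAM = javaObject( 'java.util.zip.GZIPInputStream', FILE_INPUT_STREAM );

% Assumption: wrapping the stream in an NIO channel lets read() fill a
% ByteBuffer, which is a Java object and is therefore passed by reference.
CHANNEL = javaMethod( 'newChannel', 'java.nio.channels.Channels', GZIP_INPUT_STREAM );
BYTE_BUFFER = javaMethod( 'allocate', 'java.nio.ByteBuffer', DATA_LENGTH );

% A single read may deliver fewer bytes than requested, so loop until the
% buffer is full or the channel reports end of stream (-1).
while BYTE_BUFFER.hasRemaining()
    if CHANNEL.read( BYTE_BUFFER ) < 0
        break
    end
end
BYTES_READ = BYTE_BUFFER.position();

% array() returns the backing byte[] as int8; reinterpret the bits as uint8.
DATA_BYTES_INT8 = BYTE_BUFFER.array();
DATA_BYTES_UINT8 = typecast( DATA_BYTES_INT8(1:BYTES_READ), 'uint8' );

CHANNEL.close();                           % also closes the wrapped streams
```

To pull the next chunk from the same stream, keep the channel open, call BYTE_BUFFER.clear() (or allocate a buffer of the new size), and repeat the loop. Only one chunk is held in memory at a time, unlike the ByteArrayOutputStream approach.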
Edit 1: cleared up confusing statements
Edit 2: added links to GZIPInputStream
Edit 3: asked if passing MATLAB array was useless because it is "immutable"