How can I append to a Parquet file?
Hello,
I have a Parquet file that I wish to append data to. I looked at the documentation for parquetwrite, but it doesn't provide any information on appending. It looks like this was possible in the old interface by setting the 'AppendData' option to true.
Accepted Answer
Kevin Gurney
on 10 Sep 2020
The version of parquetwrite introduced in R2019a does not currently support appending to preexisting Parquet files on disk.
The "AppendData" name-value pair that you referenced in the Parquet Support Package does not append to a preexisting file, but rather incrementally writes chunks of data to an open Parquet file output stream.
The Support Package uses a "stateful Writer object" in conjunction with multiple write() calls to achieve this. The Parquet file output stream is closed when a call to finish() is made.
There is currently no equivalent ParquetWriter object shipping in MATLAB.
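For small files, one way to get the effect of an append with the shipping functions is to read the existing file, concatenate the new rows, and write the whole table back. The following is only a minimal sketch of that idea, not a true append: it rewrites the entire file and needs all of the data to fit in memory. The file name, variable names, and sample values are hypothetical.

```matlab
% Hypothetical starting point: a small Parquet file in a scratch location.
fileName = [tempname '.parquet'];
initial = table((1:3)', [10; 20; 30], 'VariableNames', {'Id', 'Value'});
parquetwrite(fileName, initial);

% New rows to "append" (must have the same variables as the existing file).
newRows = table((4:5)', [40; 50], 'VariableNames', {'Id', 'Value'});

% Read everything back, concatenate in memory, and overwrite the file.
existing = parquetread(fileName);
combined = [existing; newRows];
parquetwrite(fileName, combined);
```

Because every call rewrites the full file, the cost grows with the total file size rather than with the size of the new chunk.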
----------
An alternative to appending chunks of data to a preexisting Parquet file would be to write out new Parquet files and then "emulate" the behavior of having one contiguous Parquet file using parquetDatastore.
If you write multiple Parquet files to disk in sequence (one for each chunk), which have consecutive numeric suffixes (e.g. data_01.parquet, data_02.parquet, ..., data_0N.parquet), you can use parquetDatastore to order these files as though they were one contiguous Parquet file. With this approach, you can call readall(parquetDatastore) to read the entire sequence of Parquet file "chunks" in one function call.
An example:
% Assuming the current directory contains data_01.parquet, data_02.parquet, ..., data_0N.parquet.
>> data = readall(parquetDatastore("data*.parquet"));
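The writing side of this workflow could look roughly like the sketch below, which writes each chunk to its own zero-padded file with parquetwrite and then reads the whole set back through parquetDatastore. The output folder, the chunk contents, and the variable names are assumptions made up for illustration, and the sketch relies on the file ordering described above.

```matlab
% Hypothetical scratch folder to hold the chunk files.
outDir = tempname;
mkdir(outDir);

numChunks = 3;      % assumed number of chunks
rowsPerChunk = 4;   % assumed rows per chunk

for k = 1:numChunks
    % Made-up chunk data: consecutive Ids and their squares.
    ids = ((k - 1) * rowsPerChunk + (1:rowsPerChunk))';
    chunk = table(ids, ids.^2, 'VariableNames', {'Id', 'Value'});

    % Zero-padded suffix so alphabetical order matches numeric order.
    fileName = fullfile(outDir, sprintf('data_%02d.parquet', k));
    parquetwrite(fileName, chunk);
end

% Treat the chunk files as one logical data set and read it in one call.
pds = parquetDatastore(fullfile(outDir, 'data_*.parquet'));
data = readall(pds);
```

Zero-padding the suffix matters once there are ten or more chunks; without it, data_10.parquet would sort before data_2.parquet.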