Why does the "parquetwrite" function return an error when writing a table column that contains mixed primitive types in MATLAB R2023a?

I am trying to write a single-column table of mixed primitive types, each wrapped in a cell, to a Parquet file. Since every element of the table is still a "cell", my understanding is that this should work. My code is as follows:
filename = "mixedTypes.parquet";   % hypothetical output path; any writable location works
cellTable = table({1, [1,2,3], "hello", ["hi", "bye"]}')
parquetwrite(filename, cellTable)
Running the code above produces this error message:
Error using parquetwrite
T.Var1{3} is a string array. Based on T.Var1{1}, expected either a double array or a scalar <missing> value.
Is this behavior expected?

 Accepted Answer

The "parquetwrite" function is working as intended.
It is not possible to write mixed primitive types in one variable to a Parquet column. Parquet columns are strongly typed, so a single column cannot hold heterogeneous primitive data. Cell arrays are mapped to the Parquet LIST type, which is represented by two arrays: a data array that holds the values of all rows, and an index array that tells you how to partition the data array into rows.
For instance, if you have this cell array:
>> cellArray =
  3×1 cell array
    {[    1]}
    {[2 3 4]}
    {[  5 6]}
This is mapped to a Parquet LIST column with the following data and index arrays:
>> data = [1 2 3 4 5 6]
>> index = [0 1 4 6] % note: Arrow uses 0-based indexing
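The layout above can be reproduced with a minimal sketch that uses only base MATLAB functions: it concatenates the rows into one data vector, derives the 0-based index (offset) array from the row lengths, and then recovers each row from the two arrays. It illustrates the LIST layout described in this answer; it is an assumption-labeled illustration, not how parquetwrite is implemented internally.

```matlab
% Illustrative sketch of the LIST layout (not parquetwrite's internal code).
cellArray = {1; [2 3 4]; [5 6]};

% Data array: all row values concatenated into one contiguous vector.
% This single concatenation is only meaningful when every cell holds the same type.
data = [cellArray{:}];                    % [1 2 3 4 5 6]

% Index (offset) array: 0-based start of each row, followed by the total length.
lengths = cellfun(@numel, cellArray);     % [1; 3; 2]
index = [0, cumsum(lengths)'];            % [0 1 4 6]

% Rebuild the rows. MATLAB is 1-based, so row k is data(index(k)+1 : index(k+1)).
rebuilt = cell(numel(index) - 1, 1);
for k = 1:numel(index) - 1
    rebuilt{k} = data(index(k)+1 : index(k+1));
end

row2 = rebuilt{2};                        % [2 3 4]
```

Because each row is just a slice of `data`, the whole column shares the element type of that one vector, which is the constraint the error message enforces.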
In other words, the values of every row are stored in one contiguous array of a single type, so that array cannot contain mixed primitive types. This is why a cell array containing both doubles and strings cannot be written to a single Parquet column.
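To find which cell breaks this rule before calling parquetwrite, one option is to compare the class of every cell with the class of the first cell. The sketch below uses two hypothetical anonymous helpers (`isHomogeneous` and `firstMismatch` are names made up for illustration, not MATLAB functions). It is stricter than parquetwrite, which, judging by the error message, also accepts a scalar `<missing>` value in place of the expected type.

```matlab
% Hypothetical check: does every cell hold the same class as the first cell?
isHomogeneous = @(c) all(cellfun(@(x) strcmp(class(x), class(c{1})), c));

% Hypothetical helper: index of the first cell whose class differs from cell 1 (empty if none).
firstMismatch = @(c) find(~cellfun(@(x) strcmp(class(x), class(c{1})), c), 1);

mixed = {1; [1,2,3]; "hello"; ["hi", "bye"]};   % the column from the question
numericOnly = {1; [2 3 4]; [5 6]};              % the column from the answer

isHomogeneous(mixed)        % false: double and string values are mixed
firstMismatch(mixed)        % 3, the same cell the error message reports
isHomogeneous(numericOnly)  % true: every cell holds a double array
```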

Release

R2023a