Explore and Import Data from Cassandra Database Table
This example shows how to explore the structure of an Apache™ Cassandra® database and import data from a Cassandra database table into MATLAB® using a Cassandra database connection with the Apache Cassandra database C++ interface. The Cassandra database stores database tables according to the partition key. The partition key affects how the data is filtered in the database.
In this example, the Cassandra database contains the employees_by_job
database table with employee data and the job_id
partition key.
Create Cassandra Database Connection
Create a Cassandra database connection using the configured data source CassandraDataSource
and a blank user name and password. The apacheCassandra
function returns conn
as a connection
object.
datasource = "CassandraDataSource"; username = ""; password = ""; conn = apacheCassandra(datasource,username,password);
Explore Cassandra Database Structure
View the keyspaces in the Cassandra database using the Cassandra database connection. The returned string array contains the keyspaces. A keyspace contains one or more database tables and defines how the database replicates the data in the tables.
conn.Keyspaces
ans = 6×1 string
"employeedata"
"system"
"system_auth"
"system_distributed"
"system_schema"
"system_traces"
Return the names of the Cassandra database tables in the employeedata
keyspace. t
is a string array that contains the names of the database tables in the employeedata
keyspace.
keyspace = "employeedata";
t = tablenames(conn,keyspace)
t = 3×1 string
"employees_by_id"
"employees_by_job"
"employees_by_name"
Return the names of the Cassandra database columns in the employees_by_job
database table.
tablename = "employees_by_job";
cols = columninfo(conn,keyspace,tablename);
Display the first few names of the Cassandra database columns in the employees_by_job
database table.
head(cols)
ans=8×4 table
Name DataType PartitionKey ClusteringColumn
________________ ________ ____________ ________________
"job_id" "text" true ""
"hire_date" "date" false "DESC"
"employee_id" "int" false "ASC"
"commission_pct" "double" false ""
"department_id" "int" false ""
"email" "text" false ""
"first_name" "text" false ""
"last_name" "text" false ""
cols
is a table with these variables:
Name
— Cassandra database column nameDataType
— Cassandra Query Language (CQL) data type of the Cassandra database columnPartitionKey
— Partition key indicatorClusteringColumn
— Clustering column indicator
The value in the PartitionKey
variable indicates whether the database column is a partition key. The column job_id
(job identifier) is a partition key in this database table.
Import Data from Cassandra Database
Import data from the employees_by_job
database table into MATLAB. This database has data about shop clerks, so use the partition key value SH_CLERK
.
keyValue = "SH_CLERK"; results = partitionRead(conn,keyspace,tablename, ... keyValue);
Display the first few rows of the returned employee data.
head(results)
ans=8×13 table
job_id hire_date employee_id commission_pct department_id email first_name last_name manager_id office performance_ratings phone_number salary
building room
__________ ___________ ___________ ______________ _____________ __________ __________ __________ __________ ________________ ___________________ ______________ ______
"SH_CLERK" 03-Feb-2008 183 NaN 50 "GGEONI" "Girard" "Geoni" 120 "West" 121 {[ 2]} "650.507.9879" 2800
"SH_CLERK" 13-Jan-2008 199 NaN 50 "DGRANT" "Douglas" "Grant" 124 "West" 139 {3×1 int32} "650.507.9844" 2600
"SH_CLERK" 19-Dec-2007 191 NaN 50 "RPERKINS" "Randall" "Perkins" 122 "West" 255 {[ 2]} "650.505.4876" 2500
"SH_CLERK" 21-Jun-2007 182 NaN 50 "MSULLIVA" "Martha" "Sullivan" 120 "East" 112 {2×1 int32} "650.507.9878" 2500
"SH_CLERK" 21-Jun-2007 198 NaN 50 "DOCONNEL" "Donald" "OConnell" 124 "East" 125 {3×1 int32} "650.507.9833" 2600
"SH_CLERK" 17-Mar-2007 195 NaN 50 "VJONES" "Vance" "Jones" 123 "West" 236 {5×1 int32} "650.501.4876" 2800
"SH_CLERK" 07-Feb-2007 187 NaN 50 "ACABRIO" "Anthony" "Cabrio" 121 "West" 345 {3×1 int32} "650.509.4876" 3000
"SH_CLERK" 11-Jul-2006 190 NaN 50 "TGATES" "Timothy" "Gates" 122 "East" 297 {3×1 int32} "650.505.3876" 2900
results
is a table that contains these variables:
job_id
— Job identifierhire_date
— Hire dateemployee_id
— Employee identifiercommission_pct
— Commission percentagedepartment_id
— Department identifieremail
— Email addressfirst_name
— First namelast_name
— Last namemanager_id
— Manager identifieroffice
— Office location (table that contains two variables for the building and room)performance_ratings
— Performance ratingsphone_number
— Phone numbersalary
— Salary
Close Cassandra Database Connection
close(conn)
See Also
apacheCassandra
| tablenames
| columninfo
| partitionRead
| close
Related Topics
- Import Data from Cassandra Database Table Using CQL
- Fill Values for Missing Data from Database Using Apache Cassandra Database C++ Interface