Import Large Data from MongoDB Using MongoDB C++ Interface

This example shows how to import a large set of flight data from a MongoDB® collection into the MATLAB® workspace using the MongoDB C++ interface. To avoid out-of-memory issues when retrieving many documents, use a loop to import large data in batches.

Create MongoDB C++ Interface Connection

Create a MongoDB connection to the database mongotest using the MongoDB C++ interface. Here, the database server dbtb01 hosts this database on port 27017.

server = "dbtb01";
port = 27017;
dbname = "mongotest";
conn = mongoc(server,port,dbname)
conn = connection with properties:
           Database: "mongotest"
           UserName: ""
             Server: "dbtb01"
               Port: 27017
    CollectionNames: [14×1 string]

conn is the connection object that contains the MongoDB connection. The object properties contain information about the connection and the database.

  • The database name is mongotest.

  • The user name is blank.

  • The database server is dbtb01.

  • The port number is 27017.

  • This database contains 14 document collections.

Verify the MongoDB connection.

isopen(conn)
ans = logical
   1

The isopen function returns 1 (logical true), so the database connection is open. A return value of 0 means the connection is closed.

Determine Number of Documents to Import

Find the total number of documents in the airlinesmall collection for the years 1997 through 2010, and store the result in totaldocs. Use a MongoDB query to filter the flight data for the specified years.

collection = "airlinesmall";
mongoquery = "{""Year"":{""$gte"":1997,""$lte"":2010}}";
totaldocs = count(conn,collection,Query=mongoquery);

Retrieve Large Data in Batches

Set the batch size to 15,000 documents. Initialize flightdata, the MATLAB workspace variable that stores the retrieved data, as an empty array.

batchsize = 15000;
flightdata = [];

You can change the batch size depending on the performance and memory capacity of your system.

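Use a while loop to retrieve flight data from the collection. In each iteration, Skip specifies how many matching documents to pass over and Limit caps the number of documents returned, so each call to find retrieves the next batch. The variable flightdata accumulates each batch of retrieved data.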

% Track number of documents read
index = 0;

while index < totaldocs
    
    % Retrieve documents in a batch
    localdata = find(conn,collection,Query=mongoquery, ...
        Skip=index,Limit=batchsize);
    
    % Store retrieved documents locally
    flightdata = [flightdata; localdata];
    
    % Move to the next batch
    index = index + batchsize;
    
end

Display information about the flightdata variable. The retrieved data is a structure array that contains 75,603 structures. Each structure contains 30 fields of flight data.

whos flightdata
  Name                Size                Bytes  Class     Attributes

  flightdata      75603x1             248848692  struct              
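
The figures above allow a rough check of the batching arithmetic. The following is a minimal sketch, assuming that totaldocs equals the 75,603 documents reported by whos and that memory use scales roughly linearly with the number of documents. The variable names numbatches, lastbatch, bytesperdoc, and bytesperbatch are introduced here for illustration only and are not part of the original example.

```matlab
% Values taken from this example
totaldocs  = 75603;      % documents retrieved (size of flightdata)
batchsize  = 15000;      % documents requested per batch
totalbytes = 248848692;  % bytes reported by whos for flightdata

% Number of loop iterations needed to cover all documents
numbatches = ceil(totaldocs/batchsize);   % 6 iterations

% Number of documents in the final, partial batch
lastbatch = mod(totaldocs,batchsize);     % 603 documents
if lastbatch == 0
    lastbatch = batchsize;                % final batch is full when evenly divisible
end

% Approximate memory per document and per full batch (rough estimate only)
bytesperdoc   = totalbytes/totaldocs;     % roughly 3.3 KB per document
bytesperbatch = bytesperdoc*batchsize;    % roughly 49 MB per full batch
```

With these values, the loop runs six times: five full batches of 15,000 documents and a final batch of 603. The per-batch estimate can help when choosing a batch size for a system with less available memory, though actual memory use depends on the contents of each document.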

Close MongoDB C++ Interface Connection

close(conn)
