How do I efficiently make a large SQL query into Matlab?

Question

1 vote

I have a SQL table that contains data that I need for analysis which is about 60 million records and 5 fields. What is the best way to get that into Matlab variables or a single variable. I have plenty of RAM (64GB) but I find that the import is taking a very long time and often encounters errors.

Is the quickest way to import all date into a single cell? a single table?
With a single pull allowing 'FetchInBatches' to be set to yes to avoid JVM errors?
Should I loop through sections of the table and import it in batches that way?
Should I loop and do each field individually so that I can download to the right data type?

Thanks in advance for your help, Brian

0 Comments
Show -2 older comments Hide -2 older comments

Sign in to comment.

Sign in to answer this question.

Sign in to follow activity

Answer 1

Rahul Goel on 29 Oct 2015

Open in MATLAB Online

1 vote

Hi Brian,

It will help the community assist you better if you could share more details about the setup you are using. For example, the database and the driver, also bitness: 32 or 64-bit(in some cases).

Following are some observations with MySQL database using JDBC driver:

MySQL JDBC driver by default retrieves the complete 'ResultSet' and stores it in the memory. This can lead to the heap space issues if you are working with the 'ResultSets' that have large number of rows or large values which cannot be allocated a heap space in your JVM for the memory required.

One of the workaround to this limitation is to reconstruct the SQL query to include 'LIMIT' and 'OFFSET' constructs.

The LIMIT clause is used to limit the number of results returned in a SQL statement. So if you have 1000 rows in a table, but only want to return the first 10, you would do something like this:

SELECT column FROM table
LIMIT 10

Now suppose you wanted to show results 11-20. With the OFFSET keyword its just as easy, the following query will do:

SELECT column FROM table LIMIT 10 OFFSET 10

Use the following MATLAB code which makes use of these SQL constructs:

>> LIMIT = <limit>; %Change the LIMIT to select the one with better response time
>> for fetchCount = 1: N
        conn = database(instance,username,password,Name,Value);
        query = ['select * from <table > ' int2str(LIMIT) ' OFFSET ' int2str((fetchCount-1) * LIMIT)];
        curs = exec(conn, query);
        rowlimit = <row_limit>; %Change rowlimit to select the one with a better response time
        while ~strcmp(curs.Data, 'No Data')
            curs = fetch(curs,rowlimit);
            data = curs.Data;
        end
        close(conn);
   end

As mentioned in the comments from the above code, please select the limit and row_limit which gives a better performance. The values for limit and row_limit follows the bell curve trend and there exists the optimal value which gives better performance. For instance, you can use multiples of 10^N depending on the number of rows.

Hope this helps.

1 Comment
Show -1 older comments Hide -1 older comments

KAE on 7 Apr 2021

Edited: KAE on 7 Apr 2021

Open in MATLAB Online

Should the query be

query = ['SELECT * from <table > LIMIT ' int2str(LIMIT) ' OFFSET ' int2str((fetchCount-1) * LIMIT)];   

How would a WHERE clause be added?

Sign in to comment.

How do I efficiently make a large SQL query into Matlab?

0 Comments
Show -2 older comments Hide -2 older comments

Answers (1)

1 Comment
Show -1 older comments Hide -1 older comments

Categories

Products

Tags

Community Treasure Hunt

How do I efficiently make a large SQL query into Matlab?

0 Comments Show -2 older comments Hide -2 older comments

Answers (1)

1 Comment Show -1 older comments Hide -1 older comments

Categories

Products

Tags

See Also

Community Treasure Hunt

0 Comments
Show -2 older comments Hide -2 older comments

1 Comment
Show -1 older comments Hide -1 older comments