Forum: Building VoltDB Clients

Post: query failing complaining of temp table memory

query failing complaining of temp table memory
Apr 30, 2012
Am trying to port an existing mysql query to voltdb:
count(distinct A.col)
join A1 on (A1.col1 = A.col1)
join A2 on (A2.col2 = A1.col2)
join A3 on (A3.col3 = A2.col3)
A.col4 = 'something' and
A.col5 between A3.start_date and A3.end_date
group by
This runs in mysql with reasonable performance and 'explain'.
When run in voltdb, I get this error:
More than 100 MB of temp table memory used while executing SQL. Aborting.
Data and where clause are identical in both test environments.
A related question: if I take the advice of breaking up the query, what sort of performance hit is there taking a query and making two (or more) queries from it, feeding results of each to the next? Does breaking them up in volt result in a performance penalty as some other db's would?
First, you need to include
May 3, 2012
First, you need to include the GROUP BY columns in the SELECT statement.
If two of these tables are partitioned and share a partition key, they can be joined on that partitioning key, otherwise they may need to be replicated tables:
Another consideration when encountering this error with a join is if you can reduce the number of records that the execution engine must store in temp tables by specifying a more optimal join order. For example, if A is the largest table, and the predicate A.col4 = "something" is still resulting in >100MB worth of records, but you had additional criteria on another table such as A3.col6 = "foo", then you could specify a join order that puts A3 first, and then only the "foo" records would be joined, and when you get to the large A table, the resulting join may fall under the 100MB limit such that you could get a result. Specifying join order is done in the declaration of a SQLStmt object, as shown in the Performance Guide, Chapter 3:
This query does bring up a general challenge that may seem counterintuitive. While the result is going to be small because of the aggregation, the database still needs to scan many records in order to determine the result. The same is true if you were to simplify this to SELECT COUNT(*) FROM A, where A contains >100MB. A pure COUNT can be defined as a view, and then you can query the view to avoid scanning the entire table, but for a COUNT(DISTINCT...), a VIEW will not work. In the end, when all else fails, it may be necessary to break up the query into smaller pieces which can be evaluated with <100MB temp tables. For example, first query SELECT DISTINCT col3 FROM A3, then iterate through each value and perform the same join but where A3.col3 = ?. The results can be returned to the client as separate VoltTable objects in an array, or you could combine them into a new VoltTable in java before returning the result of the stored procedure.