Forum: VoltDB Architecture

Post: Two quick questions for a total newbie: GROUP BY and Out of Memory

Two quick questions for a total newbie: GROUP BY and Out of Memory
YourLocalNerd
Nov 14, 2014
Hey guys, first post in the forum and I really like VoltDB so far.

Two quick questions I have regarding how VoltDB works:

How exactly does GROUP BY work in VoltDB? Does it have anything to do with table partitions? Does it do some sort of clusterwide search for table partitions?
I got really confused just trying to think this one out and couldn't make heads or tails out of the source code on Github. :/

Other question, what happens to VoltDB if the server runs out of memory? Does it write on another node? What if the entire cluster runs out of memory? Does it shutdown after a quick snapshot and command logging? Or does it stay suspended until memory is freed, as is the case with some other DBMS, like MySQL does with disk space?

Thanks for your answers in advance!
rmorgenstein
Nov 18, 2014
Answering in reverse order:

If a VoltDB process exhausts memory, then Linux generally steps in and starts killing processes ( these days I like to think of it as the OOM Fairy). If this happens to 1 machine (like if that one was underprovisioned) - you can expand memory or delete some data - then rejoin that server. If the whole database crashes down you have the full ability to recover from command logs (unless you turned them off) and nothing is lost.

We recommend that you do some capacity planning ahead of time and then use monitoring/alerting software to prevent this from happening.

See the VoltDB Planning Guide for sizing information and also the online sizing worksheet on http://<yourserver>:8080. For more information about Monitoring and recovery from failures, see the VoltDB Admin Guide.

----------

GROUP BY works on the table like it does in any database. Partitions come into play in the planning if the partition key can aid in the grouping. Here's an example. Take a look at examples/voter/ddl.sql. The VOTES table is partitioned on phone_number. If I run a query that has GROUP BY phone_number, then grouping happens on each partition, with the coordinator just assembling the final result.

3> explain select phone_number, count(*) from votes group by phone_number;
EXECUTION_PLAN
-------------------------------
RETURN RESULTS TO STORED PROCEDURE
RECEIVE FROM ALL PARTITIONS
SEND PARTITION RESULTS TO COORDINATOR
SEQUENTIAL SCAN of "VOTES"
inline Hash AGGREGATION ops: COUNT(

Now try the same thing grouping by state. Each partition has to send the count of rows to the coordinator, which then as to do a final summation from each partition before assembling the final result.


4> explain select state, count(*) from votes group by state;
EXECUTION_PLAN
-------------------------------
RETURN RESULTS TO STORED PROCEDURE
Hash AGGREGATION ops: SUM()
RECEIVE FROM ALL PARTITIONS
SEND PARTITION RESULTS TO COORDINATOR
SEQUENTIAL SCAN of "VOTES"
inline Hash AGGREGATION ops: COUNT(