Forum: VoltDB Architecture

Post: Partition access order

Partition access order
DerekPK
Dec 6, 2013
I am looking at this tutorial: http://voltdb.com/docs/tutorial/Part1.php

I got as far as tutorial 4, so I have a db with partitions. I performed the following query.

select * from people limit 5;
STATE_NUM COUNTY_NUM COUNTY POPULATION
---------- ----------- --------------- -----------
1 1 Autauga County 54571
1 3 Baldwin County 182265
1 5 Barbour County 27457
1 7 Bibb County 22915
1 9 Blount County 57322

(5 row(s) affected)


Then ran the same query again and got the following results.
4> select * from people limit 5;

STATE_NUM COUNTY_NUM COUNTY POPULATION
---------- ----------- --------------------------- -----------
2 13 Aleutians East Borough 3141
2 16 Aleutians West Census Area 5561
2 20 Anchorage Municipality 291826
2 50 Bethel Census Area 17013
2 60 Bristol Bay Borough 997

(5 row(s) affected)


My question is about the order that the data is accessed.
How is the decision made about which partition to run the query on?
Is it that both partitions are queried and the first to finish returns the data?

Derek.
xin
Dec 6, 2013
Hi DerekPK,

Generally speaking, if table "people" is partitioned, VOLTDB will plan your query as a multiple() partitions query. Basically, VOLTDB will scan 5 rows from each partition on table "people", send them to the coordinator node and return only 5 rows as the query result.

Next time if you want to understand how your query is being executed by VOLTDB, you can use system procedure "EXPLAIN" for AdHoc query or "EXPLAINPRO" for stored procedure. Samplex results of "EXPLAIN" for your query may be:

LIMIT with parameter
RECEIVE FROM ALL PARTITIONS
SEND PARTITION RESULTS TO COORDINATOR
SEQUENTIAL SCAN of "P1"
inline LIMIT with parameter

Please feel free to reply if you have more questions.

Thanks,
Xin
DerekPK
Dec 6, 2013
Thanks for the reply.

So is it the coordinator node that determines which set of five to return, I am curious because the same results are returned each time the query runs.
I suppose i'm wondering is there any way to predict which partition the results will be returned from OR is is just down to the internal workings of the coordinator node?


SELECT TOP 5 county from people;
COUNTY
---------------------------
Aleutians East Borough
Aleutians West Census Area
Anchorage Municipality
Bethel Census Area
Bristol Bay Borough

(5 row(s) affected)

SELECT TOP 5 county from people;
COUNTY
---------------
Autauga County
Baldwin County
Barbour County
Bibb County
Blount County

(5 row(s) affected)


Produces similar results:

Derek.
xin
Dec 6, 2013
Your two new queries produces similar results, but they are completely different.

As table "people" is partitioned, VOLTDB have no idea to return 5 rows from a single partition. So for AdHoc query, it is not possible to predicate which partition the results will be returned from. But if you have a filter like "WHERE people.partitionColumn = XX", this kind of filter will make this query to be single partition query and it will only query one partition.

If you are using stored procedure and it is single partitioned, your original query in that single partition stored procedure will return results from only one partition, even without filters on the partition column.

In order to get the same results for your query, adding ORDER BY "County" into your query will help, but will slow down your query performance.
DerekPK
Dec 6, 2013
Thanks for the speedy response.

That what I thought you would say, I just wanted to be sure:-)

Derek.