Forum: VoltDB Architecture

Post: How does partitioning work

How does partitioning work
Nitika
Jan 23, 2011
How does voltdb find out that a transaction is for a particular partition?
Single partition procedures
rbetts
Jan 23, 2011
Nitika,


Please see:


DesignProc#designappsinglesitesp


There is some additional information on writing procedures here:


UsingVoltDB/DesignProc


In brief, when you define the procedure, you include metadata that identifies a partitioning parameter for the procedure. For example, if your database has a "flights" table partitioned on flights.flightId, your procedure might require a flightId parameter. You would identify flightId as the procedure's partitioning parameter. VoltDB would then route execution of an invocation of this procedure to the partition corresponding to the supplied flightId value.


Thank you,
Ryan
choice of partition
Nitika
Feb 6, 2011
Ryan, Thanks for your reply.
This explains how application can instruct that stored procedure is for single partition.
What I want to know more is that incase a table is partitioned into 10 different partitions, now when we call a procedure with particular value
how it knows that it has to go which partition out of the 10 partitions.
1. Does SQL Engine evaluates at execution time or it is the job of optimizer?
2. How this logic works?
Dispatch of single partition procedures to partitions
rbetts
Feb 6, 2011
Ryan, Thanks for your reply.
This explains how application can instruct that stored procedure is for single partition.
What I want to know more is that incase a table is partitioned into 10 different partitions, now when we call a procedure with particular value
how it knows that it has to go which partition out of the 10 partitions.
1. Does SQL Engine evaluates at execution time or it is the job of optimizer?
2. How this logic works?


Going back to my summary example:


If your database has a "flights" table partitioned on flights.flightId, your procedure might require a flightId parameter. You would identify flightId as the procedure's partitioning parameter. VoltDB would then route execution of an invocation of this procedure to the partition corresponding to the supplied flightId value.


The procedure identifies which of its parameters is the partitioning function input. In this example, let's say flightId is the first parameter to the stored procedure. F(flightId) = partition_id.


When the procedure invocation arrives at the cluster, the cluster extracts the first parameter (flightId) from the stored procedure invocation. It calculates F(flightId) and routes the procedure to the replicas of the resulting partition_id.


Is that clearer?
Ryan.
Getting into more internals
Nitika
Feb 13, 2011
I understood that there is function to which value of partitioning parameter is passed and you get the partition on which stored procedure needs to run.


How does this function calculate?
1. Is there a system table which maintains the lower and upper bounds of each partition tables?
2. If yes, how these bounds are decided, does it get decided when the table gets created?
Sorry, I misunderstood your initial question.
rbetts
Feb 13, 2011
I understood that there is function to which value of partitioning parameter is passed and you get the partition on which stored procedure needs to run.


How does this function calculate?
1. Is there a system table which maintains the lower and upper bounds of each partition tables?
2. If yes, how these bounds are decided, does it get decided when the table gets created?


Currently, the mapping function is a simple hash.


You can find the code here:
https://source.voltdb.com/browse/Engineering/trunk/src/frontend/org/voltdb/TheHashinator.java?hb=true


==
EDIT: We've since switched to GitHub. New link: https://github.com/VoltDB/voltdb/blob/master/src/frontend/org/voltdb/TheHashinator.java
functioning of '>' operator
Nitika
Feb 25, 2011
Currently, the mapping function is a simple hash.


You can find the code here:
https://source.voltdb.com/browse/Engineering/trunk/src/frontend/org/voltdb/TheHashinator.java?hb=true


==
EDIT: We've since switched to GitHub. New link: https://github.com/VoltDB/voltdb/blob/master/src/frontend/org/voltdb/TheHashinator.java


1. Understood that data will be distributed in partitions for all inserts if there in procedure.
2. Now queries (Select statements) in Java procedure with '=' in where clause for Partition key can also select the particular partition.

Doubts
------------
1. How the partition selection decision is done for queries which use operaters like '>' or '<'. For '>' the data can span across partitions, so will the query has to span across multiple partitions?

Amit Kapila
re: How does partitioning work
tcallaghan
Feb 25, 2011
1. Understood that data will be distributed in partitions for all inserts if there in procedure.
2. Now queries (Select statements) in Java procedure with '=' in where clause for Partition key can also select the particular partition.

Doubts
------------
1. How the partition selection decision is done for queries which use operaters like '>' or '<'. For '>' the data can span across partitions, so will the query has to span across multiple partitions?

Amit Kapila


Amit,


There is a good description of single-partition and multi-partition stored procedures in our online documentation at DesignProc#designappsinglesitesp


Your SQL will determine if a stored procedure can be single-partition. When SELECTing from a partitioned table, if your SQL doesn't have an exact equality check in the WHERE clause ("... where partition_column = ? ...") VoltDB must check all partitions in the system. Therefore the '>' and '<' operators on partitioned tables will result in multi-partition stored procedures.


-Tim
guotong1988
Sep 28, 2015
Amit,


There is a good description of single-partition and multi-partition stored procedures in our online documentation at DesignProc#designappsinglesitesp


Your SQL will determine if a stored procedure can be single-partition. When SELECTing from a partitioned table, if your SQL doesn't have an exact equality check in the WHERE clause ("... where partition_column = ? ...") VoltDB must check all partitions in the system. Therefore the '>' and '<' operators on partitioned tables will result in multi-partition stored procedures.


-Tim



So all the other SQL may be not that fast in large concurrent situation ? for example :group by ? order by ? join ?

Thank you .