Forum: VoltDB Architecture

Post: Dos and Don'ts

Dos and Don'ts
steves
May 4, 2011
From the FAQ it says


"Do: Use multiple SQL queries in your stored procedures. Two SQL queries in one single*partition procedure can be 10x faster than two procedures with one query each."


I have a situation where UserID will be used to partition user responses within a survey. When I need to retrieve many different user's responses (i.e. response where UserID = 1, response when UserID =2 etc., is this still considered a single partition procedure?


I guess my confusion is on the definition of single partition procedure. I am certain that the data would reside on different nodes since it's partitioned by UserID.


Thanks.
Steve.
Single Part IIF: [...] WHERE PartitionKey = ? [...]
sebc
May 4, 2011
Hey Steve,


You're right to have a doubt. A statement can only be considered "single-partitioned" if your WHERE clause looks like "WHERE PartitionKey = ? [AND/OR...]"


And the partition selected comes from your procedure run call, which means you couldn't do something like the following:

package com.procedures;


import org.voltdb.*;


// This will not work: while each statement acts on a single partition, the procedure
// at large *may* act on more than one => it is not single-partitioned
@ProcInfo(
    partitionInfo = "MyTable.UserID: 0",
    singlePartition = true
)


public class GetMany extends VoltProcedure
{
    public final SQLStmt selectOne = new SQLStmt("SELECT * FROM MyTable WHERE UserID = ?;");


    public long run(int[] userIDs)
    {
        for(int i = 0; i < userIDs.length; i++)
            voltQueueSQL(selectOne, userIDs);
        return voltExecuteSQL(true);
    }
}
This will unfortunately not work (even if it was possible for you to pass an array as parameter 0 to be used for the partitioning, which the VoltCompiler will reject anyways): while each individual call will run into a specific partition, the batch of them might require multiple partitions (suppose UserID=1 is in partition 1 and UserID=2 is in partition 2, for instance). Therefore the transaction (procedure) at large must be multi-partitioned. Thus the procedure definition above would work but MUST be marked as @ProcInfo( singlePartition = false ); You'll notice that, because we don't support the "IN" statement, you'll also have to resort to the "batching" pattern above to perform the equivalent (there are other possibilities that might be more suited to your specific case, of course). Multi-partition procedures are slower and you want to limit their usage as much as possible, of course. That said, it is all a matter of call volume: if you do expect to be making those calls to retrieve multiple user records often, you might want to extract that batching/looping logic outside of the procedure and stick to:
package com.procedures;


import org.voltdb.*;


@ProcInfo(
    partitionInfo = "MyTable.UserID: 0",
    singlePartition = true
)


public class GetOne extends VoltProcedure
{
    public final SQLStmt selectOne = new SQLStmt("SELECT * FROM MyTable WHERE UserID = ?;");


    public long run(int userID)
    {
        voltQueueSQL(selectOne, userID);
        return voltExecuteSQL(true);
    }
}


And then call this repeatedly from your client API with the list of UserIDs to process. Here, each call will be single-partitioned. Retrieving the data for a given list will be slower because of all the roundtrips, but your overall DB performance might be greatly improved because all the load will not be single-partitioned.


There is no "hard" rule to decide which pattern to choose, but a good rule of thumb is to consider that anything you can qualify as being an "occasional" call is likely to be an okay candidate for multi-partition implementation.


Hope this clarifies things!


Seb
data lookup patterns
steves
May 5, 2011
Thanks Seb,
It makes sense.
One clarification though-- in the second pattern you have:


@ProcInfo(
partitionInfo = "MyTable.UserID: 0",
singlePartition = true
)


With MyTable.UserID: 0 --- the Zero here is the "partition column" not a partition # or node #, correct?
I would not want to track where the data actually resides in order to find it!


Thanks.
Steve.
MyTable.UserID: 0
sebc
May 5, 2011
Thanks Seb,
It makes sense.
One clarification though-- in the second pattern you have:


@ProcInfo(
partitionInfo = "MyTable.UserID: 0",
singlePartition = true
)


With MyTable.UserID: 0 --- the Zero here is the "partition column" not a partition # or node #, correct?
I would not want to track where the data actually resides in order to find it!


Thanks.
Steve.


Hey Steve,


That "0" is the index of the run method parameter that will serve to decide which partition to go to for execution.
Okay - I don't think I can properly explain that with words, so let's use examples:


Assume:


CREATE TABLE MyTable
(
  UserID int NOT NULL
, ...
, CONSTRAINT PK_MyTable PRIMARY KEY
  (
    UserID
  )
);
With MyTable partitioned by UserID.


The following procedures shells are all valid:


@ProcInfo ( singlePartition=true, partitionInfo = "MyTable.UserID: 0" )
public class MyProc extends VoltProcedure
{
  public run(int userID, int otherParam1, ... int otherParamN)
  {
    ...
  }
}
@ProcInfo ( singlePartition=true, partitionInfo = "MyTable.UserID: 1" )
public class MyProc extends VoltProcedure
{
  public run(String someParam, int userID, int otherParam2, ... int otherParamN)
  {
    ...
  }
}
@ProcInfo ( singlePartition=true, partitionInfo = "MyTable.UserID: 2" )
public class MyProc extends VoltProcedure
{
  public run(String someParam0, long someParam1, int userID, int otherParam3, ... int otherParamN)
  {
    ...
  }
}
As you see, I am not even mentioning what SQL statements I'm going to use in that procedure: that's to emphasize that the index defined in the partitionInfo refers to the position of the variable in the run signature that corresponds to the field being partitioned.


You pass the actual value you want, and internally, Volt applies its hashing algorithm on that value to figure out which partition the record with that value belongs to.


I honestly don't know the exact hashing algorithm, but suppose for a number it is a simple modulo operation, and for a specific deployment you have a total of 3 partitions in your database. Then you get the following:


PartitionSelected = UserID % 3


But as you see, YOU do not select the specific partition (that would not be very helpful when writing your client app): this is managed for you by the server and allows you to benefit from partitioning while having a client application that's totally agnostic to that fact. The only thing you need is to tell VoltDB, through @ProcInfo, which run(...) parameter will be used for this hashing.


Hope that makes sense?


Seb
Dos and Don'ts
steves
May 5, 2011
Perfect. That's what I was hoping to hear.
Thanks Seb.