Forum: Building VoltDB Applications

Post: Stored procedure cross-partition results

Stored procedure cross-partition results
pmilner
Jun 16, 2014
Hi,
I have a problem getting correct data from a Stored Procedure, using VoltDB 4.0 0-gab8313d-local on 64-bit Linux.
I can't give you the real VoltDB configs because they can't be published, but they boil down to the following.
The relevant table is:

CREATE TABLE TABLE1 (
DEVICE VARCHAR(23) NOT NULL,
USER VARCHAR(23) NOT NULL);

The table is configured with:
<partition table="TABLE1" column="DEVICE"/>

I then have a stored procedure which executes the following query:

SELECT COUNT(DEVICE) FROM TABLE1 WHERE DEVICE=? AND USER=?
having taken 2 String params.
The procedure is declared with:
PARTITION PROCEDURE MyProcedure ON TABLE TABLE1 COLUMN DEVICE;

I then add some data:
INSERT INTO TABLE1 VALUES('d1','u1');
INSERT INTO TABLE1 VALUES('d2','u1');

When I put this in a partitioned VoltDB cluster of 3 servers with K-Safety=0 and run the stored procedures as follows:
exec MyProcedure 'd1' 'u1';
exec MyProcedure 'd2' 'u1';
I get a result of 0 for one and 1 for the other.

Each of those rows has been put into a different partition (verified with "exec @Statistics TABLE 0"), which is fine, but I don't understand why I get a different result, given the query that it's running.
If I change the data to force both rows to be in the same partition, I get result of 1 for both, which is what I would expect, but in a cluster it seems to behave differently.

Sorry I can't give you the real data but does this suggest anything to anyone? Am I missing some subtlety about querying over partitioned databases?
Thanks in advance,
Paul
pzhao
Jun 16, 2014
Paul,

Thanks for your question. I've engaged an engineer for some help on this issue. We are investigating and will get back to you shortly.
As for 'are you missing anything?', you are definitely not.

-Peter Zhao
pmartel
Jun 16, 2014
I am having trouble reproducing this.
Here's the DDL-centered try I made:

CREATE TABLE table1 (
DEVICE VARCHAR(23) NOT NULL,
USERNAME VARCHAR(23) NOT NULL,
);

PARTITION TABLE table1 ON COLUMN DEVICE;


CREATE PROCEDURE DO AS SELECT COUNT(DEVICE) FROM table1 WHERE DEVICE=? and USERNAME=?;

PARTITION PROCEDURE DO ON TABLE table1 COLUMN DEVICE;

I got a volt compile error when I tried to use the SQL function name USER as a column name.
Does the behavior correct itself when you rename the USER column to avoid the SQL keyword?
Here's what I tried in sqlcmd:

sqlcmd
SQL Command :: localhost:21212
1> insert into table1 values ('d1', 'u1');


(1 row(s) affected)
2> insert into table1 values ('d2', 'u1');


(1 row(s) affected)
3> exec DO 'd2' 'u1';
C1
---
1



(1 row(s) affected)
4> exec DO 'd1' 'u1';
C1
---
1



(1 row(s) affected)
pmilner
Jun 17, 2014
Hi
thanks for your replies. The "USER" column name is not actually what I had in my schema, I was just trying to avoid posting the real DDL.
Did you try to run this in a cluster, and confirm that the 2 rows were in different partitions?
Also I notice you created the SP using "CREATE PROCEDURE AS..." - mine is created as a Java class and precompiled as part of a catalogue JAR...
Sorry, I didn't mention that before, I don't know if it makes a difference to how SPs work. The source code for it would be:

[...]
public class MyProcedure extends VoltProcedure {

    public final SQLStmt selectSQL = new SQLStmt("SELECT COUNT(DEVICE) FROM TABLE1 WHERE DEVICE=? AND USERNAME=?");
    public VoltTable[] run(String user, String device) throws VoltAbortException {
        voltQueueSQL(selectSQL,device,user);
        VoltTable[] selectedData = voltExecuteSQL(true);
        return selectedData;
    }
}


One thing I've just realized is that my params to the run() method are "user,device", but the SELECT params are "DEVICE" then "USERNAME".
Could I be running into the note in section 3.24 "Partitioning Stored Procedures" on this page which states that
"the PARTITION PROCEDURE statement assumes that the partitioning column value is the first parameter to the procedure"

"DEVICE" is my partitioning column. I will rearrange my params to adhere to that. I had assumed this refers to the ordering of params to the SQL but maybe that's a wrong assumption.

Any thoughts? Are there any Java SP configurations, hints or annotations I should be adding?
Best regards
Paul
pmilner
Jun 17, 2014
I've just rejigged things to take account of the thing mentioned above - that the partitioning column needs to be the FIRST param to the stored procedure (unless you specifically say otherwise in the PARTITION PROCEDURE statement). The problem has now gone away. Thanks repliers for highlighting this area and prompting a review of the Java stored procedure.
pzhao
Jun 17, 2014
Paul,

Thats great! Just to clarify, the default partitioning column value is the first parameter. However, it does not need to be the first parameter.

So in your original code,
...
public VoltTable[] run(String user, String device) throws VoltAbortException {
...

the partitioning column is user. Your PARTITION PROCEDURE statement in your DDL should look like the following (as taken from other Paul's code).
PARTITION PROCEDURE DO ON TABLE table1 COLUMN DEVICE;
OR
PARTITION PROCEDURE DO ON TABLE table1 COLUMN DEVICE PARAMETER 0;


In order to define the partitioning column other then the first parameter in your stored procedure,
PARTITION PROCEDURE DO ON TABLE table1 COLUMN DEVICE PARAMETER 1;

will allow for your original code to work properly. Just a reminder, the index used in the PARTITION PROCEDURE is zero-based index.

If you need further assistance, please let us know.

-Peter Zhao