Forum: Building VoltDB Applications

Post: Avoiding copy/paste in stored procedures?

Avoiding copy/paste in stored procedures?
phealy
Mar 21, 2014
Any suggestions to avoid copying SQLStmt's and helper methods into every stored procedure which needs to use them?

This is a problem with half a dozen procedures but becomes unmanageable when we have dozens of them.

We need to be able to update these procedures on-the-fly.

Thanks,

Paul
bballard
Mar 21, 2014
Hi Paul,

There are two ways you can re-use SQLStmt objects.

One is you can have a "Base" Stored Procedure that extends VoltProcedure and defines a number of reusable SQLStmt objects, and then has a run() method that does nothing. Your other stored procedures can then extent that base class rather than VoltProcedure, and they will inherit the SQLStmt objects. You will still need to declare this procedure in the DDL so that it is included in the catalog.

The second method is to define the reusable SQLStmt objects in a regular class that is not a stored procedure and does not extend VoltProcedure. You can then import that class in your stored procedure and reference the SQLStmt objects. In this case, you still need to register the class in the DDL so that it is included in the catalog file, but you would use the following syntax:

IMPORT CLASS org.mycompany.ClassName;

For more on the IMPORT CLASS syntax, see Using VoltDB, App B - IMPORT CLASS: https://voltdb.com/docs/UsingVoltDB/ddlref_importclass.php

Ben
phealy
Mar 25, 2014
Thanks Ben - will probably try a mixture of both of these.

For reference here is a link to an older conversation about this topic: https://forum.voltdb.com/showthread.php?383-Helper-functions

Paul
bballard
Mar 25, 2014
Thanks for sharing that link, Paul. That is from a while ago when we did not have some features such as the IMPORT CLASS statement, and the only way to make java code that was outside of a stored procedure accessible was to include it within a "dummy" stored procedure.

A mixture of these techniques can certainly be used, as you see appropriate. Basically, if you want to reference java code in a stored procedure it needs to be available within the catalog.jar file. The IMPORT CLASS statement will take any class in the classpath and include it in the catalog file when it is compiled. The CREATE PROCEDURE FROM CLASS... statement will also do that, but it expects it to be a class that extends VoltProcedure.
phealy
Apr 1, 2014
I think there is a missing '_' from the wildcardmatcher used to pick out class names on the import class statement (this works without the underscore in the class name):

$ cat import.sql
create table t (i INTEGER);

IMPORT CLASS i_mported;
$ cat i_mported.java
public class i_mported {
public int f() { return 1; }
}
$ javac i_mported.java
$ voltdb compile --classpath=. -o imported.jar import.sql
ERROR: [null]: Invalid IMPORT CLASS match expression: 'i_mported'
ERROR: [null]: VoltDB DDL Error: "Invalid IMPORT CLASS match expression: 'i_mported'" in statement starting on lineno: 3
ERROR: Catalog compilation failed.
------------------------------------------
Project compilation failed. See log for errors.
------------------------------------------
$


Workaround is:
$ cat import.sql
create table t (i INTEGER);

IMPORT CLASS i*mported;
$ voltdb compile --classpath=. -o imported.jar import.sql
------------------------------------------
Successfully created imported.jar
Includes schema: import.sql

[MP][WRITE] T.insert
INSERT INTO T VALUES (?);

------------------------------------------

Added the following additional classes to the catalog jar:
i_mported

... etc
bballard
Apr 1, 2014
Hi Paul,

Thanks for bringing this to our attention. Can you use a broader wildcard as a workaround? I'll have our engineering team review this as a potential bug in the wildcard matcher.

Thanks,
Ben
phealy
Apr 1, 2014
Thanks Ben - still working on the IMPORT CLASS approach, and am getting this error now - its an unexpected error with a voltQueueSQL from inside an IMPORT'ed CLASS (the sql works when moved/called from the outer VoltProcedure):

Exception in thread "main" org.voltdb.client.ProcCallException: VOLTDB ERROR: UNEXPECTED FAILURE:
java.lang.NullPointerException
at org.voltdb.ProcedureRunner.getCleanParams(ProcedureRunner.java:754)
at org.voltdb.ProcedureRunner.voltQueueSQL(ProcedureRunner.java:538)
at org.voltdb.ProcedureRunner.voltQueueSQL(ProcedureRunner.java:546)
at org.voltdb.VoltProcedure.voltQueueSQL(VoltProcedure.java:269)
at Test_rebalance.rebalance(Test_rebalance.java:71)
at Test_doRebalance.run(Test_doRebalance.java:22)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
...

There isn't a parameter to the call, which might be the problem - in sqlcmd it is:
1> SELECT entity, COUNT(entity) FROM mapping GROUP BY entity;
ENTITY C2
-------------- ---
A 4
B 2



(2 row(s) affected)
2>
bballard
Apr 1, 2014
Hi Paul,

The VoltProcedure.voltQueueSQL() method is an object method, not static, so when it is called within a normal stored procedure, it is equivalent to "this.voltQueueSQL(...);". From the imported class, even if it also extends VoltProcedure, it's not the same object, and probably would be lacking some initialization (and isn't the right object anyway) and therefore results in a NullPointerException.

If you need to call voltQueueSQL() from within the imported class, I think it needs a reference to the VoltProcedure object that is calling it.

Imported class method:

void doSomething(VoltProcedure p, String i, ...) {
p.voltQueueSQL(i,...);
...
}

Calling procedure:

a.doSomething(this,s,...);

The same would be the case for voltExecuteSQL() calls.
phealy
Apr 2, 2014
Hi Ben,

Thanks - I am handling the volt*() calls like that already.

Here is a minimum reproduction test case:

$ cat storedproc.sql
CREATE TABLE mapping
(remote_entity VARCHAR(255) NOT NULL
,local_entity VARCHAR(255) NOT NULL
,range_start INTEGER DEFAULT '0' NOT NULL
,range_stop INTEGER DEFAULT '0' NOT NULL
,PRIMARY KEY (remote_entity, local_entity)
);

IMPORT CLASS balance;
CREATE PROCEDURE FROM CLASS storedproc;
$ cat balance.java
import org.voltdb.*;
import org.voltdb.VoltProcedure.VoltAbortException;

public class balance {
public final SQLStmt countMapping = new SQLStmt(
"SELECT remote_entity, COUNT(remote_entity) " +
"FROM mapping "+
"GROUP BY remote_entity;");

public VoltTable[] rebalance(VoltProcedure vp) throws VoltAbortException {
vp.voltQueueSQL(countMapping);
return vp.voltExecuteSQL();
}
}
$ cat storedproc.java
import org.voltdb.*;

public class storedproc extends VoltProcedure {
public VoltTable[] run() throws VoltAbortException {
balance b = new balance();
return b.rebalance(this);
}
}
$ cat client.java
import org.voltdb.*;
import org.voltdb.client.*;

public class client {
public static org.voltdb.client.Client db = ClientFactory.createClient();

public static void main(String[] args) throws Exception {
db.createConnection("localhost");
db.callProcedure("storedproc");
}
}
$


Which throws this error:
$ java client
Exception in thread "main" org.voltdb.client.ProcCallException: VOLTDB ERROR: UNEXPECTED FAILURE:
java.lang.NullPointerException
at org.voltdb.ProcedureRunner.getCleanParams(ProcedureRunner.java:754)
at org.voltdb.ProcedureRunner.voltQueueSQL(ProcedureRunner.java:538)
at org.voltdb.ProcedureRunner.voltQueueSQL(ProcedureRunner.java:546)
at org.voltdb.VoltProcedure.voltQueueSQL(VoltProcedure.java:269)
at balance.rebalance(balance.java:11)
at storedproc.run(storedproc.java:6)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.voltdb.ProcedureRunner.call(ProcedureRunner.java:288)
at org.voltdb.iv2.ProcedureTask.processInitiateTask(ProcedureTask.java:112)
at org.voltdb.iv2.MpProcedureTask.run(MpProcedureTask.java:156)
at org.voltdb.iv2.MpRoSite.run(MpRoSite.java:269)
at org.voltcore.utils.CoreUtils$2$1.run(CoreUtils.java:290)
at java.lang.Thread.run(Thread.java:724)

at org.voltdb.client.ClientImpl.callProcedure(ClientImpl.java:252)
at org.voltdb.client.ClientImpl.callProcedureWithTimeout(ClientImpl.java:206)
at org.voltdb.client.ClientImpl.callProcedure(ClientImpl.java:187)
at client.main(client.java:9)
bballard
Apr 2, 2014
I'm able to reproduce this. I'll get back to you later today.
bballard
Apr 2, 2014
Hi Paul,

I tried some variations and found a few things. I'm going to revise what I wrote above on 3/21 about there being two ways to reuse SQLStmt objects, there is really only one way that works and is a good idea.

The best way to reuse code like this is to use a base class and extend it with a subclass. The subclass will inherit the SQLStmts from the base class. In the subclass, you use voltQueueSQL and voltExecuteSQL normally, no need to pass around instances of VoltProcedure. If you want to reuse code that queues or executes statements, you can put that code in a method in the base class, and call it from the subclass, when it's compiled, it's basically all in the same class so it's ok. The thing to be aware of even with this technique is that each stored procedure in VoltDB is partitioned (or not) and is also identified as read-only or read-write. There are optimizations that will take advantage of these properties. This is important because if your base class contained a collection of SQLStmt's that included writes, then any subclass would be marked as Read-Write, even if it only does Reads. So just be aware of this, and it might mean you want more than one base class.

The other way that worked, but is a very bad idea, was to make the SQLStmt's static, and import and reference them from another procedure. The problem with this is that now there is no connection between the statements and the procedures, so the compiler doesn't know if they are read-only or read-write, or which statements are going to be involved when a stored procedure is invoked. Basically, we should prevent that from working because it could lead to determinism issues or other potential messes.

So extending a base class is the way to go. A SQLStmt won't be compiled into the catalog if it isn't part of a stored procedure class, so that was one of the issues with the sample code you sent. What sort of code would you import rather than making it a real stored procedure - this could be math formulas, utilities, whatever else that isn't SQLStmt related.
DerekPK
Apr 28, 2014
What are your thoughts on the following?

I have multiple procedures that I need to reuse in one place.
For the purpose of this question I have created code to show my intentions.

I have two classes GetFromTableOne and GetFromTableTwo, both extend VoltProcedure.

The contents of both run methods from the above classes have been moved into new methods named extendedRun.

This will allow these classes to work as normal.

The new procedure that wants to reuse both of the existing procedures is shown below in the class GetFromTwoTables which extends from VoltProcedure.
In this class I create two new classes that extend the existing procs GetFromTableOne and GetFromTableTwo.
In these classes I created extendedRun methods that call the super class methods.

The code compiles, so syntactically it is fine, from a Volt point of view will it work?

Condensed and simplified sample code below:
//**************************************************************************************************
public class GetFromTableOne extends VoltProcedure
{
public final SQLStmt getFromTableOne = new SQLStmt("SELECT * from TableOne where personId = ?);

public VoltTable[] extendedRun(String personId) throws VoltAbortException
{
voltQueueSQL(getFromTableOne, personId);
VoltTable[] resultSet = voltExecuteSQL();

return voltExecuteSQL(true);
}

public VoltTable[] run(String personId) throws VoltAbortException
{
return extendedRun(personId);
}
}
//**************************************************************************************************
public class GetFromTableTwo extends VoltProcedure
{
public final SQLStmt getFromTableOne = new SQLStmt("SELECT * from TableTwo where personId = ?);

public VoltTable[] extendedRun(String personId) throws VoltAbortException
{
voltQueueSQL(getFromTableOne, personId);
VoltTable[] resultSet = voltExecuteSQL();

return voltExecuteSQL(true);
}

public VoltTable[] run(String personId) throws VoltAbortException
{
return extendedRun(personId);
}
}
//**************************************************************************************************
public class GetFromTwoTables extends VoltProcedure
{
public class ExtenedGetFromTableOne extends GetFromTableOne
{
public VoltTable[] extendedRun(String personId)
{
return super.extendedRun(personId);
}
}

public class ExtenedGetFromTableTwo extends GetFromTableTwo
{
public VoltTable[] extendedRun(String personId)
{
return super.extendedRun(personId);
}
}


public VoltTable[] run(String personId) throws VoltAbortException
{
ExtenedGetFromTableOne getFromTableOne = new ExtenedGetFromTableOne();
ExtenedGetFromTableTwo getFromTableTwo = new ExtenedGetFromTableTwo();

VoltTable[] getTableOneData = getFromTableOne.extendedRun(personId);
VoltTable[] getTableTwoData = getFromTableTwo.extendedRun(personId);

/*I will marge both results and then return the merged table*/
return BothTablesData;

}

}
//**************************************************************************************************
bballard
Apr 29, 2014
Hi Derek,

This seems like a lot of boilerplate java code to queue and execute 2 SQL statements, but I assume you have your reasons for that. I don't see any reason why it wouldn't work correctly because by including inner classes that extend the GetFromTableOne and GetFromTableTwo classes, the GetFromTwoTables class will include all of the SQLStmt's and therefore the voltdb compiler will build plans for them.

The only other observation I have is that this will queue and execute the statements for table one and two as separate batches. It would be faster if you queued both statements and then executed them together. That would also eliminate the need to merge the two resulting VoltTable[]'s. You might want to consider just using the inner classes to inherit the SQLStmt's, but do the queuing and executing in the run() method of GetFromTwoTables. Just a thought.

Ben
pmartel
Apr 29, 2014
In my opinion, while this might work, it could be made safer -- to more closely mimic the code path of the mainstream use cases.
Since there are multiple VoltProcedure instances -- 1 outer and 2 inner -- the question arises as to which instance is the most suitable for invoking voltQueueSQL/voltExecuteSQL on -- the instance of the outer class that is being directly invoked and doing the coordination OR the instances of the
inner class that defined (via inheritence) the SQLStmts. I think that the safer answer is "the outer class". I think that the reason for this is that this
outer class (at least potentially) inherits dynamic state that can track the status and progress of the procedure/transactions.

The inner extension classes are a creative way of bundling the SQLStmts in a way that can be reused by multiple stored procedures, but once they have served this purpose at compile time, I think that they mostly just threaten to get in the way at runtime.

Though I haven't tested this, I'm not sure the base classes are required to be VoltProcedure classes or even to be instantiated. This may work:

public class GetFromTableOneStmt // reusable
{
// This could be extended to any number of related SQLStmts.
public static final SQLStmt getFromTableOne = new SQLStmt("SELECT * from TableOne where personId = ?");

public static void queueStmts(VoltProcedure proc, personId)
{
// This could be repeated over a "batch" of SQLStmts,
// possibly with conditional logic to control which statements get queued.
proc.voltQueueSQL(getFromTableOne, personId);
}
}

Then, IF the statement (batch) is interesting as a stand-alone invocation, you could define:

public class GetFromTableOne extends VoltProcedure
{
public static class ExtendedGetFromTableOneStmt extends GetFromTableOneStmt
{
}

public VoltTable[] run(String personId) throws VoltAbortException
{
GetFromTableOneStmt.voltQueueSQL(this, personId);
return voltExecuteSQL(true);
}
}

The usage from other VoltProcedure classes that possibly combine other ...Stmt classes "via extension" would be similar.
I am not sure whether the voltcompiler recognizes the "identity" of these SQLStmts when they are referenced
from different VoltProcedures or if it processes the SQLStmts as if they were defined in a different duplicate class for each inner class extension.
If the latter is true, the SQLStmts could be shared freely by single-partition and multi-partition procedures.
If the former, there may be "state" associated with the SQLStmts that prevents their proper sharing between single-partition and multi-partition procedures, so you might need to work around by defining identical SP and MP wrapper classes for the same SQLStmt(s). I'll look into this.