Forum: Managing VoltDB

Post: How to merge the data of 2 VoltTables?

How to merge the data of 2 VoltTables?
pavangadiya
Aug 27, 2015
Hello,

I want to merge the data of 2 VoltDB tables. I am using Java in my Client application.

Here is my scenario:

I have two tables. On every table, I am calling a select statement(Ad-Hoc Query ..Eg : SELECT col1, col2, col3 from Table1 where col4 = ? order by col1 ASC; ) using client.callProcedure() and storing the results in VoltTable objects V1 & V2.

Now, I want to merge the results of both the VoltTable objects v1 & v2.

How can I do that ?I know, in traditional RDBMS it can be done by Joins.

Does Voltdb support Joins? If yes, then how can I use Joins in Voltdb ? Is there any other alternative which fulfills my requirements?

Any help much appreciated!

Regards,
Pavan
pzhao
Aug 27, 2015
Paven,

You could simply just iterate through v1 and v2 and place them in a new VoltTable. Alternatively, VoltDB supports sql joins, inner/outer and left/right, and union if that is a possibility for you. Please refer to documentation here.

Peter Zhao
pavangadiya
Aug 28, 2015
Hello pzhao,

Thank you for the reply. I tried with the joins but , VoltDB does not support full outer joins. So, join is not fitting for my requirement because I have to display the data of both the tables.

Alternatively, I am iterating through v1 & v2 .. but,I am not getting how to place them(how to concatenate the results) in new VoltTable (v3). Could you please explain more.

Thank You!

Pavan
bballard
Aug 31, 2015
Hi Pavan,

It is true, VoltDB does not support full outer joins, only left or right outer joins. I logged a ticket for this here: https://issues.voltdb.com/browse/ENG-8932. There are some difficult edge cases. Can you provide an example of your desired full outer join query? Is it matching on one column? Are the tables replicated, or partitioned?

If it helps, I wanted to see if I could do a SQL workaround using UNION. Unfortunately this only worked if the tables were replicated.

46> create table t1 (id int, val varchar(10));
Command succeeded.
47> create table t2 (id int, val varchar(10));
Command succeeded.
48> insert into t1 values (1,'Hi');
(Returned 1 rows in 0.01s)
49> insert into t1 values (2,'Hi');
(Returned 1 rows in 0.00s)
50> insert into t2 values (1,'Hello');
(Returned 1 rows in 0.01s)
51> insert into t2 values (3,'Hello');
(Returned 1 rows in 0.01s)
52> (select t1.id, t1.val, t2.id, t2.val from t1 left outer join t2 on t1.id = t2.id) union (select t1.id, t1.val, t2.id, t2.val from t1 right outer join t2 on t1.id = t2.id WHERE t1.id is null);
ID VAL ID VAL
----- ----- ----- ------
1 Hi 1 Hello
2 Hi NULL NULL
NULL NULL 3 Hello

If instead I partitioned the tables on the ID columns, the union query would result in this:
Unexpected Ad Hoc Planning Error: java.lang.RuntimeException: Error compiling query: org.voltdb.planner.PlanningErrorException: Statements are too complex in set operation using multiple partitioned tables.

However, you might use a modification of this workaround to use 2 SQL statements to get the results, and then either iterate through both tables on the client, or you could use the following technique to merge the contents of two VoltTables:

public VoltTable[] run() {
voltQueueSQL(leftOuterJoin);
voltQueueSQL(rightOuterJoinWhereLeftIsNull);
VoltTable[] results = voltExecuteSQL(true);
VoltTable t0 = results[0];
VoltTable t1 = results[1];


VoltTable t2 = new VoltTable(
new VoltTable.ColumnInfo("id1",VoltType.BIGINT),
new VoltTable.ColumnInfo("val1",VoltType.STRING),
new VoltTable.ColumnInfo("id2",VoltType.BIGINT),
new VoltTable.ColumnInfo("val2",VoltType.STRING));

while (t0.advanceRow()) {
t2.addRow(
new Object[] {
t0.getLong(0),
t0.getString(1),
t0.getLong(2),
t0.getString(3) });
}
while (t1.advanceRow()) {
t2.addRow(
new Object[] {
t1.getLong(0),
t1.getString(1),
t1.getLong(2),
t1.getString(3) });
}

return new VoltTable[] {t2};

}


Hope this helps,
Ben
bballard
Aug 31, 2015
Pavan,

You may also be able to add the rows from t1 to the t0 VoltTable, so you're not having to copy all of the rows from both queries into a new VoltTable as my example showed. For example:

while (t1.advanceRow()) {
t0.addRow(
new Object[] {
t1.getLong(0),
t1.getString(1),
t1.getLong(2),
t1.getString(3) });
}

return new VoltTable[] {t0};
pavangadiya
Sep 7, 2015
Hello Ben,

Sorry for the bit late reply & thanks for your reply.

My tables are partitioned on one column(A column which is having a BigInt value). Now, I am trying other workaround instead of Joins & Unions.

My alternative is also similar to your alternative.

while (t1.advanceRow()) {
t0.add(t1);
}
System.out.println(t1.toFormattedString());


Now, I want to sort the data of t1 table based on one column (now table t1 has the collective data of both the tables t0 & t1). Eg : VoltTable t1 hase three columns & I want to sort the t1 on column1(Integer value).

How can I do that ? I am trying to store the whole data of voltTable(t1) to an array & then will apply a sort logic on column1.

Could you please suggest any suitable alternative to sort the table t1 on column1 using VoltdB API or any other methods.

Thank You!

Pavan
bballard
Sep 8, 2015
Hi Pavan,

This might be overkill, but I had a code example on github (https://github.com/VoltDB/app-fantasy-sports/blob/master/client/src/benchmark/Ranker.java) which is client code, but something similar could work in a java stored procedure to produce a new VoltTable, or you might prefer to use it on the client similar to the example.

There is a wrapper class for a VoltTable called ScoreTable, which implements Comparable. Then a PriorityQueue of ScoreTables is used to do a max heap sort (like a merge sort) of the pre-sorted VoltTables. The queries are in Descending order, and the Comparator is coded to match that. If you wanted ascending order you would change both the query and the comparator. The comparator wrapper class is specific to the columns and direction you want to sort by, although it might be possible to write a configurable one. This pulls one row at a time, in order, using advanceRow() directly from the VoltTables. The VoltTables can be huge, and there can be as many of them as you want. This scales well.

If that is overkill, the simpler but less scalable way would be to use t0.fetchRow(i) and t1.fetchRow(i) to get VoltTableRow objects and put them into something like a SortedMap, then get the entryset to iterate through the VoltTableRow objects in the combined order.

Best regards,
Ben