Forum: Building VoltDB Clients

Post: using csvloader / jdbcloader

using csvloader / jdbcloader
nchakraborty
Dec 23, 2015
Hi All,

Very new to the fastest db forums.

I am trying to build an application around volt.

The issue is loading bulk data into the db.
csvloader and jdbcloader can be used easily as a utility for voltdb, but what can we do if we want to load data from csv in client applications.

Suppose I want to write a Python client application, is there any way we can load data using csvloader in this python script.
I can try calling the os commands easily, but I was thinking in the lines of VOLT having a COPY command internally as a SQL statement.



Other questions :
The python client examples shown, only display these steps :
Creating connection
constructing the procedure object(the procedure needs to already present in db)
calling the procedure object
Checking the results given back by procedures.

Is there a traditional way where we can achieve this like other db's
create connection
contruct a sql statement
call the sql statement
check the results returned by sql statement

It seems even a select is done by contructing a procedure and calling it now.


Thanks
bballard
Dec 23, 2015
The csvloader utility is implemented in java, so it is possible to invoke it directly from a java application. The csvloader bin script just sets some environment variable and then calls the org.voltdb.utils.CSVLoader main method. See here: https://github.com/VoltDB/voltdb/blob/master/bin/csvloader

Likewise, jdbcloader is the class org.voltdb.utils.JDBCLoader, as you can see in the script here: https://github.com/VoltDB/voltdb/blob/master/bin/jdbcloader

Alternatively, you can write your own java code to read a CSV file and load data into the database using TABLENAME.insert default procedures. One of our example applications does this: https://github.com/VoltDB/app-metro/blob/master/client/src/benchmark/MetroBenchmark.java#L32

It should not be difficult to do something similar from Python, as I'm sure there are CSV parsing libraries that could be utilized, but we do not have such an example.

You could also use subprocess in Python to run command-line tools. https://docs.python.org/2/library/subprocess.html
nchakraborty
Dec 24, 2015
The csvloader utility is implemented in java, so it is possible to invoke it directly from a java application. The csvloader bin script just sets some environment variable and then calls the org.voltdb.utils.CSVLoader main method. See here: https://github.com/VoltDB/voltdb/blob/master/bin/csvloader

Likewise, jdbcloader is the class org.voltdb.utils.JDBCLoader, as you can see in the script here: https://github.com/VoltDB/voltdb/blob/master/bin/jdbcloader

Alternatively, you can write your own java code to read a CSV file and load data into the database using TABLENAME.insert default procedures. One of our example applications does this: https://github.com/VoltDB/app-metro/blob/master/client/src/benchmark/MetroBenchmark.java#L32

It should not be difficult to do something similar from Python, as I'm sure there are CSV parsing libraries that could be utilized, but we do not have such an example.

You could also use subprocess in Python to run command-line tools. https://docs.python.org/2/library/subprocess.html


Thanks.

What about the other question.

Is there a traditional way where we can achieve this like other db's
create connection
contruct a sql statement
call the sql statement
check the results returned by sql statement

Or everytime we need a already created select procedure to interact with Volt.
jhugg
Dec 24, 2015
You can call SQL using the native clients by calling the built-in @AdHoc procedure documented here https://docs.voltdb.com/UsingVoltDB/sysprocadhoc.php.

That looks like:

client.callProcedure("@AdHoc", "select * from table limit 1 order by key");

You can also use built in default procs to perform simple CREATE, READ, UPDATE, DELETE ops on tables.
https://docs.voltdb.com/UsingVoltDB/ChapSimplify.php#SimpleAutoProcs

That looks like:

client.callProcedure("FOO.insert", value1, value2, value3);

Finally, you can use a number of other clients for more natural interaction with VoltDB, such as JDBC or our SQL console.

https://docs.voltdb.com/UsingVoltDB/ProgLangjdbc.php
nchakraborty
Dec 24, 2015
Thanks a ton John and bballard.

Exactly what I was looking for.

However, I am still stuck at my first questions.


I wonder if we have an example for this scenario :
Using csvloader inside of java file to load data from csv files.

I need to create a java file which will will use the
import org.voltdb.utils.CSVLoader; class and load data into db.

Later this file will be transformed into a stored proc named load_csv which will accept parameters.

this is what I have tried as of yet, but no success.

import org.voltdb.*;
import org.voltdb.utils.CSVLoader;
import java.io.IOException;

public class load_csv extends VoltProcedure {



public VoltTable[] run()
throws VoltAbortException {
try {String[] str=null;
CSVLoader.main(str);
}
catch(IOException e) {
}
catch (InterruptedException e) {
}
return null;
}

}

Also, is there any API which helps with inbuilt classes and methods of CSVLoader.java class

Thanks
bballard
Dec 24, 2015
Hi,

VoltDB doesn't have anything similar to PostreSQL's "COPY" syntax where you can load csv files with a SQL statement.

You should not try to run csvloader from within a stored procedure. That would be slow and non-deterministic. See: https://docs.voltdb.com/UsingVoltDB/DesignProc.php#DesignProcDeterminism

What is it you are trying to do with loading files? I don't understand why using csvloader or a java client won't work?

Thanks,
Ben
nchakraborty
Dec 24, 2015
I need to develop a python client application which is going to load historical data for some tables.
The historical data is going to be huge.

If its not recommended than I can avoid using this method, maybe I need to call subprcocess to load the data using this utility.

How about having a SQL COPY command as a feature, can we add it to wish list.
bballard
Dec 24, 2015
Hi,

To control the process of which files to load when, and check their success from Python, probably the subprocess method would work. Test using csvloader from the command line and you will see it produces three output files in the current directory. You may want to check these files to verify the status of the file load.

The other alternative would be to write a custom Python client that reads CSV and passes the values from each line to the appropriate stored procedure. This may be simpler if there is an easy python library that handles the CSV parsing for you, but we don't have any examples of this.

In terms of a wish list feature, the main area of ambiguity to me is how you tell it where the file is. Supposing there was a system procedure you could call, and pass in a parameter for the path to the file, the file might exist at that path on one of the servers, or perhaps on all of them.

The closest feature we have that loads a local file is the @UpdateApplicationCatalog procedure, which you can call from within sqlcmd, and it will load the local files that you pass in as parameters. From a java client you can also call @UpdateApplicationCatalog, but in this case the parameters are a byte[] and a String that contain the contents of the files in question. So either way, this doesn't give us a good model for how this might work from Python.

I think perhaps a better feature request would be if we implemented a CSVLoader example in Python, or provided a function on the Python client library that could load CSV files.

Let me know what you think I should put in the feature request.

Thanks,
Ben
nchakraborty
Dec 26, 2015
Thanks bballard.

A feature request for an example and python function to load data from csv would be helpful.