Results 1 to 9 of 9

Thread: using csvloader / jdbcloader

  1. #1
    New Member
    Join Date
    Dec 2015
    Posts
    6

    using csvloader / jdbcloader

    Hi All,

    Very new to the fastest db forums.

    I am trying to build an application around volt.

    The issue is loading bulk data into the db.
    csvloader and jdbcloader can be used easily as a utility for voltdb, but what can we do if we want to load data from csv in client applications.

    Suppose I want to write a Python client application, is there any way we can load data using csvloader in this python script.
    I can try calling the os commands easily, but I was thinking in the lines of VOLT having a COPY command internally as a SQL statement.



    Other questions :
    The python client examples shown, only display these steps :
    Creating connection
    constructing the procedure object(the procedure needs to already present in db)
    calling the procedure object
    Checking the results given back by procedures.

    Is there a traditional way where we can achieve this like other db's
    create connection
    contruct a sql statement
    call the sql statement
    check the results returned by sql statement

    It seems even a select is done by contructing a procedure and calling it now.


    Thanks

  2. #2
    Super Moderator
    Join Date
    Dec 2011
    Posts
    217
    The csvloader utility is implemented in java, so it is possible to invoke it directly from a java application. The csvloader bin script just sets some environment variable and then calls the org.voltdb.utils.CSVLoader main method. See here: https://github.com/VoltDB/voltdb/blo.../bin/csvloader

    Likewise, jdbcloader is the class org.voltdb.utils.JDBCLoader, as you can see in the script here: https://github.com/VoltDB/voltdb/blo...bin/jdbcloader

    Alternatively, you can write your own java code to read a CSV file and load data into the database using TABLENAME.insert default procedures. One of our example applications does this: https://github.com/VoltDB/app-metro/...hmark.java#L32

    It should not be difficult to do something similar from Python, as I'm sure there are CSV parsing libraries that could be utilized, but we do not have such an example.

    You could also use subprocess in Python to run command-line tools. https://docs.python.org/2/library/subprocess.html

  3. #3
    New Member
    Join Date
    Dec 2015
    Posts
    6
    Quote Originally Posted by bballard View Post
    The csvloader utility is implemented in java, so it is possible to invoke it directly from a java application. The csvloader bin script just sets some environment variable and then calls the org.voltdb.utils.CSVLoader main method. See here: https://github.com/VoltDB/voltdb/blo.../bin/csvloader

    Likewise, jdbcloader is the class org.voltdb.utils.JDBCLoader, as you can see in the script here: https://github.com/VoltDB/voltdb/blo...bin/jdbcloader

    Alternatively, you can write your own java code to read a CSV file and load data into the database using TABLENAME.insert default procedures. One of our example applications does this: https://github.com/VoltDB/app-metro/...hmark.java#L32

    It should not be difficult to do something similar from Python, as I'm sure there are CSV parsing libraries that could be utilized, but we do not have such an example.

    You could also use subprocess in Python to run command-line tools. https://docs.python.org/2/library/subprocess.html
    Thanks.

    What about the other question.

    Is there a traditional way where we can achieve this like other db's
    create connection
    contruct a sql statement
    call the sql statement
    check the results returned by sql statement

    Or everytime we need a already created select procedure to interact with Volt.

  4. #4
    VoltDB Team
    Join Date
    Nov 2009
    Posts
    201
    You can call SQL using the native clients by calling the built-in @AdHoc procedure documented here https://docs.voltdb.com/UsingVoltDB/sysprocadhoc.php.

    That looks like:

    client.callProcedure("@AdHoc", "select * from table limit 1 order by key");

    You can also use built in default procs to perform simple CREATE, READ, UPDATE, DELETE ops on tables.
    https://docs.voltdb.com/UsingVoltDB/...impleAutoProcs

    That looks like:

    client.callProcedure("FOO.insert", value1, value2, value3);

    Finally, you can use a number of other clients for more natural interaction with VoltDB, such as JDBC or our SQL console.

    https://docs.voltdb.com/UsingVoltDB/ProgLangjdbc.php

  5. #5
    New Member
    Join Date
    Dec 2015
    Posts
    6
    Thanks a ton John and bballard.

    Exactly what I was looking for.

    However, I am still stuck at my first questions.


    I wonder if we have an example for this scenario :
    Using csvloader inside of java file to load data from csv files.

    I need to create a java file which will will use the
    import org.voltdb.utils.CSVLoader; class and load data into db.

    Later this file will be transformed into a stored proc named load_csv which will accept parameters.

    this is what I have tried as of yet, but no success.

    import org.voltdb.*;
    import org.voltdb.utils.CSVLoader;
    import java.io.IOException;

    public class load_csv extends VoltProcedure {



    public VoltTable[] run()
    throws VoltAbortException {
    try {String[] str=null;
    CSVLoader.main(str);
    }
    catch(IOException e) {
    }
    catch (InterruptedException e) {
    }
    return null;
    }

    }

    Also, is there any API which helps with inbuilt classes and methods of CSVLoader.java class

    Thanks

  6. #6
    Super Moderator
    Join Date
    Dec 2011
    Posts
    217
    Hi,

    VoltDB doesn't have anything similar to PostreSQL's "COPY" syntax where you can load csv files with a SQL statement.

    You should not try to run csvloader from within a stored procedure. That would be slow and non-deterministic. See: https://docs.voltdb.com/UsingVoltDB/...rocDeterminism

    What is it you are trying to do with loading files? I don't understand why using csvloader or a java client won't work?

    Thanks,
    Ben

  7. #7
    New Member
    Join Date
    Dec 2015
    Posts
    6
    I need to develop a python client application which is going to load historical data for some tables.
    The historical data is going to be huge.

    If its not recommended than I can avoid using this method, maybe I need to call subprcocess to load the data using this utility.

    How about having a SQL COPY command as a feature, can we add it to wish list.

  8. #8
    Super Moderator
    Join Date
    Dec 2011
    Posts
    217
    Hi,

    To control the process of which files to load when, and check their success from Python, probably the subprocess method would work. Test using csvloader from the command line and you will see it produces three output files in the current directory. You may want to check these files to verify the status of the file load.

    The other alternative would be to write a custom Python client that reads CSV and passes the values from each line to the appropriate stored procedure. This may be simpler if there is an easy python library that handles the CSV parsing for you, but we don't have any examples of this.

    In terms of a wish list feature, the main area of ambiguity to me is how you tell it where the file is. Supposing there was a system procedure you could call, and pass in a parameter for the path to the file, the file might exist at that path on one of the servers, or perhaps on all of them.

    The closest feature we have that loads a local file is the @UpdateApplicationCatalog procedure, which you can call from within sqlcmd, and it will load the local files that you pass in as parameters. From a java client you can also call @UpdateApplicationCatalog, but in this case the parameters are a byte[] and a String that contain the contents of the files in question. So either way, this doesn't give us a good model for how this might work from Python.

    I think perhaps a better feature request would be if we implemented a CSVLoader example in Python, or provided a function on the Python client library that could load CSV files.

    Let me know what you think I should put in the feature request.

    Thanks,
    Ben

  9. #9
    New Member
    Join Date
    Dec 2015
    Posts
    6
    Thanks bballard.

    A feature request for an example and python function to load data from csv would be helpful.

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •