Announcement

Collapse
No announcement yet.

Bulk Insert

Collapse
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Bulk Insert

    Hey everybody,

    after i inserted some data to my voltDB i was trying to improve the insert procedure.

    I wanted to combine a bunch(thousand or tenthousand) of inserts to one single statement.

    How would you do that?

    I tried to build the sql string and than create an SQLStmt object, but VoltDB tells me something like "you cant create a SQLStmt after initializing".

    Then I tried to define a statement like that "INSERT INTO roh VALUES(?);" and then fill in the values.

    Didnt work eather, of course it says "wrong count of columns"

    So How would you do that? Did I get something wrong? Any workarounds?

    thx for any suggestions
    johannes

  • #2
    re: Bulk Insert

    Johannes,

    VoltDB can insert data into partitioned tables very quickly (usually over 100,000 inserts per second per server), so you can write a client application to asynchronously call the insert stored procedure in a tight loop. This should allow you to load a lot of data in a short amount of time.
    In a stored procedure you can batch up multiple SQL statements (up to 1,000 at a time) by calling voltQueueSQL() for each statement then a single call to voltExecuteSQL().

    I tried to build the sql string and than create an SQLStmt object, but VoltDB tells me something like "you cant create a SQLStmt after initializing".

    All SQL in VoltDB is created and compiled in advance, when creating your application catalog. You cannot create SQL dynamically at runtime.

    Then I tried to define a statement like that "INSERT INTO roh VALUES(?);" and then fill in the values.
    Didnt work either, of course it says "wrong count of columns"

    You can do that, but you want to explicitly list the columns being inserted into and use a ? for each value, as in:

    "INSERT INTO table (column1, column2, column3) VALUES (?, ?, ?);"

    Can you post your stored procedure code?
    -Tim

    Comment


    • #3
      re: re: Bulk Insert

      Originally posted by tcallaghan View Post
      Johannes,

      VoltDB can insert data into partitioned tables very quickly (usually over 100,000 inserts per second per server), so you can write a client application to asynchronously call the insert stored procedure in a tight loop. This should allow you to load a lot of data in a short amount of time.
      In a stored procedure you can batch up multiple SQL statements (up to 1,000 at a time) by calling voltQueueSQL() for each statement then a single call to voltExecuteSQL()..

      -Tim
      Hi Tim,

      thx for your response.

      I tried your first suggestion:

      In a stored procedure you can batch up multiple SQL statements (up to 1,000 at a time) by calling voltQueueSQL() for each statement then a single call to voltExecuteSQL().


      It wasn't very much faster.

      the client application calls the procedure for every single insert. The procedure puts the data into an hash and after 1000 inserts it calls "voltQueueSQL()" for every element of the hash and after that it calls once "voltExecuteSQL". I guess i got something wrong :)

      Here is my current procedure:

      Code:
      @ProcInfo(         partitionInfo = "ROH.RID: 0",         singlePartition = true ) public class InsertBunch extends VoltProcedure {          public HashMap inserts = new HashMap();         public final SQLStmt sql = new SQLStmt("INSERT INTO ROH(RID, SID, CID, times ) VALUES (?,?,?,?);");          public VoltTable[] run (long rid, long sid, long cid, long times) throws VoltAbortException {          Long[] tmp = {sid, cid, times};          inserts.put(rid, tmp);          if(inserts.size() == 1000){                  Set insertSet = inserts.keySet();                  for ( Iterator i = insertSet.iterator(); i.hasNext(); ){                          long reqid =  i.next();                          voltQueueSQL(sql, reqid, inserts.get(reqid)[0], inserts.get(reqid)[1], inserts.get(reqid)[2]);                  }                  voltExecuteSQL();                  inserts = new HashMap();         }          return null;         } }

      Comment


      • #4
        re: Bulk Insert

        Originally posted by whitesensless View Post
        Hi Tim,

        thx for your response.

        I tried your first suggestion:

        In a stored procedure you can batch up multiple SQL statements (up to 1,000 at a time) by calling voltQueueSQL() for each statement then a single call to voltExecuteSQL().
        ..


        I think you misinterpreted my suggestion, the following stored procedure "InsertOne" will perform extremely well as below. There is no need to batch up more than a single insert within the call.

        Can you try running this and let me know how it performs in your environment? When you respond please include the OS/RAM/CPU of the clients and servers.

        -Tim

        Code:
        @ProcInfo(
            partitionInfo = "ROH.RID: 0",
            singlePartition = true
        )
        
        public class InsertOne extends VoltProcedure {
            public final SQLStmt sql = new SQLStmt("INSERT INTO ROH(RID, SID, CID, times ) VALUES (?,?,?,?);");
        
            public VoltTable[] run (long rid, long sid, long cid, long times) throws VoltAbortException {
        
            voltQueueSQL(sql, rid, sid, cid, times);
        
            return voltExecuteSQL();
        }

        Comment

        Working...
        X