Thread: Test Performance of System Procedure LoadSinglePartitionTable

  1. #1
    New Member
    Join Date
    Mar 2017
    Posts
    8

    Test Performance of System Procedure LoadSinglePartitionTable

    Hi all
    I'm trying to write code that uses the system procedure "LoadSinglePartitionTable". Here is the code I wrote.

    Code:
    private class SendDataThreadUsingLoader extends Thread {
        private final int m_batchSize;

        public SendDataThreadUsingLoader(int batchSize) {
            m_batchSize = batchSize;
        }

        @Override
        public void run() {
            try {
                // Bulk loader for table "test": upsert disabled, no failure callback.
                VoltBulkLoader loader = client.getNewBulkLoader("test", m_batchSize, false, null);

                int rowCnt = 1; // row handle; incremented so each row gets its own
                while (true) {
                    try {
                        // Stop once the queue has been empty for 60 seconds.
                        content = dataQueue.poll(60, TimeUnit.SECONDS);
                        if (content == null)
                            break;
                        Object[] data = {content};

                        loader.insertRow(Integer.valueOf(rowCnt++), data);

                    } catch (InterruptedException e) {
                        e.printStackTrace();
                    }
                }
                loader.flush(); // send any partially filled batch
            } catch (Exception e) {
                // Also covers ExecutionException and InterruptedException from flush().
                e.printStackTrace();
            }
        }
    }
    I also wrote a version that calls a procedure I defined myself.
    Code:
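    private class SendDataThread extends Thread {
        @Override
        public void run() {
            while (true) {
                try {
                    // Stop once the queue has been empty for 60 seconds.
                    content = dataQueue.poll(60, TimeUnit.SECONDS);
                    if (content == null)
                        break;
                } catch (InterruptedException e) {
                    e.printStackTrace();
                    continue; // nothing new was read, so do not resend the previous row
                }
                try {
                    // Asynchronous call; NullCallback ignores the response.
                    client.callProcedure(new NullCallback(), "voltdbprocedure", content);
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }
    }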
    Code:
    public class voltdbprocedure extends VoltProcedure{
        static final long VOTE_SUCCESSFUL = 0;
    
        private final SQLStmt insertVoteStmt = new SQLStmt(
                "insert into test (content) values (?);");
    
        public long run(String content) {
            voltQueueSQL(insertVoteStmt, EXPECT_SCALAR_MATCH(1), content);
            voltExecuteSQL(true);
            return VOTE_SUCCESSFUL;
        }
    }

    The code is quite simple; I just want to compare the performance of the two approaches. I found that the system procedure performs worse than the procedure I wrote. I'm confused: both are asynchronous procedure calls, so why is the system procedure slower? The destination table and the input data are the same in both tests.
    Test result:
    With batchSize set to 1, so that the system procedure is called once per row, performance is about 17000 TPS.
    In the other test, my own procedure "voltdbprocedure" gets about 23000 TPS.

    Could anyone tell me the root cause of this?

  2. #2
    Super Moderator
    Join Date
    Dec 2011
    Posts
    224
    Hi Simon,

    Is your table partitioned by the content column? Also, is your procedure partitioned?

    The bulk loader API can be faster than calling individual insert procedures (at least when the batch size is higher than 1) because for the same overhead cost of invoking a procedure and having it executed, rather than inserting only one row, it can insert multiple rows. The marginal cost of inserting a few more rows within a procedure is very small, measured in microseconds.

    I'm not surprised that the bulk loader API is slower when the batch size is 1. There is a higher overhead in that case because it must form a batch and load the records into a VoltTable object, then call the LoadSinglePartitionTable procedure. The procedure iterates through the VoltTable and inserts the rows. By setting the batch size to 1, you are calling just as many procedures, but with the additional overhead of packaging up the row into a VoltTable and reading from it, rather than using primitives or simpler object arrays.
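
    To make the batching step concrete, here is a toy sketch of a loader that collects rows and issues one simulated call per full batch. It is not VoltDB code: ToyBatcher, callsFor, and the string rows are hypothetical names invented for illustration, and the "call" is only a counter. It shows that with a batch size of 1 the number of calls equals the number of rows, so batching saves nothing.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a batching loader (hypothetical, not the VoltDB API). */
final class ToyBatcher {
    private final int batchSize;
    private final List<String> pending = new ArrayList<>();
    private int calls = 0;    // simulated procedure invocations
    private int rowsSent = 0; // rows delivered across all invocations

    ToyBatcher(int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
    }

    /** Buffers one row and sends the batch as soon as it is full. */
    void insertRow(String row) {
        pending.add(row);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Sends whatever is buffered as one simulated call; no-op if empty. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        calls++;
        rowsSent += pending.size();
        pending.clear();
    }

    int calls() { return calls; }

    int rowsSent() { return rowsSent; }

    /** Number of simulated calls needed to load the given number of rows. */
    static int callsFor(int rows, int batchSize) {
        ToyBatcher batcher = new ToyBatcher(batchSize);
        for (int i = 0; i < rows; i++) {
            batcher.insertRow("row-" + i);
        }
        batcher.flush(); // send the final partial batch, if any
        return batcher.calls();
    }
}
```

    For example, ToyBatcher.callsFor(1000, 1) counts one call per row, while a larger batch size spreads the same rows over far fewer calls.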

    I suspect if you increase the batch size to 10 or 100 or 1000, you will find a point where it outperforms the individual procedure calls. Sometimes it is dramatic, sometimes it is a more modest improvement. That may depend also on the datatypes, the sizes of the columns, and the number of columns.
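
    As a rough illustration of why a larger batch should help, here is a small cost model. It assumes each call costs a fixed overhead plus a small marginal cost per row. The class name and the 50 and 2 microsecond figures are made-up assumptions for the sketch, not measurements of VoltDB, so only the trend matters: calls per second fall as the batch grows, while rows per second rise.

```java
/** Back-of-the-envelope model; all numbers are illustrative assumptions. */
final class BatchCostModel {
    /** Time for one call carrying batchSize rows, in microseconds. */
    static double microsPerCall(int batchSize, double perCallMicros, double perRowMicros) {
        return perCallMicros + batchSize * perRowMicros;
    }

    /** Procedure calls completed per second (what a TPS counter would show). */
    static double callsPerSecond(int batchSize, double perCallMicros, double perRowMicros) {
        return 1_000_000.0 / microsPerCall(batchSize, perCallMicros, perRowMicros);
    }

    /** Rows inserted per second: calls per second times rows per call. */
    static double rowsPerSecond(int batchSize, double perCallMicros, double perRowMicros) {
        return batchSize * callsPerSecond(batchSize, perCallMicros, perRowMicros);
    }

    public static void main(String[] args) {
        double perCallMicros = 50.0; // assumed fixed overhead per call
        double perRowMicros = 2.0;   // assumed marginal cost per row
        for (int batchSize : new int[] {1, 10, 100, 1000}) {
            System.out.printf("batch=%d calls/s=%.0f rows/s=%.0f%n",
                    batchSize,
                    callsPerSecond(batchSize, perCallMicros, perRowMicros),
                    rowsPerSecond(batchSize, perCallMicros, perRowMicros));
        }
    }
}
```

    In this model the row throughput levels off near 1 / perRowMicros once the fixed overhead is amortized, which matches the idea that the gain can be dramatic or modest depending on the per-row cost.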

    The other situation where individual procedure calls outperform the bulk loader is when there are data errors, because the bulk loader API performs additional retries and error handling, which is more expensive than individual procedure calls.

    Best regards,
    Ben

  3. #3
    New Member
    Join Date
    Mar 2017
    Posts
    8
    Hi Ben

    Thanks for your response. The table is partitioned on the content column, and the procedure is partitioned as well.

    The performance I am talking about here is the number of rows inserted into the table per second, rather than the procedure calls per second (TPS) shown on the monitoring web page. My understanding now is that a system procedure like LoadSinglePartitionTable needs "get data -> form a batch -> build a VoltTable -> procedure call", while an individual insert procedure needs only "get data -> procedure call". The overhead is in "form a batch -> build a VoltTable"; with a batch size of 1, that means forming a batch for every row. When I increase the batch size, I should see the TPS go down while the data throughput goes up. Is my understanding right?

    So, among the loader, individual procedures, and the default TABLENAME.insert, the loader has the best data throughput, right?

    Thanks,
    Simon

  4. #4
    New Member
    Join Date
    Mar 2017
    Posts
    8
    Hi Ben
    According to my understanding, among the loader, individual procedures, and the default TABLENAME.insert, the loader has the best data throughput, right?

    Thanks,
    Simon
