Benchmark: MULTI HOSTS IS SLOWER THAN SINGLE HOST??


  • Benchmark: MULTI HOSTS IS SLOWER THAN SINGLE HOST??

    Hi all,
    I am new to VoltDB.

    I wrote a new Java app, modeled on the helloworld example, that inserts data into a table and then looks data up in that table.
    - Table: SUB_INFOR (SUB_ID, ISDN, PRODUCT_CODE, STATUS), partitioned by ISDN. The table has 100,000 rows.
    - A text file with 100,000 ISDNs; for each ISDN, the app looks up the additional information in the table above.

    With a single host A, the run takes ~30 seconds.
    With 3 hosts (same hardware as A), the run takes ~100 seconds.

    I tried both single-partition and multi-partition versions for table SUB_INFOR, and the result is the same :(

    Deployment file:
    <?xml version="1.0"?>
    <deployment>
    <cluster hostcount="3" sitesperhost="2" kfactor="0" />
    <httpd enabled="true">
    <jsonapi enabled="true" />
    </httpd>
    </deployment>


    Can anyone explain why??

    Thanks in advance!

  • #2
    Hi Leobon,

    It looks like the procedure you use may not have been partitioned. Were you using a stored procedure or ad hoc queries? Can you share the queries?

    What exactly were you measuring in the 30-second and 100-second durations?

    Thanks,
    Ning

    • #3
      Hi nshi,
      Here is my DDL:
      CREATE TABLE SUB_INFOR (
      SUB_ID VARCHAR(10),
      ISDN VARCHAR(15) NOT NULL,
      PRODUCT_CODE VARCHAR(20) NOT NULL,
      STATUS VARCHAR(1),
      PRIMARY KEY (ISDN)
      );

      PARTITION TABLE SUB_INFOR ON COLUMN ISDN;

      CREATE PROCEDURE FROM CLASS voltdbapptest.procedures.Insert;
      CREATE PROCEDURE FROM CLASS voltdbapptest.procedures.Select;
      CREATE PROCEDURE FROM CLASS voltdbapptest.procedures.Delete;

      PARTITION PROCEDURE Insert ON TABLE SUB_INFOR COLUMN ISDN;
      PARTITION PROCEDURE Select ON TABLE SUB_INFOR COLUMN ISDN;


      I used stored procedures, e.g.: response = client.callProcedure("Select", isdn);

      Here are my queries:
      INSERT INTO SUB_INFOR VALUES (?, ?, ?, ?);
      SELECT SUB_ID, PRODUCT_CODE, STATUS FROM SUB_INFOR WHERE ISDN = ?;


      The time is measured from just before the first VoltDB access to just after the last one. It includes the time to read from and write to files, but that time is not the problem.
      Last edited by leobon; 02-17-2014, 09:46 PM.

      • #4
        Hi leobon,

        The procedures and tables are properly partitioned, so they should be fine.

        Originally posted by leobon:
        "I used procedures, e.g.: response = client.callProcedure("Select", isdn);"

        You were calling the procedure synchronously, which means each call blocks until its response is received. If your client has only one thread calling Select, this essentially limits throughput to one call at a time, and the database is idle most of the time. For example, if the latency of one Select call is 1 ms, you can make at most 1 second / 1 ms = 1000 calls per second.

        With multiple hosts, some percentage of your requests may be rerouted to a different host. Each rerouted request incurs an extra network round trip, which increases its latency a little. I think that is why your client got slower with multiple hosts.
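To make the arithmetic in post #4 concrete, here is a minimal sketch that applies it to the numbers from the original question (100,000 lookups in ~30 s on one host and ~100 s on three). The class and method names are made up for illustration, and the computed latencies are only upper bounds, since the poster's timings also include file I/O.

```java
class SyncLatencyMath {
    /** Average time per call in milliseconds, given a total run time and a call count. */
    static double avgLatencyMs(double totalSeconds, long calls) {
        return totalSeconds * 1000.0 / calls;
    }

    /** Upper bound on calls per second for ONE thread that blocks on every call. */
    static double maxSyncCallsPerSecond(double latencyMs) {
        return 1000.0 / latencyMs;
    }

    public static void main(String[] args) {
        // Figures taken from the first post; everything else is derived.
        double singleHostMs = avgLatencyMs(30, 100_000);
        double threeHostMs = avgLatencyMs(100, 100_000);
        System.out.println("single host: " + singleHostMs + " ms/call, at most "
                + maxSyncCallsPerSecond(singleHostMs) + " calls/s");
        System.out.println("three hosts: " + threeHostMs + " ms/call, at most "
                + maxSyncCallsPerSecond(threeHostMs) + " calls/s");
    }
}
```

Under this model the whole slowdown is explained by per-call latency rising from roughly 0.3 ms to roughly 1 ms, which is consistent with an extra network hop on rerouted requests rather than with the database being overloaded.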

        There are two things you can do to improve the performance dramatically:
        1. Use asynchronous invocation or multiple threads on the client, e.g. http://voltdb.com/docs/PerfGuide/Hello2Async.php
        2. Connect to all hosts using the same client instance. This routes each request directly to the proper host, saving a network round trip.
        Ning

        • #5
          Thanks for your response, nshi!

          Originally posted by nshi:
          "There are two things you can do to improve the performance dramatically,
          1. use asynchronous invocation or multiple threads on the client, e.g. http://voltdb.com/docs/PerfGuide/Hello2Async.php
          2. connect to all hosts using the same client instance. This will route requests to the proper hosts, saving a network round-trip."

          Regarding these:
          1. I'll try it.
          2. What do you mean by "connect to all hosts using the same client instance"?

          • #6
            Originally posted by leobon:
            "2. What do you mean "connect to all hosts using the same client instance"??"

            That means calling createConnection multiple times on the same client instance to connect to all the hosts in your cluster. You can see an example at https://voltdb.com/docs/PerfGuide/Hello2Connect.php.
            Ning
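A minimal sketch of the loop described in post #6. To keep it compilable without the VoltDB jar, it uses a hypothetical one-method stand-in interface (`Connector`) that mirrors only the `createConnection` call named above; the class name, the helper `connectToAll`, and the host names are assumptions for illustration, not part of the VoltDB API.

```java
import java.util.Arrays;
import java.util.List;

class ConnectAllHosts {
    /** Hypothetical stand-in for the single client method this sketch needs. */
    interface Connector {
        void createConnection(String host) throws Exception;
    }

    /**
     * Calls createConnection once per host on the SAME client instance.
     * A failure on one host is logged and does not stop the others.
     * Returns the number of hosts that connected successfully.
     */
    static int connectToAll(Connector client, List<String> hosts) {
        int connected = 0;
        for (String host : hosts) {
            try {
                client.createConnection(host);
                connected++;
            } catch (Exception e) {
                System.err.println("Could not connect to " + host + ": " + e.getMessage());
            }
        }
        return connected;
    }

    public static void main(String[] args) {
        // Placeholder host names and a fake connector that only prints.
        List<String> hosts = Arrays.asList("hostA", "hostB", "hostC");
        int n = connectToAll(host -> System.out.println("connecting to " + host), hosts);
        System.out.println(n + " of " + hosts.size() + " hosts connected");
    }
}
```

With a real VoltDB client object, the same helper should accept `client::createConnection` in place of the fake connector; the linked Performance Guide page remains the authoritative example.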

            • #7
              Hi nshi,
              I tried both approaches you recommended.
              The results are good.
              Thanks!

              • #8
                Hello,

                the thread was really helpful to me.
                But my question is: why is asynchronous so much faster than synchronous even with just one server? I get 12,199 transactions/s with async and 2,325 transactions/s with sync. It's a simple Java program that imports ~900,000 data items. In the asynchronous program I also wait until the client has received the last callback, because otherwise some data is lost. But I don't understand the large difference in time.

                Thanks,
                Sabrina

                • #9
                  Hi Sabrina,

                  There is an explanation of Synchronous vs. Asynchronous procedure calls in the Performance Guide here: https://voltdb.com/docs/PerfGuide/Hello2Async.php

                  Take a look especially at Figures 2.1 and 2.2. Synchronous calls are blocking, so there is a complete round trip before the next call is sent (unless you have a multi-threaded client, but even then there is waiting, and it can take a lot of threads to generate a continuous, high-velocity stream of requests to the database). Asynchronous calls allow a single-threaded client to do just that: continuously send requests and continuously receive responses on the callback thread.

                  Ben
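The difference Ben describes can be reproduced without a database. The sketch below is a simulation only: a scheduled executor plays the role of a server with a fixed round-trip latency, and `CompletableFuture` stands in for the client's callback mechanism. None of these names come from the VoltDB API, and the latency and call count are arbitrary assumptions.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class SyncVsAsyncDemo {
    /** Simulated round trip: the response completes latencyMs after the request is sent. */
    static CompletableFuture<String> call(ScheduledExecutorService server, String key, long latencyMs) {
        CompletableFuture<String> response = new CompletableFuture<>();
        server.schedule(() -> { response.complete("row-for-" + key); }, latencyMs, TimeUnit.MILLISECONDS);
        return response;
    }

    /** Synchronous style: block on each response before sending the next request. */
    static long runSync(int calls, long latencyMs) {
        ScheduledExecutorService server = Executors.newSingleThreadScheduledExecutor();
        long start = System.nanoTime();
        try {
            for (int i = 0; i < calls; i++) {
                call(server, "isdn" + i, latencyMs).join();
            }
        } finally {
            server.shutdown();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** Asynchronous style: send everything, count responses in callbacks, wait for the last one. */
    static long runAsync(int calls, long latencyMs) {
        ScheduledExecutorService server = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch pending = new CountDownLatch(calls);
        long start = System.nanoTime();
        try {
            for (int i = 0; i < calls; i++) {
                call(server, "isdn" + i, latencyMs).thenAccept(row -> pending.countDown());
            }
            pending.await(); // do not exit before the last callback has run
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            server.shutdown();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println("sync:  " + runSync(50, 5) + " ms");
        System.out.println("async: " + runAsync(50, 5) + " ms");
    }
}
```

In the synchronous run the latencies add up, so the total grows with the number of calls. In the asynchronous run the round trips overlap, so the total stays close to a single round trip. The latch mirrors Sabrina's point about waiting for the last callback before the program ends.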

                  • #10
                    Hello Ben,

                    Thanks for your reply. Your explanation is clear, but if I am using 1 server with 3 partitions, shouldn't async be at most 3 times faster? In my case the asynchronous version is 6 times faster than the synchronous one ;) The reason is not apparent to me.

                    Can you give me a detailed explanation?

                    Thanks,
                    Sabrina

                    • #11
                      Hi Sabrina,

                      We recommend setting sitesperhost to somewhere between 1/2 and 3/4 of the total number of cores (counting 2x for CPUs with hyperthreading), but it's always good to experiment to see where you get the best results. In many cases sitesperhost=3 is 3x faster than sitesperhost=1.

                      The client side can often be the bottleneck too, especially if you use synchronous calls because they are blocking so the calling thread cannot send another request until the response has been received. The entire round-trip time for a response is essentially in the critical path. You can add more threads to parallelize the work, but it may need hundreds of threads to generate requests as fast as the database can process them. Another option is a single-threaded client that uses asynchronous calls. Because they are not blocking, a single thread can send many requests per second. The responses are received on another thread. This generally can generate requests faster than the database can process them, until you get to high levels of throughput (>200K/sec) where you may need more threads or more client instances.

                      If you were to use synchronous calls and start with just 1 thread, then add threads incrementally, you would see each thread adding 1x to the throughput until you reached a point of diminishing returns and it flattened to a constant rate of throughput, which would be the full capacity of the database. But if you use asynchronous calls, in most cases you immediately jump to the full capacity of the database. We call it "fire-hosing" the database.

                      Ben
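The sizing guidance in post #11 can be written down as a small model. This is a rough sketch under simplifying assumptions: each synchronous thread contributes exactly one call per round trip, and the database has a hard capacity ceiling. The class name, method names, and example figures are hypothetical and are not measurements from this thread.

```java
class ClientSizing {
    /** Suggested sitesperhost range: 1/2 to 3/4 of the logical core count (integer division). */
    static int[] sitesPerHostRange(int logicalCores) {
        return new int[] { logicalCores / 2, logicalCores * 3 / 4 };
    }

    /** Synchronous throughput: each thread adds 1000/latencyMs calls/s until capacity is reached. */
    static double syncThroughput(int threads, double latencyMs, double capacityPerSec) {
        return Math.min(threads * 1000.0 / latencyMs, capacityPerSec);
    }

    /** Number of blocking threads needed before the database, not the client, is the limit. */
    static long threadsToSaturate(double latencyMs, double capacityPerSec) {
        return (long) Math.ceil(capacityPerSec * latencyMs / 1000.0);
    }

    public static void main(String[] args) {
        int[] range = sitesPerHostRange(8);
        System.out.println("8 logical cores: sitesperhost between " + range[0] + " and " + range[1]);
        // Assumed figures: 1 ms round trip, 50,000 transactions/s database capacity.
        for (int threads : new int[] { 1, 10, 50, 100 }) {
            System.out.println(threads + " threads: " + syncThroughput(threads, 1.0, 50_000) + " calls/s");
        }
        System.out.println("threads to saturate: " + threadsToSaturate(1.0, 50_000));
    }
}
```

In this model one blocking thread at 1 ms per call reaches only a small fraction of the assumed capacity, which is why the ratio between asynchronous and single-threaded synchronous throughput is not capped by the number of partitions. The ratio depends on how much of each round trip the client spends waiting.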
