
Remediate to backpressure


  • Remediate to backpressure

    My setup is 3 VoltDB servers with kfactor 1 and 8 sites per host (the servers run on VMs with 8 cores each). I have a partitioned table with a couple of INTEGER columns only.
    My Java application is connected to all 3 servers. It asynchronously calls, in a loop, a single-partitioned stored procedure that performs an upsert.
    In practice, the application calls the stored procedure asynchronously 48k times per second, and the cluster seems to handle this fine according to the Management Center. However, I'm trying to understand the backpressure events I'm seeing on the application side in the ClientStatusListenerExt, because the CPU load on the servers is always low (i.e. 10-15%).

    Could you please help clarify why the server CPU load can be low while the server queue is full (which suggests the server cannot handle the workload)?

    I have tried tuning setMaxOutstandingTxns() in the client config, but with no visible effect. So I may be tuning the wrong queue (server queue vs. partition queue)?

    Best regards,

  • #2
    It could be networking, especially on VMs. Or it may be your client application code.
    To rule out your application code, can you run our new (in V6.1) contentionmark application with the client settings "tuples=10000" and "servers=<node1,node2,node3>"? Let us know what throughput you see.



    • #3
      Hi Ruth, thanks for your response.
      I upgraded from 5.8.1 to 6.1 and then ran the contentionmark test on my setup /1/. The throughput is about 132k/s. I still see a multitude of backpressure conditions in the status listener. Also, cluster latency is high compared to the average stored procedure execution latency (about 70 ms vs 0.01 ms); see ContentionMark_VMC.jpg.
      Does this indicate that networking is the bottleneck here, or are these figures expected?

      Thanks a lot,
      Best regards,

      ./ Performing client...
      Command Line Configuration

      displayinterval = 5
      duration = 300
      password =
      ratelimit = 2147483647
      servers = dbs0.local,dbs1.local,dbs2.local
      tuples = 10000
      user =
      warmup = 5

      00:04:25 Period Throughput 134052/s, Cumulative Throughput 132644/s, Total Failures 0
      00:04:30 Period Throughput 117219/s, Cumulative Throughput 132358/s, Total Failures 0
      00:04:35 Period Throughput 135149/s, Cumulative Throughput 132409/s, Total Failures 0
      00:04:40 Period Throughput 139209/s, Cumulative Throughput 132530/s, Total Failures 0
      00:04:45 Period Throughput 136358/s, Cumulative Throughput 132597/s, Total Failures 0
      00:04:50 Period Throughput 126672/s, Cumulative Throughput 132495/s, Total Failures 0
      00:04:55 Period Throughput 142192/s, Cumulative Throughput 132660/s, Total Failures 0
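
As a rough way to read the figures in this post, Little's law (average calls in flight = throughput x average latency) can be applied to the reported numbers. The sketch below assumes the 70 ms cluster latency is an average end-to-end figure and that the 132,644/s cumulative throughput is steady; the class and method names are illustrative only.

```java
// Little's law: average calls in flight = throughput x average latency.
// Inputs are the figures quoted in the post; treating them as steady-state
// averages is an assumption.
class InFlightEstimate {

    // Returns the average number of calls in flight for a given
    // throughput (calls per second) and latency (milliseconds).
    static double inFlight(double throughputPerSec, double latencyMillis) {
        return throughputPerSec * (latencyMillis / 1000.0);
    }

    public static void main(String[] args) {
        double throughput = 132_644;   // cumulative throughput from the run
        double atClusterLatency = inFlight(throughput, 70.0);   // cluster latency
        double atExecLatency = inFlight(throughput, 0.01);      // procedure execution latency

        System.out.printf("in flight at 70 ms:   %.0f%n", atClusterLatency);
        System.out.printf("in flight at 0.01 ms: %.2f%n", atExecLatency);
    }
}
```

If that reading holds, on the order of 9,000 calls are in flight on average while execution itself accounts for only about one of them. The remainder is waiting somewhere between client and partition (network or queues); this calculation alone cannot say which.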


      • #4
        What this says is that it can certainly go faster. This sounds like it might be your client. Is there more in your loop than you think? Take a look at the contentionmark source; there isn't much to it. If you share your client source, we can take a look. Another possibility is that you're sending or returning more data and it is just networking. Contentionmark sends 2 longs and returns a status, but no data. Are you returning a lot more?

        Basically, see if you can get your application to look more like ours and it will get higher throughput. Then try rate limiting it a little so that you're not firehosing. When you do this, your latencies will come down.
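
One generic way to avoid firehosing from application code is to pace the calling loop to a target rate. The following is a minimal sketch in plain Java, independent of any rate limiting built into the client library; the `Pacer` class is hypothetical, and the place where the asynchronous procedure call would go is marked with a comment.

```java
import java.util.concurrent.locks.LockSupport;

// Hypothetical pacer: spaces calls evenly at a target rate.
// Safe to share between threads; each acquire() reserves one send slot.
class Pacer {
    private final long intervalNanos;
    private long nextSlotNanos;

    Pacer(double callsPerSecond) {
        this.intervalNanos = (long) (1_000_000_000L / callsPerSecond);
        this.nextSlotNanos = System.nanoTime();
    }

    // Blocks until this caller's reserved slot has arrived.
    void acquire() {
        long slot;
        synchronized (this) {
            long now = System.nanoTime();
            // After an idle period, restart from now instead of allowing a burst.
            if (nextSlotNanos < now) {
                nextSlotNanos = now;
            }
            slot = nextSlotNanos;
            nextSlotNanos += intervalNanos;
        }
        long remaining;
        // parkNanos may return early, so re-check until the slot time has passed.
        while ((remaining = slot - System.nanoTime()) > 0) {
            LockSupport.parkNanos(remaining);
        }
    }

    public static void main(String[] args) {
        Pacer pacer = new Pacer(100.0);   // illustrative target: 100 calls per second
        long start = System.nanoTime();
        for (int i = 0; i < 20; i++) {
            pacer.acquire();
            // The asynchronous stored procedure call would be issued here.
        }
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("20 paced calls took about " + elapsedMillis + " ms");
    }
}
```

Starting with a target somewhat below the observed maximum and raising it while watching latency is one way to find the point where queues stop building up.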


        • #5
          Thanks for the response. Yes, I'm writing 5 kB of data at a 48k tps rate, so it's a lot heavier than the contentionmark test. The Java client object is also shared by several threads, whereas the contentionmark test is single-threaded as far as I could tell from the code.
          I will investigate more on the networking side and also play around with the rate limiter functionality.
          Best regards
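
For a sense of scale on the networking side, the payload figures above can be turned into a bandwidth estimate. This is back-of-the-envelope arithmetic only: it assumes 5 kB means 5,000 bytes per call, counts payload alone (no protocol overhead, no replication traffic between nodes for kfactor 1), and assumes calls are spread evenly over the 3 servers. The class and method names are illustrative.

```java
// Rough client-to-cluster bandwidth estimate from payload size and call rate.
// Assumptions: 5 kB = 5,000 bytes, payload only, even spread over 3 servers.
class BandwidthEstimate {

    // Returns the payload bandwidth in gigabits per second (decimal units).
    static double gigabitsPerSecond(double payloadBytes, double callsPerSecond) {
        return payloadBytes * callsPerSecond * 8 / 1e9;
    }

    public static void main(String[] args) {
        double total = gigabitsPerSecond(5_000, 48_000);
        double perServer = total / 3;   // 3 servers in the cluster

        System.out.printf("total payload:      %.2f Gbit/s%n", total);
        System.out.printf("per server (of 3):  %.2f Gbit/s%n", perServer);
    }
}
```

Under these assumptions the payload alone comes to about 1.92 Gbit/s in total, or about 0.64 Gbit/s per server, which would be a large share of a 1 Gbit/s link before any overhead or inter-node traffic is counted.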