Forum: Building VoltDB Applications

Post: k-safety

k-safety
tuancao
May 17, 2010
Hi,

I have run the voter example with k-safety enabled. I used the same configuration Tim sent out a week ago (the last weekly update).

To recap, I used the following setup:
Servers: (6) Dell 2950 Servers, 2.66 Ghz Intel Xeon, 2 quad cores/server, 2x4MB Cache
Client: (1) same specification as servers
Switch: 2 Port Summit X450a Ethernet Switch (1 gigabit/sec).
VoltDB:(2) partitions per server

Results:
1. k=0, no rate limit on client, measured 226,657 TPS, average latency 166.64ms
2. k=0, client rate limited to 200,000 TPS, measured 193,231 TPS, average latency 20.19ms
3. k=2, client rate limited to 200,000 TPS, measured 69,964 TPS, average latency 2285.20ms
4. k=2, client rate limited to 100,000 TPS, measured 71,870 TPS, average latency 2408.83ms

So with k=2, throughput drops to roughly a third and latency increases by two orders of magnitude.

Could you tell me how you measure the latency?
Any idea about the 2 orders of magnitude increase in latency?

Thanks,
Tuan
re: k-safety
tcallaghan
May 17, 2010
Tuan,

The reason for the dramatic increase in latency is that in your Result #3 (k=2) you did not lower the rate limit on the client to account for the extra replication work. With the client rate limited to 200,000 TPS (or 100,000 TPS, as in Result #4) you are fire-hosing your cluster: the client generates far more work than the cluster can handle.

The proper rate limit for your setup would be <= 66,000 TPS (200,000 / 3). The divisor here is k + 1: since you have k=2, VoltDB stores 3 copies of every partitioned table, so your cluster has about 1/3 the number of unique partitions in which to execute stored procedures.
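As a quick back-of-the-envelope sketch of that arithmetic (the 200,000 TPS figure below is just your measured k=0 throughput, not a hard limit):

// Rough arithmetic only: with k-safety each transaction is executed on
// k+1 copies of its partition, so the sustainable client rate is roughly
// the k=0 throughput divided by (k + 1).
public class KSafetyRateLimit {
    public static void main(String[] args) {
        int k = 2;
        double k0Throughput = 200000;                   // measured k=0 rate (TPS)
        double suggestedLimit = k0Throughput / (k + 1); // ~66,667 TPS
        System.out.printf("suggested client rate limit <= %,.0f TPS%n", suggestedLimit);
    }
}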

Please re-run your k=2 test with the client rate limited to 66,000 TPS and post your results.

-Tim
re-run
tuancao
May 17, 2010
Hi Tim,

I re-ran my experiments with the client rate limit set to 66,000 TPS.

[java] *************************************************************************
[java] System Statistics
[java] *************************************************************************
[java] - Ran for 120.63 seconds
[java] - Performed 7,878,156 Stored Procedure calls
[java] - At 65,309.52 calls per second
[java] - Average Latency = 170.14 ms
[java] - Latency 0ms - 25ms = 5,538,943
[java] - Latency 25ms - 50ms = 282,721
[java] - Latency 50ms - 75ms = 131,078
[java] - Latency 75ms - 100ms = 75,787
[java] - Latency 100ms - 125ms = 60,669
[java] - Latency 125ms - 150ms = 48,098
[java] - Latency 150ms - 175ms = 47,149
[java] - Latency 175ms - 200ms = 59,488
[java] - Latency 200ms+ = 1,450,560

The latency is 170.14 ms, about 8.5 times worse than the 20.19 ms latency in the k=0 case.

By the way, how do you measure the latency? Is it the time between the client sending a request to one of the servers and receiving the reply?

Tuan
re: k-safety
tcallaghan
May 17, 2010
Tuan,

The latency is measured from the time the stored procedure is queued in callProcedure() to when the response is received from the server.
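In client terms it looks roughly like the sketch below. This is illustrative only, not the actual voter client code: the connection setup, the procedure name, and the arguments are placeholders, and the exact client API can vary slightly between versions.

// Sketch of the latency measurement: t0 when the call is queued in
// callProcedure(), t1 when the response arrives in the callback.
// The connection target and the "Vote" arguments are placeholders.
import org.voltdb.client.Client;
import org.voltdb.client.ClientFactory;
import org.voltdb.client.ClientResponse;
import org.voltdb.client.ProcedureCallback;

public class LatencySketch {
    public static void main(String[] args) throws Exception {
        Client client = ClientFactory.createClient();
        client.createConnection("server1");         // one of the cluster nodes

        final long queuedNanos = System.nanoTime(); // t0: queued in callProcedure()
        client.callProcedure(new ProcedureCallback() {
            public void clientCallback(ClientResponse response) {
                // t1: response received; latency = t1 - t0
                long latencyMs = (System.nanoTime() - queuedNanos) / 1000000L;
                System.out.println("latency = " + latencyMs + " ms, status = " + response.getStatus());
            }
        }, "Vote", 5085550022L, 1);                 // placeholder procedure + args
        client.drain();                             // wait for all outstanding responses
        client.close();
    }
}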

Can you rerun your example a few times lowering the rate limit each time (try 60000, 50000, 40000, etc.) and post your results?

-Tim
re-run
tuancao
May 17, 2010
Hi Tim,

The latency got much better after I reduced the client rate limit.

rate limit = 60,000
[java] *************************************************************************
[java] System Statistics
[java] *************************************************************************
[java] - Ran for 120.05 seconds
[java] - Performed 7,184,400 Stored Procedure calls
[java] - At 59,843.57 calls per second
[java] - Average Latency = 16.77 ms
[java] - Latency 0ms - 25ms = 6,781,505
[java] - Latency 25ms - 50ms = 34,438
[java] - Latency 50ms - 75ms = 19,475
[java] - Latency 75ms - 100ms = 16,747
[java] - Latency 100ms - 125ms = 15,580
[java] - Latency 125ms - 150ms = 13,636
[java] - Latency 150ms - 175ms = 13,022
[java] - Latency 175ms - 200ms = 12,229
[java] - Latency 200ms+ = 109,932

rate limit = 50,000
[java] *************************************************************************
[java] System Statistics
[java] *************************************************************************
[java] - Ran for 120.05 seconds
[java] - Performed 5,991,550 Stored Procedure calls
[java] - At 49,907.96 calls per second
[java] - Average Latency = 9.60 ms
[java] - Latency 0ms - 25ms = 5,792,425
[java] - Latency 25ms - 50ms = 12,251
[java] - Latency 50ms - 75ms = 9,144
[java] - Latency 75ms - 100ms = 7,554
[java] - Latency 100ms - 125ms = 6,840
[java] - Latency 125ms - 150ms = 6,697
[java] - Latency 150ms - 175ms = 4,125
[java] - Latency 175ms - 200ms = 3,353
[java] - Latency 200ms+ = 6,462

rate limit = 40,000
[java] *************************************************************************
[java] System Statistics
[java] *************************************************************************
[java] - Ran for 120.05 seconds
[java] - Performed 4,795,240 Stored Procedure calls
[java] - At 39,943.36 calls per second
[java] - Average Latency = 8.65 ms
[java] - Latency 0ms - 25ms = 4,649,065
[java] - Latency 25ms - 50ms = 7,239
[java] - Latency 50ms - 75ms = 5,131
[java] - Latency 75ms - 100ms = 4,112
[java] - Latency 100ms - 125ms = 3,806
[java] - Latency 125ms - 150ms = 2,758
[java] - Latency 150ms - 175ms = 1,831
[java] - Latency 175ms - 200ms = 1,686
[java] - Latency 200ms+ = 3,410
re: latency numbers + k-safety
tcallaghan
May 17, 2010
Those latency numbers look much better.

I have a task on my plate to create a benchmark that tests client latency vs. client TPS; I will share the results when I have them.

-Tim