Forum: VoltDB Architecture

Post: Impacts of k-Safety

Impacts of k-Safety
dhyoon
Oct 15, 2010
I am interested in how the k-safety mechanism affects throughput and latency.


I used the TPC-C example on a system with 8 servers and 4 clients (4 partitions per server).
In this case, the throughput was 86k txn/s when k=0, 43k txn/s when k=1, and 27k txn/s when k=2.
I assume that the throughput is primarily determined by the number of unique partitions (32, 16, and 10 when k is 0, 1, and 2, respectively).
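
For reference, the partition counts quoted above follow from dividing the total number of sites by the number of copies of each partition (k + 1). A minimal sketch of that arithmetic; the function name is hypothetical, and rounding down is an assumption chosen because it matches the 10 quoted for k=2:

```python
def unique_partitions(hosts: int, sites_per_host: int, k: int) -> int:
    """Return the number of distinct partitions in the cluster.

    Each partition is stored k + 1 times, so the total number of sites
    is divided by k + 1. Rounding down is an assumption.
    """
    total_sites = hosts * sites_per_host
    return total_sites // (k + 1)


# 8 servers with 4 partitions (sites) per server, as in the post
for k in (0, 1, 2):
    print(f"k={k}: {unique_partitions(8, 4, k)} unique partitions")
```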


To measure the latency of a transaction with k-safety, I then configured the system with 1 server, 1 client, and 1 partition per server (k=0). My guess is that this configuration doesn't allow any parallel transaction processing, so the TPS can be translated to latency easily. I used 2 servers, 1 client, and 1 partition per server for k=1, and 3 servers, 1 client, and 1 partition per server for k=2.


The result was 3.2k txn/s or 0.3ms (k=0), 0.75k txn/s or 1.3ms (k=1), and 0.69k txn/s or 1.44ms (k=2). I think the penalty with k-safety is a bit high, so I doubt whether my assumptions (about the single-partition configuration) were right.
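
The conversion used here is simply latency = 1 / throughput. A short sketch of that arithmetic, using the exact single-partition figures reported later in the thread (3282, 756, and 693 tx/s); it is only meaningful under the post's own assumption that exactly one transaction is in flight at a time:

```python
def tps_to_latency_ms(tps: float) -> float:
    """Convert throughput to per-transaction time in milliseconds.

    Assumes strictly serial execution: one transaction in flight at a time.
    """
    return 1000.0 / tps


for k, tps in ((0, 3282), (1, 756), (2, 693)):
    print(f"k={k}: {tps} tx/s is about {tps_to_latency_ms(tps):.2f} ms per transaction")
```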


I also ran the same experiment with 2 partitions per server, which gave around 7k txn/s in all 3 cases (k = 0, 1, and 2; a higher k only slightly degrades throughput).


Can anybody explain the mechanism by which k-safety impacts performance (latency and throughput), and, if I was totally wrong, suggest a good way of measuring transaction latency with different kFactors?

Thanks.
re: Impacts of k-Safety
tcallaghan
Oct 19, 2010
Dhyoon,


Can you provide details on the hardware/OS of all the machines in your benchmark? Specifically CPU model + quantity, RAM amount + speed, networking equipment, etc.


You are correct in that the throughput of a k=1 cluster will be about 50% of k=0.


As for latency, be careful measuring latency as 1/(transactions-per-second). If you look at the "Voter" application in the VoltDB examples, you'll see a better way of measuring latency. We do not currently have these latency measurements in our "benchmarking" framework, which I believe you used for your testing.
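
As a generic illustration of measuring latency directly rather than deriving it from TPS, the sketch below times each call individually and summarizes the samples. This is not the Voter application's code; `measure_latencies`, `summarize`, and `fake_transaction` are hypothetical names, and `fake_transaction` is a stand-in for a real client call.

```python
import statistics
import time


def measure_latencies(call, n):
    """Invoke `call` n times and return each call's duration in milliseconds."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def summarize(samples_ms):
    """Return min, average, approximate 99th percentile, and max latency."""
    ordered = sorted(samples_ms)
    p99_index = min(len(ordered) - 1, int(0.99 * len(ordered)))
    return {
        "min": ordered[0],
        "avg": statistics.mean(ordered),
        "p99": ordered[p99_index],
        "max": ordered[-1],
    }


def fake_transaction():
    # Placeholder for a real procedure call; sleeps for roughly 1 ms.
    time.sleep(0.001)


if __name__ == "__main__":
    print(summarize(measure_latencies(fake_transaction, 100)))
```

Per-call timing stays valid when many transactions are in flight at once, which is where 1/TPS stops describing latency.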


-Tim
Hey, My system has dual-core
dhyoon
Oct 20, 2010
Hey,


My system has dual-core Opteron processors. I used 8 servers for TPC-C processing and 4 servers as clients. Each server has 4 logical processors (I believe 2 concurrent threads per core) and 32GB of memory, but I only allocated 4GB in the TPC-C build.xml file.


So, my understanding is that k=1 has 50% and k=2 only 33% of the throughput of k=0. My question is what causes this degradation. Is it the reduced number of unique partitions?
I expected k-safety to degrade performance in other ways as well; for instance, every transaction needs to communicate with other servers, which increases the latency of each transaction.


Then, as long as we have the same number of unique partitions, can k=2 perform the same as k=0? (For instance, what if we compare 24 servers with k=2 against 8 servers with k=0?)


-Doe Hyun
re: Impacts of k-Safety
tcallaghan
Oct 21, 2010
Doe,


Thanks for the further clarification. You are correct that the total number of unique partitions directly affects the total TPS in a system. Thus, increasing the k-factor will reduce the number of unique partitions and overall TPS.


You are also correct that an increased k-factor will have some impact on system latency.


Can you provide the specific details of your Opteron processors? I have 3 servers in-house with Quad-Core Opteron 2376 processors and would like to compare my results with yours. You can get the details easily by running "/usr/sbin/dmidecode" on one of the boxes.


-Tim
system details
dhyoon
Oct 28, 2010
Hi, sorry for this late reply.


The system is an HP ProLiant BL465c G1; the processor is an AMD dual-core Opteron 2216 HE running at 2.4GHz.
I can see four logical processors assuming hyperthreading is enabled.


-Doe Hyun
>>Then, as long as we have
gambitg
Oct 27, 2010
>>Then, as long as we have the same number of unique partitions, can k=2 perform the same as k=0? (For instance, what if we compare 24 servers with k=2 against 8 servers with k=0?)


That is an interesting theory, Doe. It would be interesting to try it out and see the results.
I doubt it will behave that way, though. Throughput drops because several writes have to happen before a write is considered committed (two for k=1 and three for k=2). It will definitely get better because of the increased parallelism, but I doubt 24 servers with k=2 will give exactly the same throughput as 8 servers with k=0.
More tests and questions
dhyoon
Oct 28, 2010
Hi,


I didn't have a chance to run with 24 servers, so I tested with 2 (k=0), 4 (k=1), and 6 (k=2) hosts.


In each case, I used 4 clients and 3 sites per host, for a total of 6 unique partitions in all cases.


The TPC-C results are:


19894 tx/s when k=0
19701 tx/s when k=1
19157 tx/s when k=2


So, roughly the same throughput regardless of k-safety when the number of unique partitions is identical.


Well, the result was completely different when I used 1 unique partition (1 site per host; 1 server with k=0, 2 servers with k=1, and 3 servers with k=2).


It was 3282 tx/s (k=0), 756 tx/s (k=1), and 693 tx/s (k=2).


When I changed sites per host to 2 (so 2 unique partitions), the throughput became comparable across k values, as in the 2/4/6 server case.


I wonder why k=0 is not substantially better than k=1 or k=2 when the number of unique partitions is the same, and what makes the 1 unique partition case so different.


-Doe Hyun
re: Impacts of k-Safety
tcallaghan
Oct 28, 2010
Gambitg,


I ran 3 benchmarks on a different application we call 20-index on a 6-node cluster running VoltDB v1.2.03. The results are as follows:


k=0, 498,233 TPS
k=1, 263,705 TPS
k=2, 178,260 TPS
I do believe that these numbers would hold up with a larger number of servers. If/when I get the whole 12-node cluster, I'll rerun and share my numbers.
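
To put these three figures next to the 1/(k+1) rule of thumb discussed earlier in the thread, here is a small sketch that computes each result as a fraction of the k=0 number. The TPS values are the ones listed above; the function name is hypothetical.

```python
# TPS figures from the 6-node 20-index runs listed above
measured_tps = {0: 498_233, 1: 263_705, 2: 178_260}


def relative_throughput(measured):
    """Return each k's throughput as a fraction of the k=0 throughput."""
    baseline = measured[0]
    return {k: tps / baseline for k, tps in measured.items()}


for k, ratio in relative_throughput(measured_tps).items():
    print(f"k={k}: measured {ratio:.3f} of k=0, 1/(k+1) = {1 / (k + 1):.3f}")
```

Both measured ratios come out slightly above 1/(k+1), so on this fixed 6-node cluster the throughput tracks the number of unique partitions closely.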


-Tim