Forum: Building VoltDB Applications

Post: Performance for reading goes down when kfactor>0

Anton
Apr 24, 2015
Hi all,

I created a simple test that reads several tables, and found the following:
- when kfactor=0, reading takes about 14 seconds
- when kfactor=1, reading takes about 37 seconds.

I change only kfactor and restart the database. All other parameters (data, partitions, schema, everything) are identical. Could you please explain this weird behavior?

Thank you.
pzhao
Apr 24, 2015
Anton,

Kfactor controls the number of copies of the data a cluster keeps. This provides availability and durability in the event of a node going down. If kfactor is 0, there's no copy of the data and a failure on a node will result in cluster failure. If kfactor is 1, there's a copy of the data and a failure on a node will not result in cluster failure.

To help explain the difference kfactor makes, let's assume sites/host=2, nodes=2 and kfactor=0. Your data will be split into 4 unique partitions ((2 nodes * 2 sites/host) / (0 kfactor + 1)) across 2 nodes. VoltDB partitions data automatically, but for simplicity, let's say partitions 1/2 are on node 1 and 3/4 are on node 2. When you execute a query with no WHERE clause, every partition needs to return its data to the coordinator, which then returns it to the user. All 4 partitions are actively processing the workload.
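To make the simplified layout concrete, here is a small Python sketch. `toy_layout` is a hypothetical helper and NOT VoltDB's actual placement algorithm: it just numbers the sites consecutively across nodes and wraps around after the number of unique partitions, which reproduces the "1/2 on node 1, 3/4 on node 2" picture.

```python
from typing import Dict, List


def toy_layout(nodes: int, sites_per_host: int, kfactor: int) -> Dict[int, List[int]]:
    """Map each node (1-based) to the partition ids on its sites.

    Illustration only: this is NOT VoltDB's real placement algorithm. It numbers
    sites consecutively across nodes and wraps around after the number of unique
    partitions, so every partition gets exactly kfactor + 1 copies.
    """
    if kfactor >= nodes:
        raise ValueError("kfactor must be smaller than the number of nodes")
    total_sites = nodes * sites_per_host
    copies = kfactor + 1
    if total_sites % copies:
        raise ValueError("nodes * sites_per_host must be divisible by kfactor + 1")
    unique = total_sites // copies
    layout = {}
    for node in range(nodes):
        first_site = node * sites_per_host
        # Because kfactor < nodes, unique >= sites_per_host, so no node
        # ends up holding two copies of the same partition.
        layout[node + 1] = [(first_site + s) % unique + 1 for s in range(sites_per_host)]
    return layout


print(toy_layout(nodes=2, sites_per_host=2, kfactor=0))  # {1: [1, 2], 2: [3, 4]}
```

With the same toy rule, `toy_layout(nodes=2, sites_per_host=2, kfactor=1)` gives `{1: [1, 2], 2: [1, 2]}`: each node holds a full copy, and only two distinct partitions exist.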

Now let's change to kfactor=1. This means you have only 2 unique partitions ((2 nodes * 2 sites/host) / (1 kfactor + 1)), but each one has a copy. The partitions are now 1/1/2/2: VoltDB still partitions the data automatically, and nodes 1 and 2 each hold a copy of partitions 1/2. When you execute a query with no WHERE clause, only 2 partitions need to return their data to the coordinator, which then returns it to the user. This actually takes longer, because only 2 partitions are each working on 50% of the data, as opposed to 4 partitions each working on only 25% of the data. This is what you're experiencing.
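A rough way to see the effect is to compute, for each kfactor, how many unique partitions share a full scan and what slice of the rows each one handles. This is a sketch under the simplified model described here (replicas hold copies of the same rows, they do not split the scan); the function names are made up for illustration.

```python
from fractions import Fraction


def unique_partitions(nodes: int, sites_per_host: int, kfactor: int) -> int:
    """(nodes * sites/host) / (kfactor + 1), as stated in this thread."""
    total_sites = nodes * sites_per_host
    copies = kfactor + 1
    if total_sites % copies:
        raise ValueError("nodes * sites_per_host must be divisible by kfactor + 1")
    return total_sites // copies


def scan_share_per_partition(nodes: int, sites_per_host: int, kfactor: int) -> Fraction:
    """Fraction of a table's rows each unique partition scans in a full scan.

    Simplified model: replicas hold copies of the same rows, so they do not
    split the scan; only the unique partitions divide the work.
    """
    return Fraction(1, unique_partitions(nodes, sites_per_host, kfactor))


for k in (0, 1):
    parts = unique_partitions(nodes=2, sites_per_host=2, kfactor=k)
    share = scan_share_per_partition(nodes=2, sites_per_host=2, kfactor=k)
    print(f"kfactor={k}: {parts} partitions scan in parallel, {share} of the rows each")
```

With nodes=2 and sites/host=2 this reports 4 partitions at 1/4 of the rows each for kfactor=0, versus 2 partitions at 1/2 each for kfactor=1, so each execution engine has twice as many rows to scan.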

Assuming your hardware has the capacity, you can get back to the kfactor=0 latency by doubling sites/host. That is, sites/host=4, nodes=2 and kfactor=1 yields 4 unique partitions ((2 nodes * 4 sites/host) / (1 kfactor + 1)).
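The same formula can be inverted to find how many sites/host are needed to reach a target number of unique partitions. `sites_per_host_needed` is a hypothetical helper that assumes the simple (nodes * sites/host) / (kfactor + 1) relationship from this thread and that the hosts have spare cores for the extra sites.

```python
def sites_per_host_needed(nodes: int, target_partitions: int, kfactor: int) -> int:
    """Hypothetical helper: invert (nodes * sites/host) / (kfactor + 1).

    Returns the sites/host needed so the cluster has `target_partitions`
    unique partitions at the given kfactor.
    """
    total_sites = target_partitions * (kfactor + 1)  # every partition needs kfactor + 1 sites
    if total_sites % nodes:
        raise ValueError("the required sites cannot be spread evenly across nodes")
    return total_sites // nodes


baseline = (2 * 2) // (0 + 1)  # 4 unique partitions with nodes=2, sites/host=2, kfactor=0
print(sites_per_host_needed(nodes=2, target_partitions=baseline, kfactor=1))  # 4
```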

May I ask what kind of application you are building? Typically, VoltDB applications would not require a full table scan of multiple tables.

Peter Zhao
Anton
Apr 27, 2015
Thank you for the answer. I understand why writing takes more time; that is clear, and the same goes for reading from several partitions.
But why does reading from a single partition take more time? I created a single-partition stored procedure (SP) for reading; I did not try to get information from several partitions.
pricesj
Apr 27, 2015
Hi Anton, just to rephrase what Peter said.

When you change the kfactor to 1, you halve the number of unique partitions, which has the net effect of halving the number of execution engines and hence the number of parallel queries. So if you have the hardware capacity, double your sites per host and you should see query execution times similar to before.

Stephen
Anton
Apr 27, 2015
I see. Thank you.