Forum: VoltDB Architecture

Post: Synch Delay Overrun

Synch Delay Overrun
henning
May 16, 2010
What happens if a transaction does not arrive "in time" at a site, i.e. after the time window for its timestamp has closed on the site it was sent to and other transactions have already been processed?


Is the timestamp strictly relative to all other transactions, or an absolute time that one tries to keep in sync across sites?


Thanks,
Henning
Not how it works anymore...
jhugg
May 16, 2010
I'm presuming this is based on one of the H-Store papers, where transaction ordering is described as using a 10ms delay.


Current VoltDB uses a different scheme to ensure that transactions can't be executed out of order. I'll simplify a bit, but try to explain.


Assume there are 3 nodes A, B, C assigning timestamps (transaction ids). A site can execute work Tx from A if it has seen a Ty from B and a Tz from C such that Ty > Tx and Tz > Tx. We assume that TCP connections deliver messages in order, so once the site has seen a transaction id from node B, it will never receive an earlier transaction from B. This scheme actually has latency advantages: if all three nodes are sending work furiously, the site doesn't have to wait 10ms to execute. If one of the nodes has no work to send, it will periodically (every 5ms or so) send empty work to all sites, just so the scheme doesn't break.
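
To make that concrete, here's a minimal sketch of the rule in Java. This is purely illustrative; the class and method names are invented, and it is not VoltDB source:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.PriorityQueue;

    // Illustrative sketch of the ordering rule above; not actual VoltDB code.
    // Transaction ids are globally unique, so a site may execute the smallest
    // pending id once every initiator has sent something with a larger id.
    public class OrderingSketch {
        // latest transaction id seen from each initiator (A, B, C, ...)
        private final Map<String, Long> lastSeen = new HashMap<>();
        // pending real work, ordered by transaction id
        private final PriorityQueue<Long> pending = new PriorityQueue<>();

        public OrderingSketch(String... initiators) {
            for (String node : initiators) {
                lastSeen.put(node, Long.MIN_VALUE);
            }
        }

        // Called for real work and for empty "heartbeat" work alike.
        public void receive(String fromInitiator, long txnId, boolean isHeartbeat) {
            lastSeen.put(fromInitiator, txnId);
            if (!isHeartbeat) {
                pending.add(txnId);
            }
        }

        // The safe point is the minimum of the latest ids seen from all
        // initiators: TCP delivers in order, so no initiator can still send
        // anything at or below its own latest id.
        private long safePoint() {
            long safe = Long.MAX_VALUE;
            for (long id : lastSeen.values()) {
                safe = Math.min(safe, id);
            }
            return safe;
        }

        // Execute everything at or below the safe point, in id order.
        public void executeReady() {
            long safe = safePoint();
            while (!pending.isEmpty() && pending.peek() <= safe) {
                System.out.println("executing txn " + pending.poll());
            }
        }
    }

Note that a heartbeat only advances lastSeen without queueing anything, which is exactly why an idle node has to keep sending empty work: otherwise the safe point stalls for everyone.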


This is complicated slightly by our failure detection and replication code. We do ensure that if a transaction is applied at one partition/site pair, it will be applied at all replicas of that partition.
:-)
henning
May 18, 2010
Wow, that was a very, very interesting piece of information that had been missing for a clear picture of the inner workings of VoltDB. Thanks!


Pretty darn elegant!


Yes, that was from the H-Store paper. I would love to see a paper of this sort/size/stance about VoltDB.


"Complicated slightly" is a euphemism, is it?
Partition Sensitivity
henning
Jun 1, 2010
Just reread and rethought.

"it will periodically (5ms or so) send empty work to all sites, just so the scheme doesn't break."


That makes VoltDB rather sensitive to network congestion or partitioning, I guess. How do you overcome this?
I chose my words poorly.
jhugg
Jun 1, 2010
"That makes VoltDB rather sensitive to network congestion or partitioning, I guess. How do you overcome this?"


If a message doesn't come, nothing breaks; the site just waits for the next one. This empty work mostly matters when the workload is too light, hence the need to push things along artificially.


In the case where the pipe to an execution site is congested AND a foreign node's initiator hasn't sent it any work in 5ms, it's possible the empty work will be delayed several milliseconds and the clogged site will run slightly slower. Some new code in the 0.9.01 release made this empty-work sending much more reliable as well. In the end, it's not something I worry about a lot.
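
For illustration, the idle-initiator side of this might look roughly like the following. Again a sketch with invented names, not our actual code:

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicLong;

    // Sketch of the idle-heartbeat idea; names are invented, not VoltDB source.
    // If the initiator hasn't sent real work recently, it sends empty work
    // carrying a fresh transaction id so sites can keep advancing their safe point.
    public class HeartbeatSketch {
        private static final long HEARTBEAT_MS = 5;
        private final AtomicLong txnIdCounter = new AtomicLong();
        private final AtomicLong lastSendTime = new AtomicLong(System.currentTimeMillis());
        private final ScheduledExecutorService timer =
                Executors.newSingleThreadScheduledExecutor();

        public void start() {
            timer.scheduleAtFixedRate(() -> {
                if (System.currentTimeMillis() - lastSendTime.get() >= HEARTBEAT_MS) {
                    sendToAllSites(txnIdCounter.incrementAndGet(), true);
                }
            }, HEARTBEAT_MS, HEARTBEAT_MS, TimeUnit.MILLISECONDS);
        }

        // Real client work resets the idle timer as a side effect of sending,
        // so a busy initiator never sends heartbeats at all.
        public void sendWork() {
            sendToAllSites(txnIdCounter.incrementAndGet(), false);
        }

        private void sendToAllSites(long txnId, boolean isHeartbeat) {
            lastSendTime.set(System.currentTimeMillis());
            // network send to every execution site omitted in this sketch
        }
    }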


Network partitions are a problem in VoltDB in general, but not more so because of any of this. For now we expect you to run nodes on a single switch for this reason. One thing we're talking about is (optionally) not allowing clusters with <= n/2 nodes to do work. This would prevent a cluster from ever splitting into two still-working clusters. We also expect to support redundant switches/networks in a future version (optionally).
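
That majority rule is simple enough to show in a few lines (again just a sketch of the idea under discussion, not shipping code):

    // Sketch of the proposed majority rule; not shipping code.
    // After a network split, a surviving group may keep doing work only if it
    // holds a strict majority of the original cluster, so the cluster can
    // never split into two still-working halves.
    public class QuorumSketch {
        public static boolean mayContinue(int survivingNodes, int originalClusterSize) {
            // disallow groups with <= n/2 nodes
            return survivingNodes > originalClusterSize / 2;
        }

        public static void main(String[] args) {
            System.out.println(mayContinue(3, 6)); // false: a 3/3 split halts both halves
            System.out.println(mayContinue(4, 6)); // true: the 4-node side continues
            System.out.println(mayContinue(2, 6)); // false: the 2-node side halts
        }
    }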