Forum: VoltDB Architecture

Post: Understanding Commit and Sync up between partitions.

interntest
Apr 9, 2015
If I understand correctly, VoltDB uses some form of consistent hashing to arrange its sites. So if I have K-safety set to 1, then for any piece of data I will have a primary partition and a replica partition. Now,

Q1: Am I right in saying that "when a primary partition receives a query, it first tees that query up to its replica. Only after hearing back from the replica about its changes, and comparing them with the changes made on the primary partition, does it either commit (if the changes match) or roll back (on a mismatch), and then acknowledge the client"? If this is not the procedure, kindly shed some light on it.

Q2: Continuing from Q1: since the primary partition tees the query up to all of its replicas, it must know the location of every replica. But if the primary partition goes down, how will the others know which replica to promote?

Q3: Take the example of Chord (a distributed hash table), where each node (host machine) is part of an identifier circle. In VoltDB's case, what are the nodes on that identifier circle (it may not exist, but I'm just drawing an analogy) - the host machines, the primary partitions, or the replica partitions? I believe it's the primary partitions. If so, do all these replicas also form some sort of secondary ring (identifier circle)?
jhugg
Apr 9, 2015
Q1: Am I right in saying that "when a primary partition receives a query, it first tees that query up to its replica. Only after hearing back from the replica about its changes, and comparing them with the changes made on the primary partition, does it either commit (if the changes match) or roll back (on a mismatch), and then acknowledge the client"? If this is not the procedure, kindly shed some light on it.


Single-partition writes are sent to the primary first. The primary establishes a definitive ordering of operations, which is distributed to the replicas. The replicas and the primary then process the operations in parallel. Responses from the replicas are sent back to the primary and collected alongside the primary's own response. If all responses match, the response is sent back to the client as committed. If one of the replicas fails, the primary waits for all remaining live replicas to respond before forwarding the answer to the client.
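
To make that flow concrete, here is a minimal sketch of the primary's commit decision as described above. All names (ReplicaResponse, resultHash, canCommit) are invented for illustration; this is not VoltDB source.

import java.util.List;

// Hypothetical sketch of the commit decision: the primary collects the
// replica responses alongside its own and acknowledges the client only
// if every live replica produced the same result.
public class CommitSketch {

    // What each replica (and the primary itself) reports after executing
    // the deterministically ordered operation.
    record ReplicaResponse(long txnId, long resultHash) {}

    // Called once all live replicas have responded.
    static boolean canCommit(ReplicaResponse primary, List<ReplicaResponse> liveReplicas) {
        return liveReplicas.stream()
                .allMatch(r -> r.txnId() == primary.txnId()
                            && r.resultHash() == primary.resultHash());
    }
}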

But what if the answers differ between the primary and a replica? By design, they should never differ. VoltDB assumes the replicas have the same state as the primary and apply the same deterministic operations in the same order to that state. The only way responses can diverge is if a user procedure contains non-deterministic logic, such as using an unseeded PRNG from the Java library instead of one from our APIs, which guarantee consistent PRNG seeding across replicas. We detect this fault using checksums and shut down the cluster; a developer then has to fix the source of the non-determinism to continue. This comes up in testing sometimes, but it is extremely rare in production, happening only when developers skip our procedure guidelines on determinism (http://docs.voltdb.com/UsingVoltDB/DesignProc.php). Note that when we do fail, we can't just roll back the transaction, because for performance reasons we don't use two-phase commit for single-partition transactions; it isn't needed for deterministic workloads.
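For example, a stored procedure that needs randomness should ask the runtime for a deterministically seeded generator rather than constructing its own. A rough sketch follows; the tokens table and column names are made up, and it assumes the VoltProcedure helper getSeededRandomNumberGenerator() for replica-consistent seeding.

import java.util.Random;
import org.voltdb.SQLStmt;
import org.voltdb.VoltProcedure;
import org.voltdb.VoltTable;

// Illustrative single-partition procedure; schema is invented for this example.
public class AssignToken extends VoltProcedure {

    public final SQLStmt insert =
        new SQLStmt("INSERT INTO tokens (user_id, token) VALUES (?, ?);");

    public VoltTable[] run(long userId) {
        // NON-DETERMINISTIC (would diverge across replicas and trip the
        // checksum mismatch): long token = new Random().nextLong();

        // Deterministic: the runtime seeds this PRNG identically on the
        // primary and every replica, so they all compute the same token.
        Random rng = getSeededRandomNumberGenerator();
        long token = rng.nextLong();

        voltQueueSQL(insert, userId, token);
        return voltExecuteSQL(true);
    }
}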

Q2: Continuing from Q1: since the primary partition tees the query up to all of its replicas, it must know the location of every replica. But if the primary partition goes down, how will the others know which replica to promote?


We use an embedded ZooKeeper cluster for certain metadata, state machines, and leader election. We have modified it to run inside the VoltDB process without any additional configuration. It also uses our failure model, so nodes can join or fail and the cluster always retains a working ZooKeeper.
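
As a generic illustration of ZooKeeper-style leader election (this is the standard ephemeral-sequential-znode recipe, not VoltDB's internal code), each candidate creates an ephemeral sequential znode; the owner of the lowest sequence number is the leader, and if it dies its znode disappears so the next candidate can be promoted. The connection string and the /election parent znode are assumptions for the sketch.

import java.util.Collections;
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Generic ZooKeeper leader-election sketch. Assumes a ZooKeeper server on
// localhost and a pre-created /election parent znode.
public class LeaderElectionSketch {

    static boolean amLeader(ZooKeeper zk, String myNodePath) throws Exception {
        List<String> candidates = zk.getChildren("/election", false);
        Collections.sort(candidates);
        String myName = myNodePath.substring("/election/".length());
        // Lowest sequence number wins; when the leader's session dies, its
        // ephemeral znode vanishes and the next candidate takes over.
        return candidates.get(0).equals(myName);
    }

    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 10_000, event -> { });
        // Ephemeral + sequential: removed automatically if this process dies.
        String myNode = zk.create("/election/candidate-", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        System.out.println(amLeader(zk, myNode) ? "promoted to leader" : "standing by");
    }
}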

Q3: Take the example of Chord (a distributed hash table), where each node (host machine) is part of an identifier circle. In VoltDB's case, what are the nodes on that identifier circle (it may not exist, but I'm just drawing an analogy) - the host machines, the primary partitions, or the replica partitions? I believe it's the primary partitions. If so, do all these replicas also form some sort of secondary ring (identifier circle)?


The partitions are logical. Partition keys hash to a ring. The ring is divided into sections owned by logical partitions. Each logical partition is owned by a primary. Each primary manages a set of replicas.
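
A rough sketch of that mapping, with the hash function and token layout invented for illustration (VoltDB's actual hashing differs): the token space is divided among the logical partitions, and a key's hash selects the partition that owns its token range.

import java.util.Map;
import java.util.TreeMap;

// Illustrative ring sketch, not VoltDB's hashing code.
public class PartitionRingSketch {

    // ring token (end of a range) -> logical partition id that owns the range
    private final TreeMap<Integer, Integer> ring = new TreeMap<>();

    public PartitionRingSketch(int partitionCount) {
        // Divide the 32-bit token space evenly among the logical partitions.
        long span = (1L << 32) / partitionCount;
        for (int p = 0; p < partitionCount; p++) {
            ring.put((int) (Integer.MIN_VALUE + (p + 1) * span - 1), p);
        }
    }

    public int partitionForKey(Object partitionKey) {
        int token = partitionKey.hashCode();  // stand-in for the real hash
        Map.Entry<Integer, Integer> owner = ring.ceilingEntry(token);
        return owner != null ? owner.getValue() : ring.firstEntry().getValue();
    }

    public static void main(String[] args) {
        PartitionRingSketch ring = new PartitionRingSketch(8);
        System.out.println("key 12345 -> partition " + ring.partitionForKey(12345L));
    }
}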
interntest
Apr 13, 2015
Thanks @jhugg