Forum: VoltDB Architecture

Post: Initiators and execution sites

Initiators and execution sites
robmartin0
Dec 23, 2013
Hi,

Im a little confused about the number of initiators. The performance guide states "There is one initiator for each unique partition, and a separate initiator for multi-partition transactions within the cluster". However the "Lifecycle of a Transaction" blog post states "There is one Initiator at each node". Could someone clarify which of these statements are correct and maybe explain the role of execution sites and how they relate to the initiators. Are the initiators or the execution sites single threaded?
If someone could explain the order in which the various components interact for both single and multi partition transactions, that would be great.
jhugg
Dec 23, 2013
What you read in the performance guide is correct for all versions of VoltDB >= 3.0. The blog post covers an older transaction coordination scheme we replaced in 3.0. The new system no longer has a global a-priori ordering for all transactions, and the cross-partition work is now serialized through one node. The benefits are much less dependency on NTP synchronization for latency and throughput, much much lower latency, and better scaling to large number of partitions. The change in cross-partition coordination has lots of implications, but the 3.x releases and upcoming 4.x releases generally have improved cross-partition performance over the 1.x and 2.x releases.
robmartin0
Dec 23, 2013
So if we have an initiator for each partition plus the multi-partition initiator, does each initiator have a corresponding site executor?
jhugg
Dec 23, 2013
The multi-partition initiator doesn't have a corresponding site executor. It does have the ability to borrow a host-local executor temporarily. The partition initiators have K+1 site executors for a given redundancy level, one of which will be located on the same host as the initiator.
robmartin0
Dec 23, 2013
I currently have a SP procedure and a MP procedure.
When I run @statistics PROCEDURE I am informed that the SP procedure ran on HOST 0 SITE_IDs 0 and 1 (with PARTITION_ID 0 and 2 respectively) and on HOST 1 SITE_IDs 0 and 1 (with PARTITION_ID 1 and 3 respectively) which is exactly what I expect.
The report also specifies that the MP procedure ran on Host 1 SITE_ID 2 and PARTITION_ID 16383. Obviously the multi-partition initiator is running on host 1 but why does it report the SITE_ID as 2. Shouldn't it borrow the site executor with ID 0 or 1 on the same host? And what does partition id 16383 represent?
Thanks
jhugg
Dec 23, 2013
The MP Initiator has it's own Site ID for message passing purposes, which is separate from the execution site ids. In this case it's 2. There are more site ids than actual execution sites, as "site" and "execution site" are not quite the same thing.

The partition id assigned to the MPI is the maximum valid partition id, which is currently 2^15-1. This might be a bit confusing for stats purposes.
robmartin0
Dec 23, 2013
Thanks for you help. That has really cleared a lot up for me.
Final question - If there is only one MP initiator then all MP procedures will need to be processed (or coordinated) on the same node. This will send requests to the executions sites for all partitions in parallel which will require network calls to other nodes before "post-processing" the results. Am I right in thinking the MP initiator will do all this in a single threaded environment processing only one MP procedure at a time? And if network latency is poor this will severely effect throughput of the MP initiator?
jhugg
Dec 23, 2013
What you describe is pretty much the case in 3.0. There are some optimizations done here and there (especially for reads), but write throughput is limited to thousands of transactions per second (or lower depending on network latency).

But this isn't a fundamental problem. In 2014, we plan to roll out a series of updates to allow certain kinds of MP transactions to run simultaneously, allowing for much higher throughput. The trick is identifying transactions that can't conflict. Reads are the obvious example, but there are many kinds of write transactions that can be run simultaneously as well.