Forum: VoltDB Architecture

Post: Messages from failed sites

sdrobert
Jul 28, 2011
Hello and thanks for your earlier help,


How does VoltDB guarantee that messages from failed sites will be ignored? For example, suppose the fault handler synthesizes a 'commit' message in response to a node failing. The initiator might take that and the non-failed commits and think that a transaction is completed. In the meantime the HostMessenger deserializes a commit message from a failed site. The initiator has no idea how to handle this. Is this possible? How is it resolved if so?


Thanks,
Sean
sdrobert
Jul 29, 2011
I also noticed that in the SimpleDtxnInitiator class, a HashMap stores pending transactions. How is concurrent access to that map handled (can removeSite touch the fault at the same time as deliver)?
aweisberg
Jul 30, 2011
Hi Sean,


The initiator and its mailbox are always accessed under synchronized blocks or methods. The blocks all synchronize on the initiator instance (even in the mailbox). You are right that it is possible for a message from a failed node to arrive just as the node is being timed out. Between the initiator and the execution sites, I identified some cases where that would be harmless, some that would result in an exception that wouldn't do any damage, and some where a response might be sent to the client incorrectly. We should definitely wait for the socket to close and all messages to be delivered before reporting the fault to the fault distributor. If the socket is closed (either on the remote end, or by a timeout) then all the messages are guaranteed to be delivered before failure processing starts.
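
To make the locking pattern concrete, here is a minimal Java sketch of an initiator and a mailbox that both lock on the initiator instance, so deliver and removeSite can never interleave on the pending-transaction HashMap. This is not the VoltDB source: InitiatorSketch, MailboxSketch, the failedSites set, and the method signatures are hypothetical names made up for illustration. Dropping late messages from sites already marked as failed is also an assumption added here; the fix described above is to wait for the socket to close before reporting the fault.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; class and method names are not taken from VoltDB.
class InitiatorSketch {
    // txnId -> sites that have not responded yet. A plain HashMap is safe
    // here only because every access holds this instance's monitor.
    private final Map<Long, Set<Integer>> pending = new HashMap<>();
    // Assumption for illustration: remember failed sites to drop late messages.
    private final Set<Integer> failedSites = new HashSet<>();
    private int completed = 0;

    synchronized void createTransaction(long txnId, int... siteIds) {
        Set<Integer> waiting = new HashSet<>();
        for (int siteId : siteIds) {
            waiting.add(siteId);
        }
        pending.put(txnId, waiting);
    }

    // Fault path: stands in for the synthesized response from a failed site.
    synchronized void removeSite(int siteId) {
        failedSites.add(siteId);
        Iterator<Map.Entry<Long, Set<Integer>>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Set<Integer> waiting = it.next().getValue();
            waiting.remove(siteId);
            if (waiting.isEmpty()) {
                it.remove();
                completed++;
            }
        }
    }

    // Message path: returns false if the message was ignored.
    synchronized boolean deliver(int siteId, long txnId) {
        if (failedSites.contains(siteId)) {
            return false; // late message from a site already failed
        }
        Set<Integer> waiting = pending.get(txnId);
        if (waiting == null) {
            return false; // unknown or already completed transaction
        }
        waiting.remove(siteId);
        if (waiting.isEmpty()) {
            pending.remove(txnId);
            completed++;
        }
        return true;
    }

    synchronized int completedCount() { return completed; }

    synchronized int pendingCount() { return pending.size(); }

    // The mailbox locks on the initiator, not on itself, so the message
    // path and the fault path share a single monitor.
    static class MailboxSketch {
        private final InitiatorSketch initiator;

        MailboxSketch(InitiatorSketch initiator) {
            this.initiator = initiator;
        }

        boolean deliver(int siteId, long txnId) {
            synchronized (initiator) { // reentrant with the synchronized method
                return initiator.deliver(siteId, txnId);
            }
        }
    }

    public static void main(String[] args) {
        InitiatorSketch initiator = new InitiatorSketch();
        MailboxSketch mailbox = new MailboxSketch(initiator);
        initiator.createTransaction(1L, 1, 2);
        mailbox.deliver(1, 1L);   // response from a live site
        initiator.removeSite(2);  // site 2 fails; transaction 1 completes
        System.out.println("late message accepted: " + mailbox.deliver(2, 1L));
        System.out.println("completed: " + initiator.completedCount());
    }
}
```

Because both paths take the same monitor, a late message is processed either entirely before removeSite or entirely after it. The lock alone does not decide which order is correct, which is why draining the socket before fault processing still matters.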


-Ariel
aweisberg
Jul 30, 2011
I created tickets for both of the issues you found: https://issues.voltdb.com/browse/ENG-1617 and https://issues.voltdb.com/browse/ENG-1616