Forum: Building VoltDB Applications

Post: Stop node during transaction

vdbusr
Mar 15, 2016
Hi,

We detached one of the nodes from the cluster for maintenance by killing its VoltDB process.
At that time, the following error occurred:

org.voltdb.client.ProcCallException: Connection to database host (hostname and IP of one of 5nodes) was lost before a response was received

We assumed the result would still be received, because two other nodes hold replicas of the partition and can respond to the client.
(My environment is VoltDB 4.9.3, hostcount = 5, kfactor = 2)

I have two questions:
1. Does a "SELECT" procedure execute on only one node regardless of kfactor?
2. Can I avoid the procedure being interrupted when a node is detached?
(Can "@StopNode" avoid this exception? Or should I handle it in the procedure or in the client app?)
bballard
Mar 15, 2016
Yes, there is an optimization so that a read-only transaction will only run on one node. Writes are executed simultaneously on all partition replicas in the cluster based on the k-factor. But what is important is which node received the request and which single-partition-initiator (SPI) it is queued in. If that is the node that is stopped, there won't be a response. After the node is stopped, the call could be made to one of the remaining nodes and it would execute and send the response.

@StopNode is basically equivalent to killing the VoltDB process on one node. It won't wait for anything to complete first, it just stops immediately. If the client is connected to all nodes, there will be a certain percentage of the outstanding requests at that moment that will not get a response before the connection is lost. It's possible some of those were committed but the response didn't go out in time, in which case the other nodes will reflect the result of the transaction. All the procedure calls that were waiting (in queues on that node) to be processed will never get a response.

Let's say you have a stream of uniquely identified records you are passing into procedure calls and you want to ensure nothing is missing while taking a node down. There are two ways you can do this:

1. If you receive a response with the result that the transaction was committed, you know it was committed and durable. If you didn't, you may need to check if the record is in the database, and if not, make a new procedure call. Or, if your procedure is idempotent, you can simply make a new call for all the responses you didn't receive.

2. Pause the database momentarily before stopping the node. Design your client so when the database is paused it waits before sending any more requests. Then stop the node and resume the database. This way, when the other nodes take over the duty of managing the single-partition initiator (SPI) queues from the stopped node, nothing was lost because the queues were already empty.
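
A minimal sketch of option 1 in Java, for illustration only. The names `RecordStore`, `ResponseLostException`, and `LostResponseRecovery` are hypothetical and not part of VoltDB. In a real client, `insert` and `exists` would presumably wrap procedure calls, and the caught error would be the `ProcCallException` reported when the connection is lost. The idea is to remember every record whose response never arrived, then check the database and resend only the ones that are missing.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for "connection lost before a response was received". */
final class ResponseLostException extends RuntimeException {
    ResponseLostException(String message) { super(message); }
}

/** Hypothetical wrapper around two procedure calls; not a VoltDB interface. */
interface RecordStore {
    /** Writes one record; throws ResponseLostException if no response arrives. */
    void insert(String recordId);

    /** Returns true if the record is already in the database. */
    boolean exists(String recordId);
}

final class LostResponseRecovery {
    /**
     * Sends every record once, then re-checks the ones that got no response.
     * Returns the IDs that were missing from the database and had to be resent.
     */
    static List<String> sendAll(List<String> recordIds, RecordStore store) {
        List<String> unacknowledged = new ArrayList<>();
        for (String id : recordIds) {
            try {
                store.insert(id);
            } catch (ResponseLostException lost) {
                // No response: the write may or may not have been committed.
                unacknowledged.add(id);
            }
        }

        List<String> resent = new ArrayList<>();
        for (String id : unacknowledged) {
            // Only resend what is really missing, so a write that committed
            // without a response is not applied twice.
            if (!store.exists(id)) {
                store.insert(id); // a failure here propagates to the caller
                resent.add(id);
            }
        }
        return resent;
    }
}
```

If the procedure is idempotent, the `exists` check can be dropped and every unacknowledged record simply sent again.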
vdbusr
Mar 16, 2016
Thanks for the help. I think I mostly understand.

Please let me ask a question about this statement:
"If that is the node that is stopped, there won't be a response."

Does that mean the transaction completed but no response was returned?
Or does it mean the transaction will not run on any node?
rmorgenstein
Mar 16, 2016
It could be either case. Take a look at https://docs.voltdb.com/UsingVoltDB/DesignAppErrHandling.php for more information on errors.
vdbusr
Mar 25, 2016
"Writes are executed simultaneously on all partition replicas in the cluster based on the k-factor. But what is important is which node received the request and which single-partition-initiator (SPI) it is queued in. If that is the node that is stopped, there won't be a response. After the node is stopped, the call could be made to one of the remaining nodes and it would execute and send the response."

I don't think I understand this correctly.

In our environment, this issue occurs only with SELECT statements.
Write statements (insert, upsert, update, delete) return their results without any exception.
I thought this meant the remaining nodes can return the result even if the node whose SPI received the request is stopped.
Or does VoltDB internally reprocess the request if the node that received it cannot return the result?
Or is it just a matter of probability?

I am a little confused.
jhugg
Mar 25, 2016
If you send a transaction to node A, and node A fails before you get a response, it is possible that your transaction either committed or aborted. VoltDB simply guarantees that it's not in some half-finished partial state.

For read-only transactions, there is nothing to commit, so it doesn't matter whether the work was done or not. If you still want the response to the query, you will need to resend the request to a live node.
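
As a rough Java sketch of "resend the request to a live node": the helper below tries the same read-only query against each node in turn and returns the first response it gets. `ReadFailover` and `ConnectionLostException` are hypothetical names used only for this example, and each `Supplier` stands in for a call made over one client connection. This is safe only because a read has nothing to commit, so repeating it cannot change the data.

```java
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical stand-in for a lost-connection error reported to the client. */
final class ConnectionLostException extends RuntimeException {
    ConnectionLostException(String message) { super(message); }
}

final class ReadFailover {
    /**
     * Runs the same read-only query against each node in order and returns
     * the first response. Rethrows the last error if every node fails.
     */
    static <T> T readFromAnyNode(List<Supplier<T>> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("no nodes given");
        }
        ConnectionLostException last = null;
        for (Supplier<T> node : nodes) {
            try {
                return node.get();
            } catch (ConnectionLostException e) {
                // Nothing to roll back for a read; move on to the next node.
                last = e;
            }
        }
        throw last;
    }
}
```

The same loop should not be used as-is for writes, since a write whose response was lost may already have been committed.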
vdbusr
Mar 28, 2016
I understand that read-only transactions need to be retried.


"If you send a transaction to node A, and node A fails before you get a response, it is possible that your transaction either committed or aborted."

In this case, if the client is also connected to live nodes B and C, can the client receive the result of the transaction?
Or does it need to handle ProcCallException to confirm that the transaction was committed?