Forum: Building VoltDB Clients

Post: Recovery II

Recovery II
henning
Apr 16, 2010
I wonder how a client can make sure that a given transaction will actually be executed before a snapshot is taken.
Other than by

  • waiting for it to return, which in a heavily loaded, asynchronous scenario would require some sort of bookkeeping on the client side, or
  • waiting for a fixed interval, which seems unreliable.
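To make the first option concrete: the bookkeeping it needs can be fairly small. The sketch below is a hypothetical helper, not part of the VoltDB client library. It assumes the client calls begin() just before each asynchronous procedure call and end() from that call's callback, whether the call succeeded or failed. Before requesting a snapshot, the client then waits for the count to reach zero with a timeout, instead of relying on a fixed wait interval.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical client-side helper (NOT part of the VoltDB client library):
 * counts asynchronous calls that were submitted but have not yet called back.
 */
class InFlightGate {
    private int inFlight = 0;

    /** Call just before submitting an asynchronous procedure call. */
    synchronized void begin() {
        inFlight++;
    }

    /** Call from the completion callback, whether the call succeeded or failed. */
    synchronized void end() {
        if (inFlight == 0) {
            throw new IllegalStateException("end() without matching begin()");
        }
        inFlight--;
        if (inFlight == 0) {
            notifyAll(); // wake any thread blocked in awaitIdle()
        }
    }

    synchronized int inFlight() {
        return inFlight;
    }

    /**
     * Blocks until every call begun so far has ended, or the timeout expires.
     * Returns false on timeout, so the caller knows it must NOT assume that
     * all of its transactions will be covered by a subsequent snapshot.
     */
    synchronized boolean awaitIdle(long timeout, TimeUnit unit) throws InterruptedException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (inFlight > 0) {
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;
            }
            // Releases this object's monitor while waiting, like Object.wait().
            TimeUnit.NANOSECONDS.timedWait(this, remainingNanos);
        }
        return true;
    }
}
```

A true result only means that, at the moment it was observed, every call this client had begun had also called back. Calls started afterwards, and calls from other clients, are not covered, which is exactly the ordering gap raised in this post.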

My thinking is that, because no guarantees are given about the order of execution, this becomes a potential quandary.
This is related to


Thanks!
Henning
Orderly shutdown sequence / Recovery
chbussler
Apr 17, 2010
Hi Henning,
My vision of an orderly shutdown is that the VoltDB server
- enters the shutdown state
- finishes executing all open procedures
- then takes a 'final' snapshot
- and only then actually shuts down
So I am assuming that I can tell the VoltDB server 'take a snapshot after all procedures are done', so that all clients can be sure their side effects are part of it.
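The four steps above map fairly directly onto the shutdown protocol of a standard Java ExecutorService. The sketch below is only a hypothetical model of that sequence, with a single-threaded executor standing in for the server's procedure execution; none of the class or method names come from VoltDB.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical model of the orderly shutdown sequence (names are illustrative,
 * not VoltDB's). A plain ExecutorService stands in for procedure execution.
 */
class OrderlyShutdown {
    private final ExecutorService procedures = Executors.newSingleThreadExecutor();

    /** Returns false if the server is already shutting down and rejects the call. */
    boolean submitProcedure(Runnable procedure) {
        try {
            procedures.execute(procedure);
            return true;
        } catch (RejectedExecutionException e) {
            return false; // shutdown state already entered: no new work accepted
        }
    }

    /**
     * 1) enter the shutdown state, 2) finish all open procedures,
     * 3) take the final snapshot, 4) report that it is safe to stop.
     * Returns false (and takes no snapshot) if open procedures did not
     * finish within the timeout.
     */
    boolean shutdown(Runnable finalSnapshot, long timeout, TimeUnit unit)
            throws InterruptedException {
        procedures.shutdown();                              // 1) reject new procedures
        if (!procedures.awaitTermination(timeout, unit)) {  // 2) drain accepted ones
            return false;
        }
        finalSnapshot.run();                                // 3) sees all completed work
        return true;                                        // 4) caller may now exit
    }
}
```

What should happen when the open procedures do not finish in time is left open here; shutdown simply reports it by returning false.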
In the recovery case, it seems to me that clients do not know about snapshots directly, so they cannot reason about the relationship between their stored procedures and scheduled snapshots. Only if a client initiates a snapshot itself, it knows that all its procedures' results are part of it.
Thanks,
Christoph
Focus on Snapshot Contents
henning
Apr 17, 2010
With this post, I was more focused on the predictability of a snapshot's contents.
But in answer to your comment: in my view, the shutdown sequence gets interesting when transactions fail while 'the last ones are being executed'. That is why 'Strategy #2' (https://community.voltdb.com/node/64#comment-138) ended up looking so tangled.
But in general, sure, the 'final shutdown snapshot' is probably what we are all circling around here.
Still, what I was actually trying to point out in this thread is that even this can sometimes fail to hold: "Only if a client initiates a snapshot itself, it knows that all its procedures' results are part of it." The reason is that there is no guaranteed execution order.
Under normal conditions, a short wait will do. But snapshots and recovery are likely to matter most precisely when a system starts to break down. Under such conditions, waiting may not be enough to be sure that even all of a client's own transactions have completed, unless the client keeps track of everything it sent.
I did not see a client method that delivers a count of outstanding asynchronous transactions. Did I miss it?
I think such a method would be both possible and valuable. Possible, because I am under the impression that there must be an internal list in the Client class that tracks callback functions (https://community.voltdb.com/node/83); if so, its length could be used. Valuable, because it would answer the very question 'are all my transactions done?' without waiting or bookkeeping.
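The kind of count asked for here can also be kept by a thin wrapper around the client. The sketch assumes, hypothetically, that each asynchronous call is tagged with a client-side handle that comes back with its response; the registry of pending callbacks then doubles as the outstanding-transaction count. It illustrates the idea only and is not a description of the Client class's actual internals.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

/**
 * Hypothetical wrapper (NOT the VoltDB Client API): every asynchronous call is
 * registered under a client-side handle until its callback fires, so the
 * number of outstanding transactions is simply the size of the map.
 */
class PendingCalls<R> {
    private final AtomicLong nextHandle = new AtomicLong();
    private final Map<Long, Consumer<R>> pending = new ConcurrentHashMap<>();

    /** Registers a callback and returns the handle to attach to the request. */
    long register(Consumer<R> callback) {
        long handle = nextHandle.incrementAndGet();
        pending.put(handle, callback);
        return handle;
    }

    /** Called when the response for a handle arrives; runs and forgets the callback. */
    void complete(long handle, R response) {
        Consumer<R> callback = pending.remove(handle);
        if (callback != null) { // unknown or duplicate responses are ignored
            callback.accept(response);
        }
    }

    /** Calls that were sent but have not been answered yet. */
    int outstandingCount() {
        return pending.size();
    }
}
```

A bare count answers 'are all my transactions done?' at one instant, but it cannot say which ones are still pending; for that, the map itself, not just its size, has to be consulted.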
The original question, though, was about all transactions from all clients: how could I draw a line in the sand and force an order between transactions and snapshots?
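One way to picture the 'line in the sand' is to let the snapshot request travel through the same ordered queue as the transactions, so that its position in the queue defines exactly what it contains. The sketch below models this with a single-threaded Java executor; it is an assumption made for illustration, not a description of how VoltDB schedules snapshots.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Hypothetical illustration: transactions and snapshot requests share ONE
 * ordered queue, so a snapshot contains exactly the transactions queued ahead
 * of it, whichever client sent them. Not a model of VoltDB internals.
 */
class SerialDatabase {
    private final ExecutorService queue = Executors.newSingleThreadExecutor();
    private final List<String> state = new ArrayList<>(); // touched only by the queue thread

    void submitTransaction(String txn) {
        queue.execute(() -> state.add(txn));
    }

    /** Queued like a transaction; the copy reflects everything ahead of it. */
    Future<List<String>> requestSnapshot() {
        Callable<List<String>> copyState = () -> new ArrayList<>(state);
        return queue.submit(copyState);
    }

    void close() {
        queue.shutdown();
    }
}
```

For example, if a client submits t1 and t2, requests a snapshot, and then submits t3, the snapshot contains t1 and t2 but never t3, regardless of timing. A client then learns what a snapshot includes from its position in the queue rather than from how long it waited.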
Different scenarios
chbussler
Apr 18, 2010
Hi Henning,
I see where you are going with this, thanks for responding. It seems to me that we have different scenarios that we might distinguish, at least initially:
1) Regular shutdown (e.g. because of maintenance of the hardware, etc.). In this case the application is running fine and no error states are observed.
2) Shutdown because of problems (e.g., clients start to see a lot of aborts, etc.)
The question here is who triggers the shutdown. I originally simply wanted an administrator to initiate a shutdown of a normally running system. But as you point out, there are other cases.
Regarding the client, I think there are these cases:
a) The client needs to get its transactions through before shutdown. This is the case you describe, where the client has a failed transaction and wants to retry it. Also, in all three of your strategies, the server asks the clients to submit everything they want to execute before the shutdown.
b) Clients cannot retry. In this case, if an already submitted client transaction fails during the shutdown, no retry is possible.
c) Clients are interrupted by the shutdown. Here the server does not ask the clients which transactions they want to execute; they are simply told which transactions finished normally and which will not be executed because of the shutdown.
In case a), retries could go on endlessly, so that would have to be addressed so that it does not prevent the shutdown. In b), the client has to do bookkeeping to know which of its transactions went through and which failed. In c), the client needs to keep track of which ones succeeded and which ones VoltDB will not even try.
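For case a), the endless-retry problem can at least be bounded on the client side. The sketch below is a hypothetical retry policy: it caps the number of attempts and stops retrying once a shutdown has been announced, modeled here as a shared AtomicBoolean (an assumption; how a client would learn about a shutdown is not specified in this thread). The returned outcome is what the client would then record for cases b) and c).

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/**
 * Hypothetical client-side retry policy: attempts are capped and no new
 * attempt is started once a shutdown has been announced, so a failing
 * transaction cannot hold up the shutdown forever.
 */
class BoundedRetry {
    enum Outcome { SUCCEEDED, GAVE_UP, ABORTED_BY_SHUTDOWN }

    private final int maxAttempts;
    private final AtomicBoolean shuttingDown; // assumed to be set when a shutdown is announced

    BoundedRetry(int maxAttempts, AtomicBoolean shuttingDown) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.maxAttempts = maxAttempts;
        this.shuttingDown = shuttingDown;
    }

    /** attempt returns true on success, false on a failure that may be retried. */
    Outcome run(BooleanSupplier attempt) {
        for (int i = 0; i < maxAttempts; i++) {
            if (shuttingDown.get()) {
                return Outcome.ABORTED_BY_SHUTDOWN; // checked before every attempt
            }
            if (attempt.getAsBoolean()) {
                return Outcome.SUCCEEDED;
            }
        }
        return Outcome.GAVE_UP;
    }
}
```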
If the system gets into an unstable state and the clients cannot be sure which transactions went through, then, once the system is back up, a client must have a way to check the data to see which of its transactions went through.
In all cases the client must have a way to keep track of its transactions, I agree. However, I think a client must track each whole transaction, not just a counter, in case it has to check for successfully executed transactions after a restart.
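A minimal sketch of tracking the whole transaction rather than a counter: a hypothetical client-side journal that records the procedure name and parameters before each call and updates the status from the callback. After a restart, the entries still marked SUBMITTED are the ones whose outcome is unknown and must be checked against the data. The class, its statuses, and the procedure names in any example are illustrative assumptions; a real journal would also have to be written to durable storage before each call is sent, which this in-memory version omits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical client-side journal: stores each whole transaction (procedure
 * name plus parameters) with its last known status. In-memory only; a real
 * one would need durable storage to survive a client restart.
 */
class TransactionJournal {
    enum Status { SUBMITTED, SUCCEEDED, FAILED, NOT_EXECUTED }

    static final class Entry {
        final String procedure;
        final List<Object> params;
        Status status = Status.SUBMITTED;

        Entry(String procedure, Object... params) {
            this.procedure = procedure;
            this.params = Arrays.asList(params);
        }
    }

    private final Map<Long, Entry> entries = new LinkedHashMap<>(); // keeps submit order
    private long nextId = 0;

    /** Record the call before sending it; returns an id to use in the callback. */
    synchronized long recordSubmit(String procedure, Object... params) {
        long id = ++nextId;
        entries.put(id, new Entry(procedure, params));
        return id;
    }

    /** Called from the callback, or when the server reports a call as not executed. */
    synchronized void recordOutcome(long id, Status status) {
        Entry e = entries.get(id);
        if (e == null) {
            throw new IllegalArgumentException("unknown id " + id);
        }
        e.status = status;
    }

    /** Transactions whose fate is unknown, e.g. because the system went down. */
    synchronized List<Entry> unresolved() {
        List<Entry> result = new ArrayList<>();
        for (Entry e : entries.values()) {
            if (e.status == Status.SUBMITTED) {
                result.add(e);
            }
        }
        return result;
    }
}
```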
I also assume that the database is in a consistent state after each client transaction (in normal operation). So in the worst-case scenario, if a client creates a snapshot after each of its transactions, the series of snapshots should contain only consistent states (I agree this costs a lot of resources, but in a failing system it might get as close as possible to the last consistent state).
I am not sure where we are going with all of this, but I increasingly believe that in a production system this needs support from VoltDB; trying to solve it myself with application code alone is probably difficult.
Thanks,
Christoph