Forum: Managing VoltDB

Post: Orderly system shutdown?

Orderly system shutdown?
chbussler
Apr 9, 2010
Hi,
I'd like to shut down my system in an orderly way:

- stop all clients from accessing the database (except the one that does the shutdown) so that no changes take place any more

- perform a snapshot save so that the last state of the database is saved

- perform the shutdown

What is the best way to prevent clients from accessing the system? Is this functionality that I have to build into the client logic, or can I tell the database to not accept client requests any more (except from the client that manages the shutdown)?

Thanks,
Christoph
Orderly shutdown
tcallaghan
Apr 9, 2010
Christoph,

The logic you outlined is our current best practice. You need to stop your client applications, take a snapshot, and shut down.

It's probably a good idea to build logic into your client applications so they can handle a "system going down" condition gracefully; otherwise the applications that rely on the client applications will most likely fail.

-Tim
Thinking aloud ...
chbussler
Apr 9, 2010
Hi Tim,

just thinking aloud here. I could create a table that indicates a client access level, like access_ok, no_access, shutting_down. And all stored procedures check that level. Based on the level, stored procedures either go ahead and continue, or report back to the client that they cannot proceed.

Since this table is read-only, clients accessing it during a snapshot save would not interfere with the snapshot save. After the shutdown, clients get the regular error they see when the database is not running.

Would that work?

The reason behind this is that (at least initially) I'd like to avoid needing another persistent data store somewhere for that (like a configuration file), having to reach all clients, or having the clients coordinate on a single machine.

Thanks,

Christoph
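
Editor's sketch of the access-level idea above, as a plain Python model rather than VoltDB code. The names control_table, guarded, set_access_level and deposit are hypothetical, and a dict stands in for the control table; in a real deployment each stored procedure would read the level from the table itself.

```python
from enum import Enum


class AccessLevel(Enum):
    ACCESS_OK = "access_ok"
    NO_ACCESS = "no_access"
    SHUTTING_DOWN = "shutting_down"


class AccessDenied(Exception):
    """Raised when a procedure may not run at the current access level."""


# Stand-ins for database tables (hypothetical names, not VoltDB objects).
control_table = {"level": AccessLevel.ACCESS_OK}
accounts = {"alice": 10}


def guarded(procedure):
    """Wrap a procedure so it checks the access level before doing any work."""
    def wrapper(*args, **kwargs):
        level = control_table["level"]
        if level is not AccessLevel.ACCESS_OK:
            # Report back to the caller instead of proceeding.
            raise AccessDenied(level.value)
        return procedure(*args, **kwargs)
    return wrapper


@guarded
def deposit(name, amount):
    """An ordinary procedure that follows the convention."""
    accounts[name] += amount
    return accounts[name]


def set_access_level(level):
    """Used only by the client that manages the shutdown; deliberately unguarded."""
    control_table["level"] = level
```

The decorator only makes the convention harder to forget. Any procedure written without it, and any ad-hoc statement, still bypasses the check.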
Orderly system shutdown?
tcallaghan
Apr 9, 2010
Christoph,

That is a great idea, and I think it will give you exactly what you want. Just make sure all your stored procedures check and respect this setting, and that nobody is able to do ad-hoc insert/update/delete.

-Tim
Convention
chbussler
Apr 9, 2010
Hi Tim,

thanks for confirming. Yes, totally agree that there is a convention that has to be agreed to by those who write the stored procedures.

How would I be able to prevent ad-hoc access?

I think being able to 'inject' behavior into stored procedures, independent of how the stored procedure got into the system, would be really interesting (including ad-hoc access). Something like external AOP definitions that the Volt compiler picks up automatically. Just a thought.

Thanks,

Christoph
Ad-hoc security
tcallaghan
Apr 9, 2010
Christoph,

In your project.xml file you need to enable security:

<security enabled="true" />
When you do, all connections will need to authenticate.
You can then allow/deny access to adhoc for users and groups as follows:

<user name="<username>" password="<password>" adhoc="true" />
<group name="<groupname>" adhoc="true" />
-Tim
Built-In
henning
Apr 16, 2010
I would ask the team to provide a built-in solution for this. I think this would help development.

My cursory thoughts on this went in the same direction as Christoph's assessment of what he needs. Would not most developers need exactly the behavior that Christoph describes?

Plus, because of the non-guarantee of the order of execution of requests, my conjecture would be that you can't really program a clean solution for shutdown from the outside.

[This was posted after Tim's first response.]
It is relatively straight
rbetts
Apr 16, 2010
It is relatively straightforward to add quiesce functionality to VoltDB -- to begin denying all new procedures while finishing all submitted procedures. The system can know the last transaction id of its quiesced state and execute a snapshot before terminating its processes.

It is more difficult, probably impossible, for VoltDB to know that all clients have fully processed their responses. An orderly shutdown process like this cannot guarantee that all responses were processed by the application.

Arranging a clean shutdown "from the outside" requires that you can quiesce your client application. This is possible, but it creates design and implementation work for the application author, which is not desirable. However, it does allow full processing by the client application.

We've thought a fair bit about shutdown processes internally and have been eager to hear from some application authors on the subject. In particular, we would like to understand whether most applications will be resilient to a database that has been shut down or whether the application needs to be explicitly informed by an operator; what the requirements are around quiescing ELT processes; and what operational practices are desired for the full application stack when VoltDB needs to be shut down. There are also operational considerations around terminating VoltDB nodes. Ideally, when a node fails, an operator must be informed and a recovery process must be initiated. Shutdown should probably not start this operational process.

Thank you for your feedback - we appreciate it.

--Ryan.
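
Editor's sketch of the quiesce sequence described above: deny new procedures, finish the submitted ones, note the last transaction id, then snapshot. This is a toy in-memory model with hypothetical names (QuiescingServer, submit, quiesce_and_snapshot), not VoltDB internals. As the post points out, it says nothing about whether clients have processed their responses.

```python
from collections import deque


class QuiescingServer:
    """Toy model of a server that can quiesce before taking a snapshot."""

    def __init__(self):
        self.queue = deque()      # submitted but not yet executed work
        self.accepting = True
        self.next_txn_id = 1
        self.last_txn_id = None   # id of the last executed transaction
        self.state = {}
        self.snapshot = None

    def submit(self, key, value):
        """Queue a write; return its transaction id, or None if rejected."""
        if not self.accepting:
            return None
        txn_id = self.next_txn_id
        self.next_txn_id += 1
        self.queue.append((txn_id, key, value))
        return txn_id

    def _drain(self):
        """Execute everything that was accepted before the quiesce."""
        while self.queue:
            txn_id, key, value = self.queue.popleft()
            self.state[key] = value
            self.last_txn_id = txn_id

    def quiesce_and_snapshot(self):
        """Stop accepting work, finish the queue, then copy the state."""
        self.accepting = False
        self._drain()
        self.snapshot = (self.last_txn_id, dict(self.state))
        return self.snapshot
```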
Strategy Proposals
henning
Apr 16, 2010
There won't be a clean and easy solution, I guess. Which would be one more argument for having it built-in, if true.

You mention that you have mulled this over for a bit, and I guess there are few options you have not looked at. Still, here are my two cents, describing what I would like to have, as far as I can see.

"It is more difficult, probably impossible, for VoltDB to know that all clients have fully processed their responses."

This is one step beyond what I had been looking at, if I understand correctly. You are saying that in the end, you even want to give clients a chance to react to failed transactions and execute some fallback that might involve contacting the server again? I would have been satisfied with less but, of course, you are right. And yes, this puts responsibility on the clients. But it should not be programmed there, only 'queried' from it, in the form of a notification as detailed below.

The rigid sequence of status exchanges becomes necessary because of the asynchronous transactions and the intrinsically oscillating nature of VoltDB execution, where the order of things cannot be predicted.

The issue seems akin to the intricacies of dealing with network connections and responses: a good solution will have to be a bit complicated and involve some state signals transmitted from the servers to the clients, as well as acknowledgements from the clients to the servers. Naively, without considering what may go wrong while waiting for responses:

Strategy (1) - One Last Round


  • server: shut down requested, please send all remaining stuff
  • (clients send their remaining queues and stop queuing new stuff)
  • clients: all sent
  • server: all exceptions have been dealt out again, nothing more accepted now.
  • (server makes snapshot if so requested in shut down command)
  • (clients mop their things up, log desperate super fails to file locally and close down or wait to reconnect)

This would not allow clients to retry anything that happened to fail in the last round. But that might be ok because any client must probably have two strategies coded in anyway: first a retry, then a cancellation as worst case. Or shut down case then. So in the above sequence, after the server signaled shut down, no retries are possible and that's that.
A nicer layout would allow even one last retry:

Strategy (2) - Three Last Rounds


  • server: shut down requested, please send all remaining stuff
  • (clients send their remaining queues and stop queuing new stuff)
  • clients: all sent, ready for clean up
  • server: all results and exceptions have been dealt out, two last rounds accepted
  • (clients may resend stuff in response to failed transactions, but should not send anything new)
  • clients: all retries sent
  • server: all results and exceptions have been dealt out, one last round accepted
  • (clients could resend stuff, but preferably only log alerts, e.g. to a log table, and neither redo anything that failed before, nor send entirely new things)
  • clients: last moves sent
  • server: all exceptions have been dealt out again, nothing more accepted now.
  • (server makes snapshot if so requested in shut down command)
  • (clients mop their things up and send remarks to be stored with the snapshot, about what may have gone wrong during shutdown)
  • (server stores client remarks with snapshot and shuts down)
  • (clients mop their things up, e.g. make local logs of their pains and shut down or wait to reconnect)
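
Editor's sketch of the server-side bookkeeping that Strategies (1) and (2) imply, as plain Python. ShutdownRounds and its methods are hypothetical names, not part of org.voltdb.client.Client or any VoltDB API. A round closes once every known client has reported "all sent"; rounds=1 corresponds to Strategy (1) and rounds=3 to Strategy (2). It assumes the server knows the full set of clients, which is one of the open questions in this thread.

```python
class ShutdownRounds:
    """Counts down a fixed number of last rounds before the server stops accepting work."""

    def __init__(self, client_ids, rounds):
        self.clients = set(client_ids)
        self.rounds_left = rounds
        self.reported = set()   # clients that said "all sent" in the current round
        self.closed = False

    def accepts_work(self):
        """True while at least one round is still open."""
        return not self.closed

    def client_done(self, client_id):
        """Record a client's "all sent" message; return the rounds still open afterwards."""
        if client_id not in self.clients:
            raise ValueError("unknown client: %r" % (client_id,))
        if self.closed:
            return 0
        self.reported.add(client_id)
        if self.reported == self.clients:
            # Everyone has reported, so this round is over.
            self.rounds_left -= 1
            self.reported.clear()
            if self.rounds_left == 0:
                self.closed = True
        return self.rounds_left
```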

Something like this should be generically useful for every project. So I would still think that the mechanics of this belong in the VoltDB Client class org.voltdb.client.Client.
It doesn't look elegant, but I'd be surprised if it could. And I would absolutely opt for a simple time out as default, for development, where the server needs no client response and simply does this:

Strategy (3) - Time Out


  • server: shut down requested, please send all remaining stuff, shop will close in 10 seconds.
  • (clients send their remaining queues and stop queuing new stuff)
  • server (after 10 sec): ok, shut down now.
  • (server makes snapshot if so requested in shut down command)
  • (clients mop their things up, log fails to file locally and close down or wait to reconnect)


That should be all that is needed for development.
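
Editor's sketch of the Strategy (3) time-out, as plain Python. TimedShutdown is a hypothetical name and the 10-second grace period is taken from the list above. The clock is injectable only so the behavior can be checked without actually waiting.

```python
import time


class TimedShutdown:
    """Server-side timer: accept work until a fixed grace period has elapsed."""

    def __init__(self, grace_seconds=10.0, clock=time.monotonic):
        self.grace_seconds = grace_seconds
        self.clock = clock
        self.deadline = None   # set once shutdown has been requested

    def request_shutdown(self):
        """Start the countdown; return the number of seconds announced to clients."""
        self.deadline = self.clock() + self.grace_seconds
        return self.grace_seconds

    def accepts_work(self):
        """True before shutdown is requested and until the deadline is reached."""
        return self.deadline is None or self.clock() < self.deadline
```

Unlike the round-based strategies, nothing here waits for client acknowledgements, which is what makes it simple enough to be a development default.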
Client notification of shutdown
rbetts
Apr 16, 2010
Your proposals inform clients to begin a shutdown procedure with a tight coupling between client and server. Our shutdown best practice also involves informing clients of the shutdown. However, VoltDB does not provide this notification in band with the wire protocol; the operator must provide the notification to clients out of band. An out-of-band notification is a looser coupling and potentially provides an interface for clients to shut down independently of the database (presumably client hosts also come down for occasional maintenance?).

In any of your deployments, do you use load balancers between clients and servers, and use load balancer configuration management to direct traffic away from a backend service that is being shut down (or quiesced for maintenance)?

--Ryan.
Clients cannot be reached!
chbussler
Apr 16, 2010
Hi,

one assumption I am not really fond of is that you can reach clients and tell them that a shutdown is in progress. Keeping track of clients, making sure you reach them, etc., is hard, and I'd suggest looking at options where this is not necessary.

As far as I understand, clients submit procedure invocations. A procedure invocation returns return codes about the success of procedures. In the same way an additional return code could be returned informing the client of the server state change (i.e. shutting down) and the client then can react to it (by e.g. keeping yet to be made procedure calls around or drop them, or ...).

From the server side, I assumed that a procedure call is either sent to the server or not. So I thought the database server knows at any point precisely the number of procedures to be executed. So if a shutdown command comes, the server could stop accepting new procedure calls and finish all existing ones. The server then knows exactly when the last procedure was executed, could then do a snapshot save, and then shut down.

Is this possible?

Thanks,
Christoph

PS Of course, there could be a 'hard' shutdown, that drops all queued procedures in the server. But that is 'just' a variation on what we mean by shutdown, and what state is in the snapshots.
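
Editor's sketch of the return-code idea above, as plain Python. Status.SHUTTING_DOWN is a hypothetical extra code, and Server and Client are toy classes, not the VoltDB client API. The client learns about the shutdown from an ordinary response and keeps its unsent calls around, so nobody has to track or reach clients separately.

```python
from collections import deque
from enum import Enum


class Status(Enum):
    SUCCESS = 1
    SHUTTING_DOWN = 2   # hypothetical additional return code


class Server:
    """Toy server that rejects calls once it is shutting down."""

    def __init__(self):
        self.shutting_down = False
        self.executed = []

    def call(self, procedure, *args):
        if self.shutting_down:
            return Status.SHUTTING_DOWN
        self.executed.append((procedure, args))
        return Status.SUCCESS


class Client:
    """Toy client that queues calls and reacts to the shutdown code."""

    def __init__(self, server):
        self.server = server
        self.pending = deque()

    def enqueue(self, procedure, *args):
        self.pending.append((procedure, args))

    def flush(self):
        """Send queued calls in order; return how many the server executed."""
        sent = 0
        while self.pending:
            procedure, args = self.pending[0]
            if self.server.call(procedure, *args) is Status.SHUTTING_DOWN:
                break   # keep this call and the rest for later
            self.pending.popleft()
            sent += 1
        return sent
```

Keeping the rejected calls is only one possible reaction; a client could equally drop them or write them to a local log.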
Reaching Clients
henning
Apr 16, 2010
1
"Keeping track of clients ... is hard"
In my case that is probably not the motivation. In my setup I can keep track. But I would also have to build a meta structure that really is a part of VoltDB in my eyes. That's open to interpretation obviously, but I think it is, because almost every user of VoltDB will have to solve this problem. Maybe it's the messy moment of a splendid high performance machine there that needs to be landed. But it's still an integral part of the package.
I do understand that the aim is kind of never shutting down! But development and reality may ask for it. At least developers obviously do.

2
The delivery of results of asynchronous transactions may, on some level, be equivalent to sending notifications to the client, as both are a kind of push. A dedicated receiver callback function should suffice to receive the signals.
To extend the protocol for this, if necessary, should be worth it.
Agree 1 / question on 2
chbussler
Apr 17, 2010
Hi Henning,

yes, I agree with 1, definitely, that was the origin of the initial question.

On 2, however, I don't completely agree. Yes, in the asynchronous case, the asynchronous callback knows how to reach the client, no question about that. But then we also have the synchronous and ad-hoc cases (and down the road maybe other ways of accessing the server). My point is that ideally it would work for all types of access, not only for asynchronous access.

However, VoltDB could say, a shutdown notification service only works in the asynchronous case.

Christoph
Even easier
henning
Apr 17, 2010
Hi Christoph,

I was thinking that in the synchronous case, the problem of how to signal from server to client does not even exist in the first place. It exists only in the asynchronous case, where it should be solvable the same way as the delivery of transaction results. That's why I only addressed that.

So the bottom line would be, in neither case should it be a problem to get the signals through. Looking at the wire protocol I would think that confirms this impression.
Client Notifications
henning
Apr 17, 2010
1
"An out of band notification is a looser coupling and potentially provides an interface for clients to shutdown independently of the database"
Put that way around, it would more naturally fall outside the scope of what VoltDB should provide built in. I can see why this is the suggestion for now. It is less of a coordinated server shutdown, though, and more a matter of switching off the clients one by one. And the task is still a generic one.

2
"In any of your deployments, do you use load balancers between clients and servers and use load balancer configuration management to direct traffic away from a backend service that is being shutdown (or quiesced for maintenance)?"

No. I am sure you wanted to hint at something there?

With VoltDB, my reading was that balancing traffic by round-robin will do?

Thanks!
Henning