Forum: Managing VoltDB

Post: Snapshot questions

Snapshot questions
alexlzl
Feb 15, 2011
Got several snapshot save/restore questions to verify with the developers:

* Can I restore a snapshot file on another node?
I found that each snapshot file has a suffix like -KEY_VALUE-host_1 (as in "manual_snapshot2-KEY_VALUE-host_1.vpt", as I generated the snapshot by invoking @SnapshotSave with SAVEID="manual_snapshot" for the key_value example).

- Where does "KEY_VALUE" come from? Is it the upper-case of the "table" name?

- Where does "host_1" come from?

When I use @SnapshotRestore, I can only pass the same SAVEID="manual_snapshot". If I am restoring on a different node in the cluster, how can that node determine the suffix, especially the "host_1" part? (The node's position in the cluster is very likely no longer _1.)
* If I restore a node to an "old" snapshot, will that node automatically sync-up with the cluster to get the up-to-date data?

It is not explicitly mentioned in the documentation, nor did I see a log entry indicating that it is happening (I need this notification to decide whether the node is ready for client traffic). I assume this should be the case, since otherwise there would be no way to recover a failed node.

* Extending the last question: if I am doing a schema update (snapshot - shutdown - start - restore) on all nodes, will they all eventually converge to a consistent state? Is there a special sequence I should follow (such as starting in the reverse order of shutdown)? How can I tell when the convergence is complete?

Thank you for your help.
some answer to my own questions
alexlzl
Feb 15, 2011
Figured out one answer through testing (by using @Statistics system procedure to figure out the data count):

- The only way to join back a cluster is to start the node with "rejoinhost". In this case, the node will recover the data from the cluster (depends on k-factor of course), and the following log indicates it is successful. There is actually no way to "restore from snapshot then rejoin" since rejoin is a startup option.

INFO 2011-02-15 19:52:47,783 [ExecutionSite:0104] HOST: Node recovery completed after 2 seconds with 119 megabytes transferred at a rate of 43.06912775968151 megabytes/sec
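
For anyone who wants to watch for that notification from a script, here is a minimal Python sketch that recognizes the message. The wording is copied from the single log line quoted above, so treat the pattern as an assumption; other VoltDB versions may phrase it differently. The function name `recovery_summary` is made up for illustration.

```python
import re

# Wording copied from the one log line quoted in this thread; other
# VoltDB versions may phrase it differently (assumption).
RECOVERY_DONE = re.compile(
    r"Node recovery completed after (?P<seconds>\d+) seconds "
    r"with (?P<megabytes>\d+) megabytes transferred"
)


def recovery_summary(log_line):
    """Return (seconds, megabytes) if the line reports a finished recovery, else None."""
    m = RECOVERY_DONE.search(log_line)
    if m is None:
        return None
    return int(m.group("seconds")), int(m.group("megabytes"))


line = ("INFO 2011-02-15 19:52:47,783 [ExecutionSite:0104] HOST: Node recovery "
        "completed after 2 seconds with 119 megabytes transferred at a rate of "
        "43.06912775968151 megabytes/sec")
print(recovery_summary(line))  # (2, 119)
```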

Though I guess a snapshot is local to a node, right? Then, when adding a node or upgrading the schema, do I need to issue a restore on ALL the nodes?

* Does the sequence matter?
* How can I tell when the full cluster-wide restore is done? I assume it may take some time for the different snapshots to consolidate.

[update] With k-factor=1 (mirrored on two nodes), if I restore each node separately to its own last snapshot, the total row count is duplicated in the cluster. It seems the snapshot file itself doesn't contain the necessary information about cluster state. Does that mean we have to be really careful about which snapshots are used for a restore?

This will become really complicated if k-factor>2: how do I know which set of nodes' snapshots contains the full cluster data? I really hope cluster-wide backup/restore can be explained more thoroughly, since it is the most critical part of using VoltDB.

Please help. Thank you.
re: Snapshot questions
tcallaghan
Feb 16, 2011
Thanks for working through some of your original questions and adding some new ones. I've responded to your new questions inline.
Though I guess a snapshot is local to a node, right? Then, when adding a node or upgrading the schema, do I need to issue a restore on ALL the nodes?
* Does the sequence matter?
* How can I tell when the full cluster-wide restore is done? I assume it may take some time for the different snapshots to consolidate.

When you create a snapshot, each VoltDB node writes its own data to its own files. When you perform a @SnapshotRestore, you restore the entire VoltDB cluster (you perform this operation once), not just some of the nodes.

[update] With k-factor=1 (mirrored on two nodes), if I restore each node separately to its own last snapshot, the total row count is duplicated in the cluster. It seems the snapshot file itself doesn't contain the necessary information about cluster state. Does that mean we have to be really careful about which snapshots are used for a restore?
This will become really complicated if k-factor>2: how do I know which set of nodes' snapshots contains the full cluster data? I really hope cluster-wide backup/restore can be explained more thoroughly, since it is the most critical part of using VoltDB.

When a user initiates a @SnapshotRestore, all nodes in the cluster consider the available files, making sure the files are not corrupt (via checksums) and ensuring that the collective snapshot files contain all the partitions for any partitioned tables. When k > 0, the cluster contains redundant copies of partitions, so there are multiple copies of the data within the collective snapshot files. As of Version 1.2.1.06 (get it from http://www.voltdb.com/community/downloads.php) you can put all the snapshot files on a single node or leave them on the nodes where they were created. As long as the set of files is valid, we will perform the restore.
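
To make the "all partitions covered, redundant copies allowed" idea concrete, here is a rough Python sketch. It is not VoltDB's actual code: the mapping from file name to partition ids, the file names, and both function names are hypothetical, made up only to show why a set of files can be valid even though some data appears in it more than once.

```python
def missing_partitions(files, partition_count):
    """Return the partition ids that no file in the set contains.

    `files` maps a snapshot file name to the set of partition ids it holds
    (a hypothetical description, not something read from real .vpt files).
    """
    covered = set()
    for partitions in files.values():
        covered |= set(partitions)
    return set(range(partition_count)) - covered


def pick_one_source_per_partition(files):
    """Choose a single file for each partition so redundant copies load once."""
    chosen = {}
    for name in sorted(files):
        for partition in sorted(files[name]):
            chosen.setdefault(partition, name)
    return chosen


# Two hosts with k=1: both hosts hold a copy of every partition.
files = {
    "snap-T-host_0.vpt": {0, 1},
    "snap-T-host_1.vpt": {0, 1},
}
print(missing_partitions(files, 2))          # set()
print(pick_one_source_per_partition(files))
```

With the redundant copy ignored, each partition is loaded from exactly one file, which is the opposite of the duplicated rows reported above when every node was restored separately.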

-Tim
re: Snapshot questions
tcallaghan
Feb 16, 2011


  • Can I restore a snapshot file on another node?

I found that each snapshot file has a suffix like -KEY_VALUE-host_1 (as in "manual_snapshot2-KEY_VALUE-host_1.vpt", as I generated the snapshot by invoking @SnapshotSave with SAVEID="manual_snapshot" for the key_value example).
- Where does "KEY_VALUE" come from? Is it the upper-case of the "table" name?
- Where does "host_1" come from?

When VoltDB creates a snapshot, every node in the cluster creates one file for each table: replicated tables read their data from a single partition on the node, and partitioned tables read their data from all partitions. As you said, the snapshot file name contains the uppercase name of the table. host_1 is the internal VoltDB host number, which is guaranteed to be unique across the cluster.

When I use @SnapshotRestore, I can only pass the same SAVEID="manual_snapshot". If I am restoring on a different node in the cluster, how can that node determine the suffix, especially the "host_1" part? (The node's position in the cluster is very likely no longer _1.)

When you call @SnapshotRestore, VoltDB checks all available snapshot files for the given SAVEID and considers all files present on all nodes.
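
If you want to see which tables and hosts a directory of snapshot files covers for a given SAVEID, a small Python sketch along these lines may help. The `<SAVEID>-<TABLE>-host_<N>.vpt` layout is inferred only from the example file name in this thread, and the sketch assumes table names contain no hyphens; the function names are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed layout, inferred from the single example in this thread:
#   <SAVEID>-<TABLE>-host_<N>.vpt
SNAPSHOT_NAME = re.compile(
    r"^(?P<saveid>.+)-(?P<table>[^-]+)-host_(?P<host>\d+)\.vpt$"
)


def parse_snapshot_name(filename):
    """Split a snapshot file name into (saveid, table, host id), or return None."""
    m = SNAPSHOT_NAME.match(filename)
    if m is None:
        return None
    return m.group("saveid"), m.group("table"), int(m.group("host"))


def hosts_by_table(filenames, saveid):
    """Map each table name to the set of host ids that wrote a file for `saveid`."""
    result = defaultdict(set)
    for name in filenames:
        parsed = parse_snapshot_name(name)
        if parsed is not None and parsed[0] == saveid:
            result[parsed[1]].add(parsed[2])
    return dict(result)


print(parse_snapshot_name("manual_snapshot2-KEY_VALUE-host_1.vpt"))
# ('manual_snapshot2', 'KEY_VALUE', 1)
```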

  • If I restore a node to an "old" snapshot, will that node automatically sync-up with the cluster to get the up-to-date data?

It is not explicitly mentioned in the documentation, nor did I see a log entry indicating that it is happening (I need this notification to decide whether the node is ready for client traffic). I assume this should be the case, since otherwise there would be no way to recover a failed node.

@SnapshotRestore restores the full cluster, not just a single node. Recovering a failed node is a separate process entirely; see "Recovering from System Failures" for more information.

  • Extending the last question: if I am doing a schema update (snapshot - shutdown - start - restore) on all nodes, will they all eventually converge to a consistent state? Is there a special sequence I should follow (such as starting in the reverse order of shutdown)? How can I tell when the convergence is complete?


As with the previous response, @SnapshotRestore does a full system restore, so every node is restored to the same transactional point in time as when the snapshot was created.
Thank you so much for the
alexlzl
Feb 16, 2011


Thank you so much for the detailed explanation. It makes total sense now. Two more questions:

1. As of Version 1.2.1.06 (get it from http://www.voltdb.com/community/downloads.php) you can put all the snapshot files on a single node or have them on the nodes where they were created ...
How do I get all the snapshot files onto a single node?

2. Do the snapshot utilities mentioned in this post (http://forum.voltdb.com/showthread.php?253-Snapshot-Utilities&highlight=Snapshot+Utilities) talk to the cluster? They seem to deal only with local directories. Or do they assume that they must be executed on a running node?
1. As of Version 1.2.1.06
tcallaghan
Feb 17, 2011
1. As of Version 1.2.1.06 (get it from http://community.voltdb.com/downloads) you can put all the snapshot files on a single node or have them on the nodes where they were created ...
How do I get all the snapshot files onto a single node?
You'd need to copy them all manually, since every server writes its snapshot files locally. If possible, leave (or put) the files on the servers where they were created and do the restore.
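
As a rough illustration of the manual copy, here is a Python sketch that gathers every file for one SAVEID from several directories into a single directory. It assumes each server's snapshot directory is already reachable as a local path (for example after an scp or through a network mount); the function name and the `<SAVEID>-` prefix match are assumptions based on the file name shown earlier in this thread, not a VoltDB tool.

```python
import shutil
from pathlib import Path


def gather_snapshot_files(source_dirs, saveid, dest_dir):
    """Copy every file named '<saveid>-*' from source_dirs into dest_dir.

    Returns the copied file names in copy order. Refuses to overwrite an
    existing file so two hosts can never silently clobber each other.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in source_dirs:
        for path in sorted(Path(src).glob(f"{saveid}-*")):
            target = dest / path.name
            if target.exists():
                raise FileExistsError(f"{target} already exists")
            shutil.copy2(path, target)
            copied.append(path.name)
    return copied
```

Because the host number is part of each file name, files from different servers should not collide; the overwrite check is only a safety net.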
2. Do the snapshot utilities mentioned in this post (http://community.voltdb.com/node/112) talk to the cluster? They seem to deal only with local directories. Or do they assume that they must be executed on a running node?
The snapshot utilities are standalone and do not interact with the servers in your cluster. All files must be local to where you are running the utilities from.

-Tim