Forum: Managing VoltDB

Post: Crash on snapshot restore w/2.5

Crash on snapshot restore w/2.5
davec
Oct 30, 2012
I am having problems doing online point-in-time recovery (PITR) with VoltDB 2.5. It would be nice to have in case someone deletes data by accident, etc. The documentation indicates snapshots are PITR consistent, so it seems like this should be possible. I tried a live save and restore, and it killed the cluster.

I started by running @SnapshotSave against a live cluster with no admin mode and no pause/resume/shutdown, simulating an automatic snapshot. I successfully made a snapshot, then waited a few minutes before restoring. Attempting @SnapshotRestore caused the cluster to die. I have included the log below. I am using "Build: 2.5 voltdb-2.5-0-gb7b610e Community Edition."
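For reference, this is roughly what I ran (the path and nonce match the log below; the trailing blocking flag is from memory):

```shell
# Non-blocking (live) snapshot: arguments are directory, nonce, blocking flag (0 = non-blocking)
echo "exec @SnapshotSave '/storage/voltdbdata/snapshots' 'worksforme' 0" | sqlcmd

# A few minutes later, restore on the still-running cluster -- this is the call that killed it
echo "exec @SnapshotRestore '/storage/voltdbdata/snapshots' 'worksforme'" | sqlcmd
```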

Possible causes I can think of: divergent master/backup replicas due to a non-deterministic stored procedure, or the fact that the restore was done live. I am running k=1.

It seems that snapshots should be consistent, so I'd only be risking weird query results on the client side. It also seems that automatic snapshots happen online, and that a live snapshot/restore should be possible given the architecture. Maybe this isn't a supported use case and I need to follow the maintenance-window procedure and shut down the cluster, even though there are no schema changes?

If the database replicas have diverged, is there a way to return them to coherence? I know I can drop and recreate the table with two catalog updates if I know which table is bad, but how can I discover that the snapshot is bad without crashing the cluster? Is there a way to import only selected tables from the snapshot, so one bad apple doesn't spoil the bunch?

Is this a "contagion" problem, such that I should run two separate VoltDB clusters: one for mission-critical data and one for less important data?

TIA

log:
2012-10-29 22:27:52,318 INFO [ExecutionSite:0101] HOST: Restoring from path: /storage/voltdbdata/snapshots with nonce: worksforme
2012-10-29 22:27:52,421 INFO [ExecutionSite:0101] EXPORT: Truncating export data after txnId 1278456261272141825
2012-10-29 22:27:52,422 INFO [ExecutionSite:0101] EXPORT: Drained source in generation 1276195267418259456 with 1 of 4 drained
2012-10-29 22:27:52,422 INFO [ExecutionSite:0101] EXPORT: Drained source in generation 1276195267418259456 with 2 of 4 drained
2012-10-29 22:27:52,422 INFO [ExecutionSite:0101] EXPORT: Drained source in generation 1276195267418259456 with 3 of 4 drained
2012-10-29 22:27:52,422 INFO [ExecutionSite:0101] EXPORT: Drained source in generation 1276195267418259456 with 4 of 4 drained
2012-10-29 22:27:52,424 INFO [Raw export processor] EXPORT: Finished draining generation 1275812195829022720
2012-10-29 22:27:52,424 INFO [Raw export processor] EXPORT: Creating connector org.voltdb.export.processors.RawProcessor
2012-10-29 22:27:52,424 INFO [Raw export processor] EXPORT: Processor ready for data.
2012-10-29 22:27:52,668 INFO [ExecutionSite:0101] org.voltdb.sysprocs.saverestore.PartitionedTableSaveFileState: Total partitions for Table: SEQUENCE: 6
2012-10-29 22:27:52,669 INFO [ExecutionSite:0101] org.voltdb.sysprocs.saverestore.PartitionedTableSaveFileState: Partition set: [0, 1, 2, 3, 4, 5]
2012-10-29 22:27:52,669 INFO [ExecutionSite:0101] HOST: Distribution plan for table SEQUENCE
2012-10-29 22:27:52,670 INFO [ExecutionSite:0101] HOST: Host 0 will distribute partitions 0 3
2012-10-29 22:27:52,670 INFO [ExecutionSite:0101] HOST: Host 1 will distribute partitions 1 5
2012-10-29 22:27:52,670 INFO [ExecutionSite:0101] HOST: Host 2 will distribute partitions 2 4
2012-10-29 22:27:52,670 INFO [ExecutionSite:0101] org.voltdb.sysprocs.saverestore.PartitionedTableSaveFileState: Total partitions for Table: LASTUPDATEFLIGHTHISTORY: 6
2012-10-29 22:27:52,670 INFO [ExecutionSite:0101] org.voltdb.sysprocs.saverestore.PartitionedTableSaveFileState: Partition set: [0, 1, 2, 3, 4, 5]
2012-10-29 22:27:52,670 INFO [ExecutionSite:0101] HOST: Distribution plan for table LASTUPDATEFLIGHTHISTORY
2012-10-29 22:27:52,671 INFO [ExecutionSite:0101] HOST: Host 0 will distribute partitions 0 3
2012-10-29 22:27:52,671 INFO [ExecutionSite:0101] HOST: Host 1 will distribute partitions 1 5
2012-10-29 22:27:52,671 INFO [ExecutionSite:0101] HOST: Host 2 will distribute partitions 2 4
2012-10-29 22:27:52,671 INFO [ExecutionSite:0101] org.voltdb.sysprocs.saverestore.PartitionedTableSaveFileState: Total partitions for Table: FLIGHTHISTORYCODESHARE: 6
2012-10-29 22:27:52,671 INFO [ExecutionSite:0101] org.voltdb.sysprocs.saverestore.PartitionedTableSaveFileState: Partition set: [0, 1, 2, 3, 4, 5]
2012-10-29 22:27:52,671 INFO [ExecutionSite:0101] HOST: Distribution plan for table FLIGHTHISTORYCODESHARE
2012-10-29 22:27:52,671 INFO [ExecutionSite:0101] HOST: Host 0 will distribute partitions 0 3
2012-10-29 22:27:52,671 INFO [ExecutionSite:0101] HOST: Host 1 will distribute partitions 1 5
2012-10-29 22:27:52,672 INFO [ExecutionSite:0101] HOST: Host 2 will distribute partitions 2 4
2012-10-29 22:27:52,672 INFO [ExecutionSite:0101] org.voltdb.sysprocs.saverestore.PartitionedTableSaveFileState: Total partitions for Table: FLIGHTHISTORY: 6
2012-10-29 22:27:52,672 INFO [ExecutionSite:0101] org.voltdb.sysprocs.saverestore.PartitionedTableSaveFileState: Partition set: [0, 1, 2, 3, 4, 5]
2012-10-29 22:27:52,672 INFO [ExecutionSite:0101] HOST: Distribution plan for table FLIGHTHISTORY
2012-10-29 22:27:52,672 INFO [ExecutionSite:0101] HOST: Host 0 will distribute partitions 0 3
2012-10-29 22:27:52,672 INFO [ExecutionSite:0101] HOST: Host 1 will distribute partitions 1 5
2012-10-29 22:27:52,672 INFO [ExecutionSite:0101] HOST: Host 2 will distribute partitions 2 4
2012-10-29 22:27:52,672 INFO [ExecutionSite:0101] org.voltdb.sysprocs.saverestore.PartitionedTableSaveFileState: Total partitions for Table: FLIGHTHISTORYUPDATE: 6
2012-10-29 22:27:52,672 INFO [ExecutionSite:0101] org.voltdb.sysprocs.saverestore.PartitionedTableSaveFileState: Partition set: [0, 1, 2, 3, 4, 5]
2012-10-29 22:27:52,673 INFO [ExecutionSite:0101] HOST: Distribution plan for table FLIGHTHISTORYUPDATE
2012-10-29 22:27:52,673 INFO [ExecutionSite:0101] HOST: Host 0 will distribute partitions 0 3
2012-10-29 22:27:52,673 INFO [ExecutionSite:0101] HOST: Host 1 will distribute partitions 1 5
2012-10-29 22:27:52,673 INFO [ExecutionSite:0101] HOST: Host 2 will distribute partitions 2 4
Attempted violation of constraint
In ../../src/ee/execution/VoltDBEngine.cpp:746
Attempted violation of constraint
In ../../src/ee/execution/VoltDBEngine.cpp:746
()
voltdb::VoltDBEngine::loadTable(int, voltdb::ReferenceSerializeInput&, long, long)
Java_org_voltdb_jni_ExecutionEngine_nativeLoadTable()
[0x7f25e27ebf90]
Attempted violation of constraint
In ../../src/ee/execution/VoltDBEngine.cpp:746
()
voltdb::VoltDBEngine::loadTable(int, voltdb::ReferenceSerializeInput&, long, long)
Java_org_voltdb_jni_ExecutionEngine_nativeLoadTable()
[0x7f25e27ebf90]
()
voltdb::VoltDBEngine::loadTable(int, voltdb::ReferenceSerializeInput&, long, long)
Java_org_voltdb_jni_ExecutionEngine_nativeLoadTable()
[0x7f25e27ebf90]
Attempted violation of constraint
In ../../src/ee/execution/VoltDBEngine.cpp:746
()
voltdb::VoltDBEngine::loadTable(int, voltdb::ReferenceSerializeInput&, long, long)
Java_org_voltdb_jni_ExecutionEngine_nativeLoadTable()
[0x7f25e27ebf90]
2012-10-29 22:27:52,763 FATAL [ExecutionSite:0103] HOST: No additional info.
VoltDB has encountered an unrecoverable error and is exiting.
The log may contain additional information.
VoltDB has encountered an unrecoverable error and is exiting.
The log may contain additional information.
2012-10-29 22:27:52,763 FATAL [ExecutionSite:0104] HOST: No additional info.
VoltDB has encountered an unrecoverable error and is exiting.
Hi Dave, You're getting
bballard
Oct 30, 2012
Hi Dave,
You're getting constraint violations because the snapshot restore is trying to insert rows that already exist in the database. Snapshot restore should only be performed on an empty database.

For an example, see this forum post (http://community.voltdb.com/node/1426), which describes how to take a snapshot and stop the database for maintenance, then restart and restore from the snapshot. The database is stopped and then restarted in CREATE mode (an empty database) before the snapshot is loaded.

If you have automatic snapshots configured, restarting the database in RECOVER or START mode will reload the most recent automated snapshot. You can still recover from a specific manual snapshot by restarting the database in CREATE mode and then using the @SnapshotRestore command.
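In outline, the manual maintenance-window sequence looks something like this. The catalog jar, deployment file, hostname, snapshot path, and nonce below are placeholders; substitute your own, and run the create step on every node:

```shell
# 1. Take a blocking snapshot (trailing 1 = blocking), then shut down cleanly
echo "exec @SnapshotSave '/storage/voltdbdata/snapshots' 'mysnap' 1" | sqlcmd
echo "exec @Shutdown" | sqlcmd

# 2. Restart each node with an EMPTY database (CREATE mode)
voltdb create catalog mycatalog.jar deployment deployment.xml host voltsvr1

# 3. Once the cluster is up, reload the snapshot into the empty database
echo "exec @SnapshotRestore '/storage/voltdbdata/snapshots' 'mysnap'" | sqlcmd
```

Restoring into the empty database avoids the duplicate-row constraint violations you hit.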

-Ben
Shutting down VoltDB server
Clocks
Oct 30, 2012
Is there a way to stop the VoltDB server from the terminal without pressing Ctrl-C? I would like to make the server exit cleanly.
sqlcmd1> exec
bballard
Oct 30, 2012
sqlcmd
1> exec @Shutdown

Connection to database host (localhost/127.0.0.1:21212) was lost before a response was received

You should expect the connection-lost error (or a longer exception message, depending on the VoltDB version); that is normal, since the server shuts down before it can send a response.
A non-interactive form that works better in a script is:
echo "exec @Shutdown" | sqlcmd --servers=localhost --port=21211