Forum: Building VoltDB Applications

Post: snapshot transaction id

snapshot transaction id
David Best
Feb 7, 2011
I understand that a snapshot is a transactionally consistent image of the database. But when restoring a snapshot, is there any way to determine the transaction high watermark?
Our scenario is this: We export transaction details tagged with the transaction id to a data warehouse. If we take automatic snapshots, then if the cluster goes down, we would restore from the latest snapshot. Of course, there will likely be more recent transactions posted to the data warehouse, so we will need to read and reapply them. But this requires knowing the last transaction seen in the snapshot.
Without the last txn, it seems the snapshot is virtually useless; we would have to bypass it and reconstruct a complete image from the data warehouse.
I suppose I can create a table to hold the most recent transaction id per partition, and update it on all state-changing procedures. Then on restore, select the max value. This seems like something better handled by Volt...unless I'm completely off base :)
Thoughts?
aweisberg
Feb 8, 2011
Hi David,
There is a timestamp field in snapshots that is taken from the wall clock time on the node coordinating the initiation of the snapshot. The timestamp is embedded in every .vpt file and in the .digest file, and it is the same cluster-wide. The first 4 bytes of the digest file are a CRC, and the rest of the file is a plain-text, comma-separated list. The first entry in the list is the plain-text representation of the timestamp.
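A minimal sketch of reading that timestamp from a pre-1.3 .digest file, based only on the layout described above (4-byte CRC, then a comma-separated list whose first entry is the timestamp). The function name, the UTF-8 decoding, and the sample payload are assumptions for illustration; the CRC is skipped rather than verified because its byte order and coverage are not stated here.

```python
def read_legacy_digest_timestamp(raw: bytes) -> int:
    """Return the snapshot timestamp from a pre-1.3 style .digest payload.

    Assumes: bytes 0-3 are the CRC (ignored here) and the remainder is a
    comma-separated text list whose first entry is the timestamp.
    """
    if len(raw) <= 4:
        raise ValueError("digest too short to contain a CRC and a body")
    body = raw[4:].decode("utf-8")  # text encoding is an assumption
    first_entry = body.split(",", 1)[0].strip()
    return int(first_entry)


# Made-up payload: a dummy CRC followed by a hypothetical list body.
sample = b"\x00\x00\x00\x00" + b"1297123200000,table_a,table_b"
timestamp = read_legacy_digest_timestamp(sample)
```

In practice `raw` would come from reading the .digest file in binary mode, for example `open(path, "rb").read()`.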
Our plan is to change that field to be the transaction id in 1.3. I created https://issues.voltdb.com/browse/ENG-983 to track this change.
If you are willing to wait until 1.3, you can code against the existing timestamp by converting it to a txnid (see https://source.voltdb.com/browse/Engineering/trunk/src/frontend/org/voltdb/TransactionIdManager.java?hb=true#to189), with the knowledge that it will be slightly off.
If you are willing to build the .jar you can check out https://svn.voltdb.com/eng/branches/voltdb-1.2.1 and apply http://pastebin.com/6j3DP3LA and get that functionality now.
Please let me know if this solution works for you.
-Ariel
==
EDIT: We've since switched to GitHub, so some of these links may not work. Reply to this post if you need updated links.
Thanks
David Best
Feb 8, 2011
That's great, I will continue development based on this. It's not critical path yet, so I may hold off on applying the patch...is there a target for the 1.3 release yet?
Transaction Id
seven
Mar 28, 2011
Would a valid solution be to create a table in Volt called "last_transaction" with only one row and one column that our stored procedures update with the last transaction id? That way the snapshot would contain the value of the last transaction id. Or is there a better solution until 1.3 is released? The "with the knowledge that it will be slightly off" comment makes me wary of using the timestamp in the snapshot.
Also, is the transaction id a sequential number that is consistent across partitions?
aweisberg
Mar 29, 2011
For 1.2.1, if you maintained a last transaction id, it would have to be an entry in a partitioned table at each partition. Upon reloading the snapshot you would have to select the value from each partition and use the max as the last transaction id in the snapshot. You are correct to be concerned about the timestamp in 1.2 being slightly off. I only suggest using it as something to develop against until 1.3 becomes available.
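As a rough illustration of the 1.2.1 workaround, the sketch below takes the last transaction id already selected from each partition, uses the max as the snapshot's watermark, and picks out the data warehouse records that would need to be reapplied. The function names and the `txn_id` record key are hypothetical, and the VoltDB query that fetches the per-partition values is not shown.

```python
def snapshot_watermark(per_partition_last_txn_ids):
    """Return the max of the per-partition last transaction ids."""
    ids = list(per_partition_last_txn_ids)
    if not ids:
        raise ValueError("no per-partition transaction ids were supplied")
    return max(ids)


def records_to_replay(warehouse_records, watermark):
    """Return warehouse records newer than the watermark, in txn id order.

    Each record is assumed to be a dict with a 'txn_id' key (hypothetical).
    """
    newer = [r for r in warehouse_records if r["txn_id"] > watermark]
    return sorted(newer, key=lambda r: r["txn_id"])


# Made-up values: one last-seen txn id per partition, and exported records.
watermark = snapshot_watermark([1005, 1010, 998])
pending = records_to_replay(
    [{"txn_id": 1012}, {"txn_id": 1003}, {"txn_id": 1011}, {"txn_id": 1010}],
    watermark,
)
```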
In 1.3 things are much improved and the snapshot format has been changed (old snapshots still load). The transaction id is in the header of every snapshot file and the headers are now JSON. Any of the .digest files associated with a snapshot will have the transaction id in an easy to consume format. The .digest is a UTF-8 encoded JSON object with a 4-byte CRC32 prefix. The "txnId" field contains the transaction id.
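A minimal sketch of pulling "txnId" out of a 1.3-style .digest, following the description above (4-byte CRC32 prefix, then a UTF-8 JSON object). The function names are hypothetical, and two details are assumptions not stated in this thread: that the CRC is stored big-endian and that it covers exactly the JSON bytes. For that reason the check is reported as a flag instead of raising an error.

```python
import json
import struct
import zlib


def read_digest_txn_id(raw: bytes):
    """Return (txn_id, crc_matches) for a 1.3-style .digest payload."""
    if len(raw) <= 4:
        raise ValueError("digest too short to contain a CRC and a body")
    (stored_crc,) = struct.unpack(">I", raw[:4])  # big-endian is an assumption
    body = raw[4:]
    # Assumption: the CRC32 is computed over the JSON bytes only.
    crc_matches = (zlib.crc32(body) & 0xFFFFFFFF) == stored_crc
    digest = json.loads(body.decode("utf-8"))
    return int(digest["txnId"]), crc_matches


def make_sample_digest(txn_id: int) -> bytes:
    """Build a made-up digest payload under the same assumptions."""
    body = json.dumps({"txnId": txn_id}).encode("utf-8")
    return struct.pack(">I", zlib.crc32(body) & 0xFFFFFFFF) + body


txn_id, crc_ok = read_digest_txn_id(make_sample_digest(123456789))
```

If `crc_matches` comes back false on a real file, the assumptions about byte order or CRC coverage are the first things to re-check before concluding that the file is corrupt.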
Transaction ids are consistent across partitions and executed in order. For ordering purposes you can compare them as signed 64-bit integers. The most significant bits are a millisecond timestamp, followed by a sub-millisecond counter, followed by the id of the initiator the timestamp originated from.
-Ariel
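To show why plain integer comparison orders transaction ids correctly, here is a sketch that packs and unpacks ids using the field order described above (timestamp, then counter, then initiator id). The bit widths below are assumptions chosen for illustration and are not given in this thread; the real widths, and any epoch offset applied to the timestamp, should be taken from TransactionIdManager.

```python
# Hypothetical field widths; they sum to 63 bits so ids stay non-negative
# when stored as signed 64-bit integers.
TIMESTAMP_BITS = 40
COUNTER_BITS = 9
INITIATOR_BITS = 14


def pack_txn_id(timestamp_ms: int, counter: int, initiator_id: int) -> int:
    """Combine the three fields, most significant first."""
    if not 0 <= timestamp_ms < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp out of range")
    if not 0 <= counter < (1 << COUNTER_BITS):
        raise ValueError("counter out of range")
    if not 0 <= initiator_id < (1 << INITIATOR_BITS):
        raise ValueError("initiator id out of range")
    return (
        (timestamp_ms << (COUNTER_BITS + INITIATOR_BITS))
        | (counter << INITIATOR_BITS)
        | initiator_id
    )


def unpack_txn_id(txn_id: int):
    """Split an id back into (timestamp_ms, counter, initiator_id)."""
    initiator_id = txn_id & ((1 << INITIATOR_BITS) - 1)
    counter = (txn_id >> INITIATOR_BITS) & ((1 << COUNTER_BITS) - 1)
    timestamp_ms = txn_id >> (COUNTER_BITS + INITIATOR_BITS)
    return timestamp_ms, counter, initiator_id


# Because the timestamp occupies the high bits, a later millisecond always
# compares greater, and the counter breaks ties within one millisecond.
earlier = pack_txn_id(1000, 0, 5)
same_ms_later = pack_txn_id(1000, 1, 0)
next_ms = pack_txn_id(1001, 0, 0)
```

The same layout explains why a txnid derived from a wall clock timestamp alone is only approximate: the counter and initiator bits have to be guessed.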