Forum: Other

Post: VoltDB Deployment Questions

VoltDB Deployment Questions
diego
Jan 28, 2011
VoltDB Questions:

1) How is the memory of deleted rows reclaimed for non-varchar rows? There appears to be a vacuum feature, but the docs imply that it doesn't apply to non-varchar fields.

2) Is there still a 2MB limit on the size of the result set returned by stored procedures? Why such a low number? Can the SQL statements within a stored procedure exceed this limit?

3) On a 32GB machine where I set the VoltDB dataset size to 28GB, how long will the disk snapshot take, assuming a 100MB/s sequential write speed for the hard drive? Does the snapshotter write sequentially or randomly?

4) Does the snapshot process block incoming requests? What happens if write/read requests come in while the snapshot is taking place? Is VoltDB taking a snapshot of only the data actually used (let's say only 12GB used out of 28GB), or backing up the complete configured size of 28GB in the above case?

5) When the cluster suffers an unrecoverable failure and is restarted manually by an admin, is the data automatically recovered from the last snapshot? Or is this a manual process?

6) When a 28GB node snapshot is being restored, assuming a 100MB/s read speed from the hard drive backend, how long will the recovery take? An estimate, of course.

7) Assume a 4-node cluster failure. 3 nodes recovered successfully from snapshots; the 4th failed recovery on the last snapshot due to corruption. Can the 4th node recover from an older snapshot and still rejoin the cluster? A similar but different question: if kfactor=1, can the 4th node rejoin the cluster without doing the snapshot recovery, since its data is duplicated somewhere else on the cluster?

8) Is there checksum protection for the data stored in RAM to protect against corruption?

9) K-safety and rack-awareness. For example, let's say I have a 4-node cluster with kfactor=1. However, 2 nodes are in rack A and the other 2 are in rack B. Is there a way to set this up so that no mirror pair is in the same physical rack?

Thanks from someone looking at VoltDB as a potential deployment target.
re: VoltDB Deployment Questions
tcallaghan
Jan 28, 2011
Diego,

Responses to many of your questions are inline; I'll have a VoltDB developer respond to the others.
VoltDB Questions:

3) On a 32GB machine where I set the VoltDB dataset size to 28GB, how long will the disk snapshot take, assuming a 100MB/s sequential write speed for the hard drive? Does the snapshotter write sequentially or randomly?
The snapshot process performs sequential writes to disk. Snapshots saturate the available IO bandwidth, so in your example it will take about 280 seconds (28GB / 0.1GB/s). Keep in mind that we only back up table data, not indexes, so you'd only write the entire 28GB if you had no indexes.
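
The estimate above is simply the table data size divided by the sustained write bandwidth. A minimal sketch of that arithmetic follows; the function name and the decimal convention (1 GB = 1000 MB) are assumptions made for illustration, not anything VoltDB provides.

```python
def snapshot_seconds(table_data_gb: float, write_mb_per_s: float) -> float:
    """Estimate snapshot time, assuming purely sequential writes at full bandwidth."""
    if write_mb_per_s <= 0:
        raise ValueError("write bandwidth must be positive")
    # Assumption: decimal units, 1 GB = 1000 MB.
    return table_data_gb * 1000.0 / write_mb_per_s


print(snapshot_seconds(28, 100))  # 280.0, the full 28GB case
print(snapshot_seconds(12, 100))  # 120.0, if only 12GB is table data
```

Since indexes are not written, the first argument should be the size of the table data alone rather than the total memory in use.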

4) Does the snapshot process block incoming requests? What happens if write/read requests come in while the snapshot is taking place? Is VoltDB taking a snapshot of only the data actually used (let's say only 12GB used out of 28GB), or backing up the complete configured size of 28GB in the above case?
Scheduled snapshots do not block incoming requests; manually executed snapshots can block if you want them to. While the snapshot is running, we make a copy of any tuple that has not yet been written to the current snapshot but is about to be modified by a transaction. We only snapshot table data, not indexes, so you will have far less than 28GB of snapshot data for the case you described.
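
The copy-before-modify behavior described above can be illustrated with a toy model. This is only a sketch of the general copy-on-write idea, not VoltDB's implementation; the class and method names are hypothetical, and rows are reduced to dictionary entries.

```python
class CowSnapshotTable:
    """Toy table: a row is copied aside only if it is modified
    before the snapshot scan has reached it."""

    def __init__(self, rows):
        self.rows = dict(rows)   # live data: row_id -> value
        self.pending = None      # row ids the snapshot has not written yet
        self.saved = {}          # pre-modification copies of pending rows

    def start_snapshot(self):
        self.pending = sorted(self.rows)
        self.saved = {}

    def update(self, row_id, value):
        # Preserve the old version only if the snapshot still needs it.
        needed = self.pending is not None and row_id in self.pending
        if needed and row_id not in self.saved:
            self.saved[row_id] = self.rows[row_id]
        self.rows[row_id] = value

    def snapshot_next(self):
        """Emit one (row_id, value) pair for the snapshot, or None when done."""
        if not self.pending:
            self.pending = None
            return None
        row_id = self.pending.pop(0)
        return row_id, self.saved.pop(row_id, self.rows[row_id])


table = CowSnapshotTable({1: "a", 2: "b", 3: "c"})
table.start_snapshot()
first = table.snapshot_next()   # row 1 is written before any update
table.update(1, "A")            # already snapshotted: no copy is made
table.update(3, "C")            # not yet snapshotted: old value is saved
rest = [table.snapshot_next(), table.snapshot_next()]
```

In this model the snapshot sees the values as they were when it started, while transactions keep updating the live rows without waiting.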

5) When the cluster suffers an unrecoverable failure and is restarted manually by an admin, is the data automatically recovered from the last snapshot? Or is this a manual process?
You need to reload your snapshot by calling the @SnapshotRestore system procedure; it is not loaded automatically. We are considering adding a "start and load from snapshot" feature.

6) When a 28GB node snapshot is being restored, assuming a 100MB/s read speed from the hard drive backend, how long will the recovery take? An estimate, of course.
It really depends. Loading the table data itself is very fast; indexes and materialized views will slow down the process.

7) Assume a 4-node cluster failure. 3 nodes recovered successfully from snapshots; the 4th failed recovery on the last snapshot due to corruption. Can the 4th node recover from an older snapshot and still rejoin the cluster? A similar but different question: if kfactor=1, can the 4th node rejoin the cluster without doing the snapshot recovery, since its data is duplicated somewhere else on the cluster?
Snapshot files contain checksum information to ensure their integrity. These can be checked offline. You must have all snapshots available to perform a @SnapshotRestore. In your example you could run our "Snapshot Converter" to transform them to CSV/TSV files and load the data manually. Lastly, if you are running a k-safe cluster, VoltDB only needs 1 copy of each partition for the restore process.

9) K-safety and rack-awareness. For example, let's say I have a 4-node cluster with kfactor=1. However, 2 nodes are in rack A and the other 2 are in rack B. Is there a way to set this up so that no mirror pair is in the same physical rack?
VoltDB does not currently allow the user to provide provisioning information.

-Tim
aweisberg
Jan 28, 2011
Hi Diego,

1) There are different strategies used to reclaim memory for different types of data. Row data is stored in blocks, and a load factor is maintained by compacting blocks together as they become empty. Tree and hash indexes take a zero-fragmentation approach that relies on an allocator that only allocates/deallocates at the end. When a node of a tree or hash index is deleted, the last node is moved into the hole created by the deletion, and the space previously used by the moved node is given back to the allocator. The allocator allocates 2-megabyte blocks, and once a block is empty it is deleted. String data that is not inlined into a row is pooled and available for reuse, but not returned to the OS.
The current release doesn't have the zero-fragmentation hash index (coming in 1.3) and uses pooling instead.
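
The "move the last node into the hole" idea can be sketched with a dense list. This is a toy illustration under simplifying assumptions, not VoltDB's allocator: the class name is hypothetical, keys are assumed unique, and a real tree or hash index would also have to fix up the pointers that refer to the moved node.

```python
class CompactNodePool:
    """Toy pool: nodes live in a dense list, and deleting one
    fills the hole with the last node so no gaps remain."""

    def __init__(self):
        self.nodes = []     # dense storage, never contains holes
        self.slot_of = {}   # key -> index in self.nodes

    def insert(self, key):
        self.slot_of[key] = len(self.nodes)
        self.nodes.append(key)   # allocation happens only at the end

    def delete(self, key):
        hole = self.slot_of.pop(key)
        last = self.nodes.pop()  # space is released only from the end
        if last != key:
            # Move the last node into the hole left by the deleted one.
            self.nodes[hole] = last
            self.slot_of[last] = hole


pool = CompactNodePool()
for key in ["a", "b", "c", "d"]:
    pool.insert(key)
pool.delete("b")
print(pool.nodes)  # ['a', 'd', 'c']
```

Because storage only grows and shrinks at the end, a whole block at the tail eventually becomes empty and can be freed.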

2) VoltDB is focused on high-throughput OLTP and is optimized for transactions that touch "small" amounts of data. If you are returning 2-megabyte result sets you will be network limited, which defeats the tradeoffs and optimizations made for throughput. Stored procedures have to return the entire result set (no cursors), many stored procedure invocations are in flight at any given time, and all of the result sets have to fit in memory. We also had to pick a limit so that we had an upper bound to test against.
You can return multiple VoltTables from a procedure until you hit the 50 megabyte message limit. Individual VoltTables are limited to 10 megabytes. A batch of SQL statements can return up to 10 megabytes of results.
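
As a rough illustration of how the two limits interact, the sketch below checks a set of table sizes against a 10-megabyte per-table limit and a 50-megabyte message limit. The function and constant names are hypothetical, binary megabytes are an assumption, and message overhead beyond the table payloads is ignored.

```python
MB = 1024 * 1024              # assumption: binary megabytes
MAX_TABLE_BYTES = 10 * MB     # per-VoltTable limit quoted above
MAX_MESSAGE_BYTES = 50 * MB   # whole-message limit quoted above


def check_response(table_sizes):
    """Return a list of problems; an empty list means the sizes fit."""
    problems = [
        f"table {i} is {size} bytes, over the per-table limit"
        for i, size in enumerate(table_sizes)
        if size > MAX_TABLE_BYTES
    ]
    if sum(table_sizes) > MAX_MESSAGE_BYTES:
        problems.append("combined tables exceed the message limit")
    return problems


print(check_response([5 * MB] * 4))  # []
```

Six tables of 9 megabytes each would pass the per-table check but fail the message check, since together they come to 54 megabytes.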

3) How are you limiting the dataset size to 28GB?
On a typical 7.5k SATA drive you get 70-80 megabytes/sec. 4 7.5k SATA drives in RAID-0 do 240 megabytes/sec. One EBS user reports getting around 40 megabytes/sec with an unspecified EBS config. My laptop does around 20 megabytes/sec. We don't snapshot indexes or materialized views which cuts down the amount that needs to be snapshotted considerably.

7) On snapshot restore, only one replica set of snapshot data will be used for the restore, even if there are multiple copies visible to the cluster. If there is a corrupt snapshot file, the snapshot restore will return an error specifying the problem file. The corrupt file needs to be removed or renamed, and then the cluster and snapshot restore can be restarted and another replica of the data will be used.
As long as the cluster as a whole has visibility to a complete uncorrupted snapshot, a restore will succeed. It doesn't matter which nodes have the snapshot data and it doesn't matter what the kfactor or host count of the new cluster is.
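
The rule above (one intact copy of each partition, found anywhere in the cluster) can be sketched as a simple coverage check. This models only the outcome after any corrupt files have been removed or renamed; the function name, the per-partition file layout, and the host names are assumptions for illustration.

```python
def plan_restore(expected_partitions, intact_files):
    """Pick one source host per partition.

    intact_files: iterable of (host, partition_id) pairs for the
    snapshot files that are still visible and readable.
    """
    source = {}
    for host, partition_id in intact_files:
        source.setdefault(partition_id, host)  # first intact copy wins
    missing = sorted(set(expected_partitions) - set(source))
    if missing:
        raise ValueError(f"no intact snapshot copy for partitions {missing}")
    return source


files = [("node1", 0), ("node2", 0), ("node2", 1), ("node3", 1)]
print(plan_restore([0, 1], files))  # {0: 'node1', 1: 'node2'}
```

With kfactor=1 every partition has two copies, so losing one file to corruption still leaves a complete set.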

8) All parts of snapshots are checksummed, including headers and metadata. Incomplete snapshots are also detected. Rows are written in 2-megabyte blocks and each block has a CRC32.
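
Per-block checksumming of this kind can be sketched with Python's standard `zlib.crc32`. This is an illustration of the general technique only: the actual snapshot file layout and the CRC32 variant VoltDB uses are not stated here, and the helper names and the binary block size are assumptions.

```python
import zlib

BLOCK_SIZE = 2 * 1024 * 1024  # assumption: 2 megabytes, binary units


def checksum_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into fixed-size blocks, pairing each with its CRC32."""
    blocks = []
    for start in range(0, len(data), block_size):
        payload = data[start:start + block_size]
        blocks.append((zlib.crc32(payload), payload))
    return blocks


def first_corrupt_block(blocks):
    """Return the index of the first block whose CRC32 does not match, else None."""
    for index, (crc, payload) in enumerate(blocks):
        if zlib.crc32(payload) != crc:
            return index
    return None


blocks = checksum_blocks(b"row data " * 1000, block_size=4096)
print(first_corrupt_block(blocks))  # None
```

Checking block by block means a corrupt region can be located without reading the file as one unit, which fits with being able to verify snapshot files offline.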

-Ariel