Forum: VoltDB Architecture

Post: Swapping out BLOBs

Swapping out BLOBs
Feb 12, 2011
I've been obsessed with the idea of storing everything in VoltDB, but saving memory by swapping out large BLOBs that aren't being used.

VoltDB hasn't got BLOBs AFAIK (unless I missed a new release), but I am assuming it will come eventually.

I think I might have found a possible way, and I was wondering if it makes sense from an architectural point of view.

Assume BLOB support were integrated, and the DB admin could specify that BLOBs in a specific column that are larger than a certain size 'MAX_SIZE' and haven't been accessed for a certain amount of time 'MAX_AGE' should be swapped out of memory. VoltDB could add some hidden columns for each such BLOB column, as follows: a timestamp 'ACCESS' of the last access, a pointer 'FSP' to the location of the BLOB in the latest DB snapshot, and a flag 'SWAPPED' specifying whether it was swapped out. When the BLOB is written, the file-system pointer and the flag would be cleared. When the BLOB is saved in a snapshot, the file-system pointer would be set. At regular time intervals, all BLOBs would be checked for candidates to swap out. That is, those where:


size > MAX_SIZE AND (now - ACCESS) > MAX_AGE AND FSP is set AND SWAPPED is not set

would be true. Any such BLOB would be cleared from memory and have its SWAPPED flag set to true.
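
To make the periodic check concrete, here is a minimal sketch in Python. It is not VoltDB code: the `BlobCell` class, the helper functions, and the threshold values are all hypothetical, and the exact test (in particular requiring FSP to be set and SWAPPED to be clear) is my assumption about what the check would need.

```python
from dataclasses import dataclass
from typing import Optional

MAX_SIZE = 64 * 1024   # bytes; hypothetical per-column admin setting
MAX_AGE = 3600.0       # seconds; hypothetical per-column admin setting


@dataclass
class BlobCell:
    """One BLOB value plus the hidden columns described above (hypothetical)."""
    data: Optional[bytes]        # None once the BLOB has been swapped out
    size: int                    # length in bytes, kept after swap-out
    access: float                # ACCESS: time of last access
    fsp: Optional[int] = None    # FSP: location in the latest snapshot
    swapped: bool = False        # SWAPPED flag


def write_blob(cell: BlobCell, data: bytes, now: float) -> None:
    """Writing a BLOB clears the file-system pointer and the flag."""
    cell.data = data
    cell.size = len(data)
    cell.access = now
    cell.fsp = None
    cell.swapped = False


def record_snapshot(cell: BlobCell, offset: int) -> None:
    """Saving the BLOB in a snapshot sets the file-system pointer."""
    cell.fsp = offset


def is_swap_candidate(cell: BlobCell, now: float) -> bool:
    """Assumed test: big, old, safely on disk, and not already swapped."""
    return (not cell.swapped
            and cell.fsp is not None
            and cell.size > MAX_SIZE
            and now - cell.access > MAX_AGE)


def sweep(cells, now: float) -> int:
    """Swap out every candidate; return how many were swapped."""
    count = 0
    for cell in cells:
        if is_swap_candidate(cell, now):
            cell.data = None      # free the memory
            cell.swapped = True
            count += 1
    return count
```

The point of requiring FSP to be set is that a BLOB written since the last snapshot has no copy on disk yet, so dropping it from memory would lose data.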

Then, when a stored procedure runs that tries to read a BLOB that was swapped out, it would roll back, the BLOB would be scheduled for reloading by a background thread, and the procedure would be re-run automatically once the BLOB is back in memory. All of this would be transparent to the client, except for the longer delay. The main problem I see is that the BLOBs that are swapped out would have to be copied from an old snapshot into each new one, so that every snapshot contains all the data, which would complicate the snapshot code and slow it down.
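
The roll-back-and-retry flow could look roughly like the sketch below. Everything in it is hypothetical (`Store`, `BlobSwappedOut`, `run_with_reload`), and it simplifies in two ways: the reload happens inline rather than on a background thread, and "roll back" is just restoring a copy of the rows taken before the attempt.

```python
class BlobSwappedOut(Exception):
    """Raised when a procedure touches a BLOB that is not in memory."""

    def __init__(self, key):
        super().__init__(key)
        self.key = key


class Store:
    """Toy stand-in for one partition: rows in memory, BLOBs maybe on disk."""

    def __init__(self, disk):
        self.disk = disk      # key -> bytes; stands in for the snapshot file
        self.memory = {}      # key -> bytes; BLOBs currently in RAM
        self.rows = {}        # ordinary transactional data

    def read_blob(self, key):
        if key not in self.memory:
            raise BlobSwappedOut(key)
        return self.memory[key]

    def reload(self, key):
        # In the proposal this would be done by a background thread.
        self.memory[key] = self.disk[key]


def run_with_reload(store, procedure, max_attempts=3):
    """Run a procedure; on a swapped-out BLOB, roll back, reload, re-run."""
    for _ in range(max_attempts):
        saved_rows = dict(store.rows)    # cheap stand-in for an undo log
        try:
            return procedure(store)
        except BlobSwappedOut as miss:
            store.rows = saved_rows      # undo any partial writes
            store.reload(miss.key)       # bring the BLOB back, then retry
    raise RuntimeError("BLOB still not in memory after retries")
```

For example, a procedure that bumps a counter and then reads a swapped-out BLOB ends up with the counter incremented once, not twice, because the first attempt is undone before the re-run.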

So, does this sound reasonable, or totally impractical to implement?

And btw, there is a typo on this page: SaveSnapshotAuto : "360s" should probably be "3600s" ...
re: Swapping out BLOBs
Feb 16, 2011

Your ideas are interesting, but I'm not sure they are practical to implement. Do you have "BLOB heavy" applications that also take advantage of our stored-procedure and transactional properties? You could implement a hybrid approach for your application, using us for the OLTP workload and storing the BLOB data in another data store.
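
As a rough illustration of the hybrid approach, the sketch below keeps only small fields and a lookup key in the transactional rows, and puts the BLOB bytes in a separate store. All names are hypothetical: `ExternalBlobStore` is an in-memory stand-in for whatever other data store you choose, and the `documents` dict stands in for a database table.

```python
import hashlib


class ExternalBlobStore:
    """Stand-in for a separate BLOB store (file system, object store, etc.)."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # Content hash used as the key, so identical BLOBs are stored once.
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


# Rows that would live in the in-memory database: small fields plus a key.
documents = {}


def save_document(store, doc_id, title, payload: bytes) -> None:
    """Write the BLOB externally first, then record its key in the row."""
    blob_key = store.put(payload)
    documents[doc_id] = {
        "title": title,
        "blob_key": blob_key,
        "size": len(payload),
    }


def load_payload(store, doc_id) -> bytes:
    """Look up the key in the row, then fetch the bytes from the BLOB store."""
    return store.get(documents[doc_id]["blob_key"])
```

One thing to keep in mind with this design is that the external write and the database write are not covered by a single transaction, so the application has to tolerate or clean up BLOBs whose row was never committed.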

I created a Jira ticket for the typo. Thanks for the heads up.