Forum: Building VoltDB Applications

Post: Blobs

Blobs
henning
Mar 1, 2010
What is the recommended way to store binary data / blobs?
RE: Blobs
aweisberg
Mar 1, 2010
Hi Henning,

Right now you can store blobs using the VARCHAR type. Currently we don't support VARCHAR values larger than (2 ^ 15) - 2 bytes. Another limitation is that a result row returned by a SELECT statement can't be larger then 2 ^ 15 bytes. We use strcmp to compare VARCHAR columns so you can't use a blob in a comparison (including equality), or as an index key unless you hex encode it.

Hope this helps,

Ariel Weisberg
Encoding Issues?
henning
Mar 2, 2010
Hi Henning,

Right now you can store blobs using the VARCHAR type. Currently we don't support VARCHAR values larger than (2 ^ 15) - 2 bytes. Another limitation is that a result row returned by a SELECT statement can't be larger then 2 ^ 15 bytes. We use strcmp to compare VARCHAR columns so you can't use a blob in a comparison (including equality), or as an index key unless you hex encode it.

Hope this helps,

Ariel Weisberg

Thanks Ariel,

are there any implicit encoding conversions (ANSI/UTF-8) to be aware of that may cripple binary data on its way from client into the data base and back that may need some form of escaping?
RE: Encoding Issues?
aweisberg
Mar 2, 2010
Thanks Ariel,

are there any implicit encoding conversions (ANSI/UTF-8) to be aware of that may cripple binary data on its way from client into the data base and back that may need some form of escaping?

Hi Henning,

I forgot to mention that you can pass a byte array as a parameter to a stored procedure to get your binary data into the procedure. At that point you can pass the array as a parameter to an UPDATE or INSERT statement for a VARCHAR field. This skips the encoding/decoding step. It is also a faster way to insert real strings into the database since it saves the encoding/decoding work involved in serializing/constructing a Java String.

Ariel Weisberg
And getting it out again?
henning
Apr 2, 2010
Hi Henning,

I forgot to mention that you can pass a byte array as a parameter to a stored procedure to get your binary data into the procedure. At that point you can pass the array as a parameter to an UPDATE or INSERT statement for a VARCHAR field. This skips the encoding/decoding step. It is also a faster way to insert real strings into the database since it saves the encoding/decoding work involved in serializing/constructing a Java String.

Ariel Weisberg

So I need not encode it any way, also not Base 64 for getting it into the procedure and then on into the database when passing as byte array?

But the caveat when getting it out again remains, as described in your fore-last post?

I would have to Base 64 encode it to get it out? How would I do that? Change the respective value of the resulting VoltTable before returning it? So that it is not running into the assertion?
You don't need to encode to
aweisberg
Apr 3, 2010
So I need not encode it any way, also not Base 64 for getting it into the procedure and then on into the database when passing as byte array?

But the caveat when getting it out again remains, as described in your fore-last post?

I would have to Base 64 encode it to get it out? How would I do that? Change the respective value of the resulting VoltTable before returning it? So that it is not running into the assertion?

You don't need to encode to get it into the procedure, but it is probably better to do the encoding on the client and not the server.

Our dirty little secret is that you could probably get away with stashing unencoded data via a VARCHAR this way because you can get the data out of the result table via getStringAsBytes. I haven't tested round tripping binary data and this isn't supported. We strongly discourage stashing binary data in VARCHAR fields because this is almost guaranteed to break in the future. A BLOB type is coming sooner or later.

The assertion (caveat) I am talking about is in the Java VoltTable class and it is only hit when you add rows to a VoltTable.
Incidentally the limit for VARCHARS has been raise to 1 megabyte for future releases (and max serialized VoltTable row size has been raised to 2 megabytes).

Ariel
RE: Encoding issues?
aweisberg
Mar 2, 2010
Thanks Ariel,

are there any implicit encoding conversions (ANSI/UTF-8) to be aware of that may cripple binary data on its way from client into the data base and back that may need some form of escaping?

Hi again,

One thing that Ryan Betts reminded me of is that getting blobs back out again can be a little tricky. You can call VoltTable.getStringAsBytes() to skip the UTF-8 decoding step and Java String construction, but VoltTable.addRow() has assertion that runs strings (including strings passed as byte[]) through a UTF-8 decoder to ensure that it is valid UTF-8 data.
You should b64 encode blobs until we add a blob type.

Ariel