Forum: Building VoltDB Applications

Post: varchar vs varbinary

varchar vs varbinary
Earthnail
Sep 7, 2015
I want to create a table that stores:

data: 16 bytes binary data
row: 1 byte unsigned int
document_id: 8 byte integer


My query will be:

select distinct document_id where data = %s and row between %s and %s



My first attempt was to create a table like this, which fails (error message directly below):

SQL Command :: localhost:21212
1> CREATE TABLE Keypoint (
2>   Data VARBINARY(16),
3>   Row TINYINT,
4>   WorkID Integer,
5>   PRIMARY KEY(Data, Row, WorkID)
6> );
[null]: VARBINARY values are not currently supported as index keys: 'DATA'


So I went with the following schema:

1> Create TABLE keypoints (
2>   data VARCHAR(16 BYTES) NOT NULL,
3>   row TINYINT NOT NULL,
4>   work_id INTEGER NOT NULL,
5>   PRIMARY KEY(data, row, work_id)
6> );
Command succeeded.


Is that safe for storing and querying binary data? Are there any catches? Is it maybe more efficient to abuse the DECIMAL type, or two BIGINT columns? I'm expecting a few hundred billion rows in production, and a high number of queries/s will be absolutely crucial.
bballard
Sep 9, 2015
Hi,

Using VARCHAR is probably the best choice, but you would want to encode/decode as UTF8 charset values.

Based on 100B rows, this could be 5.5 TB of data in RAM or more.

Our engineering team looked into this, and it would be possible to remove the restriction so that a native VARBINARY column could be indexed. This is something we could prioritize for a customer. If you are building something for a funded project, please contact sales@voltdb.com. We'd like to help you do this.

Best regards,
Ben
xin
Sep 25, 2015
Hi Earthnail,

On Github master branch, we now support index on varbinary type. Our next version V5.7 will pick up this change.

So you can try the same DDL/queries on our community master branch or wait for V5.7 to be released soon.

Please let us know if you have any other questions.

Thanks,
Xin