Forum: Other

Post: VoltDB unexpected database size

VoltDB unexpected database size
sirin
Jan 20, 2015
Hi,

We observe some unexpected values when we check the database size reported by VoltDB. We populate one table with two columns, both of which are 8-byte longs (BIGINT). As we increase the number of tuples in the database, past a certain point the reported size decreases instead of increasing. In the table below,

- the first column shows the number of rows in the table
- the second column shows the TUPLEDATA value that the "exec @Statistics MEMORY 0;" command prints when executed from ./bin/sqlcmd
- the third column shows the TUPLEDATA value we expect to see, where we simply estimate the size of each tuple as 16 bytes (since we keep two longs, i.e., BIGINTs, in a row) and multiply it by the number of tuples.
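As a back-of-envelope sketch of that estimate for 1.6 billion rows (the 16-byte figure is our assumption of two 8-byte BIGINTs per row and ignores any per-tuple overhead VoltDB may add):

```shell
# Rough expected TUPLEDATA, assuming exactly 2 x 8-byte BIGINTs per row
# and no per-tuple overhead (an assumption, not VoltDB's internal formula).
rows=1600000000
bytes_per_row=16
expected_bytes=$((rows * bytes_per_row))
echo "expected TUPLEDATA: ${expected_bytes} bytes (~$((expected_bytes / 1024 / 1024 / 1024)) GB)"
# prints: expected TUPLEDATA: 25600000000 bytes (~23 GB)
```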

As can be seen, up to 160 million rows the reported TUPLEDATA value is quite close to the expected TUPLEDATA size. For 1.6 billion rows and more, however, the reported TUPLEDATA values decrease.

The fourth column in the table below shows the index size (INDEXMEMORY), also reported by the "exec @Statistics MEMORY 0;" command from ./bin/sqlcmd. There we see the expected behavior: while 160 million rows take 7.5GB, 1.6 billion rows take 75GB, i.e., the growth is linear.
Could you please explain why we get this kind of unexpected database size report? How can we fix this problem so that we get the correct TUPLEDATA values?

Thanks in advance,

Regards,
Utku Sirin
sirin
Jan 20, 2015
Sorry for not adding the table. I attach the table as an image.

Thanks,
Regards

pzhao
Jan 20, 2015
Sirin,

I think our best approach here is to reproduce the problem. Just a few questions, and some information I'd like to gather:

- After 1.6 billion rows, can you validate that the values are indeed there with a simple select statement, e.g. 'where <firstcolumn> = 1600000001'?
- Can you supply us with your deployment file, your schema, and the amount of memory installed on all the nodes in your VoltDB cluster?
- What version of VoltDB are you currently running?

I'd like to mimic your VoltDB configuration as closely as possible.

Peter Zhao
sirin
Jan 22, 2015
Hi,

Thanks for your response.

I populated 1.6 billion rows and it seems I can query them. One thing to note: Linux seems to have "cached" around 113GB; could that be a problem? The output of the free -m command:

$ free -m
             total       used       free     shared    buffers     cached
Mem:        258302     257495        807          0        600     113171
-/+ buffers/cache:     143723     114579
Swap:        16370         21      16349
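(A quick arithmetic check on the numbers above, values copied from the free -m output: subtracting buffers and cached from "used" recovers the -/+ buffers/cache figure up to a megabyte of rounding, so most of the "used" memory is reclaimable page cache.)

```shell
# Values taken from the free -m output above, in MB.
used=257495
buffers=600
cached=113171
echo "used minus buffers/cache: $((used - buffers - cached)) MB"
# prints: used minus buffers/cache: 143724 MB
```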

The version we use is 4.8. We run our experiments on a single machine; the total amount of memory is 264501916KB (~264GB).

Our deployment and schema files are as below:

deployment.xml:

<?xml version="1.0"?>
<deployment>
    <cluster hostcount="1" sitesperhost="1" kfactor="0" />
    <httpd enabled="true">
        <jsonapi enabled="true" />
    </httpd>
</deployment>


ddl.sql:

CREATE TABLE store
(
id BIGINT NOT NULL,
val BIGINT NOT NULL,
PRIMARY KEY (id)
);

PARTITION TABLE store ON COLUMN id;

CREATE PROCEDURE FROM CLASS mbench.procedures.Initialize;
CREATE PROCEDURE FROM CLASS mbench.procedures.Get;
CREATE PROCEDURE FROM CLASS mbench.procedures.Update;

PARTITION PROCEDURE Initialize ON TABLE store COLUMN id;

Thank you,

Regards,
Utku
pzhao
Jan 22, 2015
Sirin,

Thanks for your response. Could you run a simple count(*) on your 'store' table just to verify the row count?
Also, would it be possible to send us all the necessary files as a zip so that we can reproduce your setup?
If necessary, we will utilize AWS to test your setup.
Please send to support@voltdb.com.

Peter Zhao
pzhao
Jan 26, 2015
Sirin,

One suggestion is to check whether swapping is enabled using the command 'swapon -s'. Memory swapping is enabled by default on most Linux distributions and can hinder VoltDB performance.

Here is some documentation on the recommended memory management configuration:
http://docs.voltdb.com/AdminGuide/adminmemmgt.php#adminserverswap
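A sketch of the check and fix described above (these are standard Linux commands, not VoltDB-specific tooling; disabling swap requires root, and the right persistent setting for your machine is covered in the linked docs):

```shell
# List active swap devices/files; empty output means swap is already off.
swapon -s
# Disable all swap for the current session (root required):
swapoff -a
```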