Forum: Building VoltDB Applications

Post: VoltDB is exhausting the RAM while loading the data.

sanket
Oct 25, 2016
I am trying to load database tables into VoltDB using the csvloader utility. When I load one 5 GB table, VoltDB consumes RAM so quickly that free memory drops from 55 GB to 200 MB, and then the system kills the VoltDB process. I have already raised the RSS limit from 80% to 95% in the deployment file, and the machine has 65 GB of RAM.

What could be causing this, and what are the recommended settings for VoltDB to avoid it?
bballard
Oct 25, 2016
Is the table you are loading partitioned? That's the first thing to check, because if you have the default sitesperhost=8 on a single server, and the table is not partitioned, there will be a complete copy of the table in each of the 8 partitions. If the table is partitioned, the data is distributed among the partitions based on the hashing assignment of the values of the partitioning key column.
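
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It assumes the in-memory size is about the same as the 5 GB input, which is optimistic: it ignores indexes and per-row overhead, and the function name is just for illustration.

```python
GB = 1024 ** 3

def estimated_table_bytes(data_bytes, sites_per_host, partitioned):
    """Rough footprint of one table on a single server.

    Assumption: an unpartitioned table is copied in full into every
    site, while a partitioned table is split across the sites, so it
    is stored only once in total.
    """
    copies = 1 if partitioned else sites_per_host
    return data_bytes * copies

replicated = estimated_table_bytes(5 * GB, sites_per_host=8, partitioned=False)
partitioned = estimated_table_bytes(5 * GB, sites_per_host=8, partitioned=True)

print(replicated // GB, "GB if the table is not partitioned")  # 40
print(partitioned // GB, "GB if the table is partitioned")     # 5
```

Even before overhead, 8 full copies of a 5 GB table would take most of the free memory you describe.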

If it's partitioned and you still can't load all of the data, the next thing to look at would be the schema. There are formulas in the Planning Guide that describe the memory usage for given datatypes and for indexes. The VMC interface also has a sizing worksheet that gives you the mins and maxes based on the schema. You could also post the definition of the table you are trying to load, along with any indexes you have defined on it, and we can explain more about the bytes it would use per row.
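
As a rough illustration of that kind of per-row estimate, the sketch below just sums the column widths and multiplies by the row count. The table, its columns, and the row count are hypothetical, and the widths are the usual fixed sizes for these types; please confirm them against the Planning Guide, which also covers VARCHAR and index overhead that this leaves out.

```python
def estimated_row_bytes(column_widths):
    """Sum of the per-column byte widths for one row (no overhead)."""
    return sum(column_widths.values())

def estimated_table_gb(rows, column_widths, copies=1):
    """Rows x bytes-per-row x number of copies, expressed in GB."""
    total = rows * estimated_row_bytes(column_widths) * copies
    return total / 1024 ** 3

# Hypothetical schema: BIGINT, INTEGER, TIMESTAMP, FLOAT columns.
widths = {"id": 8, "account": 4, "created": 8, "amount": 8}
rows = 100_000_000  # hypothetical row count

print("bytes per row:", estimated_row_bytes(widths))
print("one copy, GB:", round(estimated_table_gb(rows, widths), 2))
print("8 copies, GB:", round(estimated_table_gb(rows, widths, copies=8), 2))
```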
How to check whether the table is partitioned or not?
sanket
Oct 26, 2016
Can you please tell me how to check whether a table is partitioned, and how to partition it?
bballard
Oct 26, 2016
To check, go to the Schema tab in the VMC web interface (typically http://localhost:8080), then select "Schema" in the second row of tabs, which lists all of the tables and views. Find the table, and it will have a label "Partitioned" or "Replicated".

From the CLI (sqlcmd), you can run "exec @SystemCatalog TABLES;" and look at the "REMARKS" column. If this shows {"partitionColumn":"<column name>"} then the table is partitioned. If it is NULL then the table is replicated.
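
If you want to apply that rule in a script, a minimal Python sketch could look like this. It assumes you have already fetched the REMARKS value for a table as a string, or None for NULL; the function name and the sample column name ID are made up for the example.

```python
import json

def classify_table(remarks):
    """Interpret the REMARKS value of one row from @SystemCatalog TABLES.

    remarks is None for a SQL NULL, otherwise a JSON string such as
    {"partitionColumn":"ID"}.
    """
    if remarks is None:
        return ("replicated", None)
    info = json.loads(remarks)
    column = info.get("partitionColumn")
    if column:
        return ("partitioned", column)
    # Assumption: JSON without a partitionColumn key means replicated.
    return ("replicated", None)

print(classify_table('{"partitionColumn":"ID"}'))  # ('partitioned', 'ID')
print(classify_table(None))                        # ('replicated', None)
```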

Here's how to choose a partition column and how to set it in your DDL:
https://docs.voltdb.com/UsingVoltDB/DesignPartition.php
https://docs.voltdb.com/UsingVoltDB/ddlref_partitiontable.php