Forum: Other

Post: VoltDB native memory is wasted on invalid SP inserts

Dmtry
Sep 2, 2015
Hi,
I've encountered some rather strange behaviour that has me stumped.
Preconditions

  • OS: Ubuntu 14.04.3 LTS
  • Transparent huge pages: off
    
    $ cat /sys/kernel/mm/transparent_hugepage/enabled
    always madvise [never]
    $ cat /sys/kernel/mm/transparent_hugepage/defrag
    always madvise [never]
    

  • VoltDB sources revision: tag 'voltdb-5.5'
  • VoltDB build line: ant clean dist -Djmemcheck=NO_MEMCHECK
  • VoltDB schema:
    
    CREATE TABLE TABLE2 (
        id INTEGER NOT NULL,
        data VARCHAR(65535) NOT NULL,
        CONSTRAINT pk_TABLE2 PRIMARY KEY (id)
    );
    PARTITION TABLE TABLE2 ON COLUMN id;
    



Scenario

  1. Start VoltDB (1 node, 2 partitions, catalog with the schema above)
  2. From 10 threads, iteratively execute the following code (assuming voltClient is initialized):
    
            int key = random.nextInt(1000000);
            voltClient.callProcedure("TABLE2.insert", key, UUID.randomUUID().toString());
    



Result
After the table is filled (@Statistics TABLE reports about 500,000 tuples in each partition), the RES memory reported by 'top' continues to grow gradually.
If I leave the machine untouched overnight, the VoltDB process is killed by the OOM killer.
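For anyone trying to reproduce the measurement, here is a minimal shell sketch of how the RES growth can be sampled over time; the pgrep pattern is a guess and should be adjusted to match how your VoltDB process was actually launched:

```shell
# sample_rss: print a timestamped RSS reading, in MB, for a given PID.
# On Linux, ps reports the "rss" format keyword in kilobytes.
sample_rss() {
    ps -o rss= -p "$1" | awk -v t="$(date +%H:%M:%S)" '{printf "%s RSS: %.1f MB\n", t, $1 / 1024}'
}

# Usage against the VoltDB server (the pgrep pattern is an assumption;
# point it at whatever command line the process actually has):
#   PID=$(pgrep -f org.voltdb.VoltDB | head -n 1)
#   while kill -0 "$PID" 2>/dev/null; do sample_rss "$PID"; sleep 60; done
```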

Expected Result
RES memory should not grow after the table is filled.

Workaround
Interestingly, if I execute "select count(*) from TABLE2" from sqlcmd, the extra memory is reclaimed!

I noticed a number of unreachable DirectByteBuffer instances in a heap dump (the number was reduced after the 'select count...' call above). It is probably connected somehow, but I'm not sure, since as far as I know VoltDB calls the buffers' cleaners explicitly...

Did anyone notice similar behavior or have any explanation of what happens?
Thanks!
Dmtry
Sep 15, 2015
Some new details:
Some memory is reclaimed each time the ExecutionEngine.nativeExecutePlanFragments method is executed against each individual partition in the course of an MP request. It looks like some swollen per-partition buffers are being cleared.
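If that's right, a possible stopgap (just a sketch; it assumes VoltDB's sqlcmd is on the PATH and can reach the cluster) would be to nudge all partitions periodically with a cheap multi-partition read:

```shell
# nudge_partitions: run one cheap multi-partition read so that every
# partition's execution engine executes a plan fragment, which (per the
# observation above) seems to release the swollen per-partition buffers.
# Assumes VoltDB's sqlcmd is on the PATH and pointed at the cluster.
nudge_partitions() {
    echo "select count(*) from TABLE2;" | sqlcmd > /dev/null
}

# For example, nudge every 5 minutes in the background:
#   while true; do nudge_partitions; sleep 300; done
```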
nshi
Sep 15, 2015
When you left the server running overnight, were there any transactions, either MP or SP, running periodically?

Did you have auto-snapshot turned on?

What was the RSS after inserting all the tuples and what was the RSS when the process was killed by the OOM killer?
Dmtry
Sep 16, 2015
When you left the server running overnight, were there any transactions, either MP or SP, running periodically?

Yes, 10 threads continually executed SP inserts into the same partitioned table. Please note that the table has a 'data' VARCHAR column. When I tried inserting into a table that has no column except the primary key 'id', I did not see the memory problem.

Did you have auto-snapshot turned on?

I do not see a 'snapshot' section in the deployment.xml I used, so it seems there were no snapshots.

What was the RSS after inserting all the tuples and what was the RSS when the process was killed by the OOM killer?

Well, I cannot remember the exact figures, but top's MEM column showed about 3.7% of 16 GB (roughly 0.6 GB) after inserting 1,000,000 unique records. I cannot find the OOM record in the system log at the moment (it happened quite a long time ago), but if needed I can try to reproduce the OOM condition again. Should I? If so, what other metrics could I measure to help with the analysis?