Forum: Installation

Post: requirements vs. desirements

requirements vs. desirements
deanbanks
Jul 27, 2010
Is the requirement for a 2+ core machine a hard operating requirement, or simply a recommendation for good performance? I'm acquiring used hardware for a bare-bones test installation and would like to know which items in the VoltDB minimum hardware requirements list are hard requirements versus mere recommendations.

Cheers,
Dean
re: required server specs
tcallaghan
Jul 27, 2010
Dean,

Good question. The number of physical cores, speed of the cores, and memory bandwidth all have an impact on the performance of VoltDB. However, for testing purposes you can certainly use a low-end system.

I just ran the Voter example application (in examples/voter within the kit) on my MacBook Pro 13. With its 2.26 GHz Core 2 Duo (dual core), it handled 29,022 transactions per second while running both the VoltDB server and the client application.

If you have a chance, please run Voter on your hardware and report back your system specs (CPU + RAM) as well as the transaction throughput.
-Tim
partial results
deanbanks
Aug 25, 2010
I have voter benchmark results on a one-machine configuration.

Specs on the single VoltDB server:

Single 3.0GHz Xeon (Irwindale era: single core with Hyper-Threading), 2MB cache, 800MHz FSB
4GB of DDR2-400 (PC2-3200) RAM

The client was run on a similar machine, connected to the server over a gigabit link using jumbo frames. The client ran with defaults, except that the votes-per-second rate was adjusted to keep from flooding the server (and to keep the queue length from reaching zero).

I also adjusted the sitesperhost setting, with interesting results. The VoltDB manual suggests setting sitesperhost to 75% of the number of physical cores. I realize that while the OS shows a hyperthreaded CPU as two virtual cores, its performance is closer to 1.25 to 1.5 real cores, so I had assumed sitesperhost=1 would be the sweet spot.

sitesperhost=1: 5700 votes/sec
sitesperhost=2: 8400 votes/sec
sitesperhost=3: 10,728 votes/sec
sitesperhost=4: 11,780 votes/sec


The performance gain seems to taper off as sitesperhost goes up, but there are indeed gains. This seems inconsistent with the recommendations in the user manual.
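
For anyone reproducing this sweep, sitesperhost is set in the deployment file. A minimal single-node sketch, assuming the deployment.xml format of this VoltDB version (attribute names as discussed in this thread; values other than sitesperhost are placeholders):

<deployment>
    <!-- sitesperhost was swept from 1 to 4 in the runs above -->
    <cluster hostcount="1" sitesperhost="4" kfactor="0"/>
</deployment>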

I'll post numbers for the 3 machine cluster next week.
re: Single core Xeon numbers
tcallaghan
Aug 25, 2010
I have voter benchmark results on a one-machine configuration.... The performance gain seems to taper off as sitesperhost goes up, but there are indeed gains. This seems inconsistent with the recommendations in the user manual....


Dean,

Thanks for posting your results and please post your 3-node numbers when you have them. Also, do you know the actual Xeon model number used?

Our recommendations for setting sitesperhost are based on quad-core Xeons. Your numbers show that this 75% rule doesn't hold up for a single-socket, single-core server; it will be interesting to see what happens when you move to a multi-node cluster.

-Tim
CPU info
deanbanks
Aug 25, 2010
Hi Tim,
Best I can tell, the processor is one of these: http://www.cpu-world.com/CPUs/Xeon/Intel-Xeon%203%20GHz%20-%20RK80546KG0802MM%20-%20NE80546KG0802MM%20%28BX80546KG3000FA%29.html

/proc/cpuinfo:
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Xeon(TM) CPU 3.00GHz
stepping : 3
cpu MHz : 2992.980
cache size : 2048 KB
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc up pebs bts pni dtes64 monitor ds_cpl cid cx16 xtpr
bogomips : 5985.96
clflush size : 64
cache_alignment : 128
address sizes : 36 bits physical, 48 bits virtual


The other two machines in the cluster will be identical.

Cheers,
Dean
Latency calculation in ClientVoter.java
deanbanks
Sep 1, 2010
Part of my application is particularly sensitive to latency, so I'm changing ClientVoter.java to use 1ms buckets rather than 25ms buckets in the latency calculation. As such, I've got a stupid question--
// change latency to bucket
int latency_bucket = (int) (execution_time / 251);
if (latency_bucket > 8) {
    latency_bucket = 8;
}
latency_counter[latency_bucket]++;

Since the voter test output implies buckets that are 25ms wide (0-25ms, 26-50ms, ...), and since all latency data within the voter application appears to be in units of milliseconds, it makes sense that integer division by 25 would be used to map a response latency into the appropriate bucket. What I can't figure out is: why is execution_time divided by 251 rather than by 25 to get 25-millisecond-wide buckets?

I ran the application with

int latency_bucket = (int) (execution_time / 1);

and the histogram seemed to match the reported average latency, although I did not verify it mathematically.


Cheers,

Dean
re: Latency Calculation
tcallaghan
Sep 1, 2010
Part of my application is particularly sensitive to latency, so I'm changing ClientVoter.java to use 1ms buckets rather than 25ms buckets in the latency calculation.... Why is execution_time divided by 251 rather than by 25 to get 25-millisecond-wide buckets?...


Dean,
Not a stupid question, but one that scared me when I first read your post! (I thought I was calculating the buckets incorrectly)

My code is dividing by "25l" (the final character is the letter L, not the number 1).
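
For anyone else who trips over this, here is the same logic written so the suffix can't be misread (a sketch; execution_time and latency_counter are declared as in ClientVoter.java):

// Same 25 ms bucketing as above, but with an uppercase 'L' suffix so the
// long literal 25L cannot be misread as the integer 251.
int latency_bucket = (int) (execution_time / 25L);  // execution_time is in ms
if (latency_bucket > 8) {
    latency_bucket = 8;  // the last bucket collects everything from 200 ms up
}
latency_counter[latency_bucket]++;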


-Tim
interesting benchmark results
deanbanks
Sep 1, 2010
Thanks, Tim. That makes a lot more sense.

I've done a series of benchmarks this morning...with very interesting results. Let me preface this by stating that my focus is on latency, while VoltDB is primarily focused on transactions per second. However, latency does seem to be a good proxy for determining when VoltDB is starting to saturate.

I've modified the ClientVoter.java program to divide latency into 2ms buckets, and to report latency as the percentage of total responses at or below a given latency. I've done this to align with some other benchmarks that I'm doing. The benefit of this format is that it makes it possible to draw a nice line graph (0-100%) vs. latency for each load point, to quickly see the impact of increasing load.
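
For concreteness, here is a stripped-down, self-contained sketch of that change (the class name, bucket count, and synthetic test data are mine; only the execution_time/latency_counter names mirror ClientVoter.java):

import java.util.Random;

// Sketch: 2 ms latency buckets plus a cumulative-percentage report.
public class LatencySketch {
    static final int BUCKET_WIDTH_MS = 2;
    static final int NUM_BUCKETS = 50;                 // covers 0-100 ms
    static final long[] latency_counter = new long[NUM_BUCKETS];

    static void record(long execution_time) {          // latency in milliseconds
        int latency_bucket = (int) (execution_time / BUCKET_WIDTH_MS);
        if (latency_bucket >= NUM_BUCKETS) {
            latency_bucket = NUM_BUCKETS - 1;          // clamp the tail into the last bucket
        }
        latency_counter[latency_bucket]++;
    }

    public static void main(String[] args) {
        // Synthetic latencies just to exercise the report.
        Random r = new Random(42);
        for (int i = 0; i < 100000; i++) {
            record(Math.max(0L, (long) (r.nextGaussian() * 4.0 + 5.0)));
        }
        long total = 0;
        for (long c : latency_counter) {
            total += c;
        }
        // Cumulative percentage of responses at or below each bucket's upper
        // edge, i.e. the format used in the tables below.
        long running = 0;
        for (int i = 0; i < NUM_BUCKETS; i++) {
            running += latency_counter[i];
            System.out.printf("%3d ms  %6.2f%%%n",
                    (i + 1) * BUCKET_WIDTH_MS, 100.0 * running / total);
        }
    }
}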

In redoing a few benchmarks from what I published previously, I saw some anomalous results, which pushed me to dig into what had changed. The surprise: I had run my cluster benchmarks first, and when I then repeated some of the single-server benchmarks, performance was significantly higher. It turns out the performance of a single VoltDB node is highly dependent on the leader= field in deployment.xml: if the server's IP address is used, performance is significantly higher than if localhost is used:

average latency in ms

             15000 sps   10000 sps   5000 sps
named IP         30.35        3.17       2.34
localhost        78.31        8.02       2.14



sps = stored procedures per second (the setpoint used in ClientVoter)

So I definitely wanted a specific IP address in deployment.xml.
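
For reference, the difference is just the leader attribute on the cluster element (a sketch, same assumed deployment.xml format as above; the other attribute values are placeholders):

<!-- slow: leader resolved via the loopback interface -->
<cluster hostcount="1" sitesperhost="4" kfactor="0" leader="localhost"/>

<!-- fast: leader bound to the server's real address (placeholder IP) -->
<cluster hostcount="1" sitesperhost="4" kfactor="0" leader="192.168.1.10"/>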

Here are the numbers for a single server, running with 4 sites and an explicit IP address. The percentages are cumulative across the total request volume. Note that there appears to be a bug in the client application when 500 procedures/sec is selected (it reports running close to 1000). I have not delved into the details, since the low-volume numbers are only used to baseline server latencies at light load.

Latency (ms)   20000sps   15000sps   10000sps   7500sps   5000sps   2500sps   1000sps   500sps
     2            0.00%      0.06%      6.64%    24.18%    18.02%    17.72%    12.39%   12.51%
     4            0.00%      3.16%     67.24%    88.95%    88.37%    75.24%    42.53%   42.89%
     6            0.00%     16.55%     91.69%    95.52%    96.07%    93.22%    71.97%   72.19%
     8            0.00%     36.52%     95.66%    96.39%    96.95%    96.68%    89.19%   89.32%
    10            0.00%     55.17%     96.73%    93.81%    97.28%    97.33%    95.58%   95.75%
    12            0.00%     69.21%     97.24%    97.01%    97.39%    97.45%    96.87%   97.07%
    14            0.00%     78.28%     97.46%    97.08%    97.41%    97.47%    97.20%   97.42%
    16            0.00%     83.88%     97.55%    97.11%    97.42%    97.48%    97.24%   97.47%
avg latency    1624        30.35       3.17      3.03      2.34      2.66      4.54     4.13
calls/sec     17121      14939       9972      6988      4995      1998       999      999



Here are the numbers for a K=1 cluster of 3 servers. All server machines are 3.0GHz single core as outlined in my previous posts.
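
(A sketch of the corresponding cluster deployment, in the same assumed format as above -- the sitesperhost value and leader address are placeholders:)

<deployment>
    <cluster hostcount="3" sitesperhost="4" kfactor="1" leader="192.168.1.10"/>
</deployment>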

Latency (ms)   20000sps   15000sps   10000sps   7500sps   5000sps   2500sps   1000sps   500sps
     2            0.00%      0.00%      0.00%     0.00%     0.04%     0.04%     0.28%    0.34%
     4            0.00%      0.00%      0.10%     0.74%     2.63%     2.12%     4.42%    4.79%
     6            0.00%      0.00%      2.86%     8.88%    13.34%    14.20%    11.48%   11.93%
     8            0.00%      0.00%     12.34%    31.87%    31.71%    33.87%    23.83%   24.56%
    10            0.00%      0.00%     25.85%    62.25%    54.98%    55.46%    41.07%   45.79%
    12            0.00%      0.07%     38.21%    84.07%    76.81%    74.62%    52.72%   68.54%
    14            0.00%      0.37%     48.47%    93.53%    88.92%    86.87%    59.08%   84.66%
    16            0.00%      1.50%     57.94%    96.17%    94.22%    93.12%    68.68%   93.14%
avg latency     197.12     78.83      14.45      8.77      9.09      9.16     11.61     9.72
calls/sec     19789      14824       9980      6984      4992      1998       999      999



Lots to consider here. Thoughts?

Still on the TODO list is to perform a sensitivity study of number of sites (single server, single CPU) vs. performance.

Cheers,
Dean
re: results
tcallaghan
Sep 2, 2010
Thanks, Tim. That makes a lot more sense.

I've done a series of benchmarks this morning...with very interesting results. Let me preface this by stating that my focus is on latency, while VoltDB is primarily focused on transactions per second. However, latency does seem to be a good proxy for determining when VoltDB is starting to saturate....


Dean,

This is interesting data that I'll digest in the next few days. The numbers are also interesting in light of the hardware used (single-core CPUs).
-Tim
Minimal: 64bit, Java 1.6
henning
Aug 8, 2011
Hi Dean,

From what I have seen, only 64-bit and Java 1.6 are required to get your toes wet.

The remaining requirements apply when you go beyond Hello World. If I am not mistaken:

Ant is required for easy building of tests or building VoltDB from source (which you usually don't need).

2+ cores and 4GB+ RAM are probably where decent performance starts.

Java 1.6_18+ saves you from a mean Java VM multitasking bug that the folks at VoltDB found and pointed out to Sun/Oracle. It will probably only bite at really high throughput.

Best,
Henning