Forum: Building VoltDB Applications

Post: New Example Application: Voting Simulation (Voter)

New Example Application: Voting Simulation (Voter)
tcallaghan
Mar 31, 2010
You can download the application from https://community.voltdb.com/node/47
It will be included with all the other example applications in our next release.

====================================================================================


This example application simulates a voting process. It allows between 1 and 12 candidates to run in an election of any duration (limited only by the physical memory of your cluster, since the simulator tracks how many times each phone number has voted).


Voter is a high-performance application. A single client may have a difficult time keeping a modern 3-node cluster busy.
- As delivered: a single Dell R610 (2 x Xeon 5530) for client/server, client rate limited at 100,000 transactions per second (TPS), 2 partitions, performs 80,104 TPS.
- Running a single R610 as the client and using 3 R610s as servers (6 partitions per server) achieves 400,257 TPS.


Tech Notes:
- The Voter client application supports rate control (the number of transactions per second to send to the servers); this can be modified in the build.xml file.
- The Voter client application also records/reports latency. It is implemented entirely in the client application for v0.6 compatibility (latency tracking is built into the client library in the next version, so this client application will be updated in that same release).


Important Notes:
- runs on VoltDB v0.6.02 and above.
- must be installed into your kit's "examples" folder (alongside "auction", "helloworld", and "satellite") to run as-is.
- modify location of voltdb*.jar and "-Djava.library.path" to run from other locations.
What am I doing wrong?
jharris
Apr 7, 2010
Hey Tim,

I'm trying the out of the box single-node configuration (ant server/ant client) on an HP DL580 G5 (24 x Xeon X7460 @ 2.66GHz) running CentOS 5.4 and am getting the following:

*************************************************************************
Voting Results
*************************************************************************
- Accepted votes = 3,929,581
- Rejected votes (invalid contestant) = 19,720
- Rejected votes (voter over limit) = 0

- Contestant Jessie Alloway received 550,467 vote(s)
- Contestant Kelly Clauss received 651,957 vote(s)
- Contestant Jessie Eichman received 223,113 vote(s)
- Contestant Alana Bregman received 651,864 vote(s)
- Contestant Tabatha Gehling received 223,597 vote(s)
- Contestant Edwina Burnam received 1,628,583 vote(s)

- Contestant Edwina Burnam was the winner with 1,628,583 vote(s)


*************************************************************************
System Statistics
*************************************************************************
- Ran for 120.70 seconds
- Performed 3,949,301 Stored Procedure calls
- At 32,721.06 calls per second
- Average Latency = 591.33 ms
- Latency 0ms - 25ms = 0
- Latency 25ms - 50ms = 0
- Latency 50ms - 75ms = 0
- Latency 75ms - 100ms = 0
- Latency 100ms - 125ms = 0
- Latency 125ms - 150ms = 0
- Latency 150ms - 175ms = 0
- Latency 175ms - 200ms = 0
- Latency 200ms+ = 3,889,210
I also tried it on a Dell R900 (16 x Xeon X7350 @ 2.93GHz) and got similar results:

*************************************************************************
Voting Results
*************************************************************************
- Accepted votes = 3,823,844
- Rejected votes (invalid contestant) = 19,157
- Rejected votes (voter over limit) = 0

- Contestant Jessie Alloway received 534,525 vote(s)
- Contestant Kelly Clauss received 635,048 vote(s)
- Contestant Jessie Eichman received 217,810 vote(s)
- Contestant Alana Bregman received 633,730 vote(s)
- Contestant Tabatha Gehling received 218,320 vote(s)
- Contestant Edwina Burnam received 1,584,411 vote(s)

- Contestant Edwina Burnam was the winner with 1,584,411 vote(s)


*************************************************************************
System Statistics
*************************************************************************
- Ran for 121.24 seconds
- Performed 3,843,001 Stored Procedure calls
- At 31,698.78 calls per second
- Average Latency = 1072.96 ms
- Latency 0ms - 25ms = 0
- Latency 25ms - 50ms = 0
- Latency 50ms - 75ms = 0
- Latency 75ms - 100ms = 0
- Latency 100ms - 125ms = 0
- Latency 125ms - 150ms = 0
- Latency 150ms - 175ms = 0
- Latency 175ms - 200ms = 0
- Latency 200ms+ = 3,780,377

Any ideas?

HP Info:

[jharris@dbXX voter]$ uname -a
Linux dbXX 2.6.18-164.2.1.el5 #1 SMP Mon Sep 21 04:37:42 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

[jharris@dbXX voter]$ cat /proc/cpuinfo | tail -n 24
processor : 23
vendor_id : GenuineIntel
cpu family : 6
model : 29
model name : Intel(R) Xeon(R) CPU X7460 @ 2.66GHz
stepping : 1
cpu MHz : 2666.759
cache size : 16384 KB
physical id : 3
siblings : 6
core id : 5
cpu cores : 6
apicid : 29
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr sse4_1 lahf_lm
bogomips : 5333.64
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:

[jharris@dbXX voter]$ cat /proc/meminfo
MemTotal: 132099200 kB
MemFree: 3354172 kB
Buffers: 356636 kB
Cached: 125556928 kB
SwapCached: 508 kB
Active: 64583160 kB
Inactive: 62275608 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 132099200 kB
LowFree: 3354172 kB
SwapTotal: 16386292 kB
SwapFree: 13372252 kB
Dirty: 248 kB
Writeback: 0 kB
AnonPages: 944756 kB
Mapped: 62773948 kB
Slab: 758932 kB
PageTables: 716168 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
CommitLimit: 82435892 kB
Committed_AS: 67828364 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 390400 kB
VmallocChunk: 34359347939 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
Hugepagesize: 2048 kB
Doing something wrong?
tcallaghan
Apr 7, 2010
Hey Tim,

I'm trying the out of the box single-node configuration (ant server/ant client) on an HP DL580 G5 (24 x Xeon X7460 @ 2.66GHz) running CentOS 5.4 and am getting the following: ...

Jonah,

I'm guessing that your question is why your transactions per second (TPS) are much lower than mine (32,000 for you vs. 80,000 for me).

The processor architecture changes that Intel made with the Xeon 55xx series (Nehalem) improved our performance dramatically over the Xeon 54xx series. These processors also support more channels of memory access per socket and higher speed DDR3.

Your Xeon 74xx and 73xx servers have more physical cores but are not as powerful (per core or per socket) for VoltDB. I believe the newly released Xeon 75xx processors bring Nehalem technology to the 4+ socket server space and should close that gap; I'd love to see someone run a benchmark on one.

Also, the voter application, as delivered, is run on a single machine (client + server) and only creates 2 partitions. You can modify the build.xml file and increase the number of partitions to find the sweet spot for your servers (look at the "catalog" target). Let me know the number of partitions that works best for your server.

I did some benchmarking on a 4 socket Xeon 7460 server in January and found that my best numbers were using 4 partitions on the server, I believe this was mostly limited by the available memory bandwidth.

Lastly, this application has extremely small transactions (even by VoltDB standards). Most real-world applications would have more complexity in their stored procedures. This type of application will not be able to utilize all 24 or 16 physical cores in your servers. On a 2 socket Xeon 55xx series server we have access to 8 physical cores (16 including hyperthreading). I normally set those servers to do 6 partitions each, with the other two cores available for networking and general housekeeping.

Hope this helps.

-Tim
Doh!
jharris
Apr 7, 2010
I'm guessing that your question is why your transactions per second (TPS) are much lower than mine (32,000 for you and 80,000 for me)...

I forgot that the 55xx's were Nehalem. I'll give this a try on one of my 8-core servers running with E5540s @ 2.53GHz.
Nehalem
tcallaghan
Apr 7, 2010
I forgot that the 55xx's were Nehalem. I'll give this a try on one of my 8-core servers running with E5540s @ 2.53GHz.

Great.

Any chance you have access to Xeon 5600 or Xeon 7500 servers? :)

Someone else at VoltDB read your message and wondered if you were also concerned about the latency numbers in your executions. They are high because your client application is fire-hosing a 100% utilized server. There is a setting in build.xml in the "client" target as follows:
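(Roughly, the relevant piece of the "client" target looks like the sketch below; the target, class, and argument names here are my assumptions, not the literal Voter build.xml contents.)

```xml
<!-- Hypothetical sketch only: names and argument order are assumptions. -->
<target name="client" depends="compile">
    <java fork="yes" classname="com.ClientVoter">
        <arg value="localhost"/>  <!-- comma-separated server list -->
        <arg value="100000"/>     <!-- max votes (transactions) per second -->
        <classpath refid="project.classpath"/>
    </java>
</target>
```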

When you find what your server is capable of you can change the "votes per second" of this client to a value less than that number. You will then see dramatically better (lower) latency numbers.

-Tim
RE: Nehalem
jharris
Apr 7, 2010
Any chance you have access to Xeon 5600 or Xeon 7500 servers? :) ...

Nope, we're still waiting on those :(

I'm trying a two-node Nehalem cluster over GbE with the client running on a separate, fast driver machine and am seeing about a 30% increase in performance. Aside from the catalog build, what specific client changes did you make?
2 node test
tcallaghan
Apr 7, 2010
Nope, we're still waiting on those :(

I'm trying a two-node Nehalem cluster over GbE with the client running on a separate, fast driver machine and am seeing about a 30% increase in performance. Aside from the catalog build, what specific client changes did you make?

To get your best # with that configuration you'll want to do the following:
1. Change the build.xml "catalog" target to be 2 hosts and 6 sites per host. Also, set the leader to be one of the two servers (by name).
2. Change the build.xml "client" target to include both servers (comma separated); it's currently set to localhost. Also, change the max number of votes per second the client will generate to something huge, say "999999999". This is currently set to "100000", which means a single client will generate a maximum of 100,000 transaction requests per second.
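Sketched out, those two edits look something like this (the property and argument names below are my assumptions, not the literal file contents):

```xml
<!-- Hypothetical sketch of the two build.xml edits; names are assumptions. -->

<!-- 1. "catalog" target: 2 hosts, 6 sites (partitions) per host, leader by name -->
<property name="hostcount"    value="2"/>
<property name="sitesperhost" value="6"/>
<property name="leader"       value="voltserver1"/>

<!-- 2. "client" target: both servers, rate cap effectively removed -->
<arg value="voltserver1,voltserver2"/>  <!-- comma-separated server list -->
<arg value="999999999"/>                <!-- max votes per second -->
```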

Let me know your findings.

-Tim
RE: 2 node test
jharris
Apr 7, 2010
To get your best # with that configuration you'll want to do the following: ...

In a 2-host, 4-sites-per-host configuration I get 189K TPS. With 2 hosts and 6 sites per host it's 152K.
Nehalem-based Results
jharris
Apr 7, 2010
When you find what your server is capable of you can change the "votes per second" of this client to a value less than that number. You will then see dramatically better (lower) latency numbers...

FYI, on the previously mentioned single-node Nehalem, I now get ~71.5K TPS with 2 partitions and ~90.4K with 4 partitions.
Min. processor for each line?
chbussler
Apr 12, 2010
I forgot that the 55xx's were Nehalem. I'll give this a try on one of my 8-core servers running with E5540s @ 2.53GHz.

Hi,

not sure if this question makes sense, but I'll give it a shot anyway. Intel has several different server processor lines (http://www.intel.com/p/en_US/products/server/processor). Do you have a 'starting' or 'minimal' processor per line that you would say supports VoltDB's architecture 'properly'?

Like above, you said that the 55xx series is a lot better than the 54xx series. Do you have similar information for the other lines?
Or the other way around, what features should a processor have in order to support VoltDB's implementation best?

Thanks,
Christoph

PS Nobody discussed AMD yet, not sure if there is similar info on their processors. Thanks.
What we know about CPUs.
jhugg
Apr 12, 2010
Do you have a 'starting' or 'minimal' processor per line that you would say supports VoltDB's architecture 'properly'? Or the other way around, what features should a processor have in order to support VoltDB's implementation best? ...

This is ballpark, from memory and depends on the workload, but for one benchmark, we see numbers like this:

2.4 GHz Core 2 Duo: 6,000 txns/sec
2.4 GHz Core 2 Quad: 15,000 txns/sec
3.0 GHz Phenom II X4 (Shanghai): 23,000 txns/sec
2.4 GHz Xeon 5400, 8-core (Core 2 derivative): 26,000 txns/sec
2.3 GHz Opteron 2376, 8-core (Shanghai): 30,000 txns/sec
2.6 GHz Core i7 920, 4-core/8-thread: 30,000 txns/sec
2.6 GHz Xeon 5500, 8-core/16-thread (i7 derivative): 50,000 txns/sec

Short answer


i7/Nehalem/5500s are great. If you don't have an extreme workload you should be fine with older hardware, but performance degrades more quickly than you'd expect for a bunch of chips with about the same clock speed.

Longer answer


There are three major architectures in use (that we've tested).

Intel Core 2 / Xeon 5400s:

These processors seem bottlenecked by their memory bandwidth. The consumer chips use 2-channel DDR2 and the Xeons use 2-channel buffered DDR2. Because the memory controller is on the northbridge, there is additional latency over the other processors on this list. Furthermore, the multi-socket chips have a bottlenecked cache-coherence scheme, meaning that 16 cores isn't any faster than 8 in our testing.

AMD Phenom II / Shanghai Opterons:

Phenom II processors in one socket perform slightly better than their Core 2 counterparts, probably due to the on-die memory controller. Multi-socket Opterons also outperform their Xeon 5400 counterparts, likely due to the on-die controller and the HyperTransport bus used for interprocessor communication.

Intel Core i7 / Xeon 5500s:

These chips have on-die memory controllers and 3 channels of DDR3 memory per socket. The multi-socket options have a fast HyperTransport-style inter-processor bus (QPI). Finally, they all support SMT, or hyper-threading, meaning they can run two instruction pipelines per core, for a 2x increase in logical cores. Since VoltDB is often memory-starved, hyper-threading helps keep all cores busy. In general, this is a nice chip for VoltDB.

Notes


VoltDB seems to scale pretty well with CPU speed. Preliminary tests on 3.33 GHz i7s and 5500s show they are proportionately faster than the 2.66 GHz versions we have at VoltDB HQ. Same goes for 3.0 GHz AMD chips versus 2.3 GHz AMD chips.

We haven't done much testing on servers with more than 2 sockets. We expect the throughput/dollar will probably be better served by adding more 2-socket machines. If you really want to run on 4 or more sockets, please do it on an architecture with a fast inter-processor bus.

-John Hugg
VoltDB Engineering
Thanks!
henning
Apr 17, 2010
This is ballpark, from memory and depends on the workload... i7/Nehalem/5500s are great...

-John Hugg
VoltDB Engineering

Thanks a lot, John, great primer on that issue!
Code for 'traditional RDBMS'?
chbussler
Apr 12, 2010
I'm trying the out of the box single-node configuration (ant server/ant client) on an HP DL580 G5 (24 x Xeon X7460 @ 2.66GHz) running CentOS 5.4 and am getting the following: ...

Hi,

great example application! It makes it easy to play around with the various parameters.

Just curious, do you happen to have the same application implemented on a traditional RDBMS like MySQL or Postgres? For 'fun' it would be nice to see the numbers for those systems on the same hardware.

Thanks,
Christoph
re: Code for 'traditional RDBMS'?
tcallaghan
Apr 12, 2010
Just curious, do you happen to have the same application implemented on a traditional RDBMS like MySQL or Postgres? ...

Christoph,

We don't currently have this implemented in another RDBMS. The trick is dynamically creating enough client "applications" to get anything resembling a decent number, since they operate synchronously between client and server.
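The arithmetic behind that: a synchronous client that blocks on each round trip can complete at most 1000/latency(ms) calls per second, so reaching a target rate takes roughly targetTPS × latency / 1000 concurrent clients. A minimal sketch of that math (the class and method names are mine, not part of any VoltDB API):

```java
// Sketch: capacity math for synchronous benchmark clients.
// A client that blocks on each round trip completes at most
// 1000 / latencyMs calls per second, so sustaining targetTps
// requires roughly targetTps * latencyMs / 1000 concurrent clients.
public class SyncClientMath {

    // Max calls/sec achievable by one synchronous client at the given latency.
    static double maxTpsPerClient(double latencyMs) {
        return 1000.0 / latencyMs;
    }

    // Concurrent clients needed to sustain targetTps, rounded up.
    static int clientsNeeded(int targetTps, double latencyMs) {
        return (int) Math.ceil(targetTps * latencyMs / 1000.0);
    }

    public static void main(String[] args) {
        // e.g. at 1 ms round trips, 100,000 TPS needs 100 synchronous clients
        System.out.println(clientsNeeded(100_000, 1.0));
    }
}
```

At a 5 ms round trip, by contrast, each client tops out at 200 calls/sec, which is why a single-threaded driver against a traditional RDBMS produces such unimpressive numbers.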

I'd love to hear your ideas as to how we could create a client framework for these purposes. Or anyone else's ideas...

-Tim
TPC test implementations?
chbussler
Apr 13, 2010
We don't currently have this implemented in another RDBMS... I'd love to hear your ideas as to how we could create a client framework for these purposes. Or anyone else's ideas...

Hi Tim,

not sure if TPC test implementations are openly available (I have not done a search), but don't they have the same issue of generating requests?

Thanks,
Christoph
re: TPC test implementation
tcallaghan
Apr 13, 2010
not sure if TPC test implementations are openly available, but don't they have the same issue of generating requests? ...

Christoph,

We will be releasing our implementation at some point in the near future.

-Tim
Framework
henning
Apr 17, 2010
not sure if TPC test implementations are openly available, but don't they have the same issue of generating requests? ...

You mean a framework to create and start up multiple clients, also for other RDBMSs?