Forum: Installation

Post: How to deploy on multiple machines, taking the voter example?

How to deploy on multiple machines, taking the voter example?
zankyhw
Jan 17, 2012
Hi, guys,

Recently I have been testing VoltDB, but I have a problem with the voter example.

How can I deploy on multiple machines? I can't find any attribute for this in deployment.xml, or any description about it in the guide doc. Can anyone help with this?


I am also curious: without any description of the cluster's IP addresses, how can the data be distributed across the cluster?


In the deployment.xml file there are only the hostcount and sitesperhost attributes, so where do I assign the cluster machines' IP addresses?
Startup leader
rbetts
Jan 17, 2012
At startup, each node is configured with the IP address of the "leader." The leader node is only special in that it coordinates the startup of a cluster, communicating to the nodes that connect to it which links to establish to form a fully connected mesh.
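The deployment file never lists the cluster machines' IP addresses; it only describes the shape of the cluster, and each node learns the leader's address when it is started. For a two-node cluster the file would look roughly like this (a sketch with illustrative values, not the exact file shipped with the voter example):

<?xml version="1.0"?>
<deployment>
    <!-- hostcount: machines in the cluster; sitesperhost: execution sites per machine;
         kfactor: number of extra replica copies (0 = no replication) -->
    <cluster hostcount="2" sitesperhost="3" kfactor="0" />
</deployment>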


https://community.voltdb.com/docs/UsingVoltDB/RunStartDB#RunStartDBTasks


Hopefully this helps - let us know if you still have any questions.
Ryan.
I have got some info from
zankyhw
Jan 18, 2012
At startup, each node is configured with the IP address of the "leader." The leader node is only special in that it coordinates the startup of a cluster, communicating to the nodes that connect to it which links to establish to form a fully connected mesh.

https://community.voltdb.com/docs/UsingVoltDB/RunStartDB#RunStartDBTasks

Hopefully this helps - let us know if you still have any questions.
Ryan.


I have got some info from your doc, but I still have a question.
//
When you are starting a VoltDB database, the VoltDB server process performs the following actions:

If you are starting the database on the node identified as the lead node, it waits for initialization messages from the remaining nodes.

If you are starting the database on a non-lead node, it sends an initialization message to the lead node indicating that it is ready.

Once all the nodes have sent initialization messages, the lead node sends out a message to the other nodes that the cluster is complete. The lead node then distributes the application catalog to all nodes.
//

My question is: if I specify 2 nodes in the cluster, do I need to do all the init work on both nodes? Is it possible to specify the IP address of the non-lead node on the lead node to finish the init process?

If I don't finish the startup work on the non-lead node, the whole database won't be created. Is that right?
The cluster won't be
rbetts
Jan 18, 2012
I have got some info from your doc, but I still have a question...


The cluster won't be available for work until it is fully initialized. If you specify 2 nodes, then you need to fully initialize two nodes.
I have specified the lead node
zankyhw
Jan 18, 2012
The cluster won't be available for work until it is fully initialized. If you specify 2 nodes, then you need to fully initialize two nodes.


I have specified the lead node's address in the run.sh file, and I have initialized the lead node, but when I initialize the non-lead node I get this error message:

INFO 02:38:10,860 [main] HOST: Build: 2.1.1 voltdb-2.1-37-g67899a1 Community Edition
INFO 02:38:10,865 [main] HOST: URL of deployment info: deployment.xml
INFO 02:38:11,057 [main] HOST: Cluster has 2 hosts with leader hostname: "10.78.121.161". 3 sites per host. K = 0.
INFO 02:38:11,057 [main] HOST: The entire cluster has 1 copy of each of the 6 logical partitions.
INFO 02:38:11,058 [main] HOST: Detection of network partitions in the cluster is not enabled.
INFO 02:38:11,059 [main] HOST: Using "/root/voltdb/examples/voter/voltdbroot" for voltdbroot directory.
INFO 02:38:11,078 [main] HOST: URL of deployment info: deployment.xml
INFO 02:38:11,112 [main] HOST: Beginning inter-node communication on port 3021.
INFO 02:38:11,124 [Thread-4] HOST: Connecting to the VoltDB cluster leader...
WARN 02:38:11,126 [Thread-4] org.voltdb.messaging.SocketJoiner: Joining primary failed: Connection refused retrying..
WARN 02:38:11,376 [Thread-4] org.voltdb.messaging.SocketJoiner: Joining primary failed: Connection refused retrying..
WARN 02:38:11,627 [Thread-4] org.voltdb.messaging.SocketJoiner: Joining primary failed: Connection refused retrying..
WARN 02:38:11,877 [Thread-4] org.voltdb.messaging.SocketJoiner: Joining primary failed: Connection refused retrying..
WARN 02:38:12,128 [Thread-4] org.voltdb.messaging.SocketJoiner: Joining primary failed: Connection refused retrying..
WARN 02:38:12,378 [Thread-4] org.voltdb.messaging.SocketJoiner: Joining primary failed: Connection refused retrying..




The two nodes have the same deployment.xml file. Can you give me any hints on this error?
This means the second node
rbetts
Jan 18, 2012
I have specified the lead node's address in the run.sh file, and I have initialized the lead node, but when I initialize the non-lead node I get this error message...


This means the second node can't contact 10.78.121.161 on port 3021. Can you ping 10.78.121.161 from the second node? Do you have any firewall rules affecting port 3021?
Yeah, actually 10.78.121.161
zankyhw
Jan 19, 2012
This means the second node can't contact 10.78.121.161 on port 3021. Can you ping 10.78.121.161 from the second node? Do you have any firewall rules affecting port 3021?


Yeah, actually 10.78.121.161 is a SUSE Linux machine. We can ping it from the non-lead node, and its firewall is disabled, but we still have the problem.

Is there anything else that could cause this?
For some reason, the non-lead
rbetts
Jan 19, 2012
Yeah, actually 10.78.121.161 is a SUSE Linux machine. We can ping it from the non-lead node, and its firewall is disabled, but we still have the problem.

Is there anything else that could cause this?



For some reason, the non-lead node can't open a socket to the lead node. Can you attach the full logs for both the lead and non-lead nodes?

You can test the connection manually by starting VoltDB on 10.78.121.161 and telnet-ing from the second node to 10.78.121.161 on port 3021.
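For example, from the second node something like:

telnet 10.78.121.161 3021

If that is also refused, the problem is in the network path (or the lead node isn't actually listening on that port) rather than in VoltDB itself.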


Feel free to send your logs to rbetts@voltdb.com if that is easier than posting them to the forum.


Ryan.
Specifically...
jhugg
Jan 17, 2012
There is a variable at the top of the run.sh script called LEADER. You'll want to change that to a real IP or hostname.

You'll also want to modify the deployment file to specify the number of hosts you're expecting.
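Roughly, the edit at the top of run.sh would look like this on both machines (the address below is just an example; use your own leader's IP or hostname, and keep it identical everywhere):

# run.sh -- same value on every node in the cluster
LEADER="10.78.121.161"

and in deployment.xml the hostcount attribute would be set to 2, as sketched earlier in this thread.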
Same issue - trying to run voter on a cluster
bobem
Mar 4, 2012
I am using the VMware images - very nice, btw - to set up a 2-node cluster. I have copied the images and made the changes so they are now voltdb1 at 192.168.1.151 and voltdb2 at 192.168.1.152. Both have full internet connectivity, and both hosts files contain entries for each other. I can ping between both machines. The issue that I see is that VoltDB is only binding to the localhost address -

voltdb@voltdb2:~$ ping voltdb1
PING voltdb1 (192.168.1.151) 56(84) bytes of data.
64 bytes from voltdb1 (192.168.1.151): icmp_seq=1 ttl=64 time=0.371 ms



INFO 18:10:27,247 [main] HOST: Build: 2.2.1 voltdb-2.2.1-0-g7894422 Community Edition
INFO 18:10:27,251 [main] HOST: URL of deployment info: deployment.xml
INFO 18:10:27,496 [main] HOST: Cluster has 2 hosts with leader hostname: "voltdb1". 2 sites per host. K = 0.
INFO 18:10:27,497 [main] HOST: The entire cluster has 1 copy of each of the 4 logical partitions.
INFO 18:10:27,498 [main] HOST: Detection of network partitions in the cluster is not enabled.
INFO 18:10:27,499 [main] HOST: Using "/home/voltdb/voltdb-2.2.1/examples/voter/voltdbroot" for voltdbroot directory.
INFO 18:10:27,516 [main] HOST: URL of deployment info: deployment.xml
INFO 18:10:27,596 [main] HOST: Beginning inter-node communication on port 3021.
INFO 18:10:27,606 [Thread-4] HOST: Connecting to the VoltDB cluster leader voltdb1/192.168.1.151:3021
WARN 18:10:27,608 [Thread-4] org.voltdb.messaging.SocketJoiner: Joining primary failed: Connection refused retrying..


When I go to the voltdb1 machine I see

INFO 18:13:43,573 [main] HOST: Build: 2.2.1 voltdb-2.2.1-0-g7894422 Community Edition
INFO 18:13:43,578 [main] HOST: URL of deployment info: deployment.xml
INFO 18:13:43,805 [main] HOST: Cluster has 2 hosts with leader hostname: "voltdb1". 2 sites per host. K = 0.
INFO 18:13:43,805 [main] HOST: The entire cluster has 1 copy of each of the 4 logical partitions.
INFO 18:13:43,806 [main] HOST: Detection of network partitions in the cluster is not enabled.
INFO 18:13:43,807 [main] HOST: Using "/home/voltdb/voltdb-2.2.1/examples/voter/voltdbroot" for voltdbroot directory.
INFO 18:13:43,823 [main] HOST: URL of deployment info: deployment.xml
INFO 18:13:43,894 [main] HOST: Beginning inter-node communication on port 3021.
INFO 18:13:43,904 [Thread-4] HOST: Connecting to VoltDB cluster as the leader...



which looks good - but -

voltdb@voltdb1:~/voltdb-2.2.1/examples/voter$ netstat -na -tu
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:631 0.0.0.0:* LISTEN
tcp6 0 0 127.0.1.1:3021 :::* LISTEN
tcp6 0 0 :::22 :::* LISTEN
tcp6 0 0 ::1:631 :::* LISTEN
tcp6 0 0 ::1:54651 ::1:57631 TIME_WAIT
udp 0 0 0.0.0.0:36659 0.0.0.0:*
udp 0 0 0.0.0.0:68 0.0.0.0:*
udp 0 0 0.0.0.0:5353 0.0.0.0:*



The second node can't connect because the first node is not listening on the external interface.

voltdb@voltdb1:~/voltdb-2.2.1/examples/voter$ ifconfig
eth3 Link encap:Ethernet HWaddr 00:0c:29:f7:fd:77
inet addr:192.168.1.151 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::20c:29ff:fef7:fd77/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:9927 errors:0 dropped:0 overruns:0 frame:0
TX packets:5787 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1901728 (1.9 MB) TX bytes:343740 (343.7 KB)

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:640 errors:0 dropped:0 overruns:0 frame:0
TX packets:640 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:32564 (32.5 KB) TX bytes:32564 (32.5 KB)



It is also strange that VoltDB is only listening on the IPv6 localhost, although that could be by design. I have looked for a way of specifying the interface and have replaced localhost with voltdb1 in the voter directory (src and scripts).

I would be grateful for any help. Sorry for the necro bump; if it is an issue I can create a new thread.

Bob
In voltdb1 -
bobem
Mar 13, 2012
I was able to use two instances of our VMware image to start a two-node Voter cluster. I had to make three changes to the Voter example to make this happen.

1) Change the LEADER environment variable to the IP address (or hostname) of the leader in run.sh
2) Change the number of hosts specified in the deployment xml to 2.
3) Install and start NTP (and wait a few minutes for the clocks to sync up).

From the log output above, I don't think you're having trouble with 2 (and you haven't run into 3 yet).

So can you make sure that the LEADER is set in run.sh on *both* hosts to the *same* IP address? Let me know if that solves your problem.

Thanks.


In voltdb1 -

the hosts file: eureka!


My hosts file was bad: voltdb1 was resolving voltdb1 to 127.0.0.1, while voltdb2 was resolving it to 192.168.1.151. It's interesting that the voter database would come up fine in single-site mode (i.e. listening on 192.168.1.151:21212) and function properly, with a client able to connect across the net. The multi-site database startup would only listen on 127.0.0.1:3021, and hence voltdb2 could not connect.
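For reference, hosts entries that avoid this would look roughly like the following on both machines (addresses from the setup above; a sketch of the relevant lines only):

# /etc/hosts
192.168.1.151   voltdb1
192.168.1.152   voltdb2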

Anyway, I am running - thank you for your help.
Multi-site listening vs Single Server Startup
bobem
Mar 27, 2012
In voltdb1 -

the hosts file: eureka!


My hosts file was bad: voltdb1 was resolving voltdb1 to 127.0.0.1, while voltdb2 was resolving it to 192.168.1.151. It's interesting that the voter database would come up fine in single-site mode (i.e. listening on 192.168.1.151:21212) and function properly, with a client able to connect across the net. The multi-site database startup would only listen on 127.0.0.1:3021, and hence voltdb2 could not connect.

Anyway, I am running - thank you for your help.


I think I know why this is occurring. In SocketJoiner.java, VoltDB binds via:
m_listenerSocket.socket().bind(new InetSocketAddress(m_coordIp, BASE_PORT));



while in ClientInterface.java the socket is bound via:

m_serverSocket.socket().bind(new InetSocketAddress(m_port));



With the bogus voltdb1=127.0.0.1 entry in the hosts file, the socket joiner would happily bind to 127.0.0.1. But when the system came up in single-server mode it would bind to all addresses. I don't think there is any issue with this, although it is interesting that the multi-host listener only listens on the specified address. Is this to allow the internal, node-to-node VoltDB communication to work on a separate network segment from the client connections?
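That matches the standard java.net.InetSocketAddress behaviour: the (host, port) constructor resolves the host and binds to that single address, while the port-only constructor binds to the wildcard address on all interfaces. A minimal standalone sketch of the difference (not VoltDB code; it assumes the name voltdb1 resolves on the machine you run it on):

import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;

public class BindDemo {
    public static void main(String[] args) throws Exception {
        // Resolves "voltdb1" and binds to that one address only. If the hosts
        // file maps voltdb1 to 127.0.0.1, this listener is loopback-only.
        ServerSocketChannel internal = ServerSocketChannel.open();
        internal.socket().bind(new InetSocketAddress("voltdb1", 3021));

        // The single-argument constructor uses the wildcard address, so this
        // listener accepts connections on every interface.
        ServerSocketChannel external = ServerSocketChannel.open();
        external.socket().bind(new InetSocketAddress(21212));

        System.out.println("internal bound to " + internal.socket().getLocalSocketAddress());
        System.out.println("external bound to " + external.socket().getLocalSocketAddress());
    }
}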
I just tried this out.
jhugg
Mar 9, 2012
I am using the VMware images - very nice, btw - to set up a 2-node cluster. I have copied the images and made the changes so they are now voltdb1 at 192.168.1.151 and voltdb2 at 192.168.1.152. Both have full internet connectivity, and both hosts files contain entries for each other. I can ping between both machines. The issue that I see is that VoltDB is only binding to the localhost address...



I was able to use two instances of our VMware image to start a two-node Voter cluster. I had to make three changes to the Voter example to make this happen.

1) Change the LEADER environment variable to the IP address (or hostname) of the leader in run.sh
2) Change the number of hosts specified in the deployment xml to 2.
3) Install and start NTP (and wait a few minutes for the clocks to sync up).


From the log output above, I don't think you're having trouble with 2 (and you haven't run into 3 yet).


So can you make sure that the LEADER is set in run.sh on *both* hosts to the *same* IP address? Let me know if that solves your problem.


Thanks.