
Thread: Are all nodes the same in the cluster?

  1. #1

    Are all nodes the same in the cluster?

    Sorry if I missed this in the documentation, but your database has me quite intrigued and I have a few questions on the architecture.


    So a typical deployment would consist of a set of web servers, and they would talk to the VoltDB cluster?
    That being the case, is there a single node that all the web servers would talk to? Kind of like Aster DB, where you have a queen with workers?


    I guess the part I'm trying to wrap my head around is the network standpoint: do the web server(s) know about all the nodes in the cluster, and how do they communicate with the correct node (if there isn't just one coordinator)?


    Also, from a high availability standpoint: if a node goes down, is that data available from the other nodes in the cluster? When that node comes back online, how does the data get synced back to it? If there is a coordinator node, is that fault tolerant as well?


    thanks...

  2. #2
    Senior Member
    Join Date
    Feb 2010
    Posts
    229

    re: Are all nodes the same in the cluster?

    VikingDBA,


    My responses are inline.


    So a typical deployment would consist of a set of web servers, and they would talk to the VoltDB cluster?
    That being the case, is there a single node that all the web servers would talk to? Kind of like Aster DB, where you have a queen with workers?


    Tim: Your client applications connect to one or more nodes in the VoltDB cluster.
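
    To make that concrete, here is a minimal sketch of a Java client opening connections to two cluster nodes with the org.voltdb.client API (the host names "voltsvr1" and "voltsvr2" are placeholders):

        import org.voltdb.client.Client;
        import org.voltdb.client.ClientFactory;

        public class ConnectExample {
            public static void main(String[] args) throws Exception {
                // One client instance can hold connections to several cluster nodes.
                Client client = ClientFactory.createClient();
                client.createConnection("voltsvr1");  // placeholder host name
                client.createConnection("voltsvr2");  // placeholder host name

                // ... call stored procedures here ...

                client.close();
            }
        }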


    I guess the part I'm trying to wrap my head around is the network standpoint: do the web server(s) know about all the nodes in the cluster, and how do they communicate with the correct node (if there isn't just one coordinator)?


    Tim: VoltDB client applications send their work (stored procedure invocations) to any node in the cluster. If the work is sent to the node that contains the partition, it is handled by that node. If it is for a partition that is on another node, the work is forwarded to the appropriate node. In either case the response still comes back from the server that initially received the work.
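
    As a quick illustration, a sketch of invoking a stored procedure through whichever node the client happens to be connected to (the procedure name "Vote" and its arguments are hypothetical; routing to the correct partition is handled by the cluster):

        import org.voltdb.client.Client;
        import org.voltdb.client.ClientFactory;
        import org.voltdb.client.ClientResponse;

        public class InvokeExample {
            public static void main(String[] args) throws Exception {
                Client client = ClientFactory.createClient();
                client.createConnection("voltsvr1");  // placeholder host name

                // The invocation goes to the connected node; if the data lives in a
                // partition on another node, the receiving node forwards the work and
                // relays the response back to the client.
                ClientResponse response = client.callProcedure("Vote", 5055555555L, 2);
                if (response.getStatus() == ClientResponse.SUCCESS) {
                    System.out.println(response.getResults()[0]);
                }

                client.close();
            }
        }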


    Also, from a high availability standpoint: if a node goes down, is that data available from the other nodes in the cluster? When that node comes back online, how does the data get synced back to it? If there is a coordinator node, is that fault tolerant as well?


    Tim: We recommend that clusters are run "k-safe" for high availability. When you start your cluster you choose a value of k, and VoltDB stores k+1 copies of your data. In the event of a server failure, the other copies of your data are available in the cluster. Post failure, you can rejoin a node to the cluster to replace the failed node.
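
    For reference, the k value is set in the deployment file when the cluster is started. A rough sketch of a 3-node, k=1 deployment.xml (attribute values are examples only; see the Using VoltDB documentation for the exact schema):

        <?xml version="1.0"?>
        <deployment>
            <!-- 3 hosts; kfactor="1" means 2 copies (k+1) of every partition -->
            <cluster hostcount="3" sitesperhost="4" kfactor="1" />
        </deployment>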


    -Tim

  3. #3
    New Member
    Join Date
    Jul 2013
    Posts
    19
    Hi,

    I had a few questions about the architecture. Correct me if I am wrong.

    When we start a cluster, suppose I have 3 servers: server1, server2, and server3.

    To start, suppose I start server1 as the host; keeping server1 up and running, I'll start server2 and server3.

    Once all are running, can a request come to any server?

    If the first request comes to server1 to insert something into the DB, are the changes immediately reflected on server2 and server3?

    What if server1 goes down? Will it stop server2 and server3?

  4. #4
    New Member
    Join Date
    Jul 2013
    Posts
    19
    Hi,
    Suppose I have a 2-node cluster with k-factor 1.
    If the server that was started as the leader crashes, the other node in the cluster also crashes, but the reverse is not true. Why is that?

  5. #5
    Super Moderator
    Join Date
    Feb 2010
    Posts
    186
    Quote Originally Posted by Shanky View Post
    Hi,
    Suppose I have a 2-node cluster with k-factor 1.
    If the server that was started as the leader crashes, the other node in the cluster also crashes, but the reverse is not true. Why is that?
    VoltDB includes a feature called "partition detection." Because VoltDB is a strictly consistent system, if a k-safe cluster is faulted in such a way as to leave two complete copies of the database separated from one another (so cut in half, basically), the database must take an action to prevent two live systems from continuing independent operation (split-brain).

    We do two things: if the cluster is split with unequal node counts, for example 3 nodes on one side and 2 nodes on the other, we terminate the smaller cluster. When the cluster is split exactly 50/50, we terminate the side of the cluster that does not contain the oldest node.

    Of course, the cluster servers cannot distinguish a remote crash from a network partition, so if one node in a 2-node cluster crashes, the other node will sometimes terminate itself to guarantee consistency and avoid split brain. This is what you are observing.

    There are a few solutions:

    (1) Run an odd number of nodes -- a k=1, 2-node cluster really doesn't make sense in a consistent system, for exactly the reasons you've observed. Run 3 nodes.
    (2) If you really must, you can disable partition detection. I strongly recommend against this, but you have the option.

    This topic is explained in some detail in the online documentation. Please see: http://voltdb.com/docs/UsingVoltDB/KsafeNetPart.php

  6. #6
    VoltDB Team
    Join Date
    Oct 2011
    Posts
    68
    We don't recommend 2 nodes, because if a node crashes, the remaining node doesn't know whether a network partition has occurred or the other node simply crashed. As a result, the surviving node has a 50/50 chance of deciding "I should stay up!" Consider the scenario where you have a 2-node cluster and a network partition does indeed happen: both nodes are active and accepting transactions, but can't communicate with the other node. This would be bad. In this case, one of the nodes chooses itself to "survive"; the other node will shut itself down. This is summarized here: http://voltdb.com/docs/UsingVoltDB/KsafeNetPart.php

    So to answer your question: in your scenario it is essentially random, 50/50, whether the surviving node will stay up or not. Note that it has nothing to do with the "leader", as nodes can fail (including the leader), be rejoined, etc., so the concept of a leader node isn't carried through the life of the database.

  7. #7
    Quote Originally Posted by jpiekos View Post
    We don't recommend 2 nodes, because if a node crashes, the remaining node doesn't know whether a network partition has occurred or the other node simply crashed. As a result, the surviving node has a 50/50 chance of deciding "I should stay up!" Consider the scenario where you have a 2-node cluster and a network partition does indeed happen: both nodes are active and accepting transactions, but can't communicate with the other node. This would be bad. In this case, one of the nodes chooses itself to "survive"; the other node will shut itself down. This is summarized here: http://voltdb.com/docs/UsingVoltDB/KsafeNetPart.php

    So to answer your question: in your scenario it is essentially random, 50/50, whether the surviving node will stay up or not. Note that it has nothing to do with the "leader", as nodes can fail (including the leader), be rejoined, etc., so the concept of a leader node isn't carried through the life of the database.
    One question, sir: let's say I still proceed with two nodes. What is the best configuration I should use? I tried it and found that if the primary host goes down, the secondary node also goes down, but if the secondary node goes down, the primary remains up.

  8. #8
    VoltDB Team
    Join Date
    Oct 2011
    Posts
    68
    If you are not concerned about network partitions (and the possible "split brain"), you can turn partition detection off. That way, with a K=1 configuration, if one node goes down, the other will survive. What you are seeing, as explained earlier, is a 50/50 chance of surviving, given the network partition detection algorithm. Note that with network partition detection turned off, if you do have a partition, you can end up with "split brain", meaning both servers could be actively accepting new transactions, and the data would diverge between them.
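
    If you do go that route, partition detection is controlled in the deployment file. A rough sketch, as I recall it from the documentation linked above (verify the element and attribute names against the docs before relying on this):

        <?xml version="1.0"?>
        <deployment>
            <cluster hostcount="2" sitesperhost="4" kfactor="1" />
            <!-- Turning this off avoids the 50/50 shutdown after a node loss, but
                 allows split brain if a real network partition occurs. -->
            <partition-detection enabled="false" />
        </deployment>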

    Ideally you would run 3 nodes and leave network partition detection enabled.
