Forum: VoltDB Architecture

Post: Time travel

Time travel
jdrowell
Jul 19, 2010
I keep getting "Initiator time moved backwards" errors when running VoltDB in a virtualized environment. Is this a known problem? I'm not sure which hypervisor is being used (the hosting company doesn't mention it), but here's the kernel:


Linux ps27591 2.6.33.3-vs2.3.0.36.30.4-swap-nooomloop-nolivelock-oom-jt4 #17 SMP Fri Jun 4 16:02:05 PDT 2010 x86_64 GNU/Linux


Any hints? /proc is pretty hacked up also, but I may be able to dig up more information if needed.


Thanks,
jd
re: Time Travel
tcallaghan
Jul 19, 2010
JD,


Can you find out from your hosting provider what virtualization technology they are using? It sounds like your provider is synchronizing the VM's time with the host's time.


I run VMware ESXi here at VoltDB, and each VM has a setting under "Options" -> "VMware Tools" -> "Synchronize guest time with host" that I leave unchecked. Instead, run the NTP client service within your VM and use a common NTP server for all nodes in your cluster. At a minimum, if the NTP client service isn't running, you can run "ntpdate " to synchronize a few servers.


-Tim
Same error: Initiator time moved backwards from
seo01
Sep 22, 2010


I debug my VoltDB applications under Mac OS X and regularly see this problem. The machine is synced to Apple's NTP server, time.euro.apple.com. On further investigation, the exception seems to coincide with synchronisation with the NTP server, e.g.:


Initiator time moved backwards from: 1285146127863 to 1285146127260
1285146127260 = Sep 22 10:02:07 BST 2010
From the console:
$ grep ntpd /var/log/system.log
...
...
Sep 22 10:02:06 seo01s-iMac ntpd[26]: time reset -1.041238 s
My short-term plan is to turn off synchronisation with the NTP server, but judging from the system log, I think this could lead to up to 50 s of drift per day. Are there any better solutions?
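The two numbers in the error message are epoch milliseconds. The minimal Java sketch below (the class and method names are hypothetical, chosen only for illustration) computes how far the clock went back and converts the new reading to UK local time with java.time:

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

// Hypothetical helper for reading the two epoch-millisecond values in an
// "Initiator time moved backwards from: X to Y" message.
public class ClockJump {

    // How far the wall clock went backwards, in milliseconds.
    public static long backwardsMillis(long fromMillis, long toMillis) {
        return fromMillis - toMillis;
    }

    // Local wall-clock time of an epoch-millisecond value in the given zone.
    public static LocalDateTime localTime(long epochMillis, String zone) {
        return Instant.ofEpochMilli(epochMillis)
                .atZone(ZoneId.of(zone))
                .toLocalDateTime();
    }

    public static void main(String[] args) {
        long from = 1285146127863L; // timestamp before the jump
        long to = 1285146127260L;   // timestamp after the jump
        System.out.println("Jumped back " + backwardsMillis(from, to) + " ms");
        System.out.println("New reading: " + localTime(to, "Europe/London"));
    }
}
```

It reports a 603 ms backwards jump and a new reading of 2010-09-22T10:02:07.260 in Europe/London, within about a second of the ntpd reset logged at 10:02:06.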
No great answer
jhugg
Sep 22, 2010


The NTP daemon seems to only make negative adjustments if time gets more than 128ms out of sync. For a reliable NTP server and an NTP client that has a good sense of the hardware clock's drift profile, that's quite a bit.


Now, I'm not an expert in overriding OS X's GUI configuration of the NTP daemon, but it seems like you could do a few things to mitigate the problem.


1. Try a different NTP server like pool.ntp.org. It might reduce the frequency of the issue, but it's probably not a full fix.
2. Figure out how to run the NTP daemon with the -x option. According to the man page, it makes the daemon slew the clock gradually rather than step it backwards unless the difference is over 600 s. If your actual skew is over 50 s a day, it might still have to make negative adjustments.
3. Improve the drift profile. I forget how to do this, but Google should be helpful.
4. All or some combination of the above.
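To put numbers on these options, here is a rough Java model of ntpd's choice between stepping and slewing the clock. The constants are assumptions taken from the ntpd manual page (a 128 ms default step threshold, 600 s with -x, and a kernel slew limit of 0.5 ms per second, i.e. 500 ppm); the class itself is hypothetical and is not ntpd's code.

```java
// Rough, hypothetical model of ntpd's step-vs-slew decision.
// Constants follow the ntpd manual page; this is not ntpd's implementation.
public class NtpdModel {

    static final double DEFAULT_STEP_THRESHOLD_S = 0.128; // default threshold
    static final double X_STEP_THRESHOLD_S = 600.0;       // threshold with -x
    static final double MAX_SLEW_PPM = 500.0;             // 0.5 ms per second

    // True if the model predicts a step (an instant jump) rather than a slew.
    public static boolean wouldStep(double offsetSeconds, boolean xOption) {
        double threshold = xOption ? X_STEP_THRESHOLD_S : DEFAULT_STEP_THRESHOLD_S;
        return Math.abs(offsetSeconds) > threshold;
    }

    // Seconds needed to slew away an offset at the maximum slew rate.
    public static double slewSeconds(double offsetSeconds) {
        return Math.abs(offsetSeconds) / (MAX_SLEW_PPM / 1_000_000.0);
    }

    // Converts a drift in seconds per day into parts per million.
    public static double driftPpm(double secondsPerDay) {
        return secondsPerDay / 86_400.0 * 1_000_000.0;
    }

    public static void main(String[] args) {
        double offset = -1.041238; // the "time reset" from the OS X system log
        System.out.println("Step without -x: " + wouldStep(offset, false)); // true
        System.out.println("Step with -x:    " + wouldStep(offset, true));  // false
        System.out.printf("Slew time: %.0f s%n", slewSeconds(offset));
        System.out.printf("50 s/day drift: %.1f ppm%n", driftPpm(50.0));
    }
}
```

Under this model, the -1.04 s reset is stepped by default (the kind of backwards jump VoltDB reports), but with -x it would be slewed over roughly 2,080 s, about 35 minutes. A drift of 50 s a day works out to roughly 579 ppm, above the 500 ppm slew limit, which fits the caveat in item 2 that -x alone may not prevent every backwards step.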


If your Mac is a laptop that frequently moves locations and/or connects to and disconnects from the network, this may be a tough problem to solve, short of disabling NTP as you are already planning to do.


Finally, if you do turn off NTP, you can always run ntpdate as a one-off whenever you're not running VoltDB.
Same error on an Amazon cloud node
seo01
Oct 4, 2010


I've just experienced this same error on an Amazon cloud node. Volt ran happily for 38 hours followed by:


[java] Initiator time moved backwards from: 1285973898800 to 1285973898793
[java] java.lang.Thread.dumpThreads(Native Method)
[java] java.lang.Thread.getAllStackTraces(Thread.java:1487)
[java] org.voltdb.VoltDB.crashVoltDB(VoltDB.java:299)
[java] org.voltdb.TransactionIdManager.getNextUniqueTransactionId(TransactionIdManager.java:127)
[java] org.voltdb.dtxn.SimpleDtxnInitiator.tick(SimpleDtxnInitiator.java:194)
[java] org.voltdb.ClientInterface.processPeriodicWork(ClientInterface.java:993)
[java] org.voltdb.PeriodicWorkTimerThread.run(PeriodicWorkTimerThread.java:57)
[java] VoltDB has encountered an unrecoverable error and is exiting.
[java] The log may contain additional information.
[java] Java Result: 255
My attempted fix is to switch the node from synchronising with the clock of the host it is virtualised on to synchronising with an NTP server.
Solution details:
1. Install NTP:
sudo apt-get install ntp
2. Stop the VM syncing with the host's clock:
sudo bash -c "echo 1 > /proc/sys/xen/independent_wallclock"
3. Add the -x option to the NTP init file:
sudo nano /etc/init.d/ntp
NTPD_OPTS="-x"
4. Restart the NTP daemon:
sudo /etc/init.d/ntp restart
5. Check it's running:
grep ntp /var/log/syslog
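To confirm that step 2 took effect, a small Java check can read the same /proc file. This is a hypothetical helper that assumes a Xen paravirtualized guest exposing /proc/sys/xen/independent_wallclock; on other hypervisors the file is simply absent and the check says so.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical check for the Xen independent_wallclock flag set in step 2.
public class XenWallclockCheck {

    static final Path FLAG = Paths.get("/proc/sys/xen/independent_wallclock");

    // Empty if the flag file is missing or unreadable (e.g. not a Xen guest);
    // otherwise true when it contains "1" (guest keeps its own clock).
    public static Optional<Boolean> independentWallclock(Path flag) {
        if (!Files.isReadable(flag)) {
            return Optional.empty();
        }
        try {
            String value = new String(Files.readAllBytes(flag)).trim();
            return Optional.of(value.equals("1"));
        } catch (IOException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        Optional<Boolean> result = independentWallclock(FLAG);
        if (!result.isPresent()) {
            System.out.println("No Xen wallclock flag found");
        } else if (result.get()) {
            System.out.println("Guest clock is independent; NTP can manage it");
        } else {
            System.out.println("Guest clock still follows the host");
        }
    }
}
```

Note that writing to /proc only changes the running kernel, so the check is worth repeating after a reboot.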
If anyone has any other solutions, I would be very interested. Also, is anyone else successfully running a cluster on Amazon? I'd like to know if you have run into the same problem.
NTP improvements and Changes for 1.2
jhugg
Oct 5, 2010
Recently we've begun suggesting the "-x" option for NTP. "-x" prevents NTP from stepping the time backwards when the offset is less than 600 seconds; smaller corrections are slewed gradually instead.


Additionally, we've changed the failure mode for the upcoming 1.2 release. If time has been negatively adjusted:
1. A more descriptive message will be logged (log4j) with priority "ERROR".
2. An error message will also be sent to STDERR.
3. If the adjustment is >= 3s, then the node will fail.
4. If the adjustment is < 3s, then the initiator will delay processing work for the duration of the negative adjustment, then continue as usual.
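The 1.2 behavior described above can be sketched as a timestamp source that refuses to go backwards. This is a hypothetical Java illustration, not VoltDB's TransactionIdManager: the class name, the injectable LongSupplier clock, and printing to System.err in place of log4j logging are all assumptions.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch of the 1.2 policy: backwards jumps under 3 s are
// waited out, larger ones are fatal. Not VoltDB's actual code.
public class BackwardsClockPolicy {

    static final long FAIL_THRESHOLD_MS = 3_000;

    private final LongSupplier clock; // wall clock in epoch milliseconds
    private long lastMillis = Long.MIN_VALUE;

    public BackwardsClockPolicy(LongSupplier clock) {
        this.clock = clock;
    }

    // Returns a timestamp that never goes backwards relative to the last one.
    public synchronized long nextTimestamp() {
        long now = clock.getAsLong();
        if (lastMillis != Long.MIN_VALUE && now < lastMillis) {
            long delta = lastMillis - now;
            System.err.println("ERROR: clock moved backwards by " + delta + " ms");
            if (delta >= FAIL_THRESHOLD_MS) {
                // Large adjustment: fail rather than wait.
                throw new IllegalStateException(
                        "Clock moved backwards by " + delta + " ms; failing node");
            }
            // Small adjustment: pause until the clock catches up again.
            while (now < lastMillis) {
                try {
                    Thread.sleep(lastMillis - now);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting for clock", e);
                }
                now = clock.getAsLong();
            }
        }
        lastMillis = now;
        return now;
    }
}
```

Injecting the clock lets the policy be tested with fake readings. Both backwards jumps reported in this thread (603 ms and 7 ms) are under the 3 s threshold, so under this policy they would cause a brief pause rather than a crash.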

If you are running on a development machine with imperfect NTP, you may periodically see tiny pauses but should otherwise be unaffected. These pauses vary with setup and clock accuracy, but shouldn't total more than a few seconds a day.


If you are running in production, or want to ensure there are no pauses, please run NTP with "-x" and monitor the log output for errors.