[quagga-dev 5130] Re: Clock drift on BGP dumps
vasco at ripe.net
Tue Nov 20 11:01:46 GMT 2007
This is a response on the thread [quagga-dev 4559] Re: Clock drift on BGP dumps (see below).
I have found 2 issues on the bgp dumps that appear to be due to this clock drift.
We are using quagga v. 0.99.6 and have the following dumps config:
dump bgp all [path]/updates.%Y%m%d.%H%M 5m
dump bgp routes-mrt [path]/bview.%Y%m%d.%H%M 8h
#1 - We see this pattern occurring occasionally in our dumps:
-rw-r--r-- 1 quagga quagga 17564432 Nov 8 16:00 bview.20071108.1559.gz
-rw-r--r-- 1 quagga quagga 17477039 Nov 9 00:00 bview.20071108.2359.gz
-rw-r--r-- 1 quagga quagga 17468794 Nov 9 08:00 bview.20071109.0759.gz
-rw-r--r-- 1 quagga quagga 17448066 Nov 9 15:59 bview.20071109.1559.gz
-rw-r--r-- 1 quagga quagga 17448408 Nov 9 16:00 bview.20071109.1600.gz
-rw-r--r-- 1 quagga quagga 17338814 Nov 10 00:00 bview.20071109.2359.gz
-rw-r--r-- 1 quagga quagga 17371172 Nov 10 08:00 bview.20071110.0759.gz
-rw-r--r-- 1 quagga quagga 17408868 Nov 10 15:59 bview.20071110.1559.gz
-rw-r--r-- 1 quagga quagga 17409899 Nov 10 16:00 bview.20071110.1600.gz
-rw-r--r-- 1 quagga quagga 17428320 Nov 11 00:00 bview.20071110.2359.gz
-rw-r--r-- 1 quagga quagga 17448494 Nov 11 08:00 bview.20071111.0759.gz
Nearly all the full table dumps have the system time stamp (on the file name) of 1 minute before the exact time (0759, 1759 and 2359).
And seems that in the cases when the dump is finished within the minute (before the real dump time), a duplicate dump is made, with almost the exact same data (the only difference being the changes within that minute, probably).
See the cases above for 1559/1600.
Is there a way to go around this, to avoid duplicate dumps being made?
The main problem for us is that all the dumps are inserted in a database, and having sequenced duplicates causes double the load on our database insertion, and duplicate data too.
I have also noticed similar cases in the update dumps (however, that is less of a problem given the smaller size of the update files):
-rw-r--r-- 1 quagga quagga 42966 Nov 13 16:10 updates.20071113.1605.gz
-rw-r--r-- 1 quagga quagga 30151 Nov 13 16:15 updates.20071113.1610.gz
-rw-r--r-- 1 quagga quagga 85480 Nov 13 16:20 updates.20071113.1615.gz
-rw-r--r-- 1 quagga quagga 57322 Nov 13 16:25 updates.20071113.1620.gz
-rw-r--r-- 1 quagga quagga 101048 Nov 13 16:29 updates.20071113.1625.gz
-rw-r--r-- 1 quagga quagga 191 Nov 13 16:30 updates.20071113.1629.gz
-rw-r--r-- 1 quagga quagga 36164 Nov 13 16:35 updates.20071113.1630.gz
-rw-r--r-- 1 quagga quagga 35763 Nov 13 16:40 updates.20071113.1635.gz
-rw-r--r-- 1 quagga quagga 37738 Nov 13 16:45 updates.20071113.1640.gz
(notice the 1629 update..)
These cases seem to happen once every few days.
#2 - Occasionally, we also see empty rib dump files being created. This can happen at apparently random times (where a dump was not expected), or at the expected time for the dump:
-rw-r--r-- 1 quagga quagga 23216990 Oct 12 00:00 bview.20071011.2359.gz
-rw-r--r-- 1 quagga quagga 23008469 Oct 12 08:00 bview.20071012.0759.gz
-rw-r--r-- 1 quagga quagga 40 Oct 12 14:05 bview.20071012.1405.gz
-rw-r--r-- 1 quagga quagga 0 Oct 12 14:14 bview.20071012.1414
-rw-r--r-- 1 quagga quagga 20127182 Oct 12 16:00 bview.20071012.1600.gz
-rw-r--r-- 1 quagga quagga 40 Oct 12 23:59 bview.20071012.2359.gz
-rw-r--r-- 1 quagga quagga 20138531 Oct 13 08:00 bview.20071013.0759.gz
-rw-r--r-- 1 quagga quagga 20228040 Oct 13 16:00 bview.20071013.1559.gz
-rw-r--r-- 1 quagga quagga 20258573 Oct 14 00:00 bview.20071013.2359.gz
The 40b files are merely the gzip header, originally the file was 0b.
Any advice greatly appreciated on these issues. :)
Information Services Dept.
Paul Jakma* paul at clubi.ie
/Fri Jan 12 02:37:11 GMT 2007/
On Thu, 11 Jan 2007, Massimo Rimondini wrote:
>/ As you can see, by looking at the file names, all the RIBs but the
/>/ first two appear to be collected about 2 minutes before the
/>/ expected scheduled time. Even more strange, sometimes it happens
/>/ that two dumps are produced within a very limited time span (15:58
/>/ and 15:59), possibly resulting in incomplete file contents (look at
/>/ the file size). On the other hand, it is weird to notice that the
/>/ second dump (08:00 in the above list) took place at the correct
/>/ time instant. The 1 hour time offset you observe between the file
/>/ name timestamp and the actual file creation time is normal and due
/>/ to quagga running inside a User-Mode Linux sandbox within which the
/>/ timezone is set to UTC (our Linux box is set to CET).
/>/ The clock drift pattern you see has been observed in several
/>/ experiments I have performed, even after changing the dump
/>/ generation rate (12h instead of 8h). Conversely, BGP updates have
/>/ always been collected at the expected time instants, even after
/>/ changing the collection rate.
Note that only 'routes-mrt' does a bulk-dump at scheduled times. For
'updates', the updates are logged as they come in, the interval just
re-opens the dump file with the new name.
That said, it should be affected by the issue too, presuming
routes-mrt dumping is /also/ enabled..
>/ I experienced this issue with both Quagga v0.99.5 (Debian package)
/>/ and Quagga v0.99.6 (compiled from scratch). Everything seems to go
/>/ fine with the latest stable release (0.98.6).
The only difference between the two is that 0.99.5+ has changes to
how lib/thread calculates timers, to make it robust against backward
time-changes. In particular this means the quagga daemons now have
their own notion of time, decoupled from the system clock, at least
so far as it effects when timers get run. (The dump filename is still
constructed from the system clock time).
>/ My guess is that something goes wrong between the internal clock of
/>/ Quagga and the RTC of the machine it is running on.
Well, that they can drift from each other is intentional. Quagga's
clock /can/ appear to run faster than system time, if system time is
prone to changing backwards:
>/ Our RTC clock is NTP synchronized,
which is what NTP may have to do regularly if the system timer runs
fast compared to UTC.
>/ and has no reason to undergo
/>/ sudden (backward or forward) changes. I have already attempted to
/>/ locate the portion of code that is responsible for this mistiming
/>/ without success, but I am still looking into it.
I'd suggest printing out the 'recent_time' struct, exported by
thread.c, and comparing it to the result of time() - before and after
I suspect what that happens is that NTP regularly makes backward
corrections that are detectible to the code in lib/thread.c which
adjusts Quagga's time forward by 1s if/when it detects a backward
system-time transition. See quagga_gettimeofday_relative_adjust() in
I would have thought though that NTP makes adjustments in /tiny/
increments, such that that probability of them being visible to
Quagga is low (and hence, Quagga's time+1s code rarely gets
Do we know the NTP increment? Do you have a fast-running RTC in that
system? Does not running NTP fix the problem?
Another solution may be to define HAVE_CLOCK_MONOTIC if your system
supports clock_gettime() and CLOCK_MONOTONIC. The code supporting
that isn't tested yet though (but its not /hugely/ complicated -
CLOCK_MONOTONIC support is deliberately not enabled precisely to test
the gettimeofday() monotonic code as widely as possible, as it needs
it more ;) ).
I'm not sure what we can do about this problem unfortunately - it's
quite intentional that Quagga may run 'fast' when the system clock
regularly makes noticeable backward jumps in time.
Paul Jakma paul at clubi.ie <http://lists.quagga.net/mailman/listinfo/quagga-dev> paul at jakma.org <http://lists.quagga.net/mailman/listinfo/quagga-dev> Key ID: 64A2FF6A
Pie are not square. Pie are round. Cornbread are square.
More information about the Quagga-dev