[quagga-dev 5130] Re: Clock drift on BGP dumps

Vasco Asturiano vasco at ripe.net
Tue Nov 20 11:01:46 GMT 2007


Hi all,

This is a response on the thread [quagga-dev 4559] Re: Clock drift on BGP dumps (see below).

I have found 2 issues on the bgp dumps that appear to be due to this clock drift.
We are using quagga v. 0.99.6 and have the following dumps config:

dump bgp all [path]/updates.%Y%m%d.%H%M 5m
dump bgp routes-mrt [path]/bview.%Y%m%d.%H%M 8h

#1 - We see this pattern occurring occasionally in our dumps:

-rw-r--r--    1 quagga   quagga   17564432 Nov  8 16:00 bview.20071108.1559.gz
-rw-r--r--    1 quagga   quagga   17477039 Nov  9 00:00 bview.20071108.2359.gz
-rw-r--r--    1 quagga   quagga   17468794 Nov  9 08:00 bview.20071109.0759.gz
-rw-r--r--    1 quagga   quagga   17448066 Nov  9 15:59 bview.20071109.1559.gz
-rw-r--r--    1 quagga   quagga   17448408 Nov  9 16:00 bview.20071109.1600.gz
-rw-r--r--    1 quagga   quagga   17338814 Nov 10 00:00 bview.20071109.2359.gz
-rw-r--r--    1 quagga   quagga   17371172 Nov 10 08:00 bview.20071110.0759.gz
-rw-r--r--    1 quagga   quagga   17408868 Nov 10 15:59 bview.20071110.1559.gz
-rw-r--r--    1 quagga   quagga   17409899 Nov 10 16:00 bview.20071110.1600.gz
-rw-r--r--    1 quagga   quagga   17428320 Nov 11 00:00 bview.20071110.2359.gz
-rw-r--r--    1 quagga   quagga   17448494 Nov 11 08:00 bview.20071111.0759.gz


Nearly all the full table dumps have the system time stamp (on the file name) of 1 minute before the exact time (0759, 1759 and 2359). 
And seems that in the cases when the dump is finished within the minute (before the real dump time), a duplicate dump is made, with almost the exact same data (the only difference being the changes within that minute, probably).
See the cases above for 1559/1600. 

Is there a way to go around this, to avoid duplicate dumps being made? 
The main problem for us is that all the dumps are inserted in a database, and having sequenced duplicates causes double the load on our database insertion, and duplicate data too.

I have also noticed similar cases in the update dumps (however, that is less of a problem given the smaller size of the update files):

-rw-r--r--    1 quagga   quagga      42966 Nov 13 16:10 updates.20071113.1605.gz
-rw-r--r--    1 quagga   quagga      30151 Nov 13 16:15 updates.20071113.1610.gz
-rw-r--r--    1 quagga   quagga      85480 Nov 13 16:20 updates.20071113.1615.gz
-rw-r--r--    1 quagga   quagga      57322 Nov 13 16:25 updates.20071113.1620.gz
-rw-r--r--    1 quagga   quagga     101048 Nov 13 16:29 updates.20071113.1625.gz
-rw-r--r--    1 quagga   quagga        191 Nov 13 16:30 updates.20071113.1629.gz
-rw-r--r--    1 quagga   quagga      36164 Nov 13 16:35 updates.20071113.1630.gz
-rw-r--r--    1 quagga   quagga      35763 Nov 13 16:40 updates.20071113.1635.gz
-rw-r--r--    1 quagga   quagga      37738 Nov 13 16:45 updates.20071113.1640.gz

(notice the 1629 update..)

These cases seem to happen once every few days.

---

#2 - Occasionally, we also see empty rib dump files being created. This can happen at apparently random times (where a dump was not expected), or at the expected time for the dump:

-rw-r--r--    1 quagga   quagga   23216990 Oct 12 00:00 bview.20071011.2359.gz
-rw-r--r--    1 quagga   quagga   23008469 Oct 12 08:00 bview.20071012.0759.gz
-rw-r--r--    1 quagga   quagga         40 Oct 12 14:05 bview.20071012.1405.gz
-rw-r--r--    1 quagga   quagga          0 Oct 12 14:14 bview.20071012.1414
-rw-r--r--    1 quagga   quagga   20127182 Oct 12 16:00 bview.20071012.1600.gz
-rw-r--r--    1 quagga   quagga         40 Oct 12 23:59 bview.20071012.2359.gz
-rw-r--r--    1 quagga   quagga   20138531 Oct 13 08:00 bview.20071013.0759.gz
-rw-r--r--    1 quagga   quagga   20228040 Oct 13 16:00 bview.20071013.1559.gz
-rw-r--r--    1 quagga   quagga   20258573 Oct 14 00:00 bview.20071013.2359.gz

The 40b files are merely the gzip header, originally the file was 0b.

Any advice greatly appreciated on these issues. :)

Best regards,
-- 

Vasco Asturiano
-------------------------
Information Services Dept.
RIPE NCC
http://www.ripe.net

*--------------

Paul Jakma* paul at clubi.ie
<mailto:quagga-dev%40lists.quagga.net?Subject=%5Bquagga-dev%204559%5D%20Re%3A%20Clock%20drift%20on%20BGP%20dumps&In-Reply-To=45A5FC70.9060201%40dia.uniroma3.it>
/Fri Jan 12 02:37:11 GMT 2007/

On Thu, 11 Jan 2007, Massimo Rimondini wrote:
>/ As you can see, by looking at the file names, all the RIBs but the 
/>/ first two appear to be collected about 2 minutes before the 
/>/ expected scheduled time. Even more strange, sometimes it happens 
/>/ that two dumps are produced within a very limited time span (15:58 
/>/ and 15:59), possibly resulting in incomplete file contents (look at 
/>/ the file size). On the other hand, it is weird to notice that the 
/>/ second dump (08:00 in the above list) took place at the correct 
/>/ time instant. The 1 hour time offset you observe between the file 
/>/ name timestamp and the actual file creation time is normal and due 
/>/ to quagga running inside a User-Mode Linux sandbox within which the 
/>/ timezone is set to UTC (our Linux box is set to CET).
/>/
/>/ The clock drift pattern you see has been observed in several 
/>/ experiments I have performed, even after changing the dump 
/>/ generation rate (12h instead of 8h). Conversely, BGP updates have 
/>/ always been collected at the expected time instants, even after 
/>/ changing the collection rate.
/
Note that only 'routes-mrt' does a bulk-dump at scheduled times. For 
'updates', the updates are logged as they come in, the interval just 
re-opens the dump file with the new name.

That said, it should be affected by the issue too, presuming 
routes-mrt dumping is /also/ enabled..

>/ I experienced this issue with both Quagga v0.99.5 (Debian package) 
/>/ and Quagga v0.99.6 (compiled from scratch). Everything seems to go 
/>/ fine with the latest stable release (0.98.6).
/
The only difference between the two is that 0.99.5+ has changes to 
how lib/thread calculates timers, to make it robust against backward 
time-changes. In particular this means the quagga daemons now have 
their own notion of time, decoupled from the system clock, at least 
so far as it effects when timers get run. (The dump filename is still 
constructed from the system clock time).

>/ My guess is that something goes wrong between the internal clock of 
/>/ Quagga and the RTC of the machine it is running on.
/
Well, that they can drift from each other is intentional. Quagga's 
clock /can/ appear to run faster than system time, if system time is 
prone to changing backwards:

>/ Our RTC clock is NTP synchronized,
/
which is what NTP may have to do regularly if the system timer runs 
fast compared to UTC.

>/ and has no reason to undergo 
/>/ sudden (backward or forward) changes. I have already attempted to 
/>/ locate the portion of code that is responsible for this mistiming 
/>/ without success, but I am still looking into it.
/
I'd suggest printing out the 'recent_time' struct, exported by 
thread.c, and comparing it to the result of time() - before and after 
dumps.

I suspect what that happens is that NTP regularly makes backward 
corrections that are detectible to the code in lib/thread.c which 
adjusts Quagga's time forward by 1s if/when it detects a backward 
system-time transition. See quagga_gettimeofday_relative_adjust() in 
thread.c.

I would have thought though that NTP makes adjustments in /tiny/ 
increments, such that that probability of them being visible to 
Quagga is low (and hence, Quagga's time+1s code rarely gets 
triggered).

Do we know the NTP increment? Do you have a fast-running RTC in that 
system? Does not running NTP fix the problem?

Another solution may be to define HAVE_CLOCK_MONOTIC if your system 
supports clock_gettime() and CLOCK_MONOTONIC. The code supporting 
that isn't tested yet though (but its not /hugely/ complicated - 
CLOCK_MONOTONIC support is deliberately not enabled precisely to test 
the gettimeofday() monotonic code as widely as possible, as it needs 
it more ;) ).

I'm not sure what we can do about this problem unfortunately - it's 
quite intentional that Quagga may run 'fast' when the system clock 
regularly makes noticeable backward jumps in time.

regards,
-- 
Paul Jakma	paul at clubi.ie <http://lists.quagga.net/mailman/listinfo/quagga-dev>	paul at jakma.org <http://lists.quagga.net/mailman/listinfo/quagga-dev>	Key ID: 64A2FF6A
Fortune:
Pie are not square.  Pie are round.  Cornbread are square.




More information about the Quagga-dev mailing list