[quagga-dev 3698] Re: rfc2385

Simon Talbot simont at nse.co.uk
Thu Sep 29 23:36:54 BST 2005


Hasso,
 
I have been doing quite a bit more monitoring of our installed base over
the summer, trying to find a pattern in the occurrence of the rfc2385
slab.c bugcheck. It appears to happen only when there are "broken" MD5
protected sessions floating around, i.e. when communication between
routers has been interrupted (pipe down etc.) for a short period and
then comes back online, with one router still believing that a TCP
session is up and the other (the Quagga unit) thinking the session is
dead. The fault seems to lie in the handling of MD5 signed packets that
arrive for a TCP connection which no longer exists.
 
The packet arrives at Quagga with an MD5 signature. The kernel has no
record of this conversation, so it replies to the sender with a TCP RST.
For some reason the kernel does not add an MD5 signature to this RST, so
the originating router ignores it and keeps retrying the TCP open ad
infinitum.
 
Whilst for 99.9% of the time this process is merely irritating, filling
up the logs with "Signature found but not expected" messages and the
like, on occasion it triggers the kernel bugcheck.
 
This problem is obviously exacerbated during a restart of the router, as
there are many broken "ghost" connections arriving from the router's
peers, especially if the shutdown before the restart was not clean.
 
My guess here is that Quagga is, on the one hand, instructing the kernel
to MD5 sign data on port 179 to peer x, whilst the ghost conversation is
still in progress and the kernel/MD5 patch is flushing the keys etc. out
of memory (or the keys never existed in the first place). Hence, when
trying to sign a packet or check its signature, the kernel attempts to
read or write memory that it does not own, causing the bugcheck.
 
Now, there is a lot of guesswork in here. I am going to start a detailed
look at the implementation, especially around the TCP RST that lacks the
signature, as this needs correcting anyway; it may also lead me to the
real problem.
 
I thought I would share my musings and monitoring results with you in
case they switch on any light bulbs in your head, or ring bells for
anyone else who is investigating this. I would be interested in your
thoughts.
 
Simon
 

Simon Talbot MEng, ACGI 
(Chief Engineer) 
Tel: 0845 6440972 
Fax: 0845 6440971 
