[quagga-dev 6463] Re: Valgrind hits (and subsequent segfaults) in ospf6d

Joakim Tjernlund Joakim.Tjernlund at transmode.se
Mon Mar 16 10:32:51 GMT 2009


> 
> [Please Cc me on replies, I'm not subscribed to the list]
> 
> Hi,
> 
> We're using Quagga (as of latest git HEAD including volatile/misc, but 
the

If this is from my repo at
 http://code.quagga.net/cgi-bin/gitweb.cgi?p=people/jocke/quagga;a=summary
cool :)
BTW, I pushed an update less than two weeks ago.

You are more than welcome to try the volatile/unnumbered branch too :)
There are bug fixes in there that might be useful.

> problem is also in at least 0.9.10 and 0.9.11, AFAICS). We're having 
problems
> that ospf6d segfaults, especially when a neighbor router goes down. 
Valgrind
> gives errors on the following form, which appears to be relevant:
> 
> ==14614== Invalid read of size 4
> ==14614==    at 0x28F9B: ospf6_route_lock (ospf6_route.c:208)
> ==14614==    by 0x2A39D: ospf6_route_next (ospf6_route.c:682)
> ==14614==    by 0x31F04: ospf6_intra_brouter_calculation 
(ospf6_intra.c:1362)
> ==14614==    by 0x359F5: ospf6_spf_calculation_thread (ospf6_spf.c:540)
> ==14614==    by 0x4E500B7: thread_call (thread.c:1051)
> ==14614==    by 0xE3C9: main (ospf6_main.c:302)
> ==14614==  Address 0x5ebd7b8 is 32 bytes inside a block of size 232 
free'd
> ==14614==    at 0x4C2130F: free (vg_replace_malloc.c:323)
> ==14614==    by 0x4E51C9A: zfree (memory.c:110)
> ==14614==    by 0x28F1D: ospf6_route_delete (ospf6_route.c:187)
> ==14614==    by 0x29010: ospf6_route_unlock (ospf6_route.c:222)
> ==14614==    by 0x2A1BD: ospf6_route_remove (ospf6_route.c:635)
> ==14614==    by 0x3AF07: ospf6_abr_examin_summary (ospf6_abr.c:656)
> ==14614==    by 0x3B2E1: ospf6_abr_examin_brouter (ospf6_abr.c:736)
> ==14614==    by 0x19005: ospf6_top_brouter_hook_remove (ospf6_top.c:107)
> ==14614==    by 0x2A1B4: ospf6_route_remove (ospf6_route.c:633)
> ==14614==    by 0x31DDC: ospf6_intra_brouter_calculation 
(ospf6_intra.c:1390)
> ==14614==    by 0x359F5: ospf6_spf_calculation_thread (ospf6_spf.c:540)
> ==14614==    by 0x4E500B7: thread_call (thread.c:1051)
> 
> (The line numbers might be +/- a few lines, since there's extra 
debugging
> code in there as well.)
> 
> It's a bit hard to follow the lock/unlock stuff since I don't know the 
Quagga
> code, but it seems a bit weird to me to _remove() a route within 
something
> that traverses the linked list it's part of. Is this really right? And 
is it
> right that ospf6_route_remove() is called twice? The lock count of the 
route
> is 2 when it enters the removing loop in ospf6_intra_brouter_calculation

That makes sense, the count is 1 before you enter and the
ospf6_route_head()/ ospf6_route_next() will add 1 too
during the loop.

> (the loop ospf6_intra.c:1390 is part of).
> 
> I've tried using the classic idiom of fetching the _next() node before 
doing
> the removals etc., but that appears to be wrong as well, since
> ospf6_abr_examin_summary can remove the next node in the linked list. Of
> course, I might be entirely wrong and ospf6_route_remove() should not 
call
> ospf6_route_delete() in this scenario at all :-)

The ospfd6 route code is somewhat hard to follow, especially since
it adds its own route_top()/route_next() functions.

This comment stands out though in ospf6_route_remove():
   node->info = NULL; /* should unlock route_node here ? */
An unlock here might be a good idea.

 Jocke



More information about the Quagga-dev mailing list