[quagga-dev 6461] Valgrind hits (and subsequent segfaults) in ospf6d

Steinar H. Gunderson sgunderson at bigfoot.com
Sun Mar 15 23:25:07 GMT 2009


[Please Cc me on replies, I'm not subscribed to the list]

Hi,

We're using Quagga (as of latest git HEAD including volatile/misc, but the
problem is also in at least 0.9.10 and 0.9.11, AFAICS). We're having problems
that ospf6d segfaults, especially when a neighbor router goes down. Valgrind
gives errors on the following form, which appears to be relevant:

==14614== Invalid read of size 4
==14614==    at 0x28F9B: ospf6_route_lock (ospf6_route.c:208)
==14614==    by 0x2A39D: ospf6_route_next (ospf6_route.c:682)
==14614==    by 0x31F04: ospf6_intra_brouter_calculation (ospf6_intra.c:1362)
==14614==    by 0x359F5: ospf6_spf_calculation_thread (ospf6_spf.c:540)
==14614==    by 0x4E500B7: thread_call (thread.c:1051)
==14614==    by 0xE3C9: main (ospf6_main.c:302)
==14614==  Address 0x5ebd7b8 is 32 bytes inside a block of size 232 free'd
==14614==    at 0x4C2130F: free (vg_replace_malloc.c:323)
==14614==    by 0x4E51C9A: zfree (memory.c:110)
==14614==    by 0x28F1D: ospf6_route_delete (ospf6_route.c:187)
==14614==    by 0x29010: ospf6_route_unlock (ospf6_route.c:222)
==14614==    by 0x2A1BD: ospf6_route_remove (ospf6_route.c:635)
==14614==    by 0x3AF07: ospf6_abr_examin_summary (ospf6_abr.c:656)
==14614==    by 0x3B2E1: ospf6_abr_examin_brouter (ospf6_abr.c:736)
==14614==    by 0x19005: ospf6_top_brouter_hook_remove (ospf6_top.c:107)
==14614==    by 0x2A1B4: ospf6_route_remove (ospf6_route.c:633)
==14614==    by 0x31DDC: ospf6_intra_brouter_calculation (ospf6_intra.c:1390)
==14614==    by 0x359F5: ospf6_spf_calculation_thread (ospf6_spf.c:540)
==14614==    by 0x4E500B7: thread_call (thread.c:1051)

(The line numbers might be +/- a few lines, since there's extra debugging
code in there as well.)

It's a bit hard to follow the lock/unlock stuff since I don't know the Quagga
code, but it seems a bit weird to me to _remove() a route within something
that traverses the linked list it's part of. Is this really right? And is it
right that ospf6_route_remove() is called twice? The lock count of the route
is 2 when it enters the removing loop in ospf6_intra_brouter_calculation
(the loop ospf6_intra.c:1390 is part of).

I've tried using the classic idiom of fetching the _next() node before doing
the removals etc., but that appears to be wrong as well, since
ospf6_abr_examin_summary can remove the next node in the linked list. Of
course, I might be entirely wrong and ospf6_route_remove() should not call
ospf6_route_delete() in this scenario at all :-)

Any ideas?

/* Steinar */
-- 
Homepage: http://www.sesse.net/



More information about the Quagga-dev mailing list