[quagga-dev 7146] Re: ospfd leaves stale OSPF routes

Roman Hoog Antink rha at open.ch
Thu Aug 13 15:59:59 BST 2009


A simpler and nicer workaround would be to increase the send buffer size 
in zebra/zserv.c:zebra_serv_un():1435 by means of setsockopt SO_SNDBUF. 
I tested the effect successfully by raising /proc/sys/net/core/wmem_default 
(was 110592) to the value of /proc/sys/net/core/wmem_max (=8388608).
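
For illustration, here is a minimal sketch of what such a setsockopt call 
could look like. The helper name set_send_buffer is made up and is not part 
of Quagga, and I have not wired this into zserv.c. Note that Linux caps the 
requested size at wmem_max, so the helper reads the value back and returns 
what the kernel actually granted.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper, not part of Quagga: request a larger send buffer
 * on an existing socket. Returns the size the kernel reports afterwards
 * (which may differ from the request), or -1 on error. */
static int
set_send_buffer (int sock, int bytes)
{
  int actual = 0;
  socklen_t len = sizeof (actual);

  /* Ask for the new size; the kernel may clamp it to its configured max. */
  if (setsockopt (sock, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof (bytes)) < 0)
    return -1;

  /* Read back what was really applied. */
  if (getsockopt (sock, SOL_SOCKET, SO_SNDBUF, &actual, &len) < 0)
    return -1;

  return actual;
}
```

A caller would pass the socket descriptor right after creating it and could 
log the returned size to see whether the request was honoured.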

Roman Hoog Antink wrote:
> Hi there
> 
> I have quagga 0.99.14 under linux 2.6.29, learning more than 500 routes 
> from an OSPF peer. When sending SIGTERM to ospfd, 150 of the learned 
> routes remain in zebra and are marked with a * in "show ip route ospf", 
> meaning these are still active in the kernel.
> 
> I think I found the reason why ospfd fails to clean up all learned 
> routes in the kernel during shutdown. Let me guide you through the 
> shutdown process (as far as I have dug into it).
> 
> The signal handler of SIGTERM eventually leads to 
> ospfd/ospfd.c:ospf_finish_final(). That function first terminates all 
> timers (line 472) and then in line 525...
>   ospf_route_delete (ospf->old_external_route);
> ...zebra is being told to delete all external OSPF routes.
> 
> The communication to zebra is a nonblocking write over a unix socket. If 
> ospfd has to delete many routes (e.g. more than 500), the socket buffer 
> fills up, because zebra can't delete the routes as fast as they come 
> in, and a timer in ospfd is supposed to retry later. You can find the 
> timer being set up here: lib/zclient.c:262:zclient_send_message()
>     case BUFFER_PENDING:
>       THREAD_WRITE_ON(master, zclient->t_write,
>         zclient_flush_data, zclient, zclient->sock);
> 
> But as we are already in the shutdown process, that timer never gets a 
> chance to run.
> 
> You can find a patch in the attachment which deals with the problem 
> superficially. But it is ugly, as it reverts all zebra communication to 
> blocking writes. Furthermore, it uses usleep to avoid excessive CPU usage 
> during retries. That might lead to trouble, as usleep may be implemented 
> with SIGALRM on some systems and might interfere with ospfd's own timers.
> 
> To solve this problem properly, a developer who has the correct 
> semantics of ospfd's timers in mind should think about it. Maybe we 
> should flush the buffers before exit somehow, without using a timer 
> thread.
> 
> Regards,
> Roman Hoog Antink
> 
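
The idea of flushing before exit without a timer thread could be sketched 
roughly as below. This is only an assumption about how it might look, not 
the attached patch and not Quagga's own buffer code: drain_pending is a 
made-up helper that works on a plain byte array. It keeps the socket 
nonblocking and waits in poll() for POLLOUT instead of calling usleep, so 
no signals are involved, and it gives up after timeout_ms of no progress.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper, not part of Quagga: write len bytes to a
 * nonblocking fd, waiting in poll() whenever the socket buffer is full.
 * Returns 0 once everything is written, -1 on error or timeout. */
static int
drain_pending (int fd, const char *data, size_t len, int timeout_ms)
{
  size_t done = 0;

  while (done < len)
    {
      ssize_t n = write (fd, data + done, len - done);

      if (n > 0)
        {
          done += (size_t) n;
          continue;
        }
      if (n < 0 && errno == EINTR)
        continue;               /* interrupted, just retry the write */
      if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK)
        return -1;              /* real error, e.g. peer has gone away */

      /* Buffer full: block in the kernel until the peer has read some. */
      struct pollfd pfd = { .fd = fd, .events = POLLOUT };
      int r = poll (&pfd, 1, timeout_ms);

      if (r == 0)
        return -1;              /* peer did not drain in time */
      if (r < 0 && errno != EINTR)
        return -1;
    }
  return 0;
}
```

During shutdown something like this would be called once on whatever is 
still queued for zebra, just before closing the socket.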


