[quagga-users 14032] Re: "SLOW THREAD" errors and "Hold Timer Expired" events on all neighbors

Donald Sharp sharpd at cumulusnetworks.com
Mon Jun 29 21:52:24 BST 2015


I wouldn't mind getting a look at a perf run of bgpd while it is coming
up and receiving all these routes.  Something like this:

perf record -g -o <output_filename> -p <pid of bgpd>
<let it run while bgpd is churning on the routes>
<ctrl-c> after a while.

perf report -i <filename of output file from above> -g

Perhaps there is something obvious going on that will show up.
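The same profiling run as a script, sketched under a few assumptions: perf is installed, exactly one bgpd process is running, and you are root. The function name `profile_bgpd`, the 60-second default and the output filename are illustrative, not anything from this thread. Passing `-- sleep N` after `-p` only bounds how long perf samples the already-running bgpd, so no Ctrl-C is needed.

```shell
#!/usr/bin/env bash
# Sketch: profile a running bgpd for a fixed interval instead of Ctrl-C.
# Assumes perf is installed, one bgpd is running, and we run as root.
profile_bgpd() {
    local duration=${1:-60}            # seconds to sample (illustrative default)
    local out=${2:-bgpd.perf.data}     # perf output file (illustrative name)
    local pid
    pid=$(pidof -s bgpd) || { echo "bgpd is not running" >&2; return 1; }

    # -g records call graphs. With -p, the trailing "sleep" command only
    # bounds how long perf samples the already-running bgpd.
    perf record -g -o "$out" -p "$pid" -- sleep "$duration" || return 1

    # Text call-graph report on stdout.
    perf report -i "$out" -g --stdio
}
```

For example, `profile_bgpd 120 /tmp/bgpd.perf.data > bgpd-report.txt`, started just before the peers come up or a route change is made.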

donald

On Sun, Jun 28, 2015 at 11:26 PM, Andrew Gideon <andrew-quagga-users-918 at tagonline.com> wrote:

> I've an older "router" running CentOS5 that I'm trying to replace with
> one running CentOS7.  Both machines have 1G of RAM.
>
> I'm having no problems with the older router.  My reason for upgrading
> is to get away from the old 2.6.18 kernel and into a newer (3.10) kernel
> with support for RFC6164, ipset, etc.
>
> The older machine is running Quagga v0.99.22.4.  I've tried both
> quagga-0.99.22.4-4.el7 and 0.99.24.1 on the newer machine.
>
> In both cases, there are up to six IPv4 neighbors.  Two are eBGP peers
> providing full feeds.  The other four are iBGP peers providing whatever
> routes are "best" on those devices.
>
> Because of our topology, this usually means 500000+ routes from three
> of the six peers.  The others send only a small number of routes for
> the subnets to which they provide gateway service.
>
> The newer router also has two IPv6 iBGP peerings, each providing about
> 10000 to 20000 routes.
>
> Very rarely, I'll see something like:
>
>         SLOW THREAD: task bgp_read (987250) ran for 9292ms (cpu time 9266ms)
>
> on the older router.
>
> On the newer router, the slightest route change (i.e., adding or
> removing a gateway IP for some subnet, which means adding or removing a
> single route that will be distributed by BGP) often causes errors like:
>
>         SLOW THREAD: task bgp_scan_timer (7fc91c8a9120) ran for 62227ms (cpu time 1875ms)
>
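Quagga's SLOW THREAD message reports wall-clock time ("ran for") followed by CPU time. The gap between the two is time the task was not running on a CPU: waiting, descheduled, or blocked on page faults. For the two lines above, that gap is 9292 - 9266 = 26 ms on the older router but 62227 - 1875 = 60352 ms on the newer one. A small sketch that computes it from log lines (the function name is hypothetical, and each message must be on a single line as bgpd writes it):

```shell
#!/usr/bin/env bash
# Sketch: for each Quagga "SLOW THREAD" line on stdin, print the task name,
# wall-clock ms, CPU ms, and their difference (time spent off-CPU).
slow_thread_offcpu() {
    sed -n 's/.*SLOW THREAD: task \([^ ]*\) ([^)]*) ran for \([0-9]*\)ms (cpu time \([0-9]*\)ms).*/\1 \2 \3/p' |
        awk '{ printf "%s wall=%dms cpu=%dms offcpu=%dms\n", $1, $2, $3, $2 - $3 }'
}
```

For example, `grep 'SLOW THREAD' <your bgpd log file> | slow_thread_offcpu`.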
> Sometimes, on the newer router, the problem goes further and all the
> peerings drop with:
>
>         %NOTIFICATION: sent to neighbor 207.111.77.38 4/0 (Hold Timer Expired) 0 bytes
>
> That never occurs on the older router.
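"4/0" is RFC 4271 NOTIFICATION error code 4 (Hold Timer Expired), which defines no subcodes. Under RFC 4271 the hold time in effect is the smaller of the locally configured value and the one the peer sends in its OPEN; Quagga's default is 180 seconds. One quick sanity check is whether a single logged stall is, by itself, as long as that hold time. A sketch with hypothetical helper names:

```shell
#!/usr/bin/env bash
# Sketch of the RFC 4271 hold-time arithmetic; function names are hypothetical.

# Hold time in effect: the smaller of our configured value and the peer's
# OPEN value (0 means the hold timer is disabled).
negotiated_holdtime() {
    local local_ht=$1 peer_ht=$2
    if [ "$local_ht" -lt "$peer_ht" ]; then echo "$local_ht"; else echo "$peer_ht"; fi
}

# Exit status 0 if a stall of $1 ms is, on its own, at least as long as a
# hold time of $2 seconds (never true when the timer is disabled).
stall_exceeds_holdtime() {
    local stall_ms=$1 hold_s=$2
    [ "$hold_s" -gt 0 ] && [ "$stall_ms" -ge $(( hold_s * 1000 )) ]
}
```

With the 62227 ms stall above and a 180 s hold time, `stall_exceeds_holdtime 62227 180` fails, so that one stall alone would not explain the expiry.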
>
> I'm trying to understand why the difference between the older and newer
> routers.
>
> I've noticed that bgpd gets significantly larger on the newer router
> than on the older one, with ps reporting about 400000+ KB on the older
> router and as much as 700000+ KB on the newer router.  Given that the
> machines are rather memory-limited, I'm guessing that the problem on the
> newer router is that the process is paged out too often.
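ps reports VSZ and RSS in kilobytes. To test the paging guess more directly, /proc/<pid>/status shows how much of bgpd is resident and, on newer kernels such as the 3.10 one, how much has been swapped out (the VmSwap field may be missing on older kernels such as 2.6.18). A sketch, with a hypothetical function name:

```shell
#!/usr/bin/env bash
# Sketch: show virtual size, resident size, and swapped-out size (all kB)
# of a process, defaulting to bgpd. VmSwap may be absent on older kernels.
proc_mem() {
    local pid=${1:-$(pidof -s bgpd)}
    [ -n "$pid" ] && [ -r "/proc/$pid/status" ] || { echo "no such process" >&2; return 1; }
    awk '/^(VmSize|VmRSS|VmSwap):/ { print $1, $2, $3 }' "/proc/$pid/status"
}
```

Running it before and after a route change shows whether VmSwap grows while the SLOW THREAD messages appear.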
>
> I seem to have improved things (made the errors less frequent) by
> making the following changes on the new router:
>
>       * Removing all "soft-reconfiguration inbound" which, at least on
>         Cisco routers, consumes extra memory.  Note that this remains
>         enabled on the older router for all neighbors.
>       * Renicing the bgpd process to -4
>       * Ionicing the bgpd process to Realtime
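The renice/ionice steps above, written out as commands. This is a sketch assuming util-linux `renice` and `ionice` and root privileges; the function name is hypothetical. Both settings are per-process, so they are lost whenever bgpd restarts.

```shell
#!/usr/bin/env bash
# Sketch: apply the CPU and I/O priority tweaks above to a running bgpd.
# Needs root; assumes util-linux renice/ionice. Settings are per-process,
# so they must be reapplied whenever bgpd restarts.
boost_bgpd() {
    local pid
    pid=$(pidof -s bgpd) || { echo "bgpd is not running" >&2; return 1; }
    renice -n -4 -p "$pid" || return 1   # CPU scheduling: nice value -4
    ionice -c 1 -n 4 -p "$pid"           # I/O: class 1 = realtime, level 4 (0-7, 0 highest)
}
```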
>
> I've also tried running the same bgpd configuration on the new router
> as on the older router (with a change in neighbor IPs, of course).  Even
> without all the stuff (neighbors, additional route-maps, etc.) added for
> the IPv6 neighbors, the newer router still uses more memory than the
> older router.
>
> I'm going to try to put some more memory into the newer router later
> this week.  But...I'm still discomforted by this (or at least my lack of
> understanding of what is behind this).  I don't *know* that memory (and
> therefore paging) is behind the timeouts, but I suspect so if only
> because I tend to assume two odd problems occurring together are
> related.
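One way to check whether paging lines up with the stalls is to count bgpd's major page faults (faults that had to read from disk, for example swap-ins) across a window in which a route change triggers a SLOW THREAD. The count is field 12 (majflt) of /proc/<pid>/stat. A sketch with hypothetical function names:

```shell
#!/usr/bin/env bash
# Sketch: major page faults (faults that needed disk I/O, e.g. swap-in)
# for a process, read from field 12 (majflt) of /proc/<pid>/stat.
majflt() {
    local stat
    stat=$(< "/proc/$1/stat") || return 1
    stat=${stat##*") "}     # drop "pid (comm) " even if comm contains spaces
    set -- $stat            # now $1 is field 3 (state), so field N is $((N-2))
    echo "${10}"            # field 12 = majflt
}

# Print how many major faults a process took over an interval (seconds).
majflt_rate() {
    local pid=$1 secs=${2:-10} before after
    before=$(majflt "$pid") || return 1
    sleep "$secs"
    after=$(majflt "$pid") || return 1
    echo "$(( after - before )) major faults in ${secs}s"
}
```

For example, run `majflt_rate "$(pidof -s bgpd)" 60` while adding a gateway route. A large count during a stall would support the paging theory; a count near zero would point elsewhere.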
>
> If anyone has any thoughts or suggestions, I'd welcome them.
>
> Thanks...
>
>         Andrew
>