[quagga-dev 4361] BGP 0.99 testers needed
paul at clubi.ie
Thu Sep 14 04:42:12 BST 2006
I've taken the liberty of putting back what I believe may be the
final 0.99 bgpd regression fixes. See below for details on those.
What would be really useful is for people with test networks, and any
others who are affected by the issues below, to take the CVS snapshot as
of 20060914 and stress test it as much as possible.
In particular stress testing of any and all combinations of:
- deleting neighbours and adding them back
- reconfiguring neighbours
(in ways that require bgpd to reset the session particularly)
- hard clearing neighbours
(in as wide a variety of BGP peer states as possible)
- maximum prefix overflow
(this was borked because of the prefix-count drift, that's fixed,
but max-prefix isn't well-tested in combination with the
clearing/shutdown/deleted changes. In theory those changes
shouldn't affect max-prefix, confirmation would be good).
would be appreciated.
One changeset in particular has not been widely tested, and if it has
a mistake it will cause odd crashes (though it's one of a series
intended to eliminate a crash..).
I'll be away for a while and may not have email access for up to a
week and a half.
Regressions known in 0.99.5, and their status in CVS:
- Prefix count issue: Believed to be fixed by
2006-09-06 Paul Jakma <paul.jakma at sun.com>
* (general) Squash any and all prefix-count issues by
abstracting route flag changes, and maintaining count as and
when flags are modified (rather than relying on explicit
modifications of count being sprinkled in just the right
places throughout the code).
Fix confirmed by one tester at least, who was seeing this in
production. If there are any issues, please report the output of:
'show .... bgp neighbor <address> prefix-counts'
(NB: this command does a RIB walk, so it's not a 'cheap' command,
you may wish to use it sparingly. It's an enable-mode only
command for a reason)
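
For anyone reviewing that change, the idea in the ChangeLog entry can be sketched roughly as below. This is only an illustration of the technique, not the bgpd code: the flag names, struct layouts and function names are all made up for the example, and the rule for which routes count is an assumption. The point is that callers never touch the counter directly; they go through one set/unset path, which adjusts the count only when the flag change actually moves a route into or out of the counted set.

```c
#include <assert.h>

/* Hypothetical flag bits, for illustration only (not bgpd's). */
#define DEMO_RT_VALID   (1u << 0)
#define DEMO_RT_HISTORY (1u << 1)

struct demo_counts { unsigned long valid; };

struct demo_route {
  unsigned flags;
  struct demo_counts *pc;   /* per-peer counters this route feeds */
};

/* Assumed rule: a route is counted iff it is valid and not history. */
static int demo_route_counted(unsigned flags)
{
  return (flags & DEMO_RT_VALID) && !(flags & DEMO_RT_HISTORY);
}

/* Single choke point: every flag change passes through here, so the
 * count is derived from the before/after state and cannot drift. */
static void demo_route_flags_update(struct demo_route *r, unsigned newflags)
{
  int was = demo_route_counted(r->flags);
  int now = demo_route_counted(newflags);

  r->flags = newflags;
  if (!was && now)
    r->pc->valid++;
  else if (was && !now) {
    assert(r->pc->valid > 0);   /* underflow would indicate drift */
    r->pc->valid--;
  }
}

static void demo_route_set_flag(struct demo_route *r, unsigned f)
{
  demo_route_flags_update(r, r->flags | f);
}

static void demo_route_unset_flag(struct demo_route *r, unsigned f)
{
  demo_route_flags_update(r, r->flags & ~f);
}
```

Setting a flag that is already set, or clearing one that is already clear, leaves the count untouched, which is exactly the case that explicit increments sprinkled through the code tend to get wrong.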
- shutdown sometimes doesn't stick, and 'no neighbour' could still
cause crashes: Believed to be fixed by
2006-09-14 Paul Jakma <paul.jakma at sun.com>
* (general) Fix some niggly issues around 'shutdown' and clearing
by adding a Clearing FSM wait-state and a hidden 'Deleted'
FSM state, to allow deleted peers to 'cool off' and hit 0
references. This introduces a slow memory leak of struct peer,
however that's more a testament to the fragility of the
reference counting than a bug in this patch, cleanup of
reference counting to fix this is to follow.
One tester has tried to torture this a bit, and it fixes the 'no
neighbour' crash he saw.
The mentioned leak should be fixed by:
* (general) fix the peer refcount issue exposed by previous, by
just removing refcounting of peer threads, which is mostly
senseless as they're references leading from struct peer,
which peer_free cancels anyway. No need to muck around..
which I've tested extensively, though in a slightly different form.
It's fairly sane, but wider testing is needed to ensure there are
no dumb mistakes.
If there are crashes, chances are high that /only/ the latter 'fix
the peer refcount issue exposed' changeset needs to be reverted to
regain stability (and a very slow leak of a ~40kB struct peer every
time an ACCEPT_PEER peer comes in..).
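
To make the 'Deleted' state idea concrete, here is a minimal sketch of a reference-counted peer that is marked deleted first and only freed once the last reference is dropped. It is a simplified model under my own assumptions, not the code in CVS: the demo_ names, the state enum and the freed counter are hypothetical, and the real struct peer carries far more state.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical states; only DEMO_DELETED matters for this sketch. */
enum demo_status { DEMO_IDLE, DEMO_ESTABLISHED, DEMO_CLEARING, DEMO_DELETED };

struct demo_peer {
  enum demo_status status;
  unsigned refcnt;
};

static unsigned demo_peers_freed;   /* instrumentation for the example */

/* New peer starts with one reference, held by the configuration. */
static struct demo_peer *demo_peer_new(void)
{
  struct demo_peer *p = calloc(1, sizeof *p);
  if (p == NULL)
    return NULL;
  p->status = DEMO_IDLE;
  p->refcnt = 1;
  return p;
}

/* Take a reference, e.g. from a route that points back at the peer. */
static struct demo_peer *demo_peer_lock(struct demo_peer *p)
{
  p->refcnt++;
  return p;
}

/* Drop a reference; memory is released only at zero, and only for a
 * peer that has already been moved to the Deleted state. */
static void demo_peer_unlock(struct demo_peer *p)
{
  assert(p->refcnt > 0);
  if (--p->refcnt == 0) {
    assert(p->status == DEMO_DELETED);
    free(p);
    demo_peers_freed++;
  }
}

/* 'no neighbour': mark the peer Deleted and drop the configuration
 * reference. The peer then 'cools off' until other holders let go. */
static void demo_peer_delete(struct demo_peer *p)
{
  p->status = DEMO_DELETED;
  demo_peer_unlock(p);
}
```

In this model a reference that is taken and never released keeps a Deleted peer around forever, which is the kind of slow leak described above; a reference dropped twice trips the assert instead of corrupting memory.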
Paul Jakma paul at clubi.ie paul at jakma.org Key ID: 64A2FF6A
All right, let's not panic. I'll make the money back by selling one
of my livers. I can get by with one.
-- Homer Simpson
Homer vs. Patty and Selma