[quagga-dev 14941] Re: [PATCH 00/89] Cumulus Take-3 Electric Bugaloooooooooooooooo
paul at jakma.org
Thu Mar 17 08:29:34 GMT 2016
On Wed, 16 Mar 2016, Donald Sharp wrote:
> If you have an interface that bounces allot, we were seeing crashes
> associated with a 'show ospf ...' command during a period when the
> interface was down. So since we are dumping some data to the vty it
> was decided that the most intelligent thing to do if we got into that
> situation was to safely not attempt to dereference a NULL pointer.
Weird, would love to know how the oi gets into that state though.
> I can't really answer this question. The person who wrote this code
> is 2+ years gone from the company, and the code was written just under
> 4 years ago now. There are no meaningful notes left for me to peruse
> at this point in time. I do know from what little documentation is
> there, that OSPF was experiencing some starvation because watchquagga
> was killing it and restarting it. This is one of the fixes that went
> in to fix this whole issue.
So, add batching, then have to tinker with hellos cause of the
batching..? How about, we just drop those patches?
>> Should this start by adding Piotr's patch first and then building on?
>> Also, does this reflect the comments on that patch set when submitted again
>> last year?
> These patches took Piotr's patch and reworked them, we credited him very
> clearly as the original author. Additionally Piotr's code didn't patch
> cleanly at the point in time due to the massive changes already in place
> for bgp and zebra. I see no resubmittal of Piotr's patch set last year and
> I don't see any CR comments for this section of code, what am I missing?
I'll send you a link to the thread in a couple of weeks so. Was late
last year I think.
It'd probably be preferable to have it as his patch + then Cumulus
>> ospfd: 16.0 rfc2328 compliance
>> What was the motivation for this?
> We had/have(tense? oh I hate thee) customers seeing loops without this
>> RFC3137 and 6987 both allow for routers to treat "infinite" cost links as
>> infinite, and the intent of the stub-router support definitely is to wholly
>> prevent any infinite cost link being used for transit. Quagga is aggressive
>> on that, but not incorrect in terms of those RFCs.
> Without this patch when I set 'max-metric router-lsa administrative' on a
> switch, it is no longer considered as part of the spf calculation for
> Suppose I have
> source of traffic -----A ---- B ---- C ---- Network I want to talk to.
> A is running fastpath, B and C are running Quagga ospf. B has a default
> route towards A for traffic.
> If on 'C' I do 'max-metric router-lsa'. This causes the links cost to be
> infinite. On B without this patch the links are ignored
As intended, yes.
> and the path to 'Network I want to talk to' is dropped, leaving a
> default route back to A, which the packet goes back out on. A picks
> it up and sends it right back to B. We have achieved a loopage.
> There is nothing in the RFC's that specify that we should not use a
> max-metric link, while in Quagga we don't consider it.
As I said, Quagga is aggressive about not using a stub-router as a
transit router. The RFC is pretty clear that that is an allowed outcome,
and indeed the "desired" one (as the RFC seems to put it).
" In some situations, it may be advantageous to inform routers in a
network not to use a specific router as a transit point but to still
route to it.
destined for the networks and reachable through such a stub router
may still be routed through it."
(Note "may" in the last sentence).
" If the path through the stub router is the only one, the routers of
the first type _will not use the stub router for transit_ (which is
_the desired behavior_), while the routers of the second type will
still use this path."
Added emphasis (via underscores) is mine.
The intent of stub-router support is to let a router 'exit' from the
OSPF domain gracefully - by updating its OSPF advertisement to /stop/
other routers sending /any/ packets toward it for transit, while still
maintaining adjacencies so that it can still calculate valid routes to
forward any packets that happen to still arrive (i.e. cause they were
sent before other routers were able to react); but keeping connectivity
to itself intact.
It seems to me it was intended to be fairly equivalent to shutting down
the router, as far as OSPF transit routing goes. If you set your only
router to a destination to be a "non-transit" "stub router" then, going
just by that language, yes, you should expect routing to that
destination is likely no longer to work.
If you don't want to discourage /all/ traffic, I'd have thought you
really would have wanted to set a metric *less than* infinite
("infinite" cost seems pretty clear to me). That starts to get into
traffic-engineering, and some kind of global, router-level "Set X, in:
set all links to min(max, link cost + X)" knob may be more what people
If there are people who thought "stub-router" didn't fully mean that,
that's interesting. Wasn't how I read the RFC or how I would have
interpreted an "infinite" cost link (the original OSPF RFCs interpreted
the same but later removed that requirement, I see). Given existing
Quagga OSPF, the best way to get that relaxed sense would be to use
0xfffe, or 0xffff - some X, as the cost.
Whether to take that patch as is, or whether to have a 'strict'
max-metric, and also a more relaxed version, hmmm. (Even as is, the
commit message language should be changed).
> Was the motivation keeping the router accessible, even if the
> management IP was also OSPF routed?
That should already be the case, certainly for the RID. If other local
IPs aren't, then maybe there's an issue in a later stage of the routing
calc where the router's stub links are added.
> If you've queried this patch a few times, I've missed it. I have 100
> patches I'm trying to get through the process. It was not my
> intention to miss anything.
I know. :)
Though, generally, if things are or have been inefficient on side of the
patch review process, it doesn't help if the queries or comments that
are made then also just get dropped on the other side.
> I think we've had this discussion... I'm pretty sure you said that you
> would submit a RFC to get this handled?
That was for the decision order change thing, i.e. applies to:
> It's a knob that someone can turn on/off if they need it.
It's more a pin on what admins may not realise is a hand grenade. BGP
already has enough hidden-danger areas, and a plethora of config
options. I'd like to try aim to make BGP - at least in Quagga - a bit
more forgiving and easier to administer. I don't want to seed it with
further config land-mines anyway.
MED similarly is a hand grenade.
And yes, with Det.MED and every-neighbour-AS-best-path-TX Add-Path you
can apply sufficient thrust to make MED sort of fly, at least in the
cases we've thought of so far. However, it still doesn't fix the core
underlying brokenness of MED, and it can still cause breakage - e.g. if
an admin doesn't run Add-Path/best-every-AS everywhere, or some other
scenario comes up that we hadn't anticipated perhaps in future BGP
extensions. Also, D.MED and every-neighbour-as-best-tx both add
significant additional processing overheads to BGP (and a cost in code
complexity). Tx every-neighbour-AS-best-path further adds significant
communication overheads to BGP - and BGP already is quite an expensive
and high-overhead protocol in terms of communication (compared to
link-state anyway, which is O(1) messages per link for any link-event;
path-vector is something like O(# simple-paths between link-event and
receiver), perhaps more, for worst case).
It'd be far far more performant to just fix the underlying BGP metric
issues and go make BGP safe and fast. It might just be better to go do
that, rather than try workaround the issues of a very problematic part
of BGP with a series of ever more costly changes to BGP.
Why not aim for simpler, safer, faster BGP?
At a minimum, that max-MED patch should sort after Add-Path +
Paul Jakma paul at jakma.org @pjakma Key ID: 64A2FF6A
If A equals success, then the formula is A = X + Y + Z.
X is work. Y is play. Z is keep your mouth shut.
-- Albert Einstein
More information about the Quagga-dev