[quagga-dev 8240] Re: Taking down a session when an error is detected

paul at jakma.org paul at jakma.org
Wed Sep 8 03:35:20 BST 2010


On Mon, 6 Sep 2010, Chris Hall wrote:

> Though there may be a fine line between Corruption and Nonsense...

> The RFC4893-bis Draft is extremely forgiving...

Sure.

Let's rewind a bit though: The original problem was of internally 
badly-constructed attributes (i.e. the general structure is a valid 
attribute, but the attribute-specific meaning is invalid) being 
'tunneled' across sessions through routers that don't understand the 
attribute, leading to a session being dropped later, at the first 
point where a router recognises the attribute. The sending router in 
that case of course had nothing to do with misconstructing the 
attribute, and it seems wrong to reset that session.

This revision of BGP was prompted by AS4_PATH, and by bugs on the net 
leading to sessions being reset far from the actual cause. I'm 
suggesting people working on bgpd should try to apply this principle 
generally, wherever the cause of the malformed message need /not/ be 
the neighbouring router.

If I suggested that this should also be applied where the neighbour 
itself is generating bad messages, then apologies - that's definitely 
wrong :). That should default to reset, as per Nick. Though, if there 
are people who want a very accepting bgpd, that perhaps should be an 
option, if someone wants to do that work.

:)
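
As a rough illustration of the distinction above, here is a minimal 
sketch in Python (not bgpd's C). The flag values are the attribute 
flag bits from RFC 4271. The function name, the Action enum and the 
choice of "treat as withdraw" for the tunnelled case are assumptions 
made for illustration only, not existing Quagga code.

```python
from enum import Enum

# Attribute flag bits from RFC 4271, section 4.3.
FLAG_OPTIONAL = 0x80
FLAG_TRANSITIVE = 0x40


class Action(Enum):
    RESET_SESSION = "reset"          # send NOTIFICATION and drop the session
    TREAT_AS_WITHDRAW = "withdraw"   # hypothetical softer handling


def action_for_malformed_attr(flags: int) -> Action:
    """Pick an action for an attribute whose contents are invalid.

    Only an optional transitive attribute can have been passed along
    unexamined by a neighbour that does not recognise it, so only then
    is it plausible that the neighbour is not the cause.
    """
    may_be_tunnelled = bool(flags & FLAG_OPTIONAL) and bool(flags & FLAG_TRANSITIVE)
    if may_be_tunnelled:
        return Action.TREAT_AS_WITHDRAW
    return Action.RESET_SESSION
```

AS4_PATH is optional transitive (flags 0xC0), so it would land on the 
softer path here, while a malformed well-known attribute such as 
AS_PATH (flags 0x40) would still reset the session. The flags alone 
cannot prove the neighbour is innocent; they only say it might be.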

> ...if an AS4 Path contains unknown segment types, the Draft says the
> AS4 Path MUST be ignored.
>
> ...if an AS4 Path and its respective AS2 Path are inconsistent, the
> RFC and Draft say that the AS4 Path MUST be ignored.

I have to say, I disagree with the RFC, particularly on the one 
immediately above. The set of ASes through which the path has passed 
can only be determined by the union of the AS_PATH and AS4_PATH.

In case of an error, I'd rather be conservative and still apply 
loop-detection to that union. The worst case is that some of the ASes 
in that union are rubbish, however it's likely to include more valid 
ASes than just the AS_PATH.

If you only use AS_PATH, then your loop detection likely will be crap 
(particularly if AS_PATH doesn't include AS_TRANS - e.g. because some 
non-AS4 peer rewrote the AS_PATH significantly).
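
A minimal sketch of loop detection over the union, in Python. It 
assumes both paths have already been flattened into plain lists of AS 
numbers (real paths are made of AS_SEQUENCE/AS_SET segments); the 
function name and the example AS numbers are hypothetical.

```python
AS_TRANS = 23456  # reserved 2-octet placeholder for 4-octet ASNs (RFC 4893)


def path_has_loop(local_as, as_path, as4_path):
    """Return True if local_as appears in either AS_PATH or AS4_PATH."""
    seen = set(as_path) | set(as4_path)
    seen.discard(AS_TRANS)  # placeholder value, never a real hop
    return local_as in seen


# Local AS 65550 is only visible in the AS4_PATH here, because the
# AS_PATH was rewritten and no longer carries AS_TRANS.
print(path_has_loop(65550, [64500, 64501], [65550, 64501]))
print(path_has_loop(65550, [64500, 64501], []))
```

The first call reports a loop and the second does not, which is the 
case where checking AS_PATH alone would miss it.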

> End-of-RIB is a better mechanism than jacking up the MRAI -- and
> definitely avoids flapping at other peers.

It definitely would be nice to draw up some kind of auto-resync 
mechanism using EoR.
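
For reference, the IPv4-unicast End-of-RIB marker in RFC 4724 is just 
an UPDATE with no withdrawn routes, no path attributes and no NLRI, 
i.e. a 23-byte message. Any resync mechanism would presumably start by 
recognising it. A sketch in Python, with a hypothetical function name, 
covering only the IPv4-unicast form (not the MP_UNREACH_NLRI form used 
for other address families):

```python
import struct

BGP_HEADER_LEN = 19   # 16-byte marker + 2-byte length + 1-byte type
BGP_TYPE_UPDATE = 2


def is_ipv4_unicast_eor(msg: bytes) -> bool:
    """True if msg is one whole BGP message that is the IPv4-unicast EoR."""
    if len(msg) != BGP_HEADER_LEN + 4:
        return False
    length, msg_type = struct.unpack("!HB", msg[16:19])
    withdrawn_len, attrs_len = struct.unpack("!HH", msg[19:23])
    return (length == 23 and msg_type == BGP_TYPE_UPDATE
            and withdrawn_len == 0 and attrs_len == 0)
```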

> My feeling is that the sooner an untrustworthy peer is shut down, the
> better

Fully ACK. Sorry if I suggested otherwise. The big concern here is 
where the untrustworthy peer is not your neighbour - your neighbour 
is just passing it on.

> RFC4724 discusses how to use a "Graceful Restart Capability" with 
> no AFI/SAFI to signal that an End-of-RIB will be sent.  You're 
> looking for something beyond that ?

Maybe. Don't have time to look into this right now though ;)

regards,
-- 
Paul Jakma	paul at jakma.org	Key ID: 64A2FF6A
Fortune:
Support Mental Health.  Or I'll kill you.


