[quagga-dev 11576] Re: [PATCH] BGP: add aspath_aggregate_mpath that preserves path length

Boian Bonev bbonev at ipacct.com
Sat Oct 11 06:27:04 BST 2014

Hi Paul,

>> This change is only relevant when "bgp bestpath as-path multipath-relax"
>> is active; else it behaves exactly like the old code.
>> The idea has been proposed long ago by Paul Jakma (
>> http://patchwork.quagga.net/patch/417/).
>> The RFC describes path aggregation in terms of a few constraints and this
>> way does not violate them. The current way of creating an aggregate path may
>> produce a shorter path (in case of multipath-relax), which violates the length
>> constraint from the RFC. See the example in the commit message.
> I find it interesting you're working on this. I'm wondering why you're
> looking at this, what problems you've had and why you're going with this
> ahead of, e.g., Add-Path.
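
To make the length constraint in the quoted patch description concrete, here
is a minimal Python sketch. It is not the code from the patch: the function
names, the list-based path representation and the per-position merge rule
are all assumptions made for illustration. It also assumes that the paths
being combined have equal length, and it uses the RFC 4271 rule that an
AS_SET counts as one hop. The second function is a simplified stand-in for
an aggregation that collapses everything after the common prefix into one
AS_SET.

```python
from typing import FrozenSet, List, Sequence, Union

# Hypothetical representation: a plain int is an ASN in an AS_SEQUENCE,
# a frozenset is an AS_SET. Each list element counts as one hop.
Element = Union[int, FrozenSet[int]]


def aggregate_per_position(paths: Sequence[Sequence[int]]) -> List[Element]:
    """Merge equal-length paths column by column (illustrative only)."""
    if not paths:
        return []
    length = len(paths[0])
    if any(len(p) != length for p in paths):
        raise ValueError("all paths must have the same length")
    merged: List[Element] = []
    for column in zip(*paths):
        asns = frozenset(column)
        # Same ASN in every path: keep it; otherwise emit an AS_SET.
        merged.append(column[0] if len(asns) == 1 else asns)
    return merged


def aggregate_prefix_plus_set(paths: Sequence[Sequence[int]]) -> List[Element]:
    """Common leading ASNs, then one AS_SET holding everything else."""
    prefix: List[Element] = []
    for column in zip(*paths):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    rest = frozenset(asn for p in paths for asn in p[len(prefix):])
    return prefix + ([rest] if rest else [])


def path_length(path: Sequence[Element]) -> int:
    """An AS_SET counts as 1 regardless of how many ASNs it holds."""
    return len(path)


if __name__ == "__main__":
    a = [65001, 100, 200]
    b = [65001, 300, 400]
    print(path_length(aggregate_per_position([a, b])))     # 3, same as inputs
    print(path_length(aggregate_prefix_plus_set([a, b])))  # 2, shorter
```

In both variants every ASN from the input paths still appears in the result,
so loop detection keeps working; only the per-position merge keeps the
length equal to that of the constituent paths.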

I started fiddling with BGP ECMP a long time ago because of a quite
non-trivial real-life problem - balancing two non-diverse upstream ISPs
without having to manually set path preference by individual AS (for all
live ASes). By diverse I mean offering different best paths.

It is common for all upstream tier2 ISPs to have direct connections to the
same set of tier1s and thus to advertise the same set of best paths. Such
identical paths may easily account for over 50% of all routes. If BGP is
left to select between peers on its own (without any forced preference),
then most of the outgoing bandwidth will go to the longest-established
session, most probably saturating that link while the others are mostly
idle. If that session flaps, it will swap with the next session, but the
overall result will be the same.

Without ECMP the most trivial way is to assign different preferences to the
two upstreams by splitting originating ASes, e.g. into odd and even. Then
you will discover that odd ASes generate nearly 2 times more traffic than
even ones, which is quite odd by itself. Trying to keep the scales near flat
leads to preference lists that grow day by day and soon become
unmanageable.

ECMP solves that issue quite well, with the exception of some CDNs that
use TCP to anycast address space. There, a Linux FIB with a route cache
causes frequent nexthop selection flapping, which leads to frequent
connection resets. More recent Linux kernels, which do not use a route
cache, may be made to do nexthop selection in a deterministic way...
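
As an illustration of what deterministic nexthop selection means here, a
minimal Python sketch follows. It is not how the Linux kernel implements
it; the function name, the choice of SHA-256 and the string key format are
assumptions picked only for the example. The point is that a flow's
addresses and ports always hash to the same nexthop as long as the nexthop
list is unchanged, so a TCP connection to an anycast address does not jump
between upstreams.

```python
import hashlib
from typing import Sequence


def pick_nexthop(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                 nexthops: Sequence[str]) -> str:
    """Map a flow to one nexthop via a stable hash (illustrative only)."""
    if not nexthops:
        raise ValueError("at least one nexthop is required")
    # Build a key from the flow identifiers; the format is arbitrary.
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes as an integer and reduce modulo the list size.
    index = int.from_bytes(digest[:8], "big") % len(nexthops)
    return nexthops[index]


if __name__ == "__main__":
    hops = ["192.0.2.1", "198.51.100.1"]  # documentation addresses
    print(pick_nexthop("203.0.113.5", "203.0.113.9", 40000, 443, hops))
```

Note that with a plain modulo, adding or removing a nexthop remaps many
existing flows; the sketch only shows stability while the nexthop set stays
the same.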

> ---
> Some background:
> I did some work on a BGP ECMP multipath draft with Manav Bhatia and Joel
> Halpern, which took the approach of providing an extended NEXT_HOP
> attribute, along with specifying how multiple UPDATEs could carry and
> withdraw an ECMP route. See
> https://tools.ietf.org/html/draft-bhatia-ecmp-routes-in-bgp-02
> https://tools.ietf.org/html/draft-bhatia-bgp-multiple-next-hops-01
> For interoperation with non-ECMP capable peers, we specified how to
> combine AS_PATHs using AS_SETs to allow an ECMP route to be advertised to a
> non-ECMP-capable peer, such that the length and ASN information (though
> obv not the exact sequence) across all the constituent routes of the ECMP
> route could still be passed onto the ECMP peer.
> With respect to how to advertise ECMP routes, IDR went with a different
> approach, the "Add-Path" approach, that left internal UPDATE semantics
> mostly unchanged, instead adding an opaque identifier to the NLRI to allow
> multiple UPDATEs to be considered as a collective set of paths for an NLRI.
> The base IDR proposal unfortunately didn't cover some key details, like
> how to choose this identifier, or what constitutes a group of paths that
> can be advertised together. Details that are critical to interoperability.
> It also didn't consider how this extension might work once non-ADD_PATH
> peers are involved.
> This choice was I think deliberate, as I think the intention was that
> Add-Path would be a bit of a swiss-army knife and cover any use-case for
> advertising groups of paths. The bhatia draft was more focused on solving
> ECMP (and hence RR-route-oscillations) in a simple and interoperable way,
> while still being applicable to route-server use (these are the 2 most
> important use cases, perhaps the only notable ones).
> I'm not sure what state things are in now on ECMP and multiple-paths in
> BGP. I haven't kept up with things the last few years.
> I see there is a longer guidelines draft now. At a glance it still looks a
> bit complicated, and it still doesn't look like it's easy for any two
> Add-Path supporting implementations to be able to tell if they actually are
> interoperable (?). There have also been academic studies on how to best
> choose Add-Path identifiers I think. I don't know what deployment Add-Path
> has to date either - as I haven't followed.
> Interestingly, I still don't see either of the Add-Path drafts addressing
> how to advertise an ECMP route to non-Add-Path peers in a way that
> preserves AS_PATH properties such that routing loops are avoided.
> It's worth noting that IDR has also deprecated the aggregation features in
> BGP, including AS_SET. If AS_PATH combination using AS_SET is needed for
> ECMP-non-ECMP interoperability, then such deprecation is problematic.
> However, I couldn't get IDR to see merit in that point.
> ---
> So, what's interesting here is that people, such as yourself, are still
> working on BGP multi-path, apparently outside of Add-Path. I'm curious why
> that is? I know a few years ago there were issues still to be worked out
> with Add-Path, and I'm curious if these are still at play?

Add-Path does not solve my problem and I haven't even looked further into
it before. Now I have got a much broader view thanks to your post.

It looks interesting to me in the route-server case, where (EC)MP may get
propagated from an IXP to more than one router in an AS so each one can
make a different decision based on its policy/proximity/etc. But then what
about the route-map syntax that can match and manipulate multiple paths...

I believe that the interop with non-ECMP speakers will be important at
least until IPv6 gets wider adoption; by then most of the routers will have
to be changed/upgraded anyway and will hopefully have some Add-Path support
as a side effect :)

> What are those issues? Do we need to go back to IDR and try to work those out?
> Is it just Add-Path interoperability that needs to be fixed (e.g. adding
> capabilities)?
> I think at least one of the 2 Add-Path drafts' authors might be
> active on this list too (even Cc'ed ;) ), if they know.
> It's kind of sad that, more than 10 years since Manav's draft, and 6 odd
> since the Add-Path, that it seems there still isn't any (AFAIK) easy,
> interoperable solution to solve the iBGP route-reflector oscillation
> problems, or eBGP route-server scaling issues.
> I'd be happy to get stuck in to help with fixing that.

Unfortunately I cannot give any input on the Add-Path/interop/IDR topics...

I have hit the scaling issues and can say that for current Q they are
present in non-route-server setups as well. Just put 6 full-table
upstreams, 4 sessions with 200k routes and 50 downlinks, most of which get
only a default route, and the only way to make it tick is to enable one
session every 10 minutes. Afterwards, if an upstream flaps, all sessions
start oscillating in an established-timeout cycle, because processing
takes more than 5 minutes and more sessions flap, causing others to flap
too... In a route-server scenario it is much worse.

I have also looked into the Euro-IX branch (and have successfully rebased
some patches), which mitigates some of these scalability issues, but the
differences from the main Q tree are so severe and unsplittable that I see
no easy way to a clean merge without significant refactoring.

With best regards,