[quagga-dev 8911] Re: [PATCH] Zebra rib/fib to be in sinc with kernel routing table

James Carlson carlsonj at workingcode.com
Thu Oct 20 17:09:08 BST 2011


Lennart Sorensen wrote:
> Our solution has been that we patched the kernel to have a 'link_detect'
> flag on the interfaces and if it is enabled, the kernel removes the
> connect route on link down, and adds it back on link up.

Depending on what "removes" means, this could be a pretty serious
usability problem.

If "removes" means that it's actually removed from the kernel forwarding
table, and the user-level "route" command cannot access the route when
the interface is down, then I think that'd be a bad thing.

This means that the administrator cannot see what routes are actually on
the box if any interface is down.  The output from "netstat -nr" is a
lie, because it omits routes that are really there but that simply
cannot be seen at the moment.

Worse still, it means that the administrator cannot delete a route
that's "stuck" on a downed interface.  As far as the "route" command is
concerned, the route doesn't exist and thus can't be deleted.  But as
soon as the interface comes back up, it'll rise from the dead like an
extra in a Romero movie.

And it introduces some nasty edge cases into the kernel that have no
good solutions.  For example, suppose an interface goes down and takes a
route with it.  The user then adds a route that would have been rejected
as a duplicate if that route had been up ("SIOCADDRT: File exists") --
but it's accepted because the other route is currently missing.  The
interface comes back up.  Now what?  Who (if anyone) gets the error
about the conflicting route?  Are both installed in the table?  Is one
discarded silently?
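
As a rough illustration of that edge case, here is a toy Python model of a
table that hides routes on link down and restores them on link up.  It is
not kernel code, and the class and method names are made up for this
sketch.  It only shows that the conflict surfaces at link-up time, when no
caller is waiting for an error.

```python
# Toy model only: not kernel code. All names here are hypothetical.

class HidingRouteTable:
    """Route table that stashes routes in a hidden list on link down."""

    def __init__(self):
        self.visible = {}   # prefix -> interface name
        self.hidden = {}    # interface name -> list of stashed prefixes

    def add(self, prefix, ifname):
        if prefix in self.visible:
            # Stands in for the "SIOCADDRT: File exists" rejection.
            raise FileExistsError(prefix)
        self.visible[prefix] = ifname

    def delete(self, prefix):
        # Hidden routes cannot be found, so they cannot be deleted.
        if prefix not in self.visible:
            raise LookupError(prefix)
        del self.visible[prefix]

    def link_down(self, ifname):
        gone = [p for p, i in self.visible.items() if i == ifname]
        for p in gone:
            del self.visible[p]
        self.hidden.setdefault(ifname, []).extend(gone)

    def link_up(self, ifname):
        """Restore stashed routes; return the prefixes that now conflict."""
        conflicts = []
        for p in self.hidden.pop(ifname, []):
            if p in self.visible:
                conflicts.append(p)   # Now what? Nobody asked for this add.
            else:
                self.visible[p] = ifname
        return conflicts


table = HidingRouteTable()
table.add("10.0.0.0/24", "eth0")
table.link_down("eth0")
table.add("10.0.0.0/24", "eth1")    # accepted: the eth0 route is invisible
conflicts = table.link_up("eth0")   # the duplicate only shows up here
```

This model silently drops the returning route, which is just one of the
possible answers to "now what?", and none of them is obviously right.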

> I know some kernel developers think it's a userspace problem to solve,
> but since the kernel creates the darn routes in the first place I consider
> it a kernel problem and solved it there.

As someone who has straddled that boundary for a long time, I think
there are really only a few sane ways forward:

1.  Routes "temporarily removed" by the kernel are made a first class
    object.  By this I mean that users can list, add, and delete these
    routes as necessary, without regard to the status of the interface.
    (One way to accomplish this would be to treat the underlying
    interface status as an immutable flag on the route, rather than
    removing/adding the route itself on link down/up.)

2.  Routes are made independent of link status.  If the underlying link
    is down, the route becomes an ICMP reject/destination-unreachable
    path.  If you want something else, run a routing daemon that can add
    or delete routes as necessary.

3.  Routes are deleted when a link goes down, but do not come back.  If
    you have interfaces that can go down, and you have static routes,
    and want them to come back, you have to run something in user space
    (such as a routing daemon) that can do this.
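
To make option 1 a little more concrete, here is a minimal Python sketch,
assuming an exact-match table and hypothetical names throughout (this is
not how any particular kernel does it).  The route stays in the table when
the link drops; only a per-route flag changes, so listing, duplicate
detection, and deletion all keep working.

```python
# Toy model only: not kernel code. All names here are hypothetical.
from dataclasses import dataclass


@dataclass
class Route:
    prefix: str
    ifname: str
    link_up: bool = True   # maintained by the table on link events


class FlaggedRouteTable:
    """Routes are never removed on link down; they are only flagged."""

    def __init__(self):
        self.routes = {}   # prefix -> Route

    def add(self, prefix, ifname, link_up=True):
        if prefix in self.routes:
            # Duplicates are rejected even if the existing route's link is down.
            raise FileExistsError(prefix)
        self.routes[prefix] = Route(prefix, ifname, link_up)

    def delete(self, prefix):
        # Works regardless of link state; raises KeyError if absent.
        del self.routes[prefix]

    def set_link(self, ifname, up):
        for route in self.routes.values():
            if route.ifname == ifname:
                route.link_up = up

    def list_routes(self):
        # Shows every route, including those on downed interfaces.
        return sorted(self.routes.values(), key=lambda r: r.prefix)

    def usable(self, prefix):
        # Forwarding would consult the flag instead of a hidden list.
        route = self.routes.get(prefix)
        return route is not None and route.link_up


table = FlaggedRouteTable()
table.add("10.0.0.0/24", "eth0")
table.set_link("eth0", False)
listed = [r.prefix for r in table.list_routes()]   # still visible
try:
    table.add("10.0.0.0/24", "eth1")
    duplicate_rejected = False
except FileExistsError:
    duplicate_rejected = True
table.delete("10.0.0.0/24")                        # deletable while down
```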

Solaris did the "cache the routes for down interfaces in hidden memory"
trick for many years, and it was horrible for routing daemons to work
with.  I don't think it ever really worked quite right.

-- 
James Carlson         42.703N 71.076W         <carlsonj at workingcode.com>


