[quagga-dev 12155] Re: [PATCH 3/4] lib: use trie structure for prefix list matching

David Lamparter equinox at opensourcerouting.org
Sun Apr 19 11:26:40 BST 2015

On Fri, Apr 17, 2015 at 06:24:55PM +0100, Paul Jakma wrote:
> On Mon, 13 Apr 2015, David Lamparter wrote:
> > Prefix lists were implemented with a simple linear list that is scanned
> > sequentially.  This is, of course, extremely inefficient as it scales by
> > O(n).  This patch adds a trie-ish data structure that allows quickly
> > descending based on the prefix.
> >
> > Note that the trie structure used here is designed for real-world use,
> > hence it uses a relatively crude fixed-size bytewise table instead of
> > some fancy balancing scheme.  It is quite cacheline efficient.
> >
> > Using real-world routeserver prefix lists, matching against a fulltable
> > dump:
> >
> > entries   before     after    factor
> >   9103   63.8s      .0124s   5142x
> >    772    4.52s     .0101s    445.3x
> >     86     .445s    .0098s     45.51x
> >      7     .0379s   .0099s      3.834x
> >      2     .0136s   .0095s      1.440x
> >      1     .0084s   .0095s       .879x
> Nice speedups.
> One thing, maybe stick the trie in a different file, and put a generic 
> interface over it. We already have a binary trie in lib/table, be nice 
> they use the same thing. Indeed, you could even use lib/table for this? 
> Why not?

lib/table.c is not optimal because:
- it won't out of the box support multiple entry for a single prefix
  (prefix lists can contain that with different le/ge values)
- lookups touch considerably more cache lines - one per prefix bit,
  because it's a binary tree - which is the most expensive thing on
  modern CPUs.  The byte-level trie jumps, well, a byte at a time, with
  little cache pollution.  Even more, the list off the pointer in the
  trie will only have one entry for "usual" prefix lists, meaning we
  only need to grab 3 cachelines to get to a /16.  lib/table.c will load
  17 cachelines, an order of magnitude more...
  (and, additionally, the working set of frequently-accessed data is
  larger with lots of nodes being candidates to load; on the other hand
  the cachelines for the bytewise approach contain neighboring
  information too and are fewer in total...)

The structure will start behaving badly for prefix list entries longer
than the depth of the table - where it regresses into linked list - but
testing this against actual data indicates it's not a serious problem.
We could try hanging off a lib/table.c trie instead of the linked

(That said, I guess lib/table.c would still perform in under a second
for all cases in the above table.)

> Also, the bucketing thing in this trie is sort of like a fixed-bucket 
> B-tree, but without balancing. Maybe one day it should become a b-tree.

Since this is fixed bytewise, the only thing to balance would be the
depth of the trie.  We could look at that in a future patch.

Moving to a B-tree seems to introduce complexity while making memory
access efficiency worse.  The beauty of this trie implementation is in
its simplicity and cacheline efficiency.

> And that raises the question, is this bucketed tree correct for plist 
> addresses data? If so, should lib/table use this? If not, why is the 
> distribution of plist prefixes different from route prefixes?

... I don't understand the correctness question, can you re-word?

I've put some thought into adapting something similar for lib/table.c;
however I haven't come to a satisfying idea yet.  Either way the code
added here is relatively special-purpose and not all that long, so I
don't see a major advantage in splitting it off.


More information about the Quagga-dev mailing list