[quagga-dev 12155] Re: [PATCH 3/4] lib: use trie structure for prefix list matching
equinox at opensourcerouting.org
Sun Apr 19 11:26:40 BST 2015
On Fri, Apr 17, 2015 at 06:24:55PM +0100, Paul Jakma wrote:
> On Mon, 13 Apr 2015, David Lamparter wrote:
> > Prefix lists were implemented with a simple linear list that is scanned
> > sequentially. This is, of course, extremely inefficient as it scales by
> > O(n). This patch adds a trie-ish data structure that allows quickly
> > descending based on the prefix.
> > Note that the trie structure used here is designed for real-world use,
> > hence it uses a relatively crude fixed-size bytewise table instead of
> > some fancy balancing scheme. It is quite cacheline efficient.
> > Using real-world routeserver prefix lists, matching against a fulltable
> > dump:
> > entries before after factor
> > 9103 63.8s .0124s 5142x
> > 772 4.52s .0101s 445.3x
> > 86 .445s .0098s 45.51x
> > 7 .0379s .0099s 3.834x
> > 2 .0136s .0095s 1.440x
> > 1 .0084s .0095s .879x
> Nice speedups.
> One thing, maybe stick the trie in a different file, and put a generic
> interface over it. We already have a binary trie in lib/table, be nice
> they use the same thing. Indeed, you could even use lib/table for this?
> Why not?
lib/table.c is not optimal because:
- it won't out of the box support multiple entry for a single prefix
(prefix lists can contain that with different le/ge values)
- lookups touch considerably more cache lines - one per prefix bit,
because it's a binary tree - which is the most expensive thing on
modern CPUs. The byte-level trie jumps, well, a byte at a time, with
little cache pollution. Even more, the list off the pointer in the
trie will only have one entry for "usual" prefix lists, meaning we
only need to grab 3 cachelines to get to a /16. lib/table.c will load
17 cachelines, an order of magnitude more...
(and, additionally, the working set of frequently-accessed data is
larger with lots of nodes being candidates to load; on the other hand
the cachelines for the bytewise approach contain neighboring
information too and are fewer in total...)
The structure will start behaving badly for prefix list entries longer
than the depth of the table - where it regresses into linked list - but
testing this against actual data indicates it's not a serious problem.
We could try hanging off a lib/table.c trie instead of the linked
(That said, I guess lib/table.c would still perform in under a second
for all cases in the above table.)
> Also, the bucketing thing in this trie is sort of like a fixed-bucket
> B-tree, but without balancing. Maybe one day it should become a b-tree.
Since this is fixed bytewise, the only thing to balance would be the
depth of the trie. We could look at that in a future patch.
Moving to a B-tree seems to introduce complexity while making memory
access efficiency worse. The beauty of this trie implementation is in
its simplicity and cacheline efficiency.
> And that raises the question, is this bucketed tree correct for plist
> addresses data? If so, should lib/table use this? If not, why is the
> distribution of plist prefixes different from route prefixes?
... I don't understand the correctness question, can you re-word?
I've put some thought into adapting something similar for lib/table.c;
however I haven't come to a satisfying idea yet. Either way the code
added here is relatively special-purpose and not all that long, so I
don't see a major advantage in splitting it off.
More information about the Quagga-dev