[quagga-dev 12075] Re: Advise on implementation

Olivier Dugeon olivier.dugeon at orange.com
Mon Mar 2 14:56:35 GMT 2015

Hello David,

Thanks for your great advice.

Some answers inline.



On 02/03/2015 08:18, David Lamparter wrote:
> On Wed, Feb 18, 2015 at 12:58:22PM +0100, Olivier Dugeon wrote:
>> In addition to our TE work already submitted, we would like to implement the BGP
>> Link State extension (see
>> https://datatracker.ietf.org/doc/draft-ietf-idr-ls-distribution). For
>> that purpose, we need inter-process communication with the OSPFd and ISISd
>> processes.
>> The same facility is also needed to implement a Path Computation
>> Element (PCE - RFC 4655). The primary goal is to exchange database
>> contents, in particular OSPF LSAs and IS-IS LSPs including TE information.
> *sigh*  this has been coming for a long time - the IPC protocol between
> zebra and the daemons needed to be extended (or even overhauled?) for a
> long time.
> Let me try to pull together a list of things that can't be done with the
> current ZAPI socket protocol:
> - LS distribution & PCE
> - BFD peer status signalling & automatic session creation
> - exchanging MPLS labels
> - exchanging VPN route information (both intra- and inter-VRF)
> - matching on route properties from another daemon when redistributing
> - ... probably even more stuff I forgot
> Some of these can probably be added into the existing protocol, but in
> general what we have now can be described as anything but extensible.
> I'm not saying you need to support all of these - I'm saying we need to
> address extensibility.
Even if I am only trying to solve my current problem, I totally agree with you.
Adding such new communication between the various Quagga processes must be
flexible and generic, in order to accommodate further development and other
ways of using it.

That is exactly the spirit of my proposal, and why I asked for advice. I
would like to design (before developing) a generic communication system that
covers most of the requirements and remains flexible for further development.

In parallel, while digging into all the possibilities for IPC, in particular
the pthread mechanism, I discovered that Quagga uses its own thread
implementation instead of pthreads. I don't know the complete history of
Zebra/Quagga, but I suppose that when the first code was written, pthreads were
not supported by the majority of systems. So, if we move to a different system
for communication between Quagga processes, perhaps it is also time to rethink
the thread mechanism, unless there is a valid reason (which I am not aware of,
apologies) to keep it.
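If a dedicated bus thread were added to each daemon (as option 4 proposes), it would still have to hand its work back to the existing event loop. As far as I understand, Quagga's "threads" are cooperative tasks dispatched from a single select()-based loop, not preemptive threads, so one common bridge is the self-pipe trick: a real pthread writes into a pipe, and the main loop treats the read end like any other readable file descriptor. The sketch below is a standalone illustration only, assuming POSIX pipe(), poll() and pthreads; bus_thread, run_event_loop and struct bus_ctx are hypothetical names, not Quagga code.

```c
#include <assert.h>
#include <poll.h>
#include <pthread.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical bus thread context: in a real daemon the thread would read
 * from the IPC channel; here it simply produces `count` message ids. */
struct bus_ctx {
    int wfd;        /* write end of the pipe, owned by the bus thread */
    uint32_t count; /* number of messages to produce */
};

static void *bus_thread(void *arg)
{
    struct bus_ctx *ctx = arg;
    for (uint32_t id = 1; id <= ctx->count; id++) {
        /* Writes of <= PIPE_BUF bytes to a pipe are not interleaved,
         * so each 4-byte message id arrives in one piece. */
        if (write(ctx->wfd, &id, sizeof id) != (ssize_t)sizeof id)
            break;
    }
    close(ctx->wfd); /* EOF tells the event loop the thread is done */
    return NULL;
}

/* Single-threaded loop: waits on the read end with poll() and handles each
 * message, like a read task in a cooperative scheduler.
 * Returns the number of messages handled, or -1 on setup error. */
static int run_event_loop(uint32_t count, uint64_t *sum)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    struct bus_ctx ctx = { .wfd = fds[1], .count = count };
    pthread_t tid;
    if (pthread_create(&tid, NULL, bus_thread, &ctx) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    int handled = 0;
    *sum = 0;
    struct pollfd pfd = { .fd = fds[0], .events = POLLIN };
    for (;;) {
        if (poll(&pfd, 1, -1) < 0)
            break;
        uint32_t id;
        ssize_t n = read(fds[0], &id, sizeof id);
        if (n <= 0)
            break; /* 0 means EOF: the bus thread closed its end */
        assert(n == (ssize_t)sizeof id);
        *sum += id;
        handled++;
    }
    pthread_join(tid, NULL);
    close(fds[0]);
    return handled;
}
```

The point of the design is that only the bus thread ever blocks on the IPC channel; the cooperative loop keeps its single-threaded semantics and never needs a mutex around daemon state.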
>> 1/ Extend the Zebra protocol. Vincent Jardin already pointed out to me that it is not
>> a good option, as the Zebra protocol and the Zebra daemon are heavily
>> solicited for VPN, and adding more traffic would have a bad effect on
>> performance. But, as it would only be used in a particular case, perhaps it is
>> not an issue.
>> 2/ Move the OSPF and ISIS databases from per-process memory to shared memory.
>> Such an architecture lets other processes / threads access the database in
>> read-only mode, but what would be the impact in terms of performance,
>> especially with a large database? In addition, it does not give the
>> possibility to send commands to another process like the OSPF_API does.
>> 3/ Implement a dedicated bus/protocol similar to the Zebra one, using
>> sockets. Part of the code could be reused (from Zebra and OSPF_API),
>> but, like the Zebra protocol, it makes intensive use of data copies in memory (at
>> least 4 to transfer a message to one process). Again, with a large
>> database, there could be performance issues.
>> 4/ Implement a dedicated bus using shared memory and semaphores/mutexes to
>> manage read/write access to the bus. This option reduces the number
>> of times we copy data in memory (copy once, read multiple) but introduces
>> more complexity, as we need to synchronise threads and processes, which could
>> be hard to debug. The objective is to add a dedicated thread per daemon
>> to manage the bus, which will not disturb other threads in case of a lock.
>> If it is powerful and provides good performance, it could be a candidate
>> to replace the socket-based Zebra communication to improve performance.
> There are 2 independent questions here:
> - should this be a separate communication channel or should it be
>    integrated with zebra communications?
> - what transport medium should this use, shm or socket?
> Your options match up mostly (though not exactly):
> 1) = "integrated, socket"
> 2) = "separate,   shm"
> 3) = "separate,   socket"
> 4) = "integrated, shm"
> I don't have a well-founded opinion on what to do (yet), though I'd like
> to make the following arguments:
> - shm is not necessarily *noticeably* faster than sockets.  Sure it
>    saves some copying and kernel calls, but if the overhead goes from 2%
>    to 1.5% you haven't won much.
> - shm should still use a well-isolated API/wrappers.  In fact I'd argue
>    the API should be the same between sockets or shm.  Accessing shm
>    directly without such wrappers is a recipe for crashes.
That is exactly my intention. I would write a ZBus library (zbus.c and
zbus.h) that offers a common API for all communication and reuses as much
existing code as possible (e.g. the stream API).
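One way to keep the API identical whether the transport is a socket or shared memory is a table of function pointers behind an opaque handle, so callers never touch the backend directly. The sketch below assumes that design; struct zbus, struct zbus_ops, zbus_send(), zbus_recv() and the one-slot "mailbox" backend are all hypothetical names for illustration, and the mailbox merely stands in for a real socket or shm backend.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical transport-agnostic bus handle. */
struct zbus;

struct zbus_ops {
    /* Returns 0 on success, -1 on error. */
    int (*send)(struct zbus *bus, const void *buf, size_t len);
    /* Returns the message length, 0 if nothing is pending, -1 on error. */
    int (*recv)(struct zbus *bus, void *buf, size_t cap);
};

struct zbus {
    const struct zbus_ops *ops;
    void *priv; /* backend-specific state (socket fd, shm mapping, ...) */
};

/* The only entry points callers use; the backend stays hidden. */
static int zbus_send(struct zbus *bus, const void *buf, size_t len)
{
    return bus->ops->send(bus, buf, len);
}

static int zbus_recv(struct zbus *bus, void *buf, size_t cap)
{
    return bus->ops->recv(bus, buf, cap);
}

/* Example backend: a one-slot in-memory mailbox. */
struct mailbox {
    char data[256];
    size_t len;
    int full;
};

static int mailbox_send(struct zbus *bus, const void *buf, size_t len)
{
    struct mailbox *mb = bus->priv;
    /* Empty messages are rejected so that recv() returning 0 is unambiguous. */
    if (mb->full || len == 0 || len > sizeof mb->data)
        return -1;
    memcpy(mb->data, buf, len);
    mb->len = len;
    mb->full = 1;
    return 0;
}

static int mailbox_recv(struct zbus *bus, void *buf, size_t cap)
{
    struct mailbox *mb = bus->priv;
    if (!mb->full)
        return 0;
    if (mb->len > cap)
        return -1;
    memcpy(buf, mb->data, mb->len);
    mb->full = 0;
    return (int)mb->len;
}

static const struct zbus_ops mailbox_ops = {
    .send = mailbox_send,
    .recv = mailbox_recv,
};
```

A socket or shm backend would only provide another `struct zbus_ops` instance; daemon code calling zbus_send()/zbus_recv() would not change, which is what makes it possible to benchmark both transports behind the same API.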
> - shm doesn't imply locking.  Particularly, RCU might help.
After looking at the IPC literature, and in particular the readers/writers
problem, I think our problem is quite different. The kind of communication is
more 'write once, read multiple', transmitting information as we would with a
socket or a message queue. Using SHM means that we need to lock the shm before
the process starts to write a new message, and keep it locked until we are sure
that all readers have consumed the message. Of course, we could use a 'write
once / read once' system, but we would lose the benefit of addressing a message
to several processes (e.g. zebra notifying all processes of a change in
interface parameters) without writing the message multiple times. To draw a
parallel with network protocols, we need a multicast system. So, perhaps a
multicast socket is a good approach.
> - socket protocols should probably use some "standard" external encoding
>    library, simply to be more usable from other programming languages.
> - I don't see much gain from forcing all communication through a single
>    point, but I do think we should use some uniform encoding & mechanism.
>    If you use shm-based messaging, we should probably use that
>    everywhere.  Same if you use protobuf over sockets, it should be
>    protobuf over sockets everywhere.
Yes, of course. Currently the ZEBRA API and the OSPF API don't use the
same semantics. I'm in favour of a simple encoding scheme based on TLVs, like
routing protocols use.
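To make the TLV idea concrete, here is a minimal encoder/decoder sketch. It assumes a 16-bit type and 16-bit length in network byte order, similar to the TLV header of OSPF TE LSAs (RFC 3630), but it omits the 4-octet padding that RFC 3630 requires; IS-IS uses 8-bit type and length fields instead. tlv_put(), tlv_find() and the type codes in the usage are hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: 16-bit type, 16-bit length (of the value only), both in
 * network byte order, followed by `length` value bytes (no padding). */
#define TLV_HDR_LEN 4

/* Appends one TLV at buf[off]. Returns the new offset, or 0 if it does not fit. */
static size_t tlv_put(uint8_t *buf, size_t cap, size_t off,
                      uint16_t type, const void *val, uint16_t len)
{
    if (off > cap || cap - off < (size_t)TLV_HDR_LEN + len)
        return 0;
    uint16_t t = htons(type), l = htons(len);
    memcpy(buf + off, &t, 2);
    memcpy(buf + off + 2, &l, 2);
    memcpy(buf + off + TLV_HDR_LEN, val, len);
    return off + TLV_HDR_LEN + len;
}

/* Walks a buffer of TLVs and returns a pointer to the value of the first TLV
 * of the wanted type (its length in *len), or NULL if absent or truncated.
 * Unknown types are simply skipped, which is what keeps TLVs extensible. */
static const uint8_t *tlv_find(const uint8_t *buf, size_t size,
                               uint16_t type, uint16_t *len)
{
    size_t off = 0;
    while (size - off >= TLV_HDR_LEN) {
        uint16_t t, l;
        memcpy(&t, buf + off, 2);
        memcpy(&l, buf + off + 2, 2);
        t = ntohs(t);
        l = ntohs(l);
        if (size - off - TLV_HDR_LEN < l)
            return NULL; /* truncated TLV */
        if (t == type) {
            *len = l;
            return buf + off + TLV_HDR_LEN;
        }
        off += TLV_HDR_LEN + l;
    }
    return NULL;
}
```

For example, a message could carry an interface name as type 1 and a metric as type 7; an older daemon that does not know type 7 would still find type 1 and skip the rest.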
> NB: I'm not against SHM, but I do think SHM is more difficult to get
> right, and it's not an automatic performance win.  I did some thinking
> about a shared memory RCU-based replacement for ZAPI, but never had the
> time to try that.  It probably *does* help moving Quagga towards
> supporting multiple threads in the individual daemons.
>    [quote moved]
>> But such exchanges could be useful for other purposes like hot restart,
>> monitoring ... OSPF already provides such a facility through the OSPF_API,
>> but it is dedicated to OSPFd only and we need to generalize it to other
>> Quagga daemons. From this API, we would take the capability to send
>> commands to a given process and get back some information, synchronously
>> (answer to the command) or asynchronously (LSA/LSP update).
>> We studied several options for the implementation and would like some advice
>> from the community before really starting to code. Up to now, we have
>> identified 4 options:
>> Options 1 and 2 are not our favourites, but we are open to discussion. We
>> hesitate between options 3 and 4 and would greatly appreciate some advice
>> to help us make a decision.
> To be honest, I think this will need to be "evaluated" instead of
> "decided".  Pick one, prototype implement it with the least effort
> possible and show it.  You will have gained some insights from
> implementing it, and we'll know how it performs...
Yes for sure. I'll try to design and test some ideas and submit them to 
the mailing list.
> ... ultimately, this may be something that needs doing by trial & error,
> I'm afraid.
> -David
