[quagga-dev 4184] Re: possible ospfd bug: in state ExStart with routers that are neither DR nor BDR

Paul Jakma paul at clubi.ie
Thu Jun 29 07:54:10 BST 2006

On Wed, 28 Jun 2006, Andrew J. Schorr wrote:

> Yes, that's what I think.  We had turned off a T3 WAN link to test 
> that the backup route (4 T1 circuits bonded with TEQL) was working 
> properly.  When we turned it back on, we started getting the 
> errors.


> for more debugging info.  But I imagine this will happen again (I 
> believe it has happened in the past), so I think we'll get more 
> chances to debug it.

In the absolute worst case, as a hack^defensive-programming, we can 
set a timer on entering ExStart to generate an event, e.g. AdjOK?, to 
clear it if it doesn't progress after some time (which, afaict, would 
be enough in your case for ti77 to set the 76 and 75 adjacencies back 
to 2-Way).
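A rough sketch of what such an ExStart watchdog might look like, in C. All names, the simulated clock and the timeout value are hypothetical; this is not Quagga's ospfd code or its thread/timer API, and the RFC 2328 DR/BDR test is reduced to a single flag:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified neighbor states (ordered as in RFC 2328). */
typedef enum { NBR_TWO_WAY, NBR_EXSTART, NBR_EXCHANGE } nbr_state;

struct nbr {
    nbr_state state;
    long exstart_entered;  /* time (seconds) the neighbor entered ExStart */
    bool adjacency_wanted; /* stand-in for the "should be adjacent?" DR/BDR test */
};

/* Hypothetical timeout; the email only says "after some time". */
#define EXSTART_WATCHDOG_SECS 40

void nbr_enter_exstart(struct nbr *n, long now)
{
    n->state = NBR_EXSTART;
    n->exstart_entered = now;  /* arm the watchdog */
}

/* Simplified AdjOK?: in ExStart or later, drop back to 2-Way if no
 * adjacency should be formed; otherwise leave the state alone. */
void nbr_adj_ok(struct nbr *n)
{
    if (n->state >= NBR_EXSTART && !n->adjacency_wanted)
        n->state = NBR_TWO_WAY;
}

/* Called periodically. Returns true if the watchdog fired. */
bool exstart_watchdog(struct nbr *n, long now)
{
    if (n->state != NBR_EXSTART)
        return false;
    assert(now >= n->exstart_entered);  /* clock must not go backwards */
    if (now - n->exstart_entered < EXSTART_WATCHDOG_SECS)
        return false;
    nbr_adj_ok(n);             /* re-evaluate the adjacency */
    n->exstart_entered = now;  /* re-arm in case it is still stuck */
    return true;
}
```

A neighbor that legitimately should be adjacent just stays in ExStart and the watchdog re-arms, so in this sketch the timer only ever forces the AdjOK? re-evaluation that should have happened anyway.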

> There are 3 hosts on each side of the T3 link, so that explains the 
> 3 AdjChg Down messages.  On one side we have ti74, ti75, & ti76, 
> and on the other ti77, ti78, & ti79.  So with the T3 bridge down, 
> the network is partitioned into two pieces.

> We then brought the T3 back up 34 seconds later, and here's what's 
> in the log:

> Note that, from ti77's perspective, the DR & BDR did not change 
> when the link came back up (still ti78 & ti79, which are on ti77's 
> side of the T3).  But the election on the other side (ti75's side) 
> did result in a change of DR & BDR.  When we took down the link, we 
> saw this on ti75:

Interesting, and yet it's ti77 which got confused.

> And when the link came back up, we have this in ti75's log:

What about ti77's view of what happened?

> It almost seems as if ti75 & ti77 moved into ExStart before the new 
> election was completed, and then ti77 failed to reset state 
> properly?  I say this because of this message:
> 2006/06/28 19:15:01 OSPF: AdjChg: Nbr on eth0.19: ExStart -> 2-Way

Nah, that's perfect - exactly what should happen.

> So this suggests that ti77 got into state ExStart before ti75 regressed
> it to 2-Way...

The above line shows ti75 properly raising AdjOK? on 77 and setting 
it back to 2-Way.
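For reference, a minimal C sketch of the AdjOK? handling described in RFC 2328 (section 10.3, using the broadcast/NBMA "should an adjacency be formed" test from section 10.4). The types, names and router IDs (75, 77, 78, 79) are hypothetical, and DR/BDR are keyed by router ID for simplicity (the RFC identifies them by interface address on broadcast networks); it is not ospfd's implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical neighbor states, ordered as in RFC 2328. */
typedef enum {
    NSM_DOWN, NSM_ATTEMPT, NSM_INIT, NSM_TWO_WAY,
    NSM_EXSTART, NSM_EXCHANGE, NSM_LOADING, NSM_FULL
} nsm_state;

struct adj_view {
    uint32_t self_id;  /* this router (assumed nonzero) */
    uint32_t nbr_id;   /* the neighbor (assumed nonzero) */
    uint32_t dr_id;    /* current DR, 0 = none */
    uint32_t bdr_id;   /* current BDR, 0 = none */
};

/* Broadcast/NBMA rule: adjacent only if either end is DR or BDR. */
bool should_be_adjacent(const struct adj_view *v)
{
    assert(v->self_id != 0 && v->nbr_id != 0);
    return v->self_id == v->dr_id || v->self_id == v->bdr_id ||
           v->nbr_id == v->dr_id || v->nbr_id == v->bdr_id;
}

/* AdjOK? event: from 2-Way, start forming an adjacency if wanted; from
 * ExStart or later, tear it down to 2-Way if no longer wanted (a real
 * implementation would also clear the retransmission, database summary
 * and request lists here). Lower states are ignored in this sketch. */
nsm_state handle_adj_ok(nsm_state cur, const struct adj_view *v)
{
    bool adj = should_be_adjacent(v);
    if (cur == NSM_TWO_WAY)
        return adj ? NSM_EXSTART : NSM_TWO_WAY;
    if (cur >= NSM_EXSTART)
        return adj ? cur : NSM_TWO_WAY;
    return cur;
}
```

With ti75 as self, ti77 as neighbor and some other pair as DR/BDR, an ExStart neighbor goes back to 2-Way, which is the ExStart -> 2-Way transition in ti75's log.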

Likely 77 also had 75 in ExStart there, but for some reason it didn't 
do the same. Do you have 77's view of what happened at this point?

