[quagga-dev 8178] Re: [PATCH] bgpd: fix bgp_node locking issues

Balaji G balajig81 at gmail.com
Wed Aug 25 15:30:45 BST 2010


> Where is your branch located (sorry if this is a daft quesion) but seeing
I need a release with this patch applied ....

 http://github.com/balajig/quagga-next


<http://github.com/balajig/quagga-next>Some how it makes me feel whether my
efforts are going void :(

Cheers,
  - Balaji

<http://github.com/balajig/quagga-next>

On Wed, Aug 25, 2010 at 7:50 PM, Richard J Palmer <richard at merula.net>wrote:

>  Ah fun...
>
> Where is your branch located (sorry if this is a daft quesion) but seeing
> I need a release with this patch applied ....
>
>
> On 25/08/2010 15:20, Balaji G wrote:
>
> Hi Richard
>
>  I am maintaining a separate branch in which lot of patches are getting
> merged and once Paul is back he could pull it from this location and hence i
> have applied your patch to my branch which i maintain. I am surprised that a
> new version has been released without merging these fixes so it makes me
> think whether what i maintain is useless ?
>
>  Cheers,
>   - Balaji
>
> On Wed, Aug 25, 2010 at 2:24 PM, Richard J Palmer <richard at merula.net>wrote:
>
>>  Can you confirm if this is in 0.99.17 ?
>>
>> I don't see this patch listed on the changelog
>>
>> Thanks
>>
>> Richard
>>
>>
>> On 28/07/2010 04:45, Balaji G wrote:
>>
>>  Hi Chris
>>
>> Applied. Thanks
>>
>> Thanks,
>> Cheers,
>>   - Balaji
>>
>> On Tue, Jul 27, 2010 at 9:58 PM, Chris Caputo <ccaputo at alt.net> wrote:
>>
>>> Balaji, please consider this for quagga-next.
>>>
>>> Thanks,
>>> Chris
>>>
>>> ---------- Forwarded message ----------
>>> Date: Sun, 25 Apr 2010 20:58:27 +0000 (UTC)
>>> From: Chris Caputo <ccaputo at alt.net>
>>> To: quagga-dev at lists.quagga.net, Denis Ovsienko <infrastation at yandex.ru>
>>> Cc: Richard Palmer <richard at merula.net>
>>> Subject: [quagga-dev 7960] Re: [PATCH] bgpd: fix bgp_node locking issues
>>>
>>> I believe the below patch is ready for inclusion.
>>>
>>> Richard Palmer has been running it for almost 2 months now and it fixed
>>> the assert problem he was experiencing.
>>>
>>> Thanks,
>>> Chris
>>>
>>> ---------- Forwarded message ----------
>>> Date: Sat, 27 Feb 2010 04:45:29 +0000 (UTC)
>>> From: Chris Caputo <ccaputo at alt.net>
>>> To: Richard Palmer <richard at merula.net>
>>> Cc: quagga-dev at lists.quagga.net
>>> Subject: [quagga-dev 7809] [PATCH] bgpd: fix bgp_node locking issues
>>>
>>> On Wed, 16 Dec 2009, Richard Palmer wrote:
>>> > i've just had another crash in BGPd same as all previous:
>>> >
>>> > 2009/12/15 11:01:59 BGP: 195.66.224.250 [Error] bgp_read_packet error:
>>> Connection reset by peer
>>> > 2009/12/15 11:04:14 BGP: Assertion `node->lock > 0' failed in file
>>> bgp_table.c, line 252, function bgp_unlock_node
>>> > 2009/12/15 11:04:14 BGP: Backtrace for 12 stack frames:
>>> > 2009/12/15 11:04:14 BGP: [bt 0]
>>> /usr/local/quagga/lib/libzebra.so.0(zlog_backtrace+0x1f) [0x1571c5]
>>> > 2009/12/15 11:04:14 BGP: [bt 1]
>>> /usr/local/quagga/lib/libzebra.so.0(_zlog_assert_failed+0x99) [0x157322]
>>> > 2009/12/15 11:04:14 BGP: [bt 2] ./bgpd [0xc53dc7]
>>> > 2009/12/15 11:04:14 BGP: [bt 3] ./bgpd [0xc525d4]
>>> > 2009/12/15 11:04:14 BGP: [bt 4] ./bgpd [0xc41e85]
>>> > 2009/12/15 11:04:14 BGP: [bt 5] ./bgpd [0xc42937]
>>> > 2009/12/15 11:04:14 BGP: [bt 6] ./bgpd [0xc490a3]
>>> > 2009/12/15 11:04:14 BGP: [bt 7] ./bgpd(bgp_read+0xa72) [0xc4aa1d]
>>> > 2009/12/15 11:04:14 BGP: [bt 8]
>>> /usr/local/quagga/lib/libzebra.so.0(thread_call+0x67) [0x14b799]
>>> > 2009/12/15 11:04:14 BGP: [bt 9] ./bgpd(main+0x418) [0xc230e1]
>>> > 2009/12/15 11:04:14 BGP: [bt 10] /lib/libc.so.6(__libc_start_main+0xe6)
>>> [0x18b5d6]
>>> > 2009/12/15 11:04:14 BGP: [bt 11] ./bgpd [0xc22bb1]
>>> >
>>> > We have had several of these :(
>>> >
>>> > Any thoughts on how we can fix this ?
>>>
>>> Richard,
>>>
>>> I've worked up a fix the crash you have been experiencing.  Please apply
>>> this patch to your 0.99.15 install and report any problems or success.
>>> Feel free to keep using the diagnostic code I sent, so you can check your
>>> BGP log for the overflow indication we were watching for previously.  Or
>>> feel free to apply this to a virgin 0.99.15 codebase.
>>>
>>> All,
>>>
>>> I've been working with Richard on his crashing issue.  The problem turned
>>> out to be connected table locks being locked but not unlocked, such that
>>> eventually a lock would exceed 2^31 and become negative, thus triggering
>>> an assert later on.
>>>
>>> Patch below...
>>>
>>> Thanks,
>>> Chris
>>>
>>> ---
>>> [PATCH] bgpd: fix bgp_node locking issues
>>>
>>> This patch fixes bugs with respect to bgp_node locking.  Also included
>>> are
>>> improvements to alloc/free & cleanup code.
>>>
>>> Thanks to Richard Palmer for assistance in narrowing down the locking
>>> problem on his system, which was regularly experiencing a lock count
>>> exceeding 2^31 and becoming negative.
>>>
>>> bgpd/bgp_main.c:
>>>  - Improve bgp_exit() cleanup code to enable developers to better detect
>>>   leaks.
>>>
>>> bgpd/bgp_nexthop.c:
>>>  - Correct use of bgp_node_match() and bgp_node_lookup() by calling
>>>   bgp_unlock_node() as appropriate.
>>>  - Track bgp_connected_ref allocations.
>>>  - Fix cleanup code by releasing nexthop cache in bgp_scan_finish().
>>>
>>> bgpd/bgp_route.c:
>>>  - Correct use of bgp_node_match() by calling bgp_unlock_node() as
>>>   appropriate.
>>>
>>> lib/memtypes.c:
>>>  - Add MTYPE_BGP_CONN.
>>>
>>> Signed-off-by: Chris Caputo <ccaputo at alt.net>
>>> ---
>>>  bgpd/bgp_main.c    |   10 ++++++-
>>>  bgpd/bgp_nexthop.c |   22 ++++++++++++---
>>>  bgpd/bgp_route.c   |   77
>>> +++++++++++++++++++++++++++++++---------------------
>>>  lib/memtypes.c     |    1 +
>>>  4 files changed, 74 insertions(+), 36 deletions(-)
>>>
>>> diff --git a/bgpd/bgp_main.c b/bgpd/bgp_main.c
>>> index 9d14683..1a460c6 100644
>>> --- a/bgpd/bgp_main.c
>>> +++ b/bgpd/bgp_main.c
>>> @@ -243,7 +243,15 @@ bgp_exit (int status)
>>>   if (retain_mode)
>>>     if_add_hook (IF_DELETE_HOOK, NULL);
>>>   for (ALL_LIST_ELEMENTS (iflist, node, nnode, ifp))
>>> -    if_delete (ifp);
>>> +    {
>>> +      struct listnode *c_node, *c_nnode;
>>> +      struct connected *c;
>>> +
>>> +      for (ALL_LIST_ELEMENTS (ifp->connected, c_node, c_nnode, c))
>>> +        bgp_connected_delete (c);
>>> +
>>> +      if_delete (ifp);
>>> +    }
>>>   list_free (iflist);
>>>
>>>   /* reverse bgp_attr_init */
>>> diff --git a/bgpd/bgp_nexthop.c b/bgpd/bgp_nexthop.c
>>> index 0cde665..719cb96 100644
>>> --- a/bgpd/bgp_nexthop.c
>>> +++ b/bgpd/bgp_nexthop.c
>>> @@ -276,6 +276,8 @@ bgp_nexthop_lookup_ipv6 (struct peer *peer, struct
>>> bgp_info *ri, int *changed,
>>>
>>>                  if (bnc->metric != oldbnc->metric)
>>>                    bnc->metricchanged = 1;
>>> +
>>> +                  bgp_unlock_node (oldrn);
>>>                }
>>>            }
>>>        }
>>> @@ -365,6 +367,8 @@ bgp_nexthop_lookup (afi_t afi, struct peer *peer,
>>> struct bgp_info *ri,
>>>
>>>                  if (bnc->metric != oldbnc->metric)
>>>                    bnc->metricchanged = 1;
>>> +
>>> +                  bgp_unlock_node (oldrn);
>>>                }
>>>            }
>>>        }
>>> @@ -571,7 +575,7 @@ bgp_connected_add (struct connected *ifc)
>>>        }
>>>       else
>>>        {
>>> -         bc = XCALLOC (0, sizeof (struct bgp_connected_ref));
>>> +         bc = XCALLOC (MTYPE_BGP_CONN, sizeof (struct
>>> bgp_connected_ref));
>>>          bc->refcnt = 1;
>>>          rn->info = bc;
>>>        }
>>> @@ -596,7 +600,7 @@ bgp_connected_add (struct connected *ifc)
>>>        }
>>>       else
>>>        {
>>> -         bc = XCALLOC (0, sizeof (struct bgp_connected_ref));
>>> +         bc = XCALLOC (MTYPE_BGP_CONN, sizeof (struct
>>> bgp_connected_ref));
>>>          bc->refcnt = 1;
>>>          rn->info = bc;
>>>        }
>>> @@ -636,7 +640,7 @@ bgp_connected_delete (struct connected *ifc)
>>>       bc->refcnt--;
>>>       if (bc->refcnt == 0)
>>>        {
>>> -         XFREE (0, bc);
>>> +         XFREE (MTYPE_BGP_CONN, bc);
>>>          rn->info = NULL;
>>>        }
>>>       bgp_unlock_node (rn);
>>> @@ -662,7 +666,7 @@ bgp_connected_delete (struct connected *ifc)
>>>       bc->refcnt--;
>>>       if (bc->refcnt == 0)
>>>        {
>>> -         XFREE (0, bc);
>>> +         XFREE (MTYPE_BGP_CONN, bc);
>>>          rn->info = NULL;
>>>        }
>>>       bgp_unlock_node (rn);
>>> @@ -1136,11 +1140,15 @@ bgp_multiaccess_check_v4 (struct in_addr nexthop,
>>> char *peer)
>>>   rn1 = bgp_node_match (bgp_connected_table[AFI_IP], &p1);
>>>   if (! rn1)
>>>     return 0;
>>> +  bgp_unlock_node (rn1);
>>>
>>>   rn2 = bgp_node_match (bgp_connected_table[AFI_IP], &p2);
>>>   if (! rn2)
>>>     return 0;
>>> +  bgp_unlock_node (rn2);
>>>
>>> +  /* This is safe, even with above unlocks, since we are just
>>> +     comparing pointers to the objects, not the objects themselves. */
>>>   if (rn1 == rn2)
>>>     return 1;
>>>
>>> @@ -1316,6 +1324,9 @@ bgp_scan_init (void)
>>>  void
>>>  bgp_scan_finish (void)
>>>  {
>>> +  /* Only the current one needs to be reset. */
>>> +  bgp_nexthop_cache_reset (bgp_nexthop_cache_table[AFI_IP]);
>>> +
>>>   bgp_table_unlock (cache1_table[AFI_IP]);
>>>   cache1_table[AFI_IP] = NULL;
>>>
>>> @@ -1326,6 +1337,9 @@ bgp_scan_finish (void)
>>>   bgp_connected_table[AFI_IP] = NULL;
>>>
>>>  #ifdef HAVE_IPV6
>>> +  /* Only the current one needs to be reset. */
>>> +  bgp_nexthop_cache_reset (bgp_nexthop_cache_table[AFI_IP6]);
>>> +
>>>   bgp_table_unlock (cache1_table[AFI_IP6]);
>>>   cache1_table[AFI_IP6] = NULL;
>>>
>>> diff --git a/bgpd/bgp_route.c b/bgpd/bgp_route.c
>>> index a92ca4e..617f150 100644
>>> --- a/bgpd/bgp_route.c
>>> +++ b/bgpd/bgp_route.c
>>> @@ -6554,7 +6554,10 @@ bgp_show_route_in_table (struct vty *vty, struct
>>> bgp *bgp,
>>>               if ((rm = bgp_node_match (table, &match)) != NULL)
>>>                 {
>>>                   if (prefix_check && rm->p.prefixlen != match.prefixlen)
>>> -                    continue;
>>> +                    {
>>> +                      bgp_unlock_node (rm);
>>> +                      continue;
>>> +                    }
>>>
>>>                   for (ri = rm->info; ri; ri = ri->next)
>>>                     {
>>> @@ -6568,6 +6571,8 @@ bgp_show_route_in_table (struct vty *vty, struct
>>> bgp *bgp,
>>>                       display++;
>>>                       route_vty_out_detail (vty, bgp, &rm->p, ri, AFI_IP,
>>> SAFI_MPLS_VPN);
>>>                     }
>>> +
>>> +                  bgp_unlock_node (rm);
>>>                 }
>>>             }
>>>         }
>>> @@ -6591,6 +6596,8 @@ bgp_show_route_in_table (struct vty *vty, struct
>>> bgp *bgp,
>>>                   route_vty_out_detail (vty, bgp, &rn->p, ri, afi, safi);
>>>                 }
>>>             }
>>> +
>>> +          bgp_unlock_node (rn);
>>>         }
>>>     }
>>>
>>> @@ -11348,41 +11355,49 @@ bgp_clear_damp_route (struct vty *vty, const
>>> char *view_name,
>>>
>>>          if ((table = rn->info) != NULL)
>>>            if ((rm = bgp_node_match (table, &match)) != NULL)
>>> -             if (! prefix_check || rm->p.prefixlen == match.prefixlen)
>>> -               {
>>> -                 ri = rm->info;
>>> -                 while (ri)
>>> -                   {
>>> -                     if (ri->extra && ri->extra->damp_info)
>>> -                       {
>>> -                         ri_temp = ri->next;
>>> -                         bgp_damp_info_free (ri->extra->damp_info, 1);
>>> -                         ri = ri_temp;
>>> -                       }
>>> -                     else
>>> -                       ri = ri->next;
>>> -                   }
>>> -               }
>>> +              {
>>> +                if (! prefix_check || rm->p.prefixlen ==
>>> match.prefixlen)
>>> +                  {
>>> +                    ri = rm->info;
>>> +                    while (ri)
>>> +                      {
>>> +                        if (ri->extra && ri->extra->damp_info)
>>> +                          {
>>> +                            ri_temp = ri->next;
>>> +                            bgp_damp_info_free (ri->extra->damp_info,
>>> 1);
>>> +                            ri = ri_temp;
>>> +                          }
>>> +                        else
>>> +                          ri = ri->next;
>>> +                      }
>>> +                  }
>>> +
>>> +                bgp_unlock_node (rm);
>>> +              }
>>>         }
>>>     }
>>>   else
>>>     {
>>>       if ((rn = bgp_node_match (bgp->rib[afi][safi], &match)) != NULL)
>>> -       if (! prefix_check || rn->p.prefixlen == match.prefixlen)
>>> -         {
>>> -           ri = rn->info;
>>> -           while (ri)
>>> -             {
>>> -               if (ri->extra && ri->extra->damp_info)
>>> -                 {
>>> -                   ri_temp = ri->next;
>>> -                   bgp_damp_info_free (ri->extra->damp_info, 1);
>>> -                   ri = ri_temp;
>>> -                 }
>>> -               else
>>> -                 ri = ri->next;
>>> -             }
>>> -         }
>>> +        {
>>> +          if (! prefix_check || rn->p.prefixlen == match.prefixlen)
>>> +            {
>>> +              ri = rn->info;
>>> +              while (ri)
>>> +                {
>>> +                  if (ri->extra && ri->extra->damp_info)
>>> +                    {
>>> +                      ri_temp = ri->next;
>>> +                      bgp_damp_info_free (ri->extra->damp_info, 1);
>>> +                      ri = ri_temp;
>>> +                    }
>>> +                  else
>>> +                    ri = ri->next;
>>> +                }
>>> +            }
>>> +
>>> +          bgp_unlock_node (rn);
>>> +        }
>>>     }
>>>
>>>   return CMD_SUCCESS;
>>> diff --git a/lib/memtypes.c b/lib/memtypes.c
>>> index 05d9322..5902067 100644
>>> --- a/lib/memtypes.c
>>> +++ b/lib/memtypes.c
>>> @@ -108,6 +108,7 @@ struct memory_list memory_list_bgp[] =
>>>   { MTYPE_BGP_NODE,            "BGP node"                      },
>>>   { MTYPE_BGP_ROUTE,           "BGP route"                     },
>>>   { MTYPE_BGP_ROUTE_EXTRA,     "BGP ancillary route info"      },
>>> +  { MTYPE_BGP_CONN,            "BGP connected"                 },
>>>   { MTYPE_BGP_STATIC,          "BGP static"                    },
>>>   { MTYPE_BGP_ADVERTISE_ATTR,  "BGP adv attr"                  },
>>>   { MTYPE_BGP_ADVERTISE,       "BGP adv"                       },
>>>
>>
>>
>> _______________________________________________
>> Quagga-dev mailing listQuagga-dev at lists.quagga.nethttp://lists.quagga.net/mailman/listinfo/quagga-dev
>>
>>
>>
>> _______________________________________________
>> Quagga-dev mailing list
>> Quagga-dev at lists.quagga.net
>> http://lists.quagga.net/mailman/listinfo/quagga-dev
>>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.quagga.net/pipermail/quagga-dev/attachments/20100825/a5c0fb6d/attachment-0001.html>


More information about the Quagga-dev mailing list