SWITCHlan Trouble Tickets

Transparency is very important to us. We therefore publish all trouble tickets about issues that have an impact on our customers. We show both currently open tickets and all closed tickets.

Ticket 20070510_1

Ticket Number: 20070510_1
Ticket State: CLOSED
Ticket Opened: 2007-05-10 15:40
Ticket Closed: 2007-06-17 18:46
Ticket Description: BGP issue on EPFL's primary link (faulty linecard)

Problem Description:

Instability of the BGP-4 routing session for EPFL's primary access. We suspect that the linecard to which EPFL is connected was defective. The linecard has been replaced.

Affected:

From 2007-06-14 11:28 until 2007-06-14 11:40
Impact: loss of redundancy
Sites/Services: IMD


From 2007-05-15 17:06 until 2007-05-16 08:42
Impact: no more redundancy
Sites/Services: EPFL


From 2007-05-15 16:48 until 2007-05-15 17:06
Impact: Partial loss of connectivity
Sites/Services: EPFL, IMD


From 2007-05-10 14:01 until 2007-05-14 17:03
Impact: no more redundancy
Sites/Services: EPFL


From 2007-05-10 13:37 until 2007-05-10 14:01
Impact: Partial loss of connectivity (brief outages of primary connection, leading to oscillation between primary and secondary connection)
Sites/Services: IMD

From 2007-05-10 13:37 until 2007-05-10 14:01
Impact: Partial loss of connectivity (brief outages of primary connection, leading to oscillation between primary and secondary connection)
Sites/Services: EPFL
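
To put the windows listed above in proportion, their durations can be added up per impact type. The following is only a sketch: the timestamps are copied from the list, the two identical 13:37 windows (IMD and EPFL) are counted once, and the labels "redundancy" and "connectivity" as well as the function names are shorthand introduced here, not part of the ticket system.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"

# (start, end, impact) taken from the "Affected" list of this ticket.
# The impact labels are hypothetical shorthand for "loss of redundancy"
# and "partial loss of connectivity".
WINDOWS = [
    ("2007-06-14 11:28", "2007-06-14 11:40", "redundancy"),
    ("2007-05-15 17:06", "2007-05-16 08:42", "redundancy"),
    ("2007-05-15 16:48", "2007-05-15 17:06", "connectivity"),
    ("2007-05-10 14:01", "2007-05-14 17:03", "redundancy"),
    ("2007-05-10 13:37", "2007-05-10 14:01", "connectivity"),
]


def minutes(start: str, end: str) -> int:
    """Length of one window in whole minutes."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds() // 60)


def totals(windows) -> dict:
    """Sum the window lengths (in minutes) per impact label."""
    result = {}
    for start, end, impact in windows:
        result[impact] = result.get(impact, 0) + minutes(start, end)
    return result


if __name__ == "__main__":
    for impact, total in totals(WINDOWS).items():
        print(f"{impact}: {total} min")
```

Under these assumptions, the time with actual connectivity problems is well under an hour, while the time without redundancy adds up to several days.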


Actions:


2007-06-17 18:47
Since the connection has been stable for three days, the ticket can be closed.

2007-06-14 12:04
Maintenance has been completed by moving EPFL's primary access to its original port, but on the new replacement card.

2007-06-14 11:26
We will now start to disconnect links from the old line card that will be replaced.

2007-06-14 11:15
Staff from SWITCH will visit EPFL on Thursday (starting around 11:30), and replace the presumed-faulty line card with the replacement card the vendor had sent us. There will be a short interruption on the primary links of both EPFL and IMD, as well as several backbone links (to Neuchâtel, Yverdon, Martigny and others). The switchover should take no longer than fifteen minutes, during which traffic will flow over other links.

2007-05-21 12:06
Swapped EPFL's primary and secondary connection back to the normal setup (primary on swiEL2.switch.ch, secondary on swiLS2.switch.ch), but with the primary link connected to the temporary spare line card in swiEL2 provided by EPFL.

2007-05-21 10:50
Asked our Cisco service provider for a replacement of the 16*GigE line card to which EPFL used to be connected.

2007-05-16 16:17
In order to minimise risk during the coming long weekend, we swapped the primary and secondary connection. EPFL traffic will now use the swiLS2 router, rather than swiEL2, by default.

2007-05-16 08:42
EPFL kindly supplied a spare linecard for the SWITCH router. We have moved the EPFL primary access to a port on that spare card, and traffic is flowing over the primary access again.

2007-05-16 08:00
EPFL will provide a spare Gigabit Ethernet linecard and connect the primary SWITCH peering to an interface on that card.

2007-05-15 17:06
EPFL disabled the primary interface to stop the oscillations.

2007-05-15 16:48
After almost 24 hours of stable operation under significant load (> 100 Mb/s), problems started again. The primary peerings to IMD and EPFL both started to "flap".

2007-05-14 17:02
We re-enabled the primary BGP peering between SWITCH and EPFL. Traffic is flowing over the primary link again, and no issues have been observed so far.

2007-05-14 14:00
We ran extensive ICMP tests over the primary peering interface between SWITCH and EPFL. Not a single error was observed in over 6.5 million packets. The BGP peering is still inactive, but we will activate it at 17:00 to check whether the IOS upgrade has actually improved the situation.
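
As a rough illustration of what an error-free run of this size can and cannot show, the sketch below computes an upper confidence bound on the per-packet loss rate after zero observed losses. It assumes that losses are independent and equally likely for every packet, which intermittent hardware faults typically violate; the function name and the 95% confidence level are choices made for this example only.

```python
import math


def loss_rate_upper_bound(packets_sent: int, confidence: float = 0.95) -> float:
    """Upper bound on the per-packet loss probability after zero losses.

    Solves (1 - p) ** packets_sent = 1 - confidence for p. The expm1/log
    form avoids rounding problems for very small p.
    """
    alpha = 1.0 - confidence
    return -math.expm1(math.log(alpha) / packets_sent)


if __name__ == "__main__":
    bound = loss_rate_upper_bound(6_500_000)
    print(f"95% upper bound on loss rate: {bound:.2e}")
    # The "rule of three" shortcut, 3 / n, gives nearly the same value.
    print(f"rule-of-three approximation:  {3 / 6_500_000:.2e}")
```

Such a bound only describes the link during the test. A fault that appears in bursts or only under load would not show up in it.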

2007-05-14 13:30
Upgraded router software (IOS) to latest rebuild, and rebooted to activate it. See also ticket 20070514_1 (http://www.switch.ch/network/noc/tts/?action=show&id=74).

2007-05-10 17:56
The BGP peering started to flap again. EPFL shut down the primary link again in order to stop the oscillations. We will investigate further what causes these problems - we suspect a software or hardware (linecard?) problem on our router swiEL2.

2007-05-10 17:33
We re-enabled the primary BGP peering between EPFL and SWITCH. At first things seemed to work fine...

2007-05-10 15:30
We enabled the physical connection between swiEL2 and EPFL's router again, but with BGP deactivated on both routers. The IMD peering is still stable.

2007-05-10 14:01
Georges Aubry administratively turned off the interface on EPFL's router towards swiEL2.switch.ch. This disabled the primary BGP peering for EPFL. Since the shutdown, the BGP peering to IMD has become stable again.

2007-05-10 13:37
The BGP peerings from swiEL2 to EPFL and IMD became unstable. Our router swiEL2 would terminate the connections because it failed to see "keepalive" packets from the EPFL/IMD peers. A while later the connection would be retried, only to be terminated again, and so on.
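
The behaviour described in this first entry follows from the BGP hold timer: each received keepalive restarts the timer, and the session is torn down when the timer expires. The sketch below is a simplified model, not the router's implementation. The hold time of 180 seconds and keepalive interval of 60 seconds are assumed values (common defaults); the timers actually configured on swiEL2 are not stated in this ticket, and the function name is hypothetical.

```python
from typing import Iterable, Optional


def hold_timer_expiry(
    keepalive_times: Iterable[float],
    hold_time: float,
    observe_until: float,
) -> Optional[float]:
    """Return when the hold timer expires, or None if it does not expire
    within the observation period.

    Time 0 is the moment the session is established. Every keepalive that
    arrives before the current deadline pushes the deadline out by
    hold_time seconds.
    """
    deadline = hold_time
    for t in sorted(keepalive_times):
        if t > deadline:
            break  # too late: the session was already torn down
        deadline = t + hold_time
    return deadline if deadline <= observe_until else None


if __name__ == "__main__":
    # Assumed timers: keepalive every 60 s, hold time 180 s.
    healthy = range(60, 1001, 60)   # keepalives keep arriving
    faulty = range(60, 301, 60)     # keepalives stop being seen after 300 s
    print(hold_timer_expiry(healthy, 180, 1000))
    print(hold_timer_expiry(faulty, 180, 1000))
```

In this model, the session with the missing keepalives is dropped one hold time after the last keepalive was seen. When the peers reconnect and the keepalives go missing again, the cycle repeats, which is the "flapping" reported above.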


For all questions about this ticket, please send mail to noc@switch.ch
or call +41 44 268 15 30.