Ticket 20080926_2

Ticket Number: 20080926_2
Ticket State: CLOSED
Ticket Opened: 2008-09-26 12:50
Ticket Closed: 2008-10-03 15:51
Ticket Description: Internet access to University of Bern unstable

Problem Description:

CPU overload on the University of Bern's access router caused instability in its routing protocols, which in turn caused intermittent connectivity problems for the organizations connected to that router. The overload was caused by a large volume of traffic sent with an IP TTL too low to reach its destination.


From 2008-10-01 09:34 until 2008-10-01 11:34
Impact: Partial loss of connectivity
Sites/Services: UNIBE

From 2008-09-26 11:50 until 2008-09-26 15:30
Impact: Partial loss of connectivity
Sites/Services: UNIBE


2008-10-03 16:07
We provided some guidance to the Vetsuisse team about the sensible range for outgoing TTLs, taking into account possible re-routing in case of link failures in our backbone.
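The arithmetic behind such guidance can be sketched as follows. This is only an illustration: the function name minimum_sending_ttl and the example hop counts are hypothetical, and the actual path lengths and margins discussed with the Vetsuisse team are not recorded in this ticket. The sketch assumes that every router on the path decrements the TTL by one and discards the packet when the result reaches zero.

```python
MAX_IP_TTL = 255  # the IPv4 TTL field is 8 bits wide


def minimum_sending_ttl(routers_on_path, reroute_margin):
    """Return the smallest sending TTL that still reaches the receiver.

    routers_on_path: routers traversed on the normal path (hypothetical input).
    reroute_margin:  extra hops to allow for if a backbone link fails and
                     traffic takes a longer path (hypothetical input).
    """
    if routers_on_path < 0 or reroute_margin < 0:
        raise ValueError("hop counts must not be negative")
    # The TTL must still be at least 1 after the last router has decremented it.
    ttl = routers_on_path + 1 + reroute_margin
    if ttl > MAX_IP_TTL:
        raise ValueError("required TTL does not fit in the 8-bit TTL field")
    return ttl


# Example with made-up numbers: 8 routers normally, up to 5 more when re-routed.
print(minimum_sending_ttl(8, 5))
```

Choosing a value somewhat above this minimum leaves headroom for topology changes, while still keeping the traffic from travelling arbitrarily far.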

2008-10-02 09:06
The sending TTL on all Vetsuisse transmission equipment was increased by five hops, and we have received confirmation that this has in fact reestablished end-to-end audio/video communication.

2008-10-01 22:30
After investigating some anomalous ICMP traffic flows that our security team had noticed earlier that day, we found that the router's CPU overload occurred because the IP TTL (time-to-live, actually a hop-count limit) of the video traffic was too low to reach the other endpoint. When the TTLs expired on UniBE's router, that router had to generate an ICMP message for each discarded packet. Because of the high packet rate, this tended to consume all CPU resources. The Vetsuisse team was advised to increase the TTL. In addition, we performed some tests in our "lab" network and proposed a configuration (mls rate-limit) to increase routers' robustness against large amounts of expiring-TTL traffic.
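The TTL mechanism described in this entry can be illustrated with a minimal Python sketch. The function name forward and the hop counts used below are hypothetical; the real number of routers between the Vetsuisse endpoints is not stated in this ticket. The model assumes that each router decrements the TTL by one and, if the result is zero, discards the packet and has to send an ICMP Time Exceeded message back to the sender.

```python
def forward(initial_ttl, routers_on_path):
    """Simulate hop-by-hop TTL handling for one packet.

    Returns ("delivered", remaining_ttl) if the packet passes every router,
    or ("expired", hop) where hop is the 1-based index of the router that
    discards the packet and would generate the ICMP Time Exceeded message.
    """
    ttl = initial_ttl
    for hop in range(1, routers_on_path + 1):
        ttl -= 1  # each router decrements the TTL before forwarding
        if ttl <= 0:
            return ("expired", hop)
    return ("delivered", ttl)


# Made-up example: 8 routers between sender and receiver.
print(forward(5, 8))   # TTL too low: dropped at the fifth router
print(forward(10, 8))  # TTL sufficient: packet arrives with TTL 2
```

In this model every packet of an under-provisioned stream expires at the same router, so that one router's ICMP workload grows in proportion to the stream's packet rate.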

2008-09-29 17:10
Sent Uni Bern and Uni Zurich some suggestions for improving robustness against these types of traffic floods, after some experimentation with similar traffic loads against a similar (test) router with different configurations for control-plane protection.
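The general idea behind this kind of control-plane protection is to cap the rate at which exception packets are handed to the router's CPU. The following Python sketch shows a generic token-bucket limiter as an illustration of that idea. It is not the router's actual implementation, and the class name, function name, rates, and burst sizes are all hypothetical.

```python
class TokenBucket:
    """Generic token-bucket limiter (illustrative only)."""

    def __init__(self, rate_pps, burst):
        self.rate = rate_pps        # tokens added per second
        self.capacity = burst       # maximum number of stored tokens
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0             # time of the previous packet, in seconds

    def allow(self, now):
        """Return True if a packet arriving at time `now` may reach the CPU."""
        elapsed = now - self.last
        self.last = now
        # Refill in proportion to elapsed time, never beyond the capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # over the limit: drop without involving the CPU


def count_punted(packet_times, rate_pps, burst):
    """Count how many packets (given by arrival time) pass the limiter."""
    bucket = TokenBucket(rate_pps, burst)
    return sum(1 for t in packet_times if bucket.allow(t))


# Made-up flood: 20 expiring packets at t=0 s and 20 more at t=1 s,
# limited to 5 packets per second with a burst of 10.
print(count_punted([0.0] * 20 + [1.0] * 20, rate_pps=5, burst=10))
```

With such a limit in place, the CPU work caused by expiring-TTL traffic is bounded by the configured rate and burst rather than by the offered packet rate.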

2008-09-26 15:30
Investigations at Uni Bern and Uni Zurich showed that the high traffic was caused by the Vetsuisse project. The Zurich endpoint was sending traffic, but the receiver in Bern wasn't enabled. The traffic was discarded by UniBE's access router, causing a high load at the process-switching level.

The sender at UZH has been disabled by UZH network staff. Connectivity to Uni Bern is now stable.

