FOSDEM 2017 was again a great success. We did a bit less analysis than in 2016, but the numbers we do have indicate that the number of visitors grew significantly: the total number of unique MAC addresses went from 9711 to a stunning 11918, an increase of 22.7%.
The number of mobile devices, a more accurate indication of the number of visitors, also went up. For Android, the number of unique MAC addresses went from 3892 to 4640 (+19.2%) and for iOS from 1060 to 2579 (+143.3%).
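How these counts were collected is not described in detail here; purely as an illustration, the Python sketch below extracts distinct client MAC addresses from a log file, assuming a hypothetical log format with one association or DHCP event per line.

# Illustrative sketch only: count unique client MAC addresses in a log.
# The log format (one association/DHCP event per line containing the
# client MAC) is a hypothetical assumption, not the actual FOSDEM setup.
import re

MAC_RE = re.compile(r"(?:[0-9a-f]{2}:){5}[0-9a-f]{2}", re.IGNORECASE)

def unique_macs(path):
    """Return the set of distinct MAC addresses seen in a log file."""
    macs = set()
    with open(path) as fh:
        for line in fh:
            match = MAC_RE.search(line)
            if match:
                macs.add(match.group(0).lower())
    return macs

print(len(unique_macs("associations.log")))  # hypothetical file name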
As in past years we had an IPv6-only main network and a dual-stack legacy network for the people who needed it. The SSID of the dual-stack network was changed to encourage visitors to try the IPv6-only network. This seems to have worked, as the IPv6-only network was used more to reach IPv4-only hosts than in the previous edition: NAT64 traffic went from 6.1 million sessions in 2016 to 10.1 million in 2017 (+65%).
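For readers unfamiliar with NAT64: clients on the IPv6-only network reach IPv4-only hosts via IPv6 addresses that embed the IPv4 destination in a /96 prefix (RFC 6052). The sketch below shows that mapping with the well-known prefix 64:ff9b::/96; whether FOSDEM used this prefix or a network-specific one is not stated here.

# Sketch of the NAT64/DNS64 address mapping (RFC 6052): the IPv4
# destination sits in the low 32 bits of a /96 IPv6 prefix.
# 64:ff9b::/96 is the well-known prefix; the prefix actually used at
# FOSDEM is an assumption here.
import ipaddress

NAT64_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")

def synthesize(v4: str) -> ipaddress.IPv6Address:
    """Embed an IPv4 address in the NAT64 prefix."""
    v4_int = int(ipaddress.IPv4Address(v4))
    return ipaddress.IPv6Address(int(NAT64_PREFIX.network_address) | v4_int)

print(synthesize("198.51.100.10"))  # 64:ff9b::c633:640a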
The traffic towards the internet rose from a mere 2982 million packets and 979.8 GB to 7924 million packets and 9.321 TB of traffic (+166% and +851%). From the internet, we received 2621 million packets and 2.912 TB of traffic in 2016; in 2017 it was 3620 million packets and 2.733 TB (+38% and -6.14%).
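The percentages follow directly from the raw counters; a quick sketch of the arithmetic, using only the figures quoted in this paragraph:

# Recompute the quoted year-over-year changes from the raw figures
# (packets in millions, volume in TB).
def pct_change(old, new):
    return (new - old) / old * 100

print(f"packets out: {pct_change(2982, 7924):+.0f}%")     # about +166%
print(f"volume out:  {pct_change(0.9798, 9.321):+.0f}%")  # about +851%
print(f"packets in:  {pct_change(2621, 3620):+.0f}%")     # about +38%
print(f"volume in:   {pct_change(2.912, 2.733):+.1f}%")   # about -6.1%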
Most of this increase in outgoing traffic was due to the amount of traffic the video team were pushing. They report: "The video team pushed ~288 GB over the internet to the primary restreamer, the same amount to the backup one, and 7.1 TB (a sustained 300 Mbit/s) to the small monitoring/control host that generated the thumbnails used in the control of the video mixer. This probably makes us the biggest user of the internet connection 🙂."
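As a sanity check on those video figures: 7.1 TB at a sustained 300 Mbit/s works out to roughly 53 hours of streaming, i.e. running continuously across the whole weekend. The exact duration is not stated here; this is just the arithmetic implied by the quoted numbers.

# Back-of-the-envelope check: how long a sustained 300 Mbit/s stream
# takes to move 7.1 TB. The duration is derived from the quoted figures,
# not stated in the post.
RATE_BPS = 300e6         # 300 Mbit/s
VOLUME_BYTES = 7.1e12    # 7.1 TB (decimal)

seconds = VOLUME_BYTES * 8 / RATE_BPS
print(f"{seconds / 3600:.0f} hours")  # roughly 53 hours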
In fact, they were pushing too much traffic. We had not planned for this increase in traffic and the switches we used for the last few years were reaching their limits. We noticed this when we got reports of packets getting dropped. First we checked the load on the switches:
video-switch-1#show controllers utilization
Port       Receive Utilization  Transmit Utilization
Gi0/1                        1                     1
...
Gi0/25                      12                    22
Gi0/26                      10                    16

Total Ports : 26
Switch Receive Bandwidth Percentage Utilization  : 1
Switch Transmit Bandwidth Percentage Utilization : 2
Switch Fabric Percentage Utilization : 1
This seemed normal, but when checking for drops we noticed the hard truth:
video-switch-1#show mls qos interface statistics | i GigabitEthernet|queue|dropped
...
GigabitEthernet0/26
  output queues enqueued:
 queue:    threshold1   threshold2   threshold3
 queue 0:           0            0            0
 queue 1:           0       150564       119978
 queue 2:           0            0            0
 queue 3:           0            0   1645256287

  output queues dropped:
 queue:    threshold1   threshold2   threshold3
 queue 0:           0            0            0
 queue 1:           0            0            0
 queue 2:           0            0            0
 queue 3:           0            0      7154647
Clearly we were dropping packets (about 0.43% of them) because we ran out of buffers on some queues. We tried to fix the problem using flow control, but that was a mistake and it did not help. Changing the buffer allocation was not possible either, as these switches are limited in their QoS features. In the end we were unable to fix this problem without risking an interruption of the traffic.
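The 0.43% figure comes straight from the queue 3 counters in the output above; a minimal sketch of that calculation (whether the denominator should also include the dropped packets is a judgement call, both readings round to 0.43%):

# Derive the drop percentage from the counters shown above.
enqueued_q3 = 1645256287
dropped_q3 = 7154647

print(f"{dropped_q3 / (enqueued_q3 + dropped_q3):.2%}")  # ~0.43%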
An action point for next year is to design, configure and test a proper QoS architecture, and to replace the old switches, which have served us well for the last 8 years, with switches better suited to these higher traffic volumes.
This year we used a more general HTTP User-Agent analysis, so the client numbers are not directly comparable to last year’s, but we detected the following client distribution:
I’m really hoping that these machines running Windows 95 and friends were virtual machines or emulators.
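The actual analysis tooling is not described here; as a rough illustration of the kind of User-Agent classification involved, a minimal Python sketch with purely illustrative patterns and categories:

# Minimal sketch of User-Agent classification by OS family.
# The patterns and categories are illustrative only; the actual FOSDEM
# analysis tooling is not described in the post.
import re
from collections import Counter

OS_PATTERNS = [
    ("Android", re.compile(r"Android")),       # check before Linux
    ("iOS", re.compile(r"iPhone|iPad|iPod")),  # check before macOS
    ("Windows 95", re.compile(r"Windows 95|Win95")),
    ("Windows", re.compile(r"Windows NT")),
    ("macOS", re.compile(r"Mac OS X")),
    ("Linux", re.compile(r"Linux")),
]

def classify(user_agent: str) -> str:
    """Map a raw User-Agent string to a coarse OS family."""
    for name, pattern in OS_PATTERNS:
        if pattern.search(user_agent):
            return name
    return "Other"

def distribution(user_agents):
    return Counter(classify(ua) for ua in user_agents)

print(distribution([
    "Mozilla/5.0 (X11; Linux x86_64) Firefox/51.0",
    "Mozilla/5.0 (Windows 95) old-browser/1.0",
]))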
See you all next year, when we hope to be able to use telemetry instead of SNMP/NetFlow!
Great job with the network. Thanks for all your efforts and for sharing this data. Glad to see the continued IPv6 progression. It’s interesting to see the breakdown of clients. Would it be straightforward to determine the amount of traffic/bandwidth used by each class of client as well?
For the traffic per class: the NetFlow logger I wrote had a bug, so we do not have this information.
I’m trying to find a better NetFlow v10 implementation, but this seems… difficult.
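For anyone curious what a NetFlow v10 (IPFIX, RFC 7011) logger has to deal with: every message starts with a fixed 16-byte header, sketched below. A real collector additionally has to track template sets per exporter, which is where implementations tend to get hairy.

# Minimal sketch: decode the fixed 16-byte IPFIX (NetFlow v10, RFC 7011)
# message header. This is only a starting point for the kind of logger
# discussed here, not a complete collector.
import struct
from collections import namedtuple

IpfixHeader = namedtuple(
    "IpfixHeader", "version length export_time sequence observation_domain"
)

def parse_ipfix_header(data: bytes) -> IpfixHeader:
    """Decode the fixed message header at the start of an IPFIX packet."""
    header = IpfixHeader(*struct.unpack("!HHIII", data[:16]))
    if header.version != 10:
        raise ValueError(f"not an IPFIX message (version {header.version})")
    return header

# Example: a synthetic header claiming a 16-byte message exported at t=0.
print(parse_ipfix_header(struct.pack("!HHIII", 10, 16, 0, 0, 0)))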
One unfortunate exception was UD, which hosted the embedded devroom on Saturday (on Sunday it was elsewhere, with good traffic) and the Python and community devrooms on Sunday.
Obtaining an IPv6 route took *minutes*. I tried the DHCPv4 network and it was the same story there. If I managed to get an address at all, the packet survival rate was in the low 30s.
I have no idea what went wrong, whether it was not enough APs leading to overload or something else, but performance was really bad.
Yes, for some rooms we noticed that we were hitting the limit of 200 clients per AP.
It’s unclear whether adding APs would help this situation: given that our connection to the U building is 100 Mbit/s, adding APs might just push more traffic across this overloaded link.
That would make the problem worse rather than better. Adding APs after upgrading the link to U would be labour intensive, but it would fix the problems.
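Some back-of-the-envelope arithmetic for this UD discussion: what a 100 Mbit/s building uplink leaves per client at the 200-clients-per-AP limit. The AP counts below are assumptions for illustration, not figures from the post.

# Illustrative only: per-client share of a 100 Mbit/s building uplink.
UPLINK_MBPS = 100
CLIENTS_PER_AP_LIMIT = 200

for access_points in (1, 2, 4):  # hypothetical AP counts
    clients = access_points * CLIENTS_PER_AP_LIMIT
    print(f"{access_points} AP(s), {clients} clients: "
          f"{UPLINK_MBPS / clients:.2f} Mbit/s per client")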
Interesting stats. Thanks for writing them up.
Network worked well for me this year, after not being able to connect at all for the previous 2 years: it seems that wicd (Debian) + x200s did not like the network, in either its IPv6 or IPv4 flavour. I never worked out why, given that it works nicely everywhere else; presumably it was the v6-ness, which one generally doesn’t get anywhere else. A new t460x + nmcli worked much better.
Nice stats. Network worked really well for me. Some of my friends were watching online in pretty good quality.
I would like to use your observations on the mobile network at FOSDEM 2017 in our small club magazine. I asked a friend whether there wasn’t anything about routing at FOSDEM, because I consider routing one of the most important technologies, if not THE most important. He referred me to this article, which is fun to read and nicely wrapped up. You can reach me on donald point axel weirdat gmail. Our association’s mail system is down, so use this address 🙂 And delete this comment afterwards 😉