Half-round-trip (HRT) ping-pong latency may be the first metric everyone looks at when evaluating MPI in HPC, but bandwidth is usually one of the next.
40Gbps Ethernet has been available for switch-to-switch links for quite a while, and 40Gbps NICs are starting to make their way down to the host.
How does MPI perform with a 40Gbps NIC?
The graph below shows the latency and bandwidth results of a NetPIPE 3.7.1 run with Open MPI 1.8.1 over a pair of Cisco UCS C240 M3 servers (with Intel Xeon E5-2690 (v1) “Sandy Bridge” chips at 2.9GHz), each with a 2x40Gbps Cisco 1285 VIC using the Cisco low-latency usNIC driver stack, connected back-to-back.
You can see that the HRT PP latency starts at 1.93us, and the bandwidth reaches 37.23Gbps.
Sweeeeeet….
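For context on how those two numbers are produced: NetPIPE's core measurement is a simple ping-pong, where one rank sends a message, the peer echoes it back, and half of the round-trip time for that message size is reported as the latency (with bandwidth derived from it). The sketch below shows that pattern in plain MPI; it is not NetPIPE's actual code, and the message size and iteration count are arbitrary choices of mine. Real benchmarks also add warm-up iterations and sweep over many message sizes.

```c
/* Minimal ping-pong sketch (not NetPIPE itself): measures half-round-trip
 * latency and bandwidth for a single message size.  Build with mpicc. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, iters = 1000;
    int len = 1 << 20;                 /* 1 MiB payload -- arbitrary choice */
    char *buf;
    double t0, t1, hrt;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    buf = malloc(len);

    /* Untimed warm-up iterations would normally go here. */

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (int i = 0; i < iters; ++i) {
        if (0 == rank) {
            MPI_Send(buf, len, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, len, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (1 == rank) {
            MPI_Recv(buf, len, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, len, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    /* Half round trip: total elapsed time / iterations / 2 */
    hrt = (t1 - t0) / iters / 2.0;
    if (0 == rank) {
        printf("size %d bytes: HRT latency %.2f us, bandwidth %.2f Gbps\n",
               len, hrt * 1e6, (len * 8.0) / hrt / 1e9);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}
```

NetPIPE runs essentially this loop across a whole range of message sizes (with small perturbations around each size) to produce the full latency and bandwidth curves shown in the graph.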
What’s the MTU? (To compute the actual max data rate.)
For this graph, the MTU on both interfaces was 9000.
I should also mention that the underlying transport that the usNIC stack uses is UDP (via operating system bypass — not via the Linux UDP stack).
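As a rough sanity check on how 37.23Gbps compares to line rate, here is a back-of-the-envelope calculation based entirely on my own assumptions: a 9000-byte IP MTU, standard Ethernet/IPv4/UDP framing overheads, and no accounting for any usNIC-specific headers or for how NetPIPE actually segments its messages.

```c
/* Back-of-the-envelope max payload rate for UDP over 40Gbps Ethernet with a
 * 9000-byte MTU.  Overheads are standard Ethernet/IPv4/UDP framing; any
 * usNIC-specific headers are ignored (assumption). */
#include <stdio.h>

int main(void)
{
    const double line_rate_gbps = 40.0;
    const double mtu            = 9000.0;        /* IP MTU (bytes)          */
    const double ip_udp_hdrs    = 20.0 + 8.0;    /* IPv4 + UDP headers      */
    const double eth_overhead   = 14.0 + 4.0     /* Ethernet header + FCS   */
                                + 8.0 + 12.0;    /* preamble + interframe gap */

    double payload  = mtu - ip_udp_hdrs;         /* UDP payload per packet  */
    double on_wire  = mtu + eth_overhead;        /* bytes consumed on wire  */
    double max_rate = line_rate_gbps * payload / on_wire;

    printf("payload efficiency: %.2f%%\n", 100.0 * payload / on_wire);
    printf("max payload rate:   %.2f Gbps\n", max_rate);
    return 0;
}
```

Under those assumptions the payload ceiling works out to roughly 39.7Gbps, so the measured 37.23Gbps is within a handful of percent of what the wire can carry at this MTU.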
How’s the MPI-3 RMA perf?
To be honest, I haven’t benchmarked it.
Who uses that RMA stuff, anyway? 😉
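For anyone curious what “that RMA stuff” actually looks like in code, here is a minimal, generic one-sided example. This is my own sketch, not tied to usNIC or to any benchmark, and it uses fence synchronization (which dates back to MPI-2) because it is the simplest way to illustrate the one-sided model that MPI-3 extends.

```c
/* Minimal MPI RMA sketch: rank 0 MPI_Put()s a buffer into a window exposed
 * by rank 1, synchronized with fences.  Generic illustration only. */
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, n = 1024;
    double buf[1024] = { 0 };
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Every rank exposes its buffer as an RMA window. */
    MPI_Win_create(buf, n * sizeof(double), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);
    if (0 == rank) {
        /* One-sided write into rank 1's window; rank 1 makes no matching call. */
        MPI_Put(buf, n, MPI_DOUBLE, 1, 0, n, MPI_DOUBLE, win);
    }
    MPI_Win_fence(0, win);    /* completes the Put at both ranks */

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```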