WovenTest

Initial set-up

Nodes o460-o498 (except o467) and nodes o536/o537 are connected to TRX100-1.

Nodes o499-o530 and o538-o545 are connected to TRX100-2.

Plan: The EFX1000 is connected to TRX100-1 (192.168.26.201) via lag1 using 4 CX4 cables from the top line card, and to TRX100-2 (192.168.26.202) via lag2 using 4 CX4 cables from the bottom line card.

All nodes are connected via eth1; the details can be found in the node list. This gives two pools of nodes. For tests within each TOR switch we used the node lists TOR1 and TOR2.
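
Before any measurements, netperf's companion daemon netserver has to be running on every node. A minimal sketch, assuming the TOR1 and TOR2 node lists are available as plain host lists (tor1.txt and tor2.txt are hypothetical file names) and passwordless ssh works from the control box:

 #!/bin/sh
 # start the netperf server daemon on every node of both pools;
 # output is discarded so that ssh returns as soon as netserver has daemonized
 for host in `cat tor1.txt tor2.txt`; do
     ssh "$host" 'netserver > /dev/null 2>&1'
 done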

Netperf Tests

The automatic test set-up can also be viewed.

full set-up

40 nodes using TCP simplex from .201 to .202 with MTU 1500

Not full speed; there is a misconfiguration somewhere.

result file
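
For reference, a single simplex stream of this kind can be reproduced by hand with netperf. A sketch, assuming 192.168.26.5 is a node behind TRX100-1 and 192.168.26.65 one behind TRX100-2 (example addresses only, not taken from the node list):

 # on 192.168.26.5: one TCP simplex stream towards the pool-2 box,
 # 60 s long, throughput reported in 10^6 bit/s
 netperf -H 192.168.26.65 -t TCP_STREAM -l 60 -f m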

reduce MTU size on computers

Same test as before, with the computers' MTU reduced to 1400.

result file
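
The MTU change itself is an ordinary interface setting on the nodes. A sketch for a single node, assuming eth1 is the test interface as described in the set-up:

 # lower the MTU of the test interface to 1400 bytes
 ip link set dev eth1 mtu 1400
 # or, with the older tool set:
 # ifconfig eth1 mtu 1400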

We still had problems with duplicate packets returning from the "other" TOR switch.

TOR switches back to back with a single 10 GBit/s connection

Looks good: with 10 boxes streaming data through (simplex), we got practically line speed.

result file

TOR switches with single connection through core switch

This test involves only a single CX4 cable (each) between the core switch and the edge switches.

10 node results show an average speed of about 900 MBit/s (note that "perfect" conditions give about 940 MBit/s), while the 20 node results show perfect scaling with about 450 MBit/s throughput per stream: 20 simplex streams sharing a single 10 GBit/s link can get at most roughly 470 MBit/s each.

TOR switches with 4 connections back to back

10 nodes

For 10 nodes it looks very good: full speed for all simplex connections, with an average of 940 MBit/s (result file)

40 nodes

However, with 40 nodes the average speed goes down to about 750 MBit/s (result file)

Now the TRX100s are connected to the EFX1000 (same line card)

10 nodes

For 10 nodes it looks good again, full speed for all simplex connections (940 MBit/s) (result file)

40 nodes - static

Again, the speed is (much) lower for 40 nodes - about 710 MBit/s (result file)

40 nodes - flow control

Now the speed is down to 450 MBit/s (result file)

40 nodes - no flow control

Speed is better again, about 780 MBit/s (result file)
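
It is not recorded here whether flow control was changed on the switch ports or on the hosts. On the Linux side, Ethernet pause-frame settings can be inspected and toggled with ethtool; a sketch for the test interface eth1:

 # show the current pause-frame (flow control) settings
 ethtool -a eth1
 # disable flow control in both directions (the "no flow control" case)
 ethtool -A eth1 rx off tx off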

10 nodes - no flow control - TCPduplex

Good speed at 820 MBit/s (result file), but still quite a large std. deviation

40 nodes - no flow control - TCPduplex

Average speed is only 640 MBit/s (result file)
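
How exactly the TCPduplex runs were launched is not shown here; one way to approximate a duplex load is to run an outgoing and an incoming stream against the same peer at the same time. A sketch, again with a hypothetical peer address:

 # outgoing (local -> remote) and incoming (remote -> local) stream
 # started in parallel against the same peer, 60 s each
 netperf -H 192.168.26.65 -t TCP_STREAM -l 60 -f m &
 netperf -H 192.168.26.65 -t TCP_MAERTS -l 60 -f m &
 wait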

Reference: Test List

Make sure every host is there

  • run
 nmap -sP -n 192.168.26.0/24
to check if all hosts are visible to each other

  • run
 seq 1 40 | xargs -i ping -c 3 -w 5 192.168.26.{}
on a box from pool2 (with IP address larger than 192.168.26.59) or
 seq 60 99 | xargs -i ping -c 3 -w 5 192.168.26.{}
on a box from pool1. This will detect duplicate (DUP) packets if they are present.

  • now the netperf tests can be run (a launch sketch is given after the check script below)

The checks above can also be run in one go with:

 #!/bin/sh
 # count the hosts that answer the ping scan
 nmap -sP -n 192.168.26.0/24 | grep 'hosts up'
 # ping every node address and count duplicate replies per host
 echo "DUPS:"
 echo `seq 1 40` `seq 60 99` | tr ' ' '\n' | xargs -i ping -c 3 -w 5 -A 192.168.26.{} | grep DUP | \
 cut -f 4 -d ' ' | tr -d ':' | sort | uniq -c
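
The netperf runs themselves were driven by the automatic test set-up linked above. As a rough sketch only (not the actual harness), one simplex run over many node pairs could be launched like this, assuming the hypothetical tor1.txt/tor2.txt host lists from above and a simple line-by-line pairing of the two pools:

 #!/bin/sh
 # pair the i-th node of pool 1 with the i-th node of pool 2 and start
 # one simplex TCP stream per pair; results end up locally as
 # netperf.<destination> so that the quick-look script below can read them
 paste tor1.txt tor2.txt > pairs.txt
 while read src dst; do
     ssh "$src" "netperf -H $dst -t TCP_STREAM -l 60 -f m" > "netperf.$dst" &
 done < pairs.txt
 wait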

Quick look result generator

 # mean and std. deviation of the throughput column (field 5), taken over
 # the last line of every netperf result file
 tail -n 1 -q netperf.* | \
 awk '{sum+=$5; sq+=$5*$5} END {n=NR; print sum/n, sqrt(sq/n-sum*sum/n/n)}'

This will display the mean and std. deviation (min/max still to be done).