10G (82599EB) NIC Testing and Tuning - 爱开源

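Before starting the actual testing and tuning, it helps to be familiar with how a packet travels from the NIC to userspace (1, 2, 3).
The basic server hardware configuration (the client and server differ slightly, which can be ignored) is E5-2630×2, 8G×8, 6×600G 10krpm, RAID 10. The OS is RedHat 6.2, 2.6.32-279.el6.x86_64. The switch is a Nexus5548 running 5.1(3)N2(1). For the links between devices, ordinary multimode modules (SFP-10G-SR(=)) are sufficient, but take care to distinguish modules of different wavelengths (850nm/1310nm).
The server NIC is an x520-2 built on the 82599EB chip, PCIe x8, 5GT/s: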
# lspci -vvv | grep net -A 20 -B 20
81:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01)
Subsystem: Intel Corporation Ethernet Server Adapter X520-2
…
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
Address: 0000000000000000 Data: 0000
Masking: 00000000 Pending: 00000000
Capabilities: [70] MSI-X: Enable+ Count=64 Masked-
Vector table: BAR=4 offset=00000000
PBA: BAR=4 offset=00002000
Capabilities: [a0] Express (v2) Endpoint, MSI 00
DevCap:   MaxPayload 512 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
DevCtl:   Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta:   CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
LnkCap:   Port #2, Speed 5GT/s, Width x8, ASPM L0s, Latency L0 <1us, L1 <8us
ClockPM- Surprise- LLActRep- BwNot-
LnkCtl:   ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta:   Speed 5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
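As a rough sanity check on the LnkSta line above, the arithmetic below estimates whether a 5GT/s x8 link can feed the card. It is only a sketch under stated assumptions: 8b/10b line encoding (which PCIe at 5GT/s uses), the "-2" in x520-2 taken to mean two 10G ports, and PCIe packet (TLP) overhead ignored, so real usable bandwidth is somewhat lower than the figure printed.

```python
def pcie_usable_gbps(gt_per_s: float, lanes: int, encoding_efficiency: float = 8 / 10) -> float:
    """Per-direction link bandwidth in Gbit/s after line encoding.

    Assumption: 8b/10b encoding; TLP/DLLP protocol overhead is NOT subtracted.
    """
    return gt_per_s * lanes * encoding_efficiency


link = pcie_usable_gbps(5.0, 8)  # values taken from LnkSta: Speed 5GT/s, Width x8
needed = 2 * 10.0                # assumption: dual-port card, both ports at 10G

print(f"PCIe link: {link:.1f} Gbit/s per direction, ports need {needed:.1f} Gbit/s")
```

Under these assumptions the slot is not the bottleneck for one or both ports, which is worth confirming before blaming the driver or the kernel for a shortfall.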

ixgbe supports a large number of parameters, clearly a class above the broadcom57xx and intel i350:
# modinfo ixgbe
parm:           InterruptType:Change Interrupt Mode (0=Legacy, 1=MSI, 2=MSI-X), default IntMode (deprecated) (array of int)
parm:           IntMode:Change Interrupt Mode (0=Legacy, 1=MSI, 2=MSI-X), default 2 (array of int)
parm:           MQ:Disable or enable Multiple Queues, default 1 (array of int)
parm:           DCA:Disable or enable Direct Cache Access, 0=disabled, 1=descriptor only, 2=descriptor and data (array of int)
parm:           RSS:Number of Receive-Side Scaling Descriptor Queues, default 0=number of cpus (array of int)
parm:           VMDQ:Number of Virtual Machine Device Queues: 0/1 = disable, 2-16 enable (default=8) (array of int)
parm:           max_vfs:Number of Virtual Functions: 0 = disable (default), 1-63 = enable this many VFs (array of int)
parm:           L2LBen:L2 Loopback Enable: 0 = disable, 1 = enable (default) (array of int)
parm:           InterruptThrottleRate:Maximum interrupts per second, per vector, (0,1,956-488281), default 1 (array of int)
parm:           LLIPort:Low Latency Interrupt TCP Port (0-65535) (array of int)
parm:           LLIPush:Low Latency Interrupt on TCP Push flag (0,1) (array of int)
parm:           LLISize:Low Latency Interrupt on Packet Size (0-1500) (array of int)
parm:           LLIEType:Low Latency Interrupt Ethernet Protocol Type (array of int)
parm:           LLIVLANP:Low Latency Interrupt on VLAN priority threshold (array of int)
parm:           FdirPballoc:Flow Director packet buffer allocation level:
1 = 8k hash filters or 2k perfect filters
2 = 16k hash filters or 4k perfect filters
3 = 32k hash filters or 8k perfect filters (array of int)
parm:           AtrSampleRate:Software ATR Tx packet sample rate (array of int)
parm:           FCoE:Disable or enable FCoE Offload, default 1 (array of int)
parm:           LRO:Large Receive Offload (0,1), default 1 = on (array of int)
parm:           allow_unsupported_sfp:Allow unsupported and untested SFP+ modules on 82599 based adapters, default 0 = Disable (array of int)

The meaning of each parm is explained here.

Tools for testing and monitoring were covered in much earlier posts. They include:
* iftop
* jnettop
* iptraf
* nethogs
* vnstat
* ibmonitor
* iperf
* netserver
* ntop
* cacti (requires the Realtime plugins)
* mpstat
* vmstat
* netstat
* lspci
* dropwatch
* ifconfig
* ip

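As for netstat, different options read different source files. For example, reading /proc/net/tcp is slow when the number of connections is large, whereas /proc/net/dev and /proc/net/unix are comparatively much faster.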
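Because /proc/net/dev is cheap to read, it can also be sampled directly instead of going through netstat. The following is a minimal sketch of parsing it; the interface name `eth2` and the counter values in `SAMPLE` are made up for illustration, and the helper names are hypothetical.

```python
# Hypothetical snapshot in the /proc/net/dev layout (two header lines,
# then one line per interface: 8 receive counters, 8 transmit counters).
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo: 1000 10 0 0 0 0 0 0 1000 10 0 0 0 0 0 0
  eth2: 11750000000 7800000 0 0 0 0 0 0 250000000 3900000 0 0 0 0 0 0
"""


def parse_net_dev(text):
    """Map interface name -> (rx_bytes, rx_packets, tx_bytes, tx_packets)."""
    stats = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:
            continue
        stats[name.strip()] = (int(fields[0]), int(fields[1]),
                               int(fields[8]), int(fields[9]))
    return stats


def gbps(byte_delta, seconds):
    """Convert a byte-counter difference over an interval to Gbit/s."""
    return byte_delta * 8 / seconds / 1e9


stats = parse_net_dev(SAMPLE)
# Pretend the rx counter grew from 0 to this value over 10 seconds.
print(f"eth2 rx: {gbps(stats['eth2'][0], 10):.2f} Gbit/s")
```

On a live system, pass `open("/proc/net/dev").read()` to `parse_net_dev` twice, a known interval apart, and feed the counter difference to `gbps`.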

For tweaking, the work mostly centers on a handful of tools: ethtool, sysctl, ifconfig, and setpci.

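Ping latency is essentially unchanged compared with gigabit, staying at around 0.2ms. During load testing it rises to around 2ms. Everything below was measured before any tuning of the servers or the switch, i.e. with all default settings. The whole process mainly involves two tools, iperf and netperf (netserver).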

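A few points to note when using iperf:
# iperf -c 192.168.10.2 -t 20 --format KBytes -d/-r -x CMS -w 400
To have the client and the server send traffic to each other at the same time, use -d on the client; to run the two directions one after the other, use -r.
With both sides sending simultaneously the result is about 9.35G; sending in one direction only, about 9.41G.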
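The single-direction figure of about 9.41 is close to what framing overhead alone predicts. The arithmetic below is a sketch under assumptions not stated in the post: a 1500-byte MTU, IPv4, no VLAN tag, and TCP timestamps enabled (12 bytes of options per segment).

```python
# On-wire bytes added to every frame beyond the MTU-sized IP packet:
# preamble (7) + start-of-frame delimiter (1) + MAC header (14)
# + frame check sequence (4) + inter-frame gap (12).
ETH_OVERHEAD = 7 + 1 + 14 + 4 + 12


def tcp_goodput_gbps(line_rate_gbps=10.0, mtu=1500, tcp_options=12):
    """Upper bound on TCP payload throughput for full-sized segments."""
    payload = mtu - 20 - 20 - tcp_options  # minus IPv4 and TCP base headers
    return line_rate_gbps * payload / (mtu + ETH_OVERHEAD)


print(f"with timestamps:    {tcp_goodput_gbps():.2f} Gbit/s")
print(f"without timestamps: {tcp_goodput_gbps(tcp_options=0):.2f} Gbit/s")
```

If these assumptions match the test setup, a one-way result of 9.41 is already at the practical ceiling for a 1500-byte MTU, so the remaining room for improvement lies in CPU load and latency rather than throughput.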

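One more point: whether or not -P is used, i.e. whether multiple parallel threads are used, has a large impact on the test, especially where interrupts are involved.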
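One way to see the interrupt side of this is to look at how the NIC's per-queue interrupt counts are spread across CPUs in /proc/interrupts. The sketch below parses a made-up two-CPU sample; the interface name `eth2` and the vector names `eth2-TxRx-N` are assumptions for illustration, and actual names depend on the driver version.

```python
# Hypothetical excerpt in the /proc/interrupts layout.
SAMPLE = """\
           CPU0       CPU1
 64:       1200          0   PCI-MSI-edge      eth2-TxRx-0
 65:          0        900   PCI-MSI-edge      eth2-TxRx-1
 66:          3          0   PCI-MSI-edge      eth2
NMI:          0          0   Non-maskable interrupts
"""


def queue_interrupts(text, iface):
    """Map each queue vector of `iface` to its list of per-CPU counts."""
    lines = text.splitlines()
    ncpu = len(lines[0].split())  # header row lists one column per CPU
    result = {}
    for line in lines[1:]:
        fields = line.split()
        # Keep only vectors named "<iface>-...", i.e. the queue vectors.
        if not fields or not fields[-1].startswith(iface + "-"):
            continue
        result[fields[-1]] = [int(n) for n in fields[1:1 + ncpu]]
    return result


for vector, counts in queue_interrupts(SAMPLE, "eth2").items():
    busiest = counts.index(max(counts))
    print(f"{vector}: total={sum(counts)} busiest=CPU{busiest}")
```

Sampling this before and after a run with and without -P shows whether the extra streams actually land on additional queues and CPUs.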

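netperf is the more professional tool by comparison and is also fairly simple to use. The most common invocation looks like this:
# netperf -H 192.168.10.2 -l 15 -c -C -f g -- -m 8192 -r 1024,1024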

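You can first run a test against localhost as a sanity check. Loopback traffic does not pass through the NIC itself, so this mainly confirms that netperf and the local stack behave normally; under normal conditions it should run at full rate: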
# netperf -f g
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost () port 0 AF_INET
Recv   Send    Send
Socket Socket Message Elapsed
Size   Size    Size     Time     Throughput
bytes bytes   bytes    secs.    10^9bits/sec

87380 16384 16384 10.00 10.75

By default netperf uses TCP_STREAM, which sends with send(). TCP_SENDFILE can be used instead to test with sendfile(), but once the test already reaches 9.4G the gain from sendfile is negligible.

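The next few posts summarize some general tuning practices from different angles; each test is independent of the others. The end goal is to make full use of the bandwidth and saturate 10G while keeping system load and latency as low as possible. The focus is on tuning rather than on testing, so there is no formal test report, let alone pretty charts. During testing, the basic system load can be observed with the following commands: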
# sar -n DEV -u 10 -P ALL
# mpstat -P ALL 5
Of course, plain top also shows quite a lot.

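On the BIOS side, it mostly comes down to the choice of "profile", which in detail means settings such as C1E, Turbo Boost, and HT; no more on that here. One thing does deserve particular attention: the choice of CPU frequency setting.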
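On the OS side, one quick check related to CPU frequency is which cpufreq governor each core is using. The sketch below reads the standard sysfs files under /sys/devices/system/cpu; the helper names are hypothetical, and on systems without a cpufreq driver loaded the files do not exist and the result is simply empty.

```python
import glob
import os
from collections import Counter


def read_governors(base="/sys/devices/system/cpu"):
    """Map cpuN -> scaling governor name, for CPUs that expose cpufreq."""
    pattern = os.path.join(base, "cpu[0-9]*", "cpufreq", "scaling_governor")
    result = {}
    for path in glob.glob(pattern):
        cpu = path.split(os.sep)[-3]  # .../cpuN/cpufreq/scaling_governor
        with open(path) as f:
            result[cpu] = f.read().strip()
    return result


def governor_summary(governors):
    """Count how many CPUs use each governor."""
    return dict(Counter(governors.values()))


print(governor_summary(read_governors()) or "no cpufreq information found")
```

A mix of governors across cores, or a power-saving governor during a throughput test, is worth noting alongside the results since it can change both CPU load and latency figures.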

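The following posts on 10G NIC testing and tuning draw heavily on the two classic documents below, for which many thanks. Both date from around 2009, but the vast majority of their practices still apply.
One is IBM's "Tuning 10Gb network cards on Linux"; the other is a talk Red Hat gave at its 2008 summit titled "10 Gb Ethernet". An improved version of the latter appeared in 2012 as "achieving top network performance", for which a video is also available.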
