pfring's bundled pfcount under-reports bps: the problem and how to fix it (lujian19861986, ChinaUnix blog)

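pfring, a high-performance packet capture library, ships with a sample program, pfcount, that reports real-time statistics such as bps and pps for a network interface.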

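While testing, I found that the bps reported by pfcount deviates noticeably from the true value. The cause is that the server enables network offload features such as lro, gro, tso, and gso by default, which makes the bps figure come out too low.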

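There are two ways to fix this: (1) turn these offloads off, or (2) modify the pfring source so that its statistics stay correct when lro, gro, tso, and gso are enabled.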

The details follow.

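I wanted to know whether the bps and pps figures that pfring reports are accurate, so I compared the output of the bundled pfcount with values obtained from sar, ifconfig, ethtool, and similar tools.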

Measuring with pfcount:

./pfcount -i eth1

The output looks like this:

./pfcount -i eth1

Using PF_RING v.5.4.0

Capturing from eth1 [90:E2:BA:17:8F:4A][ifIndex: 3]

# Device RX channels: 24

# Polling threads: 1

=========================

Absolute Stats: [531718 pkts rcvd][0 pkts dropped]

Total Pkts=531718/Dropped=0.0 %

531'718 pkts - 265'985'268 bytes

=========================

Absolute Stats: [1043087 pkts rcvd][0 pkts dropped]

Total Pkts=1043087/Dropped=0.0 %

1'043'087 pkts - 527'970'766 bytes [1'042'956.63 pkt/sec - 4'223.23 Mbit/sec]

=========================

Actual Stats: 511369 pkts [1'000.12 ms][511'305.08 pps/2.10 Gbps]

The Actual Stats line contains both the pps and the bps figures.
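To compare pfcount with other tools automatically, the Actual Stats line can be parsed into numbers. The following is a minimal sketch, assuming the line format matches the pfcount 5.4.0 output shown above; the function name `parse_actual_stats` and the handling of `K`/`M` unit prefixes are my own assumptions, and other pfcount versions may print a different format.

```python
import re

# Matches lines such as:
#   Actual Stats: 511369 pkts [1'000.12 ms][511'305.08 pps/2.10 Gbps]
# The pattern is derived only from the sample output in this article.
ACTUAL_STATS = re.compile(
    r"Actual Stats:\s*(?P<pkts>\d+) pkts\s*"
    r"\[(?P<ms>[\d'.]+) ms\]"
    r"\[(?P<pps>[\d'.]+) pps/(?P<rate>[\d'.]+) (?P<unit>[GMK]?)bps\]"
)

# Assumed unit prefixes; only "Gbps" appears in the sample output.
UNIT_SCALE = {"": 1.0, "K": 1e3, "M": 1e6, "G": 1e9}


def _num(text):
    """Convert an apostrophe-grouped number such as 1'000.12 to a float."""
    return float(text.replace("'", ""))


def parse_actual_stats(line):
    """Return (pps, bps) from an 'Actual Stats' line, or None if it does not match."""
    m = ACTUAL_STATS.search(line)
    if m is None:
        return None
    pps = _num(m.group("pps"))
    bps = _num(m.group("rate")) * UNIT_SCALE[m.group("unit")]
    return pps, bps
```

Feeding it the sample line above yields roughly 511305.08 pps and 2.1e9 bps, which can then be set against the figures from the other tools.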

To check whether these numbers are accurate, compare them with what other tools report:

1) sar

sar -n DEV 2 100

07:22:34 PM IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s

07:22:36 PM lo 0.00 0.00 0.00 0.00 0.00 0.00 0.00

07:22:36 PM eth0 0.00 0.00 0.00 0.00 0.00 0.00 0.00

07:22:36 PM eth1 613676.88 0.00 389848.42 0.00 0.00 0.00 3.47

07:22:36 PM eth2 1.73 1.73 0.12 1.03 0.00 0.00 0.00

07:22:36 PM eth3 0.00 0.00 0.00 0.00 0.00 0.00 0.00

07:22:36 PM eth4 0.00 0.00 0.00 0.00 0.00 0.00 0.00

07:22:36 PM eth5 0.00 0.00 0.00 0.00 0.00 0.00 0.00

07:22:36 PM bond0 1.73 1.73 0.12 1.03 0.00 0.00 0.00

2) ifconfig

ifconfig; sleep 2; ifconfig

eth1 Link encap:Ethernet HWaddr 90:E2:BA:17:8F:4A

UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1

RX packets:1117753621059 errors:0 dropped:589638 overruns:0 frame:0

TX packets:0 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:1000

RX bytes:837188512549252 (761.4 TiB) TX bytes:0 (0.0 b)

eth1 Link encap:Ethernet HWaddr 90:E2:BA:17:8F:4A

UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1

RX packets:1117754580978 errors:0 dropped:589638 overruns:0 frame:0

TX packets:0 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:1000

RX bytes:837189195584479 (761.4 TiB) TX bytes:0 (0.0 b)

This gives pps = (1117754580978 - 1117753621059) / 2 = 479959.5

bps = (837189195584479 - 837188512549252) * 8 / 2 = 2732140908
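The arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the calculation; the helper name `rates` is hypothetical, and the counter values are the two ifconfig samples shown above.

```python
def rates(pkts_before, pkts_after, bytes_before, bytes_after, interval_s):
    """Return (pps, bps) from two counter samples taken interval_s seconds apart."""
    pps = (pkts_after - pkts_before) / interval_s
    bps = (bytes_after - bytes_before) * 8 / interval_s  # bytes -> bits
    return pps, bps


pps, bps = rates(1117753621059, 1117754580978,
                 837188512549252, 837189195584479, 2)
print(f"pps={pps:.1f} bps={bps:.0f} ({bps / 1e9:.2f} Gbit/s)")
# prints: pps=479959.5 bps=2732140908 (2.73 Gbit/s)
```

Note that `sleep 2` only approximates the interval between the two ifconfig runs, so the result carries a small timing error.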

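3) ethtool works on the same principle: sample the counters twice and divide the difference by the interval.

ethtool -S eth1 | grep rx_packets; sleep 2; ethtool -S eth1 | grep rx_packets

bps and pps can then be computed from the differences, as with ifconfig.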

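4) The most accurate comparison is probably to read /proc/net/dev at regular intervals, take the current interface counters, and compute:

bps = increase in received bytes * 8 / elapsed time

pps = increase in received packets / elapsed time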
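A minimal sketch of the /proc/net/dev approach follows. It assumes the standard Linux layout of that file, where each interface line starts with the name and a colon and the first two numeric columns are received bytes and received packets. The function names `parse_net_dev` and `sample_rates` are hypothetical.

```python
import time


def parse_net_dev(text, iface):
    """Return (rx_bytes, rx_packets) for iface from /proc/net/dev content."""
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name.strip() == iface:
            fields = rest.split()
            # Receive columns come first: bytes, packets, errs, drop, ...
            return int(fields[0]), int(fields[1])
    raise KeyError(f"interface {iface!r} not found")


def sample_rates(iface, interval_s=2.0, path="/proc/net/dev"):
    """Sample the counters twice and return (pps, bps) over the measured elapsed time."""
    def read():
        with open(path) as f:
            return parse_net_dev(f.read(), iface), time.monotonic()

    (b0, p0), t0 = read()
    time.sleep(interval_s)
    (b1, p1), t1 = read()
    dt = t1 - t0  # use the real elapsed time, not the nominal sleep
    return (p1 - p0) / dt, (b1 - b0) * 8 / dt
```

Calling `sample_rates("eth1")` on the capture host returns one (pps, bps) pair; dividing by the measured elapsed time rather than the nominal sleep avoids the timing error of the `sleep 2` approach.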

Note that in my tests sar frequently reported values that were far too high; of the methods above, sar's figures often cannot be trusted.

Problem description

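Comparing the results of these methods shows that the pps reported by pfcount agrees closely with the other tools (the error is very small), but the bps differs considerably, often by more than 5%.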

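Capturing with tcpdump shows quite a few packets longer than 1500 bytes, yet the largest packet length pfring ever reports is 1514 bytes (a 1500-byte IP packet plus the 14-byte Ethernet header).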
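One way to picture the effect of this observation is the toy calculation below. It assumes, purely for illustration, that each packet contributes at most 1514 bytes to the byte count while some packets are actually longer. The packet lengths are made up, not measured, and the helper `byte_totals` is hypothetical.

```python
MAX_COUNTED = 1514  # 1500-byte IP packet + 14-byte Ethernet header


def byte_totals(lengths, cap=MAX_COUNTED):
    """Return (actual_bytes, counted_bytes) when each length is capped at cap."""
    actual = sum(lengths)
    counted = sum(min(length, cap) for length in lengths)
    return actual, counted


# Hypothetical packet lengths in bytes; two of them exceed 1514.
lengths = [1514, 2962, 4410, 66, 1514]
actual, counted = byte_totals(lengths)
shortfall = 1 - counted / actual
print(f"actual={actual} counted={counted} shortfall={shortfall:.1%}")
```

The packet count is the same in both totals, which is consistent with pps matching the other tools while bps comes out low.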

So the discrepancy is located. But why does it happen?

ethtool can show the interface's offload settings:

ethtool -k eth1

Offload parameters for eth1:

rx-checksumming: on

tx-checksumming: on

scatter-gather: on

tcp-segmentation-offload: on

udp-fragmentation-offload: off

generic-segmentation-offload: on

generic-receive-offload: on

large-receive-offload: on
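When checking many hosts, the `ethtool -k` listing above can be turned into a dictionary so that enabled offloads are easy to spot. This is a minimal sketch that assumes the "name: on/off" line format shown above; the function name `parse_offloads` is hypothetical.

```python
def parse_offloads(text):
    """Map each 'feature: on/off' line of `ethtool -k` output to a bool."""
    features = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        state = value.split()
        # Skip the "Offload parameters for ..." header and any other line
        # whose value does not start with on/off.
        if sep and state and state[0] in ("on", "off"):
            features[name.strip()] = state[0] == "on"
    return features


def enabled_offloads(text):
    """Return the sorted names of all features currently switched on."""
    return sorted(name for name, on in parse_offloads(text).items() if on)
```

For the listing above, `enabled_offloads` would include `generic-receive-offload` and `large-receive-offload` but not `udp-fragmentation-offload`.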

For how tso, ufo, gso, gro, and lro work, see this article:

http://www.ibm.com/developerworks/cn/linux/l-cn-network-pt/index.html

The simple, blunt fix is to turn these offloads off. Taking tso as an example:

ethtool -K eth1 tso off

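After these changes the bps value shifts noticeably (there is no need to restart the statistics program) and lines up with the results from the other tools.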
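To switch off all four offloads named in this article in one go, the individual `ethtool -K` calls can be combined, since `ethtool -K` accepts several feature/value pairs in a single invocation. The sketch below only builds the command; the helper name `disable_offloads_cmd` is hypothetical, and whether a given feature can actually be changed depends on the NIC and driver.

```python
import shlex

# Short feature names accepted by `ethtool -K`; these are the four
# offloads this article identifies as enabled by default.
OFFLOADS = ("tso", "gso", "gro", "lro")


def disable_offloads_cmd(iface, features=OFFLOADS):
    """Build the argv for one ethtool call that turns every listed feature off."""
    argv = ["ethtool", "-K", iface]
    for feature in features:
        argv += [feature, "off"]
    return argv


print(shlex.join(disable_offloads_cmd("eth1")))
# prints: ethtool -K eth1 tso off gso off gro off lro off
```

The resulting list can be passed to `subprocess.run(argv, check=True)` as root, and `ethtool -k eth1` then shows whether the settings took effect.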