
TCP: time wait bucket table overflow


I found the information below on the Red Hat site, which says the cause is insufficient memory. I can accept that as an explanation for the problem, but it is not a "perfect" explanation, and I am still puzzled: if the cause really is insufficient memory, then as long as the machine is in the same state before each test, the number of times "TCP: time wait bucket table overflow" gets printed should be the same each run, or at least very close, because each arriving socket gets one data structure until memory runs out and the message is printed. But my test results vary quite a bit between runs, which confuses me. Could the data structure allocated each time be a different size? That can't be. For every individual test I reboot the development board and run nothing else before testing, so the machine state should not differ much between runs.

The "TCP: time wait bucket table overflow" message appears when the kernel is unable to allocate a data structure to put a socket in the TIME_WAIT state.

This happens in linux/net/ipv4/tcp_minisocks.c:

 

if (tcp_tw_count < sysctl_tcp_max_tw_buckets)
        tw = kmem_cache_alloc(tcp_timewait_cachep, SLAB_ATOMIC);

if (tw != NULL) {
        (..)
} else {
        /* Sorry, if we're out of memory, just CLOSE this
         * socket up.  We've got bigger problems than
         * non-graceful socket closings.
         */
        if (net_ratelimit())
                printk(KERN_INFO "TCP: time wait bucket table overflow\n");
}

This problem is more likely to happen on systems creating a lot of TCP connections at a fast pace. RFC 793 specifies that those sockets should stay in the TIME_WAIT state for 2*MSL (Maximum Segment Lifetime), but the Linux implementation makes the TIME_WAIT state last for 1 minute.
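That 1-minute figure is hard-coded in the kernel: in 2.6-era sources, include/net/tcp.h defines

#define TCP_TIMEWAIT_LEN (60*HZ) /* how long to wait to destroy TIME-WAIT
                                  * state, about 60 seconds     */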

Monitor the resources used by those time wait buckets by watching:

# cat /proc/slabinfo | grep tcp_tw_bucket

The size of the time wait bucket table can be adjusted by writing to /proc/sys/net/ipv4/tcp_max_tw_buckets. Additionally, if the link between the client and the server cannot cause packets to arrive out of order, the TIME_WAIT state can be skipped and sockets can be recycled immediately. Socket recycling can be enabled via /proc/sys/net/ipv4/tcp_tw_recycle, but check with your network administrator to verify whether it is safe to do so.

On every node in the cluster, /var/log/messages is full of errors like the following:

[root@real2 ~]# tail -f /var/log/messages
Oct 27 22:45:55 real2 kernel: printk: 1438 messages suppressed.
Oct 27 22:45:55 real2 kernel: TCP: time wait bucket table overflow
Oct 27 22:46:00 real2 kernel: printk: 1682 messages suppressed.
Oct 27 22:46:00 real2 kernel: TCP: time wait bucket table overflow
Oct 27 22:46:05 real2 kernel: printk: 1752 messages suppressed.
Oct 27 22:46:05 real2 kernel: TCP: time wait bucket table overflow
Oct 27 22:46:10 real2 kernel: printk: 1681 messages suppressed.
Oct 27 22:46:10 real2 kernel: TCP: time wait bucket table overflow
Oct 27 22:46:15 real2 kernel: printk: 1660 messages suppressed.
Oct 27 22:46:15 real2 kernel: TCP: time wait bucket table overflow

 


 

Cause: the value of /proc/sys/net/ipv4/tcp_max_tw_buckets is too small, only 2000.

Fix: increase the value of tcp_max_tw_buckets. Note that smaller is not better for this value; looking at my system, most of the TIME_WAIT sockets are produced by php-fpm, which is perfectly normal.

Edit /etc/sysctl.conf:

net.ipv4.tcp_max_tw_buckets = 20000

Then run sysctl -p to make it take effect.
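For illustration only (this snippet is mine, not from the original post): what sysctl effectively does for this key is write the number into the corresponding file under /proc/sys, which a program can also do directly. A minimal C sketch, assuming it is run as root:

#include <stdio.h>

int main(void)
{
    /* Equivalent of: sysctl -w net.ipv4.tcp_max_tw_buckets=20000
     * (must be run as root) */
    FILE *f = fopen("/proc/sys/net/ipv4/tcp_max_tw_buckets", "w");
    if (!f) {
        perror("fopen");
        return 1;
    }
    fprintf(f, "20000\n");
    fclose(f);
    return 0;
}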

Appendix: the TIME_WAIT situation

 

 

[root@real2 ~]#

[root@real2 ~]# netstat -an | grep 80 | awk '{print $6}' | sort | uniq -c | sort -rn

5395 ESTABLISHED

2671 TIME_WAIT

978 FIN_WAIT2

501 FIN_WAIT1

165 SYN_RECV

71 LAST_ACK

2 CLOSING

1 LISTEN

[root@real2 ~]# netstat -an | grep 9000 | awk '{print $6}' | sort | uniq -c | sort -rn

8550 TIME_WAIT

1 LISTEN

1 FIN_WAIT1

1 ESTABLISHED
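As an aside (my own illustration, not from the post): netstat gets these states from /proc/net/tcp, where the fourth column ("st") is the TCP state in hex and TIME_WAIT is 06, so the TIME_WAIT count can also be pulled straight from procfs. A minimal C sketch:

#include <stdio.h>

/* Count TIME_WAIT sockets by scanning /proc/net/tcp; roughly a
 * C-only alternative to: netstat -an | grep TIME_WAIT | wc -l */
int main(void)
{
    char line[512], local[64], rem[64];
    unsigned sl, st;
    int count = 0;
    FILE *f = fopen("/proc/net/tcp", "r");
    if (!f) {
        perror("fopen");
        return 1;
    }
    fgets(line, sizeof(line), f);        /* skip the header line */
    while (fgets(line, sizeof(line), f)) {
        if (sscanf(line, "%u: %63s %63s %x", &sl, local, rem, &st) == 4
            && st == 0x06)               /* 0x06 == TCP_TIME_WAIT */
            count++;
    }
    fclose(f);
    printf("TIME_WAIT sockets: %d\n", count);
    return 0;
}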

tcp_max_tw_buckets (parameter type: integer)
The maximum number of timewait sockets the system handles at the same time. If this number is exceeded, the time-wait socket is destroyed immediately and a warning is printed. This limit exists purely to fend off simple DoS attacks. Never lower it artificially; if network conditions call for more than the default, raise it instead (and perhaps add memory as well).

Today I wrote an extremely crude web server using epoll plus multiple threads: the main thread handles accept, and worker threads handle the connections. The problem showed up during testing:

On my laptop I used ab to simulate 50 concurrent connections and 10000 requests, and everything was fine. But after porting it to the mini2440 development board and running with the same test parameters, I got output like this (excerpt):
TCP: time wait bucket table overflow
TCP: time wait bucket table overflow
__ratelimit: 1745 callbacks suppressed
TCP: time wait bucket table overflow
TCP: time wait bucket table overflow

My first thought was that the server's connection backlog was too small, so I immediately increased it and retested, but the problem persisted. I then ported someone else's single-process server to the mini2440 and tested it; same output. Yet none of these servers showed the problem when tested on the laptop, so clearly the board's hardware simply can't keep up.
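For reference, here is a minimal sketch of that structure, written by me for this writeup rather than taken from my actual server (the port and the response are placeholders): the main thread blocks in epoll_wait on the listening socket and spawns a detached pthread per accepted connection.

#include <netinet/in.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Worker thread: read the request, send a bare response header, and
 * close. The server closing first is what leaves it holding the
 * TIME_WAIT sockets discussed above. */
static void *handle_conn(void *arg)
{
    int fd = (int)(long)arg;
    char buf[1024];
    const char *resp = "HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n";

    if (read(fd, buf, sizeof(buf)) > 0)
        write(fd, resp, strlen(resp));
    close(fd);
    return NULL;
}

int main(void)
{
    struct sockaddr_in addr;
    struct epoll_event ev, events[16];
    int one = 1;
    int lfd = socket(AF_INET, SOCK_STREAM, 0);

    setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);
    bind(lfd, (struct sockaddr *)&addr, sizeof(addr));
    listen(lfd, 1024);                 /* generous backlog, as tried above */

    int efd = epoll_create(16);
    ev.events = EPOLLIN;
    ev.data.fd = lfd;
    epoll_ctl(efd, EPOLL_CTL_ADD, lfd, &ev);

    for (;;) {
        int n = epoll_wait(efd, events, 16, -1);
        for (int i = 0; i < n; i++) {
            if (events[i].data.fd == lfd) {
                int cfd = accept(lfd, NULL, NULL);
                if (cfd < 0)
                    continue;
                pthread_t t;           /* one detached worker per connection */
                pthread_create(&t, NULL, handle_conn, (void *)(long)cfd);
                pthread_detach(t);
            }
        }
    }
}

Compile with gcc -pthread; each accepted connection enters TIME_WAIT on close, which under ab's request rate is exactly what overflows the time wait bucket table on the board.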

So I just tested with the boa server that ships with the board. Damn, the problem was even worse: far more of this output appeared.

Puzzling. I definitely need to get to the bottom of this later.

The performance results compare as follows, all tested on the mini2440 development board:

1. epoll+pthread server: 10000 requests in 22.341 seconds total, 447.61 requests per second on average; the best result.
2. Single-process server: 10000 requests in 28.922 seconds total, 345.75 requests per second on average; the worst result.
3. The boa server bundled with the mini2440 system: 10000 requests in 26.739 seconds total, 373.98 requests per second on average; second place.

Judging from the results, the multithreaded approach beats the single-process approach a bit. My guess was that boa is slightly slower because it is more complete and supports dynamic pages, whereas the other two only serve static pages. But that analysis doesn't actually hold, because in the tests I didn't have any of them output content, just a response header, so all three servers were really tested on the same footing. I'll benchmark again once my server is more complete.

