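There are plenty of articles and books on the kernel's TCP implementation, each with a different emphasis. In the kernel source itself, comments generally make up more than 20% of the code, so it is mostly readable.
However, some key variables are not explained in enough detail, and I found no documentation that walks through the flow of the source. [Author's note: such documentation probably exists; I just did not find it.]
The series of articles starting at http://blog.csdn.net/zhangskd/article/details/7043071 is very well written and deserves respect. This post builds on those articles with some further analysis and extension.
Let's start with the data structures. tcp_sock plays a central role in the whole TCP implementation, because this struct holds the congestion window, the slow start threshold, and a whole series of related variables.
In include/linux/tcp.h: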
- static inline struct tcp_sock *tcp_sk(const struct sock *sk)
- {
- return (struct tcp_sock *)sk ;
- }
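The plain cast in tcp_sk() works because the kernel places the generic socket at the very start of the TCP socket, so both pointers refer to the same address. Below is a minimal user-space sketch of that layout trick. The types demo_sock and demo_tcp_sock and the helpers demo_tcp_sk() and demo_cwnd_of() are hypothetical stand-ins for illustration, not kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the generic socket (struct sock in the kernel). */
struct demo_sock {
    int sk_state;
};

/* Hypothetical stand-in for the TCP socket: the generic part comes first. */
struct demo_tcp_sock {
    struct demo_sock sk;        /* must remain the first member */
    unsigned int snd_cwnd;
    unsigned int snd_ssthresh;
};

/* Same shape as tcp_sk(): reinterpret the generic pointer as the TCP one. */
static inline struct demo_tcp_sock *demo_tcp_sk(const struct demo_sock *sk)
{
    return (struct demo_tcp_sock *)sk;
}

/* Read the congestion window through a generic socket pointer. */
static unsigned int demo_cwnd_of(const struct demo_sock *sk)
{
    /* The cast is only valid while the generic part sits at offset 0. */
    assert(offsetof(struct demo_tcp_sock, sk) == 0);
    return demo_tcp_sk(sk)->snd_cwnd;
}
```

Passing the address of the embedded sk member of a demo_tcp_sock to demo_cwnd_of() returns that object's snd_cwnd, which is the same pattern congestion control code relies on when it calls tcp_sk(sk).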
Key fields of the tcp_sock struct:
- u32 snd_wl1; /* Sequence for window update */
- u32 snd_wnd; /* The window we expect to receive */
- u32 max_window; /* Maximal window ever seen from peer */
- u32 mss_cache; /* Cached effective mss, not including SACKS */
- u32 window_clamp; /* Maximal window to advertise */
- u32 rcv_ssthresh; /* Current window clamp */
- ·································
- /*
- * Slow start and congestion control (see also Nagle, and Karn & Partridge)
- */
- u32 snd_ssthresh; /* Slow start size threshold */
- u32 snd_cwnd; /* Sending congestion window */
- u32 snd_cwnd_cnt; /* Linear increase counter */
- u32 snd_cwnd_clamp; /* Do not allow snd_cwnd to grow above this */
- u32 snd_cwnd_used;
- u32 snd_cwnd_stamp;
- u32 rcv_wnd; /* Current receiver window */
- u32 write_seq; /* Tail(+1) of data held in tcp send buffer */
- u32 pushed_seq; /* Last pushed seq, required to talk to windows */
- u32 lost_out; /* Lost packets */
- u32 sacked_out; /* SACK'd packets */
- u32 fackets_out; /* FACK'd packets */
- u32 tso_deferred;
- u32 bytes_acked; /* Appropriate Byte Counting - RFC3465 */
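Several of these variables are especially important for congestion control algorithms:
snd_cwnd // congestion window
snd_ssthresh // slow start threshold
snd_cwnd_clamp // congestion window clamp (upper bound)
snd_cwnd is the sender's congestion window, which limits how much data the sender may have in flight. Correspondingly, the receiver has a receive window, which provides flow control on the receiving side. snd_ssthresh marks the boundary between slow start and congestion avoidance, that is, between exponential growth and linear growth of the window. In practice, slow start and congestion avoidance are implemented together; for details see 《TCP/IP详解卷一:协议》 (TCP/IP Illustrated, Volume 1: The Protocols) and my previous post http://blog.csdn.net/hanrui90/article/details/8457863
As for snd_cwnd_clamp, the book 《linux内核源码剖析——TCP/IP实现(下册)》 (p. 717) explains that snd_cwnd_clamp is the maximum allowed congestion window, with an initial value of 65535. Later, when SYN and ACK segments are received, the kernel decides, depending on certain conditions, whether to read the route configuration entry and update this field. Finally, before the TCP connection is reset, the updated value is run through some calculation and written back to the corresponding route configuration entry, so that connections can make use of it.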
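To make the roles of the three variables concrete, here is a rough per-RTT model: the window doubles while it is below the threshold, grows by one segment per RTT afterwards, and never exceeds the clamp. This is a deliberately simplified sketch, not kernel code. The function names demo_next_cwnd() and demo_cwnd_after() are hypothetical, and the model ignores losses, delayed ACKs, and the receiver window.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical per-RTT model of window growth (not kernel code).
 * Below ssthresh the window doubles (slow start); at or above it the
 * window grows by one segment (congestion avoidance). The result is
 * capped at clamp.
 */
static uint32_t demo_next_cwnd(uint32_t cwnd, uint32_t ssthresh, uint32_t clamp)
{
    uint64_t next;

    assert(cwnd > 0);           /* the model assumes a non-empty window */
    if (cwnd < ssthresh)
        next = (uint64_t)cwnd * 2;   /* exponential growth */
    else
        next = (uint64_t)cwnd + 1;   /* linear growth */

    return next > clamp ? clamp : (uint32_t)next;
}

/* Apply the model for a given number of round trips. */
static uint32_t demo_cwnd_after(uint32_t cwnd, uint32_t ssthresh,
                                uint32_t clamp, unsigned int rtts)
{
    for (unsigned int i = 0; i < rtts; i++)
        cwnd = demo_next_cwnd(cwnd, ssthresh, clamp);
    return cwnd;
}
```

Starting from a window of 1 with a threshold of 8 and a clamp of 12, the model yields 2, 4, 8 during slow start, then 9, 10, 11, 12 in congestion avoidance, and stays at 12 once the clamp is reached.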
The core of the slow start algorithm:
- /*
- * Slow start is used when congestion window is less than slow start
- * threshold. This version implements the basic RFC2581 version
- * and optionally supports:
- * RFC3742 Limited Slow Start - growth limited to max_ssthresh
- * RFC3465 Appropriate Byte Counting - growth limited by bytes acknowledged
- */
- void tcp_slow_start(struct tcp_sock *tp)
- {
- int cnt; /* increase in packets */
- /* RFC3465: ABC Slow start
- * Increase only after a full MSS of bytes is acked
- *
- * TCP sender SHOULD increase cwnd by the number of
- * previously unacknowledged bytes ACKed by each incoming
- * acknowledgment, provided the increase is not more than L
- */
- if (sysctl_tcp_abc && tp->bytes_acked < tp->mss_cache)
- return;
- if (sysctl_tcp_max_ssthresh > 0 && tp->snd_cwnd > sysctl_tcp_max_ssthresh)
- cnt = sysctl_tcp_max_ssthresh >> 1; /* limited slow start */
- else
- cnt = tp->snd_cwnd; /* exponential increase */
- /* RFC3465: ABC
- * We MAY increase by 2 if discovered delayed ack
- */
- if (sysctl_tcp_abc > 1 && tp->bytes_acked >= 2*tp->mss_cache)
- cnt <<= 1;
- tp->bytes_acked = 0;
- tp->snd_cwnd_cnt += cnt;
- while (tp->snd_cwnd_cnt >= tp->snd_cwnd) {
- tp->snd_cwnd_cnt -= tp->snd_cwnd;
- if (tp->snd_cwnd < tp->snd_cwnd_clamp)
- tp->snd_cwnd++;
- }
- }
- EXPORT_SYMBOL_GPL(tcp_slow_start);
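The counter arithmetic in tcp_slow_start() is easier to follow in isolation. The sketch below is a user-space reduction of the function above, under the assumption that both sysctl_tcp_abc and sysctl_tcp_max_ssthresh are 0, so the ABC and limited slow start branches drop out. The struct demo_tp and the functions demo_slow_start() and demo_one_rtt() are hypothetical names introduced for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of tcp_sock with only the fields slow start touches. */
struct demo_tp {
    uint32_t snd_cwnd;          /* congestion window, in segments */
    uint32_t snd_cwnd_cnt;      /* increase counter */
    uint32_t snd_cwnd_clamp;    /* upper bound for snd_cwnd */
};

/*
 * One call models one incoming ACK. Assumes ABC and limited slow start
 * are disabled, so cnt is always the current window.
 */
static void demo_slow_start(struct demo_tp *tp)
{
    uint32_t cnt;

    assert(tp->snd_cwnd > 0);   /* the model assumes a non-empty window */
    cnt = tp->snd_cwnd;         /* exponential increase */

    tp->snd_cwnd_cnt += cnt;
    while (tp->snd_cwnd_cnt >= tp->snd_cwnd) {
        tp->snd_cwnd_cnt -= tp->snd_cwnd;
        if (tp->snd_cwnd < tp->snd_cwnd_clamp)
            tp->snd_cwnd++;
    }
}

/* One round trip: every segment of the current window is ACKed once. */
static void demo_one_rtt(struct demo_tp *tp)
{
    uint32_t acks = tp->snd_cwnd;

    for (uint32_t i = 0; i < acks; i++)
        demo_slow_start(tp);
}
```

Under these assumptions each ACK adds exactly one segment to the window, so a window of N segments receives N ACKs per round trip and doubles. That is where the exponential growth comes from, and the snd_cwnd_clamp check inside the loop is what stops it at the upper bound.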
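References:
[1] http://blog.csdn.net/zhangskd/article/details/7043071 (a well-written series of articles)
[2] 《TCP/IP详解卷一:协议》 (TCP/IP Illustrated, Volume 1: The Protocols)
[3] http://blog.csdn.net/hanrui90/article/details/8457863
[4] kernel.org, linux-2.6.35.13 source code
[5] 《linux内核源码剖析——TCP/IP实现(下册)》 (a Chinese-language book on the Linux kernel TCP/IP implementation, second volume), p. 717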