tcpflow Source Code Analysis (2) - xbjpkpk - ChinaUnix Blog

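The previous article walked through tcpflow's main control flow. This article looks at some key problems tcpflow has to solve, and at the functions and data structures it uses to solve them.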

1. Flow management

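A single Ethernet interface can carry many flows at the same time (especially on a busy server). Their data keeps arriving, and every packet requires finding the state already stored for its flow, so the flows need an in-memory representation that makes these lookups cheap.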

For a lookup-heavy workload like this, the natural choice is a hash table, and tcpflow indeed stores TCP flows in a hash table with separate chaining.

Hash function:

#define HASH_FLOW(flow) ( \
( (flow.sport & 0xff) | ((flow.dport & 0xff) << 8) | \
((flow.src & 0xff) << 16) | ((flow.dst & 0xff) << 24) \
) % HASH_SIZE)

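It takes the low 8 bits of each of the four tuple fields (source port, destination port, source IP, destination IP), packs them into a single 32-bit value, and reduces that value modulo HASH_SIZE.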
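To make the bucket computation concrete, here is a minimal C sketch of the same formula written as a function. The helper name hash_flow and the value HASH_SIZE = 1603 are assumptions for illustration; tcpflow defines its own constant in its headers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bucket count for this sketch; tcpflow defines its own HASH_SIZE. */
#define HASH_SIZE 1603

/* TCP 4-tuple, mirroring flow_t from section 4. */
typedef struct {
    uint32_t src;   /* source IP address */
    uint32_t dst;   /* destination IP address */
    uint16_t sport; /* source port */
    uint16_t dport; /* destination port */
} flow_t;

/* Hypothetical function form of HASH_FLOW: pack the low byte of each
 * tuple field into one 32-bit word, then reduce it modulo HASH_SIZE. */
static unsigned hash_flow(flow_t flow)
{
    uint32_t key = (uint32_t)(flow.sport & 0xff)
                 | ((uint32_t)(flow.dport & 0xff) << 8)
                 | ((flow.src & 0xff) << 16)
                 | ((flow.dst & 0xff) << 24);
    return (unsigned)(key % HASH_SIZE);
}
```

Because only the low byte of each field is used, two flows that differ only in the other bytes (for example source 10.0.0.1 versus 11.0.0.1, with the host-order addresses used here) always land in the same bucket; the per-bucket chain is what absorbs such collisions.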

Collision handling (a new flow is inserted at the head of its bucket's linked list):

flow_state_t *create_flow_state(flow_t flow, tcp_seq isn)
{
    /* create space for the new state */
    flow_state_t *new_flow = MALLOC(flow_state_t, 1);

    /* determine where in the hash this goes */
    int index = HASH_FLOW(flow);             // compute the flow's hash bucket

    /* link it in to the hash bucket at the beginning */
    new_flow->next = flow_hash[index];       // insert the new flow at the head of the chain
    flow_hash[index] = new_flow;

    /* initialize contents of the state structure */
    new_flow->flow = flow;
    new_flow->isn = isn;                     // initial TCP sequence number of the flow
    new_flow->fp = NULL;                     // handle of the output file (not opened yet)
    new_flow->pos = 0;                       // saved position in the output file
    new_flow->flags = 0;                     // flow status bits, e.g. file exists, flow finished
    new_flow->last_access = current_time++;  // current_time is a static counter; here it records when the flow was created

    DEBUG(5) ("%s: new flow", flow_filename(flow));

    return new_flow;
}
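create_flow_state only inserts; finding an existing flow means hashing the tuple and walking that bucket's chain until all four fields match. The sketch below is a hypothetical, self-contained version: find_flow_state, flow_equal, the trimmed struct, HASH_SIZE and the use of plain malloc instead of tcpflow's MALLOC macro are all assumptions, not tcpflow's code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define HASH_SIZE 1603 /* assumed bucket count */

typedef struct {
    uint32_t src, dst;
    uint16_t sport, dport;
} flow_t;

/* Trimmed flow state: only the fields this sketch needs. */
typedef struct flow_state_struct {
    struct flow_state_struct *next; /* next entry in the same bucket */
    flow_t flow;
    uint32_t isn;
} flow_state_t;

static flow_state_t *flow_hash[HASH_SIZE]; /* bucket heads, start as NULL */

static unsigned hash_flow(flow_t f)
{
    uint32_t key = (uint32_t)(f.sport & 0xff) | ((uint32_t)(f.dport & 0xff) << 8)
                 | ((f.src & 0xff) << 16) | ((f.dst & 0xff) << 24);
    return key % HASH_SIZE;
}

/* Two flows are the same only if the whole 4-tuple matches. */
static int flow_equal(flow_t a, flow_t b)
{
    return a.src == b.src && a.dst == b.dst &&
           a.sport == b.sport && a.dport == b.dport;
}

/* Insert at the head of the bucket, as create_flow_state does. */
static flow_state_t *create_flow_state(flow_t flow, uint32_t isn)
{
    flow_state_t *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    unsigned index = hash_flow(flow);
    s->next = flow_hash[index];
    s->flow = flow;
    s->isn = isn;
    flow_hash[index] = s;
    return s;
}

/* Walk the chain of the flow's bucket; NULL if the flow is unknown. */
static flow_state_t *find_flow_state(flow_t flow)
{
    for (flow_state_t *s = flow_hash[hash_flow(flow)]; s != NULL; s = s->next)
        if (flow_equal(s->flow, flow))
            return s;
    return NULL;
}
```

Head insertion makes creation O(1), and recently created flows are found first when a bucket is walked. Entries are never freed in this sketch.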

2. Output file descriptor management

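Each flow is written to its own file. Should that file be opened and closed for every packet, or is there a more efficient approach? Let's see how tcpflow handles it.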

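Opening and closing a flow's file for every packet would clearly be inefficient. (As an aside, Squid faced the same cost of repeatedly opening and closing files when storing small objects, and designed the COSS storage scheme for it, which packs many small objects into one large file.)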

tcpflow's approach, in summary:

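Open files are tracked in a circular queue (the fd ring), whose slots hold pointers to the flows that currently have an open file: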

max_fds = get_max_fds() - NUM_RESERVED_FDS;

fd_ring = MALLOC(flow_state_t *, max_fds);

for (i = 0; i < max_fds; i++)
    fd_ring[i] = NULL;
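The excerpt relies on get_max_fds() to learn how many descriptors the process may open. One way to write such a helper on POSIX systems is to read the RLIMIT_NOFILE soft limit with getrlimit(); tcpflow's own version may differ. The values of NUM_RESERVED_FDS and FALLBACK_MAX_FDS, the 65536 cap, and the helper alloc_fd_ring are assumptions for this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/resource.h>

#define NUM_RESERVED_FDS 5   /* assumed: stdio, capture handle, some slack */
#define FALLBACK_MAX_FDS 64  /* assumed value if the limit cannot be read */

typedef struct flow_state_struct flow_state_t; /* opaque in this sketch */

/* Soft limit on open descriptors for this process (RLIMIT_NOFILE). */
static int get_max_fds(void)
{
    struct rlimit lim;
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0 || lim.rlim_cur == RLIM_INFINITY)
        return FALLBACK_MAX_FDS;
    if (lim.rlim_cur > 65536)    /* assumed cap: keeps the ring small and fits in int */
        return 65536;
    return (int)lim.rlim_cur;
}

/* Allocate the fd ring: one slot per flow whose file may be open at once. */
static flow_state_t **alloc_fd_ring(int *max_fds_out)
{
    int max_fds = get_max_fds() - NUM_RESERVED_FDS;
    if (max_fds < 1)
        max_fds = 1;                 /* always keep at least one slot */

    flow_state_t **ring = malloc((size_t)max_fds * sizeof *ring);
    if (ring == NULL)
        return NULL;
    for (int i = 0; i < max_fds; i++)
        ring[i] = NULL;              /* every slot starts empty */

    *max_fds_out = max_fds;
    return ring;
}
```

Reserving a few descriptors matters because the process still needs some for its own use (standard streams, the capture source) after the ring is full.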

When the ring is full, the flow in the next slot has its file closed and gives up the slot:

if (fd_ring[next_slot] != NULL)     // slot already taken (ring is full): close the file held there
    close_file(fd_ring[next_slot]);

/* put ourselves in its place */
fd_ring[next_slot] = flow_state;

/* set flags and remember where in the file we are */
SET_BIT(flow_state->flags, FLOW_FILE_EXISTS);
FGETPOS(flow_state->fp, &(flow_state->pos));

return flow_state->fp;
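The excerpt does not show next_slot moving. For the array to act as a circular queue, next_slot must advance modulo the ring size after each insertion, so the slot reused next always belongs to the flow that was given a slot longest ago. The hypothetical simulation below replaces FILE pointers with an is_open flag and uses a tiny ring (MAX_FDS = 3) so eviction is easy to see; none of these names are tcpflow's.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FDS 3 /* assumed tiny ring size to make eviction visible */

/* Hypothetical stand-in for flow_state_t: only what the ring needs. */
typedef struct {
    int id;       /* which flow this is */
    int is_open;  /* stands in for fp != NULL */
} flow_state_t;

static flow_state_t *fd_ring[MAX_FDS]; /* NULL = free slot */
static int next_slot = 0;              /* slot to fill next (oldest entry) */
static int open_count = 0;             /* files currently "open" */

static void close_file(flow_state_t *s)
{
    if (!s->is_open)
        return;          /* already closed: nothing to do */
    s->is_open = 0;      /* real code would fclose(s->fp) and clear fp */
    open_count--;
}

/* Give the flow a ring slot, evicting the flow that was given a slot earliest. */
static void open_file(flow_state_t *s)
{
    if (fd_ring[next_slot] != NULL)
        close_file(fd_ring[next_slot]);
    s->is_open = 1;
    open_count++;
    fd_ring[next_slot] = s;
    next_slot = (next_slot + 1) % MAX_FDS; /* wrap around */
}
```

In this sketch eviction is FIFO by open time, not LRU: a busy flow can still be evicted if it was opened earliest, and it simply reopens its file on its next packet.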

When the system runs out of file descriptors, the ring is shrunk (contract_fd_ring) and the open is retried:

do {
    if (attempt_fopen(flow_state, filename) != NULL) {
        /* open succeeded... great */
        done = 1;
    } else {
        if (errno == ENFILE || errno == EMFILE) {
            /* open failed because too many files are open... close one
               and try again */
            contract_fd_ring();
            DEBUG(5) ("too many open files -- contracting FD ring to %d", max_fds);
            done = 0;
        } else {
            /* open failed for some other reason... give up */
            done = 1;
        }
    }
} while (!done);
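The retry logic can be sketched in isolation. In the hypothetical version below, try_open simulates fopen failing with EMFILE once a fake system limit is reached, and contract_fd_ring closes one open file and shrinks the ring by one slot. The excerpt does not show what happens when nothing is left to close, so the sketch adds an explicit stop condition; this bound, and all the names and values here, are assumptions rather than tcpflow's behavior.

```c
#include <assert.h>
#include <errno.h>

#define RING_SIZE 4             /* assumed initial ring capacity */

static int max_fds = RING_SIZE; /* current ring capacity */
static int open_files = 2;      /* assumed: files already open via the ring */
static int system_limit = 2;    /* simulated: opens fail beyond this */

/* Simulated fopen: fails with EMFILE when the fake limit is reached. */
static int try_open(void)
{
    if (open_files >= system_limit) {
        errno = EMFILE;
        return 0;
    }
    open_files++;
    return 1;
}

/* Close one open file and shrink the ring; returns 0 if it cannot shrink. */
static int contract_fd_ring(void)
{
    if (max_fds <= 1 || open_files == 0)
        return 0;
    open_files--;    /* real code would close_file() a ring entry */
    max_fds--;
    return 1;
}

/* Retry loop: on ENFILE/EMFILE contract the ring and try again. */
static int open_with_retry(void)
{
    for (;;) {
        if (try_open())
            return 1;                       /* open succeeded */
        if (errno != ENFILE && errno != EMFILE)
            return 0;                       /* other error: give up */
        if (!contract_fd_ring())
            return 0;                       /* nothing left to close */
    }
}
```

Shrinking max_fds, rather than just closing one file, keeps the ring from immediately filling back up to a size the system cannot support.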

3. Mapping flows to output files

How a flow finds its file:

Creation

char *filename = flow_filename(flow_state->flow);

The file name is derived from the flow's source IP, destination IP, source port, and destination port.
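The excerpt does not show flow_filename itself. Below is a minimal sketch, assuming IPv4 addresses held in host byte order and the zero-padded dotted layout of tcpflow's output file names (for example 192.168.001.002.01234-010.000.000.001.00080); treat the exact format as an assumption. The helper format_flow_filename is hypothetical and writes into a caller-supplied buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    uint32_t src, dst;      /* IPv4 addresses, host byte order (assumed) */
    uint16_t sport, dport;
} flow_t;

/* Write "SSS.SSS.SSS.SSS.PPPPP-DDD.DDD.DDD.DDD.PPPPP" into buf.
 * Returns the snprintf result (length needed, excluding the NUL). */
static int format_flow_filename(char *buf, size_t len, flow_t f)
{
    return snprintf(buf, len,
                    "%03u.%03u.%03u.%03u.%05u-%03u.%03u.%03u.%03u.%05u",
                    (unsigned)(f.src >> 24) & 0xff, (unsigned)(f.src >> 16) & 0xff,
                    (unsigned)(f.src >> 8) & 0xff, (unsigned)f.src & 0xff,
                    (unsigned)f.sport,
                    (unsigned)(f.dst >> 24) & 0xff, (unsigned)(f.dst >> 16) & 0xff,
                    (unsigned)(f.dst >> 8) & 0xff, (unsigned)f.dst & 0xff,
                    (unsigned)f.dport);
}
```

Because source and destination are ordered in the name, the two directions of one TCP connection map to two different files.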

Lookup

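The flow state keeps the FILE pointer of its output file. If it is NULL (the file has not been opened yet, or was closed to free a slot in the fd ring), the file is opened first: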

    if (state->fp == NULL) {
        if (open_file(state) == NULL) {
            return;
        }
    }
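The check above opens the file only when a packet actually needs it. The sketch below shows that lazy-open path on its own, using tmpfile() as a stand-in for tcpflow's open_file so it stays self-contained; store_data, the opens counter and the trimmed struct are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, trimmed-down flow state: only the fields used here. */
typedef struct {
    FILE *fp;     /* NULL until the file is (re)opened */
    long pos;     /* write position after the last store */
    int opens;    /* how many times the file was opened (for the demo) */
} flow_state_t;

/* Stand-in for open_file(): a real version would fopen() the flow's
 * filename; tmpfile() keeps the sketch self-contained. */
static FILE *open_file(flow_state_t *state)
{
    state->fp = tmpfile();
    if (state->fp != NULL)
        state->opens++;
    return state->fp;
}

/* Append packet payload to the flow's file, opening it lazily. */
static int store_data(flow_state_t *state, const char *data, size_t len)
{
    if (state->fp == NULL) {
        if (open_file(state) == NULL)
            return -1;              /* could not open: drop the data */
    }
    if (fwrite(data, 1, len, state->fp) != len)
        return -1;
    state->pos = ftell(state->fp);  /* remember where the next write goes */
    return 0;
}
```

Only the first packet pays for the open; later packets find fp already set. If the fd ring later evicts the flow and clears fp, the next packet goes through open_file again.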

4. Data structures

typedef struct {
    u_int32_t src;    /* Source IP address */
    u_int32_t dst;    /* Destination IP address */
    u_int16_t sport;  /* Source port number */
    u_int16_t dport;  /* Destination port number */
} flow_t;             // TCP flow 4-tuple

typedef struct flow_state_struct {
    struct flow_state_struct *next; /* Link to next one */              // next pointer of the bucket's singly linked list
    flow_t flow;      /* Description of this flow */                    // flow ID: the 4-tuple
    tcp_seq isn;      /* Initial sequence number we've seen */          // initial TCP sequence number
    FILE *fp;         /* Pointer to file storing this flow's data */    // output file handle
    long pos;         /* Current write position in fp */                // position of the last write
    int flags;        /* Don't save any more data from this flow */     // flow status bits
    int last_access;  /* "Time" of last access */                       // logical timestamp ordering accesses
} flow_state_t;
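The flags field is a bit set: the open path above sets FLOW_FILE_EXISTS with SET_BIT, and the field's original comment hints at a "stop saving" state. The sketch below shows how such bits can drive decisions. The bit values, the FLOW_FINISHED name, the UNSET_BIT/IS_SET macros and the open-mode policy are assumptions for illustration, not copied from tcpflow's headers.

```c
#include <assert.h>

/* Assumed bit assignments for flow_state_t.flags. */
#define FLOW_FINISHED    (1 << 0)  /* stop saving data for this flow */
#define FLOW_FILE_EXISTS (1 << 1)  /* output file has been created */

/* Assumed helper macros in the style of SET_BIT used above. */
#define SET_BIT(field, bit)   ((field) |= (bit))
#define UNSET_BIT(field, bit) ((field) &= ~(bit))
#define IS_SET(field, bit)    (((field) & (bit)) != 0)

/* Decide whether a packet for this flow should still be written. */
static int should_store(int flags)
{
    return !IS_SET(flags, FLOW_FINISHED);
}

/* Open mode for the output file (assumed policy): create/truncate on the
 * first open, update in place once the file is known to exist, so data
 * written before an eviction from the fd ring is not lost. */
static const char *open_mode(int flags)
{
    return IS_SET(flags, FLOW_FILE_EXISTS) ? "r+" : "w";
}
```

Keeping several independent states in one int keeps flow_state_t compact, which matters when a busy interface has many flows in the hash table at once.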