What happens between write and read in Unix network programming

1. write

[Figure: packet transmission in kernel space]

[Figure: packet transmission in device driver]

1.1 Data is written into the TCP send buffer

A user application writes the data into the TCP send buffer by calling the write() system call. Like the TCP recv buffer, the send buffer is a crucial parameter to get maximum throughput. The maximum size of the congestion window is related to the amount of send buffer space allocated to the TCP socket. The send buffer holds all outstanding packets (for potential retransmission) as well as all data queued to be transmitted. The size of the send buffer can be set by modifying the /proc/sys/net/ipv4/tcp_wmem variable, which takes three different values, i.e., min, default, and max.
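
On Linux the three tcp_wmem values can be read straight from /proc. The sketch below assumes the usual Linux /proc layout (values in bytes, separated by whitespace); the helper names parse_tcp_mem_triple and read_tcp_wmem are hypothetical, not part of any library.

```python
from pathlib import Path
from typing import Optional, Tuple


def parse_tcp_mem_triple(text: str) -> Tuple[int, int, int]:
    """Parse the 'min default max' triple used by tcp_wmem and tcp_rmem (bytes)."""
    fields = text.split()  # the kernel separates the values with tabs
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {text!r}")
    lo, default, hi = (int(f) for f in fields)
    return lo, default, hi


def read_tcp_wmem(path: str = "/proc/sys/net/ipv4/tcp_wmem") -> Optional[Tuple[int, int, int]]:
    """Return (min, default, max) for the TCP send buffer, or None if the file is absent (e.g. not Linux)."""
    p = Path(path)
    if not p.exists():
        return None
    return parse_tcp_mem_triple(p.read_text())
```

The same parser applies to /proc/sys/net/ipv4/tcp_rmem. Changing the values needs root, for example `sysctl -w net.ipv4.tcp_wmem="<min> <default> <max>"`.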

1.2 The TCP layer builds segments when data is available in the TCP send buffer, and builds ACKs in response to received data.

The TCP layer builds data packets when data is available in the send buffer, and builds ACK packets in response to received data packets.
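
To make the data-available case concrete, here is a toy model (not kernel code) of cutting the bytes waiting in the send buffer into segments of at most one MSS, limited by the number of bytes the window currently allows. The function name build_segments and its parameters are illustrative assumptions, and the ACK-generation case is not modelled.

```python
from typing import List, Tuple


def build_segments(send_buffer: bytes, mss: int, window: int, next_seq: int) -> List[Tuple[int, bytes]]:
    """Toy model: return (sequence_number, payload) pairs cut from the send buffer.

    Each payload is at most `mss` bytes, and no more than `window` bytes are
    sent in total. Real TCP applies further rules (e.g. Nagle's algorithm)
    that are not modelled here.
    """
    if mss <= 0:
        raise ValueError("mss must be positive")
    sendable = min(len(send_buffer), window)  # never exceed the usable window
    segments = []
    offset = 0
    while offset < sendable:
        size = min(mss, sendable - offset)
        segments.append((next_seq + offset, send_buffer[offset:offset + size]))
        offset += size
    return segments
```

For 3000 buffered bytes, an MSS of 1460 and a large window, this yields segments of 1460, 1460 and 80 bytes; with a 2000-byte window only 1460 + 540 bytes go out.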

1.3 The segments are passed down to the IP layer for processing.

Each packet is pushed down to the IP layer for transmission.

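1.4 The IP layer places the packets into the output queue (qdisc) associated with the NIC. The capacity of the qdisc is controlled by the txqueuelen parameter.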

The IP layer enqueues each packet in an output queue (qdisc) associated with the NIC. The size of the qdisc can be modified by assigning a value to the txqueuelen variable associated with each NIC device. If the output queue is full, the attempt to enqueue a packet generates a local-congestion event, which is propagated upward to the TCP layer. The TCP congestion-control algorithm then enters the Congestion Window Reduced (CWR) state and reduces the congestion window by one on every other ACK (known as rate halving).
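
The feedback loop between the qdisc and TCP can be sketched as a small simulation. This is a toy model under stated assumptions, not the kernel's implementation: the qdisc is a FIFO bounded by txqueuelen, a failed enqueue puts the sender into CWR, and while in CWR cwnd drops by one on every other ACK. Ending CWR once cwnd has been halved is an assumption made for the sketch, and the class name ToySender is hypothetical.

```python
from collections import deque


class ToySender:
    """Toy model (not kernel code) of qdisc overflow triggering TCP's CWR state."""

    def __init__(self, txqueuelen: int, cwnd: int):
        self.qdisc = deque()          # output queue, bounded by txqueuelen
        self.txqueuelen = txqueuelen
        self.cwnd = cwnd              # congestion window, in packets
        self.in_cwr = False
        self.target = cwnd            # cwnd value at which CWR ends (assumed: half)
        self._acks = 0                # ACKs seen since entering CWR

    def enqueue(self, packet) -> bool:
        """Try to queue a packet; a full qdisc is a local-congestion event."""
        if len(self.qdisc) >= self.txqueuelen:
            if not self.in_cwr:       # propagate the event up to TCP once
                self.in_cwr = True
                self.target = max(self.cwnd // 2, 1)
                self._acks = 0
            return False
        self.qdisc.append(packet)
        return True

    def on_ack(self) -> None:
        """In CWR, shrink cwnd by one on every other ACK (rate halving)."""
        if not self.in_cwr:
            return
        self._acks += 1
        if self._acks % 2 == 0:
            self.cwnd -= 1
        if self.cwnd <= self.target:
            self.in_cwr = False
```

With txqueuelen=2 and cwnd=10, the third enqueue fails, and ten ACKs later cwnd has fallen to 5 and the sender has left CWR.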

1.5 Once a packet has been successfully added to the qdisc, its descriptor is placed in the NIC's transmit ring buffer tx_ring.

After a packet is successfully queued inside the output queue, the packet descriptor (sk_buff) is then placed in the output ring buffer tx_ring.

1.6 When the NIC's transmit ring holds packets, the device driver invokes the NIC's DMA engine to send the data from kernel memory onto the network.

When packets are available inside the ring buffer, the device driver invokes the NIC DMA engine to transmit packets onto the wire.
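
Steps 1.5 and 1.6 share one data structure: a fixed-size ring of descriptor slots that the driver fills and the NIC drains. The toy TxRing below is only a sketch of that producer/consumer relationship; real drivers use hardware-specific descriptor formats, and the class and method names here are hypothetical.

```python
from typing import Any, List


class TxRing:
    """Toy transmit ring: descriptor slots shared by the driver (producer) and the NIC (consumer)."""

    def __init__(self, size: int):
        if size < 2:
            raise ValueError("ring needs at least 2 slots")
        self.slots: List[Any] = [None] * size
        self.head = 0  # next descriptor the NIC will transmit
        self.tail = 0  # next free slot for the driver

    def is_full(self) -> bool:
        # One slot is always left empty so that head == tail means "empty".
        return (self.tail + 1) % len(self.slots) == self.head

    def post(self, descriptor: Any) -> bool:
        """Driver side: place a descriptor (e.g. a reference to an sk_buff) in the ring."""
        if self.is_full():
            return False
        self.slots[self.tail] = descriptor
        self.tail = (self.tail + 1) % len(self.slots)
        return True

    def nic_transmit_all(self) -> List[Any]:
        """NIC side: 'DMA' every pending descriptor onto the wire, in ring order."""
        sent = []
        while self.head != self.tail:
            sent.append(self.slots[self.head])
            self.slots[self.head] = None  # slot can be reused by the driver
            self.head = (self.head + 1) % len(self.slots)
        return sent
```

Keeping one slot empty is a common way to tell a full ring from an empty one using only the head and tail indices, at the cost of one unused slot.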

2. read

[Figure: packet reception in kernel space]

[Figure: packet reception in device driver]

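2.1 A packet arrives at the NIC, and the NIC's DMA engine copies it into kernel memory, into one of the empty sk_buffs pre-allocated in the NIC's receive ring rx_ring (the packet stays there during later processing, which avoids extra copies). If rx_ring is full, the packet is dropped.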

The Linux kernel uses an sk_buff data structure to describe each packet. When a packet arrives at the NIC, it invokes the DMA engine to place the packet into the kernel memory via empty sk_buffs stored in a ring buffer called rx_ring. An incoming packet is dropped if the ring buffer is full. When a packet is processed at higher layers, packet data remains in the same kernel memory, avoiding any extra memory copies.
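
The drop-when-full behaviour of rx_ring can be modelled in a few lines. In this sketch (class and method names are hypothetical), a deque stands in for the circular array; the packet object stored by nic_receive is the very object later returned by kernel_take, mirroring the "no extra copy" point above.

```python
from collections import deque
from typing import List


class RxRing:
    """Toy receive ring: a packet that arrives when no buffer is free is dropped."""

    def __init__(self, size: int):
        self.size = size
        self.filled = deque()  # packets DMA'd into buffers, waiting for the kernel
        self.dropped = 0

    def nic_receive(self, packet: bytes) -> bool:
        """NIC side: store the packet, or count a drop if the ring is full."""
        if len(self.filled) >= self.size:
            self.dropped += 1  # nowhere to DMA the packet
            return False
        self.filled.append(packet)  # a reference is stored, not a copy
        return True

    def kernel_take(self, max_packets: int) -> List[bytes]:
        """Kernel side: take up to max_packets packets, freeing their slots."""
        taken = []
        while self.filled and len(taken) < max_packets:
            taken.append(self.filled.popleft())
        return taken
```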

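2.2 Once the packet has been received successfully, the NIC raises an interrupt; the CPU then processes each packet and passes it to the IP layer.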

Once a packet is successfully received, the NIC raises an interrupt to the CPU, which processes each incoming packet and passes it to the IP layer.

2.3 The IP layer processes the packet and, if it carries a TCP segment, passes it up to the TCP layer.

The IP layer performs its processing on each packet, and passes it up to the TCP layer if it is a TCP packet.
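
The IP layer's decision to pass a packet up to TCP is a demultiplexing step on the Protocol field of the IPv4 header, where 6 means TCP. A minimal sketch that covers only this step (checksum verification, fragment reassembly and IP options processing are deliberately left out); ip_demux is a hypothetical name.

```python
import struct
from typing import Optional

IPPROTO_TCP = 6  # IANA protocol number for TCP


def ip_demux(packet: bytes) -> Optional[bytes]:
    """Return the TCP segment carried by an IPv4 packet, or None if it is not TCP."""
    if len(packet) < 20:                      # shorter than a minimal IPv4 header
        return None
    version, ihl = packet[0] >> 4, packet[0] & 0x0F
    if version != 4 or ihl < 5:
        return None
    header_len = ihl * 4                      # IHL counts 32-bit words
    (total_len,) = struct.unpack("!H", packet[2:4])
    protocol = packet[9]
    if protocol != IPPROTO_TCP or total_len < header_len or len(packet) < total_len:
        return None
    return packet[header_len:total_len]       # hand the payload up to TCP
```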

2.4 TCP processes the packet and finally stores it in the TCP recv buffer; read() takes the data out of the TCP recv buffer.

The TCP process is then scheduled to handle received packets. Each packet in TCP goes through a series of complex processing steps. The TCP state machine is updated, and finally the packet is stored inside the TCP recv buffer.
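
The whole write-to-read path can be exercised on one machine over the loopback interface. In this Python sketch, sendall() and recv() play the roles of write() and read() (on a connected socket, send/recv with no flags behave like write/read). Because a single thread drives both ends, the payload is assumed to be small enough to fit in the socket buffers; a large payload would need a second thread or process. The function name is hypothetical.

```python
import socket


def loopback_write_read(payload: bytes) -> bytes:
    """Send payload over a real loopback TCP connection and read it back."""
    with socket.create_server(("127.0.0.1", 0)) as server:  # port 0: pick a free port
        port = server.getsockname()[1]
        with socket.create_connection(("127.0.0.1", port)) as client:
            conn, _ = server.accept()
            with conn:
                conn.settimeout(5.0)              # avoid hanging forever in a test
                client.sendall(payload)           # write(): user data -> TCP send buffer
                client.shutdown(socket.SHUT_WR)   # send FIN: no more data from the client
                chunks = []
                while True:
                    chunk = conn.recv(4096)       # read(): TCP recv buffer -> user data
                    if not chunk:                 # b"" means the peer closed its side
                        break
                    chunks.append(chunk)
                return b"".join(chunks)
```

Everything between sendall() and recv() (segmentation, IP, qdisc, the loopback "driver", and TCP receive processing) happens inside the kernel.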

3. Important TCP parameters

3.1 Receiver's recv buffer size: /proc/sys/net/ipv4/tcp_rmem

The size of the recv buffer can be set by modifying the /proc/sys/net/ipv4/tcp_rmem variable. It takes three different values, i.e., min, default, and max. The min value defines the minimum receive buffer size even when the operating system is under hard memory pressure. The default is the default size of the receive buffer, which is used together with the TCP window scaling factor to calculate the actual advertised window. The max defines the maximum size of the receive buffer. (This is the TCP recv buffer shown under read in the "packet reception in kernel space" figure.)
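
The window field in the TCP header is only 16 bits wide, so advertising a buffer larger than 65,535 bytes requires the window scale option (RFC 7323), which shifts the field left by at most 14 bits. A minimal sketch of choosing the smallest sufficient shift for a buffer size, assuming the shift is derived from the largest buffer the receiver may grow to (the exact rule in the kernel has varied between versions); the function names are hypothetical.

```python
MAX_WSCALE = 14  # RFC 7323 caps the window scale shift at 14


def window_scale_for(buffer_bytes: int) -> int:
    """Smallest shift so that buffer_bytes fits in the 16-bit window field (capped at 14)."""
    shift = 0
    space = buffer_bytes
    while space > 0xFFFF and shift < MAX_WSCALE:
        space >>= 1
        shift += 1
    return shift


def advertised_window(window_field: int, shift: int) -> int:
    """Receiver's window in bytes, as the sender reconstructs it from the header."""
    return window_field << shift
```

For example, window_scale_for(6291456) returns 7, so a window field of 0xFFFF then stands for 65535 << 7 = 8,388,480 bytes.
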
The number of packets a TCP sender is able to have outstanding (unacknowledged) is the minimum of the congestion window (cwnd) and the receiver's advertised window (rwnd). The maximum size of the receiver's advertised window is the TCP recv buffer size. Hence, if the size of the recv buffer is smaller than the bandwidth-delay product (BDP) of the end-to-end path, the achievable throughput will be low. On the other hand, a large recv buffer allows a correspondingly large number of packets to remain outstanding, possibly exceeding the number of packets an end-to-end path can sustain.
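
The BDP argument is plain arithmetic. A sketch with illustrative numbers (the 1 Gbit/s bandwidth and 100 ms RTT are assumptions chosen for the example), ignoring losses and header overhead:

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product in bytes: the data that must be in flight to fill the path."""
    return bandwidth_bps * rtt_s / 8


def max_throughput_bps(window_bytes: float, rtt_s: float) -> float:
    """Upper bound on throughput when at most window_bytes can be unacknowledged per RTT."""
    return window_bytes * 8 / rtt_s


# Example: a 1 Gbit/s path with a 100 ms round-trip time.
bdp = bdp_bytes(1e9, 0.1)                     # about 12.5e6 bytes (12.5 MB)
capped = max_throughput_bps(64 * 1024, 0.1)   # 64 KiB recv buffer: about 5.2 Mbit/s
```

A 12.5 MB BDP against a 64 KiB window means the sender spends most of each round trip waiting for ACKs, which is exactly the low-throughput case described above.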

3.2 Receiver's queue length: netdev_max_backlog

Also at the receiver, the parameter netdev_max_backlog dictates the maximum number of packets queued at a device that are waiting to be processed by the TCP receiving process. If adding a newly received packet to the queue would make the queue exceed netdev_max_backlog, the packet is discarded. (This probably corresponds to the Socket backlog in the "packet reception in kernel space" figure.)
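
On Linux, drops caused by this limit are counted per CPU in /proc/net/softnet_stat, a file of hexadecimal counters with one line per CPU. The sketch assumes the commonly documented layout in which the first column is packets processed and the second is packets dropped because the backlog was full; later columns differ between kernel versions and are ignored. The function names are hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Optional


def parse_softnet_stat(text: str) -> List[Dict[str, int]]:
    """Parse softnet_stat content: one line per CPU, space-separated hex counters."""
    stats = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        processed, dropped, squeezed = (int(f, 16) for f in fields[:3])
        stats.append({"processed": processed, "dropped": dropped, "time_squeeze": squeezed})
    return stats


def read_backlog_drops(path: str = "/proc/net/softnet_stat") -> Optional[int]:
    """Total backlog drops across all CPUs, or None if the file does not exist (e.g. not Linux)."""
    p = Path(path)
    if not p.exists():
        return None
    return sum(cpu["dropped"] for cpu in parse_softnet_stat(p.read_text()))
```

A drop count that keeps growing under load suggests netdev_max_backlog is too small for the packet rate.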

3.3 Sender's buffer size: /proc/sys/net/ipv4/tcp_wmem

The size of the send buffer can be set by modifying the /proc/sys/net/ipv4/tcp_wmem variable, which also takes three different values, i.e., min, default, and max. (This is the TCP send buffer shown under write in the "packet transmission in kernel space" figure.)
Like the TCP recv buffer, the send buffer is a crucial parameter to get maximum throughput. The maximum size of the congestion window is related to the amount of send buffer space allocated to the TCP socket. The send buffer holds all outstanding packets (for potential retransmission) as well as all data queued to be transmitted.
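
An application can also size the send buffer of a single socket with the SO_SNDBUF socket option instead of the system-wide tcp_wmem. A sketch that reports the value before and after setting it; on Linux the kernel doubles the requested value to leave room for bookkeeping overhead (see socket(7)) and caps the request at net.core.wmem_max, while other systems may report the value unchanged. The function name is hypothetical.

```python
import socket
from typing import Tuple


def sndbuf_before_after(requested: int) -> Tuple[int, int]:
    """Return (initial SO_SNDBUF, SO_SNDBUF after requesting `requested` bytes)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # For a fresh TCP socket on Linux this starts from tcp_wmem's default value.
        before = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested)
        after = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    return before, after
```

On a typical Linux host, requesting 16384 bytes reports 32768 afterwards because of the doubling.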

3.4 Sender's queue length: txqueuelen

The analogue to the receiver's netdev_max_backlog is the sender's txqueuelen. The size of the qdisc can be modified by assigning a value to the txqueuelen variable associated with each NIC device.
A TCP sender is allowed to have at most the minimum of the congestion window and the receiver's advertised window, in packets, outstanding. (txqueuelen is the length of the qdisc in the IP layer of the "packet transmission in kernel space" figure.)
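
Two small helpers for this section, both sketches with hypothetical names: the first computes how many new packets the min(cwnd, rwnd) rule lets a sender transmit, given how many are already unacknowledged; the second reads an interface's current txqueuelen from sysfs, assuming a Linux system that exposes /sys/class/net/<interface>/tx_queue_len.

```python
from pathlib import Path
from typing import Optional


def packets_sendable(cwnd: int, rwnd: int, in_flight: int) -> int:
    """New packets the sender may transmit now: min(cwnd, rwnd) minus those in flight."""
    return max(0, min(cwnd, rwnd) - in_flight)


def read_txqueuelen(interface: str) -> Optional[int]:
    """Current txqueuelen of a NIC from sysfs, or None if unavailable (e.g. not Linux)."""
    p = Path(f"/sys/class/net/{interface}/tx_queue_len")
    return int(p.read_text()) if p.exists() else None
```

With cwnd = 10, rwnd = 4 and one packet in flight, the receiver's window is the binding limit and three more packets may be sent. On Linux the queue length can be changed with `ip link set dev <interface> txqueuelen <n>`.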

3.5 Other parameters

There are many other parameters that are relevant to the operation of TCP in Linux, and each is at least briefly explained in the documentation included in the distribution (Documentation/networking/ip-sysctl.txt).
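
Every parameter documented in ip-sysctl.txt maps to a file under /proc/sys: the dots in the sysctl name become path separators, so net.ipv4.tcp_rmem lives at /proc/sys/net/ipv4/tcp_rmem. A minimal sketch of that mapping (helper names are hypothetical; names with a dot inside one component, such as some interface names, are not handled):

```python
from pathlib import Path
from typing import Optional


def sysctl_path(name: str) -> Path:
    """Map a sysctl name such as 'net.ipv4.tcp_rmem' to its file under /proc/sys."""
    return Path("/proc/sys") / name.replace(".", "/")


def read_sysctl(name: str) -> Optional[str]:
    """Read a sysctl value as a string, or None if it does not exist on this system."""
    p = sysctl_path(name)
    return p.read_text().strip() if p.exists() else None
```

On a Linux host, read_sysctl("net.core.netdev_max_backlog") returns the same value that `sysctl net.core.netdev_max_backlog` prints.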

References

http://www.ece.virginia.edu/cheetah/documents/papers/TCPlinux.pdf

Original article: