Why do I have TCP connections in the CLOSE_WAIT state (Unix/Solaris)?

Description
This document explains why TCP connections can appear in the CLOSE_WAIT state and how to investigate connections that stay there.

Steps to Follow
Why does netstat -a show connections in the CLOSE_WAIT state?

A TCP connection is in the CLOSE_WAIT state when the system has received notification (a FIN packet) from the other system that it has closed its endpoint, but has not yet received a close system call from the local application.

Details:

The “CLOSE_WAIT” state means the other end of the connection has been closed while the local end is still waiting for the application to close. A connection that stays in CLOSE_WAIT indefinitely normally indicates an application-level bug.

TCP connections move from the ESTABLISHED state to the CLOSE_WAIT state after receiving a FIN from the remote system but before the local application has called close.

The CLOSE_WAIT state signifies that the endpoint has received a FIN from the peer, indicating that the peer has finished writing: it has no more data to send. This is indicated by a 0 length read on the input. The connection is now half-closed, or a simplex (one-way) connection; the receiver of the FIN still has the option of writing more data (*). The state can persist indefinitely, as it is a perfectly valid, synchronized TCP state. The peer should be in FIN_WAIT_2 (i.e. sent FIN, received ACK, waiting for FIN). It is only an application fault if the application ignores the EOF (0 length read) and carries on as if the connection were still a duplex connection.

The transition from CLOSE_WAIT -> LAST_ACK occurs when the application issues close. During the transition, TCP schedules a FIN to be sent. The FIN is sent after any remaining data, so it may be delayed if the receiver has closed its window.
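The sequence described above (a 0 length read, followed by close) can be sketched in a few lines. This is only an illustration over the loopback interface, written in Python rather than against any particular server; the names drain_until_eof and demo are made up for this example.

```python
import socket


def drain_until_eof(conn: socket.socket) -> bytes:
    """Read until the peer's FIN shows up as a zero-length read, then close."""
    chunks = []
    try:
        while True:
            data = conn.recv(4096)
            if not data:      # b"" is the EOF indication: the peer sent its FIN
                break
            chunks.append(data)
    finally:
        conn.close()          # CLOSE_WAIT -> LAST_ACK: our own FIN is scheduled
    return b"".join(chunks)


def demo() -> bytes:
    # Hypothetical loopback setup, used only to produce a connection to test.
    listener = socket.create_server(("127.0.0.1", 0))
    client = socket.create_connection(listener.getsockname())
    conn, _ = listener.accept()
    client.sendall(b"last words")
    client.close()            # the peer closes; conn moves to CLOSE_WAIT
    received = drain_until_eof(conn)
    listener.close()
    return received
```

An application that skips the close in the finally clause leaves the connection in CLOSE_WAIT for as long as the descriptor stays open.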

It is sometimes hard to determine what exactly happened with only “netstat -a” output on the server to go on. To get more details, it is best to “truss” the application and “snoop” the TCP session to help narrow down why the connections in CLOSE_WAIT are not being cleared.

# truss -o truss.out -laef -vall -p <the pid of the server process>
# snoop -o snoop.out port <tcp port number>

There are a couple of possibilities as to why the application has not issued a close on the TCP connection. The first possibility is that the application is not done sending data on the connection. As noted above, an application that only intends to receive data and not send any might very well close its end of the connection, which leaves the other end in CLOSE_WAIT until the process at that end is done sending data and issues a close.

Another possibility is that the application may not have received notification of the close of the other endpoint. With the sockets API, an application receives this notification when it attempts to read the socket and receives the EOF (i.e. 0 bytes) indication. If the application does not attempt to read the socket, it will never know that the socket is closed at the other end. This would be an application error, since applications should correctly manage all of their sockets.
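One way an application can notice the close on otherwise idle sockets is to poll them for readability and peek at the input. The following is a minimal sketch under that assumption, not a description of how any specific server does it; close_if_peer_closed, connect_pair and demo are hypothetical names. Note that a FIN queued behind unread data is only seen once that data has been read.

```python
import selectors
import socket


def close_if_peer_closed(socks, timeout=1.0):
    """Close every socket whose peer has sent a FIN; return those still open."""
    sel = selectors.DefaultSelector()
    for s in socks:
        sel.register(s, selectors.EVENT_READ)
    still_open = list(socks)
    for key, _ in sel.select(timeout):
        s = key.fileobj
        try:
            # MSG_PEEK inspects pending input without consuming it.
            at_eof = s.recv(1, socket.MSG_PEEK) == b""
        except OSError:
            at_eof = True     # treat a reset connection as closed as well
        if at_eof:
            sel.unregister(s)
            s.close()
            still_open.remove(s)
    sel.close()
    return still_open


def connect_pair(listener):
    """Return (client socket, accepted socket) for one loopback connection."""
    client = socket.create_connection(listener.getsockname())
    conn, _ = listener.accept()
    return client, conn


def demo():
    listener = socket.create_server(("127.0.0.1", 0))
    client_a, conn_a = connect_pair(listener)
    client_b, conn_b = connect_pair(listener)
    client_a.close()          # conn_a moves to CLOSE_WAIT; conn_b stays idle
    remaining = close_if_peer_closed([conn_a, conn_b])
    result = (conn_a.fileno() == -1, remaining == [conn_b])
    for s in (client_b, conn_b, listener):
        s.close()
    return result
```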

One other possibility has been seen, particularly when so-called “middleware” is in use. Since the close of a socket does not take effect until the last process that has a copy of the socket issues a close, it is possible for a process to detect the close of the other end of the socket and to issue a close on its own end, or even to exit, without actually causing the socket to close. This will occur if a copy of the socket is also held by another process. Sockets are copied from a parent process to a child via the fork system call. Some middleware servers have a bug such that a server will accept a TCP connection and then fork a child to handle the connection, but before the fork is issued, another thread will accept another TCP connection. Thus there will be two child processes, but one socket will be copied into both of them. Until both processes close the socket or exit, this socket cannot close. If the process that mistakenly has the socket is particularly long-lived, while the child that was supposed to have the socket is short-lived, the socket will remain in the CLOSE_WAIT state until the long-lived process exits.
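The effect of a stray copy can be reproduced in a single process. In the sketch below, dup() stands in for the copy a second process would inherit across fork (an assumption made to keep the example self-contained); the variable names are illustrative only.

```python
import socket


def demo():
    listener = socket.create_server(("127.0.0.1", 0))
    client = socket.create_connection(listener.getsockname())
    conn, _ = listener.accept()
    stray = conn.dup()                # stands in for a copy inherited via fork
    client.settimeout(0.5)

    client.shutdown(socket.SHUT_WR)   # peer sends FIN; the server side is in CLOSE_WAIT
    assert conn.recv(1) == b""        # the intended handler sees EOF ...
    conn.close()                      # ... and closes, but the stray copy is still open

    try:
        client.recv(1)
        fin_after_first_close = True
    except socket.timeout:
        fin_after_first_close = False  # no FIN yet: the connection is still open

    stray.close()                     # last copy closed: the FIN is finally sent
    fin_after_last_close = client.recv(1) == b""

    client.close()
    listener.close()
    return fin_after_first_close, fin_after_last_close
```

Between the two close calls, netstat on the server side would be expected to keep showing the connection in CLOSE_WAIT, which is the symptom described above.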

Note: if the FIN was received because the peer issued a close(2) against the fd of the socket, that normally terminates both directions of data transfer. If the peer instead used shutdown(3SOCKET) with the SHUT_WR option, it sends a FIN but can still read from that socket until the other end closes the socket, at which point read(2) on the peer's side returns 0.
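The half-close case in this note can be illustrated as follows. This is a loopback sketch in Python, whose shutdown method wraps the same sockets call; the request and reply payloads are invented for the example.

```python
import socket


def read_to_eof(sock: socket.socket) -> bytes:
    """Collect everything up to the zero-length read that marks EOF."""
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:
            return b"".join(chunks)
        chunks.append(data)


def demo():
    listener = socket.create_server(("127.0.0.1", 0))
    client = socket.create_connection(listener.getsockname())
    conn, _ = listener.accept()

    client.sendall(b"request")
    client.shutdown(socket.SHUT_WR)   # sends FIN but keeps the read side open

    request = read_to_eof(conn)       # server side is now in CLOSE_WAIT
    conn.sendall(b"reply")            # writing is still valid in CLOSE_WAIT
    conn.close()                      # CLOSE_WAIT -> LAST_ACK

    reply = read_to_eof(client)       # ends when the server's FIN arrives
    client.close()
    listener.close()
    return request, reply
```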
