Protocol Modules as State Machines
A part of the libwww thread model is to keep track of the current
state in the communication interface to the network. As an example,
this section describes the current implementation of the HTTP module and how it has
been implemented as a state machine. The HTTP module is based on the
HTTP 1.0 specification but is backwards compatible with the 0.9
version. The major difference from implementations before
version 3.0 of the Library is that this version is a state machine
based on the state diagram illustrated below. This implementation has
several advantages even though the HTTP protocol is stateless by
nature.

The individual states and the transitions between them are explained
in the following sections.
- BEGIN State
- This state is the idle state or initial state where the HTTP
module awaits a new request passed from the application.
- NEED_CONNECTION State
- The HTTP module is now ready for setting up a connection to the
remote host. The connection is always initiated by a connect
system call. To minimize access to the Domain Name
Server, the host names of previously visited hosts are stored in a local
host cache as explained in section "DNS Cache
and Host Name Canonicalization". The cache handles multi-homed
hosts in a special way in that it measures the time it takes to
actually make a connection to one of the IP-addresses. This time is
stored together with the specific IP-address and the host name in the
cache and on the next connection to the same host the IP-address with
the fastest connect time is chosen.
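
The selection rule for multi-homed hosts can be sketched as follows. This is a minimal illustration and not the Library's own code: the class name `HostCache`, its methods, and the addresses are all hypothetical, and it assumes that connect times are measured elsewhere and handed to the cache.

```python
class HostCache:
    """Hypothetical cache: host name -> {IP address: measured connect time in seconds}."""

    def __init__(self):
        self._entries = {}

    def record(self, host, ip, connect_time):
        # Store the measured connect time together with the IP address and host name.
        self._entries.setdefault(host, {})[ip] = connect_time

    def best_address(self, host):
        # Return the IP address with the fastest recorded connect time,
        # or None if the host has not been visited before.
        timings = self._entries.get(host)
        if not timings:
            return None
        return min(timings, key=timings.get)


cache = HostCache()
cache.record("www.example.org", "192.0.2.1", 0.120)
cache.record("www.example.org", "192.0.2.2", 0.045)
print(cache.best_address("www.example.org"))
```

A cache miss returns `None`, which in this sketch is the point where a real lookup against the Domain Name Server would happen.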
- NEED_REQUEST State
- The HTTP Request is what the
application sends to the remote HTTP server just after the
establishment of the connection. The request consists of an HTTP header
line, a set of HTTP Headers, and possibly a data object to be posted
to the server. The header line has the following format:
<METHOD> <URI> <HTTP-VERSION> CRLF
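
As an illustration of this layout, the sketch below assembles a request from its three parts. The function name `build_request` and its parameters are assumptions made for the example and are not part of the Library; the sketch also assumes the usual HTTP convention that an empty line separates the headers from any posted data.

```python
def build_request(method, uri, headers=None, body=b"", version="HTTP/1.0"):
    """Hypothetical helper: build the bytes of an HTTP request."""
    # Header line: <METHOD> <URI> <HTTP-VERSION> CRLF
    lines = [f"{method} {uri} {version}"]
    # One line per HTTP header.
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # An empty line ends the headers; an optional data object follows.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


print(build_request("GET", "/index.html", {"Accept": "text/html"}))
```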
- SENT_REQUEST State
- When the request has been sent, the module waits until a response
arrives from the server or, in case of an error, the connection times
out. As the module does not know whether the remote server
is an HTTP 0.9 server or an HTTP 1.0 server, it must look at the first part of
the response to figure out what version of HTTP is returned. The
reason is that HTTP 0.9 does not contain an HTTP header
line in the response. It simply starts to send the requested data
object as soon as the GET request is handled.
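
One simple way to make this distinction is to test whether the response begins with an HTTP version token. The sketch below assumes that check is sufficient; the function name `looks_like_status_line` is hypothetical, and an HTTP 0.9 data object that happens to start with the text "HTTP/" would be misclassified.

```python
def looks_like_status_line(first_bytes):
    """Hypothetical check on the first bytes read from the server."""
    # An HTTP 1.0 response starts with a header line such as "HTTP/1.0 200 OK".
    # An HTTP 0.9 response has no header line: the data object starts at once.
    return first_bytes.startswith(b"HTTP/")


print(looks_like_status_line(b"HTTP/1.0 200 OK\r\n"))
print(looks_like_status_line(b"<HTML><BODY>Hello</BODY></HTML>"))
```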
- NEED_ACCESS_AUTHORIZATION State
- If a 401 Unauthorized status
code is returned, the module asks the user for a user ID and a
password; see also the "HTTP Basic
Access Authorization Scheme". The connection is closed before the
user is asked for the user ID and password, so any new request
initiated upon a 401 status code
causes a new connection to be established. This is done in order to
avoid having the connection hanging around while the
application is waiting for user input.
- REDIRECTION State
- The remote server returns a redirection status code if the URI has
been moved, either temporarily or permanently, to another location, possibly on
another HTTP server or any other service, for example FTP or
gopher. The HTTP module supports both the temporary and the permanent
redirection codes returned from the server:
- 301 Moved
- The load procedure is called recursively on a 301 redirection
code. The new URI is passed back to the user as information via the Error and
Information module, and a new request is generated. The new request
can use any
access scheme accepted in a URI. An upper limit on the number of redirections
has been defined (10 by default) in order to avoid infinite loops.
- 302 Found
- The functionality is the same as for a 301 Moved return status. A
clever application can use the returned URI to change the document in
which the URI originates so that the URI points to the new location.
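
The recursive load with an upper limit can be sketched as below. This is an illustration under stated assumptions and not the Library's own procedure: `load`, `fetch`, `TooManyRedirections`, and `MAX_REDIRECTIONS` are hypothetical names, and `fetch` stands in for the whole request cycle by returning a status code paired with either the new URI or the data object.

```python
MAX_REDIRECTIONS = 10  # default upper limit mentioned in the text


class TooManyRedirections(Exception):
    """Raised when the redirection limit is exceeded (hypothetical name)."""


def load(uri, fetch, depth=0):
    """Load a URI, following 301 and 302 codes up to MAX_REDIRECTIONS times.

    fetch is a hypothetical callable returning (status, payload), where the
    payload is the new URI for a redirection and the data object otherwise.
    """
    if depth > MAX_REDIRECTIONS:
        raise TooManyRedirections(uri)
    status, payload = fetch(uri)
    if status in (301, 302):
        # Call the load procedure recursively on the new location.
        return load(payload, fetch, depth + 1)
    return payload


# Illustrative table of responses standing in for real servers.
responses = {"a": (301, "b"), "b": (302, "c"), "c": (200, "data object")}
print(load("a", responses.__getitem__))
```

A server that keeps redirecting to itself makes this sketch raise `TooManyRedirections` after the tenth redirection has been followed, instead of looping forever.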
- NO_DATA State
- When a return code indicates that no data object or resource
follows the HTTP headers, the HTTP module can terminate the request and
pass control back to the application.
- NEED_BODY State
- If a body is included in the response from the server, the module
must prepare to read the data from the network and direct it to the
destination set up by the application. This is done by setting up a
stream stack with the required conversions.
- GOT_DATA State
- When the data object has been passed through the stream stack, the
HTTP module terminates the request and hands control back to the
application.
- ERROR or FAILURE State
- If a fatal error occurs at any point in the request handling, the
request is aborted and the connection closed. All information about
the error is passed back to the application via the Error and
Information Module. As the HTTP protocol is stateless, all errors
between the client and the server are fatal. If the erroneous request
is to be repeated, the request starts in the initial state.
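
To tie the states together, the sketch below lists them as an enumeration and picks the state that follows SENT_REQUEST from the status code of the response. The state names are taken from the descriptions above; everything else is an assumption made for illustration, in particular the function name `after_response`, the use of 204 as the "no data" code, and the treatment of every other non-2xx code as an error. The Library's actual transitions are defined by its state diagram, not by this sketch.

```python
from enum import Enum, auto


class State(Enum):
    BEGIN = auto()
    NEED_CONNECTION = auto()
    NEED_REQUEST = auto()
    SENT_REQUEST = auto()
    NEED_ACCESS_AUTHORIZATION = auto()
    REDIRECTION = auto()
    NO_DATA = auto()
    NEED_BODY = auto()
    GOT_DATA = auto()
    ERROR = auto()


def after_response(status):
    """Assumed mapping from a status code to the state after SENT_REQUEST."""
    if status == 401:
        return State.NEED_ACCESS_AUTHORIZATION
    if status in (301, 302):
        return State.REDIRECTION
    if status == 204:
        # Assumption: 204 No Content is the code meaning "no data object follows".
        return State.NO_DATA
    if 200 <= status < 300:
        return State.NEED_BODY
    # Assumption: any other code is treated as a fatal error in this sketch.
    return State.ERROR


for code in (200, 204, 301, 401, 500):
    print(code, after_response(code).name)
```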
Henrik Frystyk, libwww@w3.org, December 1995