Chapter 2

HTTP server

The SUnet HTTP Server is a complete industrial-strength implementation of the HTTP 1.0 protocol. It is highly configurable and allows the writing of dynamic web pages that run inside the server without going through complicated and slow protocols like CGI or FastCGI.

2.1  Starting and configuring the server

All procedures described in this section are exported by the httpd structure.

The Web server is started by calling the httpd procedure, which takes one argument, an options value:

(httpd options)     --->     no return value         (procedure) 
This procedure starts the server. The options argument specifies various configuration parameters, explained below.

The server's basic loop is to wait on the port for a connection from an HTTP client. When it receives a connection, it reads in and parses the request into a special request data structure. Then the server forks a thread which binds the current I/O ports to the connection socket, and then hands off to the top-level request handler (which must be specified in the options). The request handler is responsible for actually serving the request -- it can be any arbitrary computation. Its output goes directly back to the HTTP client that sent the request.

Before calling the request handler to service the request, the HTTP server installs an error handler that fields any uncaught error, sends an error reply to the client, and aborts the request transaction. Hence any error caused by a request handler will be handled in a reasonable and robust fashion.

The options argument can be constructed through a number of procedures with names of the form with-.... Each of these procedures either creates a fresh options value or adds a configuration parameter to an old options argument. The configuration parameter value is always the first argument, the (old) options value the optional second one. Here they are:

(with-port port [options])     --->     options         (procedure) 
This specifies the port on which the server listens. Defaults to 80.

(with-root-directory root-directory [options])     --->     options         (procedure) 
This specifies the current directory of the server. Note that this is not the document root directory. Defaults to /.

(with-fqdn fqdn [options])     --->     options         (procedure) 
This specifies the fully-qualified domain name the server uses in automatically generated replies, or #f if the server should query DNS for the fully-qualified domain name. Defaults to #f.

(with-reported-port reported-port [options])     --->     options         (procedure) 
This specifies the port number the server uses in automatically generated replies or #f if the reported port is the same as the port the server is listening on. (This is useful if you're running the server through an accelerating proxy.) Defaults to #f.

(with-server-admin mail-address [options])     --->     options         (procedure) 
This specifies the email address of the server administrator the server uses in automatically generated replies. Defaults to #f.

(with-request-handler request-handler [options])     --->     options         (procedure) 
This specifies the request handler of the server to which the server delegates the actual work. More on that subject below in Section 2.5. This parameter must be specified.

(with-simultaneous-requests requests [options])     --->     options         (procedure) 
This specifies a limit on the number of simultaneous requests the server serves. If that limit is exceeded during operation, the server will hold off on new requests until the number of simultaneous requests has sunk below the limit again. If this parameter is #f, no limit is imposed. Defaults to #f.

(with-log-file log-file [options])     --->     options         (procedure) 
This specifies the name of a log file for the server where it writes Common Log Format logging information. It can also be a port in which case the information is logged to that port, or #f for no logging. Defaults to #f.

To allow rotation of log files, the server re-opens the log file whenever it receives a USR1 signal.

(with-syslog? syslog? [options])     --->     options         (procedure) 
This specifies whether the server will log information about incoming requests to the Unix syslog facility. Defaults to #t.

(with-resolve-ip? resolve-ip? [options])     --->     options         (procedure) 
This specifies whether the server writes the domain names rather than numerical IPs to the output log it produces. Defaults to #t.

To avoid parenthesitis, the make-httpd-options procedure eases the construction of the options argument:

(make-httpd-options transformer value ...)     --->     options         (procedure) 
This constructs an options value from an argument list of parameter transformers and parameter values. The arguments come in pairs, each an option transformer from the list above, and a value for that parameter. Make-httpd-options returns the resulting options value.

For example,

(httpd (make-httpd-options
         with-request-handler (rooted-file-handler "/usr/local/etc/httpd")
         with-root-directory "/usr/local/etc/httpd"))

starts the server on port 80 (the default) with /usr/local/etc/httpd as its root directory and lets it serve any file from this directory.
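Because each with-... procedure takes the old options value as its optional second argument, the same kind of configuration can also be written by nesting the calls. The following is a minimal sketch, assuming the httpd and httpd-file-directory-handlers structures are open; the port 8080 and the log file path are illustrative values, not defaults.

```scheme
;; Nested form: each with-... call wraps the options value built so far.
(define nested-options
  (with-port 8080
    (with-log-file "/var/log/sunet-access.log"
      (with-request-handler
        (rooted-file-handler "/usr/local/etc/httpd")
        (with-root-directory "/usr/local/etc/httpd")))))

;; Flat form: transformer/value pairs passed to make-httpd-options.
(define flat-options
  (make-httpd-options
    with-port            8080
    with-log-file        "/var/log/sunet-access.log"
    with-request-handler (rooted-file-handler "/usr/local/etc/httpd")
    with-root-directory  "/usr/local/etc/httpd"))

;; Either options value can be handed to httpd.
(httpd flat-options)
```

Both forms should describe the same configuration; the flat form simply avoids the nesting.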

2.2  Requests

Request handlers operate on requests which contain the information needed to generate a page. The relevant procedures to dissect requests are defined in the httpd-requests structure:

(request? value)     --->     boolean         (procedure) 
(request-method request)     --->     string         (procedure) 
(request-uri request)     --->     string         (procedure) 
(request-url request)     --->     url         (procedure) 
(request-version request)     --->     pair         (procedure) 
(request-headers request)     --->     list         (procedure) 
(request-socket request)     --->     socket         (procedure) 
These procedures inspect request values. Request? is a predicate for requests. Request-method extracts the method of the HTTP request; it's a string such as "GET" or "PUT". Request-uri returns the escaped URI string as read from the request line. Request-url returns an HTTP URL value (see the description of the url structure in 4). Request-version returns a (major . minor) integer pair representing the version specified in the HTTP request. Request-headers returns an association list of header field names and their values, each value represented by a list of strings, one for each line. Request-socket returns the socket connected to the client (see footnote 1).

2.3  Responses

A request handler must return a response value representing the content to be sent to the client. The machinery presented here for constructing responses lives in the httpd-responses structure.

(make-response status-code maybe-message seconds mime extras body)     --->     response         (procedure) 
This procedure constructs a response value. Status-code is an HTTP status code (more on that below). Maybe-message is a message elaborating on the circumstances of the status code; it can also be #f, meaning that the server should send the default message associated with the status code. Seconds is a natural number indicating the time the content was created, typically the value of (time). Mime is a string indicating the MIME type of the response (such as "text/html" or "application/octet-stream"). Extras is an association list with extra headers to be added to the response; its elements are pairs, each of which consists of a symbol representing the field name and a string representing the field value. Body represents the body of the response; more on that below.

(make-redirect-response location)     --->     response         (procedure) 
This is a helper procedure for constructing HTTP redirections. The server will serve the new file indicated by location. Location must be URI-encoded and begin with a slash.

(make-error-response status-code request [message] extras ...)     --->     response         (procedure) 
This is a helper procedure for constructing error responses. Status-code is the status code of the response (see below). Request is the request that led to the error. Message is an optional string containing an error message written in HTML, and extras are further optional arguments containing further message lines to be added to the web page that's generated.

Make-error-response constructs a response value which generates a web page containing a short explanatory message for the error at hand.


Name                  Code  Default message
ok                    200   OK
created               201   Created
accepted              202   Accepted
prov-info             203   Provisional Information
no-content            204   No Content
mult-choice           300   Multiple Choices
moved-perm            301   Moved Permanently
moved-temp            302   Moved Temporarily
method                303   Method (obsolete)
not-mod               304   Not Modified
bad-request           400   Bad Request
unauthorized          401   Unauthorized
payment-req           402   Payment Required
forbidden             403   Forbidden
not-found             404   Not Found
method-not-allowed    405   Method Not Allowed
none-acceptable       406   None Acceptable
proxy-auth-required   407   Proxy Authentication Required
timeout               408   Request Timeout
conflict              409   Conflict
gone                  410   Gone
internal-error        500   Internal Server Error
not-implemented       501   Not Implemented
bad-gateway           502   Bad Gateway
service-unavailable   503   Service Unavailable
gateway-timeout       504   Gateway Timeout
Table 1:  HTTP status codes


(status-code <name>)     --->     status-code         (syntax) 
(name->status-code symbol)     --->     status-code         (procedure) 
(status-code-number status-code)     --->     integer         (procedure) 
(status-code-message status-code)     --->     string         (procedure) 
The status-code syntax returns a status code where <name> is the name from Table 1. Name->status-code also returns a status code for a name represented as a symbol. For a given status code, status-code-number extracts its number, and status-code-message extracts its associated default message.
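As a brief illustration of these four operations, here is a sketch that assumes the httpd-responses structure is open. The variable names are made up for the example, and the values in the comments are the ones Table 1 lists for not-found.

```scheme
;; The syntax form takes the bare name from Table 1 ...
(define not-found-code (status-code not-found))

;; ... while the procedure takes the same name as a quoted symbol.
(define same-code (name->status-code 'not-found))

(status-code-number not-found-code)    ; 404 according to Table 1
(status-code-message not-found-code)   ; the default message, "Not Found" in Table 1
```

The procedure form is the one to use when the name is only known at run time, for instance when it is read from a configuration file.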

2.4  Response Bodies

A response body represents the body of an HTTP response. There are several types of response bodies, depending on the requirements on content generation.

(make-writer-body proc)     --->     body         (procedure) 
This constructs a response body from a writer -- a procedure that prints the page contents to a port. The proc argument must be a procedure accepting an output port (to which proc prints the body) and the options value passed to the httpd invocation.

(make-reader-writer-body proc)     --->     body         (procedure) 
This constructs a response body from a reader/writer -- a procedure that prints the page contents to a port, possibly after reading input from the socket of the HTTP connection. The proc argument must be a procedure accepting three arguments: an input port (associated with the HTTP connection socket), an output port (to which proc prints the body), and the options value passed to the httpd invocation.
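To show how a body fits into a response, the following sketch builds a small HTML response from a writer. It assumes the httpd-responses structure is open and that scsh's (time) is available; write-greeting and greeting-response are hypothetical names.

```scheme
;; A writer receives the output port and the options value given to httpd.
;; This one ignores the options and prints a fixed page.
(define (write-greeting port options)
  (display "<html><body><p>Hello from SUnet.</p></body></html>" port)
  (newline port))

(define greeting-response
  (make-response (status-code ok)   ; status code, see Table 1
                 #f                 ; #f selects the default message
                 (time)             ; creation time in seconds
                 "text/html"        ; MIME type of the body
                 '()                ; no extra headers
                 (make-writer-body write-greeting)))
```

Since the body is a procedure, the page text is presumably produced only when the server calls the writer, not when the response value is constructed.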

2.5  Request Handlers

A request handler generates the actual content for a request; request handlers form a simple algebra and may be combined and composed in various ways.

A request handler is a procedure of two arguments like this:

(request-handler path req)     --->     response         (procedure) 
Req is a request. The path argument is the URL's path, parsed and split at slashes into a string list. For example, if the Web client dereferences the URL
http://clark.lcs.mit.edu:8001/h/shivers/code/web.tar.gz

then the server would pass the following path to the top-level handler:

("h" "shivers" "code" "web.tar.gz")

The path argument's pre-parsed representation as a string list makes it easy for the request handler to implement recursive dispatch on URL paths.

The request handler must return an HTTP response.
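A complete, if trivial, request handler might look like the following sketch. It assumes the httpd, httpd-requests, and httpd-responses structures are open; echo-path-handler is a hypothetical name, and port 8080 is an arbitrary choice.

```scheme
;; Answers GET requests by printing the parsed path, one element per line.
;; Any other method gets an error response.
(define (echo-path-handler path req)
  (if (string=? (request-method req) "GET")
      (make-response
        (status-code ok) #f (time) "text/plain" '()
        (make-writer-body
          (lambda (port options)
            (for-each (lambda (element)
                        (display element port)
                        (newline port))
                      path))))
      (make-error-response (status-code method-not-allowed) req
                           "Only GET is supported here.")))

;; Install it as the top-level request handler.
(httpd (make-httpd-options
         with-port            8080
         with-request-handler echo-path-handler))
```

With this handler installed, a GET request for /h/shivers/code/web.tar.gz should produce the four path elements from the example above, each on its own line.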

2.5.1  Basic Request Handlers

The web server comes with a useful toolbox of basic request handlers that can be used and built upon. The following procedures are exported by the httpd-basic-handlers structure:

null-request-handler         request-handler 
This request handler always generates a not-found error response, no matter what the request is.

(make-predicate-handler predicate handler default-handler)     --->     request-handler         (procedure) 
The request handler returned by this procedure first calls predicate on its path and request; it then acts like handler if the predicate returned a true value, and like default-handler if the predicate returned #f.

(make-host-name-handler hostname handler default-handler)     --->     request-handler         (procedure) 
The request handler returned by this procedure compares the host name specified in the request with hostname: if they match, it acts like handler, otherwise, it acts like default-handler.

(make-path-predicate-handler predicate handler default-handler)     --->     request-handler         (procedure) 
The request handler returned by this procedure first calls predicate on its path; it then acts like handler if the predicate returned a true value, and like default-handler if the predicate returned #f.

(make-path-prefix-handler path-prefix handler default-handler)     --->     request-handler         (procedure) 
This constructs a request handler that checks whether path-prefix (a string) is the first element of the requested path. If so, it calls handler on the rest of the path and the original request. Otherwise, it acts like default-handler.

(alist-path-dispatcher handler-alist default-handler)     --->     request-handler         (procedure) 
This procedure takes as arguments an alist mapping strings to request handlers, and a default request handler, and returns a handler that dispatches on its path argument. When the new request handler is applied to a path
("foo" "bar" "baz")

it uses the first element of the path -- foo -- to index into the alist. If it finds an associated request handler in the alist, it hands the request off to that handler, passing it the tail of the path, in this case

("bar" "baz")

On the other hand, if the path is empty, or the alist search does not yield a hit, we hand off to the default request handler, passing it the entire original path,

("foo" "bar" "baz")

This procedure is how you say: "If the first element of the URL's path is 'foo', do X; if it's 'bar', do Y; otherwise, do Z." The slash-delimited URI path structure implies an associated tree of names. The request-handler system and the alist dispatcher allow you to procedurally define the server's response to any arbitrary subtree of the path space.
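The other combinators in this section nest in the same way. As a sketch, the following handler serves a documentation tree for one host name and refuses everything else. The host name, the directory, and the name site-handler are made up for the example; rooted-file-handler is described in Section 2.5.2.

```scheme
(define site-handler
  (make-host-name-handler
    "www.example.org"
    ;; For this host, paths starting with "docs" are served from a
    ;; directory tree; the handler sees the path without the "docs" prefix.
    (make-path-prefix-handler
      "docs"
      (rooted-file-handler "/srv/www/docs")
      null-request-handler)
    ;; Requests naming any other host get a not-found response.
    null-request-handler))
```

Because every combinator returns an ordinary request handler, site-handler can itself be used as an alist entry or as a default handler elsewhere.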

Example: A typical top-level request handler is

(define ph
  (alist-path-dispatcher
      `(("h"       . ,(home-dir-handler "public_html"))
        ("cgi-bin" . ,(cgi-handler "/usr/local/etc/httpd/cgi-bin"))
        ("seval"   . ,seval-handler))
      (rooted-file-handler "/usr/local/etc/httpd/htdocs")))

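This means: a path such as ("h" "shivers" "code" "web.tar.gz") is handed to the home-directory handler as ("shivers" "code" "web.tar.gz"); a path beginning with "cgi-bin" is handed, minus that first element, to the CGI handler; a path beginning with "seval" is handed, minus that first element, to seval-handler; and any other path is passed whole to the rooted file handler, which serves the corresponding file below /usr/local/etc/httpd/htdocs.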

2.5.2  Static Content Request Handlers

The request handlers described in this section are for serving static content off directory trees in the file system. They live in the httpd-file-directory-handlers structure.

The request handlers in this section eventually call an internal procedure named file-serve for serving files which implements a simple directory-generation service using the following rules:

The httpd-file-directory-handlers all take an options value as an argument, similar to the options for httpd itself.

The options argument can be constructed through a number of procedures with names of the form with-.... Each of these procedures either creates a fresh options value or adds a configuration parameter to an old options argument. The configuration parameter value is always the first argument, the (old) options value the optional second one. Here they are:

(with-file-name->content-type proc [options])     --->     options         (procedure) 
This specifies a procedure for determining the MIME content type ("text/html", "application/octet-stream", etc.) from a file name. Proc takes a file name as an argument and must return a string. (This is relevant in directory listings.) The default is a procedure able to handle the more common file extensions.

(with-file-name->content-encoding proc [options])     --->     options         (procedure) 
This specifies a procedure for determining the MIME content encoding (if the file is compressed, gzipped, etc.) from a file name. (This is relevant in directory listings.) Proc takes a file name as an argument and must return two values: the equivalent, unencoded file name (i.e., without the trailing .Z or .gz) and a string representing the content encoding.

(with-file-name->icon-url proc [options])     --->     options         (procedure) 
This specifies a procedure for determining the icon to be displayed next to a file name in a directory listing. Proc takes a file name as an argument and must return a URL for the corresponding icon or #f.

(with-blank-icon-url file-name-or-#f [options])     --->     options         (procedure) 
This specifies a file name (or its absence) for the special icon that must be as wide as the icons returned by the previous procedure but that is blank.

(with-back-icon-url file-name-or-#f [options])     --->     options         (procedure) 
This specifies a file name (or its absence) for the special icon that is displayed next to the "parent directory" link in directory listings.

(with-unknown-icon-url file-name-or-#f [options])     --->     options         (procedure) 
This specifies a file name (or its absence) for the special icon that is displayed next to the unknown entries in directory listings.

The make-file-directory-options procedure eases the construction of the options argument:

(make-file-directory-options transformer value ...)     --->     options         (procedure) 
This constructs an options value from an argument list of parameter transformers and parameter values. The arguments come in pairs, each an option transformer from the list above, and a value for that parameter. Make-file-directory-options returns the resulting options value.

Here are the procedures for constructing static content request handlers:

(rooted-file-handler root [options])     --->     request-handler         (procedure) 
This returns a request handler that serves files from a particular root in the file system. Only the GET operation is provided. The path argument passed to the handler is converted into a filename, and appended to root. The file name is checked for .. components, and the transaction is aborted if it contains any. Otherwise, the file is served to the client.

(rooted-file-or-directory-handler root [options])     --->     request-handler         (procedure) 
Like rooted-file-handler, but this handler also serves directory indices for directories without index.html.

(home-dir-handler subdir [options])     --->     request-handler         (procedure) 
This procedure builds a request handler that does basic file serving out of home directories. If the resulting request-handler is passed a path of the form (user . file-path), then it serves the file subdir/file-path inside the user's home directory.

The request handler only handles GET requests; the filename is not allowed to contain .. elements.

(tilde-home-dir-handler subdir default-request-handler [options])     --->     request-handler         (procedure) 
This returns a request handler that examines the car of the path. If it is a string beginning with a tilde, e.g., "~ziggy", then the string is taken to mean a home directory, and the request is served similarly to a home-dir-handler request handler. Otherwise, the request is passed off in its entirety to the default-request-handler.
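The pieces of this section can be combined as in the following sketch, which assumes the httpd-file-directory-handlers structure is open. The names my-content-type, file-options, and static-handler are hypothetical, and the extension table is deliberately tiny. File-name-extension is the scsh procedure that returns the extension including the dot.

```scheme
;; A content-type procedure that knows only a few extensions.
;; It replaces the default procedure rather than extending it.
(define (my-content-type file-name)
  (let ((ext (file-name-extension file-name)))
    (cond ((string=? ext ".html") "text/html")
          ((string=? ext ".txt")  "text/plain")
          ((string=? ext ".png")  "image/png")
          (else "application/octet-stream"))))

(define file-options
  (make-file-directory-options
    with-file-name->content-type my-content-type))

;; Paths starting with ~user are served from that user's public_html;
;; everything else comes from the document tree, with directory indices.
(define static-handler
  (tilde-home-dir-handler
    "public_html"
    (rooted-file-or-directory-handler "/usr/local/etc/httpd/htdocs"
                                      file-options)
    file-options))
```

Passing the same options value to both handlers keeps the content types consistent across the two trees.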

2.6  CGI Server

The procedure(s) described here live in the httpd-cgi-handlers structure.

(cgi-handler bin-dir [cgi-bin-path])     --->     request-handler         (procedure) 
Returns a request handler for CGI scripts located in bin-dir. Cgi-bin-path specifies the value of the PATH variable of the environment the CGI scripts run in. It defaults to
/bin:/usr/bin:/usr/ucb:/usr/bsd:/usr/local/bin

The CGI scripts are called as specified by CGI/1.1 (see footnote 2).

Note that the CGI handler looks at the name of the CGI script to determine how it should be handled:

2.7  Scheme-Evaluating Request Handlers

The httpd-seval-handlers structure contains a handler which demonstrates how to safely evaluate Scheme code uploaded from the client to the server.

seval-handler         request-handler 
This request handler is suitable for receiving code entered into an HTML text form. The Scheme code is POSTed to the server (from a form). The code should be URI-encoded in the URL as program=<stuff>. Stuff must be a (URI-encoded) Scheme expression, which the handler evaluates in a separate subprocess. (It waits for 10 seconds for a result, then kills the subprocess.) The handler then prints the return values of the Scheme code.

The following structures define environments that are R5RS without features that could examine or affect the file system. You can also use them as models of how to execute code in other protected environments in Scheme 48.

2.7.1  The loser structure

The loser package exports only one procedure:

(loser name)     --->     nothing         (procedure) 
Raises an error like "Illegal call name".

2.7.2  The toothless structure

The toothless structure contains everything of R5RS, except that the following procedures cause an error if called:

2.7.3  The toothless-eval structure

(eval-safely expression)     --->     any result         (procedure) 
Creates a brand-new structure, imports the toothless structure, and evaluates expression in it. When the evaluation is done, the environment is thrown away, so expression's side-effects don't persist from one eval-safely call to the next. If expression raises an error exception, eval-safely returns #f.
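A few calls illustrate the behaviour described above. This is a sketch that assumes the toothless-eval structure is open; the results in the comments follow from the description, not from a recorded session.

```scheme
(eval-safely '(+ 1 2))                               ; 3
(eval-safely '(map (lambda (x) (* x x)) '(1 2 3)))   ; (1 4 9)

;; Taking the car of the empty list raises an error, which eval-safely
;; turns into #f instead of propagating it.
(eval-safely '(car '()))                             ; #f
```

Note that a caller cannot tell an error apart from an expression that legitimately evaluates to #f, such as (eval-safely '(not #t)).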

2.8  Writing Request Handlers

2.8.1  Parsing HTML Forms

In HTML forms, field data are turned into a single string, of the form <name>=<val>&<name>=<val>.... The parse-html-forms structure provides simple functionality to parse these strings.

(parse-html-form-query string)     --->     alist         (procedure) 
This parses "foo=x&bar=y" into (("foo" . "x") ("bar" . "y")). Substrings are plus-decoded (i.e., plus characters are turned into spaces) and then URI-decoded.

This implementation is slightly sleazy as it will successfully parse a string like "a&b=c&d=f" into (("a&b" . "c") ("d" . "f")) without a complaint.
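Since the result is an ordinary association list with string keys, individual fields can be looked up with assoc. The following sketch assumes the parse-html-forms structure is open; form-field is a hypothetical helper, not part of SUnet.

```scheme
(define fields
  (parse-html-form-query "name=Olin+Shivers&lang=scheme"))
;; fields is (("name" . "Olin Shivers") ("lang" . "scheme")):
;; the plus sign has been decoded into a space.

;; Returns the value of the named field, or #f if it is absent.
(define (form-field name fields)
  (cond ((assoc name fields) => cdr)
        (else #f)))

(form-field "name" fields)      ; "Olin Shivers"
(form-field "missing" fields)   ; #f
```

Assoc compares keys with equal?, which is what makes the lookup work for string keys.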

2.9  SSL encryption with Apache

Network traffic with an HTTP server is usually encrypted and protected from manipulation using the cryptographic algorithms provided by an implementation of the Secure Sockets Layer, SSL for short. SUnet does not have support for SSL yet. However, an Apache web server with SSL support can be configured as a proxy. In this setup the Apache web server accepts encrypted requests and forwards them to a SUnet web server running locally. This section describes how to set up Apache as an encrypting proxy, assuming the reader has basic knowledge about Apache and its configuration directives.

The following excerpt shows a minimalist SSL virtual host that forwards requests to a SUnet server.

<VirtualHost 134.2.12.82:443>
  DocumentRoot "/www/some-domain/htdocs"
  ServerName www.some-domain.de
  ServerAdmin admin@some-domain.de
  ErrorLog /www/some-domain/logs/error_log

  ProxyRequests off
  ProxyPass / http://localhost:8080/
  ProxyPassReverse / http://localhost:8080/

  SSLEngine on
  SSLRequireSSL

  SSLCertificateFile /www/some-domain/cert/some-domain.cert
  SSLCertificateKeyFile /www/some-domain/cert/some-domain.key
</VirtualHost>

First, a virtual host is added to Apache's configuration file. This virtual host listens for incoming connections on port 443, which is the standard port for encrypted HTTP traffic. SSLRequireSSL ensures that the server accepts encrypted connections only.

In terms of the Apache documentation, the web server acts as a so-called reverse proxy. The option ProxyRequests has a misleading name. Setting this option to off only turns off Apache's facility to act as a forward proxy and has no effect on the configuration directives for reverse proxies. Actually, turning on ProxyRequests is dangerous, because this turns Apache into a proxy server that can be used from anywhere to access any site that is accessible to the Apache server.

In this setting, all requests get forwarded to a SUnet web server which listens for incoming connections on localhost port 8080 only; thus, it is not reachable from a remote machine. Apache forwards all requests to the host and port specified by the ProxyPass directive. ProxyPassReverse specifies how Location header fields of HTTP redirect messages sent by the SUnet server are translated.
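On the SUnet side, a matching configuration might look like the following sketch. It uses only the options from Section 2.1; the choice of rooted-file-handler and its directory are assumptions made for the example. The options documented in Section 2.1 do not include a bind address, so restricting the server to localhost is assumed to be arranged by other means.

```scheme
(httpd (make-httpd-options
         with-port            8080                  ; the port named in ProxyPass
         with-fqdn            "www.some-domain.de"  ; same as Apache's ServerName
         with-reported-port   443                   ; the port clients actually connect to
         with-request-handler
         (rooted-file-handler "/www/some-domain/htdocs")))
```

Setting with-fqdn and with-reported-port makes automatically generated replies name the public host and port rather than localhost:8080.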


1 Request handlers should not perform I/O on the request record's socket. Request handlers are frequently called recursively, and doing I/O directly to the socket might bypass a filtering or other processing step interposed on the current I/O ports by some superior request handler.

2 see http://hoohoo.ncsa.uiuc.edu/cgi/interface.html for a sort of specification.