Brought to you by the
Internet Development Technologies

A Specification for Writing Internet Server Applications

A High-Performance Alternative to CGI Executables

Proposal by Process Software Corporation
ramanathan@process.com

November 1995

Introduction to CGI

CGI (Common Gateway Interface) is an interface for running external programs or gateways under an information server. Currently, the only supported information servers are HTTP servers. What we refer to as gateways are really programs that handle information requests and return the appropriate document or generate a document on the fly. With CGI, your server can serve information which is not in a form readable by the client (such as an SQL database), and act as a gateway between the two to produce something which the client can use.

With the ever-expanding range of services being made available through the Web, more and more CGI applications will be developed. This calls for a closer look at how servers currently execute CGI applications and at possible ways of improving performance.

The way a server responds to a request for CGI execution from a client browser is to create a new process and pass the data received from the client browser through environment variables and stdin, and to expect the results gathered by the CGI application on stdout of the newly created process. The server creates as many processes as the number of requests received, one per request.

For more information on the CGI specification, please refer to http://hoohoo.ncsa.uiuc.edu/cgi/.
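The exchange described above can be illustrated with a minimal sketch of a CGI program in C. The function names cgi_write_response() and cgi_main() are hypothetical; a real executable would call the equivalent of cgi_main() from main(), reading QUERY_STRING from the environment and writing the response to stdout.

```c
#include <stdio.h>
#include <stdlib.h>

/* Writes a complete CGI response for the given query string to `out`.
   Returns 0 on success, -1 if writing failed. */
int cgi_write_response(FILE *out, const char *query)
{
    if (query == NULL)
        query = "";
    /* A CGI response is a header block, a blank line, then the body. */
    if (fprintf(out, "Content-Type: text/plain\r\n\r\n") < 0)
        return -1;
    if (fprintf(out, "Query string: %s\n", query) < 0)
        return -1;
    return 0;
}

/* What a CGI executable would do on each request: the server has placed
   the request data in the environment and collects whatever is written
   to stdout. (Hypothetical name; call it from main() in a real program.) */
int cgi_main(void)
{
    return cgi_write_response(stdout, getenv("QUERY_STRING"));
}
```

Every request pays for a fresh process just to run these few lines, which is the overhead the rest of this specification aims to remove.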

Drawbacks with Current Implementations

As explained above, the existing HTTP servers create a separate process for each request received. The more concurrent requests, the more concurrent processes created by the server. Creating a process for every request is time-consuming and expensive in terms of server RAM, and can be restrictive as far as sharing the resources of the server application itself.

One way to circumvent creating processes is to convert the current CGI executable into a DLL that the server loads the first time a request is received for that DLL. The DLL then stays in memory, ready to service other requests until the server decides that it's no longer needed.

Note: Even though this specification talks specifically about writing Internet server applications for the Microsoft® Windows NT(TM) operating system, the same specification could be used to build a sharable image for any operating system, provided the operating system supports loadable shared images. Process Software has built an OpenVMS loadable image based on this specification for a Web server running on OpenVMS.

Advantages of DLLs over Executables

In the Microsoft Windows® operating system, dynamic linking provides a way for a process to call a function that is not part of its executable code. The executable code for the function is located in a dynamic-link library (DLL), which contains one or more functions that are compiled, linked, and stored separately from the processes using them. For example, the Microsoft Win32® application programming interface (API) is implemented as a set of dynamic-link libraries, so any process using the Win32 API uses dynamic linking.

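There are two methods for calling a function in a DLL:

Load-time dynamic linking, in which the executable is linked against the DLL's import library and the operating system loads the DLL when the process starts.

Run-time dynamic linking, in which the process loads the DLL explicitly with LoadLibrary() and locates its functions with GetProcAddress().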

This specification is aimed at the latter category of DLLs. These DLLs (also called Internet Server Applications or ISAs) are loaded at run time by the HTTP server. They are called at the common entry points GetExtensionVersion() and HttpExtensionProc(). This interaction is explained in detail in the following chapters.

Unlike .EXE type script executables, the ISA DLLs are loaded in the same address space as the HTTP server. This means that all the resources that are made available by the HTTP server process are also available to the ISA DLLs. There is minimal overhead associated with executing these applications, as there is no process-creation overhead for each request. Our preliminary benchmark programs show that loading ISA DLLs in-process can perform considerably faster than loading them into a new process. Additionally, in-process applications scale much better under heavy load.

Since an HTTP server knows the list of ISA DLLs that are already in memory, it is possible for the server to unload the ISA DLLs that have not been accessed in a configurable amount of time. By preloading an ISA DLL, the server could speed up even the first request for that ISA. Unloading ISA DLLs that have not been used for some time will free up system resources.

The following picture shows how an ISA DLL interacts with an HTTP server, compared with the interaction of script executables with an HTTP server.

[Figure: ISA Architecture and CGI Architecture]

As you can see in the picture above, all of the ISA DLLs reside in the same process as the HTTP server, while the conventional CGI apps run in different processes. Interaction between an HTTP server and ISA DLL is via extension control blocks (ECBs). The ECB is explained in detail in the following chapter. As the picture above shows, multiple ISA DLLs can co-exist in the same process as the server. In the case of conventional CGI executables, the server creates a separate process for each request and communicates with the created process via environment variables and stdin/stdout.

ISA DLLs need to be multithread-safe since multiple requests will be received simultaneously. For more information on how to write multithread-safe DLLs, please refer to the articles on writing multithreaded applications on the Microsoft Development Library CD-ROM, or any of the books on Win32 programming.

Similarly, for information on thread-safe DLLs and the scope of usage of C-runtime routines in a DLL, please see the articles on sharing data in a DLL on the Microsoft Development Library CD-ROM.

Detailed Interaction Between the HTTP Server and ISA

The HTTP server communicates with the ISA via a data structure called an extension control block (ECB). A client uses an ISA just like its CGI counterpart, except that rather than referencing "http://scripts/foo.exe?Param1+Param2" in the CGI instance, the following form would be used:

"http://scripts/foo.dll?Param1+Param2"

This means that in addition to identifying the files with extensions .EXE and .BAT as CGI executables, the server will also identify a file with a .DLL extension as a script to execute. When the server loads the .DLL, it calls the DLL at the entry point GetExtensionVersion() to get the version number of the specification on which the extension is based, and a short human-readable description for server administrators. For every client request, the HttpExtensionProc() entry point is called. The extension receives the commonly needed information such as the query string, path info, method name, and the translated path. Subsequent sections of this document explain in detail how to retrieve the data sent by the client browser. The way the server communicates with the extension DLL is via a data structure called the EXTENSION_CONTROL_BLOCK.

The extension control block contains the following fields:

[Table: extension control block fields]

Mandatory Entry Points for Internet Web Server Applications

All the DLLs written as Internet Web server applications must export two entry points: GetExtensionVersion() and HttpExtensionProc().

When the HTTP server loads an ISA for the first time, after loading the DLL, it calls the GetExtensionVersion() function. If this function does not exist, the call to load the ISA will fail. The recommended implementation of this function is:

BOOL WINAPI GetExtensionVersion( HSE_VERSION_INFO  *pVer )
{
    pVer->dwExtensionVersion = MAKELONG( HSE_VERSION_MINOR,
                                         HSE_VERSION_MAJOR );
    lstrcpyn( pVer->lpszExtensionDesc,
              "This is a sample Web Server Application",
               HSE_MAX_EXT_DLL_NAME_LEN );
    return TRUE;
}
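As an illustration of how a server might consume this entry point, the following sketch calls an extension's version function through a pointer and unpacks the result. It is not taken from any server implementation: the typedefs and macros are stand-ins for the Win32 and HttpExt.h definitions so that the code compiles without the Windows headers, and server_accepts_extension() together with its policy of accepting only a matching major version is an assumption.

```c
#include <stdint.h>
#include <string.h>

/* Stand-ins for Win32 / HttpExt.h definitions (assumptions for this sketch). */
#define WINAPI
#define TRUE 1
typedef int      BOOL;
typedef uint32_t DWORD;
typedef char     CHAR;
#define HSE_VERSION_MAJOR        1
#define HSE_VERSION_MINOR        0
#define HSE_MAX_EXT_DLL_NAME_LEN 256
/* Same packing as the Win32 MAKELONG, LOWORD and HIWORD macros. */
#define MAKELONG(lo, hi) ((DWORD)(((DWORD)(lo) & 0xFFFFu) | (((DWORD)(hi) & 0xFFFFu) << 16)))
#define LOWORD(l)        ((DWORD)(l) & 0xFFFFu)
#define HIWORD(l)        (((DWORD)(l) >> 16) & 0xFFFFu)

typedef struct _HSE_VERSION_INFO {
    DWORD dwExtensionVersion;
    CHAR  lpszExtensionDesc[HSE_MAX_EXT_DLL_NAME_LEN];
} HSE_VERSION_INFO;

typedef BOOL (WINAPI *PFN_GETEXTENSIONVERSION)(HSE_VERSION_INFO *pVer);

/* Extension side: mirrors the recommended implementation, with strncpy
   standing in for lstrcpyn. */
BOOL WINAPI SampleGetExtensionVersion(HSE_VERSION_INFO *pVer)
{
    pVer->dwExtensionVersion = MAKELONG(HSE_VERSION_MINOR, HSE_VERSION_MAJOR);
    strncpy(pVer->lpszExtensionDesc, "Sample ISA", HSE_MAX_EXT_DLL_NAME_LEN - 1);
    pVer->lpszExtensionDesc[HSE_MAX_EXT_DLL_NAME_LEN - 1] = '\0';
    return TRUE;
}

/* Server side (hypothetical): returns 1 if the extension reports the
   major version this server was built for, 0 otherwise. */
int server_accepts_extension(PFN_GETEXTENSIONVERSION getVersion,
                             HSE_VERSION_INFO *info)
{
    memset(info, 0, sizeof *info);
    if (getVersion == NULL || !getVersion(info))
        return 0;
    /* Guarantee termination even if the extension filled the whole buffer. */
    info->lpszExtensionDesc[HSE_MAX_EXT_DLL_NAME_LEN - 1] = '\0';
    return HIWORD(info->dwExtensionVersion) == (DWORD)HSE_VERSION_MAJOR;
}
```

Because the minor version goes in the low word and the major version in the high word, a server can compare the two halves independently.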

The second required entry point is:

DWORD WINAPI HttpExtensionProc( LPEXTENSION_CONTROL_BLOCK  lpEcb );

This entry point is similar to a main() function. It uses the callback functions to read client data and decide on the action to be taken. Before returning to the server, a properly formatted response must be sent to the client via either the WriteClient() or the ServerSupportFunction() API.

Return Values

HSE_STATUS_SUCCESS

The ISA has finished processing and the server can disconnect and free up allocated resources.

HSE_STATUS_SUCCESS_AND_KEEP_CONN

The ISA has finished processing and the server should wait for the next HTTP request if the client supports persistent connections. The application should return this value only if it was able to send the correct Content-Length header and a Connection: keep-alive header to the client. The server is not required to keep the session open.

HSE_STATUS_PENDING

The ISA has queued the request for processing and will notify the server when it has finished. See HSE_REQ_DONE_WITH_SESSION under ServerSupportFunction().

HSE_STATUS_ERROR

The ISA has encountered an error while processing the request, and the server can disconnect and free up allocated resources.
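A minimal sketch of an HttpExtensionProc() that uses these return values follows. It sends the response header through ServerSupportFunction() and the body through WriteClient(), returning HSE_STATUS_ERROR if either call fails. The typedefs and the trimmed EXTENSION_CONTROL_BLOCK are stand-ins for the definitions in HttpExt.h, and FakeConn, FakeWriteClient() and FakeServerSupportFunction() are a hypothetical in-memory server used only to exercise the code outside a real HTTP server.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Trimmed stand-ins for Win32 / HttpExt.h definitions (assumptions). */
#define WINAPI
#define TRUE  1
#define FALSE 0
typedef int       BOOL;
typedef uint32_t  DWORD;
typedef DWORD    *LPDWORD;
typedef void     *LPVOID;
typedef void     *HCONN;
typedef char     *LPSTR;
#define HSE_STATUS_SUCCESS           1
#define HSE_STATUS_ERROR             4
#define HSE_REQ_SEND_RESPONSE_HEADER 3

typedef struct _EXTENSION_CONTROL_BLOCK {   /* only the fields used here */
    HCONN ConnID;
    LPSTR lpszQueryString;
    BOOL (WINAPI *WriteClient)(HCONN, LPVOID, LPDWORD, DWORD);
    BOOL (WINAPI *ServerSupportFunction)(HCONN, DWORD, LPVOID, LPDWORD, LPDWORD);
} EXTENSION_CONTROL_BLOCK;

DWORD WINAPI HttpExtensionProc(EXTENSION_CONTROL_BLOCK *pECB)
{
    char  headers[] = "Content-Type: text/plain\r\n\r\n";
    DWORD headerLen = (DWORD)strlen(headers);
    char  body[256];
    DWORD bodyLen;
    int   n;

    /* NULL status string: the server supplies its default "200 Ok".
       The extra headers travel in the lpdwDataType argument. */
    if (!pECB->ServerSupportFunction(pECB->ConnID, HSE_REQ_SEND_RESPONSE_HEADER,
                                     NULL, &headerLen, (LPDWORD)(LPVOID)headers))
        return HSE_STATUS_ERROR;

    n = snprintf(body, sizeof body, "Query string: %s\r\n",
                 pECB->lpszQueryString ? pECB->lpszQueryString : "");
    if (n < 0)
        return HSE_STATUS_ERROR;
    bodyLen = (n < (int)sizeof body) ? (DWORD)n : (DWORD)(sizeof body - 1);

    if (!pECB->WriteClient(pECB->ConnID, body, &bodyLen, 0))
        return HSE_STATUS_ERROR;
    return HSE_STATUS_SUCCESS;
}

/* Hypothetical in-memory server: collects everything sent to the client. */
typedef struct { char out[512]; size_t used; } FakeConn;

static BOOL fake_append(FakeConn *c, const char *data, size_t len)
{
    if (len >= sizeof c->out - c->used)
        return FALSE;
    memcpy(c->out + c->used, data, len);
    c->used += len;
    c->out[c->used] = '\0';
    return TRUE;
}

BOOL WINAPI FakeWriteClient(HCONN h, LPVOID buf, LPDWORD size, DWORD reserved)
{
    (void)reserved;
    return fake_append((FakeConn *)h, (const char *)buf, *size);
}

BOOL WINAPI FakeServerSupportFunction(HCONN h, DWORD req, LPVOID status,
                                      LPDWORD size, LPDWORD data)
{
    const char *st  = status ? (const char *)status : "200 Ok";
    const char *hdr = data ? (const char *)data : "\r\n";
    (void)size;
    if (req != HSE_REQ_SEND_RESPONSE_HEADER)
        return FALSE;
    return fake_append((FakeConn *)h, "HTTP/1.0 ", 9)
        && fake_append((FakeConn *)h, st, strlen(st))
        && fake_append((FakeConn *)h, "\r\n", 2)
        && fake_append((FakeConn *)h, hdr, strlen(hdr));
}
```

The status line format produced by the fake server is an assumption of the stub; a real server decides how the status string is framed.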

GetServerVariable

Gets information about a connection or about the server itself.

Prototype

BOOL WINAPI GetServerVariable(
     HCONN  hConn,
     LPSTR    lpszVariableName,
     LPVOID  lpvBuffer,
     LPDWORD  lpdwSize );

Parameters

hConn (IN)

Connection handle.

lpszVariableName (IN)

Null-terminated string indicating which variable is being requested. Variable names are as defined in the CGI specification.

lpvBuffer (OUT)

Pointer to buffer to receive the requested information.

lpdwSize (IN/OUT)

Pointer to DWORD indicating the number of bytes available in the buffer. On successful completion, the DWORD contains the number of bytes transferred into the buffer (including the null-terminating byte).

Return Value

TRUE if successful; FALSE if error. The Win32 GetLastError API call can be used to determine why the call failed. Possible error values include:

[Table: possible error values]

Description

This function copies information (including CGI variables) relating to an HTTP connection, or to the server itself, into a buffer supplied by the caller.

Possible lpszVariableNames include:

[Table: server variable names]
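The IN/OUT behavior of lpdwSize can be sketched with a small wrapper around the callback. This is an illustration only: get_variable() is a hypothetical helper, the typedefs and the trimmed EXTENSION_CONTROL_BLOCK are stand-ins for the HttpExt.h definitions, and FakeGetServerVariable() is a stub server that knows a single variable, SERVER_NAME, with a made-up value.

```c
#include <stdint.h>
#include <string.h>

/* Trimmed stand-ins for Win32 / HttpExt.h definitions (assumptions). */
#define WINAPI
#define TRUE  1
#define FALSE 0
typedef int       BOOL;
typedef uint32_t  DWORD;
typedef DWORD    *LPDWORD;
typedef void     *LPVOID;
typedef void     *HCONN;
typedef char     *LPSTR;

typedef struct _EXTENSION_CONTROL_BLOCK {   /* only the fields used here */
    HCONN ConnID;
    BOOL (WINAPI *GetServerVariable)(HCONN, LPSTR, LPVOID, LPDWORD);
} EXTENSION_CONTROL_BLOCK;

/* Copies the named server variable into dst (capacity dstSize bytes).
   Returns TRUE on success; on failure dst is set to the empty string. */
BOOL get_variable(EXTENSION_CONTROL_BLOCK *pECB, const char *name,
                  char *dst, DWORD dstSize)
{
    DWORD size = dstSize;   /* IN: capacity; OUT: bytes copied, with the null */

    if (dstSize == 0)
        return FALSE;
    if (!pECB->GetServerVariable(pECB->ConnID, (LPSTR)name, dst, &size)) {
        dst[0] = '\0';      /* on Win32, GetLastError() gives the reason */
        return FALSE;
    }
    return TRUE;
}

/* Hypothetical stub server: knows only SERVER_NAME. */
BOOL WINAPI FakeGetServerVariable(HCONN h, LPSTR name, LPVOID buf, LPDWORD size)
{
    const char *value = "www.example.com";
    DWORD need = (DWORD)strlen(value) + 1;

    (void)h;
    if (strcmp(name, "SERVER_NAME") != 0 || *size < need)
        return FALSE;       /* unknown variable or buffer too small */
    memcpy(buf, value, need);
    *size = need;
    return TRUE;
}
```

Resetting the size variable before every call matters, because the server overwrites it with the number of bytes it transferred.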

ReadClient

Reads data from the body of the client's HTTP request.

Prototype

BOOL ReadClient(
     HCONN hConn,
     LPVOID lpvBuffer,
     LPDWORD lpdwSize );

Parameters

hConn (IN)

Connection handle.

lpvBuffer (OUT)

Pointer to buffer area to receive the requested information.

lpdwSize (IN/OUT)

Pointer to DWORD indicating the number of bytes available in the buffer. On return, *lpdwSize will contain the number of bytes actually transferred into the buffer.

Return Value

TRUE on success; FALSE if error. If the call fails, the Win32 API GetLastError may be called to determine the cause of the error.

Description

This function reads information from the body of the Web client's HTTP request into the buffer supplied by the caller. Thus, the call might be used to read data from an HTML form that uses the POST method. If more than *lpdwSize bytes are immediately available to be read, ReadClient will return after transferring that amount of data into the buffer. Otherwise, it will block, waiting for data to become available. If the socket on which the server is listening to the client is closed, it will return TRUE but with zero bytes read.
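One way to gather a complete request body is sketched below: start with the cbAvailable bytes the server has already placed at lpbData, then call ReadClient() until cbTotalBytes have arrived or a zero-byte read signals that the client has gone away. The helper read_body() is hypothetical, the typedefs and trimmed EXTENSION_CONTROL_BLOCK are stand-ins for the HttpExt.h definitions, and FakeReadClient() is a stub that hands out at most four bytes per call to imitate partial reads.

```c
#include <stdint.h>
#include <string.h>

/* Trimmed stand-ins for Win32 / HttpExt.h definitions (assumptions). */
#define WINAPI
#define TRUE  1
#define FALSE 0
typedef int            BOOL;
typedef uint32_t       DWORD;
typedef DWORD         *LPDWORD;
typedef void          *LPVOID;
typedef void          *HCONN;
typedef unsigned char *LPBYTE;

typedef struct _EXTENSION_CONTROL_BLOCK {   /* only the fields used here */
    HCONN  ConnID;
    DWORD  cbTotalBytes;    /* total bytes indicated by the client */
    DWORD  cbAvailable;     /* bytes already available at lpbData  */
    LPBYTE lpbData;
    BOOL (WINAPI *ReadClient)(HCONN, LPVOID, LPDWORD);
} EXTENSION_CONTROL_BLOCK;

/* Gathers the request body into dst. Returns the number of bytes gathered
   (less than cbTotalBytes if the client disconnected early), or -1 if the
   body does not fit in dst or a read failed. */
long read_body(EXTENSION_CONTROL_BLOCK *pECB, unsigned char *dst, DWORD dstSize)
{
    DWORD have = pECB->cbAvailable;

    if (pECB->cbTotalBytes > dstSize || have > pECB->cbTotalBytes)
        return -1;
    if (have > 0)
        memcpy(dst, pECB->lpbData, have);

    while (have < pECB->cbTotalBytes) {
        DWORD chunk = pECB->cbTotalBytes - have;   /* IN: room; OUT: bytes read */
        if (!pECB->ReadClient(pECB->ConnID, dst + have, &chunk))
            return -1;
        if (chunk == 0)     /* TRUE with zero bytes: connection closed */
            break;
        have += chunk;
    }
    return (long)have;
}

/* Hypothetical stub server: delivers the remaining data 4 bytes at a time. */
typedef struct { const char *rest; DWORD left; } FakeConn;

BOOL WINAPI FakeReadClient(HCONN h, LPVOID buf, LPDWORD size)
{
    FakeConn *c = (FakeConn *)h;
    DWORD n = (*size < 4) ? *size : 4;

    if (n > c->left)
        n = c->left;
    memcpy(buf, c->rest, n);
    c->rest += n;
    c->left -= n;
    *size = n;
    return TRUE;
}
```

The loop resets the size argument on every iteration because ReadClient() overwrites it with the count actually transferred.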

WriteClient

Writes data to the client.

Prototype

BOOL WriteClient(
     HCONN hConn,
     LPVOID lpvBuffer,
     LPDWORD lpdwSize,
     DWORD dwReserved );

Parameters

hConn (IN)

Connection handle.

lpvBuffer (IN)

Pointer to the data to be written.

lpdwSize (IN/OUT)

Pointer to the number of bytes in the buffer. On return, this will be updated to the number of bytes actually sent on this call. Only if an error has occurred will this be less than the number of bytes in the buffer.

dwReserved

Reserved for future use.

Return Value

TRUE on success; FALSE if error. If the call fails, the Win32 API GetLastError may be called to determine the cause of the error.

Description

This function sends information to the HTTP client from the buffer supplied by the caller.

ServerSupportFunction

Provides the ISAs with some general-purpose functions as well as functions that are specific to HTTP server implementation.

Prototype

BOOL ServerSupportFunction(
     HCONN hConn,
     DWORD dwHSERequest,
     LPVOID lpvBuffer,
     LPDWORD lpdwSize,
     LPDWORD lpdwDataType );

Note: General-purpose functions should have a dwHSERequest value larger than 1000. Values up to 1000 are reserved for mandatory ServerSupportFunctions and should not be used.

dwHSERequest

Various defined values for dwHSERequest are:

[Table: dwHSERequest values]

lpvBuffer

Points to a null-terminated optional status string (for example, "401 Access Denied"). If this buffer is null, a default response of "200 Ok" will be sent by this function.

lpdwDataType

Points to a zero-terminated string containing optional headers or data to be appended and sent with the header. If this is NULL, the header will be terminated by a '\r\n' pair.

lpdwSize

Points to the size of the buffer lpdwDataType.

HSE_REQ_DONE_WITH_SESSION

If the server extension wants to hold onto the session because of extended processing requirements, it needs to tell the server when the session is finished so that the server can close it and free the related structures. Variables lpvBuffer, lpdwSize, and lpdwDataType are all ignored.

lpvBuffer

Points to a DWORD indicating the status code of the request.
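The deferred-completion flow can be sketched as follows: HttpExtensionProc() parks the request and returns HSE_STATUS_PENDING, and a later call reports completion with HSE_REQ_DONE_WITH_SESSION. Several things here are assumptions rather than part of this specification: the one-slot queue g_pending, the helper finish_pending_request(), the choice of HSE_STATUS_SUCCESS as the status passed in lpvBuffer, and the premise that the ECB pointer stays valid until the session is reported done. The typedefs are stand-ins for the HttpExt.h definitions, and FakeServerSupportFunction() is a stub server.

```c
#include <stddef.h>
#include <stdint.h>

/* Trimmed stand-ins for Win32 / HttpExt.h definitions (assumptions). */
#define WINAPI
#define TRUE  1
#define FALSE 0
typedef int       BOOL;
typedef uint32_t  DWORD;
typedef DWORD    *LPDWORD;
typedef void     *LPVOID;
typedef void     *HCONN;
#define HSE_STATUS_SUCCESS        1
#define HSE_STATUS_PENDING        3
#define HSE_STATUS_ERROR          4
#define HSE_REQ_DONE_WITH_SESSION 4

typedef struct _EXTENSION_CONTROL_BLOCK {   /* only the fields used here */
    HCONN ConnID;
    BOOL (WINAPI *ServerSupportFunction)(HCONN, DWORD, LPVOID, LPDWORD, LPDWORD);
} EXTENSION_CONTROL_BLOCK;

/* Hypothetical one-slot queue. A real ISA would use a proper queue and
   protect it with a critical section. */
static EXTENSION_CONTROL_BLOCK *g_pending;

DWORD WINAPI HttpExtensionProc(EXTENSION_CONTROL_BLOCK *pECB)
{
    if (g_pending != NULL)
        return HSE_STATUS_ERROR;     /* slot busy: refuse the request */
    g_pending = pECB;                /* hand off to later processing  */
    return HSE_STATUS_PENDING;
}

/* Called later (for example from a worker thread) once the response has
   been written. Tells the server that the session can be closed. */
BOOL finish_pending_request(void)
{
    EXTENSION_CONTROL_BLOCK *pECB = g_pending;
    DWORD status = HSE_STATUS_SUCCESS;

    if (pECB == NULL)
        return FALSE;
    g_pending = NULL;
    return pECB->ServerSupportFunction(pECB->ConnID, HSE_REQ_DONE_WITH_SESSION,
                                       &status, NULL, NULL);
}

/* Hypothetical stub server: records that the session was released. */
typedef struct { int done; DWORD status; } FakeConn;

BOOL WINAPI FakeServerSupportFunction(HCONN h, DWORD req, LPVOID buf,
                                      LPDWORD size, LPDWORD type)
{
    FakeConn *c = (FakeConn *)h;

    (void)size; (void)type;
    if (req != HSE_REQ_DONE_WITH_SESSION)
        return FALSE;
    c->done = 1;
    c->status = buf ? *(DWORD *)buf : 0;
    return TRUE;
}
```

Passing a status in lpvBuffer and NULL for the other two arguments is compatible with both readings of the parameter description above.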

Header File Associated with This Specification


/********
*
*  Copyright (c) 1995  Process Software Corporation
*
*  Copyright (c) 1995  Microsoft Corporation
*
*
*  Module Name  : HttpExt.h
*
*  Abstract :
*
*     This module contains  the structure definitions and prototypes for the
*     version 1.0 HTTP Server Extension interface.
*
******************/

#ifndef _HTTPEXT_H_
#define _HTTPEXT_H_

#include <windows.h>

#ifdef __cplusplus
extern "C" {
#endif

#define   HSE_VERSION_MAJOR           1      // major version of this spec
#define   HSE_VERSION_MINOR           0      // minor version of this spec
#define   HSE_LOG_BUFFER_LEN         80
#define   HSE_MAX_EXT_DLL_NAME_LEN  256

typedef   LPVOID  HCONN;

// the following are the status codes returned by the Extension DLL

#define   HSE_STATUS_SUCCESS                       1
#define   HSE_STATUS_SUCCESS_AND_KEEP_CONN         2
#define   HSE_STATUS_PENDING                       3
#define   HSE_STATUS_ERROR                         4


// The following are the values to request services with the ServerSupportFunction.

//  Values from 0 to 1000 are reserved for future versions of the interface

#define   HSE_REQ_BASE                             0
#define   HSE_REQ_SEND_URL_REDIRECT_RESP           ( HSE_REQ_BASE + 1 )
#define   HSE_REQ_SEND_URL                         ( HSE_REQ_BASE + 2 )
#define   HSE_REQ_SEND_RESPONSE_HEADER             ( HSE_REQ_BASE + 3 )
#define   HSE_REQ_DONE_WITH_SESSION                ( HSE_REQ_BASE + 4 )
#define   HSE_REQ_END_RESERVED                     1000

//
//  These are Microsoft specific extensions
//

#define   HSE_REQ_MAP_URL_TO_PATH                  (HSE_REQ_END_RESERVED+1)
#define   HSE_REQ_GET_SSPI_INFO                    (HSE_REQ_END_RESERVED+2)


//
// passed to GetExtensionVersion
//

typedef struct   _HSE_VERSION_INFO {

    DWORD  dwExtensionVersion;
    CHAR   lpszExtensionDesc[HSE_MAX_EXT_DLL_NAME_LEN];

} HSE_VERSION_INFO, *LPHSE_VERSION_INFO;

//
// passed to extension procedure on a new request
//
typedef struct _EXTENSION_CONTROL_BLOCK {

    DWORD     cbSize;                 // size of this struct.
    DWORD     dwVersion;              // version info of this spec
    HCONN     ConnID;                 // Context number not to be modified!
    DWORD     dwHttpStatusCode;       // HTTP Status code

    CHAR      lpszLogData[HSE_LOG_BUFFER_LEN];// null terminated log info specific to this Extension DLL


    LPSTR     lpszMethod;             // REQUEST_METHOD
    LPSTR     lpszQueryString;        // QUERY_STRING
    LPSTR     lpszPathInfo;           // PATH_INFO
    LPSTR     lpszPathTranslated;     // PATH_TRANSLATED

    DWORD     cbTotalBytes;           // Total bytes indicated from client
    DWORD     cbAvailable;            // Available number of bytes
    LPBYTE    lpbData;                // pointer to cbAvailable bytes

    LPSTR     lpszContentType;        // Content type of client data

    BOOL (WINAPI * GetServerVariable) ( HCONN       hConn,
                                        LPSTR       lpszVariableName,
                                        LPVOID      lpvBuffer,
                                        LPDWORD     lpdwSize );

    BOOL (WINAPI * WriteClient)  ( HCONN      ConnID,
                                   LPVOID     Buffer,
                                   LPDWORD    lpdwBytes,
                                   DWORD      dwReserved );

    BOOL (WINAPI * ReadClient)  ( HCONN      ConnID,
                                  LPVOID     lpvBuffer,
                                  LPDWORD    lpdwSize );

    BOOL (WINAPI * ServerSupportFunction)( HCONN      hConn,
                                           DWORD      dwHSERRequest,
                                           LPVOID     lpvBuffer,
                                           LPDWORD    lpdwSize,
                                           LPDWORD    lpdwDataType );

} EXTENSION_CONTROL_BLOCK, *LPEXTENSION_CONTROL_BLOCK;

//
//  these are the prototypes that must be exported from the extension DLL
//

BOOL  WINAPI   GetExtensionVersion( HSE_VERSION_INFO  *pVer );
DWORD WINAPI   HttpExtensionProc(  EXTENSION_CONTROL_BLOCK *pECB );

// the following type declarations are for the server side

typedef BOOL  (WINAPI * PFN_GETEXTENSIONVERSION)( HSE_VERSION_INFO  *pVer );

typedef DWORD (WINAPI * PFN_HTTPEXTENSIONPROC )( EXTENSION_CONTROL_BLOCK *pECB );


#ifdef __cplusplus
}
#endif

#endif  // end definition _HTTPEXT_H_

Notes to Application Developers

The application will get called at HttpExtensionProc() and will be passed a pointer to the ECB structure. The application will then decide on what exactly needs to be done, by reading the client input (by calling the functions GetServerVariable() and, if necessary, ReadClient() ). This is similar to setting up environment variables and reading stdin.

Since the ISA DLL is loaded in the same process as the HTTP server, an access violation by the ISA may crash some HTTP servers. As a result, you should ensure the integrity of the ISA by testing it thoroughly. ISAs that misbehave may also corrupt the server's memory space or may result in memory or resource leaks if they fail to clean up properly after themselves. To help with this problem, many HTTP servers will wrap the ISA entry points in a __try/__except clause so that access violations (or other exceptions) will not directly affect the server. For more information on the __try/__except clause, please refer to the Win32 API documentation.

The main entry point in the ISA, HttpExtensionProc(), takes only one input parameter: a pointer to structure of type EXTENSION_CONTROL_BLOCK. Application developers are not expected to change the following fields in the ECB structure: cbSize, dwVersion, and ConnID.

Application developers are encouraged to initialize their DLL automatically by defining an entry-point function for the DLL (for example, DllMain() ). By default, the operating system calls this entry-point function the first time LoadLibrary() is called for that DLL, the last time FreeLibrary() is called for it, and whenever a thread is created or destroyed in the process.
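A minimal sketch of such an entry-point function follows. The reason codes are the standard Win32 ones; the flag g_isaReady and the accessor isa_is_ready() are hypothetical, and the non-Windows branch contains stand-in definitions only so that the sketch can be compiled and tried without the Windows headers.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#else
/* Stand-ins so the sketch compiles without the Windows headers. */
#define WINAPI
#define TRUE 1
typedef int       BOOL;
typedef uint32_t  DWORD;
typedef void     *LPVOID;
typedef void     *HINSTANCE;
#define DLL_PROCESS_DETACH 0
#define DLL_PROCESS_ATTACH 1
#define DLL_THREAD_ATTACH  2
#define DLL_THREAD_DETACH  3
#endif

static int g_isaReady;              /* hypothetical per-DLL state */

int isa_is_ready(void)
{
    return g_isaReady;
}

BOOL WINAPI DllMain(HINSTANCE hinstDll, DWORD fdwReason, LPVOID lpvReserved)
{
    (void)hinstDll;
    (void)lpvReserved;

    switch (fdwReason) {
    case DLL_PROCESS_ATTACH:        /* first LoadLibrary(): set up state */
        g_isaReady = 1;
        break;
    case DLL_PROCESS_DETACH:        /* last FreeLibrary(): tear it down  */
        g_isaReady = 0;
        break;
    case DLL_THREAD_ATTACH:         /* a thread was created or destroyed */
    case DLL_THREAD_DETACH:
        break;                      /* nothing per-thread in this sketch */
    }
    return TRUE;                    /* FALSE on attach would fail the load */
}
```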

Application developers are encouraged to maintain statistical information or any information pertaining to the DLL within the DLL itself. By creating appropriate forms one could measure the usage/performance of a DLL remotely. Also, this information could be exposed via the performance APIs for integration with PerfMon. The lpszLogData field of the ECB can also be used to log data to the Windows NT event viewer.

Steps to Convert Existing CGI Scripts to ISA DLLs

This chapter explains the basic requirements for converting an existing CGI script executable to an ISA DLL. Like any other DLL, Web server applications should be thread-safe. More than one client may be executing the same function at the same time, so the code must take care when modifying a global or static variable. This issue can be handled properly by using appropriate synchronization techniques, such as critical sections and semaphores. For additional information on writing thread-safe DLLs, please refer to the documentation in the Win32 SDK and in the Microsoft Development Library.
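As a sketch of the critical-section technique, the following code guards a shared request counter of the kind an ISA might keep for usage statistics. The names isa_lock_t, stats_init() and stats_record_hit() are hypothetical. On Windows the lock is a Win32 CRITICAL_SECTION; the pthread branch is a stand-in so that the sketch can be tried on other systems.

```c
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
typedef CRITICAL_SECTION isa_lock_t;
static void isa_lock_init(isa_lock_t *l)  { InitializeCriticalSection(l); }
static void isa_lock_enter(isa_lock_t *l) { EnterCriticalSection(l); }
static void isa_lock_leave(isa_lock_t *l) { LeaveCriticalSection(l); }
#else
#include <pthread.h>
typedef pthread_mutex_t isa_lock_t;
static void isa_lock_init(isa_lock_t *l)  { pthread_mutex_init(l, NULL); }
static void isa_lock_enter(isa_lock_t *l) { pthread_mutex_lock(l); }
static void isa_lock_leave(isa_lock_t *l) { pthread_mutex_unlock(l); }
#endif

static isa_lock_t    g_statsLock;   /* protects g_hits */
static unsigned long g_hits;        /* requests served so far */

/* Call once before any request is served, for example when the DLL is
   attached to the process. */
void stats_init(void)
{
    isa_lock_init(&g_statsLock);
    g_hits = 0;
}

/* Call from the request handler. Returns the updated request count. */
unsigned long stats_record_hit(void)
{
    unsigned long n;

    isa_lock_enter(&g_statsLock);   /* only one thread at a time past here */
    n = ++g_hits;
    isa_lock_leave(&g_statsLock);
    return n;
}
```

The counter is copied into a local variable while the lock is held, so the value returned is the one this call produced even if another thread increments the counter immediately afterwards.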

The primary differences between an ISA DLL and a CGI executable include the following:

Conclusion, Acknowledgments, and Contact Information

The above proposal by Process Software Corporation is aimed at helping third-party developers of script executables optimize their applications and improve performance. We welcome any suggestions or concerns that you may have. Please send them directly to me at ramanathan@process.com.

I would appreciate your feedback, especially on the ServerSupportFunction. If you have any specific server variable to be returned to an extension DLL, please send me e-mail and we'll consider including that.

I would like to acknowledge the valuable suggestions made by Chris Adie of EMWAC, UK and I thank him for reviewing the document as it was being written.

© 1996 Microsoft Corporation