JAWS: Understanding High Performance Web Systems

Introduction

The emergence of the World Wide Web (Web) as a mainstream technology has forced network application writers to confront many hard problems in providing a high quality of service (QoS) to application users. Client-side strategies have included client-side caching and, more recently, caching proxy servers. However, the server side of the problem persists: a popular Web server may be unable to handle the request load placed upon it. Some recent Web server implementations have been designed specifically to deal with high load, but they are tied to a particular platform (e.g., SGI WebFORCE) or employ a single fixed strategy (e.g., a single thread of control). I believe the key to developing high-performance Web systems is a design that is flexible enough to accommodate different strategies for dealing with server load and that is configurable from a high-level specification describing the characteristics of the machine and the expected load on the server.

Server flexibility plays a related role: making new services, or protocols, available. The Service Configurator pattern has been identified as a solution for making different services available, with inetd cited as an example of the pattern in use. While a per-process approach to services may be the right abstraction some of the time, a more integrated (yet modular!) approach may allow for greater strategic reuse. A per-process model of services requires each server to redesign and reimplement code that is common to all of them, and at the same time makes it difficult to reuse strategies developed for one service in another. To gain ground in this area, the server should be designed so that new services can be added easily and can readily use the strategies provided by the adaptive server framework. Generalizing the notion of server-side service adaptation, one can envision a framework in which clients negotiate with servers over how services should be handled. Most protocols today have been designed so that data manipulation is handled entirely on one side or the other. An adaptive protocol would enable a server and a client to negotiate which parts of a protocol should be handled on each end for optimal performance.


Motivation

Web servers are synonymous with HTTP servers, and the HTTP 1.0 and 1.1 protocols are relatively straightforward. An HTTP request typically names a file; the server locates the file and returns it to the requesting client. On the surface, therefore, Web servers appear to offer few opportunities for optimization. This may lead to the conclusion that optimization efforts should be directed elsewhere (such as transport protocol optimizations, specialized hardware, and client-side caching).
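The following sketch illustrates this basic request cycle with plain POSIX calls (read the request line, locate the named file, write it back). It is an illustration of the protocol's apparent simplicity, not code from any particular server, and error handling is abbreviated.

// Minimal sketch of the HTTP request cycle described above, using plain POSIX
// calls.  Illustration only; error handling is abbreviated.
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <cstdio>
#include <cstring>
#include <string>

void handle_request(int client_fd) {
  char buf[4096];
  ssize_t n = read(client_fd, buf, sizeof(buf) - 1);   // e.g. "GET /index.html HTTP/1.0\r\n..."
  if (n <= 0) return;
  buf[n] = '\0';

  char method[16], path[1024];
  if (std::sscanf(buf, "%15s %1023s", method, path) != 2 ||
      std::strcmp(method, "GET") != 0) {
    const char *err = "HTTP/1.0 400 Bad Request\r\n\r\n";
    write(client_fd, err, std::strlen(err));
    return;
  }

  std::string file = std::string(".") + path;           // document root assumed to be "."
  int fd = open(file.c_str(), O_RDONLY);
  if (fd < 0) {
    const char *err = "HTTP/1.0 404 Not Found\r\n\r\n";
    write(client_fd, err, std::strlen(err));
    return;
  }

  struct stat st;
  fstat(fd, &st);
  char hdr[128];
  int len = std::snprintf(hdr, sizeof(hdr),
                          "HTTP/1.0 200 OK\r\nContent-Length: %lld\r\n\r\n",
                          static_cast<long long>(st.st_size));
  write(client_fd, hdr, len);

  while ((n = read(fd, buf, sizeof(buf))) > 0)          // copy the file to the socket
    write(client_fd, buf, n);
  close(fd);
}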

Empirical analysis reveals that the problem is more complex and the solution space is much richer. For instance, our experimental results show that a heavily accessed Apache Web server (the most popular server on the Web today) is unable to maintain satisfactory performance on a dual-CPU 180 MHz UltraSPARC 2 over a 155 Mbps ATM network, due largely to its choice of process-level concurrency. Other studies have shown that the relative performance of different server designs depends heavily on server load characteristics (such as hit rate and file size).

The explosive growth of the Web, coupled with the larger role servers play on the Web, places increasingly large demands on servers. In particular, the severe loads servers already encounter handling millions of requests per day will be compounded by the deployment of high-speed networks, such as ATM. Therefore, it is critical to understand how to improve server performance and predictability.

Server performance is already a critical issue for the Internet and is becoming more important as Web protocols are applied to performance-sensitive intranet applications. For instance, electronic imaging systems based on HTTP (e.g., Siemens MED or Kodak Picture Net) require servers to perform computationally intensive image filtering operations (e.g., smoothing, dithering, and gamma correction). Likewise, database applications based on Web protocols (such as AltaVista Search by Digital or Lexis-Nexis) support complex queries that may generate a higher number of disk accesses than a typical Web server request.


Modeling

Benchmarking Configuration
Overview of the JAWS Model

Underlying Assumptions

Research Questions


Benchmarking

Benchmarking Configuration
Benchmarking Testbed Overview

Hardware Testbed

Our hardware testbed consisted of one Sun Ultra-1 and four Sun Ultra-2 workstations. The Ultra-1 had 128MB of RAM and a 167MHz UltraSPARC processor. Each Ultra-2 had 256MB of RAM and two UltraSPARC processors running at 168MHz, each with 1MB of internal cache. All of the machines were connected by a standard Ethernet network. The four Ultra-2 workstations were also connected via an ATM network running through a Bay Networks LattisCell 10114 ATM switch, with a maximum bandwidth of 155 Mbps. One of the Ultra-2 workstations hosted the Web server, while the three remaining Ultra-2 workstations were used to generate requests to benchmark the server. The Ultra-1 workstation coordinated the startup of the benchmarking clients and the gathering of data after each benchmarking run.

Software Request Generator

Request load was generated by the WebSTONE webclient, which was modified to be multi-threaded. Each "child" of the webclient iteratively issues a request, receives the requested data, issues a new request, and so on. Server load can be increased by increasing the number of webclient "children". The results of the tests are collected and reported by the webclients after all of the requests have completed.
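The structure of such a request generator can be sketched as follows. The function and constant names here are illustrative stand-ins, not the modified WebSTONE code itself; issue_request() abbreviates the actual connect/send/receive exchange.

// Sketch of the multi-threaded request generator described above: each
// "child" is a thread that iteratively issues a request and receives the
// reply, and load is raised by starting more children.  Illustration only.
#include <pthread.h>
#include <cstdio>
#include <vector>

struct ChildStats { long requests = 0; long bytes = 0; };

// Stand-in for the real HTTP exchange: connect, send the request, read the
// reply, and return the number of bytes received (-1 on failure).
static long issue_request(const char * /*uri*/) { return 1024; }

static const int kRequestsPerChild = 100;   // illustrative value

static void *child_main(void *arg) {
  ChildStats *stats = static_cast<ChildStats *>(arg);
  for (int i = 0; i < kRequestsPerChild; ++i) {
    long got = issue_request("/index.html");
    if (got < 0) continue;
    stats->requests++;
    stats->bytes += got;
  }
  return nullptr;
}

// Run num_children concurrent children and report the combined totals, as the
// webclients do at the end of a benchmarking run.
void run_children(int num_children) {
  std::vector<pthread_t> tids(num_children);
  std::vector<ChildStats> stats(num_children);
  for (int i = 0; i < num_children; ++i)
    pthread_create(&tids[i], nullptr, child_main, &stats[i]);
  for (int i = 0; i < num_children; ++i)
    pthread_join(tids[i], nullptr);

  long requests = 0, bytes = 0;
  for (const ChildStats &s : stats) { requests += s.requests; bytes += s.bytes; }
  std::printf("%ld requests, %ld bytes received\n", requests, bytes);
}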

Experiments

Each experiment consists of several rounds, one round for each server in our test suite. Each round is conducted as a series of benchmarking sessions. Each session consists of the benchmarking clients issuing a number of requests (N) for a designated file of fixed size (Z) at a particular load level, beginning at l. Each successive session increases the load by a fixed step value (d) up to a maximum load (L).
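A round can therefore be sketched as a simple sweep over load levels. The parameter names below follow the text; run_session() is a hypothetical stand-in for launching the webclient children at the given load level and collecting their results.

// Sketch of one benchmarking round: sessions at load levels l, l+d, ..., L,
// each issuing N requests for a file of size Z.
#include <cstdio>

struct SessionResult { double throughput_mbps = 0; double latency_ms = 0; };

// Stand-in for driving the webclients and gathering their reports.
static SessionResult run_session(const char * /*server*/, int /*load*/,
                                 long /*N*/, long /*Z*/) {
  return SessionResult{};
}

void run_round(const char *server,
               int l, int d, int L,   // initial load, step size, maximum load
               long N, long Z) {      // requests per session, file size
  for (int load = l; load <= L; load += d) {
    SessionResult r = run_session(server, load, N, Z);
    std::printf("%s  load=%d  throughput=%.2f Mbps  latency=%.2f ms\n",
                server, load, r.throughput_mbps, r.latency_ms);
  }
}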

The webclient requests a standard file mix distributed by WebSTONE, which is representative of typical Web server request patterns.

Findings

By far, the greatest impediment to performance is the host filesystem of the Web server. However, with I/O factored out, the primary determinant of server performance is the concurrency strategy.

For single CPU machines, single-threaded solutions are acceptable and perform well. However, they do not scale for multi-processor platforms.

Process-based concurrency implementations perform reasonably well when the network is the bottleneck. However, on high-speed networks like ATM, the cost of spawning a new process per request is relatively high.

Multi-threaded designs appear to be the choice of the top-performing Web servers. The cost of spawning a thread is much lower than that of spawning a process.

Additional information is available in this paper.


Adaptation

Concurrency Strategies

Each concurrency strategy has positive and negative aspects, which are summarized in the table below (a code sketch following the table shows how such strategies can be hidden behind a common interface). Thus, to optimize performance, Web servers should be adaptive, i.e., customizable to employ the most beneficial strategy for particular traffic characteristics, workloads, and hardware/OS platforms. In addition, workload studies indicate that the majority of requests are for small files. Thus, Web servers should adaptively optimize themselves to give higher priority to smaller requests. Combined, these techniques could produce a server that is highly responsive while maximizing throughput. The next generation of the JAWS server is planned to implement the prioritized strategy.

Strategy            | Advantages                                      | Disadvantages
Single Threaded     | No context switching overhead; highly portable  | Does not scale for multi-processor systems
Process-per-request | More portable to machines without threads       | High creation cost; resource intensive
Process pool        | Avoids creation cost                            | Requires mutual exclusion in some operating systems
Thread-per-request  | Much faster than fork                           | May require mutual exclusion; not as portable
Thread pool         | Avoids creation cost                            | Requires mutual exclusion in some operating systems
Summary of Concurrency Strategies
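As a sketch of how such adaptation can be structured, the strategies above can be placed behind a common dispatch interface so that the choice is made when the server is configured. The class names below are illustrative only; they are not the actual JAWS classes.

// Sketch: concurrency strategies behind a common dispatch interface.
#include <pthread.h>
#include <unistd.h>
#include <memory>

struct Request { int fd; };                        // simplified connection handle

// Stand-in for protocol processing (parse the request, send the reply).
static void process_request(const Request &r) { if (r.fd >= 0) close(r.fd); }

class ConcurrencyStrategy {
 public:
  virtual ~ConcurrencyStrategy() = default;
  virtual void dispatch(const Request &r) = 0;     // hand off a new connection
};

// Single threaded: handle the request inline; no locking and no context
// switches, but only one CPU is ever used.
class SingleThreaded : public ConcurrencyStrategy {
 public:
  void dispatch(const Request &r) override { process_request(r); }
};

// Thread-per-request: cheaper than fork(), but pays a thread creation cost on
// every connection and may need synchronization around shared state.
class ThreadPerRequest : public ConcurrencyStrategy {
  static void *run(void *arg) {
    std::unique_ptr<Request> r(static_cast<Request *>(arg));
    process_request(*r);
    return nullptr;
  }
 public:
  void dispatch(const Request &r) override {
    pthread_t tid;
    pthread_create(&tid, nullptr, run, new Request(r));
    pthread_detach(tid);
  }
};

// The accept loop is written once against the interface, and the concrete
// strategy (process pool, thread pool, ...) plugs in the same way, e.g.:
//   ThreadPerRequest strategy;                     // chosen at configuration time
//   while (true) strategy.dispatch(Request{accept_connection()});
// where accept_connection() would wrap accept() on the listening socket.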

Protocol Processing

There are instances where the contents being transferred may require extra processing. For instance, in HTTP/1.0 and HTTP/1.1 files may have some encoding type. This generally corresponds to a file having been stored in some compressed format (e.g., gzip). In HTTP, it has been customary for the client to perform the decoding. However, there may be cases where the client lacks the proper decoder.

To handle such cases, it would be helpful if the server could do the decoding on behalf of the client. A more advanced server might detect that a particularly large file would transfer more quickly to the client in some compressed form. This kind of processing, however, requires negotiation between the client and the server as to which content transformations the server can perform and the client will accept. Thus, the server must adapt to the abilities of the client, as well as to the conditions of the network connection.
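A sketch of this kind of server-side adaptation appears below: if the stored file is gzip-encoded but the client's Accept-Encoding header does not list gzip, the server decodes the file on the client's behalf (here with zlib). The function names and the simplified header check are illustrative, not part of any particular server.

// Sketch of choosing a representation based on the client's Accept-Encoding.
#include <zlib.h>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Simplified header check: does the client's Accept-Encoding list `coding`?
bool client_accepts(const std::string &accept_encoding, const std::string &coding) {
  return accept_encoding.find(coding) != std::string::npos;
}

// Read a file verbatim (used when the client can decode gzip itself).
std::vector<char> read_raw(const std::string &path) {
  std::ifstream in(path, std::ios::binary);
  return std::vector<char>(std::istreambuf_iterator<char>(in),
                           std::istreambuf_iterator<char>());
}

// Decode a gzip-encoded file on behalf of the client using zlib.
std::vector<char> read_decoded(const std::string &path) {
  std::vector<char> body;
  char buf[8192];
  gzFile in = gzopen(path.c_str(), "rb");
  int n;
  while (in && (n = gzread(in, buf, sizeof(buf))) > 0)
    body.insert(body.end(), buf, buf + n);
  if (in) gzclose(in);
  return body;
}

// Choose the representation to send: compressed if the client can decode it,
// identity-encoded otherwise.  The Content-Encoding response header would be
// set accordingly (omitted here).
std::vector<char> response_body(const std::string &path,
                                const std::string &accept_encoding) {
  return client_accepts(accept_encoding, "gzip") ? read_raw(path)
                                                 : read_decoded(path);
}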


JAWS Adaptive Web Server

Here we briefly describe the object-oriented architecture of the JAWS Web server framework. To understand the design, it is important to first motivate the need for framework architectures.

Solutions to the Reuse Problem

Software reuse is a vital issue in successful development of large software systems. Software reuse can reduce development effort and maintenance costs. Thus, much effort in software engineering techniques has been devoted to the problem of creating reusable software.

The techniques for developing reusable software have evolved through several generations of language features (e.g., structured programming, functional programming, 4GLs, object-oriented programming), compilation tools (e.g., source file inclusion, compiled object files, class libraries, components), and system design methods (e.g., functional design, complexity analysis, formal methods, object-oriented design, design patterns). While each of these techniques helps to facilitate the development and integration of reusable software, their roles are passive: the software developer must still decide how to assemble the software system from the repository of reusable software. The figure below illustrates the passive nature of these solutions.

Figure: Application development with class libraries and design patterns.

The advantage of this approach is that it maximizes the number of options available to software developers. This can be important in development environments with open-ended requirements, where design flexibility is at a premium. The disadvantage, however, is that every new project must essentially be implemented from the ground up.

To gain architectural reuse, software developers may utilize an application framework to create a system. An application framework provides reusable software components for applications by integrating sets of abstract classes and defining standard ways that instances of these classes collaborate. Thus, a framework provides an application skeleton that can be customized by inheriting and instantiating from reusable components in the framework. The result is a pre-fabricated design at the cost of reduced design flexibility. An application framework architecture is shown in the figure below.

Figure: Application development with an application framework.

Frameworks can give developers greater reuse of both design and code. This reuse comes from leveraging the knowledge of an expert framework developer, who has largely pre-determined which libraries and objects to use, what patterns they follow, and how they should interact. However, frameworks are much more difficult to develop than class libraries. The design must provide an adequate amount of flexibility and at the same time dictate enough structure to be a nearly complete application, and this balance must be just right for the framework to be useful.
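The following minimal sketch illustrates this inversion of control: the framework fixes the skeleton (the sequence of processing steps), and the application customizes it by inheriting and overriding hook methods. The class names are illustrative, not taken from any particular framework.

// Minimal illustration of framework-style reuse: the framework supplies the
// skeleton, and the developer customizes it by subclassing.
#include <iostream>
#include <string>

class ServerFramework {
 public:
  virtual ~ServerFramework() = default;

  // The skeleton: the framework fixes the sequence of steps...
  void run_once(const std::string &request) {
    std::string result = handle(parse(request));
    send(result);
  }

 protected:
  // ...and the application customizes behavior through these hooks.
  virtual std::string parse(const std::string &raw) { return raw; }
  virtual std::string handle(const std::string &req) = 0;
  virtual void send(const std::string &response) { std::cout << response << "\n"; }
};

// An application-specific server only fills in the hooks it cares about.
class EchoServer : public ServerFramework {
 protected:
  std::string handle(const std::string &req) override { return "echo: " + req; }
};

int main() {
  EchoServer server;
  server.run_once("hello");   // the framework drives the control flow
}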

The JAWS Web Server Framework

The figure below illustrates the object-oriented software architecture of the JAWS Web server framework. As indicated earlier, our results demonstrate the performance variance that occurs as a Web server experiences changing load conditions. Thus, performance can be improved by dynamically adapting the server behavior to these changing conditions. JAWS is designed to allow Web server concurrency and event dispatching strategies to be customized in accordance with key environmental factors. These factors include static characteristics, such as support for kernel-level threading and/or asynchronous I/O in the OS and the number of available CPUs, as well as dynamic factors, such as Web traffic patterns and workload characteristics.

Figure: JAWS Framework Overview.

JAWS is structured as a framework that contains the following components: an Event Dispatcher, Concurrency Strategy, I/O Strategy, Protocol Pipeline, Protocol Handlers, and Cached Virtual Filesystem. Each component is structured as a set of collaborating objects implemented with the ACE C++ communication framework, and the components and their collaborations follow several design patterns, which are named along the borders of the components in the figure. Each component plays a distinct role in processing requests.
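As an illustration (not the actual JAWS class declarations, which are built on ACE and differ in detail), the components can be sketched as collaborating C++ interfaces along the following lines:

// Illustrative interfaces only; the real JAWS components are ACE-based.
#include <string>
#include <vector>

struct HttpRequest  { std::string method, uri, version; };
struct HttpResponse { int status = 200; std::vector<char> body; };

class IOStrategy {                 // synchronous, reactive, or asynchronous I/O
 public:
  virtual ~IOStrategy() = default;
  virtual std::vector<char> read(int fd) = 0;
  virtual void write(int fd, const std::vector<char> &data) = 0;
};

class CachedVirtualFilesystem {    // caches frequently requested files
 public:
  virtual ~CachedVirtualFilesystem() = default;
  virtual const std::vector<char> *fetch(const std::string &path) = 0;
};

class ProtocolHandler {            // parses requests and builds responses (e.g., HTTP);
 public:                           // a Protocol Pipeline would chain filters around it
  virtual ~ProtocolHandler() = default;
  virtual HttpRequest  parse(const std::vector<char> &raw) = 0;
  virtual HttpResponse respond(const HttpRequest &req,
                               CachedVirtualFilesystem &files) = 0;
};

class ConcurrencyStrategy {        // single thread, thread/process pools, ...
 public:
  virtual ~ConcurrencyStrategy() = default;
  virtual void dispatch(int connection_fd) = 0;
};

class EventDispatcher {            // accepts new connections and hands them to
 public:                           // the configured concurrency strategy
  explicit EventDispatcher(ConcurrencyStrategy &c) : concurrency_(c) {}
  void on_new_connection(int fd) { concurrency_.dispatch(fd); }
 private:
  ConcurrencyStrategy &concurrency_;
};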


Papers and Talks

Continually under construction...

JAWS 2: Refactorization (Framework Design and Utilization Overview)
This talk provides an overview of how to use the next generation of JAWS. The new JAWS implementation is much more flexible and easier to use than the previous JAWS implementation. (However, it is not yet polished enough for general consumption.) (PDF)

Applying the Proactor Pattern to High-Performance Web Servers
This paper explains how the complexities of applying the proactive programming model can be alleviated by applying the Proactor pattern. It has been published in the 10th International Conference on Parallel and Distributed Computing and Systems, Las Vegas, Nevada, October 28-31, 1998. (pdf)

JAWS: A Framework for High-performance Web Servers
This technical overview of the JAWS Web server framework describes in detail the patterns and components which comprise JAWS. The paper has been published in a book covering framework programming techniques. (pdf)

Developing Flexible and High-performance Web Servers with Frameworks and Patterns
This paper explains how complexities occurring in the development of high-performance Web servers can be alleviated with the use of design patterns and object-oriented application frameworks. A subset of this paper is to appear in ACM Computing Surveys, May 1998. (pdf)

Techniques for Developing and Measuring High-Performance Web Servers over ATM Networks
This paper describes new benchmarking experiments comparing various Web server implementations, including Netscape Enterprise, Zeus, and Sun's Java Server. The analysis reveals several key performance optimization strategies, which were then incorporated into our own high-performance Web server, JAWS. The optimized version of JAWS is then compared against these servers. This paper was published at INFOCOM '98. (pdf)

Measuring the Impact of Event Dispatching and Concurrency Models on Web Server Performance over High-speed Networks
This is a technical paper describing the tradeoffs of applying Windows NT-specific I/O system calls to our Web server. It was accepted by the Globecom Program Committee and presented at the Global Internet mini-conference. (pdf)

High-performance Web Servers on Windows NT: Design and Performance
This is a position paper describing a poster session we were invited to participate in at the USENIX Windows NT Workshop. It discusses issues addressed while porting the JAWS framework to Windows NT. (HTML)