JAWS: Understanding High Performance Web Systems
The emergence of the World Wide Web (Web) as a mainstream technology
has forced network application writers to confront many hard problems
in providing a high quality of service (QoS) to application users.
Client-side strategies have included client-side caching and, more
recently, caching proxy servers. However, the other side of the
problem persists: a popular Web server may be unable to handle the
request load placed upon it. Some recent implementations of Web
servers have been designed to deal specifically with high load, but
they are tied to a particular platform (e.g., SGI WebFORCE) or employ
a specific strategy (e.g., a single thread of control). I believe that
the key to developing high performance Web systems is a design which
is flexible enough to accommodate different strategies for dealing
with server load and is configurable from a high-level specification
describing the characteristics of the machine and the expected use
load of the server.
There is a related role of server flexibility, namely that of
making new services, or protocols, available. The Service
Configurator pattern has been identified as a solution for making
different services available, with inetd cited as
an example of this pattern in use. While a per-process approach to
services may be the right abstraction some of the time, a more
integrated (yet modular!) approach may allow for greater strategic
reuse. That is, a per-process model of services requires each server
to redesign and reimplement code which is common to all, while at the
same time making it difficult to reuse strategies developed for one
service in another. To gain ground in this area, the server should be
designed so that new services can be easily added, and can easily use
strategies provided by the adaptive server framework. By
generalizing the notion of server-side service adaptation, one can
envision a framework in which clients negotiate with servers
about how services should be handled. Most protocols today have been
designed so that data manipulation is handled entirely on one side or
the other. An adaptive protocol would enable a server and a client to
negotiate which parts of the protocol should be handled on each end.
Web servers are synonymous with HTTP servers, and the HTTP/1.0 and
1.1 protocols are relatively straightforward. An HTTP request typically
names a file; the server locates the file and returns it to the
requesting client. On the surface, therefore, Web servers appear
to have few opportunities for optimization. This may lead to the
conclusion that optimization efforts should be directed elsewhere
(such as transport protocol optimizations and specialized hardware).
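The surface simplicity is visible in the request format itself. As a
minimal sketch (not JAWS's actual code), parsing an HTTP/1.0 request
line reduces to splitting three whitespace-separated tokens; everything
beyond this parse (concurrency, I/O, caching) is where the real
performance work lies:

```cpp
#include <sstream>
#include <string>

// Minimal sketch of HTTP/1.0 request-line parsing (illustrative only).
// A request such as "GET /index.html HTTP/1.0" names a file, which the
// server would then locate and return to the client.
struct HttpRequest {
    std::string method;   // e.g., "GET"
    std::string path;     // e.g., "/index.html"
    std::string version;  // e.g., "HTTP/1.0"
    bool valid = false;
};

inline HttpRequest parse_request_line(const std::string& line) {
    HttpRequest req;
    std::istringstream in(line);
    if (in >> req.method >> req.path >> req.version)
        req.valid = true;  // all three tokens present
    return req;
}
```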
Empirical analysis reveals that the problem is more complex and the
solution space is much richer. For instance, our experimental results
show that a heavily accessed Apache Web server (the most popular
server on the Web today) is unable to maintain satisfactory
performance on a dual-CPU 180 MHz UltraSPARC 2 over a 155 Mbps ATM
network, due largely to its choice of process-level concurrency.
Other studies have shown that the relative performance of different
server designs depends heavily on server load characteristics (such as
hit rate and file size).
The explosive growth of the Web, coupled with the larger role
servers play on the Web, places increasingly large demands on
servers. In particular, the severe loads servers already encounter
handling millions of requests per day will be compounded by the
deployment of high speed networks, such as ATM. Therefore, it is
critical to understand how to improve server performance.
Server performance is already a critical issue for the Internet and
is becoming more important as Web protocols are applied to
performance-sensitive intranet applications. For instance, electronic
imaging systems based on HTTP (e.g., Siemens MED or Kodak Picture
Net) require servers to perform computationally-intensive image
filtering operations (e.g., smoothing, dithering, and gamma
correction). Likewise, database applications based on Web protocols
(such as AltaVista Search by Digital or Lexis-Nexis) support
complex queries that may generate a higher number of disk accesses
than a typical Web server.
|Overview of the JAWS Model
- Infinite network bandwidth.
This is consistent with my interests in high-speed networks. A model
of Web servers that limits the network bandwidth would need to relax
this assumption.
- Fixed network latency.
We assume the contribution of network latency to be negligible. This
will become more true with persistent HTTP connections and true
request multiplexing.
- Client requests are "serialized".
This simply means that the server processes successive requests from
a single client in the order they are issued by the client.
Questions raised by this model include:
- What is the performance when the average service rate is constant?
- What is the performance when the average service rate degrades with
load?
- What degradation best models actual performance?
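One standard way to formalize the first question (this is a sketch
under an assumption not stated above, namely Poisson request arrivals
at rate \(\lambda\) served at a constant rate \(\mu\)) is the M/M/1
queue, whose expected response time is

```latex
T = \frac{1}{\mu - \lambda}, \qquad \lambda < \mu .
```

Even with a constant service rate, response time degrades sharply as
the arrival rate approaches the service rate, which is why load level
matters so much in the benchmarks below.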
|Benchmarking Testbed Overview
Our hardware testbed consisted of one Sun Ultra-1 and four Sun Ultra-2
workstations. The Ultra-1 has 128MB of RAM and a 167MHz UltraSPARC
processor. Each Ultra-2 has 256MB of RAM and is equipped with 2
UltraSPARC processors running at 168MHz. Each processor has 1MB of
internal cache. All the machines are connected in a standard Ethernet
configuration. The four Ultra-2 workstations are also connected via an
ATM network running through a Bay Networks LattisCell 10114 ATM switch,
with a maximum bandwidth of 155Mbps. One of the Ultra-2 workstations
hosted the Web server, while the three remaining Ultra-2 workstations
were used to generate requests to benchmark the server. The Ultra-1
workstation served to coordinate the startup of the benchmarking
clients and the gathering of data after the end of benchmarking runs.
Software Request Generator
Request load was generated by the WebSTONE webclient, which was
modified to be multi-threaded. Each ``child'' of the webclient
iteratively issues a request, receives the requested data, issues a
new request, and so on. Server load can be increased by increasing
the number of webclient ``children''. The results of the tests are
collected and reported by the webclients after all the requests have
been completed.
Each experiment consists of several rounds, one round for each server
in our test suite. Each round is conducted as a series of benchmarking
sessions. Each session consists of having the benchmarking client
issue a number of requests (N) for a designated file of a fixed
size (Z), at a particular load level beginning at l. Each
successive session increases the load by some fixed step value
(d) to a maximum load (L).
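The session structure above can be sketched as code (the function name
is hypothetical, used only to illustrate the schedule): starting at
load l, each successive session adds d concurrent clients until the
maximum load L is reached.

```cpp
#include <vector>

// Hypothetical sketch of the benchmarking schedule described above.
// Returns the load level used in each session: l, l+d, l+2d, ... up
// to at most L. N (requests per session) and Z (file size) do not
// affect the schedule itself, so they are omitted here.
inline std::vector<int> session_loads(int l, int d, int L) {
    std::vector<int> loads;
    for (int load = l; load <= L; load += d)
        loads.push_back(load);
    return loads;
}
```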
The webclient requests a standard file mix distributed by WebSTONE,
which is representative of typical Web server request patterns.
By far, the greatest impediment to performance is the host filesystem
of the Web server. However, factoring out I/O, the primary
determinant of server performance is the concurrency strategy.
For single-CPU machines, single-threaded solutions are acceptable and
perform well. However, they do not scale for multi-processor
platforms.
Process-based concurrency implementations perform reasonably well when
the network is the bottleneck. However, on high-speed networks like
ATM, the cost of spawning a new process per request is relatively
high.
Multi-threaded designs appear to be the choice of the top-performing
Web servers. Spawning a thread is much cheaper than spawning a
process.
Additional information is available in the papers listed at the end of
this page.
Each concurrency strategy has positive and negative aspects, which are
summarized in the table below. Thus, to optimize performance, Web
servers should be adaptive, i.e., customizable to utilize the most
beneficial strategy for particular traffic characteristics, workloads,
and hardware/OS platforms. In addition, workload studies indicate that
the majority of requests are for small files. Thus, Web servers
should adaptively optimize themselves to give higher priority to
smaller requests. Combined, these techniques could potentially produce
a server that is highly responsive and maximizes throughput. The next
generation of the JAWS server plans to implement the prioritized
strategy.
|| Strategy | Advantages | Disadvantages
|| Single-threaded | No context switching overhead. Highly portable. | Does not scale for multi-processor systems.
|| Process-per-request | More portable for machines without threads. | Creation cost high. Resource intensive.
|| Process pool | Avoids creation cost. | Requires mutual exclusion in some operating systems.
|| Thread-per-request | Much faster than fork. | May require mutual exclusion. Not as portable.
|| Thread pool | Avoids creation cost. | Requires mutual exclusion in some operating systems.
|Summary of Concurrency Strategies
There are instances where the content being transferred may require
extra processing. For instance, in HTTP/1.0 and HTTP/1.1, files may
have some encoding type. This generally corresponds to a file having
been stored in some compressed format (e.g., gzip). In HTTP,
it has been customary for the client to perform the
decoding. However, there may be cases where the client lacks the
ability to do so. To handle such cases, it would be nice if the server
would do the decoding on behalf of the client. A more advanced server
might detect that a particularly large file would transfer more
quickly to the client in some compressed form. But this kind of
processing requires negotiation between the client and the server as
to which kinds of content transformations are possible for the server
and acceptable to the client. Thus, the server would be required
to adapt to the abilities of the client, as well as to the
conditions of the network connection.
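The decision described above might be sketched as follows (the names
here are hypothetical, and real HTTP content negotiation also involves
quality values, which this sketch omits): the server sends a
gzip-stored file as-is when the client advertises gzip support, and
decodes on the client's behalf otherwise.

```cpp
#include <string>

// Hypothetical sketch of server-side transformation negotiation.
enum class Transfer { SendAsStored, DecodeThenSend };

// accept_encoding: the client's Accept-Encoding header value.
// stored_compressed: whether the file is stored gzip-compressed.
inline Transfer choose_transfer(const std::string& accept_encoding,
                                bool stored_compressed) {
    if (!stored_compressed)
        return Transfer::SendAsStored;  // nothing to decode
    const bool client_handles_gzip =
        accept_encoding.find("gzip") != std::string::npos;
    return client_handles_gzip ? Transfer::SendAsStored
                               : Transfer::DecodeThenSend;
}
```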
Here we briefly describe the object-oriented architecture of the
JAWS Web server framework. In order to understand the design, it is
important to motivate the need for framework architectures.
Solutions to the Reuse Problem
Software reuse is a vital issue in the successful development of
large software systems. Software reuse can reduce development effort
and maintenance costs. Thus, much effort in software engineering
has been devoted to the problem of creating reusable software.
The techniques for developing reusable software have evolved through
several generations of language features (e.g., structured
programming, functional programming, 4GLs, object-oriented
programming), compilation tools (e.g., source file inclusion,
compiled object files, class libraries, components), and system design
methods (e.g., functional design, complexity analysis, formal
methods, object-oriented design, design patterns). While each of
these techniques help to facilitate the development and integration of
reusable software, their roles are passive. This means that
the software developer must make the decisions of how to put together
the software system from the repository of reusable software. The
figure below illustrates the passive nature of these solutions.
|Application development with class libraries and components.
The advantage of this approach is that it maximizes the number of
options available to software developers. This can be important in
development environments with open-ended requirements, where design
flexibility is at a premium. However, the disadvantage is that
every new project must be implemented from the ground up, every
single time.
To gain architectural reuse, software developers may utilize an
application framework to create a system. An application framework
provides reusable software components for applications by integrating
sets of abstract classes and defining standard ways that instances of
these classes collaborate. Thus, a framework provides an application
skeleton which can be customized by inheriting and instantiating from
reusable components in the framework. The result is a pre-fabricated
design at the cost of reduced design flexibility. An application
framework architecture is shown in the figure below.
|Application development with an application framework.
Frameworks allow developers to gain greater reuse of designs and
code. This comes from leveraging the knowledge of an expert
application framework developer who has largely pre-determined which
libraries and objects to use, what patterns they follow, and how they
should interact. However, a framework is much more difficult to
develop than a class library. The design must provide an adequate
amount of flexibility and at the same time dictate enough structure to
be a nearly complete application. This balance must be just right for
the framework to be useful.
The JAWS Web Server Framework
The figure below illustrates the object-oriented software architecture
of the JAWS Web server framework. As indicated earlier, our results
demonstrate the performance variance that occurs as a
Web server experiences changing load conditions. Thus, performance
can be improved by dynamically adapting the server behavior to these
changing conditions. JAWS is designed to allow Web server
concurrency and event dispatching strategies to be customized in
accordance with key environmental factors. These factors include
static characteristics, such as support for kernel-level
threading and/or asynchronous I/O in the OS, and the number of
available CPUs, as well as dynamic factors, such as Web traffic
patterns, and workload characteristics.
|JAWS Framework Overview
JAWS is structured as a framework that contains the following
components: an Event Dispatcher, Concurrency
Strategy, I/O Strategy, Protocol Pipeline,
Protocol Handlers, and Cached Virtual Filesystem.
Each component is structured as a set of collaborating objects
implemented with the ACE C++ communication
framework. The components and their collaborations follow several
design patterns which are named along the borders of the
components. Each component plays the following role in JAWS:
- Event Dispatcher: This component is responsible for
coordinating the Concurrency Strategy with the
I/O Strategy. The passive establishment of
connections with Web clients follows the
Acceptor Pattern. New incoming requests are
serviced by some concurrency strategy. As events are processed,
they are dispensed to the Protocol Handler,
which is parameterized by an I/O strategy. The ability to
dynamically bind to a single concurrency strategy and I/O
strategy from a number of choices follows the Strategy Pattern.
- Concurrency Strategy: This implements concurrency
mechanisms (such as single-threaded, thread-per-request, or
thread pool) that can be selected adaptively at run-time,
using the State Pattern, or pre-determined at
initialization-time. Configuring the server as to which
concurrency strategies are available follows the Service
Configurator Pattern. When concurrency involves
multiple threads, the strategy creates protocol handlers that
follow the Active Object Pattern.
- I/O Strategy: This implements the I/O mechanisms (such
as asynchronous, synchronous and reactive). Multiple I/O
mechanisms can be used simultaneously. Asynchronous I/O is
implemented utilizing the Asynchronous
Completion Token Pattern. Reactive I/O is accomplished
through the Reactor
Pattern. Both Asynchronous and Reactive I/O utilize the
Memento Pattern to capture and externalize the state of
a request so that it can be restored at a later time.
- Protocol Handler: This object allows system developers
to apply the JAWS framework to a variety of Web system
applications. A Protocol Handler object is
parameterized by a concurrency strategy and an I/O strategy, but
these remain opaque to the protocol handler. In JAWS, this
object implements the parsing and handling of HTTP request
methods. The abstraction allows for other protocols
(e.g., HTTP/1.1 and DICOM) to be incorporated easily into
JAWS. To add a new protocol, developers simply write a new
Protocol Handler implementation, which is then
configured into the JAWS framework.
- Protocol Pipeline: This component provides a framework
to allow a set of filter operations to be incorporated easily
into the data being processed by the Protocol
Handler. This integration is achieved by employing the
Adapter Pattern. Pipelines follow the Streams
Pattern for input processing. Pipeline components are
made available with the Service Configurator Pattern.
- Cached Virtual Filesystem: This component improves Web
server performance by reducing the overhead of filesystem
accesses. The caching policy is strategized (e.g., LRU,
LFU, Hinted, and Structured) following the Strategy
Pattern. This allows different caching policies to be
profiled for effectiveness and enables optimal strategies to be
configured statically or dynamically. The cache is instantiated
using the Singleton Pattern.
- Tilde Expander: This mechanism is another cache
component that uses a perfect hash table that maps abbreviated
user login names (e.g., ~schmidt) to user home directories
(e.g., /home/cs/faculty/schmidt). When personal Web
pages are stored in user home directories, and user directories
do not reside in one common root, this component substantially
reduces the disk I/O overhead required to access a system user
information file, such as /etc/passwd.
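The initialization-time binding described for the Concurrency Strategy
can be sketched with the Strategy Pattern as follows. The class names
here are hypothetical illustrations, not JAWS's actual API: the server
is written against an abstract interface, and a concrete strategy is
selected from configuration when the server starts.

```cpp
#include <memory>
#include <string>

// Hypothetical Strategy Pattern sketch (not JAWS's actual classes).
// The server dispatches requests through this abstract interface,
// so the concurrency mechanism can be swapped without changing it.
class ConcurrencyStrategy {
public:
    virtual ~ConcurrencyStrategy() = default;
    virtual std::string name() const = 0;
    // A real server would pass a connection/request handle here.
    virtual void dispatch() = 0;
};

class SingleThreaded : public ConcurrencyStrategy {
public:
    std::string name() const override { return "single-threaded"; }
    void dispatch() override { /* handle the request inline */ }
};

class ThreadPerRequest : public ConcurrencyStrategy {
public:
    std::string name() const override { return "thread-per-request"; }
    void dispatch() override { /* spawn a thread for the request */ }
};

// Initialization-time binding, e.g. driven by a configuration entry.
inline std::unique_ptr<ConcurrencyStrategy>
make_strategy(const std::string& config) {
    if (config == "thread-per-request")
        return std::make_unique<ThreadPerRequest>();
    return std::make_unique<SingleThreaded>();  // portable default
}
```

Run-time adaptation (the State Pattern mentioned above) would amount
to replacing the held strategy object while the server runs.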
Continually under construction...
- JAWS 2: Refactorization (Framework Design and Utilization)
- This talk provides an overview of how to use the next
generation of JAWS. The new JAWS implementation is much more
flexible and easier to use than the previous JAWS
implementation. (However, it is not yet polished enough for
general use.)
- Applying the Proactor Pattern to High-Performance Web Servers
- This paper explains how the
complexities of applying the proactive programming model can be
alleviated by applying the Proactor pattern. It has
been published in the 10th International Conference on Parallel
and Distributed Computing and Systems, Las Vegas, Nevada,
October 28-31, 1998. (pdf)
- JAWS: A Framework for High-performance Web Servers
- This technical overview of the JAWS Web server
framework describes in detail the patterns and components which
comprise JAWS. The paper has been published in a book covering
framework programming techniques. (pdf)
- Developing Flexible and High-performance Web Servers with
Frameworks and Patterns
- This paper explains how complexities occurring in the
development of high-performance Web servers can be alleviated
with the use of design patterns and object-oriented application
frameworks. A subset of this paper is to appear in ACM
Computing Surveys, May 1998.
- Techniques for Developing and Measuring High-Performance Web
Servers over ATM Networks
- This paper describes new benchmarking experiments comparing
various Web server implementations, including Netscape
Enterprise, Zeus, and Sun's Java Server. The analysis reveals
several key performance optimization strategies, which were then
incorporated into our own high-performance Web server, JAWS.
The performance of the optimized version of JAWS is then
compared against these servers. This paper has been published.
- Measuring the Impact of Event Dispatching and Concurrency
Models on Web Server Performance over High-speed Networks
- This is a technical paper describing the tradeoffs of applying
Windows NT-specific I/O system calls in our Web server. It was
accepted by the Globecom Program Committee and was presented at
the Global Internet mini-conference.
- High-performance Web Servers on Windows NT: Design and Performance
- This is a position paper describing a poster session we were
invited to present at the USENIX Windows NT Workshop. It
discusses issues addressed while porting the JAWS framework to
Windows NT.