Research on Fault-Tolerant and Dependable CORBA

The following papers describe our work on fault-tolerant and dependable CORBA.

Aniruddha Gokhale, Balachandran Natarajan, Joseph Cross, and Douglas C. Schmidt, Towards Dependable Real-time CORBA Middleware, Cluster Computing: the Journal on Networks, Software, and Applications Special Issue on Dependable Distributed Systems, edited by Alan George, to appear, 2002.
An increasing number of applications are being developed using distributed object computing (DOC) middleware, such as CORBA. Many of these applications require the underlying middleware, operating systems, and networks to provide dependable end-to-end quality of service (QoS) support to enhance their efficiency, predictability, scalability, and reliability. The Object Management Group (OMG), which standardizes CORBA, has addressed many of these application requirements individually in the Real-time CORBA (RT-CORBA) and Fault-tolerant CORBA (FT-CORBA) specifications. Although RT-CORBA implementations are suitable for mission-critical commercial or military distributed real-time and embedded (DRE) systems, the use of FT-CORBA with RT-CORBA implementations is not yet suitable for systems that have stringent simultaneous dependability and predictability requirements.
This paper provides three contributions to the study and evaluation of dependable CORBA middleware for performance-sensitive DRE systems. First, we provide an overview of FT-CORBA and illustrate the sources of unpredictability associated with conventional FT-CORBA implementations. Second, we discuss the QoS requirements of an important class of mission-critical DRE systems to show how these requirements are not well served by FT-CORBA today. Finally, we empirically evaluate new dependability strategies for FT-CORBA that can help make the use of DOC middleware for mission-critical DRE systems a reality.
Chris Gill, Joe Loyall, Rick Schantz, Douglas C. Schmidt, Experiences Using Adaptive Middleware in Distributed Real-time Embedded Application Contexts: a Dependability Perspective, Proceedings of the IEEE Workshop on Dependable Middleware-Based Systems, Washington, D.C., June 23-26, 2002.
Over the past few years we have developed a number of real-world applications that have both motivated and used middleware technologies for COTS-based distributed object computing in general, and runtime adaptive behavior in particular. In this context, a new twist on the theme of dependability involves adaptation as an effective means for dealing with the less-than-optimum situations that often arise due to failures and other forms of sudden, unexpected behavior. This paper briefly describes two of these applications, Weapons Systems Open Architecture (WSOA) and Unmanned Aerial Vehicles (UAV), which have already undergone a series of evaluation steps to determine the suitability of their concepts and implementations under realistic usage scenarios.
Our focus in this paper is on the adaptivity exhibited by WSOA and UAV applications under changing operating conditions, and on the technical basis for the effective marshaling of modified workplans to keep application mission objectives focused using available resources. We summarize some of the lessons learned thus far in developing these applications and the underlying middleware technologies, and show how they influence each other. We also assess the suitability of the solutions offered, discuss some of the difficulties encountered along the way, and outline the means we are applying to overcome them. We conclude with some of the key research challenges that must be resolved to field distributed real-time and embedded systems that can be depended upon to perform adequately, even under extreme and unusual operating circumstances.
Balachandran Natarajan, Joseph Cross, Aniruddha Gokhale, Douglas C. Schmidt, Christopher Andrews, Sylvester Fernandez, and Chris Gill, Towards Dependable Real-time and Embedded CORBA Systems, Proceedings of the IEEE Workshop on Dependable Middleware-Based Systems, Washington, D.C., June 23-26, 2002.
Commercial off-the-shelf (COTS) components based on distributed object computing (DOC) middleware, such as CORBA, are increasingly being used to develop and deploy distributed applications rapidly and cost effectively. Conventional COTS middleware has been considered less suitable for mission-critical distributed real-time and embedded (DRE) applications that require support for multiple quality of service (QoS) properties, such as dependability, efficiency, and predictability. The CORBA Real-time and Fault-tolerance specifications individually address the issues of predictability and dependability, respectively. However, implementations of these specifications do not yet support DRE applications with stringent simultaneous dependability and predictability requirements.
This paper provides three contributions to the development of middleware services that simultaneously address dependability and predictability requirements of key classes of DRE applications, such as commercial or military avionics systems, that require a high degree of reliability and bounded latency even in the case of faults. First, we outline the QoS requirements of an important class of DRE applications that possess both stringent time/space constraints and high dependability needs. Second, we show that meeting DRE application dependability and timing requirements by naively applying the strategies in the existing CORBA specification is replete with contradictions and pitfalls. Finally, we propose and empirically evaluate a new strategy that enables the composition of semantically compatible strategies from the Real-time and Fault-tolerant CORBA specifications to support DRE applications more effectively.
Andy Gokhale, Bala Natarajan, Douglas C. Schmidt and Shalini Yajnik, Applying Patterns to Improve the Performance of Fault-Tolerant CORBA, Proceedings of the 7th International Conference on High Performance Computing (HiPC 2000), ACM/IEEE, Bangalore, India, December 2000.
There is a significant trend to develop mission-critical, embedded, telecommunications, and financial distributed systems based on distributed object computing middleware, such as CORBA. Applications for these systems often require the underlying middleware, operating systems, and networks to provide end-to-end quality of service (QoS) support to enhance their efficiency, predictability, scalability, and fault tolerance. The Object Management Group (OMG), which standardizes CORBA, has addressed many of these application requirements recently in the Real-time CORBA and Fault-tolerant CORBA specifications.
This paper provides two contributions to the study and design of CORBA middleware that provides multiple QoS properties. First, we describe results of experiments conducted to measure the performance of a fault-tolerant CORBA services framework called DOORS and illustrate how common implementation pitfalls can adversely affect performance. Second, we describe the patterns we are incorporating into the DOORS fault-tolerant CORBA service to simultaneously improve its performance and fault-tolerance.
Andy Gokhale, Bala Natarajan, Douglas C. Schmidt and Shalini Yajnik, DOORS: Towards High-performance Fault-Tolerant CORBA, Proceedings of the 2nd International Symposium on Distributed Objects and Applications (DOA '00), OMG, Antwerp, Belgium, September 2000.
An increasing number of applications are being developed using distributed object computing middleware, such as CORBA. Many of these applications require the underlying middleware, operating systems, and networks to provide end-to-end quality of service (QoS) support to enhance their efficiency, predictability, scalability, and fault tolerance. The Object Management Group (OMG), which standardizes CORBA, has addressed many of these application requirements recently in the Real-time CORBA and Fault-tolerant CORBA specifications.
This paper provides four contributions to the study of fault-tolerant CORBA middleware for performance-sensitive applications. First, we provide an overview of the Fault-tolerant CORBA specification. Second, we describe a framework called DOORS, which is implemented as a CORBA service to provide end-to-end application-level fault tolerance. Third, we outline how the DOORS reliability and fault-tolerance model has been incorporated into the standard OMG Fault-tolerant CORBA specification. Finally, we outline the requirements for CORBA ORB core and higher-level services to support the Fault-tolerant CORBA specification efficiently.
Constructing Reliable Distributed Communication Systems with CORBA (updated August 11th, 1996). Appeared in the IEEE Communications Magazine feature topic issue on Distributed Object Computing (with Silvano Maffeis).
Communication software and distributed services for next-generation applications must be reliable, efficient, flexible, and reusable. These requirements motivate the use of the Common Object Request Broker Architecture (CORBA). However, building highly available applications with CORBA is very hard. Neither the CORBA standard nor conventional implementations of CORBA directly address complex problems related to distributed computing, such as real-time or high-speed quality of service, partial failures, group communication, and causal ordering of events.
This paper makes three contributions to the study of reliable distributed object computing systems with CORBA. First, we examine the question of whether reliable applications can (or should) be implemented with CORBA today. Next, we present an extension to the Object Management Architecture that improves support for reliability and fault-tolerance. Finally, we propose a CORBA-based framework built on the Virtual Synchrony model that supports reliable data- and process-oriented distributed systems. In addition, our proposed framework supports applications requiring loosely coupled processes that communicate through asynchronous messaging.

Last modified 18:06:18 CST 25 January 2019