TAO Real-time ORB Endsystem Architecture

Overview of the TAO Real-time ORB Endsystem Architecture

TAO is an ORB endsystem that contains the following network interface, operating system, communication protocol, and CORBA middleware components and features shown in Figure 1.

Figure 1. TAO Architectural Components	TAO Features
	Optimized presentation layer Real-time Scheduling Service Real-time ORB Core Optimized Object Adapter Real-time IDL (RIDL) schemas for specifying QoS attributes Efficient zero-copy buffer management across OS protection domains A high-performance ATM Port Interface Controller (APIC) Real-time scheduling of OS and network resources

Each component in Figure 1 is summarized below. Complete information about TAO's ORB endsystem architecture is available online.

1. Gigabit I/O Subsystem

An I/O subsystem is responsible for mediating ORB and application access to low-level network and OS resources such as device drivers, protocol stacks, and the CPU(s). The key challenges in building a real-time I/O subsystem are (1) to enforce QoS guarantees while minimizing priority inversion and non-determinism, (2) to make it convenient for applications to specify their QoS requirements, and (3) to enable ORB middleware to leverage the QoS guarantees provided by the underlying network.

To meet these challenges, we are developing a high-performance I/O subsystem for TAO that is designed to run over Washington University's Gigabit ATM network. The components in TAO's I/O subsystem include (1) a high-speed ATM Port Interface Controller (APIC), (2) a real-time I/O subsystem, (3) a Run-Time Scheduler, and (4) an admission controller, shown in Figure 2.

Figure 2. TAO's Gigabit I/O Subsystem

To guarantee the QoS needs of applications, TAO requires guarantees from the underlying I/O subsystem. To accomplish this task, we are developing a high-performance network I/O subsystem. The components of this subsystem are described below.

1.1. High-speed Network Adaptor

At the heart of our I/O subsystem is a daisy-chained interconnect comprising one or more ATM Port Interconnect Controller (APIC) chips. APIC can be used both as a endsystem/network interface, as well as an I/O interface chip. It sustains an aggregate bi-directional data rate of 2.4 Gbps. In addition, TAO is designed with a layered architecture that can run on conventional embedded platforms linked via QoS-enabled networks (such as IPv6 with RSVP) and real-time interconnects (such as VME backplanes and multi-processor shared memory environments).

1.2. Real-time I/O Subsystem

TAO enhances the STREAMS model provided by Solaris and real-time operating systems like VxWorks. TAO's real-time I/O subsystem minimizes priority inversion and hidden scheduling problems that arise during protocol processing. Our strategy for avoiding priority inversion is to have a pool of kernel threads dedicated to protocol processing and to associate these threads with application threads. The kernel threads run at the same priority as the application threads, which prevents various real-time scheduling hazards such as priority inversion and hidden scheduling.

1.3. Run-Time Scheduler

TAO supports QoS guarantees via a real-time I/O scheduling class that supports periodic real-time applications. Once a thread of the real-time I/O class is admitted by the OS, the scheduler is responsible for (1) computing its priority relative to other threads in the class and (2) dispatching the thread periodically so that its deadlines are met.

TAO's real-time I/O scheduling class allows applications to specify their requirements in a high-level, intuitive manner. For instance, one implementation of TAO's real-time I/O scheduling class is based on rate monotonic scheduling, where applications can specify their processing requirements in terms of computation time C and period P. The OS then grants priorities to real-time I/O threads so that schedulability is guaranteed.

1.4. Admission Controller

To ensure that application QoS requirements can be met, TAO performs admission control for the real-time I/O scheduling class. Admission control allows the OS to either guarantee the specified computation time or to refuse to admit the thread. Admission control is useful for real-time systems with deterministic and statistical QoS requirements.

2. Real-time ORB Core

TAO's ORB Core manages transport connections, delivers client requests to an Object Adapter, and returns responses (if any) to clients. It is also responsible for handling the concurrency model used by application components. Figure 3 illustrates the components in the client-side and server-side of TAO's ORB Core.

Figure 3. Components in TAO's ORB Core

TAO's ORB Core is based on the high-performance, cross-platform ACE components such as Acceptors and Connectors, Reactors, and Tasks. Figure 3 illustrates how the client side of TAO's ORB Core uses ACE's Strategy_Connector to cache connections to the server, thus saving connection setup time and minimizing latencies between invocation and execution. The server side uses ACE's Strategy_Acceptor, in conjunction with Reactor, to accept connections. The acceptor delegates activation of the connection handler to one of ACE's various activation strategies (e.g., a threaded activation strategy shown in the diagram), which turns each handler into an Active Object. The connection handler extracts Inter-ORB Protocol (IOP) requests and hands them to TAO's Object Adapter, which dispatches the request to the servant operation.

3. Real-time Object Adapter

TAO's Object Adapter is responsible for demultiplexing and dispatching client requests onto target object implementations. A standard GIOP-compliant client request contains the identity of its remote object implementation and remote operation. The remote object implementation is represented by an object key and the remote operation is represented as a string. Conventional ORB endsystems demultiplex client requests to the appropriate operation of the target object implementation using the following steps (shown in Figure 4(A)):

Steps 1 and 2 -- The OS protocol stack demultiplexes the incoming client request multiple times (e.g., through the data link, network, and transport layers, as well as the user/kernel boundary) to the ORB's Object Adapter.
Steps 3, 4, and 5 -- The ORB core uses the addressing information in the client's object key to locate the appropriate Object Adapter, servant, and the skeleton of the target IDL operation;
Step 6 -- The IDL skeleton locates the appropriate operation, demarshals the request buffer into operation parameters, and performs the operation upcall.

Figure 4. Layered and De-layered Demultiplexing

Demultiplexing client requests through all these layers is expensive, particularly when a large number of operations appear in an IDL interface and/or a large number of objects are managed by an ORB. To minimize this overhead, TAO utilizes de-layered demultiplexing (shown in Figure 4(B)). This approach uses demultiplexing keys that the ORB assigns to clients. These keys map client requests to object/operation tuples in O(1) time without requiring any hashing or searching.

To further reduce the number of demultiplexing layers, the APIC can be programmed to directly dispatch client requests associated with ATM virtual circuits. This strategy reduces demultiplexing latency and supports end-to-end QoS on a per-request or per-object basis.

4. QoS Specification via Real-time IDL Schemas

Real-time applications that use TAO must specify their scheduled resource requirements to TAO's Real-time Scheduling Service. This QoS information is currently provided to TAO on a per-operation basis before program execution. For CPU requirements, the QoS requirements are expressed by RT_Operations using the attributes of the RT_Info IDL struct shown in Figure 5.

Figure 5. TAO's QoS Specification Model

An RT_Operation is a scheduled operation, i.e., one that has expressed its scheduled resource requirements to TAO using an RT_Info struct. The attributes in an RT_Info include worst-case execution time, period, importance, and data dependencies. Using scheduling techniques like RMS and analysis approaches like RMA, TAO's Real-Time Scheduling Service determines if there is a feasible schedule based on knowledge of all RT_Info data for all the RT_Operations in an application.

This set of attributes is sufficient for rate monotonic analysis and is used by TAO to (1) validate the feasibility of the schedule and (2) allocate ORB endsystem and network resources. Currently, developers must determine these parameters manually and provide them to TAO's Real-time Scheduling Service through its CORBA interface. We are planning to enhance this process by creating a tool that (1) monitors the execution of applications in example scenarios and (2) automatically extracts the necessary run-time parameters. Likewise, instead of actual execution, simulation results could be used to define RT_Info attributes for each operation.

5. Real-time Scheduling Service

TAO's real-time Scheduling Service is a CORBA object that has the following off-line and run-time responsibilities:

Off-line feasibility scheduling analysis -- It performs off-line feasibility analysis of IDL operations register with the Scheduling Service's RT_Info repository to determine whether there are sufficient CPU resources to perform all requested tasks.
Thread priority assignment -- During that off-line analysis, the Scheduling Service assigns priorities to threads. At run-time, the Scheduling Service provides an interface that allows TAO's ORB Core to access these priorities, which are the mechanism for interfacing with the OS-level dispatcher.
Coordinate mode changes -- At run-time, the Scheduling Service coordinates mode changes.

Participants in the TAO run-time scheduling model are shown in Figure 6 and described below:

Figure 6. TAO's Scheduling Service

Work Task -- A Work_Task is a unit of work that encapsulates application-level processing and communication activity. In some MDA projects, a work task is also called a module or process, but we avoid these terms because of their overloaded usage.
RT_Task -- An RT_Task is a work task that has timing constraints. Each RT_Task is considered to be a ``method'' (function) that has its own QoS information specified in terms of the attributes in its run-time information (RT_Info) descriptor. Thus, an application-level object with multiple methods may require multiple RT_Task instances.
Thread -- A unit of concurrency. A thread corresponds to, e.g., a Solaris or POSIX thread, an Ada task, a VxWorks task, or a Win32 thread. All threads are contained within RT_Tasks; an RT_Task can contain zero or more threads. An RT_Task that does not contain any of its own threads will only execute in the context of another RT_Task, i.e., it must ``borrow'' another task's (e.g., the Object Adapter's) thread of control to run.
OS Dispatcher -- The OS dispatcher uses thread priorities to select the next runnable thread that it will assign to a CPU. It removes a thread from the CPU when the thread blocks (and therefore is no longer runnable), or when the thread is preempted by a higher priority thread. With preemptive dispatching, any runnable thread with a priority higher than any running thread will preempt the lower priority thread. At that point the higher priority, runnable thread can be dispatched onto the CPU.
Our analysis, based on RMA, assumes fixed priority, i.e., the operating system does not change the priority of a thread. This contrasts with time-shared OS Schedulers, which typically age long-running processes by decreasing their priority over time. Thus, from the point of view of the OS dispatcher, the priority of each thread is constant.
RT_Info -- An RT_Info structure specifies an RT_Task's scheduling characteristics (such as computation time and execution period).
Run-Time Scheduler -- At run-time, the primary visible vestige of the Scheduling Service is the Run-Time Scheduler. The Run-Time Scheduler manages one RT_Info structure for each RT_Task in the system. By using an RT_Task's RT_Info, the Run-Time Scheduler can be queried for scheduling characteristics (e.g., a task's priority) of the RT_Task. Currently, the data represented in the RT_Info structures are computed off-line, i.e., priorities are statically assigned prior to run-time.

6. Presentation Layer Components and Optimizations

The presentation layer is a major bottleneck in high-performance communication subsystems. This layer transforms typed operation parameters from higher-level representations to lower-level representations (marshaling) and vice versa (demarshaling). In TAO, this transformation process is performed by client-side stubs and server-side skeletons that are generated by a highly-optimizing IDL compiler. The optimizations performed in TAO's presentation layer implementation are described below.

6.1. Presentation Layer Optimizations

The transformation between IDL definitions and the target programming language is automated by TAO's IDL compiler. In addition to reducing the potential for inconsistencies between client stubs and server skeletons, this compiler support innovative automated optimizations. TAO's IDL compiler is designed to generate and configure multiple strategies for marshaling and demarshaling IDL types. For instance, based on measures of a type's run-time usage, TAO can link in either compiled and/or interpreted IDL stubs and skeletons. This flexibility can achieve an optimal tradeoff between interpreted code (which is slow, but compact in size) and compiled code (which is fast, but larger in size).

Likewise, TAO can cache premarshaled application data units (ADUs) that are used repeatedly. Caching improves performance when ADUs are transferred sequentially in ``request chains'' and each ADU varies only slightly from one transmission to the other. In such cases, it is not necessary to marshal the entire every time. This optimization requires that the real-time ORB perform flow analysis of application code to determine what request fields can be cached.

Although these techniques can significantly reduce marshaling overhead for the common case, applications with strict real-time service requirements often consider only worst-case execution. As a result, the flow analysis optimizations described above can only be employed under certain circumstances, e.g., for applications that can accept statistical real-time service or when the worst-case scenarios are still sufficient to meet deadlines.

6.2. Memory Management Optimizations

Conventional implementations of CORBA suffer from excessive dynamic memory management and data copying overhead. Dynamic memory management is problematic for hard real-time systems because heap fragmentation can yield non-uniform behavior for different message sizes and different workloads. Likewise, excessive data copying throughout an ORB endsystem can significantly lower end-to-end performance.

Existing ORBs use dynamic memory management for several purposes. The ORB Core typically allocates a memory buffer for each incoming client request. IIOP demarshaling engines typically allocate memory to hold the decoded request parameters. Finally, IDL dynamically allocate and delete copies of client request parameters before and after an upcall, respectively.

These memory management policies are important in some circumstances (e.g., to protect against corrupting internal CORBA buffers when upcalls are made in threaded applications that modify their input). However, this strategy needlessly increases memory and bus overhead for real-time applications, as well as for streaming applications (such as satellite surveillance and teleconferencing) that consume their input immediately without modifying it.

TAO is designed to minimize and eliminate data copying at multiple points. For instance, TAO's ``zero-copy'' buffer management system described in allows client requests to be sent and received to and from the network without incurring any data copying overhead. Moreover, these buffers can be preallocated and passed between various processing stages in the ORB. In addition, Integrated Layer Processing (ILP) can be used to reduce data movement. Because ILP requires maintaining ordering constraints, we are applying compiler techniques (such as control and data flow analysis) to determine where ILP can be employed effectively.

Using compiler techniques for presentation layer and memory management functionality allows us to optimize performance without modifying standard OMG IDL and CORBA applications.

Back to the TAO intro page.

Last modified 18:06:18 CST 25 January 2019