DeSiDeRaTa: Resource and QoS Management for Dynamic, Scalable, Dependable Real-Time Systems

D-Spec: A QoS Specification Language for Dynamic Real-time Systems

(manual for use of the specification language and compiler)

Laboratory for Parallel and Distributed Real-time Systems

Department of Computer Science and Engineering

University of Texas at Arlington

This work was sponsored in part by DARPA/NCCOSC contract N66001-97-C-8250, and by NSWC/NCEE contracts NCEE/A303/41E-96 and NCEE/A303/50A-98.

1 Introduction

2 The Specification Language

2.1 Software System Specifications

2.2 Software Subsystem Specifications

2.3 Dynamic Real-time Path Specifications

2.4 Application Specifications

2.5 Hardware System Specifications

2.6 Network System Specifications

3 Installation

System Requirements

Setup

4 Compiling a Specification

Introduction

To address the needs of those who engineer the emerging generation of distributed real-time systems, we have developed a programming-language-independent meta-language for describing real-time QoS in terms of end-to-end paths through application programs. Language constructs are provided for the description of deterministic, stochastic, and dynamic characteristics of environment-dependent paths. The provision for description of periodic, transient and hybrid (transient-periodic) paths is also made. Another novel feature of the language is that it allows the description of multi-level timing constraints through simple deadlines (which apply to periodic, transient and transient-periodic paths) and super-period deadlines (which can be specified for transient-periodics). The language also permits description of scalability and fault tolerance features of the end-to-end paths and their application program constituents.

This is significantly different from other real-time languages, which can be divided into two groups: application development languages, and specification languages (or formalisms) that are used to describe time constraints at the application level, or even the system level. Examples of real-time application programming languages are Tomal, Pearl, Real-Time Euclid, RTC++, Real-Time Concurrent C, Dicon, Chaos, Flex, TCEL, Ada95, and MPL. These languages include a wide variety of features that allow the compiler (and possibly run-time system) to check assertions or even to transform code to ensure adherence to timing constraints. Specification languages or "meta-languages", such as ACSR, GCSR, and RTL formalize the expression of different types of timing constraints and in some cases allow proofs of properties of programs based on these constraints. In some cases, these features have been folded into an application development language. Our work is a specification meta-language that is independent of an application development language. Rather than providing real-time support within a particular application language, it provides support for expressing timing constraints for systems of application programs, which may be written in a wide variety of programming languages. Unlike previous work in which timing constraints are associated at a relatively small granularity, such as the task grain or the object grain, our language allows timing constraints to be expressed at large granularities, i.e., to span multiple programs.

Another way of characterizing real-time languages is in the way they permit characterization of a system's behavior when interacting with the physical environment. Prior work has typically assumed that the effects of the environment on the system can be modeled deterministically. Our work expands this to include systems that interact with environments that are either deterministic, stochastic, or dynamic. This is done by modeling interactions with the environment as data streams and event streams that may have deterministic, stochastic or dynamic properties.

In order to handle dynamic environments, it is useful if the language includes features that can be related to dynamic mechanisms for monitoring, diagnosis and recovery. Language support for run-time monitoring of real-time programs has been addressed in previous work such as Real-Time Euclid, and Ada95. However, limited support was provided for diagnosis and recovery actions. Our work extends the language features pertaining to diagnosis of timing problems, and application scalability via migration and/or replication of software components to handle varying data stream or event stream loads.

Previous real-time languages allow the description of behaviors that are purely periodic or aperiodic. Our work extends language support to describe hybrid behaviors such as the transient-periodic. We also allow for dynamic multi-dimensional timing constraints, which, to our knowledge, cannot be described in any existing real-time language. Our system environment also requires support for fault-tolerance, an issue that has not yet been addressed in many real-time languages. This is addressed by providing abstractions to describe minimum redundancy levels of software components.

D-Spec supports the dynamic path paradigm. A path-based real-time subsystem (see Figure 1) typically consists of a situation assessment (or sensing) path, an action initiation path and an action guidance path. The paths interact with the environment via evaluating streams of data from sensors, and by causing actuators to respond (in a timely manner) to events detected during evaluation of sensor data streams. The system operates in an environment that is either deterministic, stochastic or dynamic. A deterministic environment exhibits behavior that can be characterized by a constant value. A stochastic environment behaves in a manner that can be characterized by a statistical distribution. A dynamic environment (e.g., a war-fighting environment) depends on conditions which cannot be known in advance.

A (partial) air defense subsystem can be modeled using three dynamic paths: threat detection, engagement, and missile guidance. The threat detection path examines radar sensor data (radar tracks) and detects potential threats. The path consists of a radar sensor, a sensor data stream, a filtering program and an evaluation program. When a threat is detected and confirmed, the engagement path is activated, resulting in the firing of a missile to engage the threat. After a missile is in flight, the missile guidance path uses sensor data to track the threat, and issues guidance commands to the missile. The missile guidance path involves sensor hardware, software for filtering, software for evaluating & deciding, software for acting, and actuator hardware.

The remainder of this section describes the features of dynamic real-time paths. Recall that a path may be one of three types: situation assessment, action initiation, or action guidance. The first path type, situation assessment, continuously evaluates the elements of a sensor data stream to determine if environmental conditions are such that an action should be taken (if so, the action initiation path is notified). Thus, this type of path is called continuous. Typically, there is a timeliness objective associated with completion of one review cycle of a continuous path, i.e., on the time to review all of the elements of one instance of a data stream. (Note: A data stream is produced by sampling the environment. One set of samples is called a data stream instance.)

The threat detection path of the air defense system is an example of a continuous path. It is a sensor-data-stream-driven path, with desired end-to-end cycle latencies for evaluation of radar track data. If it fails to meet the desired timeliness Quality of Service in a particular cycle, the path must continue to process track data, even though desired end-to-end latencies cannot be achieved. Peak loads cannot be known in advance for the threat detection path, since the maximum number of radar tracks can never be known. Furthermore, average loading of the path is a meaningless notion, since the variability in the sensor data stream size is very large (it may consist of zero tracks, or it may consist of thousands of tracks).

The second path type, action initiation, is driven by a stream of events sent by a continuous (situation assessment) path. It uses inputs from sensors to determine which actions should be taken and how the actions should be performed, notifies actuators to start performing the actions, and informs the action guidance path that an action has been initiated. We call this type of path transient, since it performs work in response to events. Typically, a timing objective is associated with the completion of the initiation sequence. The importance of the timing objective for a transient path is often very high, since performance of an action may be mission-critical or safety-critical.

For example, the engagement path of the air defense example is a transient path. It is activated by an event from the threat detection path, and has a QoS objective of end-to-end timeliness. The real-time QoS of this path has a higher priority than the real-time QoS of the continuous threat detection path.

The third path type, action guidance, is activated by an action initiation event, and is deactivated upon completion of the action. Action guidance repeatedly uses sensor inputs to monitor the progress of an actuator, to plan corrective actions needed to guide the actuator to its goal, and to issue commands to the actuator. This type of path is called quasi-continuous, since it behaves like a continuous path when it is active. A quasi-continuous path has two timeliness objectives: (1) cycle completion time: the duration of one iteration of the "monitor, plan, command" loop, and (2) action completion time (or deactivation time): the time by which the action must complete in order for success. Note that it is more critical to perform the required processing before the action completion deadline than it is to meet the completion time requirement for every cycle (although the two deadlines are certainly related). Thus, it is acceptable for the completion times of some cycles to violate the cycle deadline requirement, as long as the desired actions are successfully completed by the deactivation deadline.

The missile guidance path of the air defense example is a quasi-continuous path. It is activated by the missile launch event. Once activated, the path continuously issues guidance commands to the missile until it detonates (the deactivation event). The required completion time for one iteration is dynamically determined, based on characteristics of the threat. If multiple threat engagements are active simultaneously, the missile guidance path is responsible for issuing guidance commands to all missiles that have been launched.

The Specification Language

This section presents D-Spec, a specification language for describing the characteristics and requirements of dynamic, path-based real-time systems. The language provides abstractions to describe the properties of the software, such as hierarchical structure, inter-connectivity relationships, and run-time execution constraints. It also allows description of the physical structure or composition of the hardware such as LANs, hosts, interconnecting devices or ICs (such as bridges, hubs, and routers), and their statically known properties (e.g., peak capacities). Further, the Quality-of-Service requirements on various system components can be described.

At the highest level, a D-Spec specification consists of a collection of software systems, hardware systems, and network systems. The language rules for specifying systems are described in the remainder of this section. A high-level system specification is shown below:

SOFTWARE SYSTEM D {...}

SOFTWARE SYSTEM Z {...}

HARDWARE SYSTEM D_H {...}

NETWORK SYSTEM D_N {...}

Software System Specifications

A software specification is a collection of software systems, each of which consists of one or more software subsystems. This is illustrated below:

SOFTWARE SYSTEM D

{ // This line is a comment

SUBSYSTEM A {...}

SUBSYSTEM B {...}

SUBSYSTEM C {...}

} //end of software system D

SOFTWARE SYSTEM Z

{ // This line is a comment

SUBSYSTEM A {...}

SUBSYSTEM B {...}

SUBSYSTEM E {...}

} //end of software system Z

Note that a comment begins with "//" and extends to the end of a line.

Qualified references: D:A, D:B, and D:C denote the subsystems of software system D. They are distinct from Z:A, Z:B, and Z:E, the subsystems of software system Z, even where the unqualified subsystem names coincide.

Software Subsystem Specifications

A software subsystem has a priority, has a set of dynamic real-time path definitions, and has a set of application program definitions. A sample subsystem specification is shown below:

SUBSYSTEM B {

Priority 2;

PATH Sensing {...}

PATH Action_Initiation{...}

PATH Action_Guidance{...}

PATH Display_Console{...}

Application sensor{...}

Application filter{...}

Application FM{...}

Application EDM{...}

Application ED{...}

} //end subsystem B

Dynamic Real-time Path Specifications

The definition of a path includes a set of constituent applications, various path attributes, QoS requirements and data/event stream definitions (see example below). The attributes of a path include priority, type, and importance. Path type, which defines the execution behavior of the path, is either continuous, transient, or quasi_continuous. The importance attribute (a string) is interpreted as the name of a dynamically linked library procedure that may be passed arguments such as priority and the current time, and returns an integer value that represents the importance of the path.

PATH Sensing {

Connectivity{...}

Type ...;

Priority ...;

Importance ...;

RealTimeQoS{...}

Scalability{...}

DATASTREAM{...}

} // end PATH sensing

Path Connectivity Graphs

The connectivity specification represents the communication relationships among applications in a path. These relationships form a directed graph, which is specified as a set of ordered application pairs. The sample specification (below) indicates that application D:B:sensor sends data to application D:B:FM and that application D:B:FM sends data to application D:B:filter. Note that the names of applications are fully qualified, as system:subsystem:application (e.g., D:B:FM).

Connectivity {

(D:B:sensor, D:B:FM);

(D:B:FM, D:B:filter);

(D:B:filter, D:B:EDM);

(D:B:EDM, D:B:ED);

} //connectivity
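
The ordered pairs above can be read as the edges of a directed graph. As a minimal sketch (written in Python, which is not part of D-Spec; the function names build_graph and flow_order are hypothetical and do not come from the DeSiDeRaTa middleware), the following builds an adjacency map from the sample pairs and recovers the order in which data flows through the path:

```python
from collections import defaultdict, deque

# Edges copied from the sample Connectivity block: (sender, receiver).
EDGES = [
    ("D:B:sensor", "D:B:FM"),
    ("D:B:FM", "D:B:filter"),
    ("D:B:filter", "D:B:EDM"),
    ("D:B:EDM", "D:B:ED"),
]

def build_graph(edges):
    """Return {sender: [receivers]} for a list of ordered application pairs."""
    graph = defaultdict(list)
    for sender, receiver in edges:
        graph[sender].append(receiver)
        graph.setdefault(receiver, [])  # make sure sinks appear as nodes
    return dict(graph)

def flow_order(graph):
    """Topologically sort the applications (Kahn's algorithm)."""
    indegree = {node: 0 for node in graph}
    for receivers in graph.values():
        for receiver in receivers:
            indegree[receiver] += 1
    ready = deque(node for node, degree in indegree.items() if degree == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for receiver in graph[node]:
            indegree[receiver] -= 1
            if indegree[receiver] == 0:
                ready.append(receiver)
    if len(order) != len(graph):
        raise ValueError("connectivity graph contains a cycle")
    return order

print(flow_order(build_graph(EDGES)))
```

For the sample path the result is simply the chain from D:B:sensor to D:B:ED. The cycle check is an addition of this sketch; the manual does not say whether cyclic connectivity is permitted.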

Real-time QoS

RealTimeQoS {

SimpleDeadline 1.0; //cycle deadline

BatchLatency 5.0;

BatchInterArrival 5.0;

MaxSlack 80; //maximum deadline slack/cycle

MinSlack 20; //minimum deadline slack/cycle

SlidingWindowSize 20; //#cycles to monitor real-time QoS

Violations 15; //max. #QoS violations w/in window

} //real-time QoS

As seen in the above example, a real-time QoS specification includes timing constraints such as simple deadlines, inter-processing times, throughputs, and super-period deadlines. A simple deadline is defined as the maximum end-to-end path latency during a cycle of a continuous or quasi-continuous path, or during an activation of a transient path. Inter-processing time is defined as the maximum allowable time between processing of a particular element of a continuous or quasi-continuous path's data stream in successive cycles. The throughput requirement is defined as the minimum number of data items that the path must process during a unit period of time. A super-period deadline is defined as the maximum allowed latency for all cycles of a quasi-continuous path. It is specified as the name of a dynamically linked library procedure that is called at run time to determine the estimated super-period deadline. Each timing constraint specification may also include items that relate to the dynamic monitoring of the constraint. These include minimum and maximum slack values (that must be maintained at run-time), the size of a moving window of measured samples that should be observed, and the maximum tolerable number of violations (within the window).
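
The monitoring items (slack, window size, and tolerated violations) can be illustrated with a small sliding-window check. The sketch below is in Python and is not the DeSiDeRaTa monitor. It assumes that slack is expressed as a percentage of the deadline and that a cycle counts as a violation when its slack falls below MinSlack; the manual states neither, and the class name QoSMonitor is hypothetical.

```python
from collections import deque

class QoSMonitor:
    """Sliding-window check of per-cycle latency against a simple deadline."""

    def __init__(self, deadline, min_slack_pct, window_size, max_violations):
        self.deadline = deadline
        self.min_slack_pct = min_slack_pct
        self.max_violations = max_violations
        # deque(maxlen=...) drops the oldest sample once the window is full.
        self.window = deque(maxlen=window_size)

    def slack_pct(self, latency):
        """Unused share of the deadline, in percent (negative if missed)."""
        return 100.0 * (self.deadline - latency) / self.deadline

    def record(self, latency):
        """Record one cycle; return True if the window exceeds the limit."""
        violated = self.slack_pct(latency) < self.min_slack_pct  # assumed rule
        self.window.append(violated)
        return sum(self.window) > self.max_violations

# Values taken from the sample RealTimeQoS block.
monitor = QoSMonitor(deadline=1.0, min_slack_pct=20,
                     window_size=20, max_violations=15)
```

Under these assumptions, record() first returns True on the sixteenth low-slack cycle within a 20-cycle window, which is the point at which a resource manager might consider a reallocation.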

Scalability

The language and model consider the scalability of the end-to-end paths and their application program constituents. Some paths permit replication of their constituent applications to scale to dynamic data stream or event stream loads. If a scalable path is unable to meet its real-time requirements, one or more of its constituent applications may be replicated. Similarly, if a path is exceeding its real-time requirements by a large margin, one or more of its replicas may be removed. The scalability specification contains a flag that is TRUE or FALSE, indicating if the path is scalable. Also specified is the PathSettlingTime, which indicates the amount of time that must be allowed between successive "scalings" of the path. Below is an example of a scalability specification.

Scalability {

Scalable TRUE; //path has scalable components

PathSettlingTime 40.00; // secs between reconfigurations

} //scalability
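
The interaction between the Scalable flag and PathSettlingTime can be sketched as a small decision function. This is an illustrative Python sketch, not the resource manager's actual policy. The function name, its parameters, and the returned strings are hypothetical, and how "missing deadlines" and "large margin" are detected is left to the caller.

```python
def scaling_action(scalable, settling_time, last_scaled_at, now,
                   missing_deadlines, large_margin):
    """Return 'replicate', 'remove', or 'none' for one path.

    scalable          -- value of the Scalable flag
    settling_time     -- PathSettlingTime, in seconds
    last_scaled_at    -- time of the previous reconfiguration, in seconds
    now               -- current time, in seconds
    missing_deadlines -- True if the path is not meeting its real-time QoS
    large_margin      -- True if the path exceeds its QoS by a large margin
    """
    if not scalable:
        return "none"
    if now - last_scaled_at < settling_time:
        return "none"  # still settling from the previous reconfiguration
    if missing_deadlines:
        return "replicate"
    if large_margin:
        return "remove"
    return "none"

# With the sample values (Scalable TRUE, PathSettlingTime 40.00):
print(scaling_action(True, 40.0, last_scaled_at=100.0, now=120.0,
                     missing_deadlines=True, large_margin=False))
print(scaling_action(True, 40.0, last_scaled_at=100.0, now=150.0,
                     missing_deadlines=True, large_margin=False))
```

The first call falls inside the settling interval, so no action is taken even though deadlines are being missed; the second call falls outside it and yields a replication.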

Datastream Specification

DATASTREAM {

Type Dynamic; // Deterministic, Stochastic, or Dynamic

SlackQoS 400; // Additional number of data items that the

// continuous/quasi-continuous path should be

// able to handle at any given time

} //datastream

A datastream specification appears above. The stream type can be deterministic, stochastic, or dynamic. The data stream size or event arrival rate of a deterministic stream is a constant (scalar or interval). A stochastic stream has a data stream size or an event arrival rate that is characterized by a probability distribution function. The distribution is described as a string containing the name of a distribution and its parameters, or is described as the name of a data file containing a data set that characterizes the stream's behavior. The data stream size or event arrival rate of a dynamic stream is not described in the specification, since it must be observed at run time.

The SlackQoS specification for a datastream indicates an amount of additional data stream elements that the path should be able to process in a timely manner. The resource allocator should consider this quantity when assessing possible allocations.

Application Specifications

APPLICATION ED {

TimeDelay 3; //delay this long after initial startup

Automatic TRUE; //automatically start this app.

Console TRUE; //Start this App in an XTerm

Display "virginia"; //Where to place the display for this App

Memory 5; //Min amount of RAM needed

STARTUP{...}

SHUTDOWN{...}

DEPENDENCY{...}

RestartDelay 10;

SurvivabilityQoS {...}

ScalabilityQoS {...}

Initial Profile {...}

} //end application ED

An application is an executable image that may be started as an autonomous process on a host. As seen in the above example specification, the first two attributes of an application are related. Automatic indicates whether the application can be started by resource management software, or whether it must be started manually. TimeDelay indicates the amount of time that must elapse from the startup of the application until another application may be started (this is sometimes needed to ensure proper initialization of cooperating applications). Application attributes also include all information necessary to start up and shut down applications (not elaborated in this paper). Console and Display are attributes pertaining to the graphical capabilities required by the application. Memory indicates the minimum amount of memory that the process requires to execute. The startup block and the shutdown block describe how to automatically start and stop the application, and the dependency block indicates any dependencies the application may have on the startup and/or shutdown of other applications (e.g., it may be required that a particular application be started before another application can be started). RestartDelay indicates the amount of time that must elapse, after the application is restarted, before a resource reallocation action should be considered for it (this delay allows the application to perform its initialization process). The SurvivabilityQoS and ScalabilityQoS blocks indicate if and how survivability and scalability services are to be provided to the application. Initial Profile contains the profiling information about the application that is used for initial allocation.

Application Startup Information

An application startup block contains all the information necessary to (automatically or manually) start an application. This information includes required hardware (host) type(s) and operating system type(s) and version(s) (see example below). This may be further constrained by an optional list of the names of hosts that can run the application. The startup information also includes the directory to the application's executable, the name of the executable, and an ordered list of arguments that must be passed on the command line when the application is started.

STARTUP {

//Type of h/w required; e.g., SGI, SUN, HP, DEC, PC

Type "SUN_SPARC_5";

Type "SUN_Ultra_1";

//Name of reqd. operating system; e.g., IRIX, SOLARIS, ...

OS "SUN_Solaris";

Version "SUN_Solaris_2.1.5"; //Version(s) of OS

Version "SUN_Solaris_2.1.6";

//Working directory; ie. location of binaries, etc...

Directory "./";

//name of executable, or startup script

Execute "ED";

//Ordered list of command line arguments

Arg "ED";

Arg "1";

Arg "texas";

Arg "7300";

Arg "EDM";

Arg "D:B:Sensing";

//list of hosts that can run this app

Host virginia;

Host nujersy;

Host texas;

Host desidrta;

} //end startup
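
How a startup block constrains placement can be sketched as a host eligibility check plus construction of the command to run. The Python sketch below is only an illustration. The dictionary layout and the function names can_run and command_line are hypothetical, the host record reuses values that appear in this manual's examples, and the sketch assumes that an empty host list means "no host restriction".

```python
import os

# Contents of the sample STARTUP block, transcribed into a dictionary.
STARTUP = {
    "types": ["SUN_SPARC_5", "SUN_Ultra_1"],
    "os": "SUN_Solaris",
    "versions": ["SUN_Solaris_2.1.5", "SUN_Solaris_2.1.6"],
    "directory": "./",
    "execute": "ED",
    "args": ["ED", "1", "texas", "7300", "EDM", "D:B:Sensing"],
    "hosts": ["virginia", "nujersy", "texas", "desidrta"],
}

# A host record using the attribute values shown for host texas.
HOST_TEXAS = {
    "name": "texas",
    "type": "SUN_Ultra_1",
    "os": "SUN_Solaris",
    "version": "SUN_Solaris_2.1.6",
}

def can_run(startup, host):
    """True if the host satisfies the type, OS, version and host-list rules."""
    # Assumption of this sketch: an empty host list places no restriction.
    allowed = not startup["hosts"] or host["name"] in startup["hosts"]
    return (allowed
            and host["type"] in startup["types"]
            and host["os"] == startup["os"]
            and host["version"] in startup["versions"])

def command_line(startup):
    """Return (path to executable, ordered argument list)."""
    executable = os.path.join(startup["directory"], startup["execute"])
    return executable, list(startup["args"])

print(can_run(STARTUP, HOST_TEXAS))
print(command_line(STARTUP))
```

The argument list is returned exactly as specified; whether the first Arg entry is treated as the program name is not stated in this manual, so the sketch does not interpret it.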

Application Shutdown Information

An application shutdown block indicates the command(s) to be used for termination of the application. A shutdown command may be a Unix signal or may be a shell script. A sample shutdown block is shown below.

SHUTDOWN {

//Kill signal to use on process list if no script is available

Signal "-9";

} //end shutdown block

Inter-application Dependencies

DEPENDENCY {

//Type of this Dependency (STARTUP, SHUTDOWN)

Type STARTUP;

//Full unique name of the application dependency

Name "D:B:EDM";

//Time (secs) the specified app must be up

Delay 10; //secs

}

A dependency block describes a temporal relationship between applications (see above example). The block indicates the type of the dependency (startup or shutdown), the name of the application with which the dependency exists, and the time value associated with the relationship. The time value is the duration that must elapse between the start or stop of the named application and the start or stop of the application whose specification contains the dependency block.
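
The timing rule for startup dependencies can be sketched as follows. This Python fragment is illustrative only: the function name earliest_start and the tuple layout are hypothetical, and the sample dependency is assumed to belong to the specification of application ED.

```python
def earliest_start(dependencies, started_at):
    """Earliest time the dependent application may start, or None.

    dependencies -- list of (dep_type, app_name, delay_secs) tuples
    started_at   -- {app_name: start time in seconds} for started applications
    """
    earliest = 0.0
    for dep_type, name, delay in dependencies:
        if dep_type != "STARTUP":
            continue  # shutdown dependencies do not constrain startup
        if name not in started_at:
            return None  # a prerequisite has not been started yet
        earliest = max(earliest, started_at[name] + delay)
    return earliest

# Dependency taken from the sample block (assumed to be part of ED).
ED_DEPENDENCIES = [("STARTUP", "D:B:EDM", 10)]

# If D:B:EDM started at t = 5 s, ED may start no earlier than t = 15 s.
print(earliest_start(ED_DEPENDENCIES, {"D:B:EDM": 5.0}))
```

Returning None when a prerequisite is not yet running is a choice of this sketch; the manual does not describe how the middleware handles that case.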

Application Survivability QoS

SurvivabilityQoS {

Survivable TRUE; //application is survivable

MinCopies 1; //min # of replicas of this application

SameHost FALSE; //restart on same host?

} //survivabilityQoS

As shown in the above example, a survivability QoS specification includes (1) a boolean flag that indicates whether the application should be managed to ensure survivability, (2) the minimum required level of redundancy (the minimum number of replicas), and (3) a flag that indicates whether the application is to be restarted on the same host. Note that replicating an application entails replicating all of the applications that make up the path.

Application Scalability

Scalability {

Scalable TRUE; //is app scalable?

Combining FALSE; //doesn't combine inputs

Splitting FALSE; //app does not divide outputs

} //scalability

The scalability specification for an application indicates if an application can be scaled via replication (see example above). Scalable applications are programmed to exploit load sharing among replicas, and can adapt dynamically to varying numbers of replicas. The specification also indicates whether an application combines its input stream (which may be received from different predecessor applications and/or devices) and whether it splits its output stream (which may be distributed to different successor applications and/or devices). "Combining" and "splitting" are commonly called "joining" and "forking", respectively, in parallel computing paradigms.

Application Profiling

Initial Profile {

Data Stream Size = 1000 //data stream size when profiled

Host = nujersy // host where the app was profiled

Cpu Usage = 0.506595% // cpu usage

}

The Initial Profile block provides information that RM uses to make the initial allocation decision.

Hardware System Specifications

A hardware system specification construct allows the description of one or more hardware subsystems (see example below). Each hardware subsystem consists of one or more hosts. A host specification describes the host's name, type, operating system version, speed, RAM capacity, CPU quantity, SPEC ratings, and default network.

HARDWARE SYSTEM D_H {

SUBSYSTEM B_H {

HOST texas {

Type "SUN_Ultra_1";

OS "SUN_Solaris";

Version "SUN_Solaris_2.1.6";

Speed 300; //MHz

Memory 64; //MB

NumCPUs 1;

Threshold 0.1 ;

SPECint95 3.53 ;

SPECfp95 3.0 ;

Default-Network D_N:B_N:NH_250_Ethernet;

}

HOST desidrta {...}

} //end SUBSYSTEM B_H

SUBSYSTEM C_H {...}

SUBSYSTEM A_H {...}

} //end HARDWARE SYSTEM D_H

Network System Specifications

NETWORK SYSTEM D_N {

SUBSYSTEM B_N {

LAN NH_250_Ethernet { Bandwidth 10; }

IC NH_250_Switch { Network D_N:B_N:NH_250_Ethernet; }

IC NH_250_Hub { Network D_N:B_N:NH_250_Ethernet; }

} //end SUBSYSTEM B_N

} //end NETWORK SYSTEM D_N

Installation

System Requirements

Hardware: a Sun workstation or a PC.
Operating system: Solaris 2.5.1 or Windows NT.

Setup

See instructions in DeSiDeRaTa user manual.

Compiling a Specification

The compiler is run automatically when the DeSiDeRaTa middleware executes. Additionally, the compiler may be run manually. Use of the compiler in both modes is described in the following man page entry.

NAME

parser - Parser for DeSiDeRaTa Specification Language.

SYNOPSIS

parser <spec-file-name>

OPERANDS

spec-file-name : Name of the spec file that is to be parsed.

EXAMPLE

parser DeSiDeRaTa.spec

DESCRIPTION

After editing a specification file, save the file with extension .spec (e.g., DeSiDeRaTa.spec). Move the spec file to the desiderata/specfiles directory. When the DeSiDeRaTa middleware is executed, the parser is run automatically. To enable this, set the Unix environment variable 'RMSPECDIR' to the specfiles directory in the following windows: RM, QoS-Manager, and Program-Control (the command to set the environment variable is: setenv RMSPECDIR ../specfiles).

Important: Only files in the specfiles directory with extension .spec will be parsed by the compiler.

RUNNING THE COMPILER AS A STANDALONE APPLICATION: To test the compiler on its own, move the spec file to the desiderata/srcs/spec-lang directory and type 'parser <spec-file-name>'.
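
Because only files with extension .spec in the specfiles directory are parsed, it can be useful to check which files qualify before starting the middleware. The following Python helper is a sketch for that purpose only. It is not part of the DeSiDeRaTa distribution, and the function name spec_files is hypothetical.

```python
import os

def spec_files(specdir):
    """Return the names of regular files in specdir that end in .spec."""
    return sorted(
        name for name in os.listdir(specdir)
        if name.endswith(".spec")
        and os.path.isfile(os.path.join(specdir, name))
    )
```

A typical use would be to call spec_files(os.environ["RMSPECDIR"]) in a window where RMSPECDIR has been set as described above. Files with any other extension are left out of the result, mirroring the rule that the compiler ignores them.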