On the Concepts and Derivation of Reliability in Stochastic Systems with States of Reduced Efficiency

Ilkka Virtanen

General overview

The study has two objectives: 1) the general problem of extending the concepts of reliability, and 2) the formulation, solution and use of a reliability model for a particular stochastic system with states of reduced efficiency. Because of the complexity of the system and the general nature of the assumptions regarding the random variables of the reliability model, we also obtain new results in the development and application of the mathematical methods.

1) The extension of the concepts of reliability

When we consider a system which is not only in one of the two states: up-state (the system is failure-free and thus capable of full performance) or down-state (the system is totally inoperable and under repair), but may also perform its function at one or more levels of reduced efficiency, the conventional concepts of reliability are found to be unsuitable and inadequate: the reliability of the system remains unresolved (there exist situations when the system is neither fully operable nor fully inoperable, so that reliability cannot be determined at all), or it gets a value which contradicts empirical observation (if operation with reduced efficiency is regarded as normal operation, too high reliability is obtained; if a reduction in efficiency is regarded as total inoperability, too low reliability is obtained).

Our first objective is to extend the concepts of reliability so that the reliability of systems with states of reduced efficiency can also be determined. In making the extension, care must be taken that it is done in a theoretically well-founded and empirically adequate way. Furthermore, there must be no violation of the traditional concepts. These conditions are fulfilled when we set the following requirements for the new concepts:

1. Failures having a limiting effect on the efficiency of the system are regarded as factors which decrease the reliability of the system, but to a lesser extent than a failure resulting in total system inoperability. Further, the degree of reliability decrease depends on the degree of reduction in efficiency: the more serious the consequences of the failure, the greater the decrease in the reliability of the system.

2. When the new, more comprehensive concepts of reliability are applied to general systems with many levels of performance, we get empirical interpretations analogous to those which result when the conventional concepts of reliability are applied to ordinary two-stage, operable or inoperable systems.

3. When a two-stage, operable or inoperable system is under consideration the new concepts are in agreement with the traditional concepts of reliability.

4. The mathematical definition of the new concepts remains within the limits of the general mathematical definition of reliability (Gnedenko et al. 1969).

5. The numerical value of reliability can be determined directly from the behaviour of the system, i.e. from the state probabilities of the system.

This conceptual analysis of reliability will be carried out in Chapter 3. Explicitly we carry out the extension of the concepts only for the quantitative characteristics of reliability. We give the general principles of the extension procedure and derive in detail new, more comprehensive reliability characteristics corresponding to the characteristics 'availability', 'reliability' and 'mean time to system failure' of traditional reliability. In the course of the derivation we show that the new characteristics are theoretically well-founded and empirically adequate; the five requirements, and more generally, the objectives laid down for the extension procedure are thereby met.
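Requirements 1, 3 and 5 can be illustrated with a minimal sketch. It is not the definition derived in Chapter 3, which this overview does not state. It assumes, purely for illustration, that each state carries a relative efficiency between 0 and 1 and that a generalized availability is the efficiency-weighted sum of the state probabilities. The function name `weighted_availability`, the efficiencies and the probabilities are all hypothetical.

```python
def weighted_availability(state_probs, efficiencies):
    """Efficiency-weighted sum of state probabilities.

    Illustrative assumption only: the thesis derives its own
    characteristics; this weighting is not taken from it.
    """
    if len(state_probs) != len(efficiencies):
        raise ValueError("one efficiency is needed per state")
    if abs(sum(state_probs) - 1.0) > 1e-9:
        raise ValueError("state probabilities must sum to 1")
    if any(not 0.0 <= e <= 1.0 for e in efficiencies):
        raise ValueError("efficiencies must lie in [0, 1]")
    return sum(p * e for p, e in zip(state_probs, efficiencies))


# Two-state system (up, down): the value reduces to the probability
# of the up-state, i.e. the traditional availability (requirement 3).
two_state = weighted_availability([0.95, 0.05], [1.0, 0.0])

# Three states (normal, reduced efficiency, down) with hypothetical numbers.
# The result lies between the two conventional extremes (requirement 1).
three_state = weighted_availability([0.80, 0.15, 0.05], [1.0, 0.5, 0.0])
too_low = 0.80           # reduced efficiency counted as total failure
too_high = 0.80 + 0.15   # reduced efficiency counted as normal operation
```

With these hypothetical numbers the weighted value falls strictly between the "too low" and "too high" figures that the conventional two-state concepts would give.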

2) Formulation and use of the reliability model for the reliability analysis of a stochastic system with states of reduced efficiency

The second main objective of the study is to determine and analyze the reliability of a stochastic system which, besides the modes of normal operation and total failure, can also operate at several different levels of performance (i.e. with reduced efficiency). The system has three operation modes: "normal operation", "operation with reduced efficiency", and "non-operation". We have chosen it as a general representative of systems with states of reduced efficiency. In particular, we have tried to include in our system the typical main features of a processing factory. In our system these main features are described by means of the following four types of components:

(i) the ordinary two-stage operable or inoperable components; the failure of any one of the components renders the whole system inoperable (subsystem S1)

(ii) the functionally multi-stage components, the failure of any one of which makes the component (and the whole system) operate with reduced efficiency; the degree of reduction in efficiency depends on which component has failed (subsystem S2)

(iii) the ordinary two-stage operable or inoperable components in parallel redundancy; the system fails only when all the redundant components have failed (subsystem S3)

(iv) the ordinary two-stage operable or inoperable components in the subsystem formed by independent, parallel branches; the failure of one or more of the components (branches) makes the system operate with reduced efficiency, the degree of reduction in efficiency depending on the number of simultaneous failures among the components (subsystem S4)

The total inoperability of a subsystem, or its operability at a level of reduced performance, is a consequence of one or more failures among the components of the subsystem. Since the subsystems are connected in series, the subsystem functioning at the lowest level of performance determines, at any time, the performance level of the whole system. Due to the combinations of the performance levels of the subsystems, the system has a great number of possible levels of performance, ranging from normal operability through different degrees of reduced efficiency to total inoperability.
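The series rule described above can be sketched as follows. The numerical performance levels (1.0 for normal operation, 0.0 for total inoperability), the proportional rule for subsystem S4 and the function names are assumptions made for illustration; the overview does not specify how the levels are quantified.

```python
def s4_level(failed_branches, total_branches):
    """Hypothetical rule for subsystem S4: performance is proportional
    to the number of parallel branches still working."""
    if not 0 <= failed_branches <= total_branches:
        raise ValueError("failed_branches must be between 0 and total_branches")
    return (total_branches - failed_branches) / total_branches


def system_level(subsystem_levels):
    """Subsystems in series: the lowest subsystem level determines
    the performance level of the whole system."""
    return min(subsystem_levels)


# Hypothetical snapshot: S1 up, S2 degraded to 0.7, S3 up, one of four S4 branches down.
degraded = system_level([1.0, 0.7, 1.0, s4_level(1, 4)])

# A failure in S1 renders the whole system inoperable, whatever the others do.
down = system_level([0.0, 0.7, 1.0, s4_level(1, 4)])
```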

The system is assumed to be maintained by a single repair facility so that only one failure can be repaired at a time. Because there may be several failures among the components at the same time and there is only one repair facility, the failed components must sometimes queue for repair. In the handling of this queue we assume that the preemptive repeat repair discipline is followed. Under this repair policy, different repair priorities are assigned to different components and different types of failures, and the repairs are carried out according to these priorities.
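A minimal sketch of a single repair facility under a preemptive repeat discipline is given below. It assumes the usual queueing-theory meaning of the term: a higher-priority failure interrupts the repair in progress, and the interrupted repair later starts again from the beginning. The class name, the component names and the convention that a lower number means a higher priority are hypothetical.

```python
class RepairFacility:
    """Single repair facility with preemptive repeat priorities (sketch)."""

    def __init__(self, priority):
        self.priority = priority   # component -> rank; lower rank is repaired first (assumption)
        self.failed = set()        # components currently failed
        self.current = None        # component under repair
        self.elapsed = 0.0         # time already spent on the current repair

    def fail(self, component):
        """Register a failure; it may preempt the repair in progress."""
        self.failed.add(component)
        self._select()

    def work(self, dt):
        """Advance the repair in progress by dt time units."""
        if self.current is not None:
            self.elapsed += dt

    def finish(self):
        """Complete the current repair and pick the next failed component."""
        self.failed.discard(self.current)
        self.current = None
        self.elapsed = 0.0
        self._select()

    def _select(self):
        if not self.failed:
            return
        best = min(self.failed, key=self.priority.__getitem__)
        if best != self.current:
            self.current = best
            self.elapsed = 0.0     # preemptive repeat: earlier progress is discarded


facility = RepairFacility({"pump": 2, "valve": 1})
facility.fail("pump")
facility.work(3.0)        # three time units spent on the pump
facility.fail("valve")    # higher priority: the pump repair is interrupted
```

The `elapsed` attribute plays the role of the time a component has already been under repair; under the repeat rule it is reset whenever the component under repair changes.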

The components of the system are assumed to fail with constant failure rates, i.e. the failure times are governed by exponential distributions. The repair times of the components have general distributions, i.e. the repair rates of the components are allowed to be wholly arbitrary functions of time (some regularity conditions must be met, however). Both failure and repair time distributions are peculiar to individual components.
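The difference between the two assumptions can be sketched numerically. A constant failure rate makes the failure time memoryless, whereas a general repair distribution has a repair rate that depends on the time already spent in repair. The Weibull form below is only one hypothetical example of such a time-dependent rate; the overview does not name any particular repair distribution, and all parameter values are illustrative.

```python
import math


def survival(rate, t):
    """P(T > t) for an exponential failure time with constant failure rate."""
    return math.exp(-rate * t)


def weibull_repair_rate(x, shape, scale):
    """Repair (hazard) rate after x > 0 time units in repair for a Weibull
    repair time; a hypothetical example of a time-dependent repair rate."""
    return (shape / scale) * (x / scale) ** (shape - 1)


rate = 0.02
# Memoryless property: having survived 50 time units does not change
# the chance of surviving a further 100.
conditional = survival(rate, 150.0) / survival(rate, 50.0)
unconditional = survival(rate, 100.0)

# With shape 2 the repair rate grows with the elapsed repair time,
# so the elapsed time must be known to describe the future behaviour.
early = weibull_repair_rate(1.0, 2.0, 4.0)
late = weibull_repair_rate(3.0, 2.0, 4.0)
```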

We can now point to the following contributions concerning the structure and properties of the system under study:

1. The system contains a unit (the subsystem with parallel branches) of a type not considered earlier in mathematical reliability literature.

2. The system, consisting of four different types of subsystems with a general number of components in each subsystem, is the largest and most general theoretical system, the reliability of which has been analyzed in the dynamic form.

3. The inclusion in the system of a new type of subsystem and the complexity of the system itself are not only theoretically interesting but also empirically relevant. For all the subsystems there exist clear counterparts in reality among production systems, for example and especially in processing factories.

For the reliability analysis of the system we construct a mathematical model. The formulation of the model starts with the definition of the states of the system. Because the state (at time t) is an exact description of the circumstances prevailing in the system at that time, the behaviour of the system with the passage of time may be found by determining the state probabilities of the system. Due to the general repair time distributions the system is not Markovian. However, by including supplementary variables we obtain a complete Markovian characterization of the system. After the inclusion of the supplementary variables we can set up the model. It takes the form of partial differential-difference equations with variable coefficients. The solution of the model is derived by the application of Laplace transforms and discrete transforms. Both the time-dependent (transient state) and the steady state solutions are considered. With general repair time distributions, the transient state solution of the model stops at the Laplace transforms of the state probabilities (which, with given repair distributions, we may invert to give the state probabilities). In the steady state, on the other hand, the use of the limit properties of Laplace transforms leads us straight to the state probabilities proper.
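The limit property referred to above is the final value theorem: the steady state value of p(t) equals the limit of s·p*(s) as s tends to 0, where p*(s) is the Laplace transform of p(t). A minimal numerical sketch follows. For simplicity it uses a single two-state component with exponential failure and exponential repair times, which is a much simpler case than the system of the study, and the rates are hypothetical.

```python
import math


def availability_time(t, lam, mu):
    """Probability of being up at time t, starting in the up-state
    (two-state component, failure rate lam, repair rate mu)."""
    total = lam + mu
    return mu / total + (lam / total) * math.exp(-total * t)


def availability_laplace(s, lam, mu):
    """Laplace transform of availability_time with respect to t."""
    return (s + mu) / (s * (s + lam + mu))


lam, mu = 0.01, 0.5                  # hypothetical failure and repair rates

# Final value theorem: evaluate s * A*(s) for a very small s.
s = 1e-9
steady_from_transform = s * availability_laplace(s, lam, mu)

# Reference values: the closed form and the time-domain function at large t.
steady_exact = mu / (lam + mu)
steady_from_time = availability_time(1e4, lam, mu)
```

The steady state value is read off the transform without inverting it, which is why the steady state solution can go further than the transient one.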

In the reliability analysis of the system we link the two main objectives of the study together. The reliability analysis of the system is carried out within the framework of the new extended reliability concepts. The characteristics of this extended reliability are derived from the solution of the model, that is, from the state probabilities, either directly (the generalized availability characteristics) or after some modifications in the original model (the generalized reliability and mean time to system failure characteristics).

3) Development and wider application of the methods

Multi-component repairable systems with general failure and/or repair time distributions are always difficult to handle mathematically. Renewal theory and the Markov process approach with the inclusion of supplementary variables are examples of the probability tools, with the help of which the reliability analysis of this type of complex system has turned out to be successful. We use the latter approach in this study.

Due to the general repair time distributions in all of the components, the system is not Markovian. But we can characterize the system as a Markov system by employing a set of variables, the supplementary variables, with the help of which a part of the system's history (the time the component under repair has already spent in repair) is included in the state definition of the system. The supplementary variable technique proves to be very efficient also in the complex system under study, a system of four subsystems with a general number of components in each. The dynamic model for the behaviour of the system can be set up. It takes the form of a set of differential-difference equations with respective boundary and initial conditions. The equations have variable coefficients.

As a consequence of the use of supplementary variables, the equations become partial differential equations in the two time variables. After Laplace transformation the equations become algebraic in one variable and remain differential equations in the other variable. The equations thus become easier to solve in the Laplace transform domain than in the original time domain. At the same time the equations are, however, difference equations in two (state index) variables. The usual technique for solving difference equations is to employ generating functions (z-transforms). But because of the variable coefficients in the equations, the transformed equations would now become partial differential equations also in the transform variables.

These twice-transformed equations (Laplace transforms and generating functions) with variable coefficients would then not be much easier to solve than the original ones.

Because the use of generating functions in order to solve the model turned out to be troublesome or even impossible, we had to find some other way. The method of discrete transforms was the tool that led to the desired result. By using discrete transforms we can transform a discrete set of numbers (or functions) to another discrete set of numbers (or functions). Because the transforms use only multiplication by binomial coefficients and summation, the inverse transforms for the discrete transforms are easy to find (whereas the derivation of inverse transforms for Laplace transforms and generating functions may become very problematic). In our model the Laplace-transformed equations, which are differential equations in one variable and difference equations in two variables, become, after application of the discrete transforms and integration (to get rid of the derivatives), algebraic equations that are even linear in the unknown functions. There are, of course, no difficulties of principle in solving such linear equations. This result, that the discrete transforms lead to ordinary linear equations, is unknown in the reliability literature. In the earlier applications of discrete transforms, different ad hoc methods have been used for solving the transformed equations.
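The remark that such transforms are easy to invert can be illustrated with the binomial transform, one standard transform built only from binomial coefficients and summation. The overview does not give the exact form of the discrete transform used in the study, so the pair below is an assumption chosen for illustration, and the function names and sample sequence are hypothetical.

```python
from math import comb


def binomial_transform(a):
    """b[n] = sum over k of C(n, k) * a[k], for k = 0..n."""
    return [sum(comb(n, k) * a[k] for k in range(n + 1))
            for n in range(len(a))]


def inverse_binomial_transform(b):
    """a[n] = sum over k of (-1)**(n - k) * C(n, k) * b[k], for k = 0..n."""
    return [sum((-1) ** (n - k) * comb(n, k) * b[k] for k in range(n + 1))
            for n in range(len(b))]


sequence = [1, 2, 3, 4]                              # hypothetical discrete set of numbers
transformed = binomial_transform(sequence)           # another discrete set of numbers
recovered = inverse_binomial_transform(transformed)  # explicit inverse, no integration needed
```

The inverse has the same structure as the forward transform, with alternating signs, so no contour integration or series expansion is required to return to the original sequence.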

(Doctoral thesis. Publications of the Institute for Applied Mathematics, University of Turku, No. 10, 1977, 113 p.)