Fault-Tolerant Design of Computer Systems
Five-day CPD course for managers, developers and corporate IT specialists
Course outline
- Day 1: Introduction to computer dependability; redundant design and reliability modelling
- Day 2: Organisation of fault tolerance; error detection, confinement, and recovery
- Day 3: Modular redundancy; fault tolerance in distributed systems
- Day 4: Fault tolerance for software and design faults; fault tolerance and human error
- Day 5: Examples of fault-tolerant systems; practical research to inform decisions; decisions in design, procurement and deployment of fault-tolerant systems
The timetable includes ample time for class discussions and group problem sessions. The presentation of the material will emphasise examples in practical contexts.
Course dates in 2010
- 1-5 March 2010; 09:30 - 17:30 (each day starts at 09:00 with coffee)
Venue: C300, Tait Building, Northampton Square, City University London
About the course
Fault tolerance - design for surviving component failures - is becoming a necessity for a growing number of companies, far beyond its traditional application areas, like aerospace and telecommunications. Companies place increasing reliance on computer systems for the very survival of their business; computer applications become ever more complex, yet they are often built from unreliable components, hardware or software.
This course, which is organised as five one-day lectures that can be taken individually, addresses the needs of:
- IT and engineering managers who have to address new needs for dependability of their (or their customers') computer applications
- Software designers or system integrators who want an introduction to the problems found in designing for fault tolerance and to the range of design solutions
The course is prepared and taught by the Centre for Software Reliability (CSR), at City University London, which is recognised internationally as a centre for research excellence. The course leader is Prof Lorenzo Strigini, who has 25 years' experience in research in fault tolerance in hardware, software and human-machine systems including consulting and teaching industrial courses. Other course presenters involved are Prof Peter Popov and Dr Andrey Povyakalo, both renowned experts in dependability with academic and industrial experience.
How you benefit
Computer failures can have crippling effects on an organisation's ability to function. Any company, not just software-related businesses, can become bankrupt as a result of computer failure. And yet increasingly, business-critical computing systems are being assembled from off-the-shelf components never designed for high reliability, availability, or safety.
This course offers a unique opportunity for engineering managers and software designers to learn about fault tolerance - about systems surviving failure. It is about keeping systems in service despite the failure of some of their parts; in other words, without uncontrolled disruption of service. This is not rocket science; if you know the basic principles, you can apply them to everyday design and purchasing decisions.
Participants will learn the basic concepts necessary for decisions about the form and extent of redundancy to be employed during the design or procurement of computer systems. These concepts have been developed by researchers during the whole history of computing, but their application has been mostly limited to safety-critical and other high-risk, high-budget applications. By contrast, this course will consider the range of techniques available to organisations with different dependability requirements and budgets for fault tolerance. We will cover the integration of automatic and manual procedures, and will specifically address software-caused and operator-caused failures. The course will thus satisfy the needs of companies that have to choose among market offerings of fault-tolerant commercial products, that need to integrate a fault-tolerant system out of non-fault-tolerant products, or both.
On completion of the course, participants will:
- understand the risk of computer failures and their peculiarities compared with other equipment failures;
- know the different advantages and limits of fault avoidance and fault tolerance techniques;
- be aware of the threat from software defects and human operator error as well as from hardware failures;
- understand the basics of redundant design;
- know the different forms of redundancy and their applicability to different classes of dependability requirements;
- be able to choose among commercial platforms (fault-tolerant or non-fault-tolerant) on the basis of dependability requirements;
- be able to specify the use of fault tolerance in the design of application software;
- understand the relevant factors in evaluating alternative system designs for a specific set of requirements;
- be aware of the subtle failure modes of "fault-tolerant" distributed systems, and the existing techniques for guarding against them;
- understand cost-dependability trade-offs and the limits of computer system dependability.
Cost
£1380 for the first delegate
Discounts are available for additional delegates from the same organisation.
How to book
To book your place on the course please complete and return the Course Booking Form (Word document).
Further information
For more information, please contact us at:
Tel: 020 7040 8423
Fax: 020 7040 8585
Email: enquiries@csr.city.ac.uk
