data
system that stores & processes
infrastructure to transport & access
support systems (power, AC)
unscheduled downtime: primary concern
scheduled downtime: maintenance, upgrades, backups
peak usage delays
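A minimal sketch of how scheduled and unscheduled downtime turn into an availability figure, assuming availability is measured as usable time divided by total time over one non-leap year. The function names and the 8760-hour period are illustrative assumptions, not part of the notes.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years (assumption)


def availability(unscheduled_hours, scheduled_hours=0.0, period_hours=HOURS_PER_YEAR):
    """Fraction of the period during which the service was usable."""
    downtime = unscheduled_hours + scheduled_hours
    if not 0 <= downtime <= period_hours:
        raise ValueError("downtime must be between 0 and the period length")
    return (period_hours - downtime) / period_hours


def allowed_downtime_hours(target, period_hours=HOURS_PER_YEAR):
    """Downtime budget implied by an availability target such as 0.999."""
    if not 0 <= target <= 1:
        raise ValueError("target must be a fraction between 0 and 1")
    return (1.0 - target) * period_hours


if __name__ == "__main__":
    # 4 h of unplanned outages plus 20 h of maintenance windows in one year
    print(f"{availability(4, 20):.5f}")
    # yearly downtime budget for a "three nines" target
    print(f"{allowed_downtime_hours(0.999):.2f} h")
```

Whether scheduled downtime counts against the figure depends on how the target is defined; pass scheduled_hours=0 to measure unscheduled downtime only.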
Data availability: file system should eliminate need for maintenance (defrag, reorganization, expansion). No single point of failure in HW: disk arrays, hot-swappable. SAN (storage area network). Online backup
System availability: redundant, hot-swappable HW. Hot site: duplicate of system available at another site, to be used after a disaster
Application availability: replication or clustering that detects an outage and restarts the app.
Infrastructure: power: UPS & backup generator
network: redundancy of suppliers
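The "detects an outage and restarts the app" idea under application availability can be sketched as a small supervisor loop. This is a single-host illustration only, not how any particular cluster product works; supervise, run_command, max_restarts and backoff_seconds are hypothetical names, and a non-zero exit code is assumed to mean the app failed.

```python
import subprocess
import sys
import time


def supervise(run_once, max_restarts=3, backoff_seconds=0.0):
    """Call run_once until it reports success, restarting it after each failure.

    run_once is any callable returning an exit code (0 = clean exit).
    Returns the number of restarts that were needed.
    """
    restarts = 0
    while True:
        exit_code = run_once()
        if exit_code == 0:
            return restarts
        if restarts >= max_restarts:
            raise RuntimeError(f"gave up after {restarts} restarts")
        restarts += 1
        time.sleep(backoff_seconds)  # pause before the next attempt


def run_command(cmd):
    """Wrap an external command as a run_once callable."""
    return lambda: subprocess.run(cmd).returncode


if __name__ == "__main__":
    # Supervise a trivial child process that exits cleanly on the first run.
    print(supervise(run_command([sys.executable, "-c", "pass"])))
```

A real cluster would also detect hangs (heartbeats, health checks) and restart the app on a different node; this sketch only reacts to the process exiting.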
Models/strategies: avoid or survive the things that cause outages, or
rapidly recover from an outage
resistant: keep failures from happening.
over-engineering. fault-resistant systems.
resilient: detect & correct errors before they become
failures. fault-tolerant systems tolerate faults, report them,
repair them if possible and continue as best they can.
redundant: eliminate single points of failure
replaceable: hot-pluggable: replace damaged parts without stopping the
system, but deconfigure the part first. hot-swappable: no configuration
needed.
restartable: apps return to the state they were in when stopped
recoverable: apps return to the state they were in when stopped
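A rough illustration of why the "redundant" strategy helps, assuming component failures are independent (often not true in practice, e.g. shared power or a shared site). The function names and the example figures are illustrative assumptions.

```python
def series_availability(availabilities):
    """All components are required: every one is a single point of failure."""
    total = 1.0
    for a in availabilities:
        total *= a
    return total


def parallel_availability(component_availability, copies):
    """Redundant copies: the service is down only if every copy is down."""
    if copies < 1:
        raise ValueError("need at least one copy")
    return 1.0 - (1.0 - component_availability) ** copies


if __name__ == "__main__":
    # Two 99%-available components that are both required
    print(f"{series_availability([0.99, 0.99]):.4f}")
    # The same component duplicated so that either copy can serve
    print(f"{parallel_availability(0.99, 2):.4f}")
```

Chaining required components lowers availability, while duplicating a component raises it, which is the arithmetic behind eliminating single points of failure.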
Availability classification of each resource to determine security. HRG AEC (Harvard Research Group Availability Environment Classification)
Outage: delays, loss of productivity, lost sales opportunities,
decisions made without info.
peak utilization outage: some requests unfulfilled or server failure.