Disaster Recovery Planning and Data Backup for Information Systems and Services

Standard number: DS-12
Date issued: 7/15/18
Date last reviewed: 7/15/18
Version: 1.0
Approval authority: Vice President for Information Technology and CIO
Responsible office: Information Assurance

This Standard supports and supplements Information Security (SPG 601.27). It will be periodically reviewed and updated as necessary to meet emerging threats, changes in legal and regulatory requirements, and technological advances. This Standard replaces Responsibility for Maintaining Information Technology Backup and Recovery Procedures (SPG 601.07-1), which was decommissioned as of August 1, 2018.

I. Overview

In order to facilitate the recovery and restoration of university IT systems that support critical business functions, units shall engage in disaster recovery planning efforts.

Disaster recovery planning is the ongoing process of developing, implementing, and testing disaster recovery management procedures and processes to ensure the efficient and effective resumption of critical functions in the event of an unscheduled interruption, irrespective of the source of the interruption.

Engaging in disaster recovery planning ensures that system dependencies have been identified and accounted for when developing the order of recovery, establishing recovery time and recovery point objectives, and documenting the roles of supporting personnel.

In addition, data backup is an integral component of disaster recovery planning. Data backup protects against the loss of data in the event of a physical disaster, database corruption, error propagation in resilient systems, hardware or software failure, or other incident which may lead to the loss of data. The backup requirements found in this Standard will allow university business processes, teaching and learning activities, research projects, and clinical operations to be resumed in a reasonable amount of time, based on criticality, with minimal loss of data.

II. Scope

This Disaster Recovery Standard applies to the entire university, including the Ann Arbor campus, Michigan Medicine, UM-Dearborn, UM-Flint, and all affiliates. It further applies to:

  • Critical core IT infrastructure and other services which facilitate the transport, authentication and security of systems and data. Critical core infrastructure is defined as components which, when they experience degradation or failure, compromise all other services (e.g., data centers, identity and access management, network, firewall, DNS, Active Directory).
  • Information technology systems that process or store mission critical data managed by, or on behalf of, the University of Michigan, as determined by the unit that maintains the system; this specifically excludes desktop devices and workstations which do not require disaster recovery plans but may require data backup.
  • The processes, policies and procedures related to preparing for recovery or continuation of technology infrastructure, systems and applications which are vital to an organization after a disaster or outage.

Each campus unit that maintains or is responsible for a mission critical system or service must have a disaster recovery (DR) plan that documents the critical recovery functions and tasks that can be executed to enable mission critical system recovery following a significant event or disaster.

III. Roles and Responsibilities

  • Information Assurance (IA)
    • Maintains and publishes U-M disaster recovery planning templates and processes.
  • Units or research projects that maintain information technology systems (system or business owner)
    • Identify mission critical systems.
    • Maintain adequate infrastructure resiliency and data backup and restoration processes for mission critical data and the IT systems assigned to them.
    • Develop, implement, document, maintain, and test disaster recovery plans.
    • Report the status of their DR planning to IA every two years.
  • Unit IT Leader and/or Security Unit Liaison
    • Coordinate unit activities to satisfactorily implement or complete above unit responsibilities.
    • Work with unit IT to review unit DR plans at least annually or whenever significant system architecture or personnel changes occur.
    • Brief unit leadership on the status of DR efforts and resource needs.
  • U-M Unit or Executive Leadership (Deans, Directors, U-M Office of Research)
    • Ensures that sufficient financial, personnel, and other resources are available as needed for the successful creation and ongoing maintenance of unit DR plans.

IV. Definitions

Mission Critical: Mission critical IT systems and applications provide essential IT functions and access to data; their failure or interruption would have an immediate and significant detrimental effect on the university and campus units. A system or application may be designated mission critical if its unavailability would result in one or more of the following:

  1. Risk to human and research-animal life or safety.
  2. Significant impact on the University’s research, learning and teaching, administrative, and healthcare missions.
  3. Significant legal, regulatory or financial costs.
  4. Serious impediment to a campus unit carrying out its critical business functions within the first 48 hours following an event (48-hour Recovery Time Objective, or RTO).
  5. Loss of access to data with defined availability requirements.

Particular systems or applications may originally be assessed as not mission critical, but their loss may become more critical after an extended period of unavailability.

Service Tier Criticality Levels:

  • Platinum: Services and systems that have the highest requirement for availability, the shortest required recovery time and the quickest required incident response time.
  • Gold: Services and systems that have a high availability requirement, fast recovery time, and fast incident response time.
  • Silver: Services and systems that have a moderate availability requirement, can take some time to recover, and have a moderate incident response time.
  • Bronze: Services and systems that have the lowest availability requirement, can tolerate data loss up to and including entire loss, and can accept a very long incident response time.

Critical Business Functions: Critical operational and/or business support functions that cannot be interrupted or made unavailable for more than a mandated or predetermined timeframe without significantly jeopardizing U-M operations.

Recovery Time Objective (RTO): The duration of time within which a business process must be restored and a stated service level achieved following a disruption in order to avoid unacceptable consequences associated with a break in service.

Recovery Point Objective (RPO): The maximum tolerable period in which data might be lost from an IT system or service due to a major incident. RTO and RPO timeframes for each criticality level are listed in Table 1 below.

Disaster Recovery Planning: The process, policies and procedures related to preparing for recovery or continuation of technology infrastructure, systems and applications which are vital to an organization after a disaster or outage.

Business Continuity Planning: Business continuity planning, as opposed to disaster recovery planning, is the process of developing detailed plans, processes, and strategies that will enable an organization to respond to an event in such a manner that critical business functions can continue within planned levels of disruption and fully recover as quickly as possible. At U-M, this function is managed by the U-M Division of Public Safety and Security (DPSS).

V. Standard

The following are the core components required of all U-M information technology disaster plans:

  • Critical Systems: All units and research programs that maintain critical information technology systems will develop, implement, and regularly test (exercise) disaster recovery plans for those systems.
  • Disaster Recovery Plan Template: Disaster recovery plans should follow the general content and guidelines identified in the U-M Disaster Recovery Plan Template.
  • Disaster Recovery Review/Plan Testing: Disaster recovery plans must be reviewed annually and updated whenever a significant change to system architecture, system dependencies, or recovery personnel occurs. At a minimum, an annual tabletop exercise or equivalent should be conducted that simulates the abrupt and unscheduled loss of critical functions.
  • New System Evaluation: New applications or systems will be evaluated; systems determined to be critical require a disaster recovery plan to be documented and tested prior to go-live.
  • Risk Assessment: Environments designated as mission critical must have a RECON (see Information Security Risk Management Standard, DS-12) performed at least every four years or in accordance with the regulatory requirements of the system. Disaster recovery plans need to include mitigation of potential negative impacts to the mission critical system.
  • Data Backup: Backups are the result of copying or archiving files for the purpose of restoring them to a specific point in time or recovering them in the event of data loss resulting from computer viruses, hardware failures, file corruption, accidental or intentional destruction, etc. Backups preserve data integrity in the event of data corruption or other loss of the primary copy.
  • Plan Availability: Plans must be accessible and available, independent of availability of U-M IT systems.
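
The review intervals named in this Standard (annual plan review, annual tabletop exercise, status report to IA every two years, RECON at least every four years) can be tracked with a simple date check. The following is a minimal sketch only, not an official U-M tool; the item names, the `add_years` helper, and the `overdue_items` function are hypothetical and introduced purely for illustration. It does not account for system-specific regulatory requirements that may shorten the RECON interval.

```python
from datetime import date

# Intervals in years, taken from this Standard. The key names are
# hypothetical labels chosen for this sketch.
INTERVAL_YEARS = {
    "plan_review": 1,        # DR plans must be reviewed annually
    "tabletop_exercise": 1,  # annual tabletop exercise or equivalent
    "ia_status_report": 2,   # status reported to IA every two years
    "recon": 4,              # RECON at least every four years
}


def add_years(d: date, years: int) -> date:
    """Return the same calendar day `years` later (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def overdue_items(last_done: dict, today: date) -> list:
    """List the items whose due date has passed as of `today`."""
    return sorted(
        name
        for name, last in last_done.items()
        if add_years(last, INTERVAL_YEARS[name]) < today
    )


if __name__ == "__main__":
    history = {
        "plan_review": date(2023, 1, 10),
        "tabletop_exercise": date(2024, 3, 1),
        "recon": date(2021, 6, 1),
    }
    print(overdue_items(history, today=date(2024, 6, 15)))
```

A check like this covers only the calendar-based triggers; plans must also be updated whenever system architecture, dependencies, or recovery personnel change significantly.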

Table 1. Disaster Recovery Performance Objectives by Service Tier Criticality Level

Level    | RPO                                 | RTO                        | Performance Objective
Platinum | No data loss except data in transit | 4 hours                    | Best possible performance, required robust real-time transaction speed monitoring
Gold     | 0–24 hours                          | 24–48 hours                | Better performance, some transaction monitoring
Silver   | 1–7 days                            | 7–30 days                  | No performance targets, not monitored
Bronze   | 1 month or risk of entire loss      | 1 month or non-recoverable | Economy performance, not monitored
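
As an illustration of how the objectives in Table 1 might be applied when evaluating a recovery test, the sketch below encodes the upper bound of each tier's RPO and RTO and checks a measured result against them. It is a hedged example, not part of the Standard: the `OBJECTIVES` mapping and `meets_objectives` function are hypothetical, Platinum RPO is modeled as zero (ignoring data in transit), and "1 month" is approximated as 30 days. Bronze also permits entire loss or non-recovery, which this sketch does not model.

```python
from datetime import timedelta

# Upper bounds from Table 1. Modeling choices (assumptions of this sketch):
# Platinum RPO = 0, and "1 month" = 30 days.
OBJECTIVES = {
    "Platinum": {"rpo": timedelta(0), "rto": timedelta(hours=4)},
    "Gold": {"rpo": timedelta(hours=24), "rto": timedelta(hours=48)},
    "Silver": {"rpo": timedelta(days=7), "rto": timedelta(days=30)},
    "Bronze": {"rpo": timedelta(days=30), "rto": timedelta(days=30)},
}


def meets_objectives(tier: str, data_loss_window: timedelta, downtime: timedelta) -> bool:
    """Return True if a measured recovery stays within the tier's RPO and RTO."""
    limits = OBJECTIVES[tier]
    return data_loss_window <= limits["rpo"] and downtime <= limits["rto"]


if __name__ == "__main__":
    # A Gold service restored after 36 hours from a backup that was 12 hours old.
    print(meets_objectives("Gold", timedelta(hours=12), timedelta(hours=36)))
```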

Data Backup Requirements

Data backup and restoration should include a documented process for recovery, accounting for data dependencies or relationships where data from multiple systems must be in sync or share common data elements. The method and media for data backups should allow meeting RTO and RPO requirements for restoration.

In addition to system criticality requirements, data backups are:

  • Required for all mission critical systems and for any system or machine that creates, processes, maintains, or stores data classified as Restricted or High.
  • Recommended for Moderate data, and for data that cannot be recreated in a timeframe satisfactory to the owner.
  • Optional for all other systems or data.

System resiliency is a desirable objective, but is not a substitute for, and does not negate the necessity to perform, data backups and have a disaster recovery plan.

Data intended to be temporary in nature, such as work or scratch files that can readily be recreated from source data in a timely manner, may be excluded from backup requirements provided that the original source data is backed up, regardless of whether the files contain any data classified as Restricted, High, or Moderate. However, the temporary files must still be properly secured until they are deleted.

It is the responsibility of U-M units, research programs, and individual faculty, staff, and workforce members to:

  • Identify primary responsibility within the unit or research program for data backup; appropriate roles and responsibilities must be defined for data backup and restoration to ensure timeliness and accountability.
  • Classify institutional data based on U-M data classifications, and determine the backup method best suited to their classification level (see Table 2 below).
  • Ensure that backups containing data classified as Restricted and High are encrypted both in transit and at rest; it is recommended that Moderate data are also encrypted.
  • Ensure that all primary backups of data required to be backed up are stored on U-M owned and managed devices or servers, not on personally owned devices.

Table 2: Data Backup Requirements Based on Data and RTO Classification

The following table should be used to determine disaster recovery and backup requirements for systems or machines that create, process, maintain, or store Restricted, High, or Moderate data and for mission critical systems irrespective of data classification. Where data can be classified into more than one of the categories listed below (data classification level or RTO classification/criticality level), the classification with the most stringent data backup requirements must be met.

Data Classification | Data Backup | Data Backup Encryption        | Disaster Recovery Plan Requirements
Restricted          | Required    | Required – At rest/in transit | Dependent on Recovery Time Classification
High                | Required    | Required – At rest/in transit | Dependent on Recovery Time Classification
Moderate            | Required    | Recommended                   | Dependent on Recovery Time Classification
Low                 | Recommended | Optional                      | Dependent on Recovery Time Classification

Recovery Time Classification | Data Backup | Data Backup Encryption           | Disaster Recovery Plan Requirements
Platinum                     | Required    | Dependent on Data Classification | Required
Gold                         | Required    | Dependent on Data Classification | Required
Silver                       | Recommended | Dependent on Data Classification | Recommended
Bronze                       | Recommended | Dependent on Data Classification | Recommended
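
The "most stringent requirement" rule for Table 2 can be sketched as a lookup that combines a system's data classification with its recovery time classification. This is an illustrative sketch under stated assumptions, not an official implementation: the dictionary names and the `requirements` function are hypothetical, and the values are transcribed from Table 2 as printed. Backup encryption depends only on data classification and the disaster recovery plan requirement only on recovery time classification, so only the data backup column needs to be reconciled.

```python
# Stringency order, weakest to strongest.
LEVELS = ["Optional", "Recommended", "Required"]

# From Table 2: data classification -> (data backup, backup encryption)
BY_DATA_CLASS = {
    "Restricted": ("Required", "Required"),
    "High": ("Required", "Required"),
    "Moderate": ("Required", "Recommended"),
    "Low": ("Recommended", "Optional"),
}

# From Table 2: recovery time classification -> (data backup, DR plan)
BY_TIER = {
    "Platinum": ("Required", "Required"),
    "Gold": ("Required", "Required"),
    "Silver": ("Recommended", "Recommended"),
    "Bronze": ("Recommended", "Recommended"),
}


def strictest(a: str, b: str) -> str:
    """Return whichever requirement level is more stringent."""
    return max(a, b, key=LEVELS.index)


def requirements(data_class: str, tier: str) -> dict:
    """Combine both halves of Table 2, keeping the most stringent backup rule."""
    data_backup, encryption = BY_DATA_CLASS[data_class]
    tier_backup, dr_plan = BY_TIER[tier]
    return {
        "backup": strictest(data_backup, tier_backup),
        "encryption": encryption,
        "dr_plan": dr_plan,
    }


if __name__ == "__main__":
    # A Platinum-tier system holding only Low data still requires backups.
    print(requirements("Low", "Platinum"))
```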

Third Party Vendors

It is the responsibility of system or business owners and U-M Procurement Services to ensure that contracts with U-M vendors that maintain, protect or provide access to U-M mission critical or Restricted or High data—whether on-premises or cloud-based—include disaster recovery and data backup Service Level Agreements.

VI. Violations and Sanctions

Discipline (SPG 201.12) provides for staff member disciplinary procedures and sanctions. Violations of this Standard by faculty may result in appropriate sanction or disciplinary action consistent with applicable university procedures. If dismissal or demotion of qualified faculty is proposed, the matter will be addressed in accordance with the procedures set forth in Regents Bylaw 5.09. In addition to U-M disciplinary actions, individuals may be personally subject to criminal or civil prosecution and sanctions if they engage in unlawful behavior related to applicable federal and state laws.

Any U-M department or unit found to have violated this Standard may be held accountable for the financial penalties, legal fees, and other remediation costs associated with a resulting information security incident and other regulatory non-compliance.

VII. Implementation

Information Assurance is responsible for the implementation, maintenance and interpretation of this Standard.

VIII. References

IX. Related NIST Controls

  • NIST SP 800-34 Revision 1 – Contingency Planning Guide for Federal Information Systems
  • NIST SP 800-53 Revision 4
    • CP-01 Contingency Planning Policy and Procedures
    • CP-02 Contingency Plan
    • CP-03 Contingency Training
    • CP-04 Contingency Plan Test
    • CP-07 Alternate Processing Site
    • CP-08 Telecommunications Services
    • CP-09 Information System Backup
    • CP-10 Information System Recovery and Reconstitution