A simple downtime test for Chicago healthcare teams who want fewer surprises.
Nobody plans to have a bad IT day. But healthcare does not need a full outage for things to feel broken. One system slows down, a login loop starts, or the EHR flips into a limited mode, and suddenly your day is running on sticky notes and memory.
If you lead a clinic or care organization in Chicago, you already know the hard part is not the technology. It is the moment the technology stops supporting the work and your team has to improvise while patients are waiting.
That is why a downtime drill is worth doing even when everything seems fine.
Not a giant disaster exercise. Not a binder no one reads. A short, practical test that answers one question:
If our environment went read-only right now, what would we do in the first hour?
This article lays out a simple way to run that drill, learn what you need to fix, and reduce the everyday workarounds that quietly increase risk.
The clinical friction tax shows up fast during downtime
When systems slow down, teams do not stop working. They route around the problem.
That is normal. It is also where risk and frustration grow.
In healthcare, the friction tax looks like:
- Check-in lines building because staff cannot confirm appointments quickly
- Clinicians waiting on screens to load, then documenting later from memory
- Referrals and prior auth work stalling because documents are scattered
- Patient questions piling up because the portal workflow is inconsistent
- Teams sending attachments or screenshots because it is the fastest option
Those choices are understandable. They are predictable. And they are the exact moments a drill will expose, without waiting for a real incident to teach the lesson.
The goal is not perfection. The goal is a safer default that still feels easy when the day gets busy.
Why “read-only” is the most useful scenario to practice
Full outages happen, but they are not the only threat. Many organizations experience partial outages more often:
- The EHR is up, but critical integrations are down
- Shared files are accessible, but permission checks fail
- Email works, but someone’s account is compromised and you need to contain it
- Internet access is unstable across a site
- A vendor portal is locked while you sort out authentication issues
Read-only is the most practical drill because it forces you to think operationally:
- What can continue safely?
- What needs a controlled workaround?
- What stops completely?
- Who decides, and how fast?
If you can handle read-only calmly, you are in better shape for bigger disruptions too.
The 60-minute drill that gives you clarity without drama
You can run this with a small group: an operations leader, a clinical leader, someone who owns scheduling, someone who owns billing or revenue cycle, and your IT point of contact.
Keep it simple. Put it on the calendar. Treat it like a leadership exercise, not an IT meeting.
Step 1: Pick the scenario and define the boundary (10 minutes)
Choose one:
- EHR read-only for 4 hours
- Email compromised for one user, must contain and reset access
- Shared drive unavailable for 4 hours
- Internet down at one site, phones still working
- A key vendor portal locked, must re-verify identity
Write the boundary in one sentence. Example:
“From 8:00 a.m. to noon, we can view charts but cannot enter new documentation in the EHR.”
That single sentence keeps the drill focused.
Step 2: Identify the minimum-safe services (10 minutes)
Ask: what must keep working during those 4 hours to avoid harm?
Most healthcare teams land on some version of:
- Patient identification and check-in
- Scheduling and rescheduling
- Medication and allergy visibility
- Provider-to-provider communication and escalation
- Basic documentation capture, even if entry is delayed
- Billing capture, even if claims submission waits
Then ask the harder question:
What can wait 24 hours without creating clinical or financial damage?
This is where priorities become visible. It also stops you from trying to “do everything” during downtime.
Step 3: Assign four roles, even if some people wear two hats (10 minutes)
During disruption, confusion spreads faster than malware. Assign roles now:
- Clinical lead: decides what workflows are acceptable and what must stop.
- Operations lead: owns patient flow, scheduling decisions, and staff coordination.
- Communications lead: owns messaging to staff, patients, and partners. Keeps it consistent.
- IT lead: owns containment, recovery steps, and vendor coordination.
If one person holds two roles in real life, that is fine. The point is clarity.
Step 4: Walk the first hour (15 minutes)
Start the clock and talk it through.
- How do staff know the system is in read-only mode?
- Who announces it, and where do they announce it?
- What is the immediate instruction to the front desk?
- What is the immediate instruction to clinicians?
- What is the instruction for referrals and prior auth work?
- What do you do about new patient forms and signatures?
- Where do “temporary notes” live so they do not become permanent chaos?
This is where you will hear the workarounds. Pay attention. Every workaround is a signal about friction.
Step 5: Verify recovery is real, not assumed (10 minutes)
This is the moment to ask questions that feel blunt, because they matter:
- If this system had to be restored today, what is the actual recovery time?
- When was the last successful restore test, not just a backup job report?
- If ransomware hit shared files, are backups isolated or reachable from the same environment?
- If a device walked away, can you lock or wipe it and prove encryption was on?
Do not accept “we have backups” as an answer. Backups are not recovery. Recovery is something you have proven.
If you want a simple companion, pair this drill with a “first-hour checklist after a cyber incident” so you have a clear set of first moves when something is actively hostile, not just inconvenient.
Step 6: Debrief and produce one page (5 minutes)
End the drill with one deliverable:
A one-page summary that includes:
- The scenario you tested
- The top three failure points you discovered
- The top three fixes you will implement next
- One owner per fix
- A date for a repeat test
That one page is what leadership and boards actually need.
What a good drill usually reveals
Most teams discover the same categories of gaps, even if the details differ:
1) People are improvising because the “approved way” is too slow
If the safe method is hard, staff will choose the fast method. That is not a culture problem. That is a design problem.
This is where you see:
- Attachments and screenshots replacing link-based sharing
- Personal devices filling gaps in onboarding or device availability
- Shadow folders and side spreadsheets becoming the “real system”
2) Ownership is fuzzy
When the system breaks, who decides what comes first? If the answer is unclear, the delay will be expensive.
3) Recovery confidence is based on hope
Many organizations can tell you backups exist. Fewer can tell you the last time they restored the exact data and workflow they would need on a bad day.
4) Access sprawl creates slow containment
If accounts, devices, and permissions have drifted, containment becomes slow and risky. You spend precious time figuring out who has access instead of removing it.
Three fixes that reduce downtime pain fast
Once your drill exposes the weak spots, focus on the fixes that reduce both risk and daily friction.
Fix 1: Make “store first, share second” the default
If staff are still sending attachments with sensitive information, treat it as a process problem.
- Pick one approved place for sensitive documents.
- Train people to share links, not files.
- Make version control part of the habit.
This reduces accidental mis-sends and the “which version is final?” problem that shows up during downtime.
Fix 2: Tighten identity and device control
Downtime is often triggered by access issues: compromised credentials, locked accounts, unmanaged devices, or inconsistent updates.
Your baseline should include:
- MFA or passkeys on accounts that touch PHI or money
- A clean joiner-mover-leaver process covering onboarding, role changes, and offboarding
- Device management that enforces encryption, screen locks, and updates
- A clear policy for personal devices, including what is allowed and what is not
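If you want a concrete way to spot drift before a drill does, here is a minimal sketch of a monthly access review. Everything in it is an assumption: it imagines an accounts.csv exported from your identity provider with user, last_login, and mfa_enabled columns, and it only flags the accounts that would slow containment on a bad day.

```python
# access_review.py - minimal sketch of a monthly access review.
# Assumes a hypothetical accounts.csv exported from your identity provider
# with columns: user, last_login (YYYY-MM-DD), mfa_enabled (yes/no).
import csv
from datetime import date, timedelta

STALE_AFTER = timedelta(days=30)  # tune to your own offboarding policy

with open("accounts.csv", newline="") as f:
    for row in csv.DictReader(f):
        # Flag accounts that touch PHI or money but lack MFA.
        if row["mfa_enabled"].strip().lower() != "yes":
            print(f"NO MFA: {row['user']}")
        # Flag accounts nobody has used recently: offboarding leftovers.
        last_login = date.fromisoformat(row["last_login"])
        if date.today() - last_login > STALE_AFTER:
            print(f"STALE since {row['last_login']}: {row['user']} - confirm offboarding")
```

The script is deliberately boring: the value is not the code, it is running it on a schedule so the containment question from the drill already has an answer.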
If you want a healthcare-oriented framing for this baseline, the Zero-Downtime Care approach is simple: calm controls, clear ownership, and proof that your defenses and recovery actually work.
Fix 3: Prove recovery quarterly
Set a cadence. Put it on the calendar.
- Each quarter, pick one critical dataset or workflow and run a restore test.
- Document the result.
- Record the actual time.
This turns recovery from “we assume” into “we know.”
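If it helps to make the record-keeping concrete, here is a minimal sketch. The file name, fields, and function are illustrative assumptions, not a standard; the restore step is whatever your backup tool already provides, and this wrapper only captures the proof: what you restored, how long it took, and whether it worked.

```python
# restore_test_log.py - minimal sketch for recording quarterly restore tests.
import csv
import time
from datetime import date
from pathlib import Path


def record_restore_test(dataset: str, restore_fn) -> None:
    """Run one restore test and append the timed result to restore_tests.csv."""
    start = time.monotonic()
    try:
        restore_fn()  # e.g. your own wrapper around the backup tool's restore command
        outcome = "success"
    except Exception as exc:
        outcome = f"failed: {exc}"
    minutes = round((time.monotonic() - start) / 60, 1)

    log = Path("restore_tests.csv")
    is_new = not log.exists()
    with log.open("a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["date", "dataset", "minutes", "outcome"])
        writer.writerow([date.today().isoformat(), dataset, minutes, outcome])
```

A quarterly run might look like record_restore_test("patient_schedules", restore_schedules), where restore_schedules is your own restore step. The CSV is the artifact: it is the one-line answer to “when was the last successful restore test?”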
A simple way to start this month
If you want momentum without a giant project, do this:
Week 1: Run the read-only drill. Create the one-page summary.
Week 2: Fix the biggest friction point the drill exposed.
Week 3: Verify device encryption and access cleanup across your team.
Week 4: Run one restore test for a critical system and record the real recovery time.
Small improvements compound. The goal is a calmer clinic day, fewer workarounds, and an environment that can take a hit without falling apart.
If you want help running the drill, validating your baseline, or turning the results into a clear 90-day plan, we can get you to “verified” without turning leadership into IT babysitters. Reach out.