Aikaterina Papanikolaou & Thyra Hogervorst

September 18, 2025

Mapping the Night: Why Scoring Sleep Is Harder Than It Seems

Every night, our brains slip into a hidden choreography of light dozing, deep rest, and dream-filled REM. To follow this dance, scientists use sleep scoring: 30-second snapshots that map the night. It’s a simple guide, but one that misses the richness that makes us feel truly rested. For decades, experts have shouldered the task of decoding these shifting brain states. Algorithms promised speed and efficiency, but they have yet to displace human scorers — a reminder of the stubborn tension between expertise and automation.

The challenge runs deeper: sleep is a profoundly personal experience, and even within the same person, no two nights ever look quite the same. Scoring offers a scientific lens, but it can’t capture how refreshed we feel or the story of our night. Still, those labels matter. The “deep sleep minutes” flashing on your wearable each morning hinge on how the night was scored. That’s why sleep scoring isn’t just a scientific debate — it shapes the numbers you wake up to, and with them, how you understand your own rest.

Your night, stage by stage: what’s really happening

The current gold standard for sleep scoring, set by the American Academy of Sleep Medicine (AASM, 2018), divides the night into five stages. Each stage shows up differently in brain waves, eye movements, and muscle tone. Instead of dry definitions, think of them as signposts of what your brain and body are up to while you sleep.

Wake is marked by fast, low-amplitude, mixed-frequency brain waves, quick eye movements, and tense muscles. This is the fully alert baseline.
NREM 1 is the bridge between wakefulness and sleep. Brain waves slow from alpha to theta activity. The eyes may roll, and muscles remain relatively active. People are easy to wake in this stage.
NREM 2 is dominated by theta waves. Two hallmarks appear: sleep spindles, which shield you from noise and aid memory, and K-complexes, which sense the environment but keep you asleep. Muscles relax further.
NREM 3 is dominated by slow-wave activity, minimal eye movements, and very low muscle tone. This stage is known as restorative sleep, the body’s reset mode.
REM sleep is involved in dreaming, memory consolidation, and emotion processing. Brain activity mirrors wakefulness, but the muscles are paralyzed. The eyes dart beneath the eyelids. REM comes in two forms: phasic, with bursts of activity, and tonic, which is more still.

These categories are a useful shorthand, but leave out nuance. And they don’t just matter to scientists — they shape the story your wearable tells you each morning.
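
To make the link between epoch labels and a morning summary concrete, here is a minimal Python sketch. It assumes a night is stored as one stage label per 30-second epoch; the function name and the short example night are invented for illustration and do not describe any particular wearable.

```python
from collections import Counter

EPOCH_SECONDS = 30  # the standard scoring window described above


def minutes_per_stage(hypnogram):
    """Return total minutes per stage, given one label per 30-second epoch."""
    counts = Counter(hypnogram)
    return {stage: n * EPOCH_SECONDS / 60 for stage, n in counts.items()}


# Hypothetical 5-minute excerpt (10 epochs), made up for illustration
night = ["W", "N1", "N2", "N2", "N3", "N3", "N3", "N2", "R", "R"]
summary = minutes_per_stage(night)
print(summary)  # {'W': 0.5, 'N1': 0.5, 'N2': 1.5, 'N3': 1.5, 'R': 1.0}
```

Relabel a single N3 epoch as N2 and the "deep sleep minutes" figure changes by half a minute, which is why the scoring choices matter for the number you see.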

When “good enough” isn’t good enough: limits of the system

The current approach to sleep scoring, governed by the AASM criteria, undoubtedly gives the scientific community a shared language for describing the night. Yet it also carries important limitations that reach beyond the lab into everyday life.

Oversimplification: Even within a single night, sleep can look very different. Two segments both labeled as N3, our “deep sleep”, might contain as little as 20% or as much as 80% slow waves. Yet they carry the same label, flattening the rich complexity of what’s actually happening in the brain.
Subjectivity: Sleep scoring is inherently subjective — even the experts never agree 100% of the time, and sometimes not even with themselves. In other words, there is no absolute truth in the system. Yet these imperfect labels become the “ground truth” for training algorithms, embedding inconsistency at the very foundation of automation.
Old trade-offs: Five stages, 30-second bins — compromises made decades ago. More detail brings disagreement; less detail hides nuance.
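
Disagreement between scorers is usually reported with a chance-corrected statistic rather than raw percent agreement. The sketch below computes Cohen's kappa for two scorers in plain Python. The two label sequences are invented for illustration and are not taken from any study.

```python
from collections import Counter


def cohens_kappa(scorer_a, scorer_b):
    """Chance-corrected agreement between two scorers' epoch labels."""
    if len(scorer_a) != len(scorer_b) or not scorer_a:
        raise ValueError("both scorers must label the same, non-empty set of epochs")
    n = len(scorer_a)
    # Fraction of epochs where the two labels match
    observed = sum(a == b for a, b in zip(scorer_a, scorer_b)) / n
    # Agreement expected by chance, from each scorer's label frequencies
    freq_a, freq_b = Counter(scorer_a), Counter(scorer_b)
    expected = sum(freq_a[s] * freq_b[s] for s in freq_a) / (n * n)
    if expected == 1.0:  # both scorers used one identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels for the same 10 epochs
a = ["N2", "N2", "N3", "N3", "R", "W", "N1", "N2", "N3", "R"]
b = ["N2", "N2", "N3", "N2", "R", "W", "N2", "N2", "N3", "R"]
kappa = cohens_kappa(a, b)
print(round(kappa, 2))  # 0.73
```

Here the scorers agree on 8 of 10 epochs, but after correcting for chance the agreement drops to about 0.73. An algorithm trained on either scorer's labels inherits that gap.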

This raises a sharper question: when does simplification cross the line into distortion? Thirty seconds may not sound long, but in brain time it’s an eternity. For a specific application such as ours, where sleep scoring is the trigger that tells the DeepSleep algorithm when to start monitoring, the case is clear: we need a more tailored approach to improve usability.

“Even experts never agree 100% of the time — so how can machines trained on these labels ever be flawless?”

Blending human insight with machine power: the future of sleep scoring

If human scorers can be inconsistent and slow, why not let machines take over? The appeal is clear: algorithms can free up valuable time that would otherwise be spent on manual labeling. But decades of attempts show just how stubborn this challenge is¹. Because algorithms are trained on human scores, they inherit human biases and disagreements². This means a model that works well for healthy sleepers may fall short in older adults, people with sleep problems, or recordings made with different equipment.

Algorithms are tireless, but that does not make them correct. They are like a metronome that never wavers but may be keeping the wrong tempo. Humans are nuanced, but fallible and not always consistent. Neither, on its own, is enough.

“Algorithms are here to stay — but expert oversight is essential to catch and correct their blind spots.”

Therefore, the most likely future for sleep scoring is neither purely human-led nor machine-led, but a convergence of both, blending the strengths of algorithms with the judgments of experts.

In practice, it could look like this:

Algorithms take the first pass: pre-scoring the recordings and flagging low-confidence areas.
Experts step in where it matters most: focusing on the ambiguous segments instead of the routine.
The goal is simple: minimize low-stakes workload and maximize the payoff of human time.
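
The workflow above can be sketched in a few lines of Python. This is a sketch under stated assumptions, not a description of any existing scoring tool: it assumes a model that outputs a probability per stage for each epoch, and both the `triage` function and the 0.8 threshold are hypothetical.

```python
def triage(epoch_probs, threshold=0.8):
    """Split epochs into auto-scored labels and indices needing expert review.

    epoch_probs: one dict per epoch mapping stage -> model probability.
    threshold: hypothetical confidence cut-off; a real value would need validation.
    """
    auto_scored, needs_review = {}, []
    for index, probs in enumerate(epoch_probs):
        # Most likely stage and the model's confidence in it
        stage, confidence = max(probs.items(), key=lambda item: item[1])
        if confidence >= threshold:
            auto_scored[index] = stage
        else:
            needs_review.append(index)
    return auto_scored, needs_review


# Invented model output for four epochs
model_output = [
    {"W": 0.95, "N1": 0.05},
    {"N1": 0.55, "N2": 0.45},
    {"N2": 0.90, "N3": 0.10},
    {"N2": 0.50, "N3": 0.50},
]
auto_scored, needs_review = triage(model_output)
print(auto_scored)   # {0: 'W', 2: 'N2'}
print(needs_review)  # [1, 3]
```

The expert sees only epochs 1 and 3, the ambiguous N1/N2 and N2/N3 transitions, and the routine epochs are pre-filled.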

But there is a warning worth repeating. Algorithms may look as if they outperform experts, while in reality they are only echoing existing biases. Without rigorous quality checks and continuous re-evaluation by independent experts, this creates a false sense of progress. The real test is not whether machines can copy human labels, but whether they can reveal what humans cannot.

By taking over the repetitive burden of routine labeling, semi-automatic scoring frees researchers to focus their cognitive energy where it matters most: exploring how algorithms can dive deeper into sleep architecture itself. That means spotting instabilities in slow-wave sleep, detecting micro-arousals, or tailoring scoring methods to the unique patterns of each sleeper.

The promise of machine learning in sleep science is not just efficiency. It is the opportunity to move beyond rigid stages toward metrics that are continuous, personalized, and actionable — giving us a sharper, more meaningful window into the sleeping brain.

Shaping sleep scoring with intention: from rigid labels to real insight

When we talk about sleep staging, we imply that the night can be neatly sliced into parts. But sleep is not that simple, and algorithms give us the chance to capture more nuance. They should be tailored to their purpose, optimized for real use rather than clinging to outdated standards. Progress means evolving old methods into sharper, more specific ones.

One example is the Odds Ratio Product (ORP)³, a continuous measure of sleep depth. Instead of broad REM or NREM labels, it analyzes EEG patterns every three seconds and scores depth from 0 (deep sleep) to 2.5 (fully awake). This fine‑grained view reveals subtle changes in sleep quality — for instance in sleep apnea patients treated with CPAP — that traditional staging would miss.
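
To illustrate what a three-second resolution adds, the sketch below groups a continuous depth index into 30-second epochs and reports the spread inside each one. It does not compute ORP itself. The values are invented numbers on the same 0 to 2.5 scale, and `summarize_epochs` is a hypothetical helper.

```python
from statistics import mean

WINDOW_SECONDS = 3
EPOCH_SECONDS = 30
WINDOWS_PER_EPOCH = EPOCH_SECONDS // WINDOW_SECONDS  # 10 windows per epoch


def summarize_epochs(depth_values):
    """Summarize a 3-second depth index per 30-second epoch (mean, min, max)."""
    epochs = []
    # Step through complete epochs only; a trailing partial epoch is ignored
    for start in range(0, len(depth_values) - WINDOWS_PER_EPOCH + 1, WINDOWS_PER_EPOCH):
        chunk = depth_values[start:start + WINDOWS_PER_EPOCH]
        epochs.append({"mean": mean(chunk), "min": min(chunk), "max": max(chunk)})
    return epochs


# Invented values: one stable epoch, one with a brief lightening of sleep
stable = [0.3] * 10
unstable = [0.2, 0.2, 0.3, 1.6, 1.8, 0.4, 0.2, 0.2, 0.3, 0.2]
epoch_summary = summarize_epochs(stable + unstable)
for epoch in epoch_summary:
    print(epoch)
```

Both epochs would plausibly receive the same stage label, yet the second contains a six-second excursion toward wakefulness that only the finer-grained index shows.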

What this means for Deep Sleep Technologies

For Deep Sleep Technologies (DST), sleep scoring is more than a retrospective measure: it is the trigger for real-time intervention. Accurate stage detection forms the foundation of our closed-loop neurostimulation (CLNS) technology, allowing stimulation to align with the brain’s natural rhythms, particularly the slow waves of deep sleep.

DST is helping drive integrated approaches that do not rely solely on human expertise or machine automation. By staying rigorous and refusing to take either for granted, we aim to ensure that the future of sleep science does not grow complacent with outdated standards, but instead forges new paths toward the next breakthrough.

References

[1] Stanley, N. (2023). The Future of Sleep Staging, Revisited. Nature and Science of Sleep, 15, 313–322. https://doi.org/10.2147/NSS.S405663

[2] Danker-Hopfe, H., Anderer, P., Zeitlhofer, J., Boeck, M., Dorn, H., Gruber, G., Heller, E., Loretz, E., Moser, D., Parapatics, S., Saletu, B., Schmidt, A., & Dorffner, G. (2009). Interrater reliability for sleep scoring according to the Rechtschaffen & Kales and the new AASM standard. Journal of Sleep Research, 18(1), 74–84. https://doi.org/10.1111/j.1365-2869.2008.00700.x

[3] Penner, C. G., Gerardy, B., Ryan, R., & Williams, M. (2019). The Odds Ratio Product (An Objective Sleep Depth Measure): Normal Values, Repeatability, and Change With CPAP in Patients With OSA. Journal of Clinical Sleep Medicine, 15(8), 1155–1163.
