The 37% Rule Is Almost Always Wrong -- And That Is the Point

The strategies derived from optimal stopping theory are provably optimal under their assumptions -- and those assumptions almost never hold in real life. The 37% figure can swing from 10% to 61% depending on which assumptions you relax, yet the deeper principle it encodes -- deliberate exploration followed by decisive commitment -- remains one of the most powerful ideas in decision science.

45 sources


Section 01

Foundation -- Why Exploration Followed by Commitment Works

The Secretary Problem and Why 37% Became Famous

The setup is deceptively simple: hire the best secretary from a pool of candidates, interviewed one at a time in random order. After each interview, you must immediately hire or permanently reject -- no callbacks, no second chances. You know only how each candidate compares to those already seen.

The optimal strategy, proven by Lindley (1961) and Dynkin (1963), is the "look-then-leap" rule: reject the first n/e candidates unconditionally (where e is Euler's number, ~2.718), using them to establish a quality benchmark. Then accept the next candidate who exceeds it. The fraction 1/e is approximately 37%. This strategy selects the single best candidate about 37% of the time, regardless of pool size -- from 100 to 100 million applicants. Random selection from 100 candidates succeeds only 1% of the time, making the 37% rule a 37-fold improvement over chance.
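The 37% figure can be checked directly. If you reject the first r of n candidates and then hire the next one who beats all of them, the probability of landing the single best is (r/n) times the sum of 1/i for i from r to n-1 (and simply 1/n when r = 0). A minimal Python sketch that evaluates this formula and searches for the best cutoff:

```python
def success_probability(n: int, r: int) -> float:
    """Chance of hiring the single best of n candidates when the first r are
    rejected outright and the next candidate better than all of them is hired."""
    if r == 0:
        return 1.0 / n  # no benchmark: you simply hire the first candidate
    return (r / n) * sum(1.0 / i for i in range(r, n))


def optimal_cutoff(n: int) -> int:
    """How many candidates to reject before leaping, maximizing the chance above."""
    return max(range(n), key=lambda r: success_probability(n, r))


for n in (20, 100, 1000):
    r = optimal_cutoff(n)
    print(f"n={n}: reject {r} ({r / n:.0%}), "
          f"P(best)={success_probability(n, r):.3f}, random pick={1 / n:.3f}")
```

For n = 20 the search returns a cutoff of 7 (success about 38.4%); for n = 100 it returns 37 (about 37.1%), and the success rate drifts down toward 1/e as n grows.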

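Bruss (1984) showed that a version of the 1/e guarantee survives even when the number of candidates is unknown, provided they arrive at random times spread uniformly over a known period: wait until 1/e of the time has passed, then take the next best-so-far, and you pick the best with probability at least 1/e. The result surprised the field, and Bruss later generalized this line of work into his odds algorithm (2000). In the classical problem and many of its variants, the optimal strategy is a threshold rule: reject until a certain point, then accept the next best-so-far.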
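A minimal Monte Carlo sketch of this time-based "1/e-law", assuming candidates arrive at uniformly random times in [0, 1] and that the pool size (drawn at random up to a hypothetical `max_n`) is never revealed to the decision rule:

```python
import math
import random


def one_over_e_trial(rng: random.Random, max_n: int) -> bool:
    """One search in which the number of candidates is unknown to the searcher.

    Assumption: arrival times are independent and uniform on [0, 1], so the
    cutoff is a point in time (1/e), not a count of candidates.
    """
    n = rng.randint(1, max_n)  # hidden from the decision rule below
    candidates = sorted((rng.random(), rng.random()) for _ in range(n))  # (arrival, quality)
    best_quality = max(quality for _, quality in candidates)
    benchmark = -1.0
    for arrival, quality in candidates:
        if quality > benchmark:
            benchmark = quality
            if arrival > 1 / math.e:
                # First best-so-far after the cutoff time: leap.
                return quality == best_quality
    return False  # nobody better than the benchmark arrived after the cutoff


def success_rate(trials: int = 10_000, max_n: int = 100, seed: int = 0) -> float:
    rng = random.Random(seed)
    return sum(one_over_e_trial(rng, max_n) for _ in range(trials)) / trials


print(f"success with unknown pool size: {success_rate():.3f} (1/e = {1 / math.e:.3f})")
```

The rule never looks at `n`; it only watches the clock, which is what lets the guarantee hold without knowing the pool size.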

Key Terms: A Decision-Making Vocabulary

Optimal stopping is the mathematical study of when to take an action in a sequential process to maximize expected reward. The secretary problem is its most famous example.

The explore/exploit tradeoff (also called the exploration-exploitation dilemma) describes the tension between trying new options to learn about them (exploration) and sticking with the best option you currently know about (exploitation).

Multi-armed bandit refers to a class of problems -- named after a gambler facing a row of slot machines with unknown payoff rates -- where a decision-maker must repeatedly choose between options with uncertain rewards, balancing learning against earning.

Satisficing, a term coined by Herbert Simon in 1956, means setting a quality threshold and accepting the first option that meets it, rather than exhaustively searching for the absolute best.

Strategic satisficing combines high standards with efficient search -- wanting the best outcome but refusing to engage in exhaustive, obsessive comparison.

Why the Principle Survives Even When the Number Does Not

Every realistic modification to the secretary problem's five core assumptions changes the optimal exploration percentage, sometimes radically. But the underlying principle -- explore deliberately, then commit decisively -- remains robust across all variants.

The five assumptions that almost never hold simultaneously in real life are: (1) you cannot revisit rejected options, (2) you know the total pool size in advance, (3) you can perfectly evaluate each option, (4) you judge on a single criterion, and (5) search is costless.

When Petruccelli (1993) introduced just a 50% probability of successfully recalling a rejected candidate, the optimal exploration threshold jumped from 37% to 61%, with success probability also rising to 61%. When search costs are added, Lorenzen (1981) showed that the clean cutoff rule disappears entirely, replaced by a declining threshold. When the goal shifts from "find the absolute best" to "find someone good," Bearden (2006) showed the optimal exploration phase drops to the square root of n. For 100 options, that means exploring only 10 rather than 37.
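The effect of switching from "the best or nothing" to "as good as possible" can be illustrated with a small simulation. This is a minimal sketch under simplifying assumptions, not Bearden's exact model: qualities are independent uniform draws, the rule only compares candidates with one another, and the payoff is simply the quality of whoever you hire. It compares exploring sqrt(n) candidates with exploring n/e:

```python
import math
import random


def average_payoff(n: int, cutoff: int, trials: int = 5_000, seed: int = 1) -> float:
    """Average quality of the hire when the payoff is the hire's quality itself.

    Assumptions: qualities are independent uniform draws on [0, 1], and if
    nobody beats the benchmark you are stuck with the last candidate.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        values = [rng.random() for _ in range(n)]
        benchmark = max(values[:cutoff], default=-1.0)
        chosen = values[-1]  # fallback if no one beats the benchmark
        for v in values[cutoff:]:
            if v > benchmark:
                chosen = v
                break
        total += chosen
    return total / trials


n = 100
for cutoff in (math.isqrt(n), round(n / math.e)):
    print(f"explore {cutoff:>2} of {n}: average quality {average_payoff(n, cutoff):.3f}")
```

With 100 candidates the shorter sqrt(n) exploration phase should come out ahead on average quality: the long 37-candidate phase more often leaves no one better than the benchmark, forcing you to take the last candidate.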

Variant	Optimal Explore %	Success Rate
Classical (no recall, no info)	37%	37%
Full information (known scores)	Dynamic threshold	~58%
50% recall probability	61%	61%
Cardinal payoff (want "good," not "the best")	sqrt(n) (~10% for 100 options)	Higher expected value
With search costs	Declining, variable	Problem-dependent
Mutual selection (50% rejection risk)	~25%	~25%
Prior sampling (strong prior info)	Threshold rule	Up to ~74.5%

Robert Wiblin, head of research at 80,000 Hours, put it bluntly: "The secretary problem is such a poor approximation of real life that we should not see it as useful for guiding our actual decisions." His argument is not that exploration is useless -- it is that the specific number 37% gives false precision.

What this means for listeners: The takeaway is not a number. It is a principle: before committing to any major sequential decision -- a job, an apartment, a partner -- invest real time and effort in pure exploration. Learn what "great" looks like before you start choosing. The exact fraction of time you spend exploring matters far less than the fact that you do it deliberately rather than either settling impulsively or searching forever.

Section 02

Evidence -- What Research Actually Shows

How Humans Perform: Earlier Than Optimal, But Surprisingly Smart

Humans consistently stop searching earlier than the 37% rule predicts. In laboratory experiments with 20 candidates, participants choose at position 4-5 when the optimal stopping point is 7-8. The average stopping point is approximately 31% (Seale & Rapoport, 1997). This "bias" may reflect rational adaptation to real search costs -- time, money, emotional energy -- that the model assumes to be zero.
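One way to see how search costs pull the cutoff earlier is to charge a flat, hypothetical cost for every interview and subtract it from the chance of hiring the best. Under a cutoff r, the expected number of candidates seen works out to r(1 + the sum of 1/i for i from r to n-1). A minimal sketch for the 20-candidate setting:

```python
def success_probability(n: int, r: int) -> float:
    """P(hire the single best) when the first r of n candidates are rejected."""
    if r == 0:
        return 1.0 / n
    return (r / n) * sum(1.0 / i for i in range(r, n))


def expected_interviews(n: int, r: int) -> float:
    """Expected number of candidates seen before the search ends."""
    if r == 0:
        return 1.0  # the first candidate is always the best so far
    return r * sum(1.0 / i for i in range(r, n)) + r


def best_cutoff(n: int, cost_per_interview: float) -> int:
    """Cutoff maximizing P(best) minus a flat (hypothetical) cost per interview."""
    return max(
        range(n),
        key=lambda r: success_probability(n, r) - cost_per_interview * expected_interviews(n, r),
    )


for cost in (0.0, 0.01, 0.02):
    print(f"cost {cost:.2f} per interview -> reject the first {best_cutoff(20, cost)} of 20")
```

With no cost the best cutoff is 7; at a cost of 0.02 per interview it falls to 4, so the leap begins at position 5, in the same range as the positions participants chose. The cost figure is purely illustrative, not an estimate from these studies.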

The rapid learning effect is more striking: when participants play repeated rounds with feedback, success rates climb from 28% to near-optimal levels after just 3-7 games. People are not bad at this; they are unfamiliar with it.

Computational modeling suggests that humans use a linear declining threshold rather than the sharp cutoff the 37% rule prescribes, starting with high standards and gradually lowering them. This heuristic achieves within 6% of optimal performance. And Goldstein, McAfee, Suri, and Wright (2019) found in Management Science that people learn near-optimal behavior only when they see actual values rather than rankings.

The Satisficing Paradox: Getting More by Wanting Less

The most counterintuitive finding in this field comes from Iyengar, Wells, and Schwartz (2006) in Psychological Science. They tracked graduating seniors through job searches and found that maximizers -- exhaustive searchers for the best possible job -- secured positions with starting salaries roughly $7,500 higher (about 20% more) than satisficers. Yet maximizers were significantly less satisfied with those objectively better jobs and experienced more negative affect throughout the search.

They got better outcomes and felt worse about them.

Schwartz's earlier work (2002, JPSP; 2004, The Paradox of Choice) had established that maximizers score lower on happiness and higher on depression and regret. The breakthrough came when researchers examined exactly what about maximizing causes misery. Diab, Gillespie, and Highhouse (2008) in Judgment and Decision Making developed a revised scale focused on high standards alone -- and found no correlation with unhappiness. Cheek and Schwartz (2016) synthesized 11 scales and resolved the paradox: having high standards (the maximizing goal) is neutral to positive; exhaustive comparison (the maximizing strategy) drives depression, regret, and lower satisfaction.

Hughes and Scholer (2017) in PSPB sharpened this: "adaptive" maximizers (promotion-focused, wanting the best) experience minimal regret. "Maladaptive" maximizers (assessment-focused, compulsively re-evaluating) generate FOBO -- fear of a better option. The critical difference is not how thoroughly you search but whether you re-evaluate after choosing.

One counterpoint: Saltsman et al. (2020) found satisficers exhibited greater physiological threat during choice overload -- satisficing may sometimes function as defensive avoidance rather than genuine contentment.

The resolution is strategic satisficing: wanting the best while stopping efficiently. Mathematically, satisficing most closely resembles the "full-information" variant of the secretary problem, in which you observe each option's actual quality and can accept anything above a value threshold; the optimal threshold rule there finds the best option about 58% of the time, far better than the classical 37%.
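The full-information rule can be sketched concretely. Assuming scores are independent uniform draws on [0, 1], the classical optimal rule (Gilbert and Mosteller, 1966) accepts a candidate who is the best so far and scores at least a threshold that rises with the number of candidates still to come. A minimal simulation:

```python
import random


def full_info_thresholds(n: int) -> list[float]:
    """thresholds[k] = minimum score to accept a best-so-far candidate when k
    candidates remain after it.  Each threshold b solves
    sum_{j=1..k} (b**-j - 1) / j = 1 (uniform scores on [0, 1])."""
    thresholds = [0.0]  # with nobody left, accept any best-so-far
    for k in range(1, n):
        lo, hi = 0.0, 1.0
        for _ in range(60):  # bisection; the left-hand side falls as b rises
            b = (lo + hi) / 2
            if sum((b ** -j - 1) / j for j in range(1, k + 1)) > 1:
                lo = b
            else:
                hi = b
        thresholds.append((lo + hi) / 2)
    return thresholds


def full_info_success_rate(n: int = 20, trials: int = 20_000, seed: int = 2) -> float:
    thresholds = full_info_thresholds(n)
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        scores = [rng.random() for _ in range(n)]
        best_so_far = -1.0
        for i, s in enumerate(scores):
            if s > best_so_far:
                best_so_far = s
                if s >= thresholds[n - 1 - i]:  # n - 1 - i candidates still to come
                    wins += (s == max(scores))
                    break
    return wins / trials


print(f"full-information success rate, n=20: {full_info_success_rate():.3f}")
```

For 20 candidates the simulated rate should land in the high 50s (percent), well above the classical 37%; with one candidate left, the threshold is exactly 0.5.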

Dating Apps: When Infinite Options Break the Framework

Digital dating has made several core assumptions of optimal stopping incoherent. With more than 350 million dating app users worldwide (2024), and Tinder users swiping through 140 profiles and spending 80 minutes on the platform per day, the "finite, known pool" has dissolved.

The evidence on what this does to decision quality is consistent. Pronk and Denissen (2020) in Social Psychological and Personality Science found a cumulative 27% decrease in acceptance probability across Tinder-like sessions -- a "rejection mindset" driven by declining satisfaction and growing pessimism. D'Angelo and Toma (2017) showed in Media Psychology that daters choosing from 24 profiles were less satisfied and more likely to reverse their choice than those choosing from 6.

The damage extends to commitment. Brady et al. (2022) showed across five experimental samples in JESP that perceiving abundant partners decreased commitment readiness. Thomas et al. (2022) in Computers in Human Behavior found higher partner availability increased fear of being single and decreased self-esteem.

Yet a PNAS study of 19,131 married respondents (Cacioppo et al., 2013) found that couples who met online reported slightly higher marital satisfaction and lower breakup rates (5.96% vs. 7.67%). And Scheibehenne et al.'s (2010) meta-analysis found no universal choice overload effect; expertise, complexity, and time pressure moderate it. The problem is not abundant options per se but the psychological strategies most people lack.

Platform design matters. Hinge users show 25% higher conversation rates and 40% higher meeting rates versus Tinder, likely due to limited-likes design. No formal mathematical revision of optimal stopping for infinite-scroll environments exists; foraging theory may be a better framework.

The Lifespan Trajectory: Explore When Young, Exploit When Mature

The explore/exploit balance shifts systematically across the lifespan -- not as folk wisdom but as converging evidence from economics, developmental psychology, and neuroscience.

The economic logic: a 20-year-old has 50+ years to benefit from exploration; a 70-year-old has 10-15. Early exploration costs are vastly outweighed by decades of informed exploitation. The cognitive logic: fluid intelligence (novel problem-solving) peaks young while crystallized intelligence (expertise, pattern recognition) increases with age, creating natural alignment between youth and exploration, maturity and exploitation.

The most powerful finding comes from Laura Carstensen's socioemotional selectivity theory (SST), one of the best-replicated results in developmental psychology. The shift is driven not by chronological age but by perceived future time. Young people facing terminal illness show the same exploitation bias as elderly people; elderly people told about a life-extending breakthrough show renewed exploration motivation. The implication: calibrate per domain, not per birthday.

Children ages 3-5 show almost exclusively exploratory behavior, even after discovering high-reward options. By adulthood, people predominantly exploit, with exploration becoming rare and strategic -- mirroring mathematical predictions. A Nature study found creative "hot streaks" follow periods of diverse exploration, suggesting exploration is a productive input, not merely a cost.

Organizations: The Exploitation Trap and How to Escape It

James March's 1991 paper in Organization Science (3,949+ citations) established the framework: adaptive processes refine exploitation faster than exploration, making organizations "effective in the short run but self-destructive in the long run."

Kodak is the textbook exploitation trap. Steve Sasson invented the digital camera there in 1975; management suppressed development to protect ~90% U.S. film market share; bankruptcy followed in 2012. But the standard narrative oversimplifies. Former executive Willy Shih argued in MIT Sloan Management Review (2016) that leaders tracked digital threats and achieved top-3 digital positions. Lucas and Goh's analysis (2009) identified the binding constraint as middle-management culture and bureaucratic structure, not leadership blindness. Exploitation traps are structural, not just about bad leaders.

At its peak, Nokia held 40% of the global mobile phone market. By 2009, it was maintaining 57 incompatible versions of its Symbian OS. INSEAD researchers (a 76-interview study in Administrative Science Quarterly) traced the root cause to fear: top managers were "extremely temperamental," middle managers were afraid to deliver bad news, and "top management was directly lied to" about capabilities.

Amazon shows the alternative: the Fire Phone's $170M writedown (2014) was a failed exploration bet, but its learnings were redirected into Echo/Alexa (~70% of the smart speaker market). AWS exploited Amazon's internal infrastructure while exploring a new market and now generates more than $100B in annual revenue.

Google's 20% time is a cautionary tale about unstructured exploration. Only ~10% of engineers used it; Laszlo Bock called it "cultural aspiration rather than operational reality." By 2012, Google had shifted to structured programs.

The structural lesson: O'Reilly and Tushman (2004) studied 35 innovation attempts and found that organizations with separate exploration and exploitation units achieved their breakthrough goals in over 90% of cases, versus 25% for functional designs and 0% for unsupported teams. The 70-20-10 model (Nagji & Tuff, 2012) -- 70% of resources to core initiatives, 20% to adjacent, 10% to transformational -- was associated with a 10-20% P/E premium. Counterintuitively, the 70% devoted to core initiatives produced only 10% of long-term returns, while the 10% devoted to transformational initiatives produced 70%.

However, Mathias's meta-analysis (117 studies, 21,000+ firms) found that ambidexterity yielded weaker effects than focused strategies, as coordination costs partially offset the benefits. Uotila et al. (2009) found an inverted U-shaped relationship between relative emphasis on exploration and financial performance in S&P 500 firms. A 2025 Nature study found peak performance at ~61% exploitation. The optimal balance is not universal.

Evidence from Education: The England vs. Scotland Natural Experiment

One of the strongest pieces of evidence for the value of structured exploration comes from economist Ofer Malamud's natural experiment comparing the English and Scottish education systems. In England, students choose their major before entering university, typically at age 16-17. In Scotland, students study broadly for the first two years before specializing.

Malamud (2010, 2011, NBER) found that English graduates -- the early specializers -- were more likely to switch to entirely unrelated occupations later in life, suggesting they frequently discovered "bad matches" only after entering the labor force. Late specializers found better field matches despite sacrificing some early skill depth. The benefits of "match quality" -- finding the right field -- proved substantial enough to outweigh the loss of specific skills accumulated through early specialization.

This finding aligns with the 80,000 Hours career framework's suggestion that ages 18-26 should be dedicated to sampling different career paths rather than optimizing advancement in a single track. Note that 18-26 is roughly the first 37% of the span from 18 to 40; applied to a working life running from 18 to the mid-60s, the same fraction would extend exploration to about age 35.

Evidence Synthesis: Where Sources Agree and Diverge

Areas of agreement across multiple sources and study types:
- The principle of structured exploration before commitment is robust (mathematical proofs, experimental studies, organizational research)
- Humans stop searching earlier than mathematically optimal but are within 6% of optimality using simple heuristics (Seale & Rapoport, 1997; linear threshold modeling)
- The satisficing/maximizing distinction is real, but the original measurement conflated goals and strategies (Schwartz, 2002; Diab et al., 2008; Cheek & Schwartz, 2016)
- Perceived time horizon, not age, drives the explore/exploit shift (Carstensen, SST -- one of the best-replicated findings in developmental psychology)
- Organizations systematically drift toward exploitation (March, 1991; 3,949+ citing articles)

Areas of genuine disagreement or uncertainty:
- Whether organizational ambidexterity outperforms focused strategies (O'Reilly & Tushman show >90% success; Mathias meta-analysis shows weaker effects from ambidexterity than focus)
- Whether choice overload is universal or moderated (Scheibehenne meta-analysis finds no universal effect; dating app studies consistently find negative effects)
- The exact optimal exploration percentage for any real-world domain (ranges from 10% to 61% depending on which assumptions are relaxed; individual variation is enormous)
- Whether satisficing reflects genuine wisdom or sometimes defensive avoidance (Saltsman et al., 2020 cardiovascular findings)

What remains unknown:
- No formal mathematical framework for optimal stopping in infinite-option digital environments
- No randomized controlled trials on long-term life outcomes from deliberate application of explore/exploit frameworks
- Cross-cultural differences in exploration strategies are largely unexplored -- nearly all research is Western
- How personality traits and neurodiversity interact with optimal exploration strategies

What this means for listeners: The popular advice to "just be a satisficer" oversimplifies. The real insight is more specific: maintain high standards for what constitutes "good enough," but refuse to engage in exhaustive comparison after you have found it. Set your threshold before you start searching. Commit when it is met. And critically, do not re-compare with alternatives after committing -- that re-evaluation, not the high standards themselves, is what produces misery.

Section 03

Application -- How to Know When You Have Explored Enough

The Multi-Armed Bandit Toolkit

Three algorithms formalize the explore/exploit tradeoff for repeated decisions, each mapping to a distinct life strategy.

Epsilon-greedy: with epsilon set to 10%, exploit your best-known option 90% of the time and explore a randomly chosen option the other 10%. Simple and cheap, but it wastes exploration on clearly bad options.
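To make the mechanics concrete, here is a minimal epsilon-greedy sketch on a hypothetical "slot machine" problem: each option pays 1 with a hidden probability and 0 otherwise. The three rates used below are made up for illustration.

```python
import random


def epsilon_greedy(true_rates, steps=5_000, epsilon=0.1, seed=3):
    """Bernoulli bandit: each arm pays 1 with its (hidden) true rate, else 0."""
    rng = random.Random(seed)
    k = len(true_rates)
    counts = [0] * k
    means = [0.0] * k
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: any arm, chosen uniformly
        else:
            arm = max(range(k), key=lambda a: means[a])  # exploit the best estimate
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running average
    return counts, means


counts, means = epsilon_greedy([0.2, 0.5, 0.7])
print("pulls per arm:", counts)
print("estimated rates:", [round(m, 2) for m in means])
```

The uniform random choice in the explore branch is exactly the waste mentioned above: the clearly worst arm keeps receiving about a third of the exploratory pulls.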

UCB1 (Upper Confidence Bound): selects the option with highest estimated reward plus a confidence bonus for uncertainty. Less-known options get an exploration bonus precisely because you know less. Achieves logarithmic regret -- the performance gap grows only logarithmically with time.
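A minimal UCB1 sketch, using the standard bonus of sqrt(2 ln t / n_a) for an option tried n_a times by round t, on the same kind of hypothetical yes/no-reward options:

```python
import math
import random


def ucb1(true_rates, steps=5_000, seed=4):
    """Bernoulli bandit played with UCB1; returns how often each arm was pulled."""
    rng = random.Random(seed)
    k = len(true_rates)
    counts = [0] * k
    means = [0.0] * k
    for t in range(1, steps + 1):
        if t <= k:
            arm = t - 1  # play every arm once so each has an estimate
        else:
            # Estimated reward plus a bonus that shrinks as an arm is tried more.
            arm = max(range(k), key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts


counts = ucb1([0.2, 0.5, 0.7])
print("pulls per arm:", counts)
```

Unlike epsilon-greedy, exploration here is directed: a rarely tried arm gets a large bonus, while an arm that is both well known and poor is tried less and less often.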

Thompson Sampling: maintains probability distributions for each option, samples from them, picks the highest. Uncertain options sometimes produce high samples (exploration); well-known good options consistently do (exploitation). Often outperforms UCB in practice, especially with sparse feedback.
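A minimal Thompson Sampling sketch for the same kind of yes/no-reward problem, assuming a uniform Beta(1, 1) prior on each option's rate (a common default, not the only choice):

```python
import random


def thompson_sampling(true_rates, steps=5_000, seed=5):
    """Bernoulli bandit played with Thompson Sampling; returns pulls per arm."""
    rng = random.Random(seed)
    k = len(true_rates)
    successes = [0] * k
    failures = [0] * k
    for _ in range(steps):
        # Draw a plausible payout rate for each arm from its Beta posterior,
        # then play the arm whose draw is highest.
        samples = [rng.betavariate(successes[a] + 1, failures[a] + 1) for a in range(k)]
        arm = max(range(k), key=lambda a: samples[a])
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]


counts = thompson_sampling([0.2, 0.5, 0.7])
print("pulls per arm:", counts)
```

An arm with few observations has a wide posterior, so it occasionally produces the highest draw and gets explored; a well-measured good arm wins most draws and gets exploited.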

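The Gittins Index (Gittins, 1979), proven optimal for discounted bandit problems with independent options, delivers a counterintuitive insight: if future payoffs are discounted by 10% per round, an option you have never tried (its payout rate could be anywhere from 0% to 100%) has an index of about 0.70, slightly above an option known to pay 70%, because trying it might reveal something better that you can then exploit for a long time. This rigorously justifies biasing toward exploration when uncertain.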
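The index can be computed for the simplest case by "retirement" calibration: find the guaranteed per-round payout at which you would be exactly indifferent between taking it forever and trying the unknown option (with the freedom to switch to the guarantee later). A minimal sketch, assuming yes/no payouts, a uniform prior on the unknown rate, a discount factor of 0.9, and a finite look-ahead as an approximation:

```python
def arm_value(a: int, b: int, retire_rate: float, gamma: float, depth: int) -> float:
    """Value of a yes/no arm with Beta(a, b) posterior when you may switch, at
    any point, to a guaranteed `retire_rate` per round forever."""
    retire = retire_rate / (1 - gamma)
    # Truncation (an approximation): after `depth` more pulls, treat the
    # posterior mean as the arm's true rate.
    values = [max(retire, (a + i) / (a + b + depth) / (1 - gamma)) for i in range(depth + 1)]
    for n in range(depth - 1, -1, -1):  # n pulls made so far, i of them successes
        new_values = []
        for i in range(n + 1):
            p = (a + i) / (a + b + n)  # posterior mean payout rate
            keep = p * (1 + gamma * values[i + 1]) + (1 - p) * gamma * values[i]
            new_values.append(max(retire, keep))
        values = new_values
    return values[0]


def gittins_index(a: int = 1, b: int = 1, gamma: float = 0.9, depth: int = 100) -> float:
    """Guaranteed payout rate at which trying the arm and retiring are equally good."""
    lo, hi = 0.0, 1.0
    for _ in range(40):  # bisection on the guaranteed rate
        rate = (lo + hi) / 2
        if arm_value(a, b, rate, gamma, depth) > rate / (1 - gamma):
            lo = rate  # the arm is still worth playing, so its index is higher
        else:
            hi = rate
    return (lo + hi) / 2


print(f"untried option (0 wins, 0 losses): {gittins_index():.4f}")
```

With these defaults the index for an untried option should land close to 0.70, well above its expected payout of 0.5; the gap is the value of the information you gain by trying it.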

Algorithm	Exploration Strategy	Guarantee	Best For
Epsilon-greedy	Random (uniform)	None (heuristic)	Simple problems, daily habits
UCB1	Uncertainty-directed	Logarithmic regret	When you want theoretical rigor
Thompson Sampling	Bayesian posterior	Competitive with UCB	Sparse feedback, practical decisions
Gittins Index	Optimal Bayesian	Proven optimal	Theoretical benchmark

Protocol 1: Adapted Look-Then-Leap

Define your decision domain and time horizon. Examples: "30 days for an apartment." "3 years exploring career directions."
Spend the first 30-40% in pure exploration -- gather information, build benchmarks, do not commit. For a 30-day apartment search: 9-12 days of viewing. For careers ages 18-60: roughly ages 18-35.
After exploration, commit to the first option meeting or exceeding your benchmark.
If nothing exceeds your benchmark by the final 10% of your horizon, lower your threshold and take the best available.
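The protocol above can be sketched as a small function, plus a helper for the exploration-window arithmetic. This is a minimal sketch under stated assumptions: scores are non-negative, options arrive one at a time, and "lower your threshold" is modeled as accepting anything at or above a hypothetical `fallback_ratio` of the benchmark during the final window, taking the last option if even that fails.

```python
def exploration_window(start: float, end: float, fraction: float) -> float:
    """Point (day, age, ...) at which the pure-exploration phase ends."""
    return start + fraction * (end - start)


def look_then_leap(scores, explore_frac=0.35, fallback_frac=0.10, fallback_ratio=0.8):
    """Index of the option chosen under the adapted look-then-leap protocol.

    `scores` are non-negative option qualities in the order you encounter them.
    `fallback_ratio` is a hypothetical parameter for 'lower your threshold'.
    """
    n = len(scores)
    if n == 0:
        raise ValueError("need at least one option")
    n_explore = min(n, max(1, round(explore_frac * n)))
    benchmark = max(scores[:n_explore])  # learned during exploration, never acted on
    fallback_start = max(n_explore, n - max(1, round(fallback_frac * n)))
    for i in range(n_explore, n):
        threshold = benchmark if i < fallback_start else fallback_ratio * benchmark
        if scores[i] >= threshold:
            return i  # commit to the first option that clears the bar
    return n - 1  # horizon exhausted: take what is left


print(exploration_window(0, 30, 0.3), exploration_window(0, 30, 0.4))    # apartment days
print(exploration_window(18, 60, 0.3), exploration_window(18, 60, 0.4))  # career ages
print(look_then_leap([5, 3, 8, 2, 6, 9, 1, 4, 7, 10], explore_frac=0.3))
```

The last line commits to index 5 (score 9), the first option after the exploration phase to meet the benchmark of 8, even though a 10 arrives later.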

Why 30-40%: Real decisions involve partial recall (pushing optimum higher) and search costs (pushing it lower). The range captures the realistic middle ground.

Protocol 2: Strategic Satisficing

Set your threshold before searching. Write it down. Be specific: "A job paying at least X, commute under Y minutes, involving Z work."
Maximize on 2-3 high-stakes dimensions only (career, life partner, health). Satisfice on everything else.
When an option meets your threshold, commit. Make it feel irreversible -- cancel other interviews, sign the lease, delete the app.
Do not re-compare after committing. Hughes and Scholer (2017): the difference between adaptive and maladaptive maximizers is whether they re-evaluate after choosing.
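The satisficing steps translate into a loop that stops at the first acceptable option and never looks back. A minimal sketch; the `Offer` and `Threshold` types and their fields are hypothetical stand-ins for "at least X, under Y minutes, involving Z work":

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Offer:
    name: str
    salary: int
    commute_minutes: int
    work_matches: bool  # does it involve the kind of work you wrote down?


@dataclass(frozen=True)
class Threshold:  # written down before the search starts
    min_salary: int
    max_commute_minutes: int


def first_acceptable(offers: Iterable[Offer], bar: Threshold) -> Optional[Offer]:
    """Return the first offer that clears every part of the bar.

    The loop returns the moment an offer qualifies, so later offers are never
    examined: there is no re-comparison after committing.
    """
    for offer in offers:
        if (
            offer.salary >= bar.min_salary
            and offer.commute_minutes <= bar.max_commute_minutes
            and offer.work_matches
        ):
            return offer
    return None


bar = Threshold(min_salary=70_000, max_commute_minutes=40)
offers = [
    Offer("A", 90_000, 75, True),
    Offer("B", 72_000, 30, True),
    Offer("C", 95_000, 20, True),
]
print(first_acceptable(offers, bar).name)  # -> B
```

Offer C scores higher on both dimensions, but the satisficer never sees it; that is the intended trade of a slightly better outcome for an ended search.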

Protocol 3: The Five-Question Stopping Test

Can you articulate what "great" looks like in this domain? If no: explore more broadly. You have not yet learned your own preferences.
Are new options teaching you anything fundamentally new? If yes: you are in the high-return zone of exploration. If no: you have hit diminishing returns on information gathering.
Does your best current option meet your satisficing threshold? If no: continue targeted search.
Has your best guess stopped changing with new information? If yes: commit and set a 1-2 year review point. The 80,000 Hours framework recommends: "Once your best guess stops changing with new information, it's probably time to commit and try it for a few years."
Would you regret not trying one specific unexplored option? If yes: explore that one thing, then commit. If no: commit with confidence.
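The five questions can be read as an ordered checklist in which the first "keep searching" answer wins. A minimal sketch of that reading; the function name and messages are hypothetical, and because the text does not say what to do when your best guess is still changing, this sketch assumes "keep exploring":

```python
def stopping_test(
    can_describe_great: bool,
    still_learning_new_things: bool,
    best_meets_threshold: bool,
    best_guess_stable: bool,
    would_regret_one_option: bool,
) -> str:
    """Walk the five questions in order and return the recommended action."""
    if not can_describe_great:
        return "explore more broadly"
    if still_learning_new_things:
        return "keep exploring: information is still paying off"
    if not best_meets_threshold:
        return "continue targeted search"
    if not best_guess_stable:
        return "keep exploring until your best guess settles"  # assumption
    if would_regret_one_option:
        return "explore that one option, then commit"
    return "commit and set a 1-2 year review point"


print(stopping_test(True, False, True, True, False))
```

Ordering matters: a "no" to the first question overrides everything else, because without a picture of "great" the other answers are not yet meaningful.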

Protocol 4: Plan A/B/Z Career Framework

From 80,000 Hours (tested on 1,000+ individuals).

Plan A: best-guess career path you are actively testing, with a 2-3 year commitment.
Plan B: nearby alternative with specific trigger conditions. Example: "If no promotion within 2 years, transition to consulting."
Plan Z: fallback if everything collapses. Not an aspiration -- a safety net enabling risk-taking.
Stopping signal: Once your best guess stops changing with new information, commit for 2-3 years.
Epsilon-greedy maintenance: Reserve ~10% of time for exploration after committing -- conferences, side projects, cross-industry networking. Prevents the exploitation trap.

Protocol 5: Domain-Specific Time Horizon Calibration

Based on Carstensen's SST and the mathematical relationship between horizon length and optimal exploration.

For each domain (career, relationships, geography, hobbies, health), estimate the remaining meaningful horizon independently. A 50-year-old considering a career change may have 15-20 working years left, which argues for more exploration there; if the same person is happily partnered, the relationship horizon calls for exploitation.
Longer horizons: bias toward exploration. Accept short-term costs for information value.
Shorter horizons: bias toward exploitation. Deepen commitments, harvest knowledge.
Reassess annually -- health, career disruptions, or family changes alter horizons.
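The link between horizon length and exploration can be made concrete with a deliberately simple calculation. A minimal sketch under hypothetical assumptions: you know one option pays 0.7 per period, an untried option's rate is equally likely to be anything from 0 to 1, a single try reveals that rate exactly, and afterwards you keep whichever option is better.

```python
def exploration_advantage(horizon: int, known_rate: float = 0.7) -> float:
    """Expected gain over `horizon` periods from trying one unknown option once
    instead of sticking with a known option (hypothetical simplifications above)."""
    try_once = 0.5  # expected payout of the single exploratory period
    # For p ~ Uniform(0, 1), E[max(p, k)] = (1 + k**2) / 2.
    afterwards = (horizon - 1) * (1 + known_rate ** 2) / 2
    stick = horizon * known_rate
    return try_once + afterwards - stick


for horizon in (2, 5, 6, 20, 50):
    print(horizon, round(exploration_advantage(horizon), 3))
```

Under these assumptions, trying the unknown option loses in expectation when five or fewer periods remain and wins from six onward, and the advantage keeps growing with the horizon, which is the logic behind the longer-horizon and shorter-horizon rules above.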

Caveats and Context

Who should be cautious: People in genuine crisis may need to take the first adequate option. The research base is overwhelmingly Western -- cultural norms around mobility and risk vary enormously. Personality and neurodiversity likely interact with these strategies in unstudied ways.

What algorithms cannot capture: Decisions come in three types -- hats (reversible), haircuts (lingering), and tattoos (permanent). "Wisdom is knowing what kind of decision you are making." These frameworks are most useful for haircut and tattoo decisions.

What this means for listeners: The drift toward exploitation is automatic and invisible. You need structural protection for exploration: dedicated time, separate budgets, explicit permission to fail. Google's lesson is that saying "you can explore" is not enough -- only 10% will. Build exploration into the structure, not just the culture. And even after committing, maintain 10% of your effort in exploration mode -- the epsilon-greedy approach prevents the exploitation trap that consumed Kodak, Nokia, and countless careers.

Explore deliberately, then commit decisively. The specific 37% number is almost always wrong for real-world decisions, but the principle it encodes is gold. Before any major sequential decision, invest 30-40% of your available time in pure exploration -- learning what "great" looks like and building an internal benchmark. Then commit to the first option that meets your standard.

Want the best, but do not shop for the best. High standards on their own are not associated with greater unhappiness (Diab et al., 2008). What causes misery is the strategy of exhaustive comparison: endlessly browsing, re-evaluating, second-guessing. Set your threshold before searching, commit when it is met, make the decision feel irreversible, and do not look back.

Calibrate exploration to your time horizon, not your age, and do it separately for each life domain. A long remaining horizon in any domain justifies more exploration; a short one justifies more exploitation. Reassess annually.