Algorithms for Life: Game Theory

The same people cooperate 70–80% of the time or defect 80–90% of the time depending purely on the rules of the game — not their character. This episode reveals how game theory secretly governs your rent, your child's school placement, your salary negotiation, and why the algorithm setting your rent may be colluding without any human ever deciding to.

46 min listen time

25 May 2026 published

6 episode


00:00 Traffic jams and the price of anarchy
03:12 What Nash equilibrium actually tells us
06:05 How humans splash around the equilibrium
09:30 The ultimatum game: spite over math
13:45 Noisy best responses and bounded rationality
16:50 The beauty contest trap
22:10 The prisoner's dilemma and cooperation research
28:35 Axelrod's tournament and Tit for Tat
35:20 Extortion strategies and why niceness still wins
39:45 Designing for the 60% conditional cooperators
43:30 Mechanism design: kidney exchanges and school choice
52:15 Adverse selection, moral hazard, and hidden information
57:40 Perverse incentives and mechanism design failures
62:50 Algorithmic collusion and AI at the game theory frontier
70:15 Your game theory toolkit and closing takeaways


Welcome to UDAM Research, Yo-Odame Research, from our Algorithms for Life series by Valor Engels. Glad to be here for this one. Imagine you're stuck in traffic. Your phone shows a faster route, so you take it. So does everyone else. Now both roads are worse. Yeah, that is, uh, it's just a universally frustrating experience, right? It's truly maddening. Right. And it perfectly illustrates this mathematical concept called the price of anarchy. The price of anarchy. I love that term. It's a great term. Basically, the price of anarchy is the, um, the efficiency tax you pay for living in a world where nobody coordinates. Like when everyone's just looking out for themselves. Exactly. And in traffic networks specifically, studies show this lack of coordination costs us roughly 15 to 35 percent in efficiency. Wait, 15 to 35 percent just because we aren't talking to each other? Yeah. Everyone looks at their map app, acts in their own immediate rational self-interest to save, you know, three minutes. And as a result, everyone collectively loses 20 minutes. That is the perfect entry point into our topic today. Because that traffic jam, that feeling of making the right choice only to have it completely backfire because of everyone else, that's the absolute essence of game theory. It really is. To put it simply, game theory is the mathematical study of what happens when your best move depends entirely on what everyone else does. And, you know, most people hear game theory and think it's just about like chess or poker. Cold war nuclear strategy. Exactly. But it's not. It's about literally everything. It's about how you negotiate your salary, how you deal with your coworkers. Even how you use dating apps. Oh, heavily dating apps. Right. And for this deep dive, we are embarking on this massive exploration of human behavior. We've got a three-part mission today. A very ambitious one. Yeah. 
First, we're going to explore why deeply smart people routinely make collectively stupid decisions. Which happens a lot. Constantly. Second, we'll look at how the invisible design of rules saves and ruins lives, from how we allocate kidneys to how we place kids in classrooms. Mechanism design. Yeah. And finally, we'll look at what happens when the players sitting across the table aren't humans at all, but rapidly learning algorithms. It's a real journey from the quirks of human psychology to structural engineering. And then right out to the bleeding edge of artificial intelligence. Because the rules of the game are changing faster than we realize. By the end, you'll be asking, are you playing the game or is the game playing you? To even begin to answer that, though, we have to establish a baseline, right? Like, what do we actually mean when we use the word rational? Right. The rational actor. Yeah. In game theory, the absolute baseline of pure rationality is anchored by a concept called Nash equilibrium. Named after John Nash, the mathematician. Right. The guy from the movie A Beautiful Mind. Yeah. Russell Crowe. So to understand a Nash equilibrium, you have to imagine a completely stable state in an interaction. It's a scenario where no player can improve their outcome by changing their strategy alone. Okay. So I can't do any better unless someone else moves first. Exactly. If everyone else keeps their strategy the same, you have absolutely zero incentive to change yours. You are already doing the mathematical best you can. You're locked in. Makes sense. But, and this is a crucial distinction we have to make right away, Nash equilibrium describes where systems ultimately settle. It doesn't necessarily describe how individuals actually think in the moment. Okay. Let's unpack this. So if Nash equilibrium is like water finding its level, why do humans seem to splash around so much before settling? Splashing around. I like that. 
I mean, if I pour water into a glass, it doesn't just instantly teleport to a perfectly flat surface, right? It sloshes against the sides, it swirls, and eventually gravity pulls it flat. Humans seem to do a ton of sloshing. We don't just instantly flow to the logical point. That is a phenomenal metaphor. We splash around because we aren't perfectly rational supercomputers. We're biological creatures. We have emotions, limited attention spans, biases. We get tired. Exactly. There's this very significant gap between theory on a chalkboard and real human reality in a boardroom, and we can actually measure the splashing. How do we measure it? Well, there was a landmark meta-analysis published by Camerer and Ho in 1999. They looked at 122 different experimental studies of how humans actually play games in labs. That's a huge data set. Massive. And what they found was deeply fascinating. Human subjects do actually converge toward that flat surface, toward the Nash equilibrium, but it takes time. So they don't just see the matrix immediately. Not at all. In relatively simple games, it takes about 10 full rounds of play for the splashing to stop. 10 rounds, just to figure out the mathematically obvious move. Yep, they have to poke and prod at the system, make mistakes, and adjust. That feels so inherently human. We have to, like, touch the hot stove a few times before we trust the physics of the heat. That's exactly it. And even then, after those 10 rounds of learning, there is still a 15 to 25% deviation from the predicted perfect equilibrium. So even when we figure it out, we're still 20% off. Right. People get close, but they almost never play perfectly. And the problem gets significantly worse when people only play a game once. Like a one-shot game. Yeah. Think about how much of your life is a one-shot game. You only buy that specific house once. You only negotiate that specific job offer once. Right. There's no time for trial and error. Exactly. 
When there's no time to learn, human deviation from rationality is massive. And the best way to see this is by looking at the ultimatum game. I love the ultimatum game. It makes economists so mad. It really does. It was first conducted by Güth, Schmittberger and Schwarze in 1982. The setup is elegantly simple. Okay, walk us through it. You have two people in a room, a proposer and a responder. The proposer is given a pot of free money, say $10, and they get to decide how to split it. Between the two of them. Right. And the responder only has two choices. They can accept the split and they both keep the money, or they can reject the split, in which case the money is burned and nobody gets a cent. Okay, let me put myself in the shoes of a purely rational actor here. If I am a perfectly self-interested economic robot and you offer me one single penny out of that $10, I should accept it. Mathematically, yes. Because one penny is mathematically better than zero pennies. Exactly. And if you are also a perfectly rational robot, you know that I will accept the penny. So as the proposer, your optimal move, the absolute best thing you can do, is to offer me the minimum possible, keep $9.99, and walk away. That's the Nash equilibrium, right? That is exactly what pure theory dictates. Proposers offer near zero. Responders accept literally anything. So the water should settle immediately at the $9.99 to one penny mark. It should. But what happens when you bring real human beings into the lab? Proposers generally offer roughly 40% of the pot. So they offer a pretty fair split right off the bat. They do. And even more interestingly, when proposers do try to act like rational robots and offer an unfair split, like keeping $9 and offering $1, responders reject those unfair offers 40 to 50% of the time. Wait, 40 to 50%. I want the listener to really internalize what that means. People will literally choose to set their own free money on fire rather than accept an insult? Yes. 
Out of sheer spite, they will take zero just to ensure you also get zero. That is wild. We're talking about a massive 35 to 45 percentage point deviation from pure textbook rationality. Which forces game theorists to stop and recalibrate. I mean, if we aren't calculating Nash equilibria, if we're willingly burning cash to punish a stranger, what exactly is the architecture of our decision making? If I had to guess, it's that we don't just care about the money. We care about the social reality of the money. Yes, exactly. Human thinking involves systematic probabilistic errors, and we have deep evolutionary wiring for fairness. We don't exist in vacuums. So how do economists model that? To explain this statistically, researchers developed a much better model called Quantal Response Equilibrium, or QRE. QRE. Got it. Instead of assuming players make perfect choices, QRE assumes players play what are called noisy best responses. Noisy best responses. I like that phrase. It sounds like my entire career. It describes all of us. Under QRE, individuals generally try to do what's best for themselves, but their execution is flawed. They make calculation mistakes. They let their ego into the room. They get tired. And their responses end up smeared out like a probability distribution, right? Yeah. Rather than a single sharp point on a graph. Yes. It's like throwing darts. Oh, that makes sense. The Nash equilibrium is the absolute dead center of the bullseye. The perfect rational choice. Exactly. But human beings have shaky hands. We're aiming for the center, but the dart hits an inch to the left or an inch to the right. We hit the board, but we're scattered around the center in a cloud of probability. That is a brilliant way to visualize it. And beyond that physical or emotional noise, there is a hard limit on our cognitive capacity. Our brain power. Right. Researchers use cognitive hierarchy models to map this out. 
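The "noisy best response" idea can be sketched with a logit choice rule, one common way such noise is modeled. This is only an illustration under stated assumptions: the function name `logit_response`, the precision parameter `lam`, and the $1-versus-$0 payoffs are hypothetical, and a full QRE would additionally require every player's probabilities to be mutually consistent, which this snippet does not compute.

```python
import math


def logit_response(payoffs, lam):
    """Return choice probabilities proportional to exp(lam * payoff).

    lam is an assumed precision parameter: 0 means pure noise (uniform
    choice), and large values approach a sharp best response.
    """
    # Subtract the largest payoff before exponentiating for numerical stability.
    top = max(payoffs)
    weights = [math.exp(lam * (p - top)) for p in payoffs]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical responder facing a $1 offer: accepting pays 1, rejecting pays 0.
payoffs = [1.0, 0.0]
for lam in (0.0, 1.0, 10.0):
    p_accept = logit_response(payoffs, lam)[0]
    print(f"lam={lam}: P(accept)={p_accept:.3f}")
```

With `lam` at zero the responder is a coin flip; as `lam` grows, the probability mass concentrates on the higher-payoff option, which is the "dart landing closer to the bullseye" picture from the conversation.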
Think of a game where you actively have to outsmart someone else. Most normal human beings do not sit down and think 10 steps ahead like a chess grandmaster. No, absolutely not. My brain hurts just trying to think three steps ahead. And studies consistently show that average people typically reason only one or two levels deep. There's this incredibly revealing experiment called the beauty contest game. Okay. I have to jump in here because the beauty contest game is my absolute favorite thought experiment. If you're listening to this, try to play along in your head. It's a fun one. The rules are simple. Imagine a massive room full of people. Everyone has to write down a number between zero and 100. Okay. The winner is the person whose guess is closest to two-thirds of the average of everyone else's guesses. So think about what you would guess. It is a brilliant trap because it forces you to think about how other people think. Exactly. Yeah. So let's walk through the levels. If you assume everyone else in the room is just completely random and clueless, the average of all their random guesses from zero to 100 will be 50. Right. So your winning move is two-thirds of 50, which is roughly 33. That is level one thinking. You are one step ahead of the crowd. But what if the crowd is smart? Right. What if everyone else in the room also did that exact same math? If everyone realizes the average should be 50, then everyone guesses 33. And if everyone guesses 33, then the new average is 33. So you need to be one step ahead of them. Two-thirds of 33 is 22. That is level two thinking. And if you keep following that logic down the rabbit hole, level three, level four, level five, the numbers get smaller and smaller. Until they hit zero. Exactly. The only mathematical place where the sequence finally stops and stabilizes, the Nash equilibrium, is zero. Because two-thirds of zero is zero. 
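The level-by-level reasoning just described is a simple repeated multiplication. A minimal sketch, assuming the level-zero crowd averages 50 as in the walkthrough (the function name `level_k_guesses` and its parameters are illustrative, not from any library):

```python
def level_k_guesses(levels, start=50.0, factor=2 / 3):
    """Return the best guess at each reasoning level, from level 1 upward.

    start is the assumed average of a clueless (level-0) crowd; each
    level best-responds to a room full of players one level below it.
    """
    guesses = []
    guess = start
    for _ in range(levels):
        guess = factor * guess
        guesses.append(guess)
    return guesses


for level, guess in enumerate(level_k_guesses(5), start=1):
    print(f"level {level}: guess about {guess:.1f}")
```

Levels one and two reproduce the 33 and 22 from the conversation, and the sequence keeps shrinking toward zero without ever reaching it in finitely many steps, which is why zero is the only point where the reasoning stops.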
If you are a perfectly rational game theorist, the only answer you can possibly write down is zero. But here's the magnificent irony of the human condition. If you confidently walk into that room and guess zero, you will lose the game. You will lose spectacularly. Why? Because you assumed a level of rationality in the rest of the room that simply does not exist. Real world experiments show that most people stop at level one or level two. They get tired or they assume everyone else is dumb. Yeah. The actual winning guess in most of these studies is usually somewhere around 22 or 33. If you play the pure, perfect Nash equilibrium, you fail. Because being perfectly rational in an irrational world is a fundamentally irrational strategy. That is a great way to phrase it. Being mathematically perfect when your opponents are boundedly rational will cost you the game. You have to play the players, not just the board. But wait, I want to push back on this a little bit. If human beings deviate by 15 to 45 percentage points, if we burn our own money out of spite, and if we only think one or two levels deep before getting a headache, isn't Nash equilibrium basically useless for predicting real life? It seems that way, doesn't it? Especially in those one-shot scenarios I mentioned earlier, like buying a house or negotiating a salary. You only get one shot, so the water never has time to settle. Why do economists even care about Nash equilibrium if humans are this messy? Well, if we connect this to the bigger picture, you have to understand the critical difference between prediction and stress testing. Okay, what's the difference? Is Nash equilibrium a perfect crystal ball for predicting exactly what an irrational human being will do on a Tuesday morning? No, absolutely not. But major Fortune 500 companies use it heavily, not for calculating human behavior, but for something called structured anticipation. Structured anticipation. What does that look like in a boardroom? 
It looks like, looking at the worst case scenario, they use Nash equilibrium to stress test their corporate strategies against the absolute bedrock of a market. It tells you where the structural gravity of a situation is pulling everyone. So it's about a long game. Yes. Even if people deviate in the short term, even if there is splashing and noise and emotion, the Nash equilibrium tells you where the unyielding market forces are eventually going to drag the system over a long enough timeline. So the takeaway for the listener is this. If you are in a repeated game, a long-term relationship, a daily interaction with a vendor, expect things to eventually settle near that rational equilibrium. The gravity will win. Right. But in a one-shot game, expect massive, messy, emotional human deviation. Never rely on the person sitting across from you being perfectly rational. So we have these boundedly rational, emotional humans who care about fairness but get confused by complex math. This creates a perfect bridge to the next massive question. What happens when they interact? Exactly. What happens when you put multiple of these messy humans into a room together and force them to interact? We've talked about how people splash around the Nash equilibrium, but how do the rules of the room change the splashing? To explore this, we have to talk about the absolute crown jewel of game theory, the Prisoner's Dilemma. The famous Prisoner's Dilemma. Let's get into it. It's the canonical demonstration of how smart, self-interested individuals can make collectively disastrous decisions. So smart individuals making collectively stupid decisions. Exactly. It's a two-player game where both players would be significantly better off if they just cooperated, but each individual has an undeniable mathematical incentive to defect and betray the other person. Walk us through the classic setup. Paint the picture of the interrogation room for the listener. Sure. 
Imagine you and an accomplice have been arrested for a crime. The police put you in two completely separate, soundproof interrogation rooms. You have absolutely no way to communicate with your partner. Okay. Totally isolated. Right. The detective comes in and gives you a choice. If you both stay silent, if you cooperate with each other, the police only have enough evidence to give you both one year in jail. That is a great outcome for both of you. One year is entirely manageable. I just have to trust that my partner keeps their mouth shut. Right. But here's the trap. The detective says if you betray your partner and testify against them, but your partner stays silent, you will walk out of here completely free today. Oh, wow. And your partner will get 10 years in prison. And the reverse is also true. If you stay silent and your partner betrays you, you get the 10 years and they go free. And what if we both talk? If you both panic and betray each other, you both get five years. It's agonizing because no matter what my partner does, my individual selfish best move is always to betray them. Let's look at the math. Right. If my partner stays silent, betraying them gets me zero years instead of one year. If my partner betrays me, betraying them back gets me five years instead of 10 years. Mathematically, I must betray them. Exactly. The Nash equilibrium of the prisoner's dilemma is mutual defection. Two perfectly rational people will both betray each other, both get five years in prison, and completely miss out on the one year sentence they could have enjoyed if they had just trusted each other. It is the tragedy of rationality. It really is. But wait, earlier with the ultimatum game, we established that humans are not perfectly rational. We established that we care deeply about fairness and spite. So when researchers actually run the prisoner's dilemma with real people, do we actually betray each other as much as the math says we should? 
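The sentence lengths from the interrogation-room story can be written as a small table and checked mechanically. This is a sketch using the numbers quoted above (1, 0, 10, and 5 years); the names `YEARS`, `best_response`, and `is_nash` are illustrative.

```python
from itertools import product

MOVES = ("silent", "betray")

# Years in prison as (mine, partner's) for each (my_move, partner_move).
# Lower is better.
YEARS = {
    ("silent", "silent"): (1, 1),
    ("silent", "betray"): (10, 0),
    ("betray", "silent"): (0, 10),
    ("betray", "betray"): (5, 5),
}


def best_response(partner_move):
    """Return the move that minimizes my own sentence given the partner's move."""
    return min(MOVES, key=lambda mine: YEARS[(mine, partner_move)][0])


def is_nash(a, b):
    """True if neither player can shorten their own sentence by deviating alone."""
    a_ok = YEARS[(a, b)][0] <= min(YEARS[(m, b)][0] for m in MOVES)
    b_ok = YEARS[(a, b)][1] <= min(YEARS[(a, m)][1] for m in MOVES)
    return a_ok and b_ok


equilibria = [pair for pair in product(MOVES, MOVES) if is_nash(*pair)]
print("best response to silence:", best_response("silent"))
print("best response to betrayal:", best_response("betray"))
print("Nash equilibria:", equilibria)
```

Betrayal is the best response to either move, so mutual betrayal is the only pair that survives the check, even though mutual silence gives both players a shorter sentence.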
This is where we arrive at what might be the signature finding of this entire deep dive. It comes from a monumental meta-analysis published by David Sally in 1995. OK, what did he find? Sally looked at decades of prisoner's dilemma experiments, thousands and thousands of human subjects, to see what actually gets human beings to cooperate instead of defecting. And the numbers he found are absolutely staggering. Give me the numbers. He discovered that if you simply allow the participants to have face-to-face communication before they are separated into the interrogation rooms, cooperation rises by 40 to 50 percentage points. Let me stop you there. 40 to 50 percentage points. Yeah. Just from talking. Just from talking, looking someone in the eye and saying, hey, we are going to get through this together. That is incredible. Furthermore, if you change the rules so that their decisions are visible to each other rather than anonymous, visibility adds another 25 to 40 percentage points. Because they know they'll be seen. Right. And if you introduce an external rule, the threat of punishment for betraying, cooperation jumps by another 30 to 50 percentage points. Here is where it gets really, really interesting. And I want the listener to grasp this. We are talking about the exact same human beings across these different setups. The exact same people. It is not that some people are inherently born as glowing angelic cooperators and other people are born as evil sociopathic defectors. The exact same person who will ruthlessly betray a stranger in an anonymous one-shot computer game will happily and loyally cooperate if they can just look the person in the eye first. Yes. The biology didn't change. The rules changed. Different rules yield entirely opposite behavior. Structure determines character. Structure determines character. That is perhaps the most profound practical lesson game theory has to offer humanity. 
If you put good people in a badly structured game, they will behave badly. If you put selfish people in a well-structured game, they will cooperate. And this concept was proven beautifully, not just with humans, but with code. Right. In the 1980s by Robert Axelrod. Ah, yes. Axelrod's computer tournament. This is an amazing piece of history. Let's set the scene for Axelrod's tournament because it sounds like something out of a sci-fi novel. It was the dawn of complex computing. Axelrod wanted to know how cooperation could ever evolve in a world of purely selfish individuals. I mean, if biology is just selfish genes fighting for survival, how did altruism ever arise? Good question. So he hosted a computer tournament. He sent letters to experts all over the world, top economists, evolutionary biologists, computer scientists, sociologists, and invited them to submit computer programs to play a repeated prisoner's dilemma against each other. It was literally a digital gladiator arena. These academics submitted incredibly complex, highly sophisticated blocks of code. They wrote programs designed to trick opponents, feign weakness, exploit patterns, and brutally outmaneuver the competition. Some of these programs were hundreds of lines of dense code. And yet, when Axelrod ran the simulation, having these programs play hundreds of rounds against each other, the ultimate winner of the massive tournament was a program that was only four lines of code. Submitted by a psychologist named Anatol Rapoport, it was called Tit for Tat. Tit for Tat. Four lines of code beat the smartest algorithms in the world. It did. Tit for Tat is beautifully, almost aggressively simple. On the very first move of the game, it defaults to trust. It cooperates. It starts nice. Yes. And after that first move, it simply looks at whatever its opponent did on the previous round, and it mirrors it. If you cooperate, it cooperates. If you betray it, it betrays you right back on the very next turn. 
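The rule as described (cooperate first, then mirror the opponent's previous move) fits in a few lines. The sketch below is an illustration of that rule, not a reconstruction of the tournament entry; `always_defect` is a hypothetical opponent added for contrast, and moves are encoded as "C" for cooperate and "D" for defect.

```python
def tit_for_tat(opponent_history):
    """Cooperate on the first move, then copy the opponent's previous move."""
    return "C" if not opponent_history else opponent_history[-1]


def always_defect(opponent_history):
    """Hypothetical opponent that betrays every round."""
    return "D"


def play(strategy_a, strategy_b, rounds):
    """Play a repeated game and return both players' move strings."""
    moves_a, moves_b = [], []
    for _ in range(rounds):
        # Each strategy sees only the other side's past moves.
        a = strategy_a(moves_b)
        b = strategy_b(moves_a)
        moves_a.append(a)
        moves_b.append(b)
    return "".join(moves_a), "".join(moves_b)


print(play(tit_for_tat, tit_for_tat, 5))    # two mirrors cooperate throughout
print(play(tit_for_tat, always_defect, 5))  # exploited once, then retaliates
```

Against a copy of itself the strategy never leaves cooperation; against a constant defector it loses only the first round and then matches every betrayal.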
So why did it win? How did a mirror defeat the most complex strategic algorithms on Earth? Axelrod analyzed the results deeply, and he found that Tit for Tat succeeded because it possessed four key characteristics that the other, more complex programs lacked. OK, what were they? First, it is nice. It never, ever defects first. It never initiates violence. OK. Second, it is retaliatory. It is not a pacifist. If you betray it, it instantly punches back. It doesn't let itself be a sucker. Retaliatory. Makes sense. Third, it is forgiving. And this is crucial. If the opponent realizes their mistake and goes back to cooperating, Tit for Tat immediately drops the grudge and goes back to cooperating too. So nice. Retaliatory, forgiving. And the fourth. And fourth, it is clear. Its logic is so simple that its opponents instantly understand how to play with it. There is no confusing its intentions. Be nice, set firm boundaries, forgive easily, and communicate clearly. It sounds less like game theory and more like a blueprint for a healthy marriage. It sounds like the perfect algorithm for human relationships. It does. But nothing is perfect. There is a twist in the Tit for Tat story. It has a fatal flaw. The fatal flaw of noise. Yes. In Axelrod's original, pristine computer tournament, every single move was executed exactly as intended. There were no typos. But in the real world, the world you and I live in, there is noise. What do you mean by noise in this context, like literal sound? No, think of noise as friction or simple human error. You might deeply intend to cooperate with your coworker, but you're exhausted. And your email sounds incredibly harsh. You might mean to be helpful to your partner, but you forget a key detail. The signal drops. Ah, I see. So what happens to a strict Tit for Tat strategy if there is just a 5% error rate in communication? 
If two Tit for Tat players are happily cooperating and one accidentally defects due to a misunderstanding, what does the other one do? Well, Tit for Tat is perfectly retaliatory. So if I accidentally send a harsh email, you, playing Tit for Tat, must retaliate with a harsh email of your own. Right. But then my algorithm looks at your harsh email and I retaliate against your retaliation. We lock into this endless death spiral of mutual destruction, punishing each other forever over a dropped cell phone signal. Precisely. In a noisy environment, strict Tit for Tat destroys itself. It lacks the capacity to absorb an accident. So researchers had to develop upgrades to make the strategy robust against the friction of real life. How do you upgrade it? One of the best upgrades is called generous Tit for Tat. It does the exact same thing as the original, but it includes a small built-in probability of forgiveness, usually around a 5% to 10% forgiveness rate. So even if you defect, there's a 5% to 10% chance I'll just look the other way and cooperate anyway just to try and hit the reset button on the relationship? Yes. It acts as a shock absorber for reality. Another highly effective upgrade is a strategy called win-stay, lose-shift. Win-stay, lose-shift. How does that work? It is incredibly simple cognitively, which is how humans actually operate. It just says, if my last move resulted in a good outcome for me, I'll do it again. If my last move resulted in a bad outcome, I'll change my behavior. Both generous Tit for Tat and win-stay, lose-shift survive real-world noise vastly better than strict, unyielding retaliation. It's comforting to think that forgiveness is mathematically optimal, but the drama in the academic community didn't stop there. Because a few decades later, the idea that niceness always wins was seriously threatened. Oh, yes. The 2012 extortion drama. Right. 
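A deterministic sketch can show what one accidental defection does to each strategy. The assumptions here are loud ones: a single slip is injected by hand at a chosen round (`slip_round`) instead of drawing random noise, "win" in win-stay, lose-shift is taken to mean that the opponent cooperated last round, and generous Tit for Tat is left out because its forgiveness is random. All function names are illustrative.

```python
def tit_for_tat(my_moves, their_moves):
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not their_moves else their_moves[-1]


def win_stay_lose_shift(my_moves, their_moves):
    """Repeat my last move after a good outcome, switch after a bad one."""
    if not my_moves:
        return "C"
    if their_moves[-1] == "C":  # opponent cooperated: treat as a win, stay
        return my_moves[-1]
    return "D" if my_moves[-1] == "C" else "C"  # bad outcome: shift


def play_with_slip(strategy, rounds, slip_round):
    """Both players use `strategy`; player A's move is flipped once by accident."""
    a_moves, b_moves = [], []
    for r in range(rounds):
        a = strategy(a_moves, b_moves)
        b = strategy(b_moves, a_moves)
        if r == slip_round:
            a = "D" if a == "C" else "C"  # the dropped signal
        a_moves.append(a)
        b_moves.append(b)
    return "".join(a_moves), "".join(b_moves)


print("tit for tat:        ", play_with_slip(tit_for_tat, 8, 2))
print("win-stay lose-shift:", play_with_slip(win_stay_lose_shift, 8, 2))
```

In this sketch the two strict Tit for Tat players never return to mutual cooperation after the slip; they keep trading retaliations in alternating rounds. The win-stay, lose-shift pair goes through one round of mutual defection and then settles back into cooperation.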
In 2012, two researchers, Press and Dyson, published a discovery of zero-determinant strategies, and it sent absolute shockwaves through the field. It was a very dark day for game theorists who believed in the inherent power of cooperation. Press and Dyson mathematically proved that a player could use a specific, incredibly complex probabilistic strategy to unilaterally control the ratio of payoffs in a repeated game. In English, what does that mean? It means you could force your opponent into an extortionate relationship. The zero-determinant strategy was designed so that no matter what your opponent did, you would always ensure you got a larger share of the pie. Wow. And here is the truly insidious part. Because of the way the math worked, your opponent's only rational choice to maximize their own meager points was to accept being exploited. They mathematically had to bow to the extortion. So they proved that nice strategies could actually be systematically dominated by extortion. The bad guys could mathematically win. Yes. I can imagine evolutionary biologists sweating through their tweed jackets reading that paper. If extortion is the mathematical victor, how did human society ever build roads and hospitals? Exactly. But the despair was thankfully short-lived. Because just one year later in 2013, a researcher named Hilbe and his colleagues published a brilliant reversal. The Hilbe reversal. What did they find? They pointed out a crucial flaw in the zero-determinant model. They showed that while zero-determinant extortion strategies work flawlessly in a vacuum, a one-on-one interaction against a trapped opponent, they are completely evolutionarily unstable in a broad population. Why? What happens when an extortionist meets a crowd? When an extortion strategy is introduced into a large population of interacting players, the population eventually recognizes the extortion. The players realize they are being scammed. 
And what do humans do when they realize they're being scammed? They retaliate. Or they simply refuse to play with the extortionist entirely. The extortionist might dominate a few one-on-one encounters, but eventually they are ostracized and their strategy collapses under its own weight. So niceness wins again. It does. Hilbe proved that niceness does win again, but there's an important caveat. It is a hard-fought battle where populations have to actively enforce cooperation and punish extortion. Which brings us from the realm of computer simulations back down to the reality of the office floor. We know that not everyone in your department or your neighborhood is going to read Hilbe's 2013 paper on evolutionary game theory. Probably not, no. So how do the masses, everyday normal people, actually behave when placed in cooperative scenarios? To understand the masses, we look at the vital work of Urs Fischbacher and his colleagues from 2001. They wanted to categorize how normal people actually approach cooperation, and their findings should fundamentally change how anyone listening to this thinks about leadership. Okay, what did they categorize? They found that a massive majority of humanity, roughly 60% of people, fall into a category they called conditional cooperators. A conditional cooperator? Yeah. Defined as a person who matches the cooperation level of others. What does that look like in practice? It means they are chameleons. They look around the room, they see what everyone else is doing, and they match that baseline. So if the group is working hard? If the group is working hard, arriving on time, and sharing credit, the conditional cooperator says, okay, that's the culture here. I will work hard and share credit. But if the group is slacking off? If the group is slacking off, stealing lunches from the fridge and pointing fingers, the conditional cooperator says, well, I'm not going to be the only sucker working hard, and they slack off too. 
They just match the room. Exactly. Now, there is another distinct group in Fischbacher's research. About 30% of the population are pure free riders. The 30%? Right. They are the defectors. They will take advantage of the system, dodge work, and optimize for their own selfishness, regardless of what the rest of the room is doing. If you are listening to this right now, and you manage a team or run a community organization, or even just organize a family reunion, that 60-30 statistic should completely rewire how you operate. Think about the rules your HR department or your manager sets up. So often, leadership designs rules explicitly to catch and punish the 30% of free riders. Yeah, draconian tracking software. Strict punch-in times. Hostile oversight. But what does that actually do? It creates an atmosphere of deep distrust. It makes the environment toxic. And the 60% of conditional cooperators look around, sense the toxicity, and adjust their behavior downward. It is a profound leadership failure. Instead of fixating on the 30% of bad actors, you must design your rules to make good behavior highly visible and celebrated for the 60% who just want to match the room. If you manage a team, don't design your rules around punishing the 30% free riders. Design your rules to make good behavior visible for the 60% who just want to match the room. Exactly. If the conditional cooperators can clearly see each other cooperating, they will lock into a high-cooperation state, and the cultural gravity will pull the organization upward. There are also cross-cultural variations to this, as shown by Henrich and colleagues in 2010. But you have to be careful of anti-social punishment, where defectors actually try to punish cooperators. But overall, designing for the conditional cooperator is key. We've seen how rules change behavior organically, how structure determines character. But what if we work backward? 
What if we know the exact societal behavior we want, and we custom-build the math of the game to get it? This is a specialized, highly impactful field of economics known as mechanism design. Mechanism design. Let's define that. It is the reverse engineering of game theory to get desired behavior. You start with the goal you want to achieve, and you meticulously design the incentives and the rules to ensure that rational players inevitably arrive at that specific goal. And mechanism design isn't just academic whiteboard theory. It literally saves human lives. One of the most beautiful examples of this is the Roth Kidney Exchange. An incredible real-world application. Let's paint the picture of the before state, because the before state was a tragedy of inefficiency. Imagine someone you love deeply needs a kidney transplant to survive. You are perfectly willing to give them one of your kidneys. You are ready for surgery. Right. But the doctor runs a blood test, and there's a biological roadblock. Yeah. Your blood types or tissue types are incompatible. If you give them your kidney, your loved one's immune system will violently reject it. It's devastating. Under the old pre-mechanism system, you were simply out of luck. The swap rate for incompatible willing donors was practically non-existent. Maybe zero to five percent. You had the will to save a life, but no mechanism to execute it. So economists Alvin Roth, Tayfun Sönmez, and Utku Ünver stepped in. They didn't try to change human biology, obviously. And they didn't try to guilt more people into donating. They engineered a new mathematical mechanism to facilitate what are called chain swaps. How does a chain swap work? It relies on a centralized algorithmic intervention. Suppose donor A wants to give to patient A, but they don't match biologically. Across the country, donor B wants to give to patient B, but they also don't match. OK. Two incompatible pairs. Right. 
The algorithm looks at the entire national pool of these incompatible pairs. It analyzes the complex web of blood types. And it discovers that donor A actually perfectly matches patient B, and donor B perfectly matches patient A. So they cross over. Yes. The algorithm coordinates a simultaneous, mutually beneficial swap. And it doesn't just stop at two pairs, right? These chains can become incredibly complex. You can have donor A giving to patient B, donor B to patient C, all the way down the line. Precisely. Because of this new mechanism, the efficiency of the system skyrocketed. Suddenly, 30 to 40 percent of incompatible pairs were able to successfully swap kidneys. Thousands and thousands of lives were saved. And not from an increase in human generosity. The bravery was already there. It was purely from a structural upgrade. Exactly. The rules of the game were fixed. It's a triumph. But mechanism design has a darker side. When you attempt to engineer complex systems involving millions of human lives, the side effects can be devastating. And this brings us to a gut punch reversal. England's school equity paradox. The school choice problem is notoriously difficult. Let's look at the original flawed system, often called the Boston Mechanism. If you are a parent, you will feel the stress of this immediately. Under the Boston Mechanism, parents had to rank their top school choices. If you ranked a school as number one, you had priority. Right. But if that school was incredibly popular and you didn't get in, you were bumped down to your number two choice. However, by the time they looked at your backup, all those seats were likely already taken by the parents who ranked it as their number one choice. It creates a terrifying strategic gamble. If your true first choice is an elite school, applying to it is a massive risk. If you miss, you lose your safe backup school, too. So the system forced parents to lie. 
Studies show that only about 60 percent of parents were actually reporting their preferences truthfully. Only 60 percent. The other 40 percent were actively gaming the system, hiding their true preferences, and strategically ranking safe schools first. And worse, this heavily penalized unsophisticated families, usually disadvantaged families, who didn't understand the unwritten strategy of the gamble. They'd rank elite schools honestly, get rejected, and end up in the worst schools. It was a disaster. So mechanism designers swooped in with the Deferred Acceptance Algorithm, or DA for short. How does the DA algorithm fix the gamble? In the Deferred Acceptance Algorithm, applications are iteratively processed. A school looks at the applicants and provisionally accepts a student based on priority, like test scores or proximity. But crucially, the acceptance is only provisional. It is deferred. OK, so if a different student with higher priority applies later? The school can bump the first student out. So if my kid gets bumped, what happens? Do they fall into the abyss like the old system? No, and this is the genius of it. The bumped student immediately moves down their list to their second choice, where they are evaluated against the current provisional pool. They might bump someone else out, causing a cascade. It's a matching algorithm. Exactly. The mathematical magic is that DA is strategy-proof, meaning truth-telling is always, mathematically, the absolute best approach. There is zero strategic advantage to lying. And on paper, it worked beautifully. When DA was introduced, truthful reporting by parents jumped from 60% to 80 or 85%. Parents could finally just list the schools they actually wanted without gambling. It sounds like a massive win. But here is the twist. A 2021 study by Terrier, Pathak, and Ren looked at what actually happened to the demographics. 
And they found that the new algorithm actually reduced access to good schools for disadvantaged families. Wait, how does a perfectly fair algorithm hurt the people it was supposed to help? Well, under the old scary Boston mechanism, affluent parents who had a lot to lose often played it safe. They didn't want to risk losing a perfectly good neighborhood school by gambling on an elite school. So they hedged. And that hedging left a vacuum at the elite schools. Exactly. A vacuum that was sometimes filled by less strategic, lower-income parents. But when the DA algorithm removed all the strategic risk, the affluent parents realized they could swing for the fences with zero penalty. So they flooded the elite schools with applications. Right. And because elite schools prioritize acceptances based on metrics that correlate with wealth, like test scores from expensive tutoring or geographic proximity to expensive neighborhoods, the affluent kids won the seats. But wait, I have to argue the other side here. Deferred acceptance is objectively better. It eliminates the guessing game. The equity issue is just a side effect of broader societal inequality. The algorithm itself is perfectly fair. This raises an important question. If the side effect of your perfect mechanism hurts the most vulnerable people in the system, is the system actually fixed? Strategy-proof and equitable are not the same thing. That's a profound point. Mathematical fairness can sometimes amplify pre-existing inequality. Researchers at CEPEO actually proposed a structural fix to this, reserving roughly 15 percent of the seats at top schools specifically for disadvantaged students. And did that work? Simulations showed this simple patch could close the equity gap by 16 to 17 percent without destroying the strategy-proof nature of the algorithm. Mechanism design works great when everyone's cards are on the table. But what happens when someone is hiding their hand? 
That brings us to the infrastructure of hidden information. Economists categorize this into two main concepts. The first is adverse selection. Adverse selection. Let's define that. Adverse selection occurs when there is hidden information before a contract is signed. One party knows something critical that the other party doesn't. Give us an example. The classic example is George Akerlof's 1970 paper on the market for lemons. Imagine a used car market. There are good cars, peaches, and terrible cars with hidden flaws. Lemons. Only the seller knows if it's a peach or a lemon. So the buyer, taking on risk, will only offer an average price. Right. But the seller of a peach refuses to sell their great car for an average price, so they leave the market entirely. Eventually, only lemons are left, and the market collapses. Hidden defects cause market collapse. We see this everywhere. In housing, sellers who have hidden structural defects discount their homes 5 to 15% more aggressively to force a quick sale before an inspector figures it out. It happens at the micro level and the macro level. Like high-frequency trading, HFT firms exploit millisecond information gaps between stock exchanges. They see a price move a fraction of a second before anyone else and exploit that hidden gap. That front-running generates an estimated $20 billion annually. That's adverse selection hidden knowledge before the deal. The second concept is moral hazard. Moral hazard. This occurs after a contract is signed. It's risk-shifting. The classic example is renting a car. Once you buy the expensive daily insurance waiver, the financial risk is completely transferred from you to the insurance company. So you might drive a little faster or park closer to the shopping carts. Exactly. So how do we solve this? How do we build trust when we can't see the other person's hand? What are the solutions? One of the primary mechanisms is called signaling, developed in a 1973 model by Michael Spence. 
Spence argued that for a signal to be believable, it must be inherently costly to fake. Like a college degree. Labor statistics show a 25 to 40 percent wage premium for college grads. Right. And a large part of that premium is just a costly signal to prove competence to employers. It proves you have the diligence to sit through four years of rigorous grading. It's too costly for a highly unreliable person to fake. Reputation is another infrastructure we use. Look at the famous Resnick eBay study. They found that moving a seller's profile from zero reviews to exactly 100 positive reviews boosts the transaction success rate from 85 percent to 97 percent. The digital stars act as a structural proxy for trust. And we need these proxies because of human psychology. We have a deep phenomenon known as betrayal aversion. People hate being lied to far more than they hate losing money to bad luck. Because betrayal attacks the rules of the game itself. Exactly. Studies show that betrayal reduces your future trust by 30 to 50 percentage points more than an equivalent random loss. And sometimes trying to design a system to fix a problem creates the ultimate betrayal, a perverse incentive. Let's talk about the United Nations Clean Development Mechanism story. This is the ultimate cautionary tale. The intent was to pay developing nations to reduce greenhouse gas emissions. But it created the HFC-23 loophole. Hydrofluorocarbon-23, a deeply toxic greenhouse gas. The UN said to chemical plants, if you capture and destroy this HFC-23 gas instead of venting it, we will reward you with carbon credits you can sell. Let's look at the math here. Destroying the HFC-23 only cost the industry roughly $100 million. But the UN formula awarded them carbon credits worth $4.7 billion. So what did the plants do? They intentionally increased their pollution. They churned out extra toxic gas just so they could turn around, incinerate it, and get paid billions of dollars. 
They turned polluters into ransom seekers. We see similar backfires when we try to engineer civic duties. In 2020, England aggressively changed its organ donation rules to a soft opt-out policy. The logic was, if you presume everyone implicitly consents unless they opt out, donation rates should skyrocket. Right. The models projected a 78 percent consent rate. But the actual consent rate fell to 61 percent. Why? Because of family uncertainty. Under the old opt-in system, families knew for sure what their loved one wanted. But under opt-out, families faced terrifying uncertainty in a moment of grief. So they defaulted to saying no. The math ignored the emotion. And the ethical complexities go even deeper with Iran's compensated kidney market. Iran legally pays cash to living kidney donors. And from a pure math perspective, it eliminated their waitlist. It did. But the reality is that deeply impoverished people are selling body parts out of sheer desperation. It brings in Michael Sandel's critique. Turning a civic duty into a market transaction can destroy intrinsic motivation. OK, let's pivot. Earlier, we talked about the price of anarchy and traffic. But there's a modern version of this tax, and it's hitting your wallet directly. What happens when the players are algorithms instead of humans? This is the absolute frontier of antitrust law. We have to look at the Department of Justice's case against RealPage. And as a reminder, we are presenting this strictly as allegations and legal theory. Right. RealPage is a software company for property managers. The DOJ alleges that RealPage uses a hub-and-spoke model to artificially inflate rent prices. In the old days, a cartel was landlords meeting in a smoke-filled room to fix prices. But the DOJ alleges RealPage acts as the digital hub. The individual landlords are the spokes. RealPage allegedly collected non-public data from competitors to generate unified rental pricing. This is algorithmic collusion. 
Algorithms independently converging on anti-competitive prices. No human ever spoke to another landlord. The algorithm was the agreement. And it's not isolated. The Federal Trade Commission brought a case against Amazon regarding Project Nessie. The FTC alleges Amazon used an algorithm to see if competitors would follow price hikes. And if competitors matched the higher price, Nessie locked in the inflation, generating an alleged $1 billion in excess revenue. They also allegedly penalized sellers via the buy box. But here is the antitrust dilemma. The Sherman Antitrust Act requires a meeting of the minds, a conspiracy. The law is designed for smoke-filled rooms, not software code. Is the legal framework just broken? The framework is bending, not breaking. The DOJ's hub-and-spoke theory is testing this. Plus, Senator Klobuchar introduced the Preventing Algorithmic Collusion Act, creating legal presumptions. And the European Union AI Act classifies these algorithms as high risk. But RealPage is easy because there's a company in the middle. What happens when independent pricing algorithms just learn to collude on their own? That brings us to the Calvano et al. study. They looked at Q-learning agents, advanced AI pricing algorithms. They set two independent algorithms to compete with no explicit programming to collude. And what happened? The algorithms independently discovered supracompetitive prices. Without any secret meetings, they achieved what human cartels do. They learned to keep prices artificially high through trial and error. And it's not just pricing bots. The newest players sitting across the table from us are large language models, or LLMs. Yes, when AI learns to play. A 2024 study had LLMs play the Ultimatum game and the Prisoner's Dilemma. Oh, did they? GPT-4 mimics human fairness norms in the Ultimatum game, rejecting unfair offers. But in the Prisoner's Dilemma, it is overly cooperative. 
It cooperates far more frequently than humans do, revealing biases from its training data. But that changes based on the prompt. I was reading a Moonlight review of Willis et al. They had LLMs play 1,000 round iterated tournaments with 10% noise. And the critical takeaway is that the prompt dictates if the agent is cooperative or aggressive. If you prompt it to be ruthless, it's ruthless. Prompt design is no longer just engineering. It is mechanism design. Exactly. And we see this collision in the wild. A 2025 study by Shaw looked at ride-sharing apps like Uber and Lyft. They used game theory to dynamically price rides. And drivers are colluding against the algorithms. Drivers collectively log off at the airport to trigger a massive price surge, then log back in simultaneously to capture higher fares. It's incredible. So what does this all mean for you? Let's build a rapid-fire toolkit based on everything we've covered today. All right. Number one, diagnose the game. Is it one shot or repeated? Match your strategy to the structure. Number two, design for the 60%. Build visible systems for conditional cooperators, not the 30% free riders. Number three, build your reputation infrastructure to solve information asymmetry. And number four, add forgiveness. Generous tit-for-tat beats strict retaliation when real-world noise happens. That's the toolkit. Remember that traffic jam from the opening? The problem was never the drivers. It was the road. Game theory's deepest lesson is that intelligence doesn't guarantee good outcomes. Structure does. The same brilliant people will cooperate or betray, depending entirely on the rules of the game. As we enter a world where you're negotiating not just with boundedly rational humans, but with rapidly learning algorithms, your ability to read the structure of the game is your only true defense. As you wrap up this UDM Research episode, remember, you're already in dozens of games. 
Your compensation structure, your team dynamics, your market position. Audit whether the rules reward the behavior you actually want. And if they don't, change the rules. For the full briefing and more episodes like this one, visit udm.ai. Yo, yo, Dom A dot AI. Keep analyzing the board. Research dot UDA dot me. That is Y-U-D-A dot M-E.

34 sources · 32 min read

Section 01

The Beautiful Trap: Why Smart People Make Collectively Stupid Decisions

Imagine you're stuck in traffic. You can see a faster route on your phone, so you take it. So does everyone else. Within minutes, that route is jammed too — and now both roads are worse than if everyone had stayed put. Congratulations: you've just experienced the price of anarchy, and it's one of the most important ideas you've never heard of.

Game theory is the mathematical study of strategic interaction — what happens when your best move depends on what everyone else does. It sounds abstract. It isn't. Game theory now determines which kidney you receive, which school your child attends, what rent you pay, and whether the pricing algorithm on your favorite app is quietly colluding with its competitors.

The field's founding insight belongs to John Nash, the Nobel laureate whose life was dramatized in A Beautiful Mind. Nash equilibrium describes a stable state where no player can improve their outcome by changing strategy alone. It's elegant. It's also, as decades of experimental evidence have shown, a deeply imperfect description of how humans actually behave.

A landmark meta-analysis by Camerer and Ho examined 122 experimental studies and found that while subjects do converge toward Nash equilibrium over repeated plays, the speed and precision vary dramatically by game structure (Camerer & Ho (1999), 'Experience-Weighted…). In simple two-player games, convergence happens within about ten rounds. In larger, more complex games, players get lost. Typical deviation from Nash equilibrium after ten rounds of play was still 15–25% of the possible range — a significant gap between theory and reality.

The problem deepens in one-shot interactions — the kind most of us face daily. In the famous ultimatum game, where one player proposes how to split a sum of money and the other can accept or reject, Nash equilibrium predicts the proposer should offer nearly nothing and the responder should accept any positive amount. What actually happens? Proposers offer roughly 40% of the pot, and responders reject unfair offers 40–50% of the time, even when rejection means both players get nothing (Güth, Schmittberger & Schwarze (1982), 'An…). That's a 35–45 percentage point deviation from the "rational" prediction.

So if humans aren't calculating Nash equilibria, what are they doing? The most compelling answer comes from a framework called quantal response equilibrium, introduced by McKelvey and Palfrey, which assumes players make probabilistic errors — they play noisy best responses rather than perfect ones (McKelvey & Palfrey (1995), 'Quantal Respon…). As rationality increases, play approaches Nash but never quite reaches it. A more recent 2025 study directly comparing large language models and humans in strategic games confirmed that both groups exhibit bounded rationality that systematically departs from Nash predictions (arXiv:2506.09390 (2025), 'Beyond Nash Equi…). Neither silicon nor carbon computes equilibria cleanly.
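The "noisy best response" idea can be made concrete with the logit choice rule that quantal response models commonly use. The sketch below is only the response rule, not a full equilibrium solver, and the payoff numbers and the precision parameter `lam` are hypothetical values chosen for illustration.

```python
import math

def logit_response(payoffs, lam):
    """Choice probabilities proportional to exp(lam * expected payoff).

    lam = 0 gives uniform random play; a large lam approaches a strict
    best response. Subtracting the max keeps exp() numerically stable.
    """
    top = max(payoffs)
    weights = [math.exp(lam * (u - top)) for u in payoffs]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical expected payoffs for two available actions.
expected_payoffs = [3.0, 2.0]

for lam in (0.0, 1.0, 10.0):
    probs = logit_response(expected_payoffs, lam)
    print(f"lam={lam}: {[round(p, 3) for p in probs]}")
```

As `lam` grows, the better action is chosen more often, but at any finite `lam` the worse action keeps a positive probability, which is the sense in which play approaches Nash without quite reaching it.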

Perhaps the most striking evidence comes from cognitive hierarchy models, which propose that people reason in levels: a Level-0 player acts randomly, a Level-1 player best-responds to Level-0, and so on. Empirical research consistently finds that most humans reason only one to two levels deep (McKelvey & Palfrey (1995), 'Quantal Respon…). In beauty contest games — where players guess two-thirds of the group's average guess — level-k reasoning predicts actual behavior far better than Nash equilibrium, which would predict everyone guessing zero. Even professional traders, tested by Duffy and Nagel, reached lower numbers than amateurs but rarely approached the theoretical limit (Camerer (2003), Behavioral Game Theory, Pr…).
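To see what level-k reasoning implies in the two-thirds beauty contest, here is a small sketch. It assumes guesses lie between 0 and 100 and that a Level-0 player guesses 50 on average; both are illustrative assumptions, not details from the studies cited above.

```python
def level_k_guess(k, level0_mean=50.0, fraction=2 / 3):
    """Guess of a Level-k player who best-responds to Level-(k-1).

    Each extra level of reasoning multiplies the guess by `fraction`.
    """
    return level0_mean * fraction ** k

for k in range(5):
    print(f"Level-{k}: {level_k_guess(k):.2f}")
```

Under these assumptions, one to two levels of reasoning put guesses in the low 30s to low 20s, while the Nash prediction of zero only appears as the number of levels grows without bound.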

Here's the counterintuitive twist: despite all these deviations, Nash equilibrium keeps emerging. Brown and subsequent researchers showed that in many repeated settings, play converges to Nash even when subjects don't report thinking strategically (Camerer (2003), Behavioral Game Theory, Pr…). Equilibrium appears to be an emergent property of learning — not a conscious calculation but something the system settles into, like water finding its level.

This distinction matters enormously. In boardrooms, Nash equilibrium functions less as a calculation tool and more as what one consulting case study describes as "structured anticipation" — competitor reaction functions, pricing scenarios, and no-regret strategic moves robust across multiple possible responses (Tang (n.d.), Flevy consulting case study —…). Fortune 500 companies engage major consulting firms not to solve equations but to model competitive scenarios using frameworks, benchmarks, and historical reaction analysis (Tang (n.d.), Flevy consulting case study —…). Nash equilibrium is a metaphor for stability, not a literal computation.

Typical deviation from Nash equilibrium after ten rounds of play was still 15–25% of the possible range — humans converge, but they never arrive.

What this means for listeners: The implication is that you're probably not calculating optimal strategy — and neither is anyone else. What matters is whether you're in a one-shot interaction or a repeated game, and whether the environment gives you feedback to learn from. In repeated interactions with the same people, behavior naturally converges toward equilibrium through trial and error. In one-shot decisions — a job negotiation, a major purchase — expect significant deviation from any 'rational' prediction, including your own.

Section 02

The Cooperation Paradox: Same People, Different Rules, Opposite Behavior

Here is the single most counterintuitive and empirically robust finding in all of game theory: the same people cooperate at 70–80% or defect at 80–90% depending purely on the rules of the game. Not different people. Not different cultures. The same subjects, placed in different structures, produce radically different outcomes.

The prisoner's dilemma is the canonical demonstration. Two players each choose to cooperate or defect. If both cooperate, both do well. If one defects while the other cooperates, the defector wins big and the cooperator loses. If both defect, both do poorly. The Nash equilibrium is mutual defection — and in single-shot anonymous games, cooperation rates sit at a dismal 10–25% (Rapoport & Chammah (1965), Prisoner's Dile…).
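The claim that mutual defection is the only equilibrium can be checked by brute force. The payoff numbers below (5, 3, 1, 0) are a conventional textbook choice used here as an assumption; any numbers with the same ordering give the same answer.

```python
C, D = "C", "D"

# (row player's payoff, column player's payoff) for each pair of moves.
PAYOFFS = {
    (C, C): (3, 3),
    (C, D): (0, 5),
    (D, C): (5, 0),
    (D, D): (1, 1),
}

def is_nash(a, b):
    """True if neither player gains by changing their own move alone."""
    row_ok = all(PAYOFFS[(a, b)][0] >= PAYOFFS[(alt, b)][0] for alt in (C, D))
    col_ok = all(PAYOFFS[(a, b)][1] >= PAYOFFS[(a, alt)][1] for alt in (C, D))
    return row_ok and col_ok

nash_profiles = [(a, b) for a in (C, D) for b in (C, D) if is_nash(a, b)]
print(nash_profiles)
```

Mutual cooperation fails the test because each player could do better by switching to defection on their own, even though both would prefer (C, C) to (D, D).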

But change the structure and everything changes. Sally's 1995 meta-analysis, one of the most cited in the field, documented the effects with striking precision: face-to-face communication before the game boosts cooperation by 40–50 percentage points, from roughly 30% to 70–80% (Sally (1995), 'Conversation and Cooperatio…). Making actions public rather than anonymous adds 25–40 percentage points. Allowing punishment of defectors adds another 30–50 percentage points (Fehr & Gächter (2002), 'Altruistic Punishm…). Simply making the game indefinitely repeated — so players don't know when it ends — raises cooperation by 35–55 percentage points (Axelrod (1984), The Evolution of Cooperati…).

Robert Axelrod's famous computer tournaments, run around 1980, revealed the strategy that thrives in this environment: tit-for-tat (Axelrod (1984), The Evolution of Cooperati…). Cooperate on the first move, then mirror whatever your opponent did last round. It's devastatingly simple, and it won against dozens of more complex alternatives submitted by game theorists, evolutionary biologists, and computer scientists. The strategy succeeds because it is "nice" (never defects first), "retaliatory" (punishes defection immediately), "forgiving" (returns to cooperation after a single retaliation), and "clear" (opponents quickly learn what to expect).
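A minimal sketch of tit-for-tat in a repeated prisoner's dilemma looks like this. The payoff values and the ten-round length are assumptions for illustration, and this is not a reconstruction of Axelrod's tournament code.

```python
# (player 1 payoff, player 2 payoff); illustrative values.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not opponent_history else opponent_history[-1]

def always_defect(opponent_history):
    """Defect every round regardless of history."""
    return "D"

def play(strategy1, strategy2, rounds):
    """Play a repeated game and return both players' total scores."""
    h1, h2 = [], []
    score1 = score2 = 0
    for _ in range(rounds):
        m1, m2 = strategy1(h2), strategy2(h1)
        p1, p2 = PAYOFF[(m1, m2)]
        score1 += p1
        score2 += p2
        h1.append(m1)
        h2.append(m2)
    return score1, score2

print(play(tit_for_tat, tit_for_tat, 10))
print(play(tit_for_tat, always_defect, 10))
```

Against a defector, tit-for-tat loses only the first round and then protects itself. Paired with itself, it sustains full cooperation, which is where its tournament points come from.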

But tit-for-tat has a fatal weakness: noise. In any real-world interaction, mistakes happen — a misunderstood email, a delayed response, an accidental slight. When tit-for-tat encounters even a 5% error rate, it can lock into devastating cycles of mutual retaliation. Nowak and Sigmund showed that under moderate noise, pure tit-for-tat becomes unstable (Nowak & Sigmund (1992, 1993), 'Tit for Tat…). Their solution was "generous tit-for-tat" — occasionally cooperate even after the opponent defects, at a forgiveness rate of roughly 5–10%. This small modification breaks retaliation spirals and often outperforms strict tit-for-tat in noisy tournaments.
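The retaliation spiral is easy to reproduce. In the sketch below, one accidental defection is injected into a game between two tit-for-tat players, and the `generosity` parameter is the probability of forgiving a defection. The 10% rate, the error round, and the random seed are assumptions chosen for illustration.

```python
import random

def tft(opponent_history, rng, generosity=0.0):
    """Tit-for-tat that forgives a defection with probability `generosity`."""
    if not opponent_history or opponent_history[-1] == "C":
        return "C"
    return "C" if rng.random() < generosity else "D"

def run(rounds, error_round, generosity, seed=0):
    """Two (generous) tit-for-tat players; player 1 slips once."""
    rng = random.Random(seed)
    h1, h2 = [], []
    for t in range(rounds):
        m1 = tft(h2, rng, generosity)
        m2 = tft(h1, rng, generosity)
        if t == error_round:
            m1 = "D"  # a single accidental defection by player 1
        h1.append(m1)
        h2.append(m2)
    return "".join(h1), "".join(h2)

strict = run(rounds=8, error_round=2, generosity=0.0)
generous = run(rounds=500, error_round=2, generosity=0.1)
print(strict)
print(generous[0][-5:], generous[1][-5:])
```

With zero generosity, the single slip echoes back and forth indefinitely. With a small forgiveness rate, the echo is eventually absorbed and both players return to cooperation.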

An even more intriguing alternative emerged: win-stay, lose-shift (Nowak & Sigmund (1992, 1993), 'Tit for Tat…). Repeat your last move if it worked; switch if it didn't. This strategy doesn't require tracking what the opponent did — only whether your own outcome was good. It's cognitively simpler and remarkably effective, performing comparably to generous tit-for-tat across a wide range of conditions.
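Win-stay, lose-shift can be sketched in a few lines. Here a round counts as a "win" when the player's own payoff reaches an aspiration level of 3; that threshold and the payoff values are illustrative assumptions.

```python
# Own payoff given (my move, opponent's move); illustrative values.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def wsls(my_history, opp_history, aspiration=3):
    """Repeat the last move after a good payoff, switch after a bad one."""
    if not my_history:
        return "C"
    last = my_history[-1]
    won = PAYOFF[(last, opp_history[-1])] >= aspiration
    if won:
        return last
    return "D" if last == "C" else "C"

def run(rounds, error_round):
    """Two win-stay, lose-shift players; player 1 slips once."""
    h1, h2 = [], []
    for t in range(rounds):
        m1 = wsls(h1, h2)
        m2 = wsls(h2, h1)
        if t == error_round:
            m1 = "D"  # a single accidental defection by player 1
        h1.append(m1)
        h2.append(m2)
    return "".join(h1), "".join(h2)

print(run(rounds=6, error_round=2))
```

After the slip, the pair passes through one round of mutual defection, both find that outcome unsatisfying, and both shift back to cooperation. The error is corrected within two rounds with no forgiveness parameter needed.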

The story took a dramatic turn in 2012 when Press and Dyson discovered zero-determinant strategies — mathematical proof that a player could unilaterally control the ratio of payoffs in a repeated game without the opponent's knowledge (Press & Dyson (2012), 'Iterated Prisoner's…). This seemed to upend Axelrod's finding that "nice" strategies always win. But Hilbe and colleagues quickly demonstrated that zero-determinant strategies are evolutionarily unstable (Hilbe et al. (2013), 'Evolution of Extorti…). When opponents recognize extortion, they retaliate, and the extortioner's advantage collapses. In tournaments with populations of strategies, generous versions of zero-determinant play outperform extortionate versions — niceness wins again, but through a more nuanced mechanism than Axelrod originally proposed.

The real-world implications crystallize in public goods games, where Fischbacher and colleagues classified players into types: roughly 60% are conditional cooperators who match others' contribution levels, and 30% are free riders (Fischbacher, Gächter & Fehr (2001), 'Are P…). This distribution matters enormously for policy. Targeting conditional cooperators — by making high contributions visible, for instance — works. Trying to force free riders into cooperation through surveillance alone does not. And Nikiforakis revealed a darker dynamic: anti-social punishment, where defectors punish cooperators, can collapse cooperation entirely (Nikiforakis (2008), 'Punishment and Counte…). In treatments allowing punishment of punishers, cooperation dropped by 40–50 percentage points.
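A toy model shows why the mix of types matters. In this sketch, conditional cooperators contribute whatever the group averaged in the previous round and free riders contribute nothing. The source does not say what the remaining 10% do, so they are modeled, purely as an assumption, as always contributing half. The starting level of 80% is also hypothetical.

```python
def simulate(rounds, start=0.8, share_cc=0.6, share_fr=0.3, other_level=0.5):
    """Average contribution (0 to 1) per round in a stylized public goods game.

    share_cc: conditional cooperators, who match last round's group average.
    share_fr: free riders, who contribute nothing.
    The remainder contribute a fixed `other_level` (an assumption).
    """
    share_other = 1.0 - share_cc - share_fr
    average = start
    path = [average]
    for _ in range(rounds):
        average = share_cc * average + share_fr * 0.0 + share_other * other_level
        path.append(average)
    return path

path = simulate(rounds=20)
print([round(x, 3) for x in path[:6]])
```

Because conditional cooperators match an average that free riders keep dragging down, contributions decay round after round. This is the logic behind making high contributions visible: it changes what the conditional cooperators are matching.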

Henrich and colleagues' cross-cultural experiments across fifteen diverse societies — from whale-hunting communities to slash-and-burn horticulturalists — showed that cooperation and fairness norms vary enormously by culture, with cooperation rates ranging from 10% to 90% in identical game structures (Henrich et al. (2010), 'Markets, Religion…). The same game yields profoundly different outcomes depending on the cultural context, suggesting that mechanisms designed in one society may not transfer to another.

Face-to-face communication before the game boosts cooperation by 40–50 percentage points — the same people, under different rules, become entirely different strategists.

Structural Levers That Change Cooperation Rates

Indefinite repetition (no known endpoint): +35–55pp

Face-to-face communication (pre-play discussion): +40–50pp

Punishment opportunity (peer sanctioning): +30–50pp

Transparent actions (public vs. anonymous): +25–40pp

Small group size (2–5 vs. 10+): +20–30pp

Anti-social punishment (defectors punish cooperators): −40–50pp

Each structural change shifts cooperation dramatically — with the same human subjects. Baseline anonymous one-shot cooperation is roughly 25%. Effect sizes from Sally (1995) meta-analysis and Fehr & Gächter (2002).

What this means for listeners: The practical takeaway is powerful: if you want cooperation, change the environment. Make actions visible, create repeated stakes, enable communication before the game begins. The same people who betray each other in anonymous one-shot encounters will cooperate beautifully when they can see each other, talk first, and expect to interact again. This applies to teams, partnerships, negotiations, and even international agreements.

Section 03

Reverse Engineering the Rules: Mechanism Design From Kidneys to Classrooms

If game theory asks, "Given these rules, what will people do?" mechanism design asks the opposite: "Given what we want people to do, what rules should we create?" It is game theory's most consequential applied product — and its track record is both extraordinary and cautionary.

Consider kidney exchange. Before algorithmic matching, a patient with an incompatible willing donor was simply out of luck. Roth, Sönmez, and Ünver designed matching algorithms that identify chains of swaps: if your donor is compatible with my patient and my donor is compatible with yours, we trade (Roth, Sönmez & Ünver (2004/2005), AER/Econo…). Before the algorithm, roughly 0–5% of incompatible pairs could exchange. After implementation in real US kidney exchange pools, participation jumped to 30–40% (Roth, Sönmez & Ünver (2004/2005), AER/Econo…). Thousands of additional transplants became possible, not because anyone became more generous but because the rules made generosity effective.
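The core of a two-way swap can be sketched as a search over incompatible pairs. This toy version checks ABO blood type only and ignores tissue typing, longer chains, and the optimization used in real exchange programs. The pair names and blood types are hypothetical.

```python
# Which patient blood types each donor blood type can give to (ABO only).
CAN_DONATE_TO = {
    "O": {"O", "A", "B", "AB"},
    "A": {"A", "AB"},
    "B": {"B", "AB"},
    "AB": {"AB"},
}

def compatible(donor_type, patient_type):
    """True if the donor's blood type is acceptable for the patient."""
    return patient_type in CAN_DONATE_TO[donor_type]

# Hypothetical incompatible pairs: name -> (donor type, patient type).
pairs = {
    "pair1": ("A", "B"),
    "pair2": ("B", "A"),
    "pair3": ("AB", "O"),
}

def two_way_swaps(pairs):
    """List every two pairs whose donors could be exchanged."""
    names = sorted(pairs)
    swaps = []
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            donor_x, patient_x = pairs[x]
            donor_y, patient_y = pairs[y]
            if compatible(donor_x, patient_y) and compatible(donor_y, patient_x):
                swaps.append((x, y))
    return swaps

print(two_way_swaps(pairs))
```

This only lists candidate swaps. Choosing which swaps and chains to carry out when candidates overlap is the harder matching problem that the real algorithms address.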

School matching tells a similar story with a darker subplot. Many jurisdictions historically used the Immediate Acceptance algorithm — commonly called the Boston Mechanism — where students who rank a school as their first choice get priority (Pathak & Sönmez (2008), 'Leveling the Play…). The problem is devastating for families without strategic sophistication: if your child is rejected from their first choice, their second-choice school has already filled its seats with students who ranked it first. Parents must gamble, often avoiding ranking their dream school to secure a "safe" backup. Abdulkadiroğlu and colleagues demonstrated that strategic misreporting was rampant, with only about 60% of families reporting true preferences. After redesigning Boston's system to incentivize truthfulness, compliance improved to 80–85% (Pathak & Sönmez (2008), 'Leveling the Play…).
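A small sketch makes the gamble visible. The students, schools, priorities, and one-seat capacities below are hypothetical, and the function is a simplified reading of immediate acceptance rather than any district's actual procedure.

```python
def immediate_acceptance(prefs, priorities, capacity):
    """Round k considers only each unassigned student's k-th choice.

    Seats filled in a round are final, so a later applicant with higher
    priority cannot displace anyone already admitted.
    """
    assigned = {}
    seats = dict(capacity)
    for rank in range(max(len(p) for p in prefs.values())):
        applicants = {}
        for student, ranking in prefs.items():
            if student not in assigned and rank < len(ranking):
                applicants.setdefault(ranking[rank], []).append(student)
        for school, students in applicants.items():
            students.sort(key=priorities[school].index)
            admitted = students[:seats[school]]
            for student in admitted:
                assigned[student] = school
            seats[school] -= len(admitted)
    return assigned

priorities = {"A": ["s1", "s2", "s3"], "B": ["s2", "s3", "s1"], "C": ["s1", "s2", "s3"]}
capacity = {"A": 1, "B": 1, "C": 1}

truthful = {"s1": ["A", "B", "C"], "s2": ["A", "B", "C"], "s3": ["B", "A", "C"]}
strategic = dict(truthful, s2=["B", "A", "C"])  # s2 hides the true first choice

print(immediate_acceptance(truthful, priorities, capacity))
print(immediate_acceptance(strategic, priorities, capacity))
```

In this example, student s2 truly prefers A, then B. Reporting honestly lands s2 at the last choice, C, even though s2 has top priority at B. Ranking B first secures it, which is the pressure to misreport described above.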

The Deferred Acceptance algorithm, pioneered by Gale and Shapley, solves this elegantly. Applications are processed iteratively: schools accept students only provisionally, and if a higher-priority student applies later, the school can bump a previously accepted student, who then applies to the next school on their list. Crucially, DA is "strategy-proof" for students: no student can do better than by ranking schools in their true order of preference (Pathak & Sönmez (2008), 'Leveling the Play…).
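The contrast between the two mechanisms can be sketched in a few lines of Python. This is an illustrative toy, not any district's actual implementation: the three students, three one-seat schools, priorities, and function names are all hypothetical. Student s3 truly prefers A, then B, and holds top priority at B.

```python
from collections import deque


def immediate_acceptance(prefs, priorities, seats):
    """Boston mechanism: round k looks only at k-th choices; acceptances are final."""
    assigned, remaining = {}, dict(seats)
    for k in range(max(len(p) for p in prefs.values())):
        applicants = {}
        for student, ranking in prefs.items():
            if student not in assigned and k < len(ranking):
                applicants.setdefault(ranking[k], []).append(student)
        for school, group in applicants.items():
            group.sort(key=priorities[school].index)
            for student in group[:remaining[school]]:
                assigned[student] = school
            remaining[school] -= min(remaining[school], len(group))
    return assigned


def deferred_acceptance(prefs, priorities, seats):
    """Student-proposing Gale-Shapley: acceptances stay provisional until the end."""
    next_choice = {s: 0 for s in prefs}
    held = {school: [] for school in seats}
    free = deque(prefs)
    while free:
        student = free.popleft()
        if next_choice[student] >= len(prefs[student]):
            continue  # list exhausted: student stays unassigned
        school = prefs[student][next_choice[student]]
        next_choice[student] += 1
        held[school].append(student)
        held[school].sort(key=priorities[school].index)
        if len(held[school]) > seats[school]:
            free.append(held[school].pop())  # bump the lowest-priority student
    return {s: school for school, group in held.items() for s in group}


# Hypothetical instance.
priorities = {"A": ["s1", "s2", "s3"], "B": ["s3", "s2", "s1"], "C": ["s1", "s2", "s3"]}
seats = {"A": 1, "B": 1, "C": 1}
truthful = {"s1": ["A", "B", "C"], "s2": ["B", "A", "C"], "s3": ["A", "B", "C"]}
strategic = dict(truthful, s3=["B", "A", "C"])  # s3 hides the true first choice

for name, mechanism in [("immediate", immediate_acceptance), ("deferred", deferred_acceptance)]:
    print(name, "truthful:", mechanism(truthful, priorities, seats)["s3"],
          "| strategic:", mechanism(strategic, priorities, seats)["s3"])
```

Under immediate acceptance, honest s3 ends up at third-choice C and does better by misreporting. Under deferred acceptance, s3 gets B either way, so in this instance there is nothing to gain from gaming the list.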

England banned the manipulable system across all local authorities in 2008, mandating the strategy-proof alternative. It should have been a triumph of mechanism design. Instead, a 2021 longitudinal study by Terrier, Pathak, and Ren revealed a profound unintended consequence (Terrier, Pathak & Ren (2021), longitudinal…). Under the old manipulable system, affluent parents often played it safe, avoiding competitive selective schools to guarantee placement elsewhere. Less-strategic lower-income parents, willing to take the risk, faced less competition for top spots. When the new system removed the risk from truthful reporting, affluent parents flooded elite selective schools with applications. Because those schools prioritize test scores, which correlate heavily with socioeconomic advantage, the influx of high-income applicants crowded out disadvantaged students (Terrier, Pathak & Ren (2021), longitudinal…).

The empirical result was that the transition to the "fair" algorithm actually reduced access to high-quality schools for disadvantaged families. Low-income students were pushed into schools with lower achievement and lower value-added scores (Terrier, Pathak & Ren (2021), longitudinal…). Researchers at CEPEO and the Nuffield Foundation have since modeled a fix: reserving roughly 15% of seats at effective schools for students eligible for free school meals, which simulations suggest would reduce the effectiveness gap by 16–17% while causing minimal disruption to the overall system (CEPEO / Nuffield Foundation, FSM quota sim…).

Spectrum auctions provide another cautionary tale. Nobel laureates Paul Milgrom and Robert Wilson designed auction mechanisms to allocate radio spectrum efficiently, and those auctions have generated over $10 billion in revenue (Roth, Sönmez & Ünver (2004/2005), AER/Econo…). But when political objectives override allocative efficiency, mechanisms fail spectacularly. Italy's 2018 5G auction divided the critical 3.7 GHz band into two large blocks and two small ones (AGCOM Italy 5G auction analysis (2018), c…). Since efficient 5G requires at least 40–80 MHz of contiguous spectrum and four operators were competing, the design guaranteed that only two could build competitive networks. The resulting bidding war pushed Italian spectrum prices to roughly five times the UK equivalent; in the UK, symmetrical lot design produced a smoother allocation (AGCOM Italy 5G auction analysis (2018), c…).

Germany's 2019 auction went further awry. The regulator reserved 100 MHz for industrial users, leaving only 300 MHz for four national operators, then set aggressive reserve prices and coverage mandates (BNetzA Germany 5G auction records (2019) +…). The auction generated over €6.5 billion but strained operator budgets so severely that actual network deployment suffered. By late 2025, German courts found the auction conditions legally questionable, forcing a restart of the entire spectrum award process (BNetzA Germany 5G auction records (2019) +…).

Philosopher Michael Sandel's critique cuts deeper still. He argues that expanding market logic into domains traditionally governed by civic duty can transform how people think about those domains entirely (Sandel, 'How Markets Crowd Out Morals,' Bo…). Paying citizens to accept nuclear waste facilities actually reduced acceptance rates — reframing what had been a civic contribution as a financial transaction. Research confirms that extrinsic rewards can undermine intrinsic motivation when payment signals distrust or reframes meaningful activity as mere labor (Sandel, 'How Markets Crowd Out Morals,' Bo…).

England's switch to the 'fair' algorithm actually reduced access to high-quality schools for disadvantaged families — mechanism design cannot correct structural inequality it refuses to see.

When Mechanism Design Helps vs. Harms

Columns: Structural inequality unaddressed | Structural inequality addressed

Good mechanism design

Equity Paradox · Audit for displacement effects · England school choice: strategy-proof algorithm crowded out disadvantaged students (Terrier et al. 2021)

Success Zone · Scale and iterate · US kidney exchange: 30–40% participation from 0–5%; Boston school redesign improved diversity 15–20pp

Poor mechanism design

Cascading Failure · Redesign from scratch · Italy 5G auction: asymmetric lots created artificial scarcity, prices 5× UK equivalent

Blunt Instrument · Add structural reforms · England opt-out organ donation: consent rate fell to 61% vs. 78% projected due to family overrides

The outcome of a designed mechanism depends on both the quality of the design AND whether the underlying domain's structural inequalities are addressed. Adapted from school choice and spectrum auction evidence.

What this means for listeners: The lesson isn't that market logic fixes everything — it's that rule design is consequential. Bad rules can harm the people they were designed to help. Whether you're designing a team incentive structure, a hiring process, or a community governance system, the mechanism matters more than the intentions behind it. And some domains — organ donation, civic participation, family care — may be fundamentally corrupted when subjected to market logic.

Section 04

Seeing Through the Fog: Information Asymmetry, Lemons, and the Infrastructure of Trust

In 1970, George Akerlof published a short, 13-page paper that would win him a Nobel Prize and change how we understand markets. His "market for lemons" model posed a simple question: what happens when sellers know the quality of their product but buyers don't (Akerlof (1970), 'The Market for Lemons,' Q…)?

The answer is market collapse. If buyers can't distinguish good used cars from lemons, they'll only pay the average price. But sellers of good cars, who know their product is worth more, withdraw from the market. This leaves only lemons, which drives the price down further, which drives more quality sellers away. In the theoretical limit, the market for good cars simply ceases to exist.
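The unraveling can be traced numerically. The sketch below is a stylized toy, not Akerlof's original parameterization: it assumes quality is uniform on [0, 1], each seller values a car at its quality, and buyers value it at 1.5 times its quality. All names and numbers are hypothetical.

```python
def lemons_unraveling(buyer_premium=1.5, start_price=1.0, rounds=50):
    """Track the price buyers are willing to pay as good cars exit the market.

    At price p only sellers with quality <= p offer their cars, so the average
    quality on offer is p / 2 and buyers will pay at most buyer_premium * p / 2.
    """
    price, path = start_price, [start_price]
    for _ in range(rounds):
        avg_quality = min(price, 1.0) / 2    # mean of uniform[0, min(p, 1)]
        price = buyer_premium * avg_quality  # buyers' willingness to pay
        path.append(price)
    return path


path = lemons_unraveling()
print([round(p, 3) for p in path[:5]], "...", f"{path[-1]:.1e}")
```

With a premium of 1.5, each round multiplies the price by 0.75, so it heads toward zero. In this toy the spiral stops only when buyers value cars at least twice as much as sellers do.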

This isn't just theory. Genesove and Mayer analyzed real home sales data and found that information asymmetry between sellers who know about structural defects and buyers who don't predicts steeper price discounts for forced sales: sellers with more hidden information cut prices 5–15% more aggressively (Camerer (2003), Behavioral Game Theory, Pr…). In financial markets, Brogaard and colleagues documented that millisecond informational advantages in high-frequency trading generate significant excess returns, with HFT firms earning an estimated $20 billion annually, partly by exploiting information asymmetry (Brogaard et al. (2018), 'High-Frequency Tr…).

The twin pathologies of information asymmetry are adverse selection and moral hazard. Adverse selection strikes before a contract is signed: high-risk individuals disproportionately seek generous insurance because they know their own risk better than insurers. Moral hazard strikes after: once insured, people take on more risk because consequences are partially transferred (Empirical health insurance research on adv…). Research in health insurance confirms both forces operate simultaneously, but here's a finding that surprises most people — empirical work suggests moral hazard is likely the larger real-world constraint, even though popular understanding fixates on the lemons problem (Empirical health insurance research on adv…). Prior-year medical expenditures tend to overstate adverse selection's magnitude due to mean reversion.

Signaling theory offers a partial remedy. Michael Spence's job-market model demonstrates how education credentials — costly to acquire but credible — allow high-ability workers to distinguish themselves (Akerlof (1970), 'The Market for Lemons,' Q…). The college premium of roughly 25–40% in earnings is partly attributable to signaling, though the exact split between signaling and genuine human capital accumulation remains debated.

But the most powerful modern antidote to information asymmetry is reputation infrastructure. Resnick and colleagues analyzed eBay transactions and found that seller reputation — measured by feedback scores — strongly predicts successful transactions (Resnick et al. (2006), 'The Value of Reput…). Moving from zero reputation to over 100 reviews increased successful-sale probability from roughly 85% to 97%. In game-theoretic terms, reputation systems transform one-shot anonymous interactions — where trust collapses — into effectively repeated games where defection carries a lasting cost.

The behavioral dimension adds a critical layer. Bohnet and Zeckhauser demonstrated that people exhibit "betrayal aversion" — they are more averse to being lied to than to equivalent losses from bad luck (Camerer (2003), Behavioral Game Theory, Pr…). Information that a counterparty is untrustworthy reduces trust by 30–50 percentage points more than an equivalent payoff loss from random chance. This asymmetry means that even small revelations of dishonesty can permanently damage relationships and markets in ways that purely economic models underpredict.

Gneezy's experimental work on deception found that roughly 50–60% of senders lie when it's profitable, but 20–30% refuse to lie even when it's costless (Gneezy (2005), 'Deception: The Role of Con…). The population isn't uniformly selfish — there's genuine heterogeneity in honesty preferences. This matters for institutional design: systems that assume universal dishonesty may crowd out the substantial minority who would behave honestly without monitoring.

Moving from zero reputation to over 100 reviews increased eBay's successful-sale probability from 85% to 97% — reputation transforms one-shot games into repeated ones.

What this means for listeners: Reputation is infrastructure. Whether you're hiring, investing, or buying a used car, the systems that make hidden information visible determine whether the market functions at all. Practically, this means investing in your own reputation capital — reviews, referrals, track records — is not vanity but strategic necessity. And when evaluating others, look for costly signals: credentials, warranties, and public track records that would be expensive for a low-quality actor to fake.

Section 05

The Price You're Already Paying: Anarchy, Collusion, and Algorithms That Set Your Rent

The price of anarchy has a formal definition — the ratio of the worst-case Nash equilibrium welfare to the optimal social welfare — but its informal definition is more visceral: it's the tax you pay for living in a world where nobody coordinates (Koutsoupias & Papadimitriou (1999), STOC —…).

Koutsoupias and Papadimitriou formalized the concept, and Roughgarden and Tardos proved that in networks with linear delay functions, selfish routing produces total travel time at most 4/3 of the optimum (Koutsoupias & Papadimitriou (1999), STOC…). Put differently, in the simplest case anarchy adds at most about 33% to total travel time. In practice, measured welfare losses from selfish routing in traffic networks run 15–35% compared to socially optimal routing; the linear-case bound need not hold on real roads, where delay functions are nonlinear (Koutsoupias & Papadimitriou (1999), STOC…). GPS data from the Boston area estimated the real-world routing welfare loss at 15–20% (Camerer (2003), Behavioral Game Theory, Pr…).
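The 4/3 figure can be reproduced with the standard two-road textbook illustration known as Pigou's example, which is not taken from the episode. The sketch assumes one road whose travel time equals the fraction of drivers using it and a second road that always takes time 1. The function and variable names are hypothetical.

```python
from fractions import Fraction


def total_cost(x):
    """Average travel time when a fraction x of drivers uses the congestible road.

    Road 1 (congestible) takes time x; road 2 (fixed) always takes time 1.
    """
    return x * x + (1 - x) * 1


# Selfish equilibrium: road 1 never takes longer than road 2, so everyone uses it.
equilibrium_cost = total_cost(Fraction(1))

# Social optimum: search a fine grid of splits for the lowest average time.
grid = [Fraction(k, 1000) for k in range(1001)]
optimal_cost = min(total_cost(x) for x in grid)

price_of_anarchy = equilibrium_cost / optimal_cost
print(equilibrium_cost, optimal_cost, price_of_anarchy)
```

The optimum sends half the drivers down each road for an average time of 3/4, while the selfish outcome costs 1, which gives the ratio 4/3.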

Braess's Paradox makes this concrete and deeply counterintuitive: adding a new road to a network can actually worsen total congestion, because individually rational drivers flood the new route and degrade everyone's experience (Braess's Paradox empirical and simulation…). The implication — that sometimes removing infrastructure improves outcomes — has been verified in both simulations and real traffic networks. It applies far beyond roads: internet packet routing, financial markets, and platform ecosystems all exhibit the same structure.
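A worked example shows how a new road can make everyone slower. The numbers below are the usual textbook illustration of Braess's Paradox, not measurements from any real network, and every name in the code is hypothetical. Two links get slower with traffic (flow divided by 100 minutes), two links always take 45 minutes, and the new shortcut costs nothing.

```python
def braess_times(drivers=4000, fixed_minutes=45, drivers_per_minute=100):
    """Equilibrium travel time in minutes before and after adding a free shortcut.

    Start->A and B->End are congestible (flow / drivers_per_minute);
    A->End and Start->B always take fixed_minutes. The shortcut links A->B.
    """
    # Before: the two routes are symmetric, so traffic splits evenly.
    before = (drivers / 2) / drivers_per_minute + fixed_minutes
    # After: every driver takes Start->A->B->End, fully loading both congestible links.
    after = 2 * drivers / drivers_per_minute
    # Equilibrium check: leaving the shortcut for a fixed link must not be faster.
    deviation = drivers / drivers_per_minute + fixed_minutes
    assert after <= deviation, "shortcut route is not an equilibrium for these numbers"
    return before, after


before, after = braess_times()
print(f"without shortcut: {before:.0f} min, with shortcut: {after:.0f} min")
```

Each driver is acting sensibly, since abandoning the shortcut would cost 85 minutes, yet the trip rises from 65 to 80 minutes for everyone.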

But the most consequential modern manifestation of the price of anarchy isn't in traffic — it's in your rent. The Department of Justice's case against RealPage, filed in August 2024 and amended in January 2025, alleges that the company's algorithmic pricing software acted as a central hub, collecting nonpublic transaction-level data from competing landlords and generating unified rental pricing recommendations (DOJ v. RealPage complaint (August 2024, am…). The software featured "auto accept" functionalities and deployed human pricing advisors to monitor and enforce landlord compliance, minimizing price decreases and maximizing pricing power (DOJ v. RealPage complaint (August 2024, am…).

This is algorithmic collusion in its purest form — and it tests the limits of antitrust law. Section 1 of the Sherman Act requires evidence of a "meeting of the minds" — explicit communication to fix prices. But RealPage's algorithm achieves the same result without any landlord directly talking to another. The DOJ's theory is a "hub-and-spoke" conspiracy: RealPage is the hub, landlords are the spokes, and the algorithm is the agreement (DOJ v. RealPage complaint (August 2024, am…).

Amazon's "Project Nessie" algorithm represents the same phenomenon from a different angle. The FTC alleged in 2023 that Amazon used a secret pricing algorithm to test whether competitors' algorithms would follow its price increases. If they did, the higher price stuck — generating an estimated $1 billion in excess revenue (FTC v. Amazon / Project Nessie federal com…). If competitors didn't follow, the algorithm reverted. Amazon also allegedly enforced price parity by penalizing sellers who offered lower prices on competing websites, removing their access to the Buy Box — the mechanism through which 98% of Amazon sales occur (FTC v. Amazon / Project Nessie federal com…).

The regulatory response is accelerating. In late 2025, the DOJ filed a consent decree settling its claims against RealPage, drawing a strict line: the software must cease using competitors' nonpublic information for runtime pricing and cannot use active lease data for model training unless aggregated, anonymized, and aged at least twelve months (DOJ v. RealPage complaint (August 2024, am…). Senator Klobuchar's Preventing Algorithmic Collusion Act, reintroduced in 2025 as S. 232, would create a legal presumption that a price-fixing agreement exists whenever competitors share competitively sensitive information through a common pricing algorithm (Senator Klobuchar, Preventing Algorithmic…).

The academic evidence supports the concern. Calvano and colleagues demonstrated that competing Q-learning pricing algorithms consistently learn to sustain supra-competitive prices through repeated interaction alone, without any explicit programming to collude (Calvano et al. (SSRN/arXiv) — algorithmic…). The algorithms discover tacit coordination independently — achieving what human cartels require secret meetings and whispered agreements to accomplish.

The EU is approaching the problem from a different direction. The EU AI Act, formally effective in stages from August 2024, classifies algorithms making decisions with significant socioeconomic effects — credit scoring, hiring, insurance pricing — as "high-risk" systems subject to continuous risk management, algorithmic transparency requirements, and human oversight mandates (EU AI Act text — formally effective August…). Violations carry fines of up to 7% of global annual turnover or €35 million (EU AI Act text — formally effective August…). The Act represents the first major regulatory framework that treats algorithmic mechanisms as objects of governance rather than neutral tools.

Evidence Strength: Algorithmic Collusion Claims

Tier 1 · Computational proof · 85% weight
Calvano et al.: Q-learning algorithms independently discover and sustain supra-competitive pricing in repeated games without explicit collusion programming.

Tier 2 · Federal litigation · 75% weight
DOJ v. RealPage (2024–25): alleged hub-and-spoke conspiracy via shared nonpublic rental data; consent decree mandates data anonymization and 12-month aging.

Tier 2 · Federal complaint · 70% weight
FTC v. Amazon / Project Nessie (2023): secret algorithm tested competitor price-following; district court denied Amazon's motion to dismiss.

Tier 3 · Legislative response · 50% weight
Klobuchar's Preventing Algorithmic Collusion Act (S. 232, 2025): creates legal presumption of agreement when competitors share data through common algorithm.

Tier 4 · Industry analysis · 35% weight
Trade press and antitrust reviews describe accelerating enforcement posture but note definitional challenges in proving 'agreement' under existing Sherman Act framework.

The case for algorithmic collusion rests on converging evidence across tiers — from computational proof-of-concept to active federal litigation — but definitive causal estimates of consumer harm remain contested.

What this means for listeners: You may be paying higher rent because of an algorithm — not because any landlord decided to gouge you. The price of anarchy isn't abstract; it's on your lease, in your insurance premium, and embedded in your online shopping cart. Watch for legislation around algorithmic pricing transparency, and understand that the next generation of antitrust enforcement will target coordination that no individual human consciously chose.

Section 06

When the Rules Backfire: Carbon Credits, Organ Markets, and the Limits of Design

The most dramatic mechanism design failures don't just produce inefficiency — they create perverse incentives that actively generate the harm they were built to prevent. Two cases illustrate this with uncomfortable clarity.

The UN's Clean Development Mechanism was designed to channel capital toward cost-efficient emissions reductions in developing nations (UN CDM monitoring reports on HFC-23 loopho…). The mechanism allowed entities regulated by the EU Emissions Trading System to purchase carbon offset credits from reduction projects elsewhere. In theory, beautiful: money flows to wherever abatement is cheapest, maximizing global emissions reduction per dollar. In practice, the mechanism suffered from extreme information asymmetry. To earn a credit, a project had to prove it reduced emissions below a counterfactual "business as usual" baseline — but regulators had almost no ability to verify what that baseline truly was (UN CDM monitoring reports on HFC-23 loopho…).

The HFC-23 loophole became the most egregious exploitation. HFC-23, a potent greenhouse gas, is a byproduct of manufacturing the refrigerant HCFC-22. The cost of capturing and destroying HFC-23 was trivial — roughly $100 million across all relevant facilities (UN CDM monitoring reports on HFC-23 loopho…). But because HFC-23 is extraordinarily climate-destructive, destroying it generated an enormous volume of carbon credits, projected to yield $4.7 billion in revenue (UN CDM monitoring reports on HFC-23 loopho…). The game-theoretic incentives became perverse: chemical plants in developing nations strategically increased production of toxic refrigerant simply to generate more byproduct, which they could then destroy to harvest credits. The offset mechanism incentivized the creation of pollution for the profit of its abatement.

When the European Commission realized the scale of the exploitation and banned HFC-23 credits after April 2013, the Chinese government initially threatened to vent accumulated HFC-23 directly into the atmosphere (UN CDM monitoring reports on HFC-23 loopho…) — a stark demonstration of strategic brinkmanship in a non-cooperative climate game.

Organ donation presents the mirror image: a mechanism design challenge where the moral stakes are so high that market logic itself becomes suspect. England shifted from opt-in to "soft opt-out" organ donation in May 2020, presuming all adults have consented unless they explicitly object (England Organ Donation (Deemed Consent) Ac…). Behavioral economics predicted dramatic increases in donation — pre-implementation estimates projected consent rates would rise to 78%. The actual observed consent rate fell to 61% (England Organ Donation (Deemed Consent) Ac…).

The primary mechanism of failure was family override. Families retain the legal right to veto the presumed consent of the deceased, and 13% of refusals explicitly cited uncertainty over the deceased's true wishes (England Organ Donation (Deemed Consent) Ac…). The paradox is clear: presumed consent provides a weaker signal of donor preference than active opt-in registration. When someone actively signs up, their family knows they wanted to donate. When someone is merely presumed to consent, grieving families facing tragedy default to refusal. The implementation was further hampered by COVID-19, which strained ICU resources during the critical early period, and by deep demographic asymmetries — consent rates of approximately 70% for white patients versus 39% for ethnic minorities (England Organ Donation (Deemed Consent) Ac…).

Iran's kidney market stands in stark contrast. The world's only legal compensated market for living non-related kidney donation, established in 1988, reportedly eliminated Iran's kidney transplant waiting list by 1999 (Iran compensated kidney donation literatur…). From a strict market-design perspective, it works: financial incentives align supply with demand. But the mechanism faces intense ethical scrutiny — demographic data reveals that the overwhelming majority of vendors are young, impoverished, and motivated by acute financial distress rather than altruism (Iran compensated kidney donation literatur…).

This tension between efficiency and dignity is not a bug in mechanism design — it's a fundamental boundary. As Sandel argues, some goods are corrupted by the very act of pricing them (Sandel, 'How Markets Crowd Out Morals,' Bo…). The challenge for designers is knowing which domains benefit from market logic and which are degraded by it.

What this means for listeners: The lesson from carbon credits and organ markets is that mechanism design is only as good as its information environment and its moral context. When designers can't verify the baseline, participants will game it. When the mechanism operates in a domain where human dignity is at stake, efficiency alone is an insufficient criterion. Before implementing any incentive system — at work, in a community, in policy — ask not just 'will this produce the right behavior?' but 'will this change the meaning of the behavior itself?'

Section 07

The New Players: When Algorithms Learn to Strategize

Everything we've discussed so far assumes the players in the game are human. That assumption is rapidly becoming obsolete.

Researchers have begun running classic game-theoretic experiments with large language models as participants, and the results are both fascinating and unsettling. A 2024 study on cultural evolution of cooperation among LLM agents found that GPT-4 makes positive offers and rejects unfair ones in ultimatum games, closely mirroring human fairness norms (arXiv:2412.10270 (2024), 'Cultural Evoluti…). In prisoner's dilemma settings, LLMs performed even more cooperatively than humans typically do, suggesting they may encode cooperative biases absorbed from their training data rather than engaging in genuine strategic calculation (arXiv:2412.10270 (2024), 'Cultural Evoluti…).

Willis and colleagues, as summarized in a Moonlight literature review, had LLM agents (specifically ChatGPT-4o and Claude 3.5 Sonnet) generate full strategies in natural language for iterated prisoner's dilemma tournaments (Moonlight literature review of Willis et a…). The methodology was rigorous: all-play-all tournaments, 1,000 rounds per game, with a 10% noise injection simulating real-world execution errors. The findings were nuanced: cooperation often succeeded, but aggressive strategies could persist under certain conditions. Most critically, the prompts given to agents materially influenced whether they leaned cooperative or aggressive (Moonlight literature review of Willis et a…).

This finding has profound implications. If prompt design shapes equilibrium selection in LLM agents, then the specification choices made by product teams become a form of mechanism design. Two platforms using identical base models could produce radically different cooperative or defective emergent behavior purely through differences in instruction design. This is governance risk masquerading as an engineering detail.

A 2025 study directly comparing bounded rationality in LLMs and humans confirmed that both exhibit systematic departures from Nash predictions, but in characteristically different ways (arXiv:2506.09390 (2025), 'Beyond Nash Equi…). Humans are inconsistent and context-sensitive. LLMs are more consistent but carry biases from training distributions that may not match the strategic environment they're deployed in. Neither is "rational" in the classical sense.

Multi-agent reinforcement learning research tells a parallel story. Leibo and colleagues at DeepMind ran deep RL agents in iterated games and found that agents independently discovered cooperation through reward shaping — without any explicit programming to cooperate (Leibo et al. (2017), DeepMind — multi-agen…). But the convergence strategies were often not tit-for-tat or any recognizable human heuristic. Instead, richer conditional strategies emerged that exploited the specific reward structure of their environment.

The ride-sharing industry provides the clearest real-world laboratory for these dynamics. Shah's 2025 analysis describes how platforms like Uber and Lyft use game theory-based pricing models incorporating real-time conditions — waiting times, road congestion, local demand — to optimize supply and demand through dynamic pricing (Shah (2025), 'Game Theory in Ride-Sharing…). The paper documents multiple pricing strategies: uniform pricing, differential customer pricing, and differential driver pricing, with increasing use of machine learning for demand prediction. The outcomes are real: reduced driver idle time, reduced customer waiting time. But the paper also flags persistent failure modes: driver collusion, price fairness concerns, and regulatory pressure (Shah (2025), 'Game Theory in Ride-Sharing…).

This is where the threads converge. Pricing algorithms that learn to collude, LLM agents whose cooperation depends on prompts, ride-sharing platforms where drivers strategize against the algorithm — the game-theoretic frontier is no longer about humans playing against humans. It's about ecosystems where human and artificial agents interact, adapt, and co-evolve in ways that no single designer fully controls.

The EU AI Act represents the first major attempt to govern this landscape (EU AI Act text — formally effective August…). But regulation designed for a world of human players may prove inadequate for one where the most consequential strategic decisions are made by systems that learn faster than legislators can write laws.

Is Your Algorithm a Player or a Tool?

Question: Does your algorithm learn from other agents' behavior? (Adapts pricing, recommendations, or actions based on competitor/user responses.)

Yes, adaptive system: the algorithm updates its strategy based on observed outcomes.
Treat as strategic player. Apply mechanism design governance: audit for emergent collusion, test prompt/reward sensitivity, monitor equilibrium drift.

No, static rules: fixed logic; does not respond to other agents.
Monitor for gaming. Humans may still strategize against static rules; audit for exploitation.
Standard oversight. Conventional software governance applies; no game-theoretic risk.

A decision framework for assessing whether an algorithmic system requires game-theoretic governance — based on whether it learns, interacts with strategic agents, and can produce emergent coordination.

What this means for listeners: The most important game-theoretic question of the next decade may not be about human strategy at all. It's about what happens when the players are algorithms that learn. If you work in product design, policy, or any role that involves setting rules for systems with AI participants, understand that prompt design is incentive design, training data is institutional culture, and the emergent behavior of your system is your responsibility — even if no individual chose it.

Section 08

Playing Better Games: A Strategic Toolkit for Real Life

Game theory's deepest lesson is deceptively simple: intelligence doesn't guarantee good outcomes — structure does. The same brilliant people will cooperate or betray, compete or coordinate, depending entirely on the rules of the game they're playing. So the question that matters isn't "how do I become a better player?" It's "how do I change the game?"

Here's what the evidence says about doing exactly that.

Diagnose the game before you play it. The first step is identifying what kind of game you're in. Is it one-shot or repeated? Are actions visible or hidden? Is communication possible? Can defectors be punished? Each structural variable shifts expected cooperation by 20–50 percentage points (Sally (1995), 'Conversation and Cooperatio…). A negotiation you'll never revisit is fundamentally different from a partnership you'll maintain for years. A public commitment is a different game than a private one. Match your strategy to the structure.

Design for conditional cooperators. Roughly 60% of people are conditional cooperators — they'll match the group's behavior (Fischbacher, Gächter & Fehr (2001), 'Are P…). The strategic implication: make cooperation visible and early. If you're leading a team, publicly model the behavior you want. If you're designing a system, make high contributions salient. The 30% who are pure free riders won't change regardless, but the majority will follow the signal.

Build reputation infrastructure. The eBay evidence is unambiguous: reputation systems transform anonymous one-shot interactions into effectively repeated games (Resnick et al. (2006), 'The Value of Reput…). In any context where trust matters — hiring, partnerships, marketplace transactions — invest in systems that make track records visible and costly to fake. This includes your own track record: building a public portfolio of delivered results is not self-promotion, it's the signaling mechanism that makes markets work (Akerlof (1970), 'The Market for Lemons,' Q…).

Add forgiveness to your strategy. Strict tit-for-tat fails under noise. Generous tit-for-tat — cooperating 5–10% of the time even after a defection — breaks retaliation spirals and outperforms in realistic conditions (Nowak & Sigmund (1992, 1993), 'Tit for Tat…). In practical terms: when a colleague drops the ball, assume error before malice. One unexplained defection in a long cooperative relationship is almost certainly noise, not betrayal.

Audit your mechanisms for perverse incentives. The HFC-23 loophole, England's school choice paradox, and Italy's spectrum auction all share a common failure: designers optimized for one objective without modeling strategic responses to the rules themselves (UN CDM monitoring reports on HFC-23 loopho…) (Terrier, Pathak & Ren (2021), longitudinal…) (AGCOM Italy 5G auction analysis (2018) — c…). Before implementing any incentive system, ask: "If everyone involved were purely self-interested and fully strategic, what would they actually do?" Then ask: "Does this system change the meaning of the behavior, not just its frequency?"

Treat algorithms as players, not tools. If your system learns from other agents' behavior, it is a strategic player and should be governed as one (Calvano et al. (SSRN/arXiv) — algorithmic…). Audit for emergent collusion, test sensitivity to specification changes, and monitor equilibrium drift over time. The RealPage and Amazon cases demonstrate that algorithmic coordination can generate enormous consumer harm without any individual human choosing it (DOJ v. RealPage complaint (August 2024, am…) (FTC v. Amazon / Project Nessie federal com…).

Embrace bounded rationality — yours and others'. Neither you nor anyone you interact with computes Nash equilibria. Most people reason one to two levels deep (McKelvey & Palfrey (1995), 'Quantal Respon…). This means complex strategies that require your opponent to recognize your strategy, recognize that you recognize theirs, and respond accordingly will fail. Simple, clear, transparent strategies — like tit-for-tat — outperform precisely because they're legible. In the real world, clarity is a strategic advantage.
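
Shallow reasoning is easy to see in a beauty contest game. The sketch below assumes the common "guess two-thirds of the average" version, with guesses from 0 to 100 and a level-0 player who guesses 50. Those parameters, the mix of players in the room, and the function names are assumptions for illustration.

```python
def level_k_guess(k, multiplier=2/3, level0_guess=50.0):
    """Guess of a player who reasons k steps deep.

    Level k best-responds to a crowd of level k-1 players
    (ignoring their own small effect on the average).
    """
    return level0_guess * multiplier ** k


def winning_guess(guesses, multiplier=2/3):
    """Return the guess closest to multiplier * average of all guesses."""
    target = multiplier * sum(guesses) / len(guesses)
    return min(guesses, key=lambda g: abs(g - target))


# Hypothetical room: five level-1 thinkers, five level-2 thinkers,
# and one player who submits the Nash equilibrium guess of 0.
room = [level_k_guess(1)] * 5 + [level_k_guess(2)] * 5 + [0.0]
winner = winning_guess(room)
print(round(winner, 2))
```

In this hypothetical room the equilibrium guess of 0 finishes furthest from the target. The best guess is the one calibrated to how deep the other players actually reason.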

Game theory's deepest lesson is deceptively simple: intelligence doesn't guarantee good outcomes — structure does.

What this means for listeners: Game theory isn't a set of equations to solve — it's a diagnostic language for understanding why incentive structures produce the outcomes they do. The most powerful application isn't calculating your optimal move; it's redesigning the game so that everyone's self-interested move happens to be the cooperative one. Start by auditing the games you're already in: your compensation structure, your team dynamics, your market position. Ask whether the rules reward the behavior you actually want — and if they don't, change the rules.

Tier 1 · Meta-analytic

Camerer & Ho (1999), 'Experience-Weighted Attraction Learning in Normal Form Games,' Econometrica 67(4):827–874 — meta-analysis of 122 experimental studies on Nash convergence.
Güth, Schmittberger & Schwarze (1982), 'An Experimental Analysis of Ultimatum Bargaining,' Journal of Economic Behavior & Organization 3:367–388.
McKelvey & Palfrey (1995), 'Quantal Response Equilibria for Normal Form Games,' Games and Economic Behavior 10:6–38.

Tier 2 · Empirical

arXiv:2506.09390 (2025), 'Beyond Nash Equilibrium: Bounded Rationality of LLMs and Humans in Strategic Games.'

Tier 1 · Meta-analytic

Camerer (2003), Behavioral Game Theory, Princeton University Press — comprehensive synthesis of experimental findings across game types.

Tier 4 · Trade press

Tang (n.d.), Flevy consulting case study — Fortune 500 game theory consulting approach; FasterCapital (n.d.) — Nash equilibrium in startup strategy.

Tier 1 · Meta-analytic

Rapoport & Chammah (1965), Prisoner's Dilemma, University of Michigan Press — foundational PD experimental baselines.
Sally (1995), 'Conversation and Cooperation in Social Dilemmas,' Rationality and Society 7:58–92 — meta-analysis of communication and cooperation.
Fehr & Gächter (2002), 'Altruistic Punishment in Humans,' Nature 415:137–140.
Axelrod (1984), The Evolution of Cooperation, Basic Books — foundational iterated PD tournament research.
Nowak & Sigmund (1992, 1993), 'Tit for Tat in Heterogeneous Populations' and 'A Strategy of Win-Stay, Lose-Shift,' Nature.
Press & Dyson (2012), 'Iterated Prisoner's Dilemma Contains Strategies That Dominate Any Evolutionary Opponent,' PNAS 109:10409–10413.
Hilbe et al. (2013), 'Evolution of Extortion in Iterated Prisoner's Dilemma Games,' PNAS 110:6913–6918.
Fischbacher, Gächter & Fehr (2001), 'Are People Conditionally Cooperative? Evidence from a Public Goods Experiment,' Economics Letters 71(3):397–404.

Tier 2 · Empirical

Nikiforakis (2008), 'Punishment and Counter-Punishment in Public Good Games,' Journal of Public Economics 92(1–2):91–112.

Tier 1 · Meta-analytic

Henrich et al. (2010), 'Markets, Religion, Community Size, and the Evolution of Fairness and Punishment,' Science 327:1480–1484.
Roth, Sönmez & Ünver (2004/2005), AER/Econometrica: kidney exchange algorithm design; Abdulkadiroğlu et al. (2005), Econometrica: school choice mechanism design.
Pathak & Sönmez (2008), 'Leveling the Playing Field: Sincere and Sophisticated Players in the Boston Mechanism,' AER — strategic misreporting in school choice.

Tier 2 · Empirical

Terrier, Pathak & Ren (2021), longitudinal study of England school matching post-Deferred Acceptance reform.

Tier 3 · Practitioner

CEPEO / Nuffield Foundation, FSM quota simulation modeling for English school admissions equity.

Tier 2 · Empirical

AGCOM Italy 5G auction analysis (2018) — comparative UK/Italy auction data on 3.7 GHz band pricing.
BNetzA Germany 5G auction records (2019) + 2025 German court rulings on spectrum award validity.

Tier 3 · Practitioner

Sandel, 'How Markets Crowd Out Morals,' Boston Review; Heyman & Ariely (2004) — crowding out of intrinsic motivation.

Tier 1 · Meta-analytic

Akerlof (1970), 'The Market for Lemons,' QJE 84:488–500; Spence (1973), job-market signaling.

Tier 2 · Empirical

Brogaard et al. (2018), 'High-Frequency Trading and Information Asymmetry,' Review of Financial Studies 31:147–199.
Empirical health insurance research on adverse selection and moral hazard — multiple sources including RAND working papers and Baker Institute analysis.
Resnick et al. (2006), 'The Value of Reputation on eBay: A Controlled Experiment,' Experimental Economics 9(2):79–101.
Gneezy (2005), 'Deception: The Role of Consequences,' AER 95(1):384–394.

Tier 1 · Meta-analytic

Koutsoupias & Papadimitriou (1999), STACS: price of anarchy formalization; Roughgarden (2003), JCSS: selfish routing PoA bounds.

Tier 2 · Empirical

Braess's Paradox empirical and simulation evidence — transportation network studies on adding road capacity worsening congestion.
DOJ v. RealPage complaint (August 2024, amended January 2025) + consent decree (late 2025) — algorithmic rental pricing collusion.
FTC v. Amazon / Project Nessie federal complaint and district court ruling (2023) — algorithmic price manipulation.

Tier 3 · Practitioner

Senator Klobuchar, Preventing Algorithmic Collusion Act (S. 232, reintroduced 2025).

Tier 2 · Empirical

Calvano et al. (SSRN/arXiv) — algorithmic collusion via Q-learning pricing agents; 2025 antitrust reviews.
EU AI Act text: formally effective August 2024; Article 9 and Annex III on high-risk AI system governance.
UN CDM monitoring reports on HFC-23 loophole; European Commission HFC-23 ban documentation (post-April 2013).
England Organ Donation (Deemed Consent) Act implementation data (post-May 2020); NHS Blood and Transplant evaluation.

Tier 3 · Practitioner

Iran compensated kidney donation literature — Kidney Foundation of Iran (KFI) descriptive and ethical accounts.

Tier 2 · Empirical

arXiv:2412.10270 (2024), 'Cultural Evolution of Cooperation among LLM Agents.'

Tier 3 · Practitioner

Moonlight literature review of Willis et al. (2026), 'Will Systems of LLM Agents Cooperate?' — IPD tournaments with ChatGPT-4o and Claude 3.5 Sonnet.

Tier 2 · Empirical

Leibo et al. (2017), DeepMind — multi-agent deep RL emergent cooperation in iterated games.
Shah (2025), 'Game Theory in Ride-Sharing Apps,' IJSRSET 12(4):71–79.

Structure determines behavior, not character: the same people cooperate or defect depending entirely on the rules (anonymity, communication, punishment availability).

Mechanism design is game theory's most consequential applied product, now allocating kidneys, school seats, and radio spectrum; when badly designed, it measurably harms the people it was built to help.

Algorithms are now players, not just tools: pricing software achieves tacit collusion without human intent, and LLMs exhibit cooperative biases shaped by their training data rather than strategic calculation.


