On self-driving, Waymo is playing chess while Tesla plays checkers
We'll know Tesla is serious about robotaxis when it starts hiring remote operators.
Tesla fans—and CEO Elon Musk himself—are excited about the prospects for Tesla’s Full Self-Driving (FSD) software. Tesla released a major upgrade—version 12.3—of the software in March. Then last month Musk announced that Tesla would unveil a purpose-built robotaxi on August 8. And last week he said that a new version of FSD—12.4—will be released in the coming days with a “5X to 10X improvement in miles per intervention.”
But I think fans expecting Tesla to launch a driverless taxi service in the near future are going to be disappointed.
During a late March trip to San Francisco, I had a chance to try the latest self-driving technology from both Tesla and Google’s Waymo.
During a 45-minute test drive in a Tesla Model X, I had to intervene twice to correct mistakes by the FSD software. In contrast, I rode in driverless Waymo vehicles for more than two hours and didn’t notice a single mistake.
So while Tesla’s FSD version 12.3 seems like a significant improvement over previous versions of FSD, it still lags behind Waymo’s technology.
However, Waymo’s impressive performance comes with an asterisk. While no one was behind the wheel during my rides, Waymo has remote operators that sometimes provide guidance to its vehicles (Waymo declined to tell me whether—or how often—remote operators intervened during my rides). And while Tesla’s FSD works on all road types, Waymo’s taxis avoid freeways.
Many Tesla fans see these limitations as signs that Waymo is headed for a technological dead end. They see Tesla’s FSD, with its capacity to operate in all cities and on all road types, as a more general technology that will soon surpass Waymo.
But this fundamentally misunderstands the situation.
Safely operating driverless vehicles on public roads is hard. With no one in the driver’s seat, a single mistake can be deadly—especially at freeway speeds. So Waymo launched its driverless service in 2020 in the easiest environment it could find—residential streets in the Phoenix suburbs—and has been gradually ratcheting up the difficulty level as it gains confidence in its technology.
In contrast, Tesla hasn’t started driverless testing because its software isn’t ready. For now, geographic restrictions and remote assistance aren’t needed because there’s always a human being behind the wheel. But I predict that when Tesla begins its driverless transition, it will realize that safety requires a Waymo-style incremental rollout.
So Tesla hasn’t found a different, better way to bring driverless technology to market. Waymo is just so far ahead that it’s dealing with challenges Tesla hasn’t started thinking about. Waymo is playing chess while Tesla is still playing checkers.
Tesla is several years behind Waymo
The current excitement around Tesla’s FSD reminds me of the hype that surrounded Waymo in 2018. Early that year, Waymo announced deals to purchase 20,000 electric I-Pace crossovers from Jaguar and 62,000 Pacifica minivans from Fiat Chrysler.
But the service Waymo launched in December 2018 was a disappointment. There were still safety drivers behind the wheel on most rides, and access was limited to a handpicked group of passengers.
It wasn’t until October 2020 that Waymo finally launched a fully driverless taxi service in the Phoenix area that was open to the general public. And even after that, Waymo expanded slowly.
Waymo began offering commercial service in San Francisco in 2023 and is now expanding to Los Angeles and Austin. Today the company only has a few hundred vehicles in its commercial fleet—far fewer than the 82,000 vehicles Waymo was planning to purchase six years ago.
What went wrong? In an August 2018 article, journalist Amir Efrati reported on the limitations of Waymo’s technology. Efrati wrote that “Waymo vans have trouble with many unprotected left turns and with merging into heavy traffic in the Phoenix area.” In addition, “the cars have trouble separating people, or cyclists, who are in groups, especially people near shopping centers or in parking lots.”
Efrati’s reporting makes Waymo’s technology in 2018 sound worse than FSD 12.3, but not that much worse. FSD still sometimes struggles with merging in heavy traffic and navigating around groups of pedestrians.
Waymo’s technology improved steadily after 2018. In late 2020, after Waymo launched its fully driverless service in the Phoenix area, I interviewed Joel Johnson, a college student who had taken dozens of Waymo rides and published a number of ride videos on YouTube.
"They've really ironed out stuff like unprotected lefts," Johnson said in one video. "It's definitely improved over time."
In an October 2020 video, a Waymo car was driving through a Costco parking lot crowded with pedestrians. It waited patiently until they were out of the way, then moved forward confidently.
"This amount of pedestrians would have caused whiplash-inducing brake usage in March 2020," Johnson wrote in an on-screen note. "And it would have completely given up in July 2019. No longer!"
The version of FSD I tried in March was clearly not ready for driverless operation. For example, I had to intervene to prevent the Model X from running over a plastic lane divider, a mistake Waymo would not have made in 2020. So while FSD 12.3 seems superior to Waymo’s technology circa 2018, it’s not as good as Waymo’s technology at the end of 2020.
Waymo relies on remote operators
In its early years, Waymo did all of its testing with safety drivers. When the software made a mistake, the driver would intervene to prevent a crash and then carefully document the incident. Engineers used data from these near misses to improve Waymo’s software.
The better self-driving software gets, the more expensive this testing strategy becomes. If self-driving software makes a mistake every 50 miles, a safety driver might experience several errors in a single workday. But if the software makes a mistake every 5,000 miles, a safety driver might have to drive around for weeks, at company expense, to get a single bug report.
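To put rough numbers on that trade-off, here is a back-of-the-envelope sketch in Python. The 250-miles-per-shift figure is my own assumption for illustration, not a number from Waymo.

```python
# Back-of-the-envelope: paid safety-driver time per observed software error.
# Assumption (mine, for illustration): a safety driver covers ~250 miles per shift.
MILES_PER_SHIFT = 250

for miles_per_mistake in (50, 5_000):
    errors_per_shift = MILES_PER_SHIFT / miles_per_mistake
    shifts_per_error = miles_per_mistake / MILES_PER_SHIFT
    print(f"1 mistake per {miles_per_mistake:,} miles -> "
          f"{errors_per_shift:.2f} errors per shift, "
          f"{shifts_per_error:.1f} shifts of paid driving per bug report")
```

At one mistake every 50 miles, a driver sees about five errors per shift; at one mistake every 5,000 miles, each bug report costs roughly 20 shifts, or about a month of paid driving.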
It would have been prohibitively expensive for Waymo to continue testing with safety drivers until its software was provably safer than a human driver. So instead Waymo started deploying driverless vehicles supported by a remote operator.
Waymo programmed its driverless vehicles to make extreme caution their default behavior. If a driverless Waymo isn’t 100 percent confident that it’s safe to proceed, it will slow to a stop and ask for remote assistance. The hope is that the software’s average confidence will increase over time and vehicles will need remote assistance less and less often.
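As a way to picture that default behavior, here is a minimal sketch of confidence-gated driving. It is purely illustrative: the class, function names, and structure are mine, and Waymo's actual planner is far more sophisticated and not publicly documented.

```python
from dataclasses import dataclass

@dataclass
class Plan:
    action: str        # e.g. "proceed through intersection"
    confidence: float  # planner's confidence the maneuver is safe, 0.0 to 1.0

CONFIDENCE_THRESHOLD = 1.0  # "not 100 percent confident" -> don't proceed

def decide(plan: Plan) -> str:
    """Illustrative confidence-gated fallback: execute only when fully confident;
    otherwise slow to a stop and ask a remote operator for guidance
    (a hint, not remote driving)."""
    if plan.confidence >= CONFIDENCE_THRESHOLD:
        return f"execute: {plan.action}"
    return "slow to a stop and request remote guidance"

print(decide(Plan("proceed through intersection", 1.0)))
print(decide(Plan("squeeze right to let oncoming truck pass", 0.7)))
```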
Waymo says remote operators never directly drive its vehicles. Instead, operators answer questions and give hints to guide the vehicle in the right direction. Here are two examples supplied to me by Waymo:
In this video, a Waymo’s path is blocked by a large truck coming in the opposite direction. The remote operator has the Waymo squeeze into the right lane to give the truck room to pass.
In this video, a Waymo is approaching an intersection with multiple fire trucks. The Waymo asks the remote operator two questions—“Is the emergency vehicle blocking all indicated lanes?” and “Is the road closed?”—that help it move through the scene with confidence.
This strategy gets tricky on freeways. If a driverless vehicle asks for help and doesn’t get a timely response, it needs to stop and wait. But that’s hard to do safely on a freeway where traffic is moving at 70 miles per hour.
So although Waymo has tested its technology on freeways (with safety drivers) for more than a decade, Waymo’s driverless taxis don’t use them yet. This makes Waymo’s service less useful.
For example, my first Waymo trip in March took me from downtown San Francisco to a McDonald’s in the Bayview neighborhood. For Waymo this was a 28-minute trip on surface streets. An Uber or Lyft driver would have taken the 101 freeway and gotten there in about 15 minutes.
Waymo is working to address the problem. In January, the company started testing driverless operations on freeways in the Phoenix area. If testing goes well, Waymo may enable freeway driving for its commercial fleet in the coming months.
According to statistics published by the company, Waymo’s cautious approach has worked remarkably well, at least from a safety perspective. Over Waymo’s first 7 million driverless miles, its vehicles got into injury-causing crashes about one-fourth as often as comparable human drivers.
Does Tesla have a better approach?
Many Tesla fans see the limitations of Waymo’s current service—avoidance of freeways, reliance on remote operators, and restriction to a few metro areas—as evidence that Waymo’s technology is fundamentally flawed.
In a typical tweet last year, a Tesla supporter argued that Waymo and GM-owned Cruise have “developed extremely narrow, very brittle technology that won't scale.” Elon Musk replied: “Yeah, extremely brittle to local conditions & doesn’t scale.”
A key part of this argument has to do with neural networks. Waymo began as the Google self-driving car project 15 years ago, before the deep learning revolution of the 2010s. The earliest versions of its software probably used hand-coded rules rather than machine learning. Some Tesla supporters seem to assume Waymo is still using the same outdated techniques.
In reality, Waymo now makes extensive use of neural networks.
For example, in this 2020 article, Waymo described how it uses neural networks for its perception system—the software that identifies and tracks nearby objects. In this February 2024 talk, a Waymo engineer explained how the company uses transformers, the Google-invented architecture behind large language models, to predict the actions of other vehicles.
So Waymo might have had a brittle software stack a decade ago, but the company hasn’t been standing still.
A big factor that makes self-driving difficult is the “long tail” of unusual situations a driver might encounter in the real world. Wet cement. A bike strapped to a car. A tree on the back of a truck. A woman in a wheelchair chasing a duck.
Companies building self-driving technology need to do millions of miles of testing in order to discover as many of these “edge cases” as possible. And this is one place where Tesla plausibly has an advantage over Waymo.
As we’ve seen, Waymo has to pay safety drivers for every mile of supervised testing. In contrast, Tesla has convinced thousands of customers to test its Full Self-Driving software for free. Indeed, customers pay thousands of dollars for the privilege!
This gives Tesla access to effectively unlimited data. In theory, more data should enable Tesla to efficiently identify edge cases its self-driving software needs to handle. More data should also enable Tesla to train better neural networks.
But while access to more data is certainly helpful, it’s not a magic bullet. One issue is that the data Tesla collects is unlabeled. Waymo’s safety drivers document each disengagement to help identify flaws in Waymo’s software. But Tesla customers are unlikely to do that.
Another issue is that some edge cases are much harder to handle than others.
The first responder problem
Let’s take interactions with police and firefighters as an example. This is a problem that Waymo and GM’s Cruise struggled with a lot last year. I wrote about this in September:
According to San Francisco Fire Department records, several Waymo or Cruise vehicles blocked narrow roads, forcing fire trucks to take detours en route to fires. AVs got stuck near firefighting operations, forcing firefighters to work around them as they positioned hoses and ladders. A few AVs parked in front of fire stations, trapping fire trucks inside.
The problem wasn’t that these vehicles were running into fire trucks (Tesla has had a problem with that in the past). Rather, they were so cautious that they were getting stuck. This is why these incidents only became a concern after Waymo and Cruise began driverless operations; a safety driver could have intervened after a few seconds of inaction. It’s also why this hasn’t become an issue for Tesla yet: every Tesla still has a human behind the wheel who can take over if FSD gets stuck.
Most of the time, driving requires following simple, deterministic rules: stay in the center of the lane, avoid hitting other road users, obey stop lights and stop signs, and so forth.
But navigating through the scene of a fire or car crash is much trickier. Emergencies can totally disrupt the flow of traffic, forcing drivers to improvise new traffic patterns. Often drivers need a nuanced understanding of what other people are trying to accomplish so that they can avoid getting in their way. If police or firefighters are directing traffic, drivers need to understand their hand signals.
In short, navigating the scene of a fire or car crash sometimes requires reasoning skills that are far beyond the capabilities of today’s AI systems. So emergency scenes are likely to remain a “corner case” for Tesla’s FSD for at least a few more years, just as they still are for Waymo’s software.
Recognizing this, Waymo backstops its software in a number of ways. Waymo vehicles can request guidance from Waymo remote operators if they aren’t sure what to do. First responders can lean into the car to talk to the remote operator, or they can jump into the car and drive it themselves. City officials can erect a geofence to keep Waymo vehicles away from emergency scenes. And Waymo has provided training to thousands of first responders in Phoenix, San Francisco, and elsewhere on how to interact with its vehicles.
These efforts seem to be paying off. In a February story, the San Francisco Chronicle reported that firefighters have been filing fewer reports about misbehaving vehicles since August. That’s partly because Cruise stopped operations in San Francisco in October. But the number of Waymo reports seems to be down as well.
If Tesla pushed out an FSD update that enabled fully driverless operations, I expect we’d start to see the same kind of stories that we saw about Waymo and Cruise last year: stories about Tesla vehicles driving over hoses, blocking ambulances, ignoring police officers’ instructions, interfering with firefighters’ ladder placements, and so forth.
Given the size of its fleet, Tesla could face a backlash from first responders much bigger than the one Waymo and Cruise faced last year in San Francisco. And as Waymo and Cruise discovered, police and firefighters have a lot of political influence.
If Tesla is serious about providing a driverless taxi service, it’s going to need the kinds of infrastructure and support services Waymo has been building in recent years. That includes remote operators to intervene when a vehicle gets stuck and personnel to work with local governments. Instead, Tesla has been moving in the opposite direction: The Information reported last month that Tesla was dissolving its policy team.
A service, not just a software product
Today’s FSD is a software product, but a robotaxi service is much more than just software. Here’s a simple example: what happens if a Tesla robotaxi gets a flat tire?
If you are driving your car and it gets a flat tire, it’s your problem. If you’re riding in a taxi that gets a flat tire, it’s the driver’s problem. But who changes the tire if there is no driver?
Tesla envisions a future where people buy a Tesla and then rent it out via a Tesla-run ride-hailing network. So theoretically Tesla could say that a flat tire is the owner’s problem. But that wouldn’t work in practice. The owner might be in a meeting or even on vacation. And it wouldn’t be acceptable to leave a car stranded by the side of the road—possibly with a passenger inside—for hours while Tesla tries to contact its owner.
So if Tesla wants to get into the taxi business, it’s going to need a staff of mobile technicians to rescue stranded Teslas. These could be Tesla employees or independent contractors, but arrangements need to be in place before Tesla starts offering service in a particular area. And these same people can be dispatched to rescue Teslas when, inevitably, FSD software gets stuck.
And this means that Tesla will likely want to roll out the driverless version of FSD gradually, one metropolitan area at a time. This makes sense for other reasons too: it would give Tesla time to introduce itself to local officials and offer training to local police and fire departments. And while remote operators and customer service agents need not be local, a phased rollout would give Tesla time to hire people for those roles.
I hope it’s clear now why I don’t consider the current limitations on Waymo’s service—geographic restrictions and the use of remote operators—particularly damning. Every taxi service is geographically limited, and there are good reasons for Waymo to roll out its service one city at a time.
The bitter lesson
I’ve talked to a lot of Tesla fans, so I have a pretty good idea what they’d say here: that I’m underestimating how quickly Tesla’s self-driving technology will improve as Tesla throws more data and computing power at the problem.
Some Tesla fans like to reference the famous Rich Sutton essay “The Bitter Lesson.” Sutton argued that AI researchers have historically spent too much time trying to hand-code human insights on the best way to solve a problem like computer chess or image recognition. People ultimately got better results with general-purpose learning algorithms trained on large amounts of data.
Sutton wrote his essay in 2019. Since then, the success of large language models has demonstrated Sutton’s insight in a spectacular way. Early AI researchers would try to understand the properties of natural language and then encode their insights into AI systems. These systems didn’t work very well. What worked much better was taking the simple transformer architecture and scaling it up to hundreds of billions of parameters to create LLMs like GPT-4.
Elon Musk is betting that this same dynamic will work in Tesla’s favor with self-driving. He has described FSD version 12 as using “end-to-end neural nets.” He is investing billions of dollars in hardware to train those neural networks using vast amounts of data collected from Tesla customers. If you buy Sutton’s argument, you might expect Tesla to jump ahead of Waymo.
But I think people are reading too much into Sutton’s argument. Sutton’s point is that large neural networks trained on lots of data tend to outperform hand-coded AI systems. But it doesn’t follow that throwing more data and computing power at any particular neural network will achieve arbitrarily high levels of performance.
LLMs are a powerful example of this. LLMs hallucinate. They fail at simple tasks like counting objects and reading analog clocks. LLMs are great for applications where accuracy isn’t that important, or where a human being is checking the output after it’s generated. But if you need very high accuracy, they are not a good choice.
Self-driving systems do need very high accuracy. And it’s not obvious that an end-to-end neural network with enough data and computing power will necessarily achieve it.
Business professor Ethan Mollick has written about the “jagged frontier”: Complex AI systems are often impressively good at some tasks but surprisingly bad at others. Tesla may get really good at navigating freeways, intersections, and traffic circles but make little progress on avoiding wet cement or understanding the hand signals of police officers.
Could Waymo have a Just Walk Out problem?
Waymo’s approach to this problem is to build a mostly automated system that is able to gracefully fall back on human assistance when needed.
While this works quite well from a safety perspective, I’ve started to wonder about the economics. If Waymo vehicles were constantly asking for remote guidance, Waymo might need to hire so many remote operators that it would negate the cost savings of not needing a driver.
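To see why the intervention rate matters so much for the economics, here is a rough staffing model. Every number in it is an assumption I picked for illustration; Waymo does not publish its request rate or how long a typical remote interaction takes.

```python
def operators_per_vehicle(requests_per_vehicle_hour: float,
                          minutes_per_request: float) -> float:
    """Steady-state remote operators needed per vehicle: operator-minutes demanded
    per vehicle-hour, divided by the 60 minutes one operator can supply per hour."""
    return requests_per_vehicle_hour * minutes_per_request / 60

# Assumed scenarios (illustrative numbers only, not Waymo data):
scenarios = {
    "constantly asking (10 requests/hour, 3 min each)": (10.0, 3.0),
    "rarely asking (0.2 requests/hour, 3 min each)": (0.2, 3.0),
}
for label, (rate, minutes) in scenarios.items():
    ratio = operators_per_vehicle(rate, minutes)
    print(f"{label}: {ratio:.3f} operators per vehicle "
          f"(~{ratio * 1000:.0f} per 1,000 vehicles)")
```

In the first scenario, Waymo has effectively re-hired half a driver per car; in the second, remote assistance is a rounding error on the cost of running the fleet.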
Last month, Amazon announced it was removing its no-checkout technology, called Just Walk Out, from Amazon Fresh grocery stores.
Like Waymo, Amazon was bullish about its technology in 2018. That year Bloomberg reported that Amazon was planning to open 3,000 Amazon Go convenience stores based on Just Walk Out technology.
But that never happened, and reporting from The Information’s Theo Wayt helps to explain why. Wayt reported last year that Amazon’s technology—like Waymo’s—wasn’t fully automated. Amazon had more than 1,000 workers in India manually verifying customer selections. Wayt says that in mid-2022 “Just Walk Out required about 700 human reviews per 1,000 sales.”
Amazon aimed to reduce this figure to 20 to 50 human reviews per 1,000 items, but the company “repeatedly missed” its performance targets.
Could Waymo have a similar problem? I don’t know, and unsurprisingly Waymo declined to comment about the frequency of remote interventions.
My best guess is that this will not be a serious issue for Waymo. During the rides I took in March, Waymo’s vehicles drove smoothly and confidently. If they were constantly seeking remote guidance, I would have expected more hesitation and erratic driving.
Waymo also—finally—seems to be expanding fairly rapidly. Earlier this month the company announced it was serving 50,000 trips a week, up from 10,000 weekly trips nine months earlier. It seems unlikely Waymo would grow that quickly unless management was confident they had a clear path to profitability.
But either way, I don’t think Tesla has discovered a better way to approach the problem. Large, complex neural networks tend to be good at some things but not as good at others. Yet the AI system that controls a two-ton vehicle needs to be very reliable all the time. For the next few years, at least, that’s only going to be possible with human backup.
The crowd forecasting site Metaculus has created a series of betting markets inspired by my reporting. Click here to check it out.
Re the flat tire scenario (or any breakdown), I'll float the alternative that they send a second taxi to pick the rider up in a few minutes and call a tow truck, which could be one they hire on the open market. Thus no staffing is needed; the tow truck driver does what's needed. Do the tow truck drivers need training? Maybe, but I'm not sure.
Perhaps this removes the need to expand city by city?
Tesla could easily compare its current software predictions with what drivers actually do, whether the system is disengaged, the driver manually overrides it, the system disengages itself, or it fails while engaged. Or even when FSD hasn't been purchased! The software can ALWAYS run whenever the hardware is powered, independent of purchase state, engagement status, or control mode.
In this way, *ALL* Teslas can provide experiential data at all times, not limited by factors such as Waymo's comparatively tiny fleet size.
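For what it's worth, the comparison described in the comment above is the kind of logging sometimes called "shadow mode": the software runs silently, its planned action is recorded alongside what the human actually did, and large disagreements get flagged for review. Here is a minimal sketch of that idea; the data structure, field names, and threshold are all invented for illustration and are not a description of Tesla's actual system.

```python
from dataclasses import dataclass

@dataclass
class Snapshot:
    planned_steering: float   # what the (possibly disengaged) software would do, in degrees
    actual_steering: float    # what the human driver actually did, in degrees
    fsd_engaged: bool

DISAGREEMENT_THRESHOLD = 15.0  # degrees; arbitrary illustrative cutoff

def flag_for_review(snap: Snapshot) -> bool:
    """Shadow-mode check: runs regardless of whether FSD is engaged or even purchased.
    A large gap between the plan and the human's action marks the clip as a potential edge case."""
    return abs(snap.planned_steering - snap.actual_steering) > DISAGREEMENT_THRESHOLD

log = [
    Snapshot(planned_steering=2.0, actual_steering=1.5, fsd_engaged=False),
    Snapshot(planned_steering=0.0, actual_steering=25.0, fsd_engaged=False),  # driver swerved
]
flagged = [s for s in log if flag_for_review(s)]
print(f"{len(flagged)} of {len(log)} snapshots flagged for review")
```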