How are the models different despite their similarity? And I would imagine the transformer gets heavily specialized in its neural implementation, right? But it's still about predicting symbols.
The difference is that Waymo is relaxing the separation between its modules in a deliberate fashion, with rigorous validation to ensure it doesn't just become all mush.
Tesla seems to think that if they provide the inputs (images), the machine will sort it all out in a giant neural net.
Waymo, by contrast, provides its AI with richer data and priors: lidar and maps.
There's a vast amount of engineering and research into how to do these things properly, and two different end-to-end implementations can be night and day in practice.
👍👍
I like how Comma.AI's add-on system layers their smarts atop existing vehicle systems, making use of certified vendor systems to support and validate their own top-level smarts.
For example, Comma (well, actually, the open-source OpenPilot software) has its own lane-keeping assist (LKAS) capability, providing LKAS support on vehicles that lack it. However, when a vehicle has its own LKAS, the Comma LKAS capability becomes secondary, providing redundancy.
Comma also uses end-to-end training for their camera-based system, with a separate safety processor ("Panda") validating ("sanity checking") and managing CAN data flowing between Comma and the vehicle. Such safety processors are common throughout many industries, including the electric power grid, nuclear reactors, aircraft avionics, spacecraft and so on.
This is done by Comma despite strictly being a Level 2 system (at least presently). I hope all other self-driving companies are using equivalent safety hardware that exists and runs independently of the host.
This safety module provides a key side benefit: It is the ONLY part of the Comma system that must be validated at the hardware level, allowing the main system hardware to be designed to commercial standards, precisely like a smartphone.
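The Panda-style gating described above can be sketched in a few lines. This is purely my illustration of the general pattern, not Comma's actual code: the message ID, torque limits, and rate limit below are all made-up numbers.

```python
# Hypothetical sketch of a safety-processor gate: the main computer's
# steering commands pass through an independent checker before being
# forwarded to the vehicle CAN bus. All IDs and limits are illustrative.

MAX_TORQUE = 300        # illustrative static actuator limit
MAX_TORQUE_DELTA = 50   # illustrative per-message rate limit

def safe_to_forward(msg: dict, prev_torque: int) -> bool:
    """Forward a CAN message only if it passes sanity checks."""
    if msg["id"] != 0x2E4:            # only gate the (hypothetical) steering message
        return True
    torque = msg["torque"]
    if abs(torque) > MAX_TORQUE:      # reject out-of-range commands
        return False
    if abs(torque - prev_torque) > MAX_TORQUE_DELTA:
        return False                  # reject implausibly abrupt changes
    return True
```

The point of the pattern is that this checker can stay tiny and auditable, independent of how complex the upstream neural network gets.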
All future vehicles need radar-based collision avoidance with autonomous braking; it should override all other systems in the vehicle.
In addition, the only successful self-driving vehicle will combine radar-based autonomous braking with cameras plus some combination of lidar, radar, or other technology. The idea that camera-only self-driving cars are possible vastly underestimates the complexity of the brain (which, granted, can fail when compromised), and such an idea is caused by a "fixed thinking neural block".
The idea that Tesla has any ML work that can be considered cutting edge seems fanciful. Where's the evidence of it? They hardly publish anything. The FSD division is run by a nobody. All observable indicators suggest that Google is on the frontier and Tesla trails.
That is similar to the popular meme that Tesla enjoys a data advantage over Google. Stop and think about that one for a few seconds. "Tesla has more data than Google" is pretty much the craziest claim I've ever heard.
Amazing to see that you actually can hand it all to transformers, especially geometric reasoning. That is a long way from the text-based next-token prediction task. The universal approximation theorem (UAT) is definitely one of the most impactful math theorems in the world of AI.
Excellent post, Tim! In particular, I appreciate your efforts to push the automated driving companies to be more transparent about how their systems work. I hope that in the long run the industry will evolve to become more like the aviation industry, where safety-related technical information is shared more openly.
The fact that Waymo can provide validated object-model data directly to that subsystem is not a trivial difference. It allows them to train and validate at a level that would only rarely gain feedback in an end-to-end system. Often a failure in object recognition will not affect the action taken, so the model never gets a gradient signal that it was wrong.
I don't mean to say this makes it totally different, but in my eyes, it is a significant difference that I would expect would show up in better handling of black swan events, where one of those objects was relevant for the action taken.
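The gradient-signal point above can be made concrete with a toy example (my illustration, not any company's training setup): with an auxiliary loss on an intermediate perception output, a wrong detection still produces a training signal even when the final action happens to be correct by luck.

```python
# Toy illustration: end-to-end vs. modular training signal.

def action_loss(action: float, target_action: float) -> float:
    return (action - target_action) ** 2

def perception_loss(detected: float, target_detected: float) -> float:
    return (detected - target_detected) ** 2

# The net mis-detects an object (0 instead of 1) but still happens to
# output the correct action:
detected, target_detected = 0.0, 1.0
action, target_action = 1.0, 1.0

end_to_end = action_loss(action, target_action)                    # zero: no signal
modular = end_to_end + perception_loss(detected, target_detected)  # nonzero: signal
```

In the end-to-end case the loss is zero, so nothing corrects the bad detection until it eventually causes a bad action; the intermediate supervision catches it immediately.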
Ultimately the distinction between modular and monolithic is very largely in the eye of the beholder/developer. After all, nobody does spaghetti code anymore (I hope!). Any rationally designed system programmed by multiple people is going to be organized into modules, if only as a matter of ease of programmability, verifiability, and (above all) maintainability. These may not be originally intended as functionally distinct from a larger system analysis perspective, but in general they can be viewed that way if one wants to. At the same time the final system will (hopefully) function in an integrated or "monolithic" fashion.
I am not sure who is more advanced under the hood, but anyone who has ridden in both Waymo and Tesla's robotaxi knows that Tesla is the much better, more human-like driver. If Tesla is a black box, then they will have an uphill battle with regulators; but since they are already fully driverless on roads, they might be able to lean on the real-world numbers (assuming those are comparable to Waymo's) and 'convince' regulators that's good enough without having to see into the black box.
I drive more than 99% of humans on the planet, and I use Tesla FSD 95% of the time, including last week, when it drove me from Texas to New Jersey and back with no help from me except when the navigation was incorrect or when I took over by accident. A year ago I would have said Tesla had a ways to go, but this new version has solved self-driving and is better than humans.
What’s your basis for saying it’s better than Waymo? I’ve ridden in Waymos about 15 trips and I don’t think I saw it make a single mistake. I’m not saying you are wrong but I’m curious what you are basing the judgment on.
To clarify, I am not saying it is safer. I am saying it drives less robotically and is almost indistinguishable from a human.
The limitation with Tesla is that there is no guarantee it won't ABRUPTLY require driver intervention. Glad to hear it is doing much better on long hauls than a year ago (did this include any city driving?), but they still have to find a way to recover [if at all possible] from situations the system can't handle. Honestly, I'm not sure that is even a solvable problem (in general) any time soon. Waymo (as a robotaxi service) sidesteps the worst of it by only operating in restricted areas that they have hi-res mapped. (Waymo's approach can't be used outside these limited areas.)
Tesla as a robotaxi service maybe could do the same thing (extra mapping); but their story is that they can handle any situation a person can, thus can be used anywhere. Are they safer than a person (on average)? Probably, but that isn't saying much. Fully autonomous vehicles will likely be held to a higher standard; should match-or-beat a good driver at full alertness.
What I'm trying to say is that the *hard* situations will make or break an autonomous system. Driving a long way under *good* conditions is a starting point, but it is not sufficient to judge whether the system is ready for autonomy.
I assume some future solution will combine the best of both approaches: as much information as possible about conditions ahead (from recent trips by other cars?) plus an excellent analysis of what is happening at this moment. Slow down before reaching the problem. Have an "escape plan" ready at every moment. No reason to limit the vehicle to what is knowable from its position.
There’s a convergence to an approach using transformers that seems to work.
This might mean that other companies can catch up quite quickly. For example, Rivian has also just launched a transformer-based autopilot that they believe is on the path to level 4 autonomy next year.
After 10-plus years of Tesla versus Waymo, perhaps a large number of companies will get to autonomy at roughly the same time?!?
The decade-plus of indecision.
Not quite.
There are two related plans in motion: 2026 is an attempt to compete with Tesla's (Supervised) ability, while collecting data from their cars in use to feed into a future autonomous solution.
Then they might do something clever: of the millions of miles of roads in the US, they learn which roads HAVE GOOD LANE MARKINGS. As they learn about these roads, they might allow Level 4 autonomy where they know the markings are good!
BUT there is a problem with this: It is not possible to know ahead of time whether weather conditions, construction, or an accident (think "oil spill") will interfere with the lane markings.
They have not announced how they will cope with those conditions. Will they be like Tesla, and require a driver on stand-by (not Level 4)? Or will they be like Waymo, and require hi-res mapping (knowing where the lanes "should be", even when obscured), thus only usable in limited areas?
MAYBE they can manage a "middle ground", using the fact that good markings used to be here, to immediately slow down, and pull off the road. Wait for human intervention. (Similar to what Waymo does; but Waymo GPS pre-mapped the lane markings, so can more easily pull safely out of the way, even if markings have become obscured.)
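That "middle ground" gating could be sketched as a simple policy. To be clear, this is purely hypothetical on my part; the quality scores, thresholds, and mode names below are all invented and reflect no announced design from Rivian or anyone else.

```python
# Hypothetical policy: allow Level 4 only where the mapped marking quality
# for this road segment AND the live perception confidence both agree.
# All thresholds and labels are illustrative.

def driving_mode(mapped_quality: float, live_confidence: float) -> str:
    """Pick an operating mode from historical and live lane-marking signals."""
    if mapped_quality >= 0.9 and live_confidence >= 0.9:
        return "L4"                  # markings known good and currently visible
    if mapped_quality >= 0.9:
        return "slow_and_pull_over"  # markings used to be good but aren't now
    return "driver_required"         # segment never validated for hands-off use
```

The interesting case is the middle branch: knowing the markings *should* be there lets the car degrade gracefully instead of continuing blind.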
Rivian has NOT announced even a "target year" for Level 4 autonomy.
[ChatGPT does an excellent job of describing what would have to happen before Rivian could be approved by a regulator in any market area.]
My guess: by the time Rivian gets to level 4 in US or Europe, they will have very strong competition from China.
ChatGPT does an excellent job of explaining how comparing Tesla and Waymo autonomy is an "apples to oranges" comparison.
Bottom line: the systems (combined software and hardware) are designed to solve **different goals**. They have different requirements, and different abilities.
*WAYMO*: REQUIRES: a highly-detailed mapping => ABILITY: able to operate without a person in the driver's seat. IF UNABLE TO CONTINUE: pulls out of traffic and stops.
*TESLA* (Supervised): REQUIRES: a person who can take over at any time, with little advance notice. => ABILITY: able to drive anywhere ... but IF UNABLE TO CONTINUE: abruptly requires driver intervention.
Still waiting for Tesla (Robotaxi) to prove they can operate unsupervised without MANY more accidents per million miles than Waymo. OTOH, Waymo is NOT a solution anywhere that hasn't been hi-res mapped.
--------------------------
I'm skeptical Tesla, as designed, ever becomes a "works anywhere" level 4 autonomous vendor:
* Lacks redundant sensors. No Radar. No LiDAR. No thermal/infrared.
"better than the average human driver" isn't good enough for L4.
Their competitors use multiple sensor types to cope with poor conditions.
[Ask ChatGPT "Given only visual cameras, how can Tesla pass regulatory approval as an L4 system?"]
The details remind me of what a new aircraft model has to do, to be certified to fly with passengers. Have to demonstrate what the system does in the worst case scenarios. Have to collect a LOT of evidence from actual driving, and from test-driving designed to cause failure.