Some time ago (around 10 years) this guy (the presenter) was internet famous for being a Rubik's Cube speedsolver and making tutorials and videos about that: https://www.youtube.com/watch?v=609nhVzg-5Q
I'll always know him as badmephisto. In a recentish reddit AMA, he says he still keeps a cube on his desk so he can practice a bit and not forget his algorithms.
What a blast from the past! I learned how to solve the Rubik's Cube blindfolded by watching him, back in the day. His tutorials are perfect and I've probably recommended his channel to ~50 people myself. Crazy he only has 36.6k subs.
The competition in this space is great, but I can't help but wonder what would happen if all these companies instead pooled their resources and went after the goal collectively. There is so much duplication going on, and the paths do not seem to me - as an outsider - to be all that divergent, which is usually the precondition for running many independent efforts in the hope that one of them will succeed.
It's as if everybody wants to be the one to exclusively own the tech. Imagine every car manufacturer having a completely different take on what a car should be like from a safety perspective. We have standards bodies for a reason, and given that there are plenty of lives at stake here, maybe for once the monetary angle should take a back seat (pun intended) to safety and a joint effort is called for. That would also stop people from dying because operators of unsafe software try to make up for their late entry by 'moving fast and breaking things', where in this case the things are pedestrians, cyclists and other traffic participants who have no share in the monetary gain.
> The competition in this space is great but I can't help but wonder what would happen if instead all these companies pooled their resources and went after the goal collectively.
It would probably slow down. 9 women can't have a baby in 1 month. Besides that, the disagreements about approach, politics, or eventual competitive interests would probably bring things to a halt for a long time.
I don't think the solutions to this problem are resource-constrained. Many companies would happily find more resources in order to be first to market with this technology.
Also agreed: when hedge funds don't silo their quants, instead of seeing 50 different strategies from 50 quants, they get 50 variants of the same strategy. Source:
Exactly. Look at the Human Brain Project: $1B and 10 years later, what exactly have they achieved? Just like you said, disagreements about approach, politics, or eventual competitive interests did bring things to a halt for a long time.
The human brain project was DOA from day #1. Unrealistic goals, no clear reason why more money would lead to better results and no concrete deliverables that anybody needed.
To me, the fatal flaw of HBP (or the USA's competing HBI) is that they are akin to "cargo cult science": the idea that we can replicate the superficial structures to a significant enough degree that the system they impart will suddenly somehow become activated.
But just like the Melanesians with their coconut-shell headsets, there won't be anyone listening on the other end...
If we replicate a car with a sufficient accuracy it will start. I don’t see why this wouldn’t apply to any other piece of machinery, including a brain.
HBP should have tried a simpler task first, e.g. replicate a fruit fly’s brain.
Having replicated the fruit fly's brain, what are you going to do?
Replicate the rest of the fruit fly so that you can install the brain in it? And then replicate the world so that your software-simulated fruit fly has a natural context in which to operate?
Or just stimulate the brain with random inputs not associated with any real-world stimulus?
I'm pretty sure simulating a fly's neural inputs would be a far easier task than simulating its brain. After verifying it works correctly, we would proceed to simulating a more complex brain, say a frog's. And so on.
To simulate the fly's neural input, you need to simulate the fly's entire environment, including fluid dynamics for the air around the fly, the physics of every object the fly interacts with, and hormonal responses to changes in the fly's blood concentration of various substances (O2, glucose, ...).
This already strains our technical capabilities (at least for the amount of money we are willing to spend on it).
For any animal whose behavior is the product of a lot of learning, and who deals with other such animals in daily life, you have to solve all the problems you had to solve for the fly, deal with a much larger connectome, and also deal with the fact that, until you can passably simulate a mouse, you will never be able to simulate the way a mouse learns to behave in the presence of another mouse.
No need to simulate any of the environment. You only need to record the neural inputs to a fly's brain. Sure, that's also challenging, but nowhere near as challenging as simulating the entire fly's brain. My point is, if you manage to accomplish the latter, the former would be a breeze.
> You only need to record the neural inputs to a fly’s brain.
I don't think that's true. As soon as your simulated fly's behavior diverges from the actual fly's, all of the recorded input after that point is invalid/useless because it will not match the simulated fly's position/orientation/whatever.
Also, how many dollars of investment and years into the future do you think we are from being able to record all of a fly's neural input while it is moving freely?
To start with, you only need to study input-response pairs (sensory input causing motor commands). You enter the neural inputs from a real fly into the simulation, and compare the responses of the real fly and the simulated one. Once you understand what's going on, proceed to sequences of inputs, and compare the sequences of responses. The goal is not to produce the identical sequence of actions by tuning the simulation; it's to understand how the actions are being computed from the inputs.
Having a detailed simulation like this would greatly accelerate Numenta-style research, where instead of piecing together information from published papers, you would get it straight from experiments you control.
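As a toy illustration of that comparison loop (everything here is hypothetical glue code; the hard part is the recording and the simulator, not this):

    import numpy as np

    def compare_responses(recorded_inputs, recorded_responses, simulate_step):
        # Feed recorded sensory inputs into the simulated brain and compare
        # its motor outputs against those recorded from the real fly.
        errors = []
        for stimulus, real_response in zip(recorded_inputs, recorded_responses):
            sim_response = simulate_step(stimulus)
            errors.append(np.linalg.norm(np.asarray(sim_response) - np.asarray(real_response)))
        return float(np.mean(errors))  # 0 means the simulation reproduces the behaviour exactly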
>> if instead all these companies pooled their resources and went after the goal collectively.
That would be a bad idea because, as in evolutionary processes, you need this diversity of ideas to locate better local optima, even if it takes longer.
Standards shouldn't emerge too soon. I think for self driving tech, at the current stage, competition is good because there are lots of unsolved questions. Competition will ensure the best tech is ultimately available to consumers.
Of course, it's not a binary choice. Things like data should probably be pooled but the use of data in tech should compete.
Ditto validation tech frameworks. If those are not standardized then people will not be able to make an informed choice about which solution is the safest other than to wait a decade and do a bodycount.
In the self-driving world, the duplication is necessary - different companies are taking different directions, and nobody really knows which will work out.
In the ML hardware world, the duplication is mostly unnecessary. People are developing their own inference ASICs because they're relatively simple (compared to designing a CPU from scratch, designing a TPU is pretty simple because there are so few operations, and no complex out-of-order execution), and you can't buy one off the shelf yet.
As soon as ML hardware becomes available to buy off the shelf without a massive price premium, everyone will switch to that.
I do research in the ML hardware field: there are currently a couple hundred designs to run convolutional NN inference. A couple of dozen have been built. They have pretty different underlying technologies (CMOS, floating gate, ReRAM/memristors, etc), different ideas (systolic arrays, analog crossbars, cache organization, lookup tables, data reuse, TDM, using spikes, etc), wildly different power (from microwatts to hundreds of watts), size, speed, precision, flexibility, cost, ease of use/integration, etc. This is just convnet inference. Lots more is needed to do training in hardware, again with multiple choices on how to do it.
So which one design do you suggest we all use for all our ML needs?
The duplication I see in the commercial world in ML inference hardware is in designs similar to the TPU... So a big ~128x128 accumulating mat-mul array, with enough memory throughput to get one operand in and out fast, and enough cache to store the other operand (weights) and switch between which weights are used, so the mat-mul array can very efficiently do larger matrix sizes.
Also lookup tables for a bunch of activation functions.
That basic design can efficiently implement nearly any neural net architecture as long as the layer sizes are at least 128x128 and fixed point is okay.
The other exotic designs you suggested are more academic research things, and not yet deployed at scale in anyone's datacenters.
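As a rough sketch of the TPU-like design above (not any vendor's actual implementation), the heart of it is just a fixed-point multiply-accumulate over 128x128 tiles, with the accumulation done in a wider type:

    import numpy as np

    def tiled_matmul_int8(a, b, tile=128):
        # Toy model of a TPU-style accelerator: int8 operands multiplied in
        # 128x128 tiles, accumulated into int32 registers.
        m, k = a.shape
        k2, n = b.shape
        assert k == k2
        out = np.zeros((m, n), dtype=np.int32)
        for i in range(0, m, tile):
            for j in range(0, n, tile):
                for p in range(0, k, tile):
                    out[i:i+tile, j:j+tile] += (
                        a[i:i+tile, p:p+tile].astype(np.int32)
                        @ b[p:p+tile, j:j+tile].astype(np.int32)
                    )
        return out

Everything else (activation lookup tables, weight caching, scheduling) is plumbing around this loop.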
We could see some of this play out in the China EV market in the coming years: state-sponsored subsidies around infrastructure standardization, combined with foreign investment and competition spurring innovation.
What I've seen personally is what can be loosely termed "emergent consensus". Historical competitors (and often it gets whittled down to two giants, such as Boeing and Airbus) will work in secret on research, but after years of experimentation they arrive at very similar outcomes. An optimal answer that could only be arrived at through constant trial and error, product evolution and iteration.
Regarding Karpathy's PyTorch presentation, I don't think anything that wasn't already public was revealed. The FSD board with custom NPUs is a work of art. I like that there are dual redundant streams. And the scale of the dataset is already well known: 4096 HD images per step!
If I had to speculate, the "Dojo Cluster" may be envisioned as an effort to share data and compute with industry partners as a cloud SaaS product and ancillary revenue stream. But that is pure speculation ;)
Fortunately, some companies do share a significant amount of what their cars have learned so far. Uber publishes a ton of papers about their self-driving research [0][1]. Waymo released an open autonomous driving dataset, and publishes papers as well [2][3].
Of course, papers and data aren't code. But I think a lot more is being shared than people realize.
I don't know about sharing tech, but there should definitely be a shared evaluation benchmark, and some kind of oversight agency should be involved. The idea would be: if you want to be permitted to operate an AV on public roads, you need to demonstrate that your vehicle's vision system can detect pedestrians and obstacles with near-perfect accuracy on a large shared image database, most of which is NOT distributed to researchers.
Competition is good. It keeps you honest. The Manhattan Project experienced competition, of the do-or-die type. It just wasn't internal; it came from the Nazis, Japan, and later the Soviets.
Right now, the competition in the self-driving area is metaphorically as close to do-or-die as you can have in peacetime. GM as a regular car manufacturer is toast; Cruise is pretty much their only hope. Uber is bleeding; if they pull self-driving off, they are kings. The German manufacturers are watching in disbelief as Tesla starts to eat their pie. Conversely, Tesla knows that in what they are doing (electric cars) they don't enjoy any fundamental moat. If the Germans get their act together, they'll be able to make equally performant electric cars, but true luxury. The only one for whom it's not really about survival is Waymo.
I wonder if the French civil nuclear electricity program that led to this level of low carbon emissions, or the TGV (high speed rail system), could be good examples of what you're asking for.
Maybe the companies actually building the nuclear plants and trains and rails were in competition?
There was a time in the medieval ages when alchemists were kidnapped by kings and held in chambers so they would only generate knowledge for them. This obviously led to a duplication similar to the one you describe, right up to calculus, where Newton kept the thing hidden in a drawer and then Leibniz had the same idea.
Once that kind of secrecy was gone our whole technical progress was accelerated, because people could build on the discoveries of other people.
Right now we are going back to the alchemist model in some ways (the highest profile people work for the big companies and don’t share their discoveries). This makes progress slower.
> the highest profile people work for the big companies and don’t share their discoveries
I have to strongly disagree with this for the specific case of AI/ML. The big company labs are publishing open access papers non-stop, often with code and sometimes even datasets. They're more open than some areas of academia, in fact.
Really? I heard the opposite from someone in the field, who told me that they do publish but it is never the really relevant stuff. I can’t really judge that myself to be honest
I work in the field. New and relevant stuff is published frequently by Google, Facebook and Microsoft (as well as smaller companies like NVidia etc). Apple very occasionally publishes too.
Leon Gatys, the guy who published A Neural Algorithm of Artistic Style, which revolutionized the entire industry while he was at a university, is now employed at Apple. They have published at least one thing that was semi-interesting, on cognitive functioning in older adults and how it can be expressed through smartphone usage: http://delivery.acm.org/10.1145/3310000/3300398/a168-gordon....
Interesting and certainly compelling, but not earth-shaking like style transfer has been.
Ideally everyone would collaborate on inputs and compete on outputs. All the data gathering, tagging, mapping etc could be put into a shared domain, and then after that the companies decide what to do with it and how to commercialise it.
Easier said than done, but I think it would strike the right balance between reducing duplicate work, and incentivising progress.
You can apply this line of reasoning to many markets, like the pharma or food industry, which also have safety concerns. It strikes me as the kind of initiative the EU attempts nowadays when realizing we are running behind on some tech and want to leverage the one possible advantage we have as a great centralizing power. Not too different from communist states, actually. I agree with the sentiment that redundancy seems wasteful, but it seems to me a necessary evil as a driving force in development, as with the right to private property in general.
I'm reminded of the Manhattan project and I don't think they would have succeeded in their goal if they had tried to run 10 of those at once. There just aren't that many really great scientists in a space this narrow.
They built complete factories for three different technologies, and finally delivered two totally different bomb designs.
Which is a sign they were pretty worried about choosing to put all their smart guys on one path, and picking the wrong one. And the number of smart guys they had was certainly a real constraint. Even under their very stressed circumstances, exploration and competition were important.
This is correct - read the fantastic "The Making of the Atomic Bomb" - the planners kept competition alive for as long as possible because no one really knew which path was going to work. In contrast, the Nazi project, among other reasons, failed because it focused nearly exclusively on heavy water and didn't have internally competing projects.
It's kind of a different class of problem. With self driving you can have someone like Hotz or the Cruise guys knock up a system with limited resources and have it work quite well. You just can't do that with developing nuclear weapons.
Also self driving is not a problem where you can put a bunch of geniuses in a room and have them calculate the correct design. There are too many unknowns. It needs experimentation and trial and error.
Fair point. And the moon mission. Autonomous driving is such a consumerist issue though, seems quite well suited for a free market dynamic. I'd rather see a great joint effort on fusion energy or something.
Talk about twisting the argument. The Manhattan project was in competition with the Soviets and the Nazis, it was less a result of some unified front of all the smartest scientists in the world, and more of a result of "if we don't succeed first, someone much worse probably will".
Your logic of "if there were 10 of them simultaneously, it wouldn't have worked out" is flawed and self serving in that if you ignore all competition and just divide a single competitor into any number of smaller entities, of course at some point they'll be too small to be viable.
"Progress" is a rhetorical device used to paint whatever actually happened in a positive light. But coincident with the positive elements of the rise of western society, is a whole lot of tragedy. Wars with unprecedented brutality and scale, genocide, factory farming, mass extinctions and the systematic destruction of the planet, selfishness as a defining cultural trait.
Replacing competition with cooperation may have accelerated the positive aspects of progress (e.g. science), and maybe slowed or prevented many of the negative ones.
I think the diversity you see in camera and lidar placement (and whether lidar is used at all) is worth it enough to have different paths forward. Tesla seems insistent that it can be done sans lidar. It's definitely worth it to see which approach works best.
>Imagine every car manufacturer having a completely different take on what a car should be like from a safety perspective. We have standards bodies for a reason
Roads are also governed by public bodies. Road signs are standardized and public.
I think the government should take a much larger role in defining self driving cars. For example, rather than using computer vision to recognize signs, signs could be active standardized beacons; instead of having to recognize lanes, they could be repainted with rfid chips that are trivial for cars to recognize and follow.
Avoiding driving into people is also something that was somewhat regulated by crosswalks with pedestrian lights. Would it be absurd for the crosswalk to know roughly how many people are at it, then broadcast this to the car, rather than having the car have to recognize them?
There are many things the government could do with transportation infrastructure that would benefit everyone, many of which are literally impossible for companies to do separately. Can you imagine if we had to wait until IBM (or Siemens or Google or Apple) got into the business of launching satellites before we got GPS? There is a good chance that to this day cell phones wouldn't know their location or give anyone any mapping applications.
To me, self driving cars are similar. Many parts of transportation are a public good.
> Would it be absurd for the crosswalk to know roughly how many people are at it, then broadcast this to the car
Yes, completely absurd. For one, many places are too sparse, poor, or unstandardized for this to be remotely economical. Two, people often don't cross at crosswalks (jaywalking). Level 5 AV is a thing where 99% coverage isn't good enough. You need a lot of nines.
That's why many think lidar systems are a crutch. If lidar can't work in snow or heavy rain (at least 1-2% of days in the north), then you need fallback which must still be >99% as effective to avoid incidents. But then why not just use the fallback?
Generating data? Sure. But for actual use in the data pipeline? Makes you rely on an ultimately untenable solution.
In sparsely populated areas smart roads could inform the car that there hasn't been any movement whatsoever in the whole area for hours. (Along the entire road and adjacent to the roads). How the car responds (driving somewhat faster, perhaps) could mean a large measure of safety as compared to when there are a large group of people about to cross a rural road at night that the car may or may not see visually.
Anyway it's just one example. RFID or similar embedded beacons in the road paint would make roads much easier for cars to follow. They are expensive for any one company to do but cheap for the government to do if it is done everywhere at once and lasts 5-10 years. Road signs that broadcast what they are (instead of needing to be "seen" and interpreted visually instead of through radio by the car), are similar.
Finally, a government standard could coordinate cars into a caravan, avoiding pileups for example, and giving the participating cars several advantages you can find by Googling "car caravan". (Though this might be achieved by industry.)
Less than Uber, but more than Waymo, who is only offering $20k-ish stock packages like a late start-up that expects to 10x. Depends on how you value Tesla stock, though. See levels.fyi
levels.fyi suggests Tesla pays higher total compensation given that the stock is liquid. The Tesla salaries posted there are less than what I’ve seen in Waymo offers, but having liquid equity is a major differentiator in this case. A Waymo IPO could be 5 years off.
Without more details it sounds like an algorithm problem you'd be expected to solve from prior knowledge. Stuff like a breadth first search from the start point, up through various path finding algorithms to applying heuristics (I believe route finding on roads exploits the road topology).
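For reference, the breadth-first-search starting point would be something like this (road graph as a plain adjacency dict; Dijkstra/A* are the same idea with a priority queue and a heuristic):

    from collections import deque

    def bfs_route(graph, start, goal):
        # Shortest route by edge count on an unweighted road graph
        # (dict mapping each node to a list of neighbouring nodes).
        queue = deque([start])
        parent = {start: None}
        while queue:
            node = queue.popleft()
            if node == goal:
                path = []
                while node is not None:
                    path.append(node)
                    node = parent[node]
                return path[::-1]
            for nxt in graph.get(node, []):
                if nxt not in parent:
                    parent[nxt] = node
                    queue.append(nxt)
        return None  # no route found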
Awesome presentation. Crazy that they're developing their own training hardware too. It's going to be a very crowded space very soon. Can they really stay ahead of everyone else in the industry? Can it really be cheaper to staff up whole teams to design chips for cutting edge nodes, fabricate them, build supporting hardware and datacenters and compilers, than to just rent some TPUs on Google Cloud?
I can see the case for doing their own edge hardware for the cars (barely), but I really don't think doing training hardware will pay off for them. If they're serious about it, they should spin it out as a separate business to spread the development cost over a larger customer base.
Also, I'm really curious whether the custom hardware in the cars is benefiting them at all yet. Every feature they've released so far works fine on the previous generation hardware with 1/10 the compute power. At some point won't they need to start training radically larger networks to take advantage of all that untapped compute power?
Watch the presentation from 6 months ago, where they explain the decision to build their own hardware for inference:
https://youtu.be/Ucp0TTmvqOE?t=4309
It's not surprising that they also build the hardware for training. Correct me if I'm wrong, but Google uses the same TPUs for training and inference, because the underlying operations are the same: multiply then add numbers. Once Tesla built the hardware for inference, the design of the hardware for training is probably similar.
Unlike Google's TPUs, Tesla has a specific use case for the hardware (computer vision for automotive), and maybe that means they can further optimize the computation pipeline with their own specialized hardware.
Very good video, it contains answers to many of the questions that people are speculating here about and other interesting things about Tesla's custom chip.
- It's under 100W so they can retrofit into old cars
- lower part cost, so they can do full redundancy with doubling the parts
- they estimated that 50 TOPS is needed for self-driving
- lower latency with batch size of 1 compared to TPU's 256
- GPU for post-processing
- security: only code signed by Tesla can run on the chip
- at the time (2016) there was no neural net accelerator chips
- some parts are built from bought IP (so not reinventing them) - probably things like the 12 ARM CPUs, LP DDR4 memory, video encoder, maybe the separate post-processing GPU too...
- physical size of the board is small
- performance example: on CPU 1.5 FPS, on GPU (600 gflop) 17 FPS, on Tesla's NN accelerator 2100 FPS
- Besides the convolution even the ReLU and Pooling is implemented in hardware
- Paying attention to the energy efficiency down to the arithmetic and data type usage.
- The silicon cost is less than their previous hardware (HW 2.5)
- old hardware 110 FPS new one 2300 FPS
- 144 TOPS compared to NVidia's Drive Xavier 21 TOPS
Google has Edge TPUs for use outside datacenters, and they don't support training. Neither do the chips Tesla made for their cars. It's a pretty different problem.
I wouldn't be so sure. Edge TPUs could be the exact same architecture as Google Cloud TPUs, but as you need less compute for inference than for training, they simply have fewer transistors on the die and could be underclocked.
In other words, Cloud TPUs could be the same architecture as Edge TPUs but scaled to a higher frequency and packed more densely.
Training is currently done in floating-point math, whereas inference can be done in fixed point without much loss of accuracy. Fixed point is ~10x cheaper in terms of power and silicon area for equal throughput.
Also, training requires a lot more RAM per unit of compute, since it needs to store all past layer activations, whereas for inference, that is unnecessary.
As far as I know, no player who has developed dedicated ML hardware (as opposed to using GPUs) uses the same hardware for both inference and training.
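A small sketch of why fixed point is enough for inference (illustrative only, not any particular chip's scheme): map floats to int8 with a scale factor, do the matmul in integer arithmetic, and rescale at the end.

    import numpy as np

    def quantize(x, bits=8):
        # Symmetric quantization: signed integers plus a single scale factor.
        scale = np.max(np.abs(x)) / (2 ** (bits - 1) - 1)
        return np.round(x / scale).astype(np.int8), scale

    w, x = np.random.randn(128, 128), np.random.randn(128)
    qw, sw = quantize(w)
    qx, sx = quantize(x)
    y_fixed = (qw.astype(np.int32) @ qx.astype(np.int32)) * (sw * sx)  # integer matmul, rescaled
    y_float = w @ x
    print(np.max(np.abs(y_fixed - y_float)))  # small error relative to the float result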
I think the size of the networks they are training might already be good motivation for developing custom hardware for training.
I would expect their training hardware to be something specifically aimed at optimizing memory bandwidth to support distributed training of their "shared" hydra feature extractor. It's interesting that the shared hydra feature extractor is able to converge as they keep adding more and more output predictions, under a training regime of interleaving asynchronous updates to the model from different predictor networks ...
Seems to me the formula they are pursuing with custom hardware might be to support a strategy of
1. keep adding more predictions based on same feature
2. Increase the span of time represented by batches used to train the recurrent networks
Both pursuits seem very data efficient in terms of the amount of training data they could conceivably collect per unit time of observation ...
Custom hardware with a problem specific memory architecture aimed at efficiently supporting training with very large rnn time slices could be developed that’s more about “make it possible to train this proposed model at all” rather than “make it faster/cheaper to train existing common model architectures”. When custom hardware is required to make it possible to train the model they want, the validity of the hardware development cost bet might end up being more about the effectiveness of the model they think they want than it is about maintaining general purpose performance parity vs any off the shelf hardware options ...
At Tesla's scale and priorities, they'd probably be less keen on using external cloud providers. Using TPUs at their scale would certainly require Google's AI consultants to supervise which isn't ideal for Tesla.
Not agreeing or disagreeing with their decisions, but if you have the resources, you can certainly design a custom chip that performs a specific type of task very well and beats other competitors. Nvidia's GPUs have to be reasonably good at training across different NNs. You could have a chip that's exceptionally good at training one or two specific types of tasks.
For most companies, this would be a bad idea. However, Tesla knows how to produce hardware.
> At Tesla's scale and priorities, they'd probably be less keen on using external cloud providers.
Not sure if it’s still the case today, but previously Tesla’s training was done on-prem and with their own in-house Tensorflow clone.
And yes, if you get TPUs from GCloud, you are likely to be working with their engineers to get things working. Those engineers tend not to have much business conflict of interest, though. They want to help you because your problems are likely more interesting than what they’d otherwise be assigned.
Their own page says, "our recommendation algorithms ... learning characteristics that make content successful ... optimize the production of original movies and TV shows ... optimize video and audio encoding, adaptive bitrate selection, and our in-house Content Delivery Network ... and advertising".
At first glance, they don't seem to be problems in which ML can have a huge impact. Netflix in the US has a catalogue of about 4000 movies plus a few hundred TV series. It's tiny; compare that with 153,707 items on a niche recommendation website (criticker.com).
Characteristics that make content successful... here again, it's mostly decent-quality content plus marketing. I doubt the scripts are reviewed by NNs.
I have no idea about the network and delivery stuff, but I guess a well designed network can take care of most of it.
My strong impression is that, similar to other well known cases (Uber, WeWork) Netflix is a mostly traditional company (a media company) that very strongly wants to be seen as a tech company.
Netflix put a lot of work into recommendation back in the DVD delivery days, when their catalog was absolutely massive. Now in the streaming space the catalog they license is much smaller, so recommendation is less important; basically they just advertise the popular stuff for your demographic.
It's a bit odd that legally one can rent out physical disks, but there is no corresponding way to legally get permission to rent out streaming content without negotiating with the rightsholder. But that's how it is...
> It's a bit odd that legally one can rent out physical disks, but there is no corresponding way to legally get permission to rent out streaming content without negotiating with the rightsholder. But that's how it is...
It's hard to think of a good fair regime to do this under.
At least with a physical disk, there's a maximum reasonable rate that you can turn the disk around between users and you need to have enough copies for whatever the lifecycle peak demand is.
We do have a compulsory licensing system for audio works that are purely songs. But with video works, there's not a clear boundary for "how big" the work is: how do you treat a 30-hour anime series vs. a 5-minute Pixar short? How do you treat continuing medical education videos vs. fluff amateur-made content? Etc.
Nothing crazy about it. TPU-like stuff is ~10x the energy efficiency of GPUs and several times the speed. When you're spending megawatt-hours and days to train a single model, it adds up in both real and opportunity costs.
Also, Google TPU TOS prohibits the use of TPUs for stuff that competes with Google (and I'm assuming with other companies under Alphabet umbrella), at Google's sole determination. Not that it would be a good idea to upload Tesla's proprietary data into Google Cloud even if it did not. Cloud, after all, is just somebody else's computer.
> TPU-like stuff is ~10x the energy efficiency of GPUs
10x is probably overstating it when talking about newer GPUs because they have ML hardware in them now. Also, that still doesn't make it a good idea to build your own chips because there will soon be many third party options to choose from. Doing your own chips is a bet that you will out execute dozens of companies ranging from startups to industry giants. Simply taking your pick of the best commercially available options is likely to be a better choice in the near future.
Yep that's the clause. The clause itself is not that problematic for Tesla. What's problematic is that it can be changed over time, and it'd be foolish to single-source something as important as deep learning compute without the option to go elsewhere. Not to mention the rather extravagant Cloud pricing. So Tesla is taking a page out of Steve Jobs' playbook and it will control its own core tech. That's smart, especially considering that they already have bits and pieces of the IP that they'll need.
That clause doesn't apply at all though. It's not even for the same product.
As for single-source, the models are written in PyTorch, not TPU machine code, and they're pretty standard models anyway (e.g. Resnet-50). They can easily transfer to other hardware if necessary. There's not a ton of lock-in there. It doesn't justify the massive costs of ASIC development just to avoid this nonexistent lock-in and imaginary TOS clause.
GPU advantage: a more refined ecosystem, and you can buy them for <$1000 or get laptops with them built in; and if NVDA has sweated more software engineering blood and tears than GOOG into your model's functions, it will run better on them.
TPU advantage: Colab has a free tier that lets you play with them at no charge; and if GOOG has sweated more software engineering blood and tears into your model's functions, it will run better on them.
All IMO of course. And deep down it can get more complicated than that, but I salute GOOG for being the first company to ship competitive AI HW, doubly so at scale.
Stuff like their TPU and Waymo's Honeycomb Laserbear (something along those lines... their lidar naming scheme is pretty long) shows that Google is making good products for a limited set of people.
TPU? Seems like it has a lot of potential, but not for people directly competing with them.
Waymo's Laserbear lidar? Seems like it has a lot of potential, but not for AV companies directly competing with them.
Google's playing this game pretty fiercely... which given their size is pretty bad/daunting.
>Nothing crazy about it. TPU-like stuff is ~10x the energy efficiency of GPUs and several times the speed. When you're spending megawatt-hours and days to train a single model, it adds up in both real and opportunity costs.
could you share the stats on this? Google told me to use a K-80 for training.
Also, the software part of it (NNs and their algorithms) has been so widely researched and published that competitive advantages here are harder to come by than in hardware R&D.
Also, vendor lock-in is a huge challenge in the cloud space. I don’t think Tesla would be comfortable with the fact that all their training data sits on a potential competitor’s datacenter.
A car is a hardware device as well, and an electric car does not have the kind of power budget that allows you to throw oodles of standard pieces at it without paying a severe penalty in range.
Compared to the energy needed to move the car, everything else is pretty irrelevant. Power-hungry features like heating/AC only make a few percent difference to range.
From autonomy day hacker news comment: "Pegasus consumes about 500Watts, compared to under 100 Watts for Tesla's FSD computer. Elon in particular emphasized the performance per watt (as it's always possible to cram more chips to increase performance if you ignore cost and power consumption).
The comparison made in the video: 500Watts for an hour consumes about 2-3 miles of range. In a city in slow traffic, going 12mph, that's a significant range reduction. So you might have a 10% improvement in range for the Tesla ASIC in low speed conditions" [0].
Also, Tesla is/was planning on running these chips even while not in autonomous mode, in order to spot new scenarios and record what the car sees and what the human does, so they can collect a lot more unique road scenarios to train their models with.
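Back-of-the-envelope check of those numbers, assuming roughly 250 Wh per mile of consumption (ballpark for a Model 3):

    wh_per_mile = 250      # assumed average consumption
    speed_mph = 12         # slow city traffic, as in the quote above

    def range_penalty(computer_watts):
        # Fraction of range lost to the self-driving computer at a given speed.
        miles_lost_per_hour = computer_watts / wh_per_mile
        return miles_lost_per_hour / (speed_mph + miles_lost_per_hour)

    print(range_penalty(500))  # ~14% for a ~500 W Pegasus-class system
    print(range_penalty(100))  # ~3% for Tesla's ~100 W FSD computer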
Increasing their compute per watt lets them fit more compute into the same power budget, so they will pursue it hard; the low-speed energy savings are a way of making the range extension seem like a big deal when it isn't (maintaining highway speeds requires tens of kilowatts).
It was a big deal for Tesla, though, because they had to fit into a tiny power budget. HW2.0 wasn't enough for what they wanted to do, and to retrofit existing cars they had to consume a similar amount of power.
Power efficiency does make a range difference and is worth seeking, but Tesla is exaggerating this, IMO, to conceal one of the ways that the churn-heavy cycle for FSD has imposed organizational costs (an entire chip-level hardware program, in this case).
Designing ICs is a very expensive treadmill. If you can buy fast enough hardware to do your job, it's really hard to justify spending a lot for something that may be nominally a little better on some metric.
There's commodity computing of comparable throughput that other automakers can incorporate without too much cost, but it doesn't fit into the existing TM3 vehicles because it has a larger footprint in power, space, and cooling than HW2.
Here, Tesla spends a bunch of money to develop HW3 and overcome this problem, but the advantage in a new car design between HW3's efficiency and commodity hardware is limited.
>Also, I'm really curious whether the custom hardware in the cars is benefiting them at all yet. Every feature they've released so far works fine on the previous generation hardware with 1/10 the compute power.
The latest OTA finally brings a hardware v3 only feature, traffic cone visualization, and traffic cone automatic lane change.
I would guess that while the new hardware has the same features, the accuracy might be lower on the old GPUs because they are forced to use smaller networks or to run them at lower frame rates.
Looks like they are really nicely orchestrating workloads and training on numerous nets asynchronously.
As a person in the AV industry I think Tesla's ability to control the entire stack is great for Tesla... maybe not for everyone who can't afford/doesn't have a Tesla.
>maybe not for everyone who can't afford/doesn't have a Tesla.
Affordability is not as much of an issue as some make it out to be. Cost-wise it's like owning a Camry or an Accord, if you go for the lower end models. If you mean not everyone can afford a new car, then sure I agree with you.
Edit: if you think I'm wrong about this, please explain or ask me to clarify anything?
As a small anecdote, my parents couldn't afford/didn't want to spend over $30k on a car. Surely we could've gotten a Tesla for $5k+ more, but given the relatively new infrastructure of electric charging stations (and the fact that none are available at the apartment I live in), my parents didn't find all the new cool features appealing and instead got a regular Toyota Sienna that has nothing fancy, just enough to take the family around.
Similarly, I believe the infrastructure around electric charging stations hasn't fully matured yet, and as a result many people who already own a car will stick with gas cars, since there's no huge incentive to change unless it becomes easier to charge (faster, more convenient).
Do note that I don't have a drivers license. I never intend on getting one (I believe in what I do in the AV industry). I'm just guessing on the habits of people, not that I have any real experience in buying a gas/electric car.
Also note I didn't think you were wrong, not sure why the downvotes.
I think a good majority of people, especially in America, will go along the lines of "if it ain't broke, don't fix it" for gas cars.
Regardless, I think Ghost Locomotion and Comma.ai have a lot of potential for what they're doing now. I think they'll coexist with fully driverless cars like Cruise, Waymo, or Aurora, regardless of whether they're electric or not (I think electric cars will be more heavily adopted if we use cars as a service).
Also, yes, I don't ever intend on getting a license. This problem is really fascinating and I'm excited to play a little part in it. By not having a license, I can spend more time having the perspective of a person in 20-30 years, when driver's licenses will be less common, and use that in my work.
>I can spend more time having the perspective of a person in...
This is awesome. I've always found that only by really experiencing the future (or some part of the leading edge that is soon going to become the future for most people) do you gain much more understanding of it, ahead of others, and can apply it in your life planning. It would be worth living this way even if only temporarily just to get that, as you say, perspective... Not ready to give up driving forever, heh, but I might have to try a short sample of that non-driving lifestyle sometime.
I'm still amazed that Tesla's team isn't using a map... I know maps get outdated and are sometimes wrong, but having inaccurate knowledge of what's around the corner is far, far more helpful than not having any clue what's around the corner.
The smart solution would be to consider a map a probabilistic thing, which neural networks are really good at handling.
I'm still amazed Tesla has decided not to use lidar and instead just stick with cheap cameras. Better sensors are there, they're available, they're cheap and they can probably "see" better than plain old cameras... it doesn't make too much sense not to use them IMHO. But then again, I am not coding NNs for Tesla...
While LIDARs are certainly 'better' from a technological standpoint than not having anything, from a business standpoint it's less clear.
LIDARs are cheap, but not yet cheap enough to avoid seriously affecting the bottom line if you put them into every car. It would also kill the resale price of cars without it, which in turn hurts the company's image and stock price.
>“Lidar is a fool’s errand,” Elon Musk said. “Anyone relying on lidar is doomed. Doomed! [They are] expensive sensors that are unnecessary. It’s like having a whole bunch of expensive appendices. Like, one appendix is bad, well now you have a whole bunch of them, it’s ridiculous, you’ll see.” https://techcrunch.com/2019/04/22/anyone-relying-on-lidar-is...
You're underestimating the cost issue. Obviously if it was literally 0% extra cost they would have a value benefit. The problem is making a great and cheap and profitable electric vehicle.
lidars are used to generate humongous amounts of labeled training data for depth perception networks so that you don’t have to use them during inference
Interesting that they don't have a full 3D world model. I'm certainly not a machine learning expert. I'm still amazed the route from image recognition to a 2D map of "what's drivable" to autonomous driving is so direct. One would expect to hit a ceiling really soon with that approach.
One thing I didn't quite understand is how training sub-graphs in parallel works. If you are editing a sub-graph of a monolith type model, aren't you affecting other graphs that have dependencies on the one you're editing? If these are independent graphs, then what's a "sub-graph" even mean?
In PyTorch you have full control of the graph and weights; everything feels like Python.
So feeding some of the learning between "sub-graphs" is easy. Not sure if this is possible in TensorFlow/Keras?
He describes the sub-graph training in the context that they have all the predictors in one big model, and with control of the network they can feed forward through and train a sub-graph (read: sub-part) of the model.
This is possible in Keras: just derive new models that are functions of a monolith model and train them independently. I still don't understand the point though. If you train a "subgraph", the other tasks dependent on that part of the graph will have to get retrained anyway, since those edits will affect the other tasks.
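For what it's worth, a minimal PyTorch sketch of how I read the "sub-graph" idea (module names invented): one shared backbone, several heads, and you freeze everything except the head you want to update, so the other tasks are untouched.

    import torch
    import torch.nn as nn

    class Hydra(nn.Module):
        # Toy HydraNet-style model: one shared backbone, several task heads.
        def __init__(self):
            super().__init__()
            self.backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
            self.lane_head = nn.Conv2d(16, 1, 1)
            self.sign_head = nn.Conv2d(16, 10, 1)

        def forward(self, x):
            feats = self.backbone(x)
            return self.lane_head(feats), self.sign_head(feats)

    model = Hydra()

    # Train only the sign head: the backbone and other heads stay fixed,
    # so tasks that depend on them are unaffected.
    for p in model.parameters():
        p.requires_grad = False
    for p in model.sign_head.parameters():
        p.requires_grad = True

    opt = torch.optim.Adam(model.sign_head.parameters(), lr=1e-3)
    x = torch.randn(2, 3, 64, 64)
    target = torch.randint(0, 10, (2,))
    _, sign_logits = model(x)
    loss = nn.CrossEntropyLoss()(sign_logits.mean(dim=(2, 3)), target)
    loss.backward()
    opt.step()

If the shared backbone is also being updated, then yes, the other heads drift, which is presumably why one would interleave updates from the different tasks (as described upthread) rather than train them in isolation.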
For those who want to learn more, I would start with Mask-RCNN where you have a very similar architecture: one shared backbone with multiple heads that can be retrained for various tasks (bounding boxes, masks, keypoints, etc): https://youtu.be/g7z4mkfRjI4?t=628
- TensorFlow is great at deployment, but not the easiest to code. PyTorch wasn't frequently used in production until recently.
- If you have the resources for great AI engineers and researchers, your team will be good enough to build and deploy with either framework.
- Preference goes to whichever framework your tech leads prefer.
- Lots of new academic research is coming in PyTorch
- TensorFlow is undergoing a massive change from 1.1x to 2.0; if you choose TensorFlow, write on 1.1x just to then refactor to TF 2.0? Or write on TF 2.0 now and deal with all new edge cases? Or write in PyTorch (easier) but handle the more difficult deployment process.
- ML code quickly rots. Bad PyTorch code is just bad Python code. Bad TensorFlow code can be a nightmare to debug.
- PyTorch's eager execution makes coding NNs much easier to prototype and build.
Not an expert, but as far as I understood, PyTorch is much better for building new models, while with TensorFlow it's easier to assemble the predefined blocks. Source: somewhere in the motivations for why the Fast.ai course switched to PyTorch for the second edition.
Because PyTorch literally triples researcher productivity. Imagine a deep learning framework which you can actually debug when something goes wrong and which you don't have to fight every step of the way to do even simple things. That's PyTorch.
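A trivial example of what that debuggability looks like in practice: intermediate tensors are ordinary Python objects, so you can print them or drop into pdb in the middle of a forward pass.

    import torch
    import torch.nn as nn

    net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
    x = torch.randn(3, 4)

    h = net[0](x)
    print(h.shape, h.mean().item())   # inspect any intermediate value directly
    # import pdb; pdb.set_trace()     # or stop right here in a debugger
    out = net[2](net[1](h))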
The good news for me is that the upper bound for fully autonomous self-driving cars is no more than 50 years away. What a time to be alive. If it happens before then, that will be an absolute bonus.
He is an excellent presenter who really has a passion for teaching.
I'm not really involved with the industry, so I can't really speak to how he holds up against other experts. However, he is by far the most digestible resource I have found for learning about NNs and the science behind them.
If you are just discovering him now, google his name and just start reading. His work is truly binge worthy in the most meaningful way.
The description of SmartSummon about halfway through the talk is interesting. One of the views looks like SLAM using a particle filter, but Andrej seems to say that it's done entirely within a neural net.
Pytorch is used to train models on servers/cloud, not to drive the car later. The trained model is converted to something native to the embedded environment of the car.
That might be a trick. It’s not the trick human brains use. It might be equivalent to the way that when we say we want to “look at” a thing, we often also want to touch it.
The parent meant: gather training data with lidar and camera, then build a model with that data to learn to reconstruct a 3D space from camera data only, and then embed that model in the cars.
Tesla is already using a model to rebuild a 3D space from Camera data only, the parent suggests to improve the quality of the transformation with high quality 3D representations from Lidar.
2D to 3D transform is simple trigonometry (using stereo / motion) and should be possible to learn without lidar. I think this is already a solved problem. One option though is to add lidars in random Teslas (e.g. 1/1000) to help with the labeling / learning.
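For reference, the rectified-stereo case is just similar triangles, Z = f * B / d (the focal length and baseline below are made-up example values):

    def stereo_depth(disparity_px, focal_px=1000.0, baseline_m=0.3):
        # Depth from a rectified stereo pair: Z = f * B / d.
        return focal_px * baseline_m / disparity_px

    print(stereo_depth(10.0))  # a 10-pixel disparity -> 30 m with these example parameters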
Andrej Karpathy taught cs231n. The question is about Sam Altman, who didn't teach anything related to ML. He's on the board of OpenAI because he was one of its founding investors.
Thanks, that makes sense. So now the question is - how did Sam become a YC/Angel investor from someone who taught a class at Stanford? I think we need an interview with Sam.
I've never heard that that was the reason for Elon leaving the OpenAI board. The official announcement said
"As Tesla continues to become more focused on AI, this will eliminate a potential future conflict for Elon."
Do you have a source?
Elon belittles lidar, saying it is doomed and will never work, yet Waymo and Cruise will probably be operating self-driving taxi fleets in California next year. Tesla deserves getting dumped on for those comments because they are nowhere near self-driving.
Andrej Karpathy only started working on Tesla's software 2 years ago; before him, what Chris Lattner did was a mess (he wanted to just have one task that magically learns everything), so Andrej had to start everything from scratch.
Waymo had a 20 year advantage, but Google lost many key people there in the meantime as Larry Page didn't want to launch partial self driving.
I think both approaches are great and I wouldn't want to choose between the 2, just be a happy user of the end result of the competition.
> before what Chris Lattner did was a mess (he wanted to just have 1 task that learns magically everything), Andrej had to start everything from scratch
Sorry, I can't find the source, I'm reading Elektrek all the time and often there are people in the comments section with knowledge about what happens inside Tesla.
If you look at self driving automation as a black box, sure.
But at the same time, people understand that on a highway Tesla Autopilot is safe enough to be used on a long boring road, and drivers generally feel less tired (and can focus more on the harder parts of the road).
Did you really follow how drivers reacted to the changes before/after Karpathy came there? People were extremely dissatisfied with the first versions of Autopilot 2 (as it was worse than Autopilot 1 when it appeared), but it improved a lot, and now it's definitely better.
I recall everyone saying Tesla "FSD" was awesome and industry leading, I see them saying that today, and I expect they'll be telling their grandchildren the same while being driven around in a Waymo.
In all seriousness, I like Karpathy and his work. Seems like a good guy. Not sure why you'd want to work to enrich a guy like Elon Musk though.
> Not sure why you'd want to work to enrich a guy like Elon Musk though
If you look at the alternative (Dieselgate), which is killing millions of people every year thanks to air pollution (sadly I'm highly affected by it, trying to not go near any polluted area is what my life is about at an age of 37), I'm happy for Elon Musk and I hope Tesla's mission will succeed as soon as it is possible.
Waymo has a significantly different approach, using lidar rather than just vision. The approaches seem to have different strengths and weaknesses. Waymo is actually able to do full autonomy, but in very restricted environments - basically semi-deserted suburbs. Tesla's Autopilot works in real city rush-hour traffic, but not reliably enough to be let loose on its own. It remains to be seen which will win, or if it will be some other solution.
Just listening to this talk scares me. The number of errors - even on a seemingly normal, sunny day - makes it mind-boggling that people trust this crap.
How can we rely on the output of eight cameras? This is not a kid's science project.
It's all fancy neural networks until someone dies. Pretty callous and Silicon valley-mindset for such an important and critical function of the car.
No, it needs to be a lower rate since it will kill at random. Today’s rate includes drunk drivers, people on their phone and other “unsafe” drivers. If you are an attentive driver your chance of death would actually go up if the overall death rate was the same.
I’m maybe missing your point, but this is reflected in the overall death rate now. Similarly in the future the overall death rate will include deaths of the occupants of self-driving cars and also the cars/people that they hit.
The perception will be different however, since in the public eye the drunk driver “had it coming” and the only real loss is the other driver/passengers/pedestrians. If a self-driving car kills its own occupants and the occupants of another car then there are more “innocents” than in the drunk-drive scenario.
To really be a measurable improvement over humans, and especially an improvement that is statistically distinguishable from just letting the safety tech of 2019 percolate into the average car, self-driving needs to achieve fatality rates around 0.1 per billion miles.
Current averages for the US are around 10 per billion miles and decreasing; best countries in Europe are already below 5 per billion miles.
There's some really interesting progress both on vehicles monitoring driver attention, and on monitoring for alcohol in the air, that would yield substantial improvements even beyond 2019 tech. I have no doubt we'll see fatality rates below 1 per billion miles in Western Europe within a decade.
The corollary of the statistics quoted above is that you need to observe your self-driving vehicle system over tens of billions of miles before you even know if you're safer or not.
I'd quibble with your stats that self-driving has to be 100x better than a human to be a measurable improvement. You can estimate fairly well whether a human driver is safe or not from a few hours as a passenger, by seeing if they notice everything and whether they have near misses. Also, it's not too hard from published data to see that Tesla Autopilot seems a fair bit worse than humans under the same conditions. I imagine they will improve.
I'm not saying it has to be 100x better than a human.
What I'm saying is, the averages today are across all vehicles on the road, some of which are old and have fairly low survival probability in an accident. And the deployment of technology to handle/avoid distracted driving is still in its infancy. If we just wait 10-15 years, I believe the safety rates for humans will improve by 10x over current US statistics.
Then self-driving has to manage another 10x improvement over that, in order to be worth it.
"It's also mind boggling to think we currently trust organic tissue to do this crap, some of which is bathed in psychoactive chemicals."
The appropriate standard for self-driving cars is a sober professional driver, not the average idiot on Saturday night. It isn't unusual for a driver to go a million miles without an accident, when that's their job.
No, the appropriate standard for a self-driving car is the driver they replace. If they're safer than whoever would have been driving (be it on a saturday night or not) then they're a net win. Right now the average idiots are buying more Teslas than mail carriers and cab drivers, though that might change I guess.
If you like numbers, then you'll like this one: Humans achieve 7 9's of reliability when driving a car, as measured for fatalities by time, or 8 9's if you measure by miles.
I've yet to see a computer just stay up that long, never mind actually doing anything the whole time.
(If you use the collision rate instead, the numbers are about 500 times worse - but that's still quite a lot of 9's.)
People have this fixed idea that humans are terrible drivers. They are not! Computers have their work cut out for them to exceed the reliability record humans have set.
But that's not what I measured. I measured safe minutes of driving as a ratio to unsafe minutes. Which is a unitless number.
For 7 9's I assumed a crash had a 5 minute lead-in of unsafe driving before the actual crash, and that average driving speed was 30 mph.
If you assume the bad driving lasts 30 seconds (for example, in the Tempe Uber crash the car saw the pedestrian around 30 seconds before crashing), then you can add another 9, making both figures 8 9's of reliability.
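Plugging in the assumptions above (roughly 10 fatalities per billion miles, a 5-minute unsafe lead-in per fatal crash, 30 mph average speed):

    fatalities_per_mile = 10 / 1e9       # ~US average, from the figures quoted upthread
    unsafe_minutes_per_crash = 5         # assumed bad-driving lead-in before a fatal crash
    avg_speed_mph = 30

    # By miles: fraction of miles that do not end in a fatality.
    miles_reliability = 1 - fatalities_per_mile             # 0.99999999  -> 8 nines

    # By time: unsafe minutes vs. total minutes driven per fatality.
    miles_per_crash = 1 / fatalities_per_mile                # 100 million miles
    minutes_per_crash = miles_per_crash / avg_speed_mph * 60
    time_reliability = 1 - unsafe_minutes_per_crash / minutes_per_crash  # ~0.999999975 -> ~7.6 nines

    print(miles_reliability, time_reliability)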
Thanks for the clarification. I haven't checked your numbers but they seem reasonable.
FWIW, I think deaths, regardless of fault, are probably the best number for an apples-to-apples comparison. If the Uber car had been driven by a person without a dash-cam, there is no way they would have been ruled at fault.
Similarly, I have been involved in 5 collisions (most were not my fault). Of the 5, 3 were reported to insurance and only 1 was recorded by the police. (For 1 other, the police were called, but the dispatcher (on the non-emergency police line) said "Are both cars driveable? (Yes.) Is anyone injured? (No.) Then don't bother us!") By comparison, in California even a single-car collision on private property must, in theory, be reported to the police.
Deaths, at least, will get recorded by the CDC, if nobody else.
[edit]
However, I don't think that deaths are necessarily the best number for considering "failures": all deaths are in some way a failure, but most failures do not result in deaths. Nobody even had to see a doctor for the 5 collisions I mention. I've fallen asleep at the wheel and crossed the center line without getting in an accident, &c. Humans make a lot of mistakes, but most of them do not result in a collision, and most of the ones that result in a collision do not result in a death.
I also think it's inevitable that computers will become better at driving than not just the mean or median driver, but 90th percentile or more, at which point it would be immoral to not, in some way, encourage self-driving cars over human-driven cars. My guess is that we are less than a decade away from that point, but I could be wrong.
Certainly the lack of a safety culture in the automotive field (as compared to, say, avionics) is a huge barrier to be overcome, and that's a social hurdle, not a technical one, and I can only make wild-ass guesses as to when that will change.
You are right, this approach is scary, and it is astonishingly inaccurate (I'm a Tesla owner).
However, the reason is not the eight cameras. You should be able to drive fine with just one camera (thought experiment: could you drive a car 1000 miles from you, just by seeing what the driver of that car would see, with no extra cameras, sensors or lidars?).
One stereoscopic ultra-HDR 4K camera would be fine... if it was backed by a strong AI. To even suggest that ML is anywhere even remotely close to this level is the height of hubris.
How do we currently regulate and put behind bars the human drivers who drive these metal torpedoes like idiots? Specifically the ones who are not caught.
I am honestly scared even of Waymo; I think unless the NHTSA gets its act together this shit shouldn't be allowed on roads.
It's often reactionary - people wait for someone to die and then suddenly you have a nervous Musk in front of Congress and such crap. Why wait? Why can't these be regulated before being allowed on roads?
And no, failover control is not acceptable given the past incidents and deaths.
Why are people so committed to the idea that self-driving cars are anywhere near human standards? It just seems like a groundless assertion of faith to me.
Professional drivers can go for a million miles without an accident, and I don't believe anyone's autonomous driving software can get within an order of magnitude of that without a disengagement, even in favorable conditions.
You may disagree that a disengagement is equivalent to a human having an accident, but I strongly feel that it is. In either case, you have a situation where the driver reached a point where it was definitively unable to determine an appropriate next action.
The only way to train these things is on actual roads. As far as I know, all of these systems require a human driver present at this time. None are fully autonomous.
But there will always be people who do not get this.
Those people should not drive Teslas, or pretty much any modern car for that matter.
If you are under the impression that you would be relying on this system to drive the car, then I agree you should not get a Tesla.
Of course full self driving is coming at some point, but that's a conversation for another day. Meantime Tesla is making steps toward it very incrementally with things like the "stop mode" rolling out right now.
This tech is interesting but so poorly understood that it's just using the (public) roads as one large alpha test. Given a NN, there is no way to verify what safety margins are there. For instance, if each camera slightly changed exposure or occlusion, would the results change smoothly? All they can do is try it and hope the inputs are in a safe part of their optimization space.
Autopilot is a driver assist. I can explain why there are fewer accidents when driver assist is on:
- the driver is still present and will most of the time intervene and save his life (there are some YouTube videos where the Tesla AI was trying to kill the driver, and those incidents don't appear as accidents in the stats)
- "autopilot" is engaged mostly on highways, but the statistics do not account for this
- they are also comparing the safety of a new car versus the median of all cars (old and new, cheap or expensive) across all demographics (how many teens own a Tesla?)
I would like Tesla to make all the data open, or have an independent group analyze it, including the disengagements. I wonder how isolated the cases are where the car was driving into a truck or a stone wall but the driver intervened and saved themselves, and also saved Tesla from a bad statistic.
Watching the AI visualization of Smart Summon in action was horrifying, and made clear why many have reported summon mode as resembling a drunk person navigating a parking lot.
Yes it's not perfected yet, which is why it requires human supervision for now. Having it operate in the wild as it is now (again, under human supervision) will help it become less horrifying, which I think you would agree is what we want.
I don't agree, as I don't particularly value/want a future with self-driving vehicles.
I also think the way Tesla is going about this is utterly idiotic and reckless. It upsets me that I'm sharing roads with vehicles having these dysfunctional immature systems.
Point this crap at video game engines and don't let it anywhere near real people until it can drive millions of virtual miles in something like GTA without hitting anyone/anything and without behaving like a drunk driver who lost their glasses.
>I don't agree, as I don't particularly value/want a future with self-driving vehicles.
If bad human drivers killed only themselves, that would be one thing, plenty of Darwin awards to go around. But they don't. I do value reducing traffic fatalities to near zero. I think that would be fantastic.
The GTA idea is a good one. I think they've done that, many times over, but I could be wrong.
The way Tesla is doing it is interesting. Have human supervision, so even though the system may behave, as you say, like a drunk driver who lost their glasses, at least the human is there keeping it in check.
Humans aren't perfect, but I've found (and statistics have shown) that the system plus a human is better than a human alone. That alone should make you less upset. These cars are less reckless than cars driven by humans.
Human supervision is just one element of the very smart way Tesla is going about this. The other element is incremental changes. A lot can be said about that but I won't now. Teslas are getting small and big safety improvements that roll out over time as full self driving gets closer and closer. This should make you feel better.
If you see a Tesla in autopilot, it won't be tailgating. If you see people tailgating in a Tesla, you can be sure autopilot is not turned on. Even with the follow distance set to the lowest, the distance kept is larger than what is done by most humans. This should also make you less upset.
When driving in stop-and-go traffic near Teslas with Autopilot on, if the owner sets the follow distance to, say, the middle range, you'll find you are often able to merge in front of them, because they allow space for that (and for safety). This should help too.
Over time, the system is getting better and better. Eventually, it will all be ok!