M1 and AMD GPU support. I'm personally more interested in the latter as I haven't yet upgraded my MacBook Pro and I expect my Vega 20 to be faster than the M1 at ML training.
The raw compute power of M1's GPU seems to be 2.6 TFLOPS (single precision) vs 3.2 TFLOPS for Vega 20. This can give you an estimate of how fast it would be for training.
Just for reference Nvidia's flagship desktop GPU(3090)'s FP32 performance is 35.5 TFLOPS.
So Apple would need 16x its GPU core count, or 128 GPU cores, to reach Nvidia 3090 desktop performance. Or roughly a 480mm2 die size and 192W TDP, excluding memory controller and interconnect.
Doesn't look too bad for Nvidia, especially when you consider 3090 is still on Samsung 8nm, which is equivalent to TSMC 10nm, compared to 5nm on Apple M1.
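For what it's worth, here's a rough back-of-the-envelope check on those numbers (a sketch only, assuming the commonly reported 1024 FP32 ALUs and ~1.278 GHz clock for the 8-core M1 GPU, and 2 FLOPs per ALU per cycle for a fused multiply-add):

# hypothetical sanity check, not official figures
alus = 1024
clock_ghz = 1.278
m1_tflops = alus * 2 * clock_ghz / 1000          # ~2.6 TFLOPS
rtx3090_tflops = 35.5
ratio = rtx3090_tflops / m1_tflops               # ~13.6x
print(f"M1 GPU ~{m1_tflops:.1f} TFLOPS; need ~{ratio:.0f}x the cores (~{8 * ratio:.0f}) to match a 3090")

Presumably the 16x / 128-core figure above is that ratio rounded up for headroom.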
If Apple could just scale up their GPU and trounce a 430B market cap competitor's premier product at 1/2 the power and 60% of the die size, that actually looks pretty bad for nvidia, doesn't it? Scaling is more difficult than that, and who knows if they could do it so easily, but who thought Apple would render both Intel and nvidia irrelevant?
Regardless, Apple's threat to vendors like that is their complete vertical integration. Ran some of the new object capture code (photogrammetry) on my M1 Mac yesterday and in no time at all the 11 trillion op neural engine blasted through and generated a remarkable model. We've seen Apple respond to discovered performance needs by plugging in a matrix engine, a neural engine, and scaling appropriately, dedicating cores and silicon to the greatest needs. They are in a very unique position relative to someone like nvidia who effectively throws something over a fence.
That's beside the point though. You can't pick and choose which workloads you're going to run on Apple Silicon, because the ultimate goal is that it will be able to compete with the rest of the industry in raw performance metrics, which is simply not the case right now. My M1 Mac's GPU still loses in several benchmarks against my 7-year-old 1060. If Apple wants to lure people like me into their pro segment, they're going to need to scale aggressively: something that ARM has notoriously had trouble with in the past.
Also, Apple desperately needs to support a real graphics API. Metal is a joke, and even the translation tools like MoltenVK, while impressive, still end up beholden to Apple's arbitrary limitations. If they don't end up supporting Vulkan on the M1, it's a moot point for me. You could have the most powerful GPU in the world, but I won't use it if it's bottlenecked by the shittiest modern graphics API.
"My M1 Mac's GPU still loses in several benchmarks against my 7-year-old 1060."
The GPU in the AS M1 is the fastest integrated graphics available in the mainstream computing market [1]. That is the competition, not a standalone, 120W GPU. Apple is purportedly now working on separating their GPU designs into a much larger heat and power profile (which contrary to some of the comments on here clearly isn't going to be for laptops, beyond an external TB4 enclosure) and it might just change things a bit.
Scaling a GPU is easier than scaling a CPU, by design. Apple's GPU has nothing to do with ARM.
And to your original point, yes, Apple does largely choose which workloads run on Apple Silicon and how. By controlling the APIs along with the silicon, Apple abstracts it to a degree that gives them enormous flexibility. The Accelerate and CoreML APIs are abstract vehicles that might use one or a thousand matrix engines, neural nets, or an array of GPUs. Apple has built a world where they have more hardware flexibility than anyone. And while close to no one is doing model training on Apple hardware right now, Apple has laid the foundation so a competitive piece of hardware could change that overnight.
[1] The SoC graphics of the chips in the PS5 and Xbox Series X are more powerful, but the GPU chiplet alone on those systems uses an order of magnitude more power and more die area than the entire M1 SoC. In another comment you mentioned that Zen 2 integrated graphics come close. They aren't in the same ballpark, with literally 1/4 or worse the performance. In discussions like this, unfortunately, the boring "n years old / n process" trope is used to excess, yet again there are zero competitive integrated graphics on the market. None. Apple isn't a GPU company, yet here we are.
This won't scale like this; also, for deep learning, CUDA and cuDNN will still probably be 2-5x faster than AMD/Metal drivers, as has been shown before (in the case of AMD's shitty deep learning drivers).
This. Nvidia's CUDA is a genuine technical marvel, and Metal really has nothing that can compete with it. I also get the feeling that unless Apple bites the bullet and supports Vulkan, they won't actually get any good GPU libs.
This is more for the Mac Pro line. Not so much the laptops.
We’ve only seen the Apple equivalent of the i3 with integrated graphics. It’s going to get interesting over the next several months as Apple unveils their middle and upper performance solutions.
I own an M1 Macbook Air. At its hottest, it's a fraction of the "normal" heat that my year old Intel based Macbook Pro runs at.
To get the M1 warm, I need to be charging it while also playing an Intel based game like the latest Subnautica, with the machine on my lap. Even then, it's not as hot as my Macbook Pro gets while unplugged and browsing the internet.
No. According to rumours, the high-end 64-128 core GPUs will be exclusively for the Mac Pro, and the MacBook Pro will have 16-32 core GPUs instead. I think it's fair to assume that we won't see many changes to the cooling of their laptops.
Vega 20 seems to also refer to a discrete GPU. This has been later rebranded to Radeon VII (maybe because of this confusion). The number you are quoting is for the discrete GPU.
Same with 10 LOL. VEGA10 codename is for the original Vega Frontier Edition/RX Vega 56/RX Vega 64. But now there's also "Vega 10" used as a description for the 10 Vega compute units on Ryzen APUs.
A large number of Apple devices in the field, and even still for sale by Apple, have AMD GPUs. It's most likely legacy support driving its inclusion -- if Apple abstracted the pluggable device, appropriately branching it for their existing AMD and new chips seems a given.
It's extremely doubtful any new Apple Silicon device will come out with AMD GPUs.
I wanted this for a long time, but eventually gave up on my MacPro5,1 that I had a Radeon VII in (before that, Vega Frontier, before that R9-280X).
Would like to see some benchmarks with that sort of hardware or with a 6900XT, driver support for which came to macOS only recently. Now I just have an M1 Mac Mini & a PC with Nvidia.
I tried it on a Radeon VII. It is a little hard to get running as it does not work with the latest kernel, but otherwise it kind of works, with some quirks. One of the quirks is that the first epoch of your training is very slow to start compared to Nvidia cards. Also, the speed of training is slow.
> Just for reference Nvidia's flagship GPU(3090)'s FP32 performance is 35.5 TFLOPS.
In the context of ML, nvidia’s flagship is the A100, which has 312 TFLOPS. You can also compare with a TPU device, which has 180 TFLOPS (v2) or 420 TFLOPS (v3). You can use at least the TPU v2 reliably on Colab for free.
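If anyone wants to try the free Colab TPU mentioned above, the initialization boilerplate usually looks roughly like this (a sketch only; it assumes a TPU runtime is selected in Colab and TF 2.x):

import tensorflow as tf

# Typical Colab TPU setup; only works when a TPU runtime is attached.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)
print("TPU replicas:", strategy.num_replicas_in_sync)

# Build/compile the model under the strategy scope so it runs on the TPU.
with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(10)])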
It's not really fair to compare a discrete GPU to a mobile GPU; I only provided this as a comparison for someone who maybe has one of these at home. And btw, you are talking about TF32 performance, not FP32. TF32 actually uses a reduced 10-bit mantissa (19 bits in total). The A100's FP32 performance is actually lower than the 3090's; it's 19.5 TFLOPS:
https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Cent...
FP16 performance is also relevant as a lot of people now train in FP16. The default for pytorch/TF is still FP32.
I’ve been mostly using TPUs lately and non-TF32 GPUs at work, so I don’t have any practical experience with TF32, but the sales pitch seems pretty good. Do you have any personal experience on whether it’s as much of a drop in replacement for fp32 as they suggest?
I haven't used TF32 personally, but I think the sales pitch is not too far off. Most of the time I use mixed-precision training which should be similar to FP16/TF32 in terms of performance. It does tremendously speed up training.
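For reference, a minimal sketch of what that looks like with TF 2.4+ and Keras models (mixed precision is a global policy switch; the TF32 toggle only matters on Ampere-class GPUs):

import tensorflow as tf

# Mixed precision: compute in float16, keep variables/optimizer state in float32.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

# On Ampere GPUs TF32 is on by default for matmuls/convs; turn it off if you
# want strict FP32 numbers to compare against.
tf.config.experimental.enable_tensor_float_32_execution(False)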
I have found the M1 air fine for web browsing but kind of hard to install software on.
Following the instructions:
-----
python -m pip install tensorflow-macos
...
ERROR: Failed building wheel for numpy
Failed to build numpy
ERROR: Could not build wheels for numpy which use PEP 517 and
cannot be installed directly
-----
(base) dave@daves-air ~ % uname -a
Darwin daves-air.lan 20.5.0 Darwin Kernel Version 20.5.0: Sat May 8 05:10:31 PDT 2021; root:xnu-7195.121.3~9/RELEASE_ARM64_T8101 arm64
`pip install --upgrade pip` fixed this for me. (not in tensorflow directly, but while installing something else on my M1 last week which required numpy)
OK, so we’re in mid-2021, why is installing Python THAT HARD? I think the only reason Node is so popular is because it JUST WORKS. Windows, Mac, doesn’t matter. One-click installer and you got NPM as well and access to thousands of packages.
As someone using nvm for work, I disagree. NPM can't be installed from normal package repositories, because it's outdated the moment any long-term support distribution accepts it. Then there's yarn, which is fighting for command line dependency management supremacy, with the exact same problem. I'm still not sure what npx does but I think it comes with NPM, unlike nvm which you use to manage NPM installs. I'm hoping I don't need to learn it because I expect some new javascript tool to replace NPM and yarn any day now, as those did bower and grunt before them.
I've also had to use nvm to install an old version of NPM for a specific project because otherwise one of the NPM dependencies couldn't compile a certain C++ executable that I apparently needed? There was also an incompatibility with some binary that another dependency downloaded that required me to mess with soft links to libraries in specific places.
I don't think either NPM or PIP are inherently hard to use as long as you keep them updated (which is exactly what the parent comment is suggesting to do) and as long as you don't need binary dependencies. When you end up in binary territory, which this type of software eventually will, you'll run head-first into stuff that requires arcane commands to get stuff to run.
While I don't think the problem is as bad as you describe, we should note that Node's initial release was in 2009, while python's was 1991. They pioneered a lot and Node was thus able to spring from quite hefty shoulders.
I consider myself a novice but competent Python programmer (not particularly skilled or expert) and every time I have to use something made in Python I cringe because I know it will require 30 minutes of futzing with things to even run it.
Python’s ecosystem is the worst for get-up-and-go usage.
Nvm or nodenv make managing node environments trivial compared to virtualenv, pip, and co. Same with rustup or rbenv… in fact Python is the only language I have this problem with.
I think the comparison with node is good, because my policy for both of them is the same - if I want to use either, I do it inside Docker.
I can't be arsed to deal with all the various version dependencies and incompatibilities and system-install vs local-install nonsense that comes with installing it on my actual computer.
Now you can activate your tensorflow env at any time by running this:
source ~/tfenv/bin/activate
I alias this to `tf2` in my ~/.zprofile file:
alias tf2='source ~/tfenv/bin/activate'
Open up a new terminal and run `tf2`. Now you're in a clean virtualenv, with none of the conda BS. The nice thing about this venv is that if you already have some libraries installed, you can just `import` them. No need to reinstall them for every venv, which I quite like.
So, the goal is to install tensorflow-macos and tensorflow-metal, but the problem is that their pip3 command is failing with some obscure numpy error.
The way I arrived at that command was to run `pip3 install --no-dependencies tensorflow-macos tensorflow-metal`, open a python repl, and try 'import tensorflow as tf'. If it threw an error about package_foo, I added `package-foo` to the end of the command.
That method worked for tracking down every library except absl (unknown library name). But googling pip3 install absl showed that it was named absl-py, not absl.
and it shows I have both a CPU and GPU device! I'm really hyped about that. I've wanted tensorflow GPU on my laptop for... about two years? more?
Note that it spits out a warning like this:
WARNING:root:Limited tf.compat.v2.summary API due to missing TensorBoard installation.
I'm going to leave it as-is, until problems pop up for me. But if you want to try to address it, try adding tensorboard to that pip3 install command above, and repeat the process I described to install any other dependencies.
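For anyone repeating the process above, here's a rough sketch of the "import it, see what's missing, install it" loop (absl -> absl-py is the only rename mentioned in the thread; a real run may hit others):

import importlib

KNOWN_RENAMES = {"absl": "absl-py"}   # module name -> PyPI package name

def find_missing(module="tensorflow"):
    try:
        importlib.import_module(module)
        print(f"{module!r} imports cleanly")
    except ModuleNotFoundError as err:
        pkg = KNOWN_RENAMES.get(err.name, err.name)
        print(f"missing {err.name!r}; try: pip3 install {pkg}")

find_missing()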
ERROR: Could not find a version that satisfies the requirement tensorflow-macos (from versions: none)
ERROR: No matching distribution found for tensorflow-macos
But it turns out that was using an x86_64 version of Python3 - setting up an arm64 version (under /opt/homebrew) worked.
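A quick way to check which flavour of Python you're actually running, since this trips a lot of people up:

import platform

# 'arm64'  -> native Apple Silicon build
# 'x86_64' -> Intel build (e.g. running under Rosetta 2 on an M1)
print(platform.machine())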
Is “break pip if there is a newer version available”, at least last I checked. (It uninstalls the existing pip but can’t complete the install of the new one.)
I’ve used the latter several times a month over the last year or so, but only since about pip 19.x, didn’t use Python much before a couple of years ago.
The first time I did 'import tensorflow', the import line ran for a long time and then complained about flatbuffers being missing.
Then I ran several commands to install flatbuffers - all of which I could have sworn claimed it was already there, but then I tried the import again and it worked.
'import numpy' works too so the env feels complete enough.
My python binaries are coming from ~/miniforge3/bin
I would love to hear more from other devs regarding where we're at with M1 for development.
How is Node/React development, for example? Last I heard, Postgres was good to go on native M1. I know Node 14 is good natively. Are there commonly encountered problems? I hate my 2017 MBP and am desperate to upgrade.
I run everything Python under Rosetta. The easiest way is to install Homebrew for Intel processors, then make an alias to that version for managing Rosetta stuff.
You’re missing out on a gigantic speed boost. If you haven’t checked lately check again, most of the big libraries have been recompiled over the last few months (tensorflow excluded apparently).
Maybe, but lots and lots of software applications are not even close to being hardware constrained or in this specific case Rosetta constrained. If you're trying to get work done and you have the computational overhead available, this absolutely sounds like the right solution. Even if it's less efficient, I can easily imagine situations where "I need this to work without thinking about it" could win out.
Yeah, I am not running code which is hardware constrained, and numpy was a real doozy to get working because of PEP 517. I found Rosetta pretty fast anyway. Certainly faster than my Ryzen 2600 on Arch Linux at running the same code.
So basically the software that you require to run and do your work is still not available for Apple Silicon? In this case, Apple Silicon support for Python libraries.
The whole point of Apple Silicon is to supersede its Intel Mac counterparts and to run software natively with a noticeable performance increase, especially with software with high performance requirements.
In the case of python users and the ecosystem around it, it is still not ready.
There's no need to be so angry about a hypothetical situation in which you buy a laptop that you never wanted. I just prefer simplicity right now. I'm a PhD student and I don't need the code to be super fast, I just need it to work right now. As it stands, I just got it all to run natively, so I guess the slight modicum of patience you don't have pays dividends :)
> There's no need to be so angry about a hypothetical situation in which you buy a laptop that you never wanted.
Nothing hypothetical about being an early adopter and then complaining and wasting months since the November 2020 release-day chaos because a lot of software is still unavailable or unstable, very basic software doesn't even run on the system, and the excessive disk writes on the M1 quickly wear out the SSD. All completely real, and it happened to many people since launch day.
So, what's the point of buying a laptop that doesn't even run your software in the first place? Might as well stay on your existing laptop.
You would rather wait N months for the software to mature so you can use the laptop reliably for your work than skip all of that, use your existing laptop, and get a better one later (an M2 Mac)?
Why would I want to wait for months for the developers to port their software and its library ecosystem to Apple Silicon or waste time with broken workarounds when I can use my existing laptop that already 'just works' with everything.
Mind you, I actually bought the M1 MBA recently to try it out and returned it at full price due to the software ecosystem not being ready. Not only did I save my money, I >5x'd that money in a recent investment anyway, and now I'm glad I did that.
I'm still trying to find a way to monitor the Neural Engine on my Macbook air M1, but the APIs are non-existent, there's barely anything in the docs and no answer from Apple. My models train fast, 3x faster than most i7 computers with GPU, which is excellent for a fanless ultraportable computer but I wish Apple would treat the NE as a 1st class citizen on these machines, with Mac SDK APIs and usage visualization in the Activity Monitor.
Can you back that statement up with anything, or at least clarify it? You seem to be suggesting a non-Mac i7 with a separate GPU. Also, just an FYI, "i7" says pretty much nothing. The i7s have existed since 2009.
I don't know. The statement is just so vague and ridiculous. The M1 is probably the worst hardware you could have picked in 2020-2021 if computational power was your main concern. For highly parallelizable work tasks, the top end GPU alone has 10x the computational power of the M1, and a top end CPU has around 4x the computational power of the M1. Not to mention a rather limiting 16GB of memory. That the M1 is computationally powerful is a myth started by exceedingly misleading marketing and reinforced with hard-to-compare benchmarks.
To possibly save someone the trouble, the responses to a comment such as this, from experience will be:
1. "power consumption is much much better on M1 than anything else." True, but, then again, your use case must then prioritize power consumption, and not computation power. So which is it? The use case for a compute cluster on a train is rather contrived.
2. "When apple scales up the M1 to more cores, they will magically be able to retain all the benefits possible with a low core count, and scale it up without any problems or compromises, just you wait.". Ok, I'll wait.
3. "It's not fair to compare a laptop with a desktop". Of course it is. The constraints for comparison are already stated: computation power being the main priority. If someone buys hardware to do heavy computations, you can pick and chose depending on your needs. If you need it to be a laptop, or you need it to draw little power, then I'm sure you can factor this in accordingly.
He's not wrong though. I own an M1 Macbook Air, and while there are some workloads that it can outperform my desktop at, it's still not even close to the level of functionality or compatibility of my other machines. Hell, most days I just end up tossing my Thinkpad in my work bag, just because the keyboard and OS get in my way less.
> Then don't buy a bloody M1. The M1 has always been Apple's entry-level efficiency-first processor.
No, I gotta disagree. Apple's marketing around the M1 was intentionally deceptive: they were forced to revoke their claim of having the "fastest CPU cores" after it was vehemently disproven. Their "faster than 97% of Windows laptops" conveniently didn't compare itself to AMD laptops or laptops with dedicated graphics. I'm just not really impressed. I seriously worry for Apple if this is all they were able to get out of the 5nm node on ARM. Considering how poorly ARM scales with higher TDPs, I don't think I want to see their "Pro stuff".
Why should they be forced to compare their integrated graphics to laptops with dedicated graphics? This is yet another uninteresting comparison. Of course a machine with some dedicated mobile 3080 is going to win the day in a head to head. But it's an absurd comparison because said machine will draw vastly more power and be heavier to boot. It's a different category of machine at that point.
> I seriously worry for Apple if this is all they were able to get out of the 5nm node on ARM
I think you're the only one who's worried.
Apple very clearly laid out their design goals with the M1: supremacy at performance per watt. And the fanboys protested with "I can build a faster desktop" or "my 4.5 lb laptop with dedicated graphics can get more FPS." It's just baffling. You've missed the whole point.
> My point is that this isn't even an interesting conversation to have. Your desktop can outperform a chip that runs an iPad? Cool story bro.
The desktop I'm comparing it to is 7 years old and cost $600 new. You can't even buy an M1 iPad for that price today.
> Their announcement explicitly said "when it comes to low power silicon."
Which is an arbitrary goalpost that means nothing. The M1 uses 7w at full tilt, does that mean we can compare it to an AMD 5800u running at the same wattage? It's a nothingburger, and that's why Apple doesn't use that zinger anywhere else in their marketing material.
> Why should they be forced to compare their integrated graphics to laptops with dedicated graphics?
Because that's what you can buy for $1000. That's the performance standard. If Apple wanted the M1 to be compared to machines with integrated graphics, they should have released a computer at that price point.
> Apple very clearly laid out their design goals with the M1: supremacy at performance per watt.
Sure, they have it. But I frankly don't care, and I have a hard time believing that other people do too. Performance-per-watt is Apple's neat way of giving performance a denominator, because they simply can't compete with the rest of the industry wholesale. It's something they've done over and over, insisting on pointless metrics like thinness and beauty to measure a product of objective capability. The datacenter market is looking at the M1 and laughing. Unless your business was already Mac-based, it's not like enterprise customers are going to be interested in beta-testing Apple's new hardware either. Honestly, I'm more impressed with Apple's social engineering than hardware engineering here.
> The desktop I'm comparing it to is 7 years old and cost $600 new. You can't even buy an M1 iPad for that price today.
Which desktop is this with what components? What's the metric you're comparing? Show me the benchmarks. This still feels like the world's dumbest comparison but hey I'm at least curious now.
> Which is an arbitrary goalpost that means nothing.
But it's the goalpost they very clearly set--you're the one who either missed it or willfully chooses to ignore it, which means you're the one being misleading, not them.
> Because that's what you can buy for $1000. That's the performance standard. If Apple wanted the M1 to be compared to machines with integrated graphics, they should have released a computer at that price point
A pure price comparison on its own is missing so many relevant factors and you know it. You can also buy a desktop that outperforms a smartphone that costs the same. Is that a fair comparison? How much energy do these machines consume, what's their battery life under load, how much do they weigh, etc etc. I don't just walk into Best Buy and say "here's a $1,000 bring me the thing with the best Geek Bench score, no other criteria."
>The datacenter market is looking at the M1 and laughing
What are you even talking about now. Since when was the datacenter in the design goals for the M1.
>But I frankly don't care, and I have a hard time believing that other people do too
No one cares about battery life or the weight of their laptop? This is news to me. I have a feeling it's news to all the people who buy ultrabooks, too.
>Performance-per-watt is Apple's neat way of giving performance a denominator, because they simply can't compete with the rest of the industry wholesale.
Because they're not trying to outcompete the entire industry in every form factor with the M1, since the goal is to provide stellar performance while still achieving efficiency that enables small form factor designs. You are exhausting with this "willfully missing the whole point" thing.
Again, they haven't even released the parts that are actually meant to compete at the higher wattages you're trying to compare against the M1.
I'll make an analogy--maybe it'll help. Car Company A just came out with a brand new sedan with industry best Miles Per Gallon. And beyond just the MPG rating, it can actually tow some pretty heavy loads, too! Heavier than other sedans in its weight class. Impressive stuff.
You're the guy with the F-350 dually pumping his chest saying "mine can still tow more!"
You broke the site guidelines repeatedly in this thread. That's not cool. Would you please review https://news.ycombinator.com/newsguidelines.html and make your substantive points respectfully in the future? That means no name-calling, no personal attacks, and no swipes.
Please don't respond to someone breaking the site guidelines by breaking them yourself. I know that's hard when you're feeling provoked, but it only makes this place even worse.
> The M1 is probably the worst hardware you could have picked in 2020-2021 if computational power was your main concern. For highly parallelizable work tasks, the top end GPU alone has 10x the computational power of the M1, and a top end CPU has around 4x the computational power of the M1. Not to mention a rather limiting 16GB of memory.
This statement is so banal that I am not sure how to comment on it. I never understood the logic of people who point out that an entry-level, low power chip targeted at ultraportable laptops and kitchen computers cannot compete with high-end desktops. It's just as ridiculous as to complain that AMD EPYC is too big to fit into a laptop.
M1 is obviously a terrible choice if you are looking for a deskbound HPC workstation. It's a terrific choice if you are looking for an ultraportable laptop with excellent battery life that you would still like to prototype your ML code on before running the full workload on a mainframe.
> That the M1 is computationally powerful is a myth started by exceedingly misleading marketing and reinforced with hard-to-compare benchmarks.
More like a myth perpetuated by people who like to take facts out of context. In terms of the underlying IP, M1 is terrific technology. It can deliver the same performance as state-of-the-art designs at a fraction of the power consumption. In terms of absolute performance, it's obviously an entry-level chip, and it performs exceedingly well compared to other offerings in this segment. And even outside its segment it is no slouch either. It runs my database scripts and builds code faster than my Intel i9 laptop, despite using 70% less power and half as many performance-oriented CPU cores.
> This statement is so banal that I am not sure how to comment on it. ...
> M1 is obviously a terrible choice if you are looking for a deskbound HPC workstation ...
So, in one breath you say my statement is banal, in the next you agree with it. The thing I take issue with is having to deal with the misconception that it is a replacement for a HPC workstation, because many (not you obviously) think it actually is a powerhouse.
You go on to repeat the stuff about power performance, which I've already listed, and you could have spared yourself the trouble.
If you thought a MacBook Air was going to replace a purpose-built machine learning workstation of course buying it would be a mistake, because it won’t do that. It only supports up to 16GB of RAM! But what other computer in that form factor comes close? The argument is that the higher efficiency will translate into more powerful chips in HEDT products too. I wouldn’t take that on faith but I think they have a decent chance of pulling it off. Apple doesn’t do transitions like this is if they don’t think they can knock them out of the park. There have been rumors about them switching the Mac to ARM for nearly 10 years at this point.
I don't know about the M1, to be honest, but I used to be an Apple customer and ardent fanboy (so embarrassing!) during the golden PPC age and remember very well Apple's inflated claims about performance, which all turned out to be false the minute they switched to Intel. So I agree with your comment, it's advisable to always take miraculous performance claims with a grain of salt.
They were misleading to false, depending on how you interpret it. The claims were true for certain specialized microbenchmarks and false in general. PPC programs running tasks that could be optimized for using Altivec instructions could be much faster than on comparable Intel chips, but those tasks were rare and normal programs were slower than on Intel.
Apple themselves changed their tune about the difference almost overnight.
Surely, with the M1s having been actually publicly available for many months now, there's enough data and benchmarks out there that we don't need to take Apple's performance claims with any salt?
I meant to write most i7 machines with a GPU where I trained my models. My post was about the lack of adequate SDKs for the NE on the M1. I don't know why the fuck you grabbed that tiny fraction and just ran off with it like I was writing an appleboy marketing manifesto on the M1 prowess.
I mean... you are doing what I find more than a few always seem to do on threads discussing the M1. Which is to subtly suggest that it is a computational powerhouse, with vague and non-verifiable metrics (your post that I originally replied to is a good example). Agreeing to disagree should be possible, and the request you got for a follow-up with some more specific details was polite, and should not have been that hard to deal with. It was also asked out of genuine interest in an answer, so I question the appropriateness of the language you responded with. And I might also remind you that, as far as answering the question goes, your reply did not.
If you haven't already, I'd suggest looking at https://news.ycombinator.com/newsguidelines.html. And I mean that too in a polite way, as it is easy to get carried away and assume the worst in anonymous conversations with strangers.
So, to summarize. Apple originally made highly misleading statements about the computational power of the M1, and since then this misconception has been repeated quite often. After a while, it gets annoying to see new posts on HN, week after week. Now, the suggestion that an M1 is 3x faster at training models than discrete GPUs can only be true if the comparison hardware is a decade old. As it stands, you were asked to clarify your original comment. After all, it's not unheard of for M1 benchmarks to disable the discrete GPU of the PC it is compared to, for "fairness". Limiting the training to a single CPU core for "even more fairness" would be par for the course.
So, I did not suggest you were an "appleboy", as you put it. I asked what kind of hardware you actually experienced a 3x speedup on, as the i7 goes back to 2009, and "GPU", well, further back.
In any case, and this might come as a surprise, feel free to not answer. But, then, I can only wish that you refrain from making the effort at a rude response.
> The constraints for comparison are already stated: computation power being the main priority
Who compares an ultrabook to a desktop for multithreaded computational power? Assuming you only want that, without any regard to power or space or cost, then by your logic you could buy 1000 Mac minis and the computational power would be more than any desktop computer.
I was able to install this fairly easily (much more so than the crap they dumped out here - https://github.com/apple/tensorflow_macos. Just take a look at the 200 GitHub issues that were ignored for the most part...)
I also noticed that in my project I got a decent speedup immediately when executing my model, but I have not run any benchmarks.
But, where do you go to file bugs? Ask questions? etc. I am not a big Mac developer, so is there something I don't know?
You can file bugs using the Feedback Assistant app in /System/Library/CoreServices/Applications, and you can ask questions on the Developer Forums at <https://developer.apple.com/forums/>. Both will require you to have a (free) Apple ID.
Anyone know if this shows up as an actual GPU device? The last tensorflow-macOS thing did not. If you list devices using that, you’ll see only one: CPU:0.
Does this give a GPU:0 device? You can check via:
import tensorflow as tf2
from pprint import pprint as pp

tf = tf2.compat.v1               # go through the TF1 compat layer to get a session
sess = tf.InteractiveSession()
pp(sess.list_devices())          # lists available devices; earlier builds showed only CPU:0
I’d check myself, but I’ve been so burnt by tensorflow 2 and M1 problems that I just don’t have the energy to figure out the inevitable compilation issues, and it sounds like at least one other person already has it running. Plus I’m on mobile.
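For what it's worth, the TF2-native check avoids the v1 session entirely (assuming tensorflow-macos and tensorflow-metal are installed as described elsewhere in the thread):

import tensorflow as tf

# Should list CPU:0, plus GPU:0 if the Metal plugin is active.
print(tf.config.list_physical_devices())
print(tf.config.list_physical_devices("GPU"))   # empty list means no GPU device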
Python 3.9.4 | packaged by conda-forge | (default, May 10 2021, 22:10:52)
[Clang 11.1.0 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
And it was released almost a month ago with the CVEs saying "these fixes will be backported to previous branches that are still supported" but releases for those branches haven't happened yet, so if you're on an older version and just want to get the security fixes you have a problem.
It is much, much slower than a dedicated GPU or TPU, so it is useful for "edge" inference eg. local speech or image recognition, but not for training new models. It is using shared memory with the CPU so I don't know how much memory you could eat up while training.
In short, you will probably get much better performance using Google Colab to train a deep learning model, than the M1.
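If you want a quick, unscientific comparison between the two, timing a batch of large matmuls on whatever device TensorFlow picks up is usually enough to see the gap (sizes here are arbitrary):

import time
import tensorflow as tf

x = tf.random.normal([4096, 4096])
_ = tf.matmul(x, x)                  # warm-up so one-off setup isn't timed

start = time.perf_counter()
for _ in range(10):
    y = tf.matmul(x, x)
_ = y.numpy()                        # force pending work to finish
print(f"10 matmuls: {time.perf_counter() - start:.2f}s")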
Installed these - which seemed to work bar a few messages about `numpy` not being installable - but trying to use `textgenrnn` ran into a whole bunch of Keras problems (for which the internet's answer is "use tensorflow.keras" except `textgenrnn` is already doing that...)
I yearn for the day when someone makes a nice, simple, "install this and python ML works fine with your GPU" package.
Given the difference in computational power between a computer with a top-end Nvidia card and a top-end CPU, and the M1, the numbers suggest the following: a heavy job that you start on such a system in the evening, say 8 PM on a Monday, and have ready the next morning at 8 AM, would not be ready on the M1 until the following Monday at 8 PM.
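Presumably the arithmetic behind that is the GPU (~10x) and CPU (~4x) figures from earlier in the thread added together, which works out to exactly one week:

hours_overnight = 12          # Monday 8 PM -> Tuesday 8 AM on the fast machine
combined_speedup = 10 + 4     # ~10x from the top-end GPU plus ~4x from the CPU
print(hours_overnight * combined_speedup, "hours")   # 168 hours = the following Monday, 8 PM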
The M1 itself is an SoC. It's composed of parts, however, which can stack.
Eg., just look at the M1 variations present atm: variable gpu cores.
The "chiplet" speculation is that given the M1's variable high-perf, low-perf and cpu cores, this can scale up.
Leaks at least are all pointing in a 32-core, Xeon-competitor direction. It is theoretically possible they could do the same with GPU count, and try to compete perhaps in the mid-pro gfx range.
Basically all silicon works like this. "Variable cores" actually means some cores are disabled. This is usually done to increase overall yields: chips with damage inside one of the cores can still be binned as the lower core count SKUs.
The "you can scale up" thing is actually just how e.g. Intel makes bigger monolithic chips (Xeon/HEDT) with the same or very similar cores as the desktop ones. Meanwhile AMD, actually using chiplets, can cheaply do something more like "scale out" in the sense that they put more of the exact same die on a package.
No. Nvidia still has a massive software lead over Apple, and probably will continue to until Apple gives in and supports Vulkan on MacOS. Furthermore, I find Apple's GPU performance on 5nm to be pretty disappointing. It's pretty comparable to the x86 Zen 2 integrated graphics, which came out 18 months before the M1, on laptops half the price, on the 7nm node. I think Apple has a long ways to go in GPU engineering before they can even stand toe-to-toe with the industry.
Would be nice to see performance comparisons on M1 Mac/iPad to see which way is more performant and efficient. (Admittedly, TF vs TF-Lite isn't a 100% apples-to-apples comparison, pun intended.)
Just running native models on OpenCL has existed for a while now, through PlaidML and other companies. Unfortunately you need CUDA support for custom kernels for a lot of modern architectures like Transformers, and AMD's CUDA transpiler leaves a lot to be desired.