Vgpu_unlock: Unlock vGPU functionality for consumer grade GPUs (github.com/dualcoder)
468 points by fragileone on April 9, 2021 | 132 comments



> In order to make these checks pass, the hooks in vgpu_unlock_hooks.c will look for an ioremap call that maps the physical address range that contains the magic and key values, recalculate the addresses of those values into the virtual address space of the kernel module, monitor memcpy operations reading at those addresses, and if such an operation occurs, keep a copy of the value until both are known, locate the lookup tables in the .rodata section of nv-kernel.o, find the signature and data blocks, validate the signature, decrypt the blocks, edit the PCI device ID in the decrypted data, re-encrypt the blocks, regenerate the signature and insert the magic, blocks and signature into the table of vGPU capable magic values. And that's what they do.

I'm very grateful I wasn't required to figure that out.


I love the conciseness of this explanation. In just a few sentences, I completely understand the solution, but at the same time also understand the black magic wizardry that was required to pull it off.


Not to mention the many hours or days of being stumped. This sort of victory typically doesn't happen overnight.

What bugs me about companies like NV is that if they just sold their hardware and published the specs, they'd probably sell more than with all this ridiculously locked-down nonsense. It's just a lot of work thrown at limiting your customers and protecting a broken business model.


Is a “broken business model” one that requires you to pay for additional features?

If Nvidia enabled all their professional features on all gaming SKUs, the only reason to buy a professional SKU would be additional memory.

Today, they make almost $1B per year in the professional non-datacenter business alone. There is no way they’d be able to compensate that revenue with volume (and gross margins would obviously tank as well, which makes Wall Street very unhappy.)

That’s obviously even more so in today’s market conditions.

Do you feel it’s justified that you have to pay $10K extra for the self-driving feature on a Tesla? Or should they also be forced to give away that feature for free? After all, it’s just a SW upgrade. (Don’t mistake this for an endorsement...)


> Do you feel it’s justified that you have to pay $10K extra for the self-driving feature on a Tesla? Or should they also be forced to give away that feature for free? After all, it’s just a SW upgrade.

I feel like I already paid for the hardware. If Tesla says it's cheaper for them to stick the necessary hardware into every car, I'm still paying for it if I buy one without self-driving. Thus, if I think the Tesla software isn't worth 10k and I'd rather use openpilot, I feel like I should have the right to do that.

But nvidia is also actively interfering with open source drivers (nouveau) with signature checks etc.


(Leaving aside that Tesla also actively prevents you from running your own software on its platform...)

The whole focus on hardware is just bizarre.

When you buy a piece of SW that has free features but requires a license key to unlock advanced features, everything is fine, but the moment HW is involved all of that flies out of the window.

Extra features cost money to implement. Companies want to be paid for it.

A company like Nvidia could decide to make 2 pieces of silicon, one with professional features and one without. Or they could disable it.

Obviously, you’d prefer the first option, even if it makes absolutely no sense to do so. It’d be a waste of engineering resources that could have been spent on future products.

Deciding to disable a feature on a piece of silicon is no different than changing a #define or adding “if (option)” to disable an advanced feature.

By doing so, I have the option to not pay for an advanced feature that I don’t need.

I don’t want the self-driving option in a Tesla and I’m very happy to have that option.


I agree with you, but I also think the commenter you're replying to agrees with you.

The issue is not that Tesla FSD should come with the hardware, the issue is that if I buy the hardware I should have the right to do whatever I want with it, and so we shouldn't leave aside that Tesla prevents us from running our own software.

This is relevant to the NVidia situation since their software doesn't add features, it limits things the chip is already capable of. Just like Tesla won't let you run Comma.AI or something similar on their hardware...


>When you buy a piece of SW that has free features but requires a license key to unlock advanced features, everything is fine, but the moment HW is involved all of that flies out of the window.

This doesn't make sense at all. In your scenario you always pay for what you get, and developing additional features has a non-zero cost associated with it (unless you download software which targets unskilled consumers, like Chessbase's Fritz engine, which was essentially Stockfish but $100 instead of $0).

>Extra features cost money to implement. Companies want to be paid for it.

This doesn't make sense in your scenario either. You already have the silicon with the 'advanced features' in your hands. The reason they lock the feature is so that you have to buy a more expensive card, with overpowered hardware that you don't need, in order to use a feature that all cards would have if it weren't disabled. The only reasonable explanation at this point that doesn't involve monopolistic practices to make more money (nothing wrong with that) is that the development of the feature itself was so prohibitively expensive that it required consumers to pay for much higher-margin cards in order to offset the development costs. Which is what's happening.

>A company like Nvidia could decide to make 2 pieces of silicon, one with professional features and once without. Or they could disable it.

That would cost a lot of money. All the more reason to think it was done to upsell more cards instead of offering quantitative improvements at a different price.


But it's not like buying a software key, is it? I want to play Windows games in a VM, not compute protein folding or whatever. I'd pay 15€ more for a 1060 to unlock advanced virtualization features, but why should I have to pay 600€ more to get a Quadro just for that one feature?


Yes, NVIDIA is actively malicious on that front.


> Is a “broken business model” one that requires you to pay for extra additional features?

Yes. Who actually likes being segmented into markets? We want to pay a fair price for products instead of being exploited.

> If Nvidia enabled all their professional features on all gaming SKUs, the only reason to buy a professional SKU would be additional memory.

So what? A GPU is a GPU. It's all more or less the same thing. They would not have to lock down hardware features otherwise.

> Today, they make almost $1B per year in the professional non-datacenter business alone. There is no way they’d be able to compensate that revenue with volume (and gross margins would obviously tank as well, which makes Wall Street very unhappy.)

Who cares really. Pursuit of profit does not excuse bad behavior. They should lose money every time they do it.


Having to pay more for a product than what you’re willing to pay is exploitation.


More like selling me hardware with a built-in limiter that doesn't go away unless I pay more and they flip the "premium customer" bit.


Are you against buying a license to unlock advanced software features as well, or do you have the same irrational belief that only products that include a HW component shouldn’t be allowed to charge for advanced features?

Would you prefer it if companies made 2 separate silicon designs, one with virtualization support in HW and one without, even if it would reduce their ability to work on advancing the state of the art due to wasted engineering resources?

Or would you prefer that all features are enabled all the time, but with the consequence that prices are raised by, say, 10% for everyone, even though 99% of customers don’t give a damn about these extra features?


> Are you against buying a license to unlock advanced software features as well

I'm against "licenses" in general. If your software is running on my computer, I make the rules. If it's running on your server, you make the rules. It's very simple.

> do you have the same irrational belief that only products that include a HW component shouldn’t be allowed to charge for advanced features?

When I buy a thing, I expect it to perform to its full capacity. Nothing irrational about that.

> Would you prefer if companies made 2 separate pieces of silicon designs, one with virtualization support in HW and one without

Sure. At least then there would be real limitation rather than some made up illusion.

> even if it would reduce their ability to work on advancing the state of the art due to wasted engineering resources?

The real waste of engineering resources is all this software limiter crap. They shouldn't even be writing drivers in the first place. They're a hardware company, they should be making hardware and publishing documentation. Instead they're locking out open source developers, adding DRM to their cards and blocking workloads they don't like.

> Or would you prefer that all features are enabled all the time, but with the consequence that prices are raised by, say, 10% for everyone, even though 99% of customers don’t give a damn about these extra features?

That is how things are supposed to work, yes.


[flagged]


Not really? I don't really care how much money they burn on useless stuff. You brought up misuse of engineering resources so I pointed out the fact they didn't actually have to write any software. All they have to do is release documentation and the problem will take care of itself.


> If Nvidia enabled all their professional features on all gaming SKUs, the only reason to buy a professional SKU would be additional memory.

> Today, they make almost $1B per year in the professional non-datacenter business alone. There is no way they’d be able to compensate that revenue with volume (and gross margins would obviously tank as well, which makes Wall Street very unhappy.)

You're looking at it wrong. If Nvidia were to enable all features on their hardware, they wouldn't be giving up that additional revenue, they would instead have to create differentiated hardware with and without certain features.

Their costs would increase somewhat (as currently their professional SKUs enjoy some economies of scale by virtue of being lumped in with the higher-volume gaming SKUs), but it would hardly be the catastrophe you're describing. The pro market is large enough to enjoy its own economies of scale, even if the hardware weren't nearly identical (which it still would be).


But they’d also sell fewer high-end models. I don’t doubt that they’ve done the math.


> they'd probably sell more than with all this ridiculously locked down nonsense

It's currently impossible to find any nVidia GPU in stock because the demand far outstrips the supply.

Market segmentation is only helping their profit margins, not hurting them.


I have a feeling that they've done the math and have realized what makes them the most money.


Their business model is not broken. Not yet. With hardware unlocking software like vgpu_unlock, we can break it.


Count on driver updates breaking this workaround


If they break it, the people who really need this feature will simply not upgrade. Companies run decades old software all the time, this isn't going to be any different. It's just like nvidia's ridiculous blocking of cryptocurrency mining workloads. Once the fix is out, it's over.

Also I have no doubt people will find other ways to unlock the hardware.


Stuff like video games often requires driver updates to function, and gaming is a major use case of this hack. Not to mention that older Nvidia drivers do not support newer Linux kernels.


> Count on driver updates breaking this workaround

If the workaround results in enough money being left on the table, this might prompt 3rd party investment in open source drivers in order to keep the workaround available by eliminating the dependence on Nvidia's proprietary drivers.


Also, I wonder what kinds of skills are required to figure this out? I don't think just knowing Linux kernel internals would be enough.


The actual "trick" behind this is well known and has been done for quite some time. One could actually solder in a different device id in the GTX 6xx series [0] or flash a different VBIOS on the GTX 5xx ones. The real achievement here is implementing this in software without touching the GPU.

This is not to downplay the OP, of course - this is truly great and I'm sure it was a lot of work. But the hardware part is not new.

[0] https://web.archive.org/web/20200814064418/https://www.eevbl...


Amazing! Simply amazing!

This not only enables the use of GPGPU on VMs, but also enables the use of a single GPU to virtualize Windows video games from Linux!

This means that one of the major problems with Linux on the desktop for power users goes away, and it also means that we can now deploy Linux only GPU tech such as HIP on any operating system that supports this trick!


> This means that one of the major problems with Linux on the desktop for power users goes away, and it also means that we can now deploy Linux only GPU tech such as HIP on any operating system that supports this trick!

If you're brave enough, you can already do that with GPU passthrough. It's possible to detach the entire GPU from the host and transfer it to a guest and then get it back from the guest when the guest shuts down.


This could be way more practically useful than GPU passthrough. GPU passthrough demands at least two GPUs (an integrated one counts), requires at least two monitors (or two video inputs on one monitor), and in my experience has a tendency to do wonky things when the guest shuts off, since the firmware doesn't seem to like soft resets without the power being cycled. It also requires some CPU and PCIe controller settings not always present to run safely.

This could allow a single GPU with a single video output to be used to run games in a Windows VM, without all the hoops that GPU passthrough entails. I'd definitely be excited for it!


> GPU passthrough demands at least two GPUs

It doesn't. As I said, you can detach the GPU from the host and pass it to the guest and back again. I elaborated a bit more in another comment [0].

> This could be way more practically useful than GPU passthrough.

I think that depends on the mechanics of how it works. How exactly do you get the "monitor" of the vGPU?

[0]: https://news.ycombinator.com/item?id=26755390


You can use "Looking Glass" to access the screenbuffer of the vGPU. Basically, directly read the memory of the guest OS.


It only requires 2 GPUs if you plan on using Linux GUI applications as you game on Windows. Besides, any shared single GPU solution is going to introduce performance overhead and display latency, both of which are undesired for gaming. Great for non-gaming things though - but generally you don't need Windows for those anyways.


> an integrated one counts

From experience, not always. If the dedicated GPU gets selected as the BIOS GPU then it might be impossible to reset it properly for the redirect. I had this problem with a 1070.

I have to say vGPU is an amazing feature, and this possibly brings it to the "average" user (as average as a user doing GPU passthrough can be).


Certainly, but this requires both BIOS/UEFI fiddling and it also means you can't use both Windows and Linux at the same time, which is very important for me.


I run a Gentoo host (with dual monitors) and a third monitor on a separate GPU for Windows. I bought a laptop with discrete and onboard GPUs, and discovered that the Windows VM now lives in msrdp.exe on the laptop, rather than being physically interacted with via keyboard and mouse. I can still interact with the VM if there's some game my laptop chokes on, but so far it's not worth the hassle for the extra 10% framerate. It's amusing because my laptop has a 120hz display, so "extra 10% FPS" would be nice on the laptop, but hey, we're not made of money over here.

Oh, I got sidetracked. I have a kernel command line that enables the IOMMU and "blacklists" the set of PCIe lanes that the GPU sits on, so the kernel never sees it, even when it's in use. The next thing I had to do was set up a vfio-bind script that just tells QEMU which GPU it's going to use. Thirdly, and this is the unfortunate part since I forgot exactly what I did - there's some weirdness with Windows in QEMU with a passthrough GPU - you have to registry-hack some obscure stuff into the way Windows handles the GPU memory.
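For reference, the guts of a vfio-bind script are just a few sysfs writes; something roughly like this (a minimal sketch, not my actual script - the PCI address is an example, run as root):

    # Rough sketch of a "vfio-bind": detach a PCI device from its current driver
    # and hand it to vfio-pci so QEMU can pass it through. Example address only.
    from pathlib import Path

    ADDR = "0000:01:00.0"                          # the GPU's PCI address (see lspci -D)
    dev = Path("/sys/bus/pci/devices") / ADDR

    unbind = dev / "driver" / "unbind"
    if unbind.exists():
        unbind.write_text(ADDR)                    # detach from e.g. the nvidia driver

    (dev / "driver_override").write_text("vfio-pci")    # next probe should use vfio-pci
    Path("/sys/bus/pci/drivers_probe").write_text(ADDR) # trigger the (re)probe

    # To give the card back to the host afterwards, clear driver_override and reprobe.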

If I am not mistaken, 95% of all of my issues were solved by reading the Arch Linux documentation for QEMU hosts/guests. My system is a Ryzen 3600, 64GB of RAM, 2x NVMe drives + one M.2 SATA drive, a GTX 1060 and a GTX 1070. Gentoo gets 16GB of RAM (unless I need more, in which case I just shut down Windows or reset the guest memory) and the 1060. Windows gets ~47GB of RAM, the 1070, a wifi card, and a USB sound card. One of the things you quickly realize with guests on machines like this is that consumer-grade motherboards and CPUs are garbage: there aren't enough PCIe lanes to, say, pass through a bunch of USB or SAS/SATA ports, or a dedicated PCIe sound card, or FireWire. If you have an idea that you'd really like to try this out as an actual "desktop replacement" - especially for replacing multiple desktops - I recommend going to at least a Threadripper, as those can expose 4-6 times as many PCIe lanes to the host OS, meaning the possibility of multiple guests on multiple GPUs, or a single "redundant" guest with USB ports, SATA ports, and PCIe sound/FireWire/whatever.

Why would anyone do this? dd if=/dev/sdb of=/mnt/nfs/backups/windows-date.img . Q.E.D.


I built a PC with two decent GPUs with the intention of doing this (one GPU for windows in the VM, one for Linux running on the host). It works great performance-wise but any game with anti-cheat will be very unhappy in a VM. I tried various workarounds which work to varying degrees but ultimately it’s a huge pain.


> Amazing! Simply amazing!

If it's such a cool feature, why does NVidia lock it away on non-Tesla H/W?

[EDIT]: Funny, but the answers to this question actually provide way better answers to the other question I posted in this thread (as in: what is this for).


Entirely for market segmentation. The ones they allow it on are much more expensive. With this, someone could create a cloud game streaming service using normal consumer cards and dividing them up, for a much cheaper experience than the $5k+ cards that they currently allow it on. The recent change to allow virtualization at all (removing the code 43 block) does allow some of that, but does not allow you to, say, take a 3090 and split it up for 4 customers to get 3060-like performance for each of them at a fraction of the cost.


The RTX A6000 is at USD 4650, with 48GB of VRAM and the full chip enabled (+ECC, vGPU, pro drivers of course)

The RTX 3090, with 24GB of VRAM is at USD 1499.

Consumer dGPUs from other HW providers do not have virtualisation capabilities either.


Well, I believe Intel has it on iGPUs, just very well hidden.



I am interested in the recent change you are referring to. Is there a good article on how to use it on Windows or at least Linux?


The OP is referring to a GPU passthrough setup [1], which passes a GPU from the Linux host through to a Windows guest (e.g. for gaming). This is done by detaching the GPU from the host and passing it to the VM, thus most setups require two GPUs, since one needs to remain with the host (although single-GPU passthrough is also possible).

Nvidia's driver used to detect if it was running in a VM and return error code 43, blocking the card from being used (for market segmentation between GeForce and Quadro). This was usually solved by either patching the VBIOS or hiding KVM from the guest, but it was painful and unreliable. Nvidia removed this limitation with the RTX 30 series.

This vGPU feature unlock (TFA) allows the GPU to be virtualized without first being detached from the host, vastly simplifying the setup and opening up the possibility of having multiple VMs running on a single GPU, each with its own dedicated vGPU.

[1]: https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVM...


Because otherwise, people would be able to use non-Tesla GPUs for cloud compute workloads, drastically reducing the cost of cloud GPU compute, and it would also enable the use of non-Tesla GPUs as local GPGPU clusters - additionally reducing workstation GPU sales due to more efficient resource use.

GPUs are a duopoly due to intellectual property laws and high costs of entry (the only companies I know of that are willing to compete are Chinese, and only as a result of sanctions), so for NVidia this just allows for more profit.


Interestingly, Intel is probably the most open with its GPUs, although it wasn't always that way; perhaps they realised they couldn't compete on performance alone.


Openness usually seems to be a feature of the runners up.


I think AMD is on par with Intel, no?


AMD do have great open source drivers, but their code merges lag further behind compared to Intel's. Also, at least a while ago, their open documentation was quite lacking for newer generations of GPUs.


Trivial arithmetic will tell you it's not the cost of the hardware that makes AWS and Azure GPU instances expensive.


Yeah, but now the comparison for many companies (e.g. R&D dept. is dabbling a bit in machine learning) becomes "buy one big box with 4x RTX 3090 for ~$10k and spin up VMs on that as needed", versus the cloud bill. Previously the cost of owning physical hardware with that capability would be a lot higher.

This has the potential to challenge the cloud case for sporadic GPU use, since cloud vendors cannot buy RTX cards. But it would require that the tooling becomes simple to use and reliable.


Certainly, and AWS, GCP and Azure are all priced well beyond simple hardware cost even for CPU instances - there are hosts that are 2-3x cheaper for most uses with equivalent hardware resources.


Ngreedia - the way it's meant to be paid™


To make people pay more.


An ever greater percentage of Nvidia's sales goes to the data-center market, while consumers purchase a shrinking portion. They do not want to flatten their currently upward-trending data-center sales of high-end cards.

NVIDIA's stock price has doubled since March 2020, and most of these gains can be largely attributed to the outstanding growth of its data center segment. Data center revenue alone increased a whopping 80% year over year, bringing its revenue contribution to 37% of the total. Gaming still contributes 43% of the company's total revenues, but NVIDIA's rapid growth in data center sales fueled a 39% year-over-year increase in its companywide first-quarter revenues.

The world's growing reliance on public and private cloud services requires ever-increasing processing power, so the market available for capture is staggering in its potential. Already, NVIDIA's data center A100 GPU has been mass adopted by major cloud service providers and system builders, including Alibaba (NYSE:BABA) Cloud, Amazon (NASDAQ:AMZN) AWS, Dell Technologies (NYSE:DELL), Google (NASDAQ:GOOGL) Cloud Platform, and Microsoft (NASDAQ: MSFT) Azure.

https://www.fool.com/investing/2020/07/22/data-centers-hold-...


While this is definitely welcome news, GPU VFIO passthrough has been possible for a while now. I've been playing games on my Windows VM + Linux host for a few years at least. 95% native performance without needing to dual boot has been a game-changer (heh).


What is your setup like?

I’m planning on switching to VFIO for my next rebuild, and was curious as to how stable the setup was.


Could you share your configuration?


For virtualized Windows from Linux, check out Looking Glass which I posted about previously

https://news.ycombinator.com/item?id=22907306


That requires two GPUs.


Looking Glass is agnostic to the hardware setup. It works both with PCI passthrough and with Intel GVT-g (a single GPU sliced into vGPUs).


Dual booting is for chumps. If I could run a base Linux system and arbitrarily run fully hardware accelerated VMs of multiple Linux distros, BSDs and Windows, I'd be all over that. I could pretend here that I really need the ability to quickly switch between OSes, that I'd like VM-based snapshots, or that I have big use cases to multiplex the hardware power in my desktop box like that. I really don't need it. I just want it.

I really hope Intel sees this as an opportunity for their DG2 graphics cards due out later this year.

If anyone from Intel is reading this: if you guys want to carve out a niche for yourself, and have power users advocate for your hardware - this is it. Enable SR-IOV for your upcoming Xe DG2 GPU line just as you do for your Xe integrated graphics. Just observe the lengths that people go to for their Nvidia cards, injecting code into their proprietary drivers just to run this. You can make this a champion feature just by not disabling something your hardware can already do. Add some driver support for it in the mix and you'll have an instant enthusiast fanbase for years to come.


I've been running Proxmox. I haven't run Windows, but I have Ubuntu VMs with full hardware GPU passthrough. I've passed through Nvidia and Intel GPUs.

I also have a macOS VM, but I didn't set up GPU passthrough for that. Tried it once, it hung, didn't try it again. I use remote desktop anyway.

here are some misc links:

https://manjaro.site/how-to-enable-gpu-passthrough-on-proxmo...

https://manjaro.site/tips-to-create-ubuntu-20-04-vm-on-proxm...

https://pve.proxmox.com/wiki/Pci_passthrough

https://blog.konpat.me/dev/2019/03/11/setting-up-lxc-for-int...


>macOS VM

What is the current licensing situation on this? Can I use it legally to build software for Mac?


Passthrough is workable right now. It’s a pain to get set up, but it is workable.

You don’t need vGPU to get the job done. I’ve had two setups over time: one based on a jank old secondary GPU that is used by the VM host, another based on just using the jank integrated graphics on my chip.

Even still, I dual boot because it just works. It always works, and boot times are crazy low for Windows these days. No fighting with drivers. No fighting with latency issues for non-passthrough devices. It all just works.


Oh I'm aware of passthrough. It's just a complete second class citizen because it isn't really virtualization, it's a hack. Virtualization is about multiplexing hardware. Passthrough is the opposite of multiplexing hardware: it's about yanking a peripheral from your host system and shoving it into one single guest VM. The fact that this yanking is poorly supported and has poor UX makes complete sense.

I consider true peripheral multiplexing with true GPU virtualization to be the way of the future. It's true virtualization and doesn't even require you to sacrifice and/or babysit a single PCIe connected GPU. Passthrough is just a temporary hacky workaround that people have to apply now because there's nothing better.

In the best case scenario - with hardware SR-IOV support plus basic driver support for it, enabling GPU access in your VM with SR-IOV would be a simple checkbox in the virtualization software of the host. GPU passthrough can't ever get there in terms of usability.


I guess I don't really see the benefit of "true" virtualization, other than its usability improvements. I generally want to squeeze every bit of performance out of the GPU if I care to share it with the guest at all (at least on my home machine). I'd be using it for playing games.

For the cloud, I could imagine wanting vGPUs so you can shard the massive GPUs that are used there. But in cloud, you would then have a single device be multi-tenant, which is a bit spicy security wise. Passthrough has a very straightforward security model.


SR-IOV would allow anyone with a single GPU to get into virtualization. It would eliminate the single biggest barrier to entry in the space.


I have a Quadro card and at least for Windows guests I can easily move the card between running guests (Linux has some problems with yanking though). Still, virtualized GPUs would be nice.


It works with some cards, not with others. E.g. for the Radeon Pro W5500 there's no known card reset method that works (no method from https://github.com/gnif/vendor-reset works), so I had to do an S3 suspend (systemctl suspend or rtcwake -m mem -s 2) before running a VM.

Now I have an additional RTX 2070 and it works OK.


Passthrough has become very easy to set up: just add your PCI card in virt-manager and away you go.

That said, these days I just have a second PC with a load of cheap USB switches...


Given that I use my desktop 90% of the time remotely these days, I'm going to set this up next time I'm home and move my Windows stuff into a VM. Then I can run Docker natively on the host and when Windows stops cooperating, just create a new VM (which I can't do remotely with it running on bare metal, at least without the risk of it not coming back up).


Related but different:

- nvidia-patch [0] "This patch removes restriction on maximum number of simultaneous NVENC video encoding sessions imposed by Nvidia to consumer-grade GPUs."

- About a week ago "NVIDIA Now Allows GeForce GPU Pass-Through For Windows VMs On Linux" [1]. Note, this is only for the driver on Windows VM guests not GNU/Linux guests.

Hopefully the project in the OP will mean that GPU access is finally possible for GNU/Linux guests on Xen. Thank you for sharing, OP.

[0]: https://github.com/keylase/nvidia-patch

[1]: https://www.phoronix.com/scan.php?page=news_item&px=NVIDIA-G...


The Python script actually mostly uses Frida (https://frida.re/) scripting. I hadn't seen Frida before, but it looks very powerful. I did some similar (but very basic) things with GDB/LLDB scripting before, but Frida seems to be made for exactly this kind of thing.
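For anyone who hasn't used it, the basic pattern with Frida's Python bindings looks roughly like this (an illustrative sketch only, not the actual vgpu_unlock script; the target process name and the hooked call are assumptions):

    import frida

    # JavaScript agent injected into the target process: log every ioctl call.
    AGENT = """
    Interceptor.attach(Module.getExportByName(null, "ioctl"), {
        onEnter: function (args) {
            send({ fd: args[0].toInt32(), request: args[1].toString() });
        }
    });
    """

    session = frida.attach("nvidia-vgpud")        # process name assumed for illustration
    script = session.create_script(AGENT)
    script.on("message", lambda message, data: print(message))
    script.load()
    input("Hooked; press Enter to detach\n")
    session.detach()

The nice part is that the instrumentation logic lives in the injected JavaScript, while Python just drives the session and receives messages.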


If anyone needs a list of currently supported GPUs, you can find it in the source code:

https://github.com/DualCoder/vgpu_unlock/blob/master/vgpu_un... (the comments behind the device ids)


One thing I want to figure out (because I don't have a dedicated Windows gaming desktop), and the documentation on the internet seems sparse: it is my understanding that if I want to use PCIe passthrough with a Windows VM, these GPUs cannot be available to the host machine at all - or technically they can, but I need to do some scripting to make sure the NVIDIA driver doesn't own these PCIe lanes before opening the Windows VM, and re-enable it after shutdown?

If I go with the vGPU solution, I don't need to turn the NVIDIA driver on/off for these PCIe lanes when running the Windows VM? (I won't use these GPUs on the host machine for display.)


> One thing I want to figure out (because I don't have a dedicated Windows gaming desktop), and the documentation on the internet seems sparse: it is my understanding that if I want to use PCIe passthrough with a Windows VM, these GPUs cannot be available to the host machine at all - or technically they can, but I need to do some scripting to make sure the NVIDIA driver doesn't own these PCIe lanes before opening the Windows VM, and re-enable it after shutdown?

The latter statement is correct. The GPU can be attached to the host, but it has to be detached from the host before the VM starts using it. You may also need to get a dump of the GPU ROM and configure your VM to load it at startup.

Regarding the script, mine resembles [0]. You need to remove the NVIDIA drivers and then attach the card to VFIO, and then the opposite afterwards. You may also need to image your GPU ROM [1]; a rough sketch of that step is below the links.

[0]: https://techblog.jeppson.org/2019/10/primary-vga-passthrough...

[1]: https://clayfreeman.github.io/gpu-passthrough/#imaging-the-g...
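The ROM imaging in [1] basically boils down to toggling the sysfs rom attribute and reading it out; roughly (a sketch only - example PCI address, run as root while nothing is actively using the card):

    # Dump a GPU's option ROM (VBIOS) via sysfs so the VM can be given a clean copy.
    from pathlib import Path

    rom = Path("/sys/bus/pci/devices/0000:01:00.0/rom")   # example PCI address

    rom.write_text("1")                                   # enable reads of the ROM attribute
    Path("vbios.rom").write_bytes(rom.read_bytes())       # save the dump next to the script
    rom.write_text("0")                                   # disable it again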


Exactly. With GPU virtualization the driver is able to share the GPU resources with multiple systems such as the host operating system and guest virtual machine. Shame on nvidia for arbitrarily locking us out of this feature.


Got some time to try this now. It worked as expected: I have vgpu_vfio. However, it doesn't perfectly fit my needs. In particular, my host system is "heavy" - I need it to run CUDA etc., while the VM just runs games. It seems the 460.32.04 driver on the host doesn't have full functionality, so I cannot run CUDA on the host any more.


Is there info on this sort of usage? I'd love to use the host for NVENC and a VM guest for traditional GPU stuff, but haven't been able to find anything on doing that.


There's a lot of customer loyalty on the table waiting for the first GPU manufacturer to unlock this feature on consumer grade cards without forcing us to resort to hacks.


Not many people are foregoing a GPU given this limitation, though, except for maybe miners (who will virtualize and pirate/never activate Windows if they really need Windows).


That equilibrium only holds as long as your competitor doesn't offer it.


Is this for SR-IOV? It's too bad SR-IOV isn't supported on regular desktop AMD GPUs for example in the Linux driver.


Yes, this is basically NVidia's SR-IOV.


This is a dumb question, but which hypervisor configuration is this targeted towards?

There's a lot of detail on the link which I appreciate but maybe I missed it.


KVM, maybe also Xen.


I built an "X gamers/workstations, 1 CPU" type build last year and this has been the main problem: I have two GPUs, one of which is super old, and I have to choose which one I want to use when I boot up a VM.

Will definitely be checking this out!


The number of technology layers one must understand and control to make a GPU draw what a VM wants is frankly insane. Hypervisor channels, GPU driver code, graphics server APIs, graphics toolkit libraries - all of these have several variants. It seems close to impossible to get anything done in this space.

I just want to send drawing commands from a node over network to another node with a GPU. Like "draw a black rectangle 20,20,200,200 on main GPU on VM at 192.168.1.102".

How would I do that in the simplest possible way? Is there some network graphics command protocol?

Like X11, but simpler and faster, just the raw drawing commands.


I wish Nvidia would open this up properly. The fact that Intel integrated GPUs can do GVT-g while I literally can't buy a laptop which will do vGPU passthrough with an Nvidia card for any amount of money is infuriating.


GVT-g is gone on 10th-gen GPUs (Ice Lake) and later. Not supported on Intel dGPUs either.


Oh wow. Had no idea and I was explicitly planning my next server to be Intel because of the GVT-g. Why did they abandon it?


More like did not prioritize it.

https://github.com/intel/gvt-linux/issues/126


If - like me - you don't have a clue what vGPU is:

https://www.nvidia.com/en-us/data-center/virtual-solutions/

TL;DR: seems to be something useful for deploying GPUs in the cloud, but I may not have understood fully.


It instantiates multiple logical PCI adaptors for a single physical adaptor. The logical adaptors can then be mapped into VMs, which can directly program a hardware-virtualized view of the graphics card. Intel has the same feature in their graphics and networking chips.
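On Linux this surfaces as mediated devices (mdev). Once a vGPU-capable driver is loaded, creating one of those logical adaptors looks roughly like this (a sketch - the PCI address and profile name are examples; the real profile names come from mdev_supported_types):

    # List the vGPU profiles a card offers and create one mediated device from a profile.
    import uuid
    from pathlib import Path

    gpu = Path("/sys/bus/pci/devices/0000:01:00.0")        # example PCI address
    types = gpu / "mdev_supported_types"

    for t in sorted(types.iterdir()):                      # e.g. nvidia-xx directories
        if (t / "name").exists():
            print(t.name, (t / "name").read_text().strip())

    profile = types / "nvidia-63"                          # example profile name
    (profile / "create").write_text(str(uuid.uuid4()))     # new device appears under /sys/bus/mdev/devices/

The resulting mdev device is what gets handed to the VM instead of the whole card.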


Thanks for the explanation, but that's more of a "this is how it works" than a "this is why it's useful".

What would be the main use case?


Same as any hypervisor/virtual machine setup. Sharing resources. You can build 1 big server with 1 big GPU and have multiple people doing multiple things on it at once, or one person using all the resources for a single intensive load.


Thanks, this is a concise answer.

However, I was under the impression - at least on Linux - that I could run multiple workloads in parallel on the same GPU without having to resort to vGPU.

I seem to be missing something.


You can, but only directly under that OS. If you wanted to run, say, a Windows VM to run a game that doesn't work in Wine, you'd need some way to give a virtual GPU to the virtual machine. (As it is now, the only way you'd be able to do this is to have a separate GPU that's dedicated to the VM and pass that through entirely.)


In addition to the answer by skykooler, virtual GPUs also allow you to set hard resource limits (e.g., amount of L2 cache, number of streaming multiprocessors), so different workloads do not interfere with each other.


If you are running Linux in a VM, vGPU will allow acceleration of OpenGL, WebGL and Vulkan applications like games, CAD, CAM and EDA, for example.


This[1] may help.

What you're saying is true, but it's generally using either the API remoting or device emulation methods mentioned on that wiki page. In those cases, the VM does not see your actual GPU device, but an emulated device provided by the VM software. I'm running Windows within Parallels on a Mac, and here [2] is a screenshot showing the different devices each sees.

In the general case, the multiplexing is all software-based. The guest VM talks to an emulated GPU, the virtualized device driver passes those calls to the hypervisor/host, which then generates equivalent calls to the GPU, then back up the chain. So while you're still ultimately using the GPU, the software-based indirection introduces a performance penalty and potential bottleneck. And you're also limited to the cross-section of capabilities exposed by your virtualized GPU driver, hypervisor system, and the driver being used by that hypervisor (or host OS, for Type 2 hypervisors). The table under API remoting shows just how varied 3D acceleration support is across different hypervisors.

As an alternative to that, you can use fixed passthrough to directly expose your physical GPU to the VM. This lets you tap into the full capabilities of the GPU (or other PCI device), and achieves near-native performance. The graphics calls you make in the VM now go directly to the GPU, cutting out the game of telephone that emulated devices play. Assuming, of course, your video card drivers aren't actively trying to block you from running within a VM [3].

The problem is that when a device is assigned to a guest VM in this manner, that VM gets exclusive access to it. Even the host OS can't use it while it's assigned to the guest.

This article is about the fourth option: mediated passthrough. The vGPU functionality enables the graphics card to expose itself as multiple logical interfaces. Every VM gets its own logical interface to the GPU and sends calls directly to the physical GPU like it does in normal passthrough mode, and the hardware handles the multiplexing instead of the host/hypervisor worrying about it. Which gives you the best of both worlds.

[1] https://en.wikipedia.org/wiki/GPU_virtualization

[2] https://imgur.com/VMAGs5D

[3] https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVM...


The use case is allowing the host system and VM(s) to access the same GPU at the same time.


You have a Linux box but you want to play a game and it doesn't work properly under Proton, so you spin up a Windows VM to play it instead.

The host still wants access to the GPU to do stuff like compositing windows and H.265 encode/decode.


And outputting anything to the screen in general. Usually, your monitor(s) are plugged into the ports on the GPU.


Yeah, I got that from the technical explanation.

What's the practical use case, as in, when would I need this?

[EDIT]: To maybe ask a better way: will this practically help me train my DNN faster?

Or if I'm a cloud vendor, will this allow me to deploy cheaper GPU for my users?

I guess I'm asking about the economic value of the hack.


> To maybe ask a better way: will this practically help me train my DNN faster?

Probably not. It will only help you if you previously needed to train it on a CPU because you were in a VM, but this seems unlikely. It will not speed up your existing GPU in any way compared to simply using it bare-metal right now.

> Or if I'm a cloud vendor, will this allow me to deploy cheaper GPU for my users?

Yes. This ports a feature from the XXXX$-range of GPUs to the XXX$-range of GPUs. Since the performance of those is similar or nearly similar, you can save a lot of money this way. It will also make the entry costs to the market lower (i.e. now a hypervisor could be sub-1k$, if you go for cheap parts).

On the other hand, a business selling GPU time to customers will probably not want to rely on a hack (especially since there's a good chance it's violating NVidia's license), so unless you're building your own HW, your bill will probably not drop. But if you're an ML startup or a hobbyist, you can now cheap out on / actually afford this kind of setup.


Running certain ML models in VMs

Running CUDA in VMs

Running transcoders in VMs

Running <anything that needs a GPU> in VMs


This is the exact same information you posted above.

Please see my edit.


4 people sharing 1 CPU and 1 GPU that is running a hypervisor with separate installations of windows for gaming

Basically any workload that requires sharing a GPU between discrete VMs


At the risk of piling on without value: "Amazing, simply amazing".

I've been (more or less) accused of being an Nvidia fanboy on HN previously but this is an area where I've always thought Nvidia has their market segmentation wrong. Just wrong.

This is great work. (Period, as in "end of sentence, mic drop").


Wow, does this mean NVIDIA consumer cards can now do multi-VM multiplexing of a single GPU? On the AMD side, only special cards like the FirePro S7150 can do this.

Does this work also on Xen? NVIDIA drivers were always non-functional with Xen+consumer cards.


For virtualized Windows from Linux, check out Looking Glass which I posted about previously

https://news.ycombinator.com/item?id=22907306


To me this is a laughably naive question, but I'll ask it anyway.

My understanding is that, per application, the CPU/GPU can make only a single draw call at a time, in a sequential manner (e.g. CPU->GPU->CPU->GPU).

Could vGPUs be used for concurrent draw calls from multiple processes of a single application?


> My understanding is that CPU/GPU per application can make only single draw call in sequential manner.

The limitation you're probably thinking of is in the OpenGL drivers/API, not in the GPU driver itself. OpenGL has global (per-application) state that needs to be tracked, so outside of a few special cases like texture uploading you have to only issue OpenGL calls from one thread. If applications use the lower-level Vulkan API, they can use a separate "command queue" for each thread. Both of those are graphics APIs, I'm less familiar with the compute-focused ones but I'm sure they can also process calls from multiple threads.


My primitive thoughts:

Threaded Computation on CPU -> Single GPU Call -> Parallel Computation on GPU -> Threaded Computation on CPU ...

I wonder if it can be used in such way:

Async Concurrent Computation on CPU -> Async Concurrent GPU Calls -> Parallel Time-Independent Computations on GPU -> Async Concurrent Computation on CPU


And vGPUs are isolated from one another - that's the whole point - so using multiple in one application would be very difficult, as I don't think they can share data or memory in any way.


AMD hasn't even been advertising their old "MxGPU" technology nor any follow-up, and I am quite sad about it.

Hacks like Nucleus Co-op [1] run multiple instances of games on a shared desktop screen, but I'd rather have a multi-head desktop with a bunch of gaming VMs.

[1] https://github.com/lucasassislar/nucleuscoop


Does this mean we'll now see miners make 3-5 vGPUs from one 3090 for triple the hashrate per card? Because I'm already having trouble getting a 3090 as it stands.


This is super! What would it take to abstract it similarly to CPU/memory, by specifying limits only in cgroups? Limits could be things like GPU memory size or amount of parallelization.


I don't understand how and where I can download the NVIDIA GRID vGPU driver. Can anyone help me?



Thanks :)


Great work! Is it limited by Nvidia's 90-day vGPU software evaluation?


Hacking at its finest! Nice


Is this only for Nvidia GPUs? If so, why not put it in the title?


What are those "some Geforce and Quadro GPUs that share the same physical chip as the Tesla GPUs"?



Thanks! It is interesting. The GTX 1060 3GB is OK, but the RTX 3070 isn't. And it is time to upgrade from my trusty 970.


This puts me in a tough spot: There are good reasons to go with nVidia (once they release GPUs with proper memory configuration), DLSS, RTX and now this. On the other hand, do I really want to give money to the company that locked this feature in the first place? Difficult call, but at least the ridiculous prices mean I can still think about it for a few more months.


I'm very much an AMD fanboy and I find RTX quite useless, since it's only implemented in a few games and works well at 2K only on really high-end cards.

Yet AMD is no different there. They also lock SR-IOV to premium datacenter hardware only, and they certainly want to keep some features away from consumers.


Not particularly an nvidia fanboy, but RTX can also be used to speed up rendering in Blender with the E-Cycles add-on.

https://blendermarket.com/products/e-cycles



