I occasionally see stutters too, even with Full HD video. Or more precisely, mplayer complained about slowness and having to drop frames.
It often helped to actually give the VM more cores (not just the default 2), but sometimes it was due to some weirdo codec/quality setting, and re-encoding the video just solved it. Sometimes switching to vlc (from mplayer) helped. Other times it was simply because the sys-usb VM was overloaded.
Not sure what "mpv" means in this context, but this reminds me of the one actual pet peeve I have with Qubes - video/audio calls just don't work for me. They either don't work at all or the audio quality is really poor. I've tried all kinds of stuff, without much success. I'm using a phone/tablet as a fallback, but it's not very convenient.
mpv is a free (as in freedom) media player for the command line. It supports a wide variety of media file formats, audio and video codecs, and subtitle types.[0]
I've been using Qubes OS as my primary OS for years - I think I started with the 2.0 release in 2014 (I might have tried the 1.0 release, I don't recall) - and I was immediately hooked.
I understand the usual story is that the goal is the security benefits, and the compartmentalization (or rather the implied inconvenience) is the price for that. But for me the compartmentalization turned out to be a benefit on its own, and actually convenient.
I find it extremely convenient to have multiple isolated / virtual workspaces for different stuff, even if you assume attackers / malice do not exist. Having separate VMs is not the same as having separate folders. I also love the VM templates, which allow me to do all kinds of experiments (e.g. install packages in the app VM, which disappear after restart). Or run VMs with a mix of distros/versions/... Yes, I could do some of that with plain VMs, but Qubes integrates that in a way that I find very convenient. The commands for copying stuff between VMs are muscle memory at this point.
Yes, there are limitations, like the lack of GPU acceleration. But movies in 1080p play just fine without it, and I'm not a gamer, so I don't mind much. I can't play with CUDA etc. on these QubesOS machines, and scrolling web pages with large images is laggy, but I find this to be an acceptable price.
I went through multiple laptops/workstations over the years, and I think the situation has improved a lot. Initially I had to work around quite a few issues with the installer, some hardware not working (or requiring special settings), or poor battery life on the laptops. But after a while that mostly went away, especially once I switched to laptops with official Linux support (Dell Precision machines were good; I'm on a Thinkpad P1 G7 now). The battery life is pretty decent too (especially once I disabled HT in the BIOS).
Is it perfect for everyone? No, certainly not. But it sure is great for me, and I hope they keep working on it.
Now that I've read this, I remember that I was also annoyed by jerky scrolling on web pages.
I also found the backup management too complicated. I didn't want to back up entire VMs, just the data within the VMs. In principle, I would have had to start up all VMs for backups and run a backup script for each individual VM.
I only noticed the jerky scrolling on pages with a lot of images, particularly high-res ones with CSS effects (blur etc.). Everything else feels OK to me (I'm sure it could be smoother, but it's not too bad, so I haven't noticed).
For backups, I don't do them the Qubes way; I do "regular" backups within the VMs using rsync/duplicity/... When moving to a new machine I prefer to set up everything from scratch (and then restore the data). And it gives me all the features like incremental backups etc.
Anyway, I have no idea what it would take to do something like that in Postgres, I'm not familiar with this stuff. But if someone submits a patch with some measurements, I'm sure we'll take a look.
Interesting question. I think most optimizations described in the BOLT paper are fairly hardware agnostic - branch prediction does not depend on the architecture, etc. But I'm not an expert on microarchitectures.
A lot of the benefits of BOLT come from fixing the block layout so that taken branches go backward and untaken branches go forward. This is CPU neutral.
Yeah, getting the profile is obviously a very important step. Because if it wasn't, why collect the profile at all? We could just do "regular" LTO.
I'm not sure there's one correct way to collect the profile, though. ISTM we could either (a) collect one very "general" profile, to optimize for arbitrary workload, or (b) profile a single isolated workload, and optimize for it. In the blog I tried to do (b) first, and then merged the various profiles to do (a). But it's far from perfect, I think.
But even the very "rough" profile from "make installcheck" (which is the basic set of regression tests) still helps a lot. Which is nice. I agree it's probably because even that basic profile is sufficient for identifying the hot/cold paths.
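To make the (a)-vs-(b) distinction concrete, here's roughly what the collect-then-merge workflow looks like with LLVM's BOLT tooling. This is a sketch only: the binary name (./postgres) and the workload scripts are placeholders, and the profiling step needs a CPU and perf build with LBR support, so it's written out as a script rather than run directly:

```shell
# Write out a sketch of the BOLT pipeline; './postgres' and the workload
# scripts are placeholders, and LBR-capable hardware is assumed.
cat > bolt-pipeline.sh <<'EOF'
#!/bin/sh
set -e
# (b) one profile per isolated workload, via LBR branch sampling
perf record -e cycles:u -j any,u -o oltp.perf.data -- ./run-oltp-workload.sh
perf record -e cycles:u -j any,u -o olap.perf.data -- ./run-olap-workload.sh
# convert the perf samples into BOLT's .fdata format
perf2bolt -p oltp.perf.data -o oltp.fdata ./postgres
perf2bolt -p olap.perf.data -o olap.fdata ./postgres
# (a) merge the per-workload profiles into one "general" profile
merge-fdata oltp.fdata olap.fdata > merged.fdata
# re-layout the binary using the merged profile
llvm-bolt ./postgres -o postgres.bolt -data=merged.fdata \
    -reorder-blocks=ext-tsp -reorder-functions=hfsort
EOF
chmod +x bolt-pipeline.sh
echo "wrote bolt-pipeline.sh"
```

The merging step is what blurs (b) back into (a): merge-fdata just sums the branch counts, so a workload that runs longer dominates the merged profile unless you normalize the sample counts first.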
I think you have to be a bit careful here, since if the profiles are too different from what you'll actually see in production, you can end up regressing performance instead of improving it. E.g., imagine you use one kind of compression in test and another in production, and the FDO decides that your production compression code doesn't need optimization at all.
If you set up continuous profiling though (which you can use to get flamegraphs for production) you can use that same dataset for FDO.
Yeah, I was worried using the "wrong" profile might result in regressions. But I haven't really seen that in my tests, even when using profiles from quite different workloads (like OLTP vs. analytics, different TPC-H queries, etc.). So I guess most optimizations are fairly generic, etc.
I worked some years ago on 'happy path fuzzing', trying to find heuristics to guide execution through the code while avoiding all error handling and runtime checks. I never got better results than AFLGo or other targeted fuzzers, but you do have to know what your happy path is.
I also tried using coverage from the previous version or the one before that ('precise' coverage through gcov, or Intel Processor Trace, or sampled perf traces, down to poor-man's-profiler samples), coupled with program repair tools, and... never managed to jump from fun small toy examples to actual 100+ kloc applications. Maybe one day.
What exactly do you think PGO data looks like? The main utility is knowing that (say) your error handling code is cold and your loops are hot, which compilers currently have to guess at.
This is indeed unknowable in general but clearly pretty guessable in practice.
I agree the +40% effect feels a bit too good, but it only applies to the simple OLTP queries on in-memory data, so the inefficiencies may have unexpectedly large impact. I agree 30-40% would be a massive speedup, and I expected it to disappear with a more diverse profile, but it did not ...
The TPC-H speedups (~5-10%) seem much more plausible, considering the binary layout effects we sometimes observe during benchmarking.
Anyway, I'd welcome other people trying to reproduce these tests.
I looked and there is no mention of BOLT yet in the pgsql-hackers mailing list, that might be the more appropriate place to get more attention on this. Though there are certainly a few PostgreSQL developers reading here as well.
True. At the moment I don't have anything very "actionable" beyond "it's magically faster", so I wanted to investigate this a bit more before posting to -hackers. For example, after reading the paper I realized BOLT has a "-report-bad-layout" option to report cases of bad layout, so I wonder if we could use it to identify places where the code should be reorganized.