> In order for the document to help me, I clearly needed to find the four error parameters that used to be displayed with the blue screen on older versions of Windows. Windows 10 hides all the information by default, but I discovered that it’s possible to re-enable display of the extra error information by adding an entry to your registry.
I hate stuff like this. What's the point of hiding the parameters, instead of tucking them away in the corner of the screen or something? It just makes things needlessly difficult.
I know the answer is going to be "but it makes the UI look beautiful and most users don't know what the parameters mean anyway!" Error UIs don't need to be beautiful, they need to be functional. Definitely have a nice, clear user-friendly error message, but keep the overall screen technically informative. I can imagine that the only thing more infuriating than having a computer that won't boot is having one that won't tell you the problem unless you change registry settings (which you can't, because it can't boot). Leaving the extra information there does no harm, and it gives people something to Google if they do run into a problem.
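For anyone who wants those parameters back in the meantime: if I remember right, the tweak the article refers to is a single DWORD under CrashControl. A rough sketch with Python's winreg (value name assumed to be DisplayParameters; run it elevated and reboot for it to take effect):

    # Rough sketch: re-enable the bugcheck parameters on the Windows 10 blue screen.
    # Assumes the value is DisplayParameters under CrashControl, as the article describes.
    import winreg

    key = winreg.CreateKeyEx(
        winreg.HKEY_LOCAL_MACHINE,
        r"SYSTEM\CurrentControlSet\Control\CrashControl",
        0,
        winreg.KEY_SET_VALUE,
    )
    winreg.SetValueEx(key, "DisplayParameters", 0, winreg.REG_DWORD, 1)
    winreg.CloseKey(key)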
Indeed. Regular users have no idea what to do with "friendly" error messages telling them about some unrecoverable failure either, while those messages just leave the techies with nothing to go on.
When raising exceptions in code, I make sure the error message contains as much relevant detail as I can stuff in there. Users will just mail a screenshot to support regardless (if they want help before the full error report gets home), so might as well give as much info as possible to make troubleshooting easier.
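A minimal sketch of what I mean, with made-up names (load_config here is hypothetical); the point is that the context goes into the message itself instead of into some log the user will never find:

    import getpass, os

    def load_config(path):
        # Hypothetical loader; only here so there is something that can fail.
        with open(path) as f:
            return f.read()

    try:
        cfg = load_config("settings.ini")
    except OSError as e:
        # Stuff in everything support would otherwise have to ask for.
        raise RuntimeError(
            f"Could not load settings.ini "
            f"(cwd={os.getcwd()}, user={getpass.getuser()}, cause={e!r})"
        ) from e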
Yes. What I can't understand is why this extra diagnostic information is not available with a keyboard shortcut, rather than with a change to the registry. If you get a blue screen on boot, you typically have no way to alter the registry. Holding shift or something would make much more sense.
Something similar is how remapping the Caps Lock key in Windows to another key has to be done via a registry edit instead of a Control Panel setting. Given the renewed drive from Microsoft to become the developer’s OS of choice over the last five to ten years, choosing not to offer anything more accessible than a registry edit seems contrary to that drive to me.
Older Sun keyboards used to have Control in that location, labelled as such, by default. I remember buying a Happy Hacking Keyboard back in the early 2000s that also had that as the default configuration, with a DIP switch to flip it.
After using a Chromebook for a while I discovered how much I love having my Caps Lock key remapped to launching the desktop search tool, and now I do it on all my computers. E.g. Milou for KDE, I think the MacOS one is called Spotlight.
The Windows input API is multi-leveled, and the top-level one sends only key combos and characters. Because keyboard drivers are system-level (although configurable per user), you can only remap via system reconfiguration, which makes sense.
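For the curious, that system reconfiguration is the well-known "Scancode Map" binary value. A rough sketch of writing the classic Caps-Lock-to-Left-Ctrl mapping with Python's winreg (needs admin rights and a logoff or reboot to take effect):

    import winreg

    # Scancode Map layout: two zero DWORDs of header, a DWORD entry count
    # (mappings plus the terminator), one DWORD per mapping (new scancode in
    # the low word, physical key's scancode in the high word), then a zero
    # terminator. 0x1D = Left Ctrl, 0x3A = Caps Lock.
    scancode_map = bytes([
        0, 0, 0, 0,  0, 0, 0, 0,   # header
        2, 0, 0, 0,                # one mapping + terminator
        0x1D, 0x00, 0x3A, 0x00,    # Caps Lock now sends Left Ctrl
        0, 0, 0, 0,                # terminator
    ])

    key = winreg.CreateKeyEx(
        winreg.HKEY_LOCAL_MACHINE,
        r"SYSTEM\CurrentControlSet\Control\Keyboard Layout",
        0,
        winreg.KEY_SET_VALUE,
    )
    winreg.SetValueEx(key, "Scancode Map", 0, winreg.REG_BINARY, scancode_map)
    winreg.CloseKey(key)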
Microsoft is capable of creating an additional tab in Control Panel to do exactly this: if it requires a system restart or a user logout, then it is fine even if not ideal.
It took switching from Windows to Linux to figure out why my PC had issues booting sometimes.
In Windows, it would just spin a little circle forever until I rebooted. After installing Linux and getting actual error messages, I learned that my SATA AHCI controller wasn't always working at boot and was fixed by plugging my HDD into a different port.
The problem could have been fixed so much earlier had Windows just told me what was wrong.
> Error UIs don't need to be beautiful, they need to be functional
They need to be both. If it can’t be both, there are as many arguments for making it human friendly as for making it actionable.
I think it comes down to it being a human/machine interface, and what the human side does with it. If for more than half of the users the next step from the error UI is to rage-phone their IT department, and a sympathetic UI might actually stop them from doing it (or at least be considerate), I’d argue a friendly UI should be prioritized over an overly informational one.
If it can be both, that’s better, but I wouldn’t be surprised if after user testing it appears it’s just damn hard.
Also having an error displayed on screen that can’t be retrieved any other way (logs, diagnostic dump on a console somewhere else, whatever) is a design choice I’d loathe way more than not having the error number on the BSOD.
Another reason why I like Xiaomi software over other Android phones:
When an app crashes, Android simply tells you that "Unfortunately, App has stopped". Xiaomi's UI additionally presents you with the stack trace.
BTW: Android used to (?) have a "Report" button, but it seems it's gone in newer versions (since 5.0?). Does anybody know where these reports were sent? Probably to the Android OS team, just like MS sends some crash info about 3rd-party apps to themselves.
You can see the stack trace on stock Android when you click "Report", then "Show system info", then scroll down to "Stack trace" (didn't check exact wording).
The Report button is still there - maybe it is not shown if there is no one to report to? For apps on Google Play, the user reports can be seen in Google Play Developer Console.
Man, this gave me flashbacks to trying to put 3.5-4GB of RAM into motherboards in 2003. After a bunch of iterations of telling the motherboard manufacturers what we wanted, and having them hack the BIOS to work for exactly one DIMM configuration, then fail on anything else, I eventually gave up expecting most of the BIOS to work at all. I added code to the Linux startup sequence that ignored everything the BIOS said about memory maps, queried what physical DIMMS were there, and rebuilt my maps by hand.
It's been so long that I can't tell an e820 map from an MTRR, but this was still a fun read.
> I added code to the Linux startup sequence that ignored everything the BIOS said about memory maps, queried what physical DIMMS were there, and rebuilt my maps by hand.
I'd love to see this code; doing memory management entirely in the OS and ignoring the BIOS sounds like fun (modulo working around BIOS-specific memory reservations so the OS doesn't get its memory stomped on by the BIOS).
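In the meantime, on a modern kernel you can at least dump the raw firmware-provided map under /sys/firmware/memmap (when CONFIG_FIRMWARE_MEMMAP is enabled), and the memmap= boot parameters let you override chunks of it by hand. A quick sketch of reading it:

    # Dump the firmware-provided (e820) memory map as the kernel saw it.
    # Assumes /sys/firmware/memmap exists (CONFIG_FIRMWARE_MEMMAP).
    from pathlib import Path

    entries = []
    for d in sorted(Path("/sys/firmware/memmap").iterdir(), key=lambda p: int(p.name)):
        start = int((d / "start").read_text(), 16)
        end = int((d / "end").read_text(), 16)
        mtype = (d / "type").read_text().strip()
        entries.append((start, end, mtype))

    for start, end, mtype in sorted(entries):
        print(f"{start:#018x}-{end:#018x}  {mtype}")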
I spent some time last year running various scripts to get an NVIDIA GPU working over thunderbolt in windows on a macbook pro. The problem is the DSDT table in many macbooks doesn't allocate enough space for pci-e devices. The NVIDIA driver in windows tries to allocate memory via that table to talk to the eGPU and it fails.
For some reason it works fine under MacOS - either macos ignores the DSDT table completely, or it allocates memory a bit differently than windows. In any case, the answer is to use obscure tools to download and patch the DSDT tables to allocate more RAM toward PCI-e. Doing this through UEFI feels very magic.
I don't know why so much of this stuff is up to the hardware vendor. Maybe there's a good reason, but I would expect the windows memory manager could do a much better job if it didn't have a bunch of memory range sizes hardcoded by apple.
My Toshiba Libretto has similar issues where the BIOS will lie about the capacity of the hard drive; it's so bad that when reinstalling Windows I've got to format the drive in another computer.
One of the things you'll quickly learn from working in IT (and probably other fields) is that "not supported" can mean anything from "physically impossible" to "we don't want you to".
In the case of RAM, "physically impossible" would mean something like the CPU not having that many address bits or the PCB traces not routed, but there are a lot of different not-well-advertised configurations of DIMMs available with the same size[1], so it could be that the manufacturer specified a lower maximum just to avoid having to answer subtle questions. For example, 2x8GB may be OK but 1x16GB not.
It could also be a "X if Y except Z else B" situation, and they just couldn't be bothered to document all the possible combinations and/or explain the details of the PC system architecture that result in such limitations.
To test if all the memory is present (and if it all works correctly), running something like MemTest86 might be sufficient.
> "not supported" can mean anything from "physically impossible" to "we don't want you to"
More like:
from "physically impossible" to "we haven't tested that"
> they just couldn't be bothered to document all the possible combinations
Each combination adds exponential testing and documentation requirements.
Not tested != doesn't work. But that doesn't make not supporting it malicious. There are practical and financial costs to testing every combination, ultimately borne by the consumer.
My favorite is when they say "not supported" and it's easy to make it work and is just their way of scaring Enterprise customers to their bread & butter. (One example that comes to mind is a bunch of Rigol oscilloscopes that, even though they have FPGA's on them, can still have more functionality enabled with resistors.)
- Something that will be rather obsolete soon, so it won't be tested
- Something that usually works, but has poorly understood corner cases or other implications
- Something that is so infrequently used that it's not worth it to build up proper testing infrastructure to prevent regressions
In this case, who thinks Foxconn has an engineer on call, who's familiar with the motherboard, who can actually answer this question? I personally doubt it.
My Dad bought us an IBM PCjr with 256KB of RAM, which was kind of elite in 1984.
But he wanted more. So, he purchased a pair of thick memory expansion boards that brought it up to an unheard-of 640KB of RAM.
Of course, these slabs generated an impressive amount of heat, so we had to set up a series of cooling fans, or else computations would go awry and weird bugs would appear in programs as it overheated.
It had all manner of strange behaviors, and IBM engineers on the support contract would dutifully visit and give us stacks of floppies containing custom builds of MS-DOS to try and help us out when he called about problems with his mods.
The base PC AT shipped with 256k of RAM in 1984, the Fat Mac with 512KB. I think the elite thing here is not the amount of memory but buying a PC jr and then really committing to trying to upgrade it.
> All I know is it passes my RAM tests. Since Linux has been working fine with the 16 gigs of RAM installed, I am not too worried. It’s possible I would have problems if I had more PCI/PCIe cards installed or something, but in my use case, it seems to behave fine. Obviously, your mileage may vary
I run into this so often. "Look, it works for me. Don't try it at home. Who knows if it keeps working? What can I say I'm going to ride it until the wheels fall off".
Well, it's not like you're in any real danger in trying it: if your system becomes unstable you just take out that extra RAM. It's exactly the same as overclocking, your box might not work as stably as before, but you get the extra speed, so thousands of people do it...
It's a long-term difference between the Unix culture and the Windows NT culture (which dates back to VMS) that Windows will give up when things are wrong and Unix will barrel ahead anyway.
I think there's reasons that either approach is the right one, but Microsoft has the benefit that if they panic on boot, there's a good chance the issue will be fixed, and most likely before the product is released. (In this case, the fix was apparently just to list a smaller maximum ram size though)
This is it. Firmware and OEM system configuration in the PC industry is basically garbage, and always has been. OEMs tweak until it boots whatever version of windows will ship on it, rush it out the door, and never touch it again unless some volume customer comes back with a bug report.
Linux needs to come in after the fact and run on whatever garbage happened to ship.
(FWIW: in this case the root cause was a host bridge in the tables which had been granted a truly outrageous memory space despite having no devices in it. It was likely a typo, or some test stuff that got left in.)
An excellent quote. The line between the two isn't always clear either, though usually the uncertainty is in only one direction for me, i.e., things I thought were clever at the time turn out to be stupid down the line.
In another career I sat on a conf call where an engineering manager was really upset and went off on a tangent about how X code should not "freak out" every time it sees something unexpected.
I was not writing code at the time and had nothing to contribute.
So apparently they fixed it later.... then came the security issues, unexpected behavior, shit going sideways.
Sometimes a straight crash is not such a bad thing.
I would rather something crash and log than continue silently most of the time, especially with backend and server systems.
One of the things I like about Kubernetes is that the ecosystem (generally) tries to adhere to the "Not healthy? Then crash and keep crashing" mantra when something doesn't work. If I see something is in a CrashLoopBackOff I at least know it's b0rked. Stuff that reports it's up and running when it's actually hosed is really annoying.
> (In this case, the fix was apparently just to list a smaller maximum ram size though)
You're assuming this was an issue. Manufacturers religiously create artificial tiers in software to upsell users. This is true of hardware just as much as it is software.
That's because back in the day, Unix was designed for hackers to mess around and play games, while VMS was designed for actual production workloads (and Windows inherits this legacy).
Toward the end of the Bell System, AT&T used Unix to run the control plane of phone switches and also to do administrative tasks for the phone network. I remember a paper in the Bell System Technical Journal where they did auditing of billing records for the whole U.S. with a set of tools kinda like grep, awk, etc. but with binary-format data records.
There are situations where operating systems get behind the 8 ball and there isn't a perfect thing to do. Giving up to prevent data corruption is one choice, but trying to soldier on and do the best you can is another.
The L3 cache is large enough to run an 80s operating system in; there is no reason for BSODs to be so minimalistic. Information about why it may have failed could be shown:
Lists of recently switched processes, or the general category of error it falls into.
There's a school of thought that would say that by default, error messages shouldn't be a huge scary wall of text, because you have to consider that a majority of the people who would see an error message are not developers, and have no idea what to do with that information.
It should certainly be possible for someone who can diagnose an error to see the details, but making it the default can be problematic when the message has a chance of being shown to real end users.
> There's a school of thought that would say that by default, error messages shouldn't be a huge scary wall of text, because you have to consider that a majority of the people who would see an error message are not developers, and have no idea what to do with that information.
I think normal users have a pretty good idea what to do with it -- show it to the nearest tech person or paste it into Google.
Just giving it to them up front makes things easier, not harder. They need help so they take a screenshot. If it contains the relevant information, someone can help them. If it just says "an error occurred" then nobody can help with that because it contains no information. Now they have to go back and have the tech walk them through trying to extract the information from some log file, which is seventeen clicks through an interface they've never used and now there are 5000 entries and they don't remember exactly what time it happened etc.
It also deprives the user of any opportunity to learn. The user has a problem, they get a message, they ask the tech what to do and the tech tells them. Now the next time they get the same message they know what to do. But if the message is always the same regardless of the cause then they lose the ability to match known problems with known solutions and give up.
Not all users care to learn that sort of thing, but you take it when you can get it, not purposely inhibit the user from doing it.
Thank you for that perspective - I've never thought of it that way before. Would it be worth appending "if you don't know what this means, ask your local nerd"?
Sure. You don't have to take over the whole screen with a gdb console they don't know how to use. Make the message as friendly as possible, but it still ought to actually contain the information necessary to identify what happened.
I suspect the modern trend of making problems opaque comes from companies that do it on purpose. If the user can solve their own problems, what do they need with your expensive support contract? Why have the user fix problems with their existing device when you can sell them a whole new one the first time anything happens?
And then other developers who don't even use those business models still cargo cult the same UX.
That sounds like airbags in a car. On the majority of trips you don't need them, but their presence is of little consequence. When you do need them, you're out of luck if they're not there.
First of all, how would you localize it? English-only errors would not be acceptable today like they were in the 80s. Secondly, what if accessing/storing that diagnostic data is what caused the crash?
Not acceptable? English is the lingua franca of technology.
I'd rather have a detailed error report in one language and then have to find someone who speaks it to translate it (if Google doesn't do a good enough job), than fifteen poorly written versions. Not to mention, those error screens should be as minimal as possible in terms of features so less goes wrong.
I don't want animations, fancy colors, dealing with the horror show that is localization, or anything of the sort. Because the system when it hits a bsod is in an undefined state, so it's best to exercise as little as possible of the system, just enough to get the error message out.
Exactly what I'm saying! If all it does is show a bitmap, that's the simplest possible way of getting the error message out. The more detailed you want to make the BSOD screen, the more chance of something going wrong while rendering it. Localization is just one problem with making it detailed, but there are plenty of others too even if you settle for English-only errors.
Yeah, given the choice between a translated message that is utterly useless and an English-only informative error dump, I'm gonna bet everyone remotely capable of dealing with the error would choose the latter every single time.
> everyone remotely capable of dealing with the error
Already this excludes most people who will see the error. Meanwhile, experts already have more sophisticated tools available to them like the event viewer. Obviously that wouldn't be useful in a situation like this where boot-up is blocked, but like I mentioned previously, there's only so much diagnostics you can reliably provide on a system that is in the middle of crashing.
Well, you can't screenshot a BSOD, so I assume you mean they would take a photo of it (which would be awkward as hell just getting a high enough res copy from them that I could read it) or more likely they would just tell me the error message verbally and leave out all the advanced details. So I'm not sure this is really that good of a justification to make the BSOD screen more detailed.
A BSoD screen could display a QR code that encodes a URL at microsoft.com that unpacks all the data after the '?' in the URL, and provides an option to change culture and locale, including links to support pages known to be associated with that error, and a button to Bing-search for more information on the error. They could then copy-paste the URL for that page using their phone into a text or email to their favorite technical adept.
I meant that the QR code should be larger, to encode the additional information shown after the registry patch as additional URL parameters. On re-reading, I did not make that clear.
Nobody is decoding QR codes by hand, so it isn't that big a deal to go from "https :// www.windows.com / stopcode ? code=ACPI_BIOS_ERROR" to "https :// www.windows.com / stopcode ? code=ACPI_BIOS_ERROR & p1=0000000000000002 & p2=FFFF9A0..."
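And the jump in QR complexity is tiny. A rough sketch using the third-party Python "qrcode" package, with placeholder parameter values just to get a realistic length:

    # Compare the QR version (symbol size) needed for the short vs. long URL.
    # Assumes the third-party "qrcode" package; the p1..p4 values are placeholders.
    import qrcode

    short_url = "https://www.windows.com/stopcode?code=ACPI_BIOS_ERROR"
    long_url = (short_url
                + "&p1=0000000000000002&p2=FFFF9A0000000000"
                + "&p3=0000000000000000&p4=0000000000000000")

    for url in (short_url, long_url):
        qr = qrcode.QRCode(error_correction=qrcode.constants.ERROR_CORRECT_M)
        qr.add_data(url)
        qr.make(fit=True)
        print(f"version {qr.version} for {len(url)} chars")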
Ah, you're right. I didn't realize the bugcheck parameters weren't included in the code, that's definitely something which would be helpful. However then you still might run into the second problem of not being able to reliably render it in a crashing state. That's the only reason I could think why they would not include those parameters in the QR.
> If it is for the user then being able to google the exact error
And they already provide just enough information to be able to google the error. I am just opposed to adding more detailed information, the kind of information that's only relevant for experts and is not guaranteed to be available in the middle of a crash.
> In other news most programming languages aren't localized for example.
Programming languages aren't consumer products like operating systems are
My HP N40L "MicroServer" only supports up to 8GB RAM, but I have it running now with 16GB. Sometimes FreeNAS will boot saying there is 16GB, and other times it will boot saying there is only 8GB. I'm unwilling to risk losing that old reliable system via a potentially dangerous BIOS hack, but it sure would be nice not to have to reboot it until I win the lucky 16GB jackpot.
I would be concerned that there’s some electrical reason why it’s detected intermittently, and that the extra 8GB could easily disappear while the system is running.
Low-level software is responsible for most memory issues, but there are definitely electrical reasons, too.
For example, I have a devboard that only supports 1 GiB of DDR2 RAM, but it's a 64-bit system and the memory controller on the CPU was supposed to support at least 2 GiB of RAM. Meanwhile, another board that uses the identical chip runs 2 GiB of RAM without problems.
The engineers of the devboard briefly explained that the problem was electrical: the memory controller itself has inadequate drive strength, and adding more RAM chips would increase the load on the DDR bus and destabilize the system. On the other hand, the other board had a better PCB layout, so the problem did not occur.
I have a Mini-ITX board, and I noticed that if I activate XMP the board ceases to work, but it works with 4 GiB of RAM at standard frequency.
So it seems memory is a general problem among Mini-ITX boards? Perhaps the reason is that these boards have less space available for routing, fewer layers, and target a lower price, so they tend to have worse electrical characteristics?
Quality of the PCB and the number of layers definitely play a factor. I'm sure it isn't limited to ITX though. I have noticed compatibility differences between super robust Intel ITX boards vs thin and wobbly ECS ITX boards, with the ECS board having more issues, quirks and what not.
I had something similar with my 2011 MBP. I had it running just fine with 16GB of ram even though it officially only supported 8GB. After having the main board replaced because of a defective graphics card I could not run it with more than 8GB without it crashing continuously.
I once had a defective 512MB RAM module. Usually it would be detected as a 256MB module and it would work fine. Sometimes it would be detected as a 512MB module, but corrupted data would crash the system within a few minutes.
You used to be able to scan memory consistency and pass a kernel parameter to skip bad regions of memory. I ran a system for years with known bad memory that way.
...and it probably already has, multiple times, but you haven't noticed because you've not used that much RAM.
The BIOS does a (relatively) quick memory check in the POST to detect how much memory is actually available, basically by writing a series of patterns to all addresses and then reading them back to confirm; some desktops have a "fast boot" option which mostly skips it (I believe it's something like testing one byte per 4KB instead of every byte), and servers usually have a much more thorough test that can take many minutes.
The best way to check whether the memory is functional when 16GB is detected is to run a memory tester like MemTest86.
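For a feel of what those tests do, here's a toy sketch of the write-patterns-then-read-back idea, with a stride parameter to mimic the "fast boot" sparse check. It runs in user space on an ordinary buffer, so it's nowhere near as thorough as MemTest86:

    # Toy POST-style pattern test: write patterns, read them back, compare.
    # stride=1 touches every byte; stride=4096 mimics a "fast boot" sparse check.
    def pattern_test(buf: bytearray, stride: int = 1) -> bool:
        for pattern in (0x00, 0xFF, 0xAA, 0x55):
            for i in range(0, len(buf), stride):
                buf[i] = pattern
            for i in range(0, len(buf), stride):
                if buf[i] != pattern:
                    return False
        return True

    # User-space demo on a 64 MiB buffer; a real tester walks physical memory.
    print(pattern_test(bytearray(64 * 1024 * 1024), stride=4096))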
Are you sure about that? I bought mine with 16gb a number of years ago and still use it as a NAS + plex server (zfs loves to chew through ram). The wikia page also talks about it: https://n40l.fandom.com/wiki/Memory
I've done the same thing with a Synology NAS. Both Synology and the Intel CPU datasheet say it only supports 8 GB of RAM, but put 16 GB in it and it will use it.
The Synology web UI only shows 8 GB and it breaks the graph display in the memory usage monitor (although when I first installed it, it used to show 16 GB but after some update, or random reboot, it stopped. Haven't tried rebooting to see if it's just random)
I got two from the same guy on craigslist and they already had 16GB and it works every time. Not sure what he did but I'm glad I wasn't the one fiddling with the BIOS.
As kstenerud said, https://n40l.fandom.com/wiki/Memory discusses this issue. More accurately, the comments discuss it, with some reporting your issue. I am among those who've never had problems with detecting 16GB; I use two KVR1333D3E9S/8G.
Love this. I've got an old Pentium box I thought I was going to have to hack up the BIOS to get larger IDE disks working, but don't you know it the motherboard manufacturer (AOpen) still had the BIOS update files on their site and after updating I could boot 32 GB disks on a board that only claims to support up to 8 GB. I think the lesson is, at least check for BIOS updates before you throw in the towel :) Maybe not everyone needs to go as far as hacking ACPI tables in memory, but it sure sounds like fun.
In that era, you could often just ensure that the BIOS could load the bootloader/kernel by placing it into the region still reachable by the BIOS. OSes like Linux (and anything Windows NT based) practically completely disabled[1] the BIOS anyway, and their own IDE drivers would be recent enough to understand how to address the full disk.
It was not uncommon to have a small boot partition at the beginning of the disk for that purpose.
[1] Bit of a simplification. In reality, the BIOS being 16bit real mode code meant that you had to jump through very elaborate hoops if you wanted to use it in your protected mode OS in any way, and then for questionable gain.
Oh yeah, definitely. I could get it to boot by marking the disk as 8 GB in BIOS, but this had side effects:
- I'm using CompactFlash as many do, since it's fast, cheap, and reliable. Some of my CF cards are smaller than 8 GB, and if I hard code it I can't boot with those.
- I actually do run OSes on that machine that thunk out to BIOS (Win98 for example). If I really wanted a computer for useful things, no doubt I'd run Linux on it, but all the same, I've got plenty of smaller computers with far more power (even an RPi is much faster).
(FWIW: autodetect properly detected the right parameters, it just locked up on boot. My guess is it was some kind of simple integer overflow bug or something.)
> It was not uncommon to have a small boot partition at the beginning of the disk for that purpose.
Still had to do that a few weeks ago, because on a Dell R720 GRUB did not want to boot my zfsonlinux rootfs on a 6TB pool - it's either the Dell HBA controller, GRUB or some other limitation, but once you go beyond 2TB or 4TB disks I always run into strange behaviour.
EFI doesn't solve it directly, but GPT does. Regular MBR can't handle disks larger than 2TB. Of course, when running EFI you always want to use GPT. Linux might be able to handle >2TB MBR somehow, but in doing so it might be confusing your BIOS.
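The 2TB figure falls straight out of the MBR on-disk format: 32-bit LBA fields and 512-byte sectors.

    # Why classic MBR tops out at 2TB: 32-bit sector counts, 512-byte sectors.
    max_bytes = (2 ** 32) * 512
    print(max_bytes, "bytes =", max_bytes / 2 ** 40, "TiB")  # 2199023255552 bytes = 2.0 TiB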
I remember running into this back in the day on one of my Windows NT boxes. Knowing that NT only uses the BIOS during the boot process, I just installed NT on a partition that was below the 8GB limit. Once the system booted the HAL communicated with the hardware directly and not through the BIOS. So it saw the full size of the drive, and I was able to create a second partition to use the full drive.
Yep, this was one of the reasons that a boot partition for Linux and other operating systems became so common back in the early 2000s. Disks outpaced the hardware, and a lot of systems didn't support reading from further into the disk. Operating systems could use drivers to talk directly to the hardware and get around that limitation once booted, but everything they needed up to that point had to be available early on the disk.
The workhorse oldie-but-goodie A1278 MBP mid-2012 non-Retina supports 16 GiB and two SSD's (one in the optical bay, and one in the HDD bay).
The Lenovo T480 supports 64 GiB. A 13" hackintosh-friendly laptop with:
- dual m.2 slots
- WQHD 2560 x 1440
- Thunderbolt 3
- water-resistant keyboard with drains
- 9 hours of battery life with the second, extended battery
- officially user-serviceable parts/guides
- & 64 GB!
The iPhone 6S, even with the headphone jack, is IP67 in all but name, but Apple didn't sell it as a feature... it has all the gaskets of the 7, but supposedly the headphone jack was an issue... mine's been in the shower a few times for YouTube morning news and still works.
macOS and most apps have nice UX and mostly work well together. Docker usually works on top of the built-in macOS intel-origin hypervisor. (VMware Fusion is another $olid option for $$; had problems with VirtualBox.)
What's neat about the T480 is there's a utility on the Windows partition to flash the BIOS startup boot logo from the red "Lenovo" to whatever you want.
A true hacker. And a wonderful read. Reminds me that the ability to really understand and control your system is one of the biggest achievements and power of Linux and FOSS.
> I clearly needed to find the four error parameters that used to be displayed with the blue screen on older versions of Windows. Windows 10 hides all the information by default
A somewhat disappointing trend in user interface design…
About six years ago I updated a laptop to 16gb of RAM that only officially supported 8gb, and I remember thinking "the greedy manufacturers lied! It works fine!" However, I was running Linux, and now upon reading this I wonder if I would have had the same issues if I had tried booting into Windows.
My old 2011 MacBook supports only 8GB officially - though the Intel chipset onboard will support 16GB - It's run happily with 16GB of RAM since 2012...
Oh yeah, I actually might have had that same model of MacBook, and I also think I put 16gb in there with an SSD; gave it to my brother in law. Amazing how well those things can hold up long term.
I remember being 14 years old playing with my NEC PC because IRQ conflicts caused it to reboot at unexpected times. My dad might have fixed it, but he didn't have time so I had to rely on myself if I wanted to play Warcraft ever again. I didn't really know what I was doing, but it was the first time I took something apart trying to figure it out. I explored BIOS and all the hidden parts that Windows tries to hide from casual users. Looking back, that's really where my perseverance in working through hard problems began.
I was expecting this to be "Linux roolz, Microsoft droolz," but I really enjoyed this. It's been a while since such simple, fun hardware exploration came across my desk.
I have a kind of opposite problem on an old Lenovo laptop where Windows boots just fine every time but any Linux distro I try randomly fails to boot with an ACPI error.
It's not a RAM issue though. AFAIK there's some bug in the UEFI/BIOS even on the latest version which Windows handles just fine but Linux randomly doesn't. I can boot Linux every time if I disable ACPI at boot but then I lose power management and battery percentage. Latest kernels (can't remember the version exactly, 4.somethingoranother) should handle the issue according to some forum posts I found but it doesn't for me.
I wonder what the Mac equivalent 'fix' would be? The 2010 15" MacBook Pros only supported 8GB, whereas the 13" models supported 16GB. They used the same i5 chip.
Seriously. I tried putting 16GB of RAM into the 2010 15" MBP, and it would boot into Linux, but not macOS.
Running memtest it was clear that there were issues accessing RAM in various segments, but I didn't have the knowledge to muck around with the ACPI tables to see if that would fix it.
Back in 2012 I had a white MacBook. Apple said it only officially supported 4GB, but there were numerous reports of people getting it to recognize 8GB without issue. Back then I was still in high school and only semi-technical, so I thought it was some sort of black magic.
Gather 'round, young 'uns: the 1998-vintage PowerBook G3 "PDQ" officially supported only 192 MB of RAM. Unofficially it was possible, if you bought the right DIMMs, to upgrade it to 512 MB. Which I did, just for laughs. (It was also possible, using XPostFacto, to upgrade it past the official OS X 10.2.8 maximum to 10.4.11. But it was so slow it was practically unusable.)
Nice find! Reminds me of the Dell Vostro 230. It will "officially" support 8GB memory but if you buy 8gb "low density" DIMMs it will accept 16GB just fine. Not nearly as difficult as your setup but similar in nature.
I had a similar adventure on the Vostro 230, followed by editing the BIOS to activate the vmx flag so that I could do hardware virtualisation with the Xeon processor that it shipped with!
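(If anyone wants to sanity-check a machine before going that far, here's a quick sketch of looking for the hardware-virtualisation CPU flags in /proc/cpuinfo. Note this only shows what the CPU advertises; whether the BIOS actually has it enabled is a separate question, and KVM will complain in dmesg if it's locked off.)

    # Check /proc/cpuinfo for virtualisation flags (vmx = Intel VT-x, svm = AMD-V).
    flags = set()
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                flags.update(line.split(":", 1)[1].split())
    print("VT-x:", "vmx" in flags, " AMD-V:", "svm" in flags)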
Since it's kind of related to something I wondered about - can you spoof a device ID with those tools, so an operating system just eats it up? I know it's possible with VMs and device passthrough.
My every day workstation is a Dell Studio 540 from 2009 that I've upgraded to the max in every possible way but without changing the motherboard and case.
The only upgrade I haven't pursued (because I thought it was impossible) is bumping up the RAM beyond 8GB. If anyone has experience bumping 2009 era Dell motherboards (in particular the M017G motherboard) beyond 8GB RAM and has any tips to offer, please let me know!
If I could bump up the RAM to 16GB, I might be able to keep using my beloved desktop for another 10 years :)
Looks like it's an Intel G45 chipset, so 16GB should work. However, you need to be careful when selecting memory - there can be a maximum size for the chips on the memory module (there are 8GB modules with 8x1GB chips, or some with 2x2GB chips, etc.). We had some trouble upgrading older Dells with the newer modules that have fewer chips. It's called low density, I think.
I think I would also run the memtest under the VM to make sure that the memory map is actually all reachable within Linux and doesn't mess anything up.
I don't know much about kernel memory management, but if he was booting memtest86+ bare metal, I wouldn't be surprised if he wasn't triggering the same edge cases Linux would - which is why I'd prefer to make the actual system try to allocate as much memory as possible, use it, and see if the results are sane.
I'm the author of the blog post. My first time here on Hacker News...very cool that it was picked up! I did my memory testing inside of Linux using memtester. I honestly want to do more thorough testing at some point to prove that it's actually working okay. If some of the PCI hardware is still paying attention to that overlapped range, weird stuff could definitely happen.
My MacBook Pro 13 mid-2010 "officially" only supports 8GB RAM maximum - but I run 16GB - thanks to the research by other folks on the internet related to the Intel hardware inside. The particular brand & model of sticks had to be fairly specific, but I can confirm it's been rock solid for 3 years and certainly extended its useful life for me when money was tight.
> don’t know if the motherboard is actually hardwired to look at those addresses for PCI stuff or what. All I know is it passes my RAM tests.
I guess you could allocate a humongous matrix with random cell values and do a CRC or something (once while you're filling it in, and again from square 1) to make sure the values are indeed what you think they are.
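Something along those lines - a user-space approximation, assuming the kernel actually hands out pages spanning the suspect range once you allocate enough:

    # Fill a large buffer with pseudo-random data, hashing it as it's written,
    # then re-read and hash again; mismatching digests would point at flaky RAM.
    import hashlib, os

    SIZE = 12 * 1024 * 1024 * 1024   # try to cover most of the 16 GB
    CHUNK = 64 * 1024 * 1024

    buf = bytearray(SIZE)
    write_hash = hashlib.sha256()
    for off in range(0, SIZE, CHUNK):
        chunk = os.urandom(min(CHUNK, SIZE - off))
        buf[off:off + len(chunk)] = chunk
        write_hash.update(chunk)

    read_hash = hashlib.sha256()
    view = memoryview(buf)
    for off in range(0, SIZE, CHUNK):
        read_hash.update(view[off:off + CHUNK])

    print("OK" if read_hash.digest() == write_hash.digest() else "MISMATCH")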
With respect to booting with a conflict in the system memory map: the Windows behavior is the right one, IMHO. An operating system shouldn't just ignore self-contradictory system configuration information and try to continue anyway. Who's to say that the resulting behavior is the right one? You're in undefined behavior territory. Failing fast when encountering invalid state, as Windows does, leads to more robustness in the end: fail-fast gets problems fixed, not swept under the rug.
Fail-fast only has a hope of getting the problem fixed if it occurs with popular software within a few months of product launch. In the case of using a motherboard with DIMMs that didn't exist until the MB was obsolete, you're never going to get the vendor to fix it, unless you have a very expensive long-term supply and support contract of the sort that doesn't exist for consumer parts.
I don't think it's a bug, it's a feature (no pun intended).
If there is a problem with the values reported by the BIOS, it's a good thing Windows does not continue. BIOS issues (such as a RAM address out of the reported range) can lead to unpredictable behaviour. Troubleshooting such issues is next to impossible.
If I were trying to explain to someone who knew computers but not cars why I would never buy a car with an engine swap, this is exactly the post I would point them to.
Making a joke about how miserable it is for some of us to deep-dive into systems to get stuff working. I wasn't expecting any upvotes, but it was neither a rude nor an idiotic thing to say, so y'all need to chill.
Sure. I commented before I read the article. Now I see. In fact the BIOSes I ended up using during the last years of my BX PC (before I bought a laptop) were community-patched. Now I wonder if it is possible to patch my Dell E5500 to support more than 4 GiB, or if it's actually an unbreakable limitation of its chipset...