
My guess has been that they both originate from Heroku: docker + heroku = dokku, pico + heroku = piku.


Great to see the updated docs.


This seems to be the narrative Microsoft was pushing at the Build conference this week.


Weird, because they make one of the best small language models (phi series) which is great for finetuning.


Do you actively use this function in any projects? What was your inspiration to write the crate?


I don't actively use it, unfortunately. The main inspiration was to sort inputs faster for the https://github.com/BurntSushi/fst crate, which I in turn used to try to build a search library.


Does this appear to be intentionally left out by NVidia or an oversight?


NVidia wants you to buy A6000


Seems more like an oversight, since you have to stitch together a bunch of suboptimal non-default options?


It does seem like an oversight, but there's nothing "suboptimal non-default options" about it, even if the implementation posted here seems somewhat hastily hacked together.


> but there's nothing "suboptimal non-default options" about it

If "bypassing the official driver to invoke the underlying hardware feature directly through source code modification (and incompatibilities must be carefully worked around by turning off IOMMU and large BAR, since the feature was never officially supported)" does not count as "suboptimal non-default options", then I don't know what counts as "suboptimal non-default options".


> bypassing the official driver to

The driver is not bypassed. This is a patch to the official open-source kernel driver where the feature is added, which is how all upstream Linux driver development is done.

> to invoke the underlying hardware feature directly

Accessing hardware features directly is pretty much the sole job of a driver, and the only thing "bypassed" is some abstractions internal to the driver. It just means the patch would fail review on the basis of code style, and on the basis of possibly only supporting one device family.

> through source code modification

That is a weird way to describe software engineering. Making the code available for further development is kind of the whole point of open source.

> turning off IOMMU

This is not a P2PDMA problem, and just a result of them not also adding the necessary IOMMU boilerplate, which would be added if the patch was done properly to be upstreamed.

> large BAR

This is an expected and "optimal" system requirement.


> The driver is not bypassed. This is a patch to the official open-source kernel driver where the feature is added, which is how all upstream Linux driver development is done. [source code modification] is a weird way to describe software engineering. Making the code available for further development is kind of the whole point of open source.

My previous comment was written with an unspoken assumption: hardware drivers tend to be very different from other forms of software. For ordinary free-and-open-source software, the source code availability largely guarantees community control. However, the same often does not apply to drivers. Even with source code availability, they're often written by vendors using NDAed information and in-house expertise about the underlying hardware design. As a result, drivers remain under a vendor's tight control. Even with access to 100% of the source code, it's often still difficult to do meaningful development because the documentation explaining "why" instead of "what" is missing: the driver can be full of magic numbers and unexplained functionality, without any description other than a few helper functions and macros. This is not just a hypothetical scenario; it is encountered by OpenBSD developers on a daily basis. In an OpenBSD presentation, the speaker said the study of Linux code is a form of "reverse-engineering from source code".

Geohot didn't find the workaround by reading hardware documentation; instead, it was found by making educated guesses based on the existing source code, and by watching what happens when you send the commands to hardware to invoke a feature unexposed by the HAL. Thus, it was found by reverse-engineering (in a wider sense). And I call it a driver bypass in the sense that it bypasses the original design decisions made by Nvidia's developers.

> [turning off IOMMU] is not a P2PDMA problem, and just a result of them not also adding the necessary IOMMU boilerplate, which would be added if the patch was done properly to be upstreamed.

Good point, I stand corrected.

I'll stop calling geohot's hack "a bypass" and accept your characterization of "driver development" if it really gets upstreamed to Linux - which usually requires maintainer review, and Nvidia's maintainer is likely to reject the patch.

> [large BAR] is an expected and "optimal" system requirement.

I meant "turning off (IOMMU && large BAR)". Disabling large BAR in order to use PCIe P2P is a suboptimal configuration.


> For ordinary free-and-open-source software, the source code availability largely guarantees community control. However, the same often does not apply to drivers. Even with source code availability, they're often written by vendors using NDAed information and in-house expertise about the underlying hardware design. As a result, drivers remain under a vendor's tight control.

This is not true for any upstream Linux kernel driver. They are fully open source, fully controlled by the community and not subject to any NDAs to work on. The vendor can only exert control in the form of reviews and open maintainership. This is the case for both AMD and Intel GPU drivers.

For numbers, Arch Linux bundles some ~7.5k loadable kernel drivers (which is a subset of all available), but the AUR only has 8 out-of-tree drivers, of which two are proprietary and 2-3 are community-led. "Vendor-controlled" drivers are an extreme outlier on Linux.

While not true for nvidia's proprietary, closed-source driver stack, the linked open source nvidia (kernel) driver is nvidia's work-in-progress driver to be upstreamed in exactly this fashion.

> by watching what happens when you send the commands to hardware to invoke a feature unexposed by the HAL. Thus, it was found by reverse-engineering

It is true that documentation was lacking, but there was no reverse-engineering here, just standard (albeit, hacked up) driver development. "The HAL" means nothing, as it's just a random abstraction within the driver.

I used to work on proprietary network drivers for high-performance, FPGA-based NICs, and apart from it being more annoying to debug it's really no different than coding on anything else. Unless you're bringing up new IP blocks, it's mostly you and the code, not specs.

It would have been reverse-engineering if the registers and blocks weren't in the driver in the first place, but in this case all the parts were there to bring up a completely standard feature. What he did was use existing driver code to ask for what he wanted.

Not saying that it wasn't significant effort, but unless you consider "reading code you did not write" reverse-engineering (in which case all coding is reverse-engineering), then it has nothing to do with reverse-engineering, or bypassing anything. Considering this reverse-engineering is also a bit of a disservice to those that actually reverse-engineered GPUs and wrote open-source GPU drivers for them, like panfrost and recently the Apple silicon drivers.

Apart from the fact that the feature is hacked and not properly wired up (as evidenced by the IOMMU issue and the abstraction break), the implementation flow is exactly the same as when volunteers contribute fixes or features to other complex open-source upstream GPU drivers, like AMDGPU. Not by reverse-engineering, nor by signing NDAs, but by reading and writing code and debugging your hardware. Stuff is just always harder in kernel mode.

> I meant "turning off (IOMMU && large BAR)". Disabling large BAR in order to use PCIe P2P is a suboptimal configuration.

The requirement is large BAR on, not off - the phrasing is a bit poor on the page, but they're trying to say that you need large BAR support, and that you need IOMMU off, not that you need large BAR and IOMMU off.


> This is not true for any upstream Linux kernel driver.

This is very true for at least a portion of upstream Linux kernel drivers.

> They are fully open source

True.

> fully controlled by the community

False (Of course, this claim is based on my belief that source code access is insufficient to do driver development; hardware datasheets are not just nice to have but mandatory for a truly free operating system. If you disagree, nothing more can be said).

The upstream Linux kernel has absolutely no requirement that contributed hardware drivers must include public documentation. In fact, many are written by vendor-hired contractors, or by independent embedded system companies who make computer gadgets based on these chips (so they have a business relationship with the vendor that gives them datasheet access).

If you want to do any low-level work on these kinds of drivers independently, it almost always involves trial-and-error and educated guesses.

> and not subject to any NDAs to work on. The vendor can only exert control in the form of reviews and open maintainership. This is the case for both AMD and Intel GPU drivers.

Sure, the drivers themselves are not subject to any NDAs to work on, but hardware vendors often maintain de facto control, because they possess technical advantages not available to outsiders in the form of NDAed information and in-house expertise. Theoretically everyone is free to participate, and it often happens, but when 80% of the experts are programmers with privileged information, it's difficult (although not impossible) for an outsider to join the development effort. It's no surprise that most outsider contributions are high-level changes that do not directly affect hardware operations - moving away from a deprecated kernel API, fixing memory leaks - rather than, say, changing the PLL divider of the hardware's reference clock.

> For numbers, Arch Linux bundles some ~7.5k loadable kernel drivers (which is a subset of all available, but AUR only has 8 out-of-tree drivers, out of which two are proprietary and 2-3 are community-led. "vendor-controlled" drivers are an extreme outlier on Linux.

As a contributor to the Linux kernel, my personal experience is that drivers containing code similar to the following are not rare:

    /* magic register values from the vendor, meaningless without the datasheet */
    uint8_t registers[] = {
        0xFF, 0xFE, 0xEA, 0x3C, 0x5A, 0x6A,
        0x01, 0x02, 0x3A, 0x4D, 0x55, 0x66,
        /* many lines */
    };

    /* initialize hardware */
    write_registers(device_ctx, registers, sizeof(registers));
I bet at least 10% of device drivers (or as high as 25%?) are written with privileged information, and they're almost impossible for independent developers to work with (other than by making educated guesses) due to incomplete or NDAed documentation. Datasheet access is the only way to interpret the intentions of this kind of driver. Even with macro or bitfield definitions - which are better than nothing - the exact effect of a register is often subtle. Not to mention that hardware bugs are common and their workarounds often involve very particular register-write sequences - and those tend not to be publicly documented. Sometimes even firmware binary blobs are embedded into the source code this way, and the code is almost always contributed by a vendor who actually knows what the blob is doing. If I were asked to do some development on top of an existing driver without hardware documentation, I would refuse every single time.


I have some news for you: you must disable IOMMU on the H100 platform anyway, at least for optimal GDS :-)


I stand corrected. If it's already suboptimal in practice to begin with, the hack does not make it more suboptimal... Still, disabling large BAR is suboptimal...


> then I don't know what counts as "suboptimal non-default options".

Boy oh boy do I have a bridge to sell you: https://nouveau.freedesktop.org/


Thank you for making SQLite such a delightful way to share research through datasette.


17x is a big hurdle to overcome. Likely:

- models will get a lot more efficient for storing and retrieving information

- the marginal value created with this same hardware will increase

- the market price for this hardware will continue to fall, similar to historic norms

I expect this to set a floor on the crash.

We will continue to get better and better predictions at more affordable prices.


Could the internet archive offer this as a paid service?

For a reasonable fee, we'll archive your site, and give you back a copy of the assets you can turnkey host on the original domain for cheap with a static hosting solution (s3, cloudflare, etc).

Everybody wins


I've been wondering this too. I've even been considering authoring a web standard to allow hosts to specify how their pages can be archived in a standard way (e.g. which scripts to include, etc.) and then pitching the IA on offering a "pay $X to archive this data forever" deal to the universe.

I'm really curious what the cost per byte would be to make it worthwhile to offer a "host this byte forever, for one up-front fee" service.


> I'm really curious what the cost per byte would be to make it worthwhile to offer a "host this byte forever, for one up-front fee" service.

https://help.archive.org/help/archive-org-information/

> *What are your fees?*

> At this time we have no fees for uploading and preserving materials. We estimate that permanent storage costs us approximately $2.00US per gigabyte. While there are no fees we always appreciate donations to offset these costs.

There's some discussion about this idea in this thread, including comments by markjgraham, who manages the Wayback Machine, and thoughts from John Carmack:

https://news.ycombinator.com/item?id=29639222

https://twitter.com/ID_AA_Carmack/status/1473327982605385735

https://threadreaderapp.com/thread/1473327982605385735.html


Amazing references! Thank you!


No no, the exact opposite. If this bundle of content is so valuable, then someone can make a business out of buying it. Vice could go to WeBuyOldIntellectualAssets.com and get a flat price for it all, and that company would host it or do whatever with it.

The same thing happens with brands - someone bought the Montgomery Ward brand at a bankruptcy auction or something - and with store inventory: once the store goes bankrupt, they just sell the entire store contents, right down to the fixtures, to a liquidator who brings in the "Going out of business! Everything must go!" signs.


This is how Saks Fifth Avenue is actually, when you peel back all the layers of the onion, the honest-to-God 17th-century Hudson's Bay Company.


Found the young'un :)

That's too attractive to malware peddlers. It's not particularly widespread currently, mainly because most of the content has been centralized into the same big silos... But what you're envisioning here is just going to get abused by abusers


I think the issue with this solution is that the seller loses control over their branding (namely what ads to show) if they do this.


Not just brands, but software too. The primary software I support at my day job was acquired by a company that, based on their other assets, can be described as where software goes to die.

We’re migrating away but they’ll squeeze out what they can from those that don’t/can’t/won’t.

But they still have to provide continuous support, some amount of updates to keep customers functioning, and maybe even get some new customers as a “value” option (that’s barely functional).

Happens to forums all the time (fuck you Internet Brands and Vertical Scope).

100000x easier to do all this with static web content.


They seem to offer something like that:

https://www.archive-it.org/

However, the footer of that website says 2014 and the about page is broken, so not sure if it's still supported.

Also, Cloudflare has a partnership with the Internet Archive and they offer something similar, but I think it's only meant for temporary outages and only archives the most popular pages on your site.


The site's last act was to archive itself


Could they (or a for-profit company) bid on it? Do liquidators in the US have to consider any offer, even if it was unsolicited? Does it vary from state to state?



That works too, but there is something to be said for a turnkey solution friendly to corporations that are willing to just throw some money at a problem to make it go away. Plus the Archive will get a bit of extra money for the Wayback Machine!

Just pay a donation, redirect your DNS to the Wayback Machine, and bingo.


They are already on it. They do really important work. It's worth helping them out with funding.

https://twitter.com/Chronotope/status/1760755908219466017


Note that Archive Team and the Internet Archive are separate, unaffiliated entities, though they do often work together.

Archive Team is a loosely organised group of individual volunteers that share a common interest in Internet preservation, and develop tools and share notes to serve that goal. They're basically one of your old-school Mediawiki communities, with very little budget:

https://wiki.archiveteam.org/

Internet Archive is a full-blown multimillion dollar `501(c)(3)` nonprofit, which functions as more of a general-purpose library. They maintain physical offices and datacentres in multiple countries, host many petabytes of data, do activism, run conferences, and when they develop custom tools it tends to be somewhat more advanced than the Archive Team's decentralized web scrapers, like custom book scanning hardware:

https://archive.org/details/eliza-digitizing-book_202107

A lot of the information in the Wayback Machine, which is run by the Internet Archive, was saved and contributed by Archive Team. For example, as of writing this comment, that is true of the latest snapshot of `https://www.vice.com/en`. You can see this with the "About this capture" button on a Wayback Machine capture.

Both groups have ways to receive monetary donations.

For Archive Team though, I wonder if it would be more useful to donate compute by running their Warrior archiving VM/container, or contributing code to their GitHub:

https://wiki.archiveteam.org/index.php/ArchiveTeam_Warrior

https://wiki.archiveteam.org/index.php/Dev/Source_Code


That's the Archive Team, not the Internet Archive.


Yes, but everything is going onto the Internet Archive. https://twitter.com/Chronotope/status/1760764792887746724


I think the issue for the IA is that it isn't lucrative enough to make it worth their time. Someone already did it for them for free, even if it wasn't 100% as good as they could have done it.


Daniel’s PR to implement this in Node.js [1] is a case study in:

- crafting a high-context yet succinct description

- addressing PR feedback well

- giving respect to a pedantic commenter who understands the inner workings far less than Daniel, while not conceding to make a destructive change.

I will share this PR widely as a role model for open source contributions.

[1] https://github.com/nodejs/node/pull/50288


The same thing struck me as well. This is one of the best optimization professionals on the planet, showing up with a huge improvement, and receiving some misplaced arrogance.

The lesson here is to always, always watch your own review tone, and not make this mistake.

The other lesson is that when a PR shows up with this kind of technical information attached to it, spend the 60 seconds it takes to Google for "lemire".


I'm surprised that the reviewer was so ignorant of amortized constant time insertion.


If I'm being super pedantic, I would argue that while `string::push_back` should take amortized constant time, `string::append` has no such guarantee [1]. So it is technically possible that `my_string += "a";` (equivalent to `string::append`) will reallocate every time. Very pedantic indeed, but I have seen some C++ implementation where `std::vector<T>` is an alias to `std::deque<T>`, so...
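
For what it's worth, on the mainstream implementations you can watch the amortized behaviour directly: append one character at a time and print `capacity()` whenever it changes. It grows geometrically, so reallocations quickly become rare (just a sketch of what common libraries do, not something the standard promises for `append`):

    #include <cstddef>
    #include <iostream>
    #include <string>

    int main() {
        std::string s;
        std::size_t last_capacity = s.capacity();
        for (int i = 0; i < 1'000'000; ++i) {
            s += 'a';  // appending one char, equivalent to push_back
            if (s.capacity() != last_capacity) {
                // prints only when a reallocation (capacity jump) occurred
                std::cout << "size=" << s.size()
                          << " capacity=" << s.capacity() << '\n';
                last_capacity = s.capacity();
            }
        }
    }

On common implementations that's only a few dozen capacity jumps for a million appends, which is the amortized-constant behaviour the PR relies on.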

One thing I don't like about lemire's phrasing is that he only looks at the current, most widely available implementations and doesn't make that caveat explicit in most cases.

EDIT: Thankfully he does acknowledge that in a later post [2].

[1] https://timsong-cpp.github.io/cppwp/n4861/strings#string.app...

[2] https://lemire.me/blog/2023/10/23/appending-to-an-stdstring-...


> but I have seen some C++ implementation where `std::vector<T>` is an alias to `std::deque<T>`, so

I have a hard time believing that because std::vector guarantees that the memory is contiguous.


Yeah, it wouldn't be standard-compliant, and I think it was a very ancient one - at least as old as STLport, I believe.


I am not at all surprised. Kids these days have no idea what CPUs can do. ;)

I periodically have interview candidates work through problems involving binary search, then switch to bounded and ask them how to make it go faster over N elements, where N is < 1e3. The answer is "just linear search, because CPUs really like to do that".
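
To make the comparison concrete, here's a rough sketch of the two routines I have in mind (C++, names are mine): both return the first index whose element is >= the key, but for small, bounded N the straight-line scan is trivially predictable and prefetch-friendly, which modern CPUs reward.

    #include <cstddef>

    // Classic branchy binary search: first index i with a[i] >= key, or n.
    std::size_t lower_bound_binary(const int* a, std::size_t n, int key) {
        std::size_t lo = 0, hi = n;
        while (lo < hi) {
            std::size_t mid = lo + (hi - lo) / 2;
            if (a[mid] < key) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    // Plain linear scan: same result, more comparisons, but a loop the
    // branch predictor and prefetcher handle almost for free at small n.
    std::size_t lower_bound_linear(const int* a, std::size_t n, int key) {
        std::size_t i = 0;
        while (i < n && a[i] < key) ++i;
        return i;
    }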


But amortized analysis has nothing to do with what CPUs can do. CS students learn it in an algorithm course, not in a computer architecture course.


This feels like a conversation where it would have been useful for the participants to be very explicit about the points they were trying to convey: the reviewer could have said "Isn't this a quadratic algorithm, because each call to `+=` reallocates `escaped_file_path`?" (or whatever their specific concern was; I may have misunderstood), and the author's initial response could have been "No, because the capacity of the string is doubled when necessary."


Redirecting to a dedicated 404 page is bad SEO practice. There must be a better way to accomplish this.


Almost all the Rails apps I worked on in the past 18 years don't care about SEO because they don't have much that googlebot can browse without an account. Everything important is either behind a login form or served through an API. The API is probably used by a JS frontend, which googlebot browses, and it's up to the frontend to deal with 404 and other errors in a gracious way. Finally, if the API is machine to machine, there are no worries about SEO.


This is the better way. The redirection happens inside the server. Rack Attack runs early in the request processing and says "do the rest of the processing as if the client's request were for a page that does not exist", then the server acts accordingly.


It's also a bad user experience.

If I mistyped a URL, and just need to change one letter, now I can't because I've been redirected.


You haven't been redirected. The redirection happens inside the server, it's never communicated to the client.

Well, almost inside the server. The server first does TLS processing, then this part, then what backend developers see as "the server". From a backend developer's perspective, this code runs in the zone between browser and server, where load balancers and such live. Operationally, the code runs in the same rack, probably on the same CPU, as the backend code.


Ah, that's cool. I noticed the other interaction yesterday where I was 301-redirected on a 404; I thought it was the same here.


People make all kinds of mistakes.

