
I'm the author of ripgrep and its regex engine.

Your claim is true to a first approximation. But greps are line oriented, and that means there are optimizations that can be done that are hard to do in a general regex library. You can read more about that here: https://blog.burntsushi.net/ripgrep/#anatomy-of-a-grep (greps are more than simple CLI wrappers around a regex engine).

If you read my commentary in the ripgrep discussion above, you'll note that it isn't just about the benchmarks themselves being accurate, but the model they represent. Nevertheless, I linked the hypergrep benchmarks not because of Hyperscan, but because they were done by someone who isn't the author of either ripgrep or ugrep.

As for regex benchmarks, you'll want to check out rebar: https://github.com/BurntSushi/rebar

You can see my full thoughts around benchmark design and philosophy if you read the rebar documentation. Be warned though, you'll need some time.

There is a fork of ripgrep with Hyperscan support: https://sr.ht/~pierrenn/ripgrep/

Hyperscan also has some peculiarities in how it reports matches. You won't notice it in basic usage, but it will appear when using something like the -o/--only-matching flag. For example, Hyperscan will report matches of a, b and c for the regex \w+, whereas a normal grep will just report a match of abc. (And this makes sense given the design and motivation for Hyperscan.) Hypergrep goes to some pains to paper over this, but IIRC the logic is not fully correct. I'm on mobile, otherwise I would link to the reddit thread where I had a convo about this with the hypergrep author.


‘Fables’ is excellent :D

Other favourites of mine:

Alan Moore, ‘V for Vendetta’

Alan Moore, ‘The League of Extraordinary Gentlemen’

Alan Moore, ‘Watchmen’

Neil Gaiman, ‘Sandman’ (series, collected into albums)

Frank Miller, ‘Give Me Liberty’

Mike Mignola, ‘Hellboy’ (series)

Masamune Shirow, ‘Ghost in the Shell’

Yukito Kishiro, ‘Battle Angel Alita’ (series)


I'm glad that other people are still working on Vulkan documentation materials. Outside of just reading the specification, Overv's vulkan-tutorial.com is probably one of the most widely read Vulkan tutorials in C++, but I have some minor complaints about it, and I hope this author considers making an improved tutorial, not just translating Overv's to Rust.

Vulkan is painfully verbose in unnecessary ways, forcing you to be explicit about everything when, in many cases, whole swaths of code are unnecessary, or the standard could have established sane defaults that match what actually happens in practice in the industry.

Maybe I'm in the minority here (I'm sure I probably am), but I see no reason for the standard and tutorial documentation like vulkan-tutorial.com to dwell on particular structures that aren't going to be of use to anyone until they reach broad adoption or advanced usage.

Most of these tutorials should be helping you get to a triangle as fast as possible, but none of them follow the order of what you're exposed to in the reference material, and they all go on tangents: stating application info (not required), creating your own validation layers (this is advanced material; why recreate VK_LAYER_KHRONOS_validation?), or declaring which device you're going to use (it's almost always going to be the first exposed device -- even systems with discrete GPUs will often declare that they have one GPU available and hide the fact that there's an integrated GPU on the CPU).
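
To be concrete about how little is actually required, here's a hedged sketch against the C API (no layers or extensions; error handling mostly omitted). pApplicationInfo really is optional per the spec, and "pick a device" can be a single call pair:

    #include <vulkan/vulkan.h>

    VkInstance create_instance() {
        // No VkApplicationInfo: the spec allows pApplicationInfo == NULL.
        VkInstanceCreateInfo info = {};
        info.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;

        VkInstance instance = VK_NULL_HANDLE;
        if (vkCreateInstance(&info, nullptr, &instance) != VK_SUCCESS)
            return VK_NULL_HANDLE;

        // First enumerated device; VK_INCOMPLETE just means "there were more".
        uint32_t count = 1;
        VkPhysicalDevice gpu = VK_NULL_HANDLE;
        vkEnumeratePhysicalDevices(instance, &count, &gpu);
        // gpu would be handed to device creation next.

        return instance;
    }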

It leads to hundreds of lines of code that distract from the high-level goals: initialize the renderer, set your shaders, upload assets to the GPU, and draw them.

I think the industry is still lacking reference-quality documentation outside of the reference specification, which doesn't explicitly help you with implementation details for the most common use cases.

I wouldn't expect it to, but it would have been nice. A lot of modern reference specifications actually do this now.


Congrats!

This has probably been asked before, but have there been any thoughts about moving community chat to a platform other than Discord? Discord has been brought up many times on HN for accessibility, privacy, and proprietary lock-in concerns that don't seem to be in line with the spirit of open source. Also see [1].

[1] https://drewdevault.com/2021/12/28/Dont-use-Discord-for-FOSS...


It's a garden-path sentence [1]. Garden-path sentences have ambiguous parses that typically require backtracking to correct earlier misinterpretations.

[1] https://en.wikipedia.org/wiki/Garden-path_sentence


See also "input lag" by dan luu: https://danluu.com/input-lag/

It is not the main thing going on in this Twitter post, but it does show one way modern computers feel slower than older machines.


This is an excellent resource and a great read, but DAMN do money markets seem stupid as all get out to me. Where is the productive output of all these arbitrage shell games? How is this more than an abysmal waste of time and resources simply to make a small handful of bankers richer?

Brandon Sanderson has a decent quote from Secret Project #1 that makes me think a bit.

> That is one of the great mistakes people make: assuming that someone who does menial work does not like thinking. Physical labor is great for the mind, as it leaves all kinds of time to consider the world. Other work, like accounting or scribing, demands little of the body—but siphons energy from the mind. If you wish to become a storyteller, here is a hint: sell your labor, but not your mind. Give me ten hours a day scrubbing a deck, and oh the stories I could imagine. Give me ten hours adding sums, and all you’ll have me imagining at the end is a warm bed and a thought-free evening.


What kind of sucks about tech is you start realizing people mainly just do the same stuff with it that they did 25 years ago. Only today, the hardware has to be 1000x as powerful to run all the shitty, bloaty software that's still just serving email, rendering spreadsheets, chat, images, videos, news, online shopping, etc., as it's ever been. It's like we've been in an arms race for sexier and more resource-demanding window dressing, rather than for anything I actually couldn't do before, and there has been absolutely no letting up.


Cool radix trie trick:

Set up a fixed-size binary radix trie of 32-bit IP addresses, say 1000 entries. Track the nodes of the trie in a list kept in LRU order; insert an IP, and its node goes to the top of the list.

When you exhaust the available nodes, reclaim from the bottom of the LRU list --- but first find either a sibling of the node already in the trie, or a parent, or a sibling of what that parent would be, and "merge" the IP address you're losing into it.

(So in reclaiming 10.0.0.1/32, merge with 10.0.0.0/32 to make 10.0.0.0/31, etc).
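
The merge step itself is tiny; a sketch of my own (C++20 for std::countl_zero), not Aguri's actual code:

    #include <algorithm>
    #include <bit>
    #include <cstdint>

    struct Prefix { uint32_t addr; int len; };  // {0x0A000001, 32} = 10.0.0.1/32

    // Smallest prefix covering both: 10.0.0.1/32 + 10.0.0.0/32 -> 10.0.0.0/31.
    Prefix merge(Prefix a, Prefix b) {
        uint32_t diff = a.addr ^ b.addr;
        int common = diff ? std::countl_zero(diff) : 32;  // shared leading bits
        int len = std::min({a.len, b.len, common});
        uint32_t mask = len ? ~uint32_t{0} << (32 - len) : 0;
        return { a.addr & mask, len };
    }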

Over time, "important" /32s --- really, important prefixes, period, not just /32s --- will "defend" their position towards the top of the LRU, while the noise will get aggregated up, into /16s, /15s, /4s, whatever.

What you're doing here is inferring prefix lengths (netmasks), which is kind of magical.

You can do the same thing with memory addresses in a debugger to infer (coarse-grained, but without much effort) data structures and allocation patterns. There are probably other integers you can do this with that nobody's thought of.

(The data structure is called Aguri).


I think the parsing debate is just one example of a spectrum that I've noticed a lot of developers (or indeed people in general) lie along. At one end are those who love complexity, abstraction, and generality, and indulge heavily in theory; at the other end are those who want simple, practical solutions, even if those might be considered "hacks" by the opposite camp. The former are obviously those advocating for parser generators and all of the theory behind them, while the latter go straight for handwritten RD.

Is it from dated undergrad + research material?

If one thinks that creating research material is the goal, then that's what happens in abundance.

Lack of left recursion is a trivial limitation in practice.

The same goes for context-sensitivity.
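
For example, the textbook left-recursive rule expr -> expr '+' term | term simply becomes a loop in a handwritten RD parser. A sketch with hypothetical Parser/parse_term/make_add helpers:

    struct Node;
    struct Parser { char peek(); void next(); };  // hypothetical token stream
    Node* parse_term(Parser&);                    // hypothetical
    Node* make_add(Node*, Node*);                 // hypothetical

    // expr -> expr '+' term | term, with the left recursion turned into iteration:
    Node* parse_expr(Parser& p) {
        Node* left = parse_term(p);
        while (p.peek() == '+') {
            p.next();
            left = make_add(left, parse_term(p));  // stays left-associative
        }
        return left;
    }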


I'd add one more tip that I think far more software developers should take up: add unit-level fuzz testing throughout your projects. Fuzzy bois are like assertions on steroids.

With large projects you often get modules which have an API boundary, complex internals, and clear rules for what correct / incorrect looks like. For example, data structures or some complex algorithms (A-star, or whatever).

Every time I have a system like this, I'm now in the habit of writing 3 pieces of code:

1. A function that checks the internal invariants are true. Eg, in a Vec, the allocated length should be >= the current length. In a sorted tree, if you iterate through the items, they're sorted. And children are always >= the internal nodes (or whatever the rules are for your tree). During development, I wrap my state mutators in check() calls. This means I know instantly if one of my mutating functions has broken something. (This is a godsend for debugging.)

2. A function which randomly exercises the code, in a loop. Eg, if you're writing a hash table, write a function which creates a hash table and randomly inserts and deletes items in a loop for a while. If you've implemented a search algorithm, generate random data and run searches on it. Most complex algorithms and data structures have simple ways to tell if the return value of a query is correct. So check everything. For example, a sorted tree should contain the same items in the same order as a sorted list. It's just faster. So if you're writing a sorted tree, have your randomizer also maintain a sorted list and then periodically check that the sorted list contains the same items in the same order as your tree. If you're writing A-star, check that an inefficient flood fill search returns the same result. Your randomizer should always be explicitly seeded so when it finds problems you can easily and deterministically reproduce them.

3. A test which calls the randomizer over and over again, and checks all the invariants are correct. When this can run overnight with optimizations enabled, your code is probably ok. There's a bunch of delicate performance balances to strike here - it's easy to spend too much CPU time checking your invariants. If you do that, you won't find rare bugs because your test won't run enough times. I often end up with something like this:

    loop (ideally on all cores) {
        generate random seed
        initialize a new Foo
        for i in 0..100 {
            randomly make foo more complicated
            (at first check invariants here)
        }
        (then later move invariants here)
    }
Every piece of a large program should be tested like this. And if you can, test your whole program like this too. (Doable for most libraries, databases, compilers, etc. This is much harder for graphics engines or UI code.)

I've been doing this for years and I can't remember a single time I set something like this up and didn't find bugs. I'm constantly humbled by how effective fuzzy bois are.

This sounds complex, but code like this will usually be much smaller and easier to maintain than a thorough unit testing suite.
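
As a tiny self-contained illustration of the pattern (std::set standing in for "your tree", a sorted vector as the dumb oracle):

    #include <algorithm>
    #include <cassert>
    #include <cstdint>
    #include <random>
    #include <set>
    #include <vector>

    int main() {
        for (uint64_t seed = 0; seed < 10000; ++seed) {
            std::mt19937_64 rng(seed);      // explicit seed: failures reproduce
            std::set<int> tree;             // stand-in for the structure under test
            std::vector<int> oracle;        // simple model, kept sorted
            for (int i = 0; i < 100; ++i) {
                int x = int(rng() % 50);
                auto it = std::lower_bound(oracle.begin(), oracle.end(), x);
                if (rng() % 2) {
                    tree.insert(x);
                    if (it == oracle.end() || *it != x) oracle.insert(it, x);
                } else {
                    tree.erase(x);
                    if (it != oracle.end() && *it == x) oracle.erase(it);
                }
                // invariant: same items, same order
                assert(std::equal(tree.begin(), tree.end(),
                                  oracle.begin(), oracle.end()));
            }
        }
    }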

Here's an example from a rope (complex string) library I maintain. The library lets you insert or delete characters in a string at arbitrary locations. The randomizer loop is here[1]. I make Rope and a String, then in a loop make random changes and then call check() to make sure the contents match and all the internal invariants hold.

[1] https://github.com/josephg/jumprope-rs/blob/ae2a3f3c2bc7fc1f...

When I first ran this test, it found a handful of bugs in my code. I also ran this same code on a few Rust rope libraries from crates.io. About half of them fail this test.


Rates of Spontaneous Mutation (Drake et al., 1998)[1] says that a human genome accumulates around 64 new mutations per generation through meiosis alone. Those mutations are propagated into every cell of the newborn baby. Over a lifespan, all cells in the body will mutate at a rate proportional to their division rate, leading to thousands upon thousands[2] of uncorrected mutations per individual, among the trillions which are ultimately corrected by DNA polymerase[3].

[1] https://doi.org/10.1093/genetics/148.4.1667

[2] https://doi.org/10.1038%2Fs41586-022-04618-z

[3] https://doi.org/10.1126/science.aaf9011


>Both teams on both sides wanted this to come together

>It takes apple to make a move for us to break through communication issues and get anything done

Imagine paying millions of dollars for "the best" developers, designers, and managers, and yet they can't even communicate with each other properly to align on common interests. FAANG culture has become so slow and ineffective that I'm not surprised they are cutting tens of thousands of employees.

Where I'm from we have an expression - "many grandmas, lazy/spoiled child" - and it rings especially true in creative work.

For something truly brilliant to be created, you need someone with a vision and the freedom to implement it. It's extremely rare for something brilliant to come from a committee, or top-down from managers delegating work to a bunch of teams. It's one of the reasons even open source struggles with design - you need one brilliant person, or a tiny team of aligned brilliant people working together, and you need to give them the freedom to do it. Not constrain them with meetings, micromanage the product, and let everyone express their opinion. That's how you get a terrible, bland, uninspiring design.

Recently I had the chance to work on a theatre production that ended up suffering from the same issue - the director had no vision, only an idea, so he delegated different work to everyone, then micromanaged people and injected his own opinions and ideas into every attempt at collaboration. As new people came in, their ideas were added to the mix, creating a show that ended up being even worse than mediocre.


Intimidation is what these copyright monopolists are all about.

> We knew recording was useful, but the app’s ability to apply audio effects anywhere on the Mac carried much less legal peril.

Look at this: innocent software developers, intimidated by the nebulous "legal perils" of copyright infringement to the point that they felt afraid of marketing a perfectly reasonable and legal feature.

I can't possibly be the only person deeply offended by this. We don't need their blessing to record anything. We should be free from their tyranny but it binds us to this day.


Titus Winters, who led Google's Abseil library, eventually came to the conclusion that the only sane way to manage a large-scale C++ system is to "live at head" [1] -- that is, libraries should build against the current head version of their dependencies.

This is patchworked around in more easygoing languages with dependency-management systems, Docker containers, etc., but if you can enforce living at head from the start, it makes everyone's life easier.

[1] https://abseil.io/about/philosophy#we-recommend-that-you-cho...


I handle it by collecting quotes that tell me to knock it off. I've since started to focus on just the things I really care about:

    The purpose of knowledge is action, not knowledge.
    ― Aristotle

    Knowledge isn't free. You have to pay attention.
    ― Richard Feynman

    Information is not truth.
    ― Yuval Noah Harari

    If I were the plaything of every thought, I would be a fool, not a wise man.
    ― Rumi

    Dhamma is in your mind, not in the forest. You don't have to go and look anywhere else.
    ― Ajahn Chah

    Man has set for himself the goal of conquering the world,
    but in the process he loses his soul.
    ― Alexander Solzhenitsyn

    The wise man knows the Self,
    And he plays the game of life.
    But the fool lives in the world
    Like a beast of burden.
    ― Ashtavakra Gita (4:1)

    We must be true inside, true to ourselves,
    before we can know a truth that is outside us.
    ― Thomas Merton

    Saying yes frequently is an additive strategy. Saying no is a subtractive strategy. Keep saying no to a lot of things - the negative and unimportant ones - and once in a while, you will be left with an idea which is so compelling that it would be a screaming no-brainer 'yes'.
    ― Unknown


> 30,000 times more initial energy than any recorded events

As a statistician: this is why heavy-tailed modeling matters. Schools teach the finite-variance version of the Central Limit Theorem and students believe they live in a normal world. The real Central Limit Theorem [1] predicts large rare events following Lévy α-stable distributions.

[1] https://en.wikipedia.org/wiki/Stable_distribution#A_generali...
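
A quick simulation makes this visceral (my own sketch; assumes Pareto tails with α = 0.5, i.e. not even a finite mean): in the finite-variance world the largest of a million draws is a rounding error on the sum, while here it routinely accounts for a large chunk of it.

    #include <algorithm>
    #include <cmath>
    #include <cstdio>
    #include <random>

    int main() {
        std::mt19937_64 rng(42);
        std::uniform_real_distribution<double> u(0.0, 1.0);
        const double alpha = 0.5;   // tail index < 2: classical CLT does not apply
        double sum = 0.0, biggest = 0.0;
        for (int i = 0; i < 1000000; ++i) {
            double x = std::pow(1.0 - u(rng), -1.0 / alpha);  // Pareto via inversion
            sum += x;
            biggest = std::max(biggest, x);
        }
        std::printf("largest single draw / total sum = %.3f\n", biggest / sum);
    }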


I'm really impressed with the README file in this repository. It's a master class in effective documentation. Just look at the structure:

- 1-2 sentence summary of the project

- History (why we built this and didn't just use existing tools)

- What we've built, including what it does and how it compares to other tools (rsync in this case).

- How to install and use it

And the whole thing is full of images and animations showing the tool in action, and explaining enough of its internals.

I consider myself good at documentation, but I'm taking notes. This is excellent.


Been building in HTMX for the last month and I’m loving the new paradigm. This vid did a good job of explaining the shift. https://youtu.be/LRrrxQXWdhI

(Been using preact/react since 2016, angular/jquery before that)


"Data driven" is something I've seen occasionally from execs as an excuse to not have - and particularly, not commit to - a plan.

A lot of times it's much easier to find data to give you reasons not to do something long-term (very similar to investors focused on short-term results) than to confirm the value of committing significant resources to long-term bets.

Other than "keep milking ads" is there any clear strategy Google's consumer business has shown? Android looks like a good example of executing on that ads strategy, but other things that seem harder to tie to ads directly tend to languish since nobody there actually knows what they want to do in other areas: chat apps? Stadia? Random things like Cardboard? Consumer G-suite? I'd say even assistant/home stuff is languishing (where's the big money maker).


I use these settings to clear all the visual clutter; it basically looks like LibreWolf then. For privacy-related settings I check the Arkenfox user.js and manually add the ones I find useful that don't cause breakage (though going thoroughly through the settings already does most of it).

Remove FF sync:

identity.fxaccounts.enabled

Remove recommendations in Extensions:

extensions.htmlaboutaddons.recommendations.enabled

Remove recommendations Side panel in Extensions:

extensions.getAddons.showPane (add this pref and set it to false)

Remove VPN Promo and More from Mozilla in Settings:

browser.vpn_promo.enabled

browser.preferences.moreFromMozilla

Remove Pocket:

extensions.pocket.enabled

Remove Focus promo in private tabs:

browser.promo.focus.enabled

Remove persistent topsites (facebook, amazon, etc.):

browser.newtabpage.activity-stream.default.sites (clear the value)

Bonus:

Pinch to zoom only:

mousewheel.with_control.action (set to 1)

Full screen video like Safari:

full-screen-api.macos-native-full-screen

Calculator in tab bar:

browser.urlbar.suggest.calculator
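
If you'd rather keep these in a file than click through about:config, the same toggles work from a user.js in your profile directory, e.g.:

    user_pref("extensions.pocket.enabled", false);
    user_pref("browser.vpn_promo.enabled", false);
    user_pref("mousewheel.with_control.action", 1);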


The #1 rule of optimization is that only 10% or even 1% of the code matters. You only need to optimize the code which runs 1000 times a second; the code which runs once every 10 seconds in a background thread can be inefficient and nobody will notice or care. You don't need to rewrite your entire app to make it run substantially faster, you only need to rewrite the hot paths.

Another key rule is to keep the "main loop" as small as possible: don't run big computations that don't affect what the user is focusing on. For example, web browsers will pause and unload tabs when you have several of them open: you can have over 100 open tabs and your browser will still run fast, because the paused tabs are not doing anything; they are just caching the URL and (in some cases) whatever was rendered on the site. Similarly, most games try not to render or update things outside the player's vision. You don't have to go to such extremes, but if something doesn't affect the user's flow (e.g. an extra feature only a few people use), its performance impact should be negligible. Specifically: don't put code in your hot path that doesn't need to be there.

Another key rule is to use well-written libraries for your algorithms. I assure you a vector-math library with 1000 stars implements vector-math operations much faster than you can, and those operations are neatly wrapped in easy-to-use functions.

When code in your hot path needs to be there, is too slow, and can't be replaced with a library, then you bring out the big-O and zero-allocation techniques. And also caching: a lot of optimization is ultimately caching, as computation is generally much more expensive than memory. Those super pretty render engines all use a ton of caching.
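
The caching point in its smallest form (expensive() is a hypothetical pure function; a real cache in a long-running program also needs an eviction policy):

    #include <unordered_map>

    double expensive(double x);  // hypothetical hot-path computation

    double cached(double x) {
        static std::unordered_map<double, double> memo;
        auto it = memo.find(x);
        if (it != memo.end()) return it->second;  // hit: skip the computation
        return memo[x] = expensive(x);            // miss: compute once, remember
    }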

You can have a piece of software with tons of bloat, extra features, inefficient/redundant computations, and an Electron wrapper, which is still fast (example: VSCode). Even the Linux kernel has a lot of bloat in the form of various drivers, but it does not affect the main runtime because those drivers are not loaded. Even graphically intensive games and simulations have redundant computations and excess allocations; they are just not in the hot path.

And one last tip: don't write exponential-time algorithms. The above doesn't apply when you have an exponential-time algorithm, because even at n = 30 it will slow your computer, and at n = 75 it will still be running after the sun burns out.


Since TeX and LLVM were already mentioned, I will casually mention the comments in Laravel's codebase which are all three lines long with each line exactly three characters shorter than the previous line. The art of computer programming at work.

Example (config file, but the entire codebase is like that): https://imgur.com/UnIUZmZ


Temporal (runtime) safety could actually be provided via tagged index handles (where the tag is a generation count). But this should be implemented by libraries, not built into the language (like: https://github.com/michal-z/zig-gamedev/tree/main/libs/zpool)

(it's of course not a "general solution", but I wonder if something like 'tagged pointers' could be that, even without hardware support, and at some runtime cost)
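
The core of the idea is small enough to sketch. A hedged C++ approximation (zpool itself is Zig and more elaborate): a handle is {index, generation}, freeing a slot bumps the generation, and a stale handle fails the lookup instead of aliasing new data.

    #include <cstdint>
    #include <utility>
    #include <vector>

    struct Handle { uint32_t index; uint32_t gen; };

    template <typename T>
    class Pool {
        struct Slot { T value{}; uint32_t gen = 0; bool alive = false; };
        std::vector<Slot> slots;
    public:
        Handle alloc(T v) {
            for (uint32_t i = 0; i < slots.size(); ++i) {
                if (!slots[i].alive) {               // reuse a freed slot
                    slots[i].value = std::move(v);
                    slots[i].alive = true;
                    return {i, slots[i].gen};
                }
            }
            slots.push_back({std::move(v), 0, true});
            return {uint32_t(slots.size() - 1), 0};
        }
        void free(Handle h) {
            if (get(h) == nullptr) return;           // ignore stale/garbage handles
            slots[h.index].alive = false;
            slots[h.index].gen++;                    // invalidate outstanding handles
        }
        T* get(Handle h) {                           // nullptr = use-after-free caught
            if (h.index >= slots.size()) return nullptr;
            Slot& s = slots[h.index];
            return (s.alive && s.gen == h.gen) ? &s.value : nullptr;
        }
    };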


Agreed with most of this comment, except:

> People think imperatively most of the time.

Most of the time people think declaratively. Heck, people dream, wish, and plan declaratively. They wouldn't even know how they got to their thoughts and dreams; they're just there, instantly.


Nope! Both are fully native macOS apps, at least on the version of Monterey that I'm running.

You can examine which frameworks a binary links against by running `otool -L /System/Applications/Photos.app/Contents/MacOS/Photos`. Traditional macOS Cocoa apps will link AppKit, while Catalyst apps will link UIKit.

For example, if you run that command on a Catalyst app (e.g., Messages or Maps), you'll see that those apps link against `/System/iOSSupport/System/Library/Frameworks/UIKit.framework/Versions/A/UIKit`, and AppKit is nowhere to be found.


Because new technology is, as Thiel often quips, limited to the world of bits rather than the world of atoms. Paul Krugman once asked: if you went into an average house right now and took out all the screens, could you tell that you're not in the 80s?

Gordon, in The Rise and Fall of American Growth, gives a similar example: what if you stepped out of a time capsule spanning 1890 to 1950, versus one spanning 1960 to 2010? In the first case you're going to see skyscrapers, commercial airplanes, nuclear power plants, electricity everywhere, cars going at amazing speeds. In the latter case, what's the difference? People paying with their phones and different fashion, mostly.

'Innovation' in the internet age, say the last 30 years, has mostly been limited to enabling hedonistic digital consumption, with very little impact on how we fundamentally move through the world. The difference between a car right now and a car 30 years ago is that you can now play Angry Birds on a tablet. Going from 100 years ago to 50 years ago meant going from horse carriages to trains, and from weeks on a ship to hours on a plane. Today the average person crosses the Atlantic no faster than we did decades ago.

That's why productivity growth is low: the world hasn't changed that much. There are still marginal improvements, obviously, which do add up over time, but the 'unprecedented pace of innovation' you hear about from tech evangelists is nowhere to be found.

Another interesting thought experiment: how many digital services and how much modern tech would you be willing to trade for something mundane - say your dishwasher, a hot shower, the toilet, a car, soap - if you could only have one or the other? I think it really puts into perspective how much, or rather how little, value those 'innovations' add.


Every system is corrupted by those willing to leverage the most from it.

It's all well and good to have a library with fast sin/cos/tan functions, but always remember to cheat if you can.

For instance, in a game it's common to have a gun/spell or whatever that shoots enemies in a cone shape, like a shotgun or burning hands. One way to code this is to calculate the angle between where you're pointing and each enemy (using arccos), and if that angle is small enough, apply the damage or whatever.

A better way is to take the vector where the shotgun is pointing and the vector to a candidate enemy, and compute the dot product of the two. Pre-compute the cosine of the angle of effect, and compare the dot product to that -- if the dot product is higher, it's a hit; if it's lower, it's a miss. (You can get away with very rough normalization in a game, for instance using the rsqrt instruction in SSE.) You've turned a potentially slow arccos operation into a fast handful of basic float operations.
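
In code, the cone test is just this (a sketch with made-up Vec3 helpers; both vectors assumed already roughly unit length):

    #include <cmath>

    struct Vec3 { float x, y, z; };

    float dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

    // Precomputed once: cosine of the cone's half-angle (~23 degrees here).
    const float cos_cone = std::cos(0.4f);

    bool in_cone(Vec3 aim, Vec3 to_enemy) {
        return dot(aim, to_enemy) >= cos_cone;  // larger dot = smaller angle: hit
    }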

Or for instance, you're moving something in a circle over time. You might have something like

    float angle = 0
    for (...)
      angle += 0.01
      <something with sin(angle) cos(angle)>
Instead, you might do:

    const float sin_dx = sin(0.01)
    const float cos_dx = cos(0.01)
    float sin_angle = 0
    float cos_angle = 1
    for (...)
      const float new_sin_angle = sin_angle * cos_dx + cos_angle * sin_dx
      cos_angle = cos_angle * cos_dx - sin_angle * sin_dx
      sin_angle = new_sin_angle
And you've replaced a sin/cos pair with 6 elementary float operations.

And there's the ever-popular trick of comparing squares of distances instead of comparing distances, saving yourself a square root.

In general, inner loops should never have a transcendental or an exact square root in them. If you think you need one, there's almost always a way to hoist it out of the inner loop into an outer loop.

