salted-fry's comments

I use gron a lot, because I can never remember how to use jq to do anything fancy but can usually make awk work. (I may be unusual in that department, in that I actually like awk.)

One warning to note is that gron burns RAM. I've killed 32GB servers working with 15MB JSON files. (I think gron -u is even worse, but my memory is a bit fuzzy here).

https://github.com/adamritter/fastgron as an alternative has been pretty good to me in terms of performance, I think both in speed and RAM usage.


Thanks for mentioning my project (fastgron).

It reads the file into memory once, then just goes through it only once, so it shouldn't need much more memory than the file size.

Also I put a lot of work into making fastgron -u fast, but you can grep the file directly as well.


Thank you for making fastgron! I use it daily and also have it aliased to 'gron' in my shell so I don't accidentally forget to use it.


Thanks, it would be great to make it the official gron 2.0; I tried to achieve full functionality parity with the original. Also, a serious buffer overflow bug was just fixed, so make sure to upgrade to 0.7.

I'm thinking of doing some marketing (for example, a blog entry showing the main lessons in I/O and memory management that were needed to achieve this speed).


Maybe I'm misunderstanding, but why does it need to read the file into memory at all? Can't it just parse directly as the data streams in? It should be possible to gron-ify a JSON file that is far bigger than the memory available - the only part that needs to stay in memory is the key you are currently working on.


> One warning to note is that gron burns RAM. I've killed 32GB servers working with 15MB JSON files.

That sounds seriously like there is something wrong with the tool


I think it's because Go's encoding/json package doesn't support incremental parsing.


It does - I patched that into my local fork of `gron` two years ago.


Just done a test with my 800MB stress test file.

`jq`: 1m26s 21G resident

`mygron -e --no-sort`: 18m14s 19M resident

`gron --no-sort`: 1m51s OOM killed at 54G resident


Can you try https://github.com/adamritter/fastgron as a comparison?


`fastgron`: 8.5s 2.2G resident

edit: Interestingly whilst doing this test, I piped the output into `fastgron -u` (39.5G resident) and `jq` rejected that. Will have to investigate further but it's a bit of a flaw if it can't rehydrate its own output into valid JSON.


I released fastgron v0.7.5, which contains fixes for string escaping. Could you please take another look?


Already commented on the github issue but will have another look, yep.


Sure, I saw it, thanks!

I fixed the semicolon bug, but of course correctness is more important.


Update for anyone following - 0.7.6 recreates my 800MB input JSON correctly after trip through `fastgron | fastgron -u` which is good work.


Thanks, it's a clear bug. I created a new issue for it: https://github.com/adamritter/fastgron/issues/19


> `gron --no-sort`: 1m51s OOM killed at 54G resident

Oh dear


If I remember correctly, it took a 128GB AWS EC2 to parse that file without OOMing. Go is not that efficient at deep multi-level size- and type-unknown data structures.


Thanks for the follow up. Is your fork public?



I think https://pkg.go.dev/encoding/json#Decoder does support streaming, at least. Here is gojq's stream mode: https://github.com/itchyny/gojq/blob/main/cli/stream.go


Is there an issue on this?


I have a version of `gron` which uses almost no RAM to parse files (uses the streaming JSON parser rather than loading the file.) Processed a 4GB JSON file on a Pi using it (admittedly, it took forever) taking, IIRC, about 64MB RAM tops.

`gron -u` is basically impossible to optimise unless you know the input is in "sorted" order (i.e. the order it comes out of `gron`, including the `json.a = {};` bits), in which case my code can handle that in almost no RAM also. But if it's not sorted or you're missing the `json.a = {};` lines, there's not a lot you can do, since you have to hold the whole data structure in RAM.


> you have to hold the whole data structure in RAM

Sure, but something is seriously wrong if a 15 MB JSON data structure uses more than 32 GB of RAM.


That 15MB JSON expands when piped through `gron` - my 7MB pathological test file is 143MB and 2M lines after going through `gron` (which is lines like `json[0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][1][1][0][0] = "x";`)

Which is 20 levels of unknown-sized and unknown-typed slices of slices of `any` in Go and that is not super-efficient, alas. It gets worse when you have maps of slices of maps etc. `fastgron` gets around this by being able to manage its own memory.

(`gron` can, however, reconstruct the output correctly if you shuffle the input. `fastgron` cannot. Which suggests to me it's maybe using the same 'output as we go' trick that my `gron` fork uses for its "input is sorted" mode which uses almost no RAM but cannot deal with disordered input.)

(`gron` could/should maybe indicate the maximum size of the slices and if they're a single type which would make things more efficient and I might add that to my fork.)


What could possibly be so memory-intensive about gron? I suppose it could make sense for ungronning, but not in the forward direction.


It buffers all of its output statements in memory before writing to stdout:

https://github.com/tomnomnom/gron/blob/master/main.go#L204


Here's the fastgron printer if you want to compare:

https://github.com/adamritter/fastgron/blob/main/src/print_g...


Why?


It appears to be so it can sort the lines. Not sure how useful that is however.


An ironic near-miss on the UNIX philosophy. There's a great UNIX tool that will handle sorting arbitrarily large files!


It will mess up array indices, though.


Wouldn’t “sort -n” work with indices?


It's tricky to specify the sorting criterion: you have to indicate the column. Gron's output looks like this:

    a.b[0].c.d[0]: ...
    a.b[0].e[0].f: ...
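A Go sketch of a path-aware ordering, since plain byte order (and `sort -n`, which compares a leading numeric value rather than indices in the middle of a line) puts `json.a[10]` before `json.a[2]`. It splits each path into segments and compares array indices as numbers. Assumptions: the path ends at the first ` = `, bracketed keys contain no `]`, and `sortGronLines` and its helpers are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// segment is one step of a gron path: an object key or an array index.
type segment struct {
	isIndex bool
	index   int
	key     string
}

// parsePath splits a path such as json.a[10]["b c"].d into segments.
// Assumption: bracketed keys contain no ']' character.
func parsePath(p string) []segment {
	var segs []segment
	for i := 0; i < len(p); {
		switch p[i] {
		case '.':
			i++
		case '[':
			rel := strings.IndexByte(p[i:], ']')
			if rel < 0 { // malformed: keep the remainder as one key
				segs = append(segs, segment{key: p[i:]})
				i = len(p)
				continue
			}
			inner := p[i+1 : i+rel]
			if n, err := strconv.Atoi(inner); err == nil {
				segs = append(segs, segment{isIndex: true, index: n})
			} else {
				segs = append(segs, segment{key: inner}) // quoted key, quotes kept
			}
			i += rel + 1
		default:
			j := i
			for j < len(p) && p[j] != '.' && p[j] != '[' {
				j++
			}
			segs = append(segs, segment{key: p[i:j]})
			i = j
		}
	}
	return segs
}

// lessPath compares segment by segment, treating array indices as numbers.
func lessPath(a, b string) bool {
	sa, sb := parsePath(a), parsePath(b)
	for i := 0; i < len(sa) && i < len(sb); i++ {
		x, y := sa[i], sb[i]
		switch {
		case x.isIndex && y.isIndex:
			if x.index != y.index {
				return x.index < y.index
			}
		case x.isIndex != y.isIndex:
			return x.isIndex // arbitrary choice: indices before keys
		case x.key != y.key:
			return x.key < y.key
		}
	}
	return len(sa) < len(sb) // a container sorts before its children
}

// sortGronLines sorts "path = value;" lines by path. Assumption: the path
// ends at the first " = ", so keys containing " = " are not handled.
func sortGronLines(lines []string) {
	pathOf := func(line string) string {
		if i := strings.Index(line, " = "); i >= 0 {
			return line[:i]
		}
		return line
	}
	sort.SliceStable(lines, func(i, j int) bool {
		return lessPath(pathOf(lines[i]), pathOf(lines[j]))
	})
}

func main() {
	lines := []string{"json.a[10] = 1;", "json.a[2] = 2;", "json.a = [];", "json = {};"}
	sort.Strings(lines)
	fmt.Println(lines) // plain byte order puts [10] before [2]
	sortGronLines(lines)
	fmt.Println(lines)
}
```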


It shouldn't need to buffer the output to do that, right?


Correct.


I remembered where some of my old files are and re-tested; forward-gron was "only" about 7GB for the 15MB file. gron -u was the real killer, clocking in around 53GB.


Yeah that's a bug, no amount of buffering can justify that amount of memory.

And gron -u in theory should use less memory than gron-ifying a JSON, as you just have to fill a data structure in memory as you go.


> as you just have to fill a data structure in memory as you go

You don't know the size, shape, or type of any of the levels in the data structure until you get to a line specifying one part of it. If you did, yep, it would be trivial!


But you do: if the first line is `users[15].name.family_name = "Foo"`, all you need to know is there: there is an array of users, each is a map containing a field called name, which is a map with a field called family_name.

If users[14] is a string, or there are 1500 users, the amount of memory needed to ungron that line is exactly the same. Prove me wrong, I can't think of any way it would not be trivial, provided one uses the correct datastructures.


> there is an array of users

How big is it? All you know at this point is that it's at least 16 entries long. If the next line starts with `users[150]`, now it's 151 entries. Next line might make it 2000 entries long. You have no idea until you see the line.

> each is a map

But a map of what? `string -> object`? Ok, the next line is `users[15].flange[15]` which means your map is now `string -> (object|array)`.

Then the next line is `users[15].age = 15` and you've got `string -> (object|array|int)`. Each line can change what you've got and in Go this isn't a trivial thing to handle without resorting to `interface{}` (or `any`) all over the show and reflection to handle the management of the data structures.

> Prove me wrong, I can't think of any way it would not be trivial, provided one uses the correct datastructures.

All I can suggest is that you try to build `ungron` in Go and have it correctly handle disordered input. If you find a better way of doing it, I'd be happy to hear about it because I spent several months fighting Go in 2021-22 trying to optimise this without success.


What happens when you ignore software engineering.


Deeply nested structures would get expanded a lot


I've been using an incredibly stupid bash script to do this; you've finally given me the push to publish it here: https://github.com/krsiehl/hn2mdir

Run mkdir -p /path/to/some/directory/{cur,new,tmp}, then ./hn2mdir.bash /path/to/some/directory/, and it'll crawl Algolia's HN API to dump a bunch of emails, one for each post/comment. You can read it with mutt -f /path/to/some/directory. Syncing with IMAP left as an exercise to the reader (I'm using mbsync).

Note that it gets large, fast, and may break your IMAP server; I periodically run find ~/Mail/HN -type f -mtime +30 -delete to clear it out.

Edit: should clarify, this is read-only, I've never bothered to set up any kind of response functionality


Wow, this is brilliant!


Perhaps you were using a different version, but I just tried and ChatGPT didn't seem to have any ethical issues with the question (although it was cagey about giving any definite answer):

https://i.imgur.com/5aIjtMz.png


Thank you for posting a link to an image instead of polluting the future training data of GPT-4 with the output of GPT-3 :)

I wish more people would do this. I'm getting pretty sick of the walls of text.


That pollution is inevitable, why delay it? It's a technical problem they should be able to solve, and if they can't, then they're revealing the weakness of their methods and the shortcomings of their so-called AI.

It's absolutely ridiculous to expect the entire internet to adopt some kind of hygiene practices when it comes to text from GPT tools simply for the sake of making the training process slightly easier for a company that certainly should have the resources to solve the problem on their own.

If that's why you're using images instead of text you're fighting such a losing battle that it boggles my mind. Why even think about it?!


No, that's just a bonus. I just personally find the walls of text in HN comments to be unnecessary.

I saw someone on here refer to it as "listening to someone describe their dreams." I pretty much agree with that.


On Linux/macOS, you can enable terminal output by setting PRINT_MODE:TEXT in init.txt


Hey, I know this issue! I ran into it in CK3 when it launched. You can also work around it by running chmod go-rx /dev/input/ while playing your game. Whether this is more or less invasive than binary-patching the game is up for debate.


In a similar vein, some time ago I tried to search for how many unicode code points there are with "How big is Unicode?" (https://www.google.com/search?q=how+big+is+unicode)

Google helpfully responds "16 bits", which is pulled from the History section of Wikipedia and hasn't been accurate in something like twenty-five years.

Edit: Should have listened to people saying to screenshot your queries. Google still quotes the paragraph in question, and bolds "16 bits", but no longer puts it in a big bold heading like it's the single answer to your question.

Double Edit: except in chrome, where I do still get the old page. Here's a screenshot for posterity, after Google somehow fixes this: https://i.imgur.com/7Ng6DyK.png
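For reference, the actual numbers: code points run from U+0000 to U+10FFFF, so the largest one needs 21 bits, not 16. A small Go check using the standard library's `unicode.MaxRune` (the function name `codeSpace` is just for illustration):

```go
package main

import (
	"fmt"
	"math/bits"
	"unicode"
)

// codeSpace returns the number of code points (U+0000 through U+10FFFF), the
// number of scalar values (code points minus the UTF-16 surrogate range), and
// the bits needed to represent the largest code point.
func codeSpace() (codePoints, scalars, bitsNeeded int) {
	codePoints = unicode.MaxRune + 1
	surrogates := 0xDFFF - 0xD800 + 1 // U+D800..U+DFFF
	scalars = codePoints - surrogates
	bitsNeeded = bits.Len32(uint32(unicode.MaxRune))
	return codePoints, scalars, bitsNeeded
}

func main() {
	fmt.Println(codeSpace()) // 1114112 1112064 21
}
```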


I get a different result "Unicode uses between 8 and 32 bits per character" https://imgur.com/a/hxmrMz3


It's like how "rick and morty season 5" returns

    Number of episodes: 8
    No. of episodes: 6

garbage in, garbage out.


UTF-32 is the way to go for internal storage, until you pack it back down to UTF-8 to store externally.
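A small Go illustration of that trade-off (the helper `sizes` is a hypothetical name): `[]rune` holds one 32-bit value per code point, effectively UTF-32, which allows constant-time indexing by code point, while UTF-8 spends 1 to 4 bytes per code point. One caveat: a user-perceived character can still span several code points (combining accents, for example), so even UTF-32 is not one element per character.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// sizes reports the UTF-8 length of s, its code point count, and the size a
// UTF-32 copy (4 bytes per code point) would occupy.
func sizes(s string) (utf8Bytes, codePoints, utf32Bytes int) {
	n := utf8.RuneCountInString(s)
	return len(s), n, 4 * n
}

func main() {
	s := "a\u00e9\u4e16\U0001F642" // a, é, 世, 🙂: 1 + 2 + 3 + 4 bytes in UTF-8
	fmt.Println(sizes(s))          // 10 4 16

	runes := []rune(s)            // Go's rune is an int32 code point: in-memory UTF-32
	fmt.Println(string(runes[2])) // constant-time indexing by code point: 世
	back := string(runes)         // pack back down to UTF-8 for storage
	fmt.Println(back == s)        // true
}
```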


I do use Lyft to get to work most days (or did, in the Before Times). I just checked a few hopefully-representative months (June/July/August 2019) and it looks like I was spending about $400/month. Google tells me that the average TCO of a car is about $700/month, so this would seem like a net win. That average might not be representative of my needs though - it's likely being dragged up by people driving around giant SUVs and such, so take this comparison with a grain of salt.


I like the idea of a split keyboard, but I've never been able to go down that road because I sometimes hit keys with the "wrong" finger - most notably, I need to be able to hit B with my right hand because it's down-left in roguelikes (e.g. Nethack).

What I'd love to see is a "106% keyboard", where a couple columns are duplicated on both the left/right side. Does anybody make such a keyboard?


The idea isn't entirely new. The TGR Alice is a popular board that has two b keys [0].

The more straightforward approach to get a full extra column would be to just grab a keyboard that already has 7+ columns per side (e.g. the Chimera [1]) and repurpose those to be duplicate keys.

[0] https://i.redd.it/9kyeyht1eqy11.jpg

[1] https://github.com/GlenPickle/Chimera


I made something like this for myself:

https://imgur.com/a/By9YN2q

It's made of wood, in the style of the MS Ergo but not curved (though I think it feels nice as is).


How/where did you get the custom key labels? Dye sublimation?


A little more information I posted back in the day:

https://www.reddit.com/r/MechanicalKeyboards/comments/9wjpg0...


I used to hit some keys with the 'wrong finger' before, but it didn't prevent me from adapting to the Kinesis Advantage.

For Nethack and games in general, I have a cheap 'normal' keyboard.


.. or you could just switch and fix your bad behaviors.

I would say that as a daily Advantage user for right about 20 years, it's not a keyboard to play games with, and it's not a keyboard for software that leans heavily on keyboard plus mouse (like CAD or Photoshop). In those situations you often keep your dominant hand on the mouse at all times and your non-dominant hand glued to the keyboard. For those situations I have a fairly standard 65% board on my desk. But coding, email, etc. all happen on the Advantage.


> What I'd love to see is a "106% keyboard", where a couple columns are duplicated on both the left/right side. Does anybody make such a keyboard?

I've seen some where the '6' is present on both the left and right-hand side of a split keyboard but '6' is really the one and only key on which there can be a disagreement as to where is the correct placement.

On a non-staggered keyboard the '6' is, of course, on the right-hand side, but on a staggered split keyboard it is very often on the left-hand side.

Most split keyboards in that gallery that do have a number row (i.e. 60% or more, not 40%) do have the '6' correctly located on the right-hand side.

Yet most (not all) staggered split keyboards have the '6' located on the left-hand side.

People who learned to touch-type using the "6 with left hand" school have a very hard time adapting to an ortholinear split keyboard. While those who learned to touch-type using the "6 with the right hand" have a much easier time adapting to an ortholinear split keyboard.


Strange. I've typed with 10 fingers for 20 years or so (self-taught with some programs back then), and my only real problem is the number row: it just doesn't come to me intuitively, as if it's simply wrong. (And yes, I don't type numbers very often. That's also why I disagree that the 6 is responsible for a hard time adapting, since non-accountants simply won't type the 6 very often. If you had said C/V/B, I'd agree.) Only now that I've seen the Atreus62 have I come to believe that the number row (and C, V, B) are simply wrong on a standard keyboard. They don't work as they should, at least for me personally.


I have two split keyboards, one with the '6' on the left and one with it on the right. It's confusing, but not a serious problem once I got used to it. I still sometimes hit the empty space instead of the '6' key, but then I just type it with the other hand.


What you need, and I’m pretty sure this is a serious suggestion, is a pair of identical 60% keyboards. It might just work wonderfully.


I wish this worked. I tried it. The problem is pressing shift only modifies the keyboard that it is pressed on.


If you install Karabiner Elements[1] on macOS, all modifier keys suddenly work across all keyboards.

I'm using one "TKL" Apple USB keyboard per hand when I feel like opening my shoulders a bit. Took me all of two minutes to get used to, at a fraction of the cost for enthusiast keyboards. I wonder if there are any ergonomic advantages I'm missing out on.

(Karabiner Elements is a great tool, anyway; I've been using it for a long time to map Caps Lock to something useful for programming.)

[1] https://karabiner-elements.pqrs.org


Thanks for posting this! I had a split keyboard, and typing on it would have felt so much better if not for the non-standard stagger.

I now feel less guilty for having two 60% mechanical keyboards. :)


I think that's the case on Mac OS and Windows, but on Linux you can press Shift/Ctrl on one keyboard, and a letter on another.

Alt doesn't seem to work though.


The problem is you'd then need to place them quite far apart from each other (while many of the "fixed" split keyboards have each half only slightly tilted), but from a USB point of view I think this just works?


Fwiw I typed “wrong” when I got my first ergo keyboard. Like very wrong. About 20% of my key presses were with the wrong finger.

For about a week I was smashing the blank space between the halves but then it just suddenly clicked and I’ve been fine since.


The answer to that is to make your own and hand-solder wire to the switches.


The thought has occurred to me - and I wouldn't mind a thumb-trackball on the right hand, either. Design in the physical world is well outside my wheelhouse though - is a keyboard an approachable project for somebody who hasn't done woodworking/machining before, or would I be better served working on smaller projects first?


Totally doable as a first project, I'm midway through building my first keyboard with a large number of modifications and it's been a really fun project. The Dactyl Manuform is a great starting point https://github.com/abstracthat/dactyl-manuform

And /r/ErgoMechKeyboards on Reddit is a great community for understanding what you're trying to achieve.


Personally, I can hardly think of a better first project. The electronics are dead-simple and the hardware is so much up to personal preference that you can just tinker until you are happy.

(I built my first keyboard about 7 years ago as my first foray into 3D modeling and printing. It went well enough that I used it as my main driver until I got a Keyboardio Model 1 years later.)


I 3D printed my keyboard. Sometimes I talk about it on HN and have prepared a fairly detailed email about it because people have follow-up questions that require diagrams. I'm too lazy to put it on my blog, but if you want the details just email me and I can probably point you in the right direction.

Building a keyboard is tedious, but not particularly difficult.


I used Keyboard Layout Editor to generate the files to send to a laser cutting service.


There are 'trackdyl's out there with a trackball; if you look in the photo gallery, 'oddball' and 'beast' are probably the best known.


You can also build a PCB (or two) pretty easily. That’s what I did when I built my keyboard. You can find the source here: https://github.com/ecopoesis/nek-type-a


On my keyboard, B is exactly equidistant between F and J so I couldn't even tell you which finger is correct for it. I wonder how it was decided which side it should go on in a split keyboard?


> On my keyboard, B is exactly equidistant between F and J so I couldn't even tell you which finger is correct for it.

It's not about distance but about logic. If the right index finger does y,h,n and u,j,m then the left index finger does r,f,v and t,g,b. Otherwise your left index only does five letters while your right index does 7.

Depending on how "badly" a staggered keyboard is staggered, some keys can be closer to one hand or another, but it doesn't change what the correct way to touch-type is.

If in doubt, look at where the keys are placed on ergonomic split ortholinear keyboard: the people designing these things tend to know what they're doing.


It's not at all clear that they "know what they're doing" in some way that makes what they're doing "correct."

If they and a majority of their customers learned to type Z with the left little finger, X with the ring finger, etc., that does not magically make that a better way to type than typing Z with the ring finger and X with the middle one.

Quite the contrary: If you bring your hands together naturally in front of you, they form an inverted V. In order to type the bottom row with the little finger on Z, you have to cock your wrist significantly, which is clearly worse from an ergonomics perspective.

If you bring your hands together like hands naturally come together, on the type of staggered keyboard virtually everyone learns to type on, an ortholinear one will be entirely wrong for you on the whole bottom row.

Designing around bad training may be a type of "knowing what they're doing," but it doesn't make it "correct," or even better.


> Otherwise your left index only does five letters while your right index does 7.

Which, if you're right-handed, makes some amount of sense.


I hit air for the first week with a Let's Split, but it chilled out. I'm probably about as fast on a MacBook keyboard two+ years on, even though I use it way less.


So, this is one of those things that would likely be very painful for a short period of time as you adjust to typing on a split keyboard. I was the same way when I typed on a normal staggered layout, but since I was completely unable to do that on the split board, my brain adjusted to the change relatively quickly.


You can unlearn the habit with time. Also, I've played roguelikes (such as CDDA) using dvorak and no numpad. Even with the movement keys spread out, it's not bad. Helps that they're turn-based games. I remap all the keys in Minetest and Xonotic.


You get used to that pretty quickly.


For precedent on this, see the case Anderson v. Stallone, in which Timothy Anderson sued Stallone/MGM for allegedly ripping off his fan script for Rocky 4. Courts ruled that his fan script, as a derivative work of Rocky, had no copyright protection, and so MGM was free to rip it off if they wanted to.

I happen to disagree, in that I think the law should say that derivative works are co-owned by the owners of the original work and the creator of the derivative; but that does not seem to be what the law currently says.


Unless the derivative work was created with permission.


You're right - the case I'm quoting is specifically about unauthorized derivative works, which is a pretty important distinction, especially in this context (as presumably the colorizations of Garfield were authorized)


Right, and since the online Garfields have unique colours, future licensees can't just use them. They'd need to put in the effort to re-colourize, or pay the site for their colourized versions too.


Like a few others here, I use an RSS to email converter, although I'm using a custom-written one. The main difference from rss2email is that I'm not actually doing any SMTP - I'm just dumping files into a Maildir and letting isync do the uploading/downloading. The actual reading then happens mostly with Mutt (which also just interacts with Maildir).

Like some others have noted, using email as a storage mechanism reduces part of the problem (tracking which items are read/unread) to one that's already solved (by IMAP). Additionally, using isync lets me have local copies of everything; this used to be really important when I was a "poor" grad student, because I could do cool stuff like download a bunch of comics ahead of time on my laptop, then read webcomics/mailing-lists on the 2-hour bus ride. I still like having local copies of things on principle, although nowadays everybody is always-connected so it's not as useful.


This makes a lot of sense, because RSS is quite email-like anyway in so many ways: the feeds look like an inbox, where items are marked read and such.


Also, a lot of sites use email where they could (should?) be providing RSS, for updates and news stuff. So to see those in the same place as RSS feeds, either you use a common client for both, or convert one into the other.

Feedbin seems to permit receiving email newsletters as a feed, so that's one of the few making email into RSS, whereas the others all turn RSS into email.

