Great job. The title was misleading to me because I thought I'd learn something about Go, but this basically boils down to these performance improvements:
- store results in variables and don't call the expensive method over and over
- use batch queries for fetching multiple documents at once from your storage
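In Go terms, both points amount to something like the sketch below (hypothetical names, not the author's actual code): hoist the invariant call out of the loop, and replace per-item fetches with one batched query.

    package demo

    // Hypothetical types standing in for the article's storage layer.
    type Doc struct{ ID, Body string }

    type Store interface {
        GetDocument(id string) (Doc, error)       // one round trip per call
        GetDocuments(ids []string) ([]Doc, error) // single batched query
    }

    func expensiveLimit() int { return 42 } // stand-in for a costly call

    // Before: the costly call and a DB query repeat on every iteration (N+1).
    func slow(store Store, ids []string, process func(Doc, int)) {
        for _, id := range ids {
            limit := expensiveLimit()         // recomputed each time
            doc, err := store.GetDocument(id) // one round trip per document
            if err != nil {
                continue
            }
            process(doc, limit)
        }
    }

    // After: compute the invariant once and fetch all documents in one batch.
    func fast(store Store, ids []string, process func(Doc, int)) {
        limit := expensiveLimit()
        docs, err := store.GetDocuments(ids)
        if err != nil {
            return
        }
        for _, doc := range docs {
            process(doc, limit)
        }
    }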
Keep in mind that, in a field which is approximately doubling every year, about half the people in the field have less than one year of experience. Sure, advice like "avoid hitting the network when you don't have to", "avoid N+1 queries", "hash tables are more convenient to search than unsorted arrays", etc. is pretty basic for many of us, but there was a day not so many years ago when it was news to us, too, so let's warmly welcome the more junior members of the community and assist them as we were assisted.
This extends to being appropriately thankful when people write guides to avoiding problems which hit someone in month 4 of their job. Did you write that guide? Did I write that guide? No. This guy wrote that guide. Bully for this guy. The world is better off that he spent two hours on this.
I don't agree. The web has many examples and tutorials of the basics. I come to HN for the special stuff. If I want to see a terraced house I can go to any town in the UK; I go to Barcelona when I want to see a Gaudi. Great that the author wrote this. Assuming it didn't get in the way of him doing something better, I might agree that the world is a better place (let's give that emotive defender-speech stuff a kick); good for him, but I don't see that HN is a better place. This is not the first time that beginner Go dev stuff has been posted here and hung around the top of the first page, and it's increasing. If that's the direction HN is heading in, the impressive stuff displaced by junior tutorials, the Gaudis will leave, the juniors will post, and watch the feedback loop.
At least we moved on from hearing about a new JavaScript framework every day. Now we argue about how Go is/isn't a good language. Oh, and whether AlphaGo beating Lee Sedol marks the end of the human race, or means nothing at all.
It's basic and obvious to those of us who are good at programming. But not all of us are programmers, and this may give them some insight, through an applied, real-world example, into what is obvious to us but apparently not obvious to every programmer.
No, we don't need every tutorial or every Go thread to come up, but this one wasn't so bad.
And since the optimization was also about not making a database call in every loop iteration when you don't need to, I'd say it's also pretty basic stuff.
Am I missing something as to why this is on the front page?
It's Go. Current darling on HN. One could write a blog post on how to format "hello world" in Go/Golang and it would likely make the front page, especially if it was given a catchy title like "Critical string optimizations in Go".
I remember reading somewhere to become a star in programming you just need to look up some papers from 10 years ago and republish them. Everybody has forgotten them since then so this stuff is suddenly innovative.
> Am I missing something as to why this is on the front page?
Only the fact that it's a social bookmarking site -- so whatever is voted up gets to the front page. In other words: lots of HN readers find it interesting.
It is Go and the title is interesting. I suspect that many of the upvoters didn't read the article and conclusions before voting. That usually works out ok, but sometimes it results in upvoting really obvious things like this.
As obvious as this advice is (and to be fair, to many of us it is exceedingly obvious!), I'm constantly amazed at the amount of code I see in production services that doesn't use local variables to store the results of frequently called methods or functions (even from experienced developers!!), and at code that fails basic load tests because developers make new database requests inside what should be tight loops. So I can see submissions like this still benefiting some developers.
A popular school of thought says not to worry about such things until you run into a performance issue and, through profiling, find a bottleneck to optimize.
Database requests, and really any IO, are never appropriate to use like this; I'm referring to stuff like itoa.
When you follow this principle, you end up with software that is exactly as performant as it needs to be (or whatever arbitrary standards were set).
I started following HN when I was a novice programmer, so I do appreciate seeing content appropriate for beginners.
In my book, some people are taking the quote "premature optimization is bad" too far. To me, there is a difference between optimization of some kind and not writing dumb code. After some experience, bulk loading data in a batch job, avoiding high complexities when you can, and trusting your frameworks aren't smart optimizations - that's just writing proper code.
Optimization in today's world starts when you're looking at reducing communication volume, lock contention, utilizing your persistence layer better, replacing locks with atomic operations, ... Those things shouldn't be done on a whim without measurements, because they will make the code a lot harder to follow for non-obvious reasons.
Yeah, I'd like to be in the 'not worry about such things' school of thought; personally I don't like the re-calling of methods all the time, because it doesn't look elegant, but in terms of performance it shouldn't matter - I'd like to think the compiler can do some optimizations there. Of course, the compiler may not realize, or may not want to assume, that the result of the methods is the same on every loop iteration. But still, it should be possible for it to infer that and optimize away this developer oversight.
Python is a fun vessel for that kind of thing: it's sensitive to function lookups, and when profiling you really see the timings drop with local rebinding.
I'm guessing what was meant was more like code where you see
X.Y.Z.get("Q")
over and over and over again, even though it can't change, and the X, Y, Z, and Q are never that short. Nothing Go-specific about that anti-pattern, I'm afraid.
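In other words, the fix is just a well-named local; a tiny Go-flavoured sketch with made-up names:

    // Before: the same chained lookup repeated, verbose and easy to get wrong.
    if cfg.Limits.Network.Get("Q") > 0 && total < cfg.Limits.Network.Get("Q") {
        total += cfg.Limits.Network.Get("Q")
    }

    // After: look it up once and give the value a name.
    q := cfg.Limits.Network.Get("Q")
    if q > 0 && total < q {
        total += q
    }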
> Nothing Go-specific about that anti-pattern, I'm afraid.
An optimizing compiler (which Go 6g/8g isn't really, though that may change) will usually be able to fix this up. The way it typically works is SSA conversion and method inlining followed by global value numbering. But this is not by any means the only way an optimizing compiler can repair this: many optimizing backends (for example, LLVM) have the notion of "readnone" functions that can be GVN'd away as well in many circumstances.
Excuse me for stating the obvious, but methods are not constants. They're functions inside classes. Thus they could theoretically return different results when called with the same arguments, even when called sequentially. For example, the methods could return a randomized number, or return an array with values added or removed from the stack, or use logic based on the system clock.
It would be difficult for a compiler to reliably tell whether a method behaves like a constant without adding to the compilation time - and that simply isn't an option for JIT runtimes or Go. And given it's a pretty basic optimisation for humans to pick up, I reckon it's also pretty low down on the list of "must have" features for compiler optimisations.
Sorry, I'm suffering from man flu at the moment so my brain isn't running at full capacity; which means someone will have to explicitly spell out why my comment was downvoted. I'm assuming I've erred somewhere - in which case I'd welcome the opportunity to learn something :)
In an imperative language, proving the constancy of even slightly non-trivial values turns out to be surprisingly difficult. Again, nothing specific to Go here; you'd likely be surprised what the compiler can't assume in almost any imperative language.
I was just trying to make a quick, generic point. If it helps, imagine 4 chained method calls instead.
I haven't checked but if it was just a struct navigation (X.Y.Z.A.C, no method calls) Go probably does just optimize it to the correct offset. But bear in mind that will be an expensive pattern for CPython or Perl, for instance.
> In an imperative language, proving the constancy of even slightly non-trivial values turns out to be surprisingly difficult.
No, it's not. A mature compiler backend can successfully do this in most cases (as in, most values in most programs are proved constant). SSA form coupled with SROA, SCCP, aggressive inlining, and memory analyses are really powerful. You just need a mature, optimizing backend.
> Again, nothing specific to Go here; you'd likely be surprised what the compiler can't assume in almost any imperative language.
Mature compilers are pretty good. They can usually figure out things you expect them to. People talk a lot about aliasing and what that means for const and the like, and that's usually where the gotchas lie, but Go, like most type-safe languages, is (presumably) much friendlier to the compiler than C is in this area.
My point is not that it's hard or impossible... the key word is surprised. I think most people would be surprised if they dug into what their compiler can't actually prove, especially in languages like Python. Which I mentioned by name as something I was explicitly including in the scope of my point. Which really should have been a clue to you.
With all due respect, would you please give me a bit more principle-of-charity here? You too often pop up after a post of mine to "correct" your own too-narrow reading of it in the first place. HN is not always discussing programming things strictly in terms of Rust or similar languages; that's still just one region of a rich space.
Whether or not you think Python or Perl is a disaster of a language, the fact remains I have, in real life, performed non-trivial speedups of real code experiencing real problems simply by trimming those lookups, and the fact that it wouldn't have been a problem in C++ or Rust or whatever isn't particularly relevant at that point. And the fact also remains that even when the compiler is not bothered, it's still bad software engineering to create such redundant code anyhow, since it's hard to see what the code is really doing under all the cruft. (I was just complaining on reddit about how Erlang code tends to create that problem rather a lot, actually... another language where the compiler won't save you.) And thus, it is still a perfectly valid point on a diverse site like HN to point these things out.
> you'd likely be surprised what the compiler can't assume in almost any imperative language
If you're going to use language like "almost any imperative language" to implicitly mean "almost any imperative language with predominantly dynamic semantics", then you shouldn't be surprised when someone well-versed in static languages comes along to correct such an overbroad interpretation. Reconsider who's being uncharitable here.
(And I say this as a Python developer who has actually needed to hoist a lookup out of a loop before.)
The first one is only necessary in languages where functions can have side effects. Otherwise the compiler can easily deduplicate that for you.
In most cases you would of course still name the result yourself, so as to indicate that the function call is indeed the same. However, for something very semantic, such as `sum(list)`, maybe the only name you could give it would be something like `list_sum` anyway.
> The first one is only necessary in languages where functions can have side effects
To be more precise, it is only necessary in languages where all functions can have side effects. Some languages (like D) have the ability to mark a function as "pure" (without side effects), which allows the compiler to perform additional optimizations (and purity validation).
> The title was misleading to me because I thought I'd learn something about Go
The title was misleading to me because I thought I'd learn something about Go, but instead it was about Golang.
With all the AlphaGo news going on, it would be significantly less confusing if posts about Golang used 'golang' in the title, at least until we're not seeing two completely different 'go's regularly on the front page.
OTOH, .action() could change the whole DOM so that .some_selector matches something else or nothing at all, so I guess jQuery couldn't make that assumption/optimization without knowing what .action() does, or without doing a deep analysis of which DOM elements .action() altered and whether the selector on the next line is affected.
Or needless selectors like $('div.some-div input.some-input') where the classes are never used on anything other than divs/inputs. Another thing I've seen: $('#someId #childId').
I have been in discussions where somebody would argue that using a variable for $('.some_selector') would be inefficient because it's more lines of code.
Benchmarks are an important part of optimizing a program.
Another important part of optimization is profiling. Profiling lets you see where your program is spending the bulk of its time. Then you can make informed decisions about where to direct your efforts.
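In Go both steps come with the toolchain: a `testing.B` benchmark gives the timings, and the same run can emit a CPU profile for `go tool pprof`. A minimal sketch, where `computeImpact` and `loadTestTrades` are hypothetical stand-ins for whatever you're measuring:

    // impact_test.go
    // Run with:
    //   go test -bench=Impact -cpuprofile=cpu.out
    //   go tool pprof cpu.out   (then "top" or "web" to see the hot spots)
    package demo

    import "testing"

    func BenchmarkImpact(b *testing.B) {
        trades := loadTestTrades() // hypothetical fixture loader
        b.ResetTimer()             // exclude setup from the measurement
        for i := 0; i < b.N; i++ {
            computeImpact(trades) // the code under test
        }
    }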
A bit off topic, but do we have a standard way to talk about performance improvements in terms of percentages?
This went from 131 to 76. The 70% calculation was (start - end)/end. Is this the standard method? Because it seems like you would divide by the start timing.
This might be pedantic, and I'm sorry, but this is something I always wonder about when I talk about performance.
When we say 1.7x faster, what is a fast? How does fastness increase? It's nearly 2 fasts instead of one?
I think the compression example you gave is much better: multiply the original file size by the factor to get the new, compressed file size. That's much better than the statistic I hear a lot about the compression in a columnstore database we use, where everyone says "7-10x compression" because that was published first and is popular.
I'd prefer to see something like your compression example. It took 8 seconds before, and now takes 2. My optimization makes the operation take .25x the time.
1.7x faster means oldRunningTime / newRunningTime = 1.7. Or in your example, 8 seconds / 2 seconds = 4x faster. Sometimes this is called a speedup of 4x or a 4x speedup.
I totally get how the math works. I don't like the language. We're not measuring speed or velocity which can be represented as rates. We're measuring time. Speed would be something like operations per second. We're just measuring seconds.
I noticed that too. People get it wrong all the time but the right way to discuss a change relative to the starting value is to divide by the starting value.
This calculation makes sense if you compare it to miles per hour or IOPS. If a car went from 20mph to 30mph you would say its speed increased by 50%. Likewise, going from 131ns to 76ns, this went from about 7,633 per millisecond to about 13,158 per millisecond, which is about 72% faster.
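Worked out for the article's numbers (131ns down to 76ns), the common phrasings are:

    speedup           = 131 / 76         ≈ 1.72x
    time reduction    = (131 - 76) / 131 ≈ 42% less time per operation
    throughput gain   = (131 - 76) / 76  ≈ 72% more operations per second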
The one that bothers me is when people say something got x percent slower.
I'm not against people posting their experiences about finding issues (even if trivial) in their code or learning something fundamental about db queries - I would actually encourage that. What I find a bit annoying is putting the name of the programming language they use front and center, as if it was relevant at all, to attract what is commonly called "hype".
Indeed what a clickbait title, even though there's nothing new in the article. I remember reading something very similar 20 years ago, except it was about Java.
I'd also recommend that the author look into tidying up the `TransactionType` field at ingest - those `ToLower` calls aren't free either, and if you've got control of the system writing the data it's easier to just store a consistent case.
Failing that, `EqualFold` may be worth looking at. It expresses the same intent, dunno if it's more or less efficient.
I'm not sure if Go optimises your condition, but you're calling strings.ToLower(trade.TransactionType) twice; you could memoize the result of that operation and possibly squeeze a little more performance out of the impact function.
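A sketch of both options (the Trade type and the "buy" value are guesses at the article's code, not copied from it):

    package demo

    import "strings"

    type Trade struct{ TransactionType string }

    func impact(trade Trade) {
        // Option 1: memoize the ToLower result instead of computing it twice.
        tt := strings.ToLower(trade.TransactionType)
        if tt == "buy" {
            // ...
        }

        // Option 2: strings.EqualFold compares case-insensitively without
        // allocating a lowered copy of the string at all.
        if strings.EqualFold(trade.TransactionType, "buy") {
            // ...
        }
    }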