Code Smells: Iteration (jetbrains.com)
105 points by ingve on Aug 23, 2017 | hide | past | favorite | 89 comments



At the risk of sounding arrogant, isn't this a bit obvious?

To me, code smells are more about spotting when someone's been forced to do something a bit weird and convoluted. It's possibly the cleanest solution available given the immediate context, but this is actually a symptom (or "smell") that something is wrong with the wider design of the system.

This article seems more about detecting choices around data structures which are bad in any context and have no real link to the overall system design.


That was my take as well. It's like seeing code with the same constant inlined in multiple places rather than centralized:

    // Use some inlined string as a magic code value:
    doStuff("Some Magic Code");
    
    ...
    
    // Later on same inlined string:
    doOtherStuff("Some Magic Code");
versus:

    // Define it once:
    const GOOD_NAME_FOR_MAGIC_CODE = "Some Magic Code";
    
    // Use it:
    doStuff(GOOD_NAME_FOR_MAGIC_CODE);
    
    ...
    
    // Use it somewhere else:
    doOtherStuff(GOOD_NAME_FOR_MAGIC_CODE);
   
If that triggers an "ah-ha!" moment for you then you've likely got bigger problems.


Decreasing code duplication is good, of course. But you need to be careful that the two pieces of code are inherently equal and not just equal by accident. Once you consolidate the code, for example like you suggest, somebody may come along and change `GOOD_NAME_FOR_MAGIC_CODE` to something else without being aware that it is used in multiple places, and inadvertently make a change they did not intend. Tests help, and a good IDE helps (to find the locations where a symbol is used). But still, it's a fine line to walk.


The mess of duplicated code is much harder to untangle than de-consolidating code. Especially because subtle differences WILL leak into duplicated code and you won't know if it's on purpose or not.

If a dev can't be bothered to figure out where different pieces of code use a variable and whether the change is appropriate for them, do you think they will have the foresight to search for the duplicated code when they need to change it everywhere? Any decent dev should do both, and the former is a MUCH easier process than the latter.

Edit: despite the above, I don't completely disagree with your statement.


If they would need to search for the duplicated code and change it in more than one place, there's something shared to factor out. If the two pieces of code are actually only accidentally the same, changing them independently should be the right thing.
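A tiny sketch of that "inherently vs. accidentally equal" distinction (the constants and values here are hypothetical, purely for illustration):

```java
// Two values that happen to be equal today, but represent different concepts.
// Folding them into one constant couples decisions that should stay independent.
public class Timeouts {
    // Inherently meaningful on its own: derived from an SLA.
    public static final int SLA_TIMEOUT_MS = 5000;

    // Accidentally equal: 5000 ms by coincidence, not by contract.
    // Kept separate so changing one doesn't silently change the other.
    public static final int RETRY_DELAY_MS = 5000;

    public static void main(String[] args) {
        // Equal today -- but only by accident, so they stay distinct constants.
        System.out.println(SLA_TIMEOUT_MS == RETRY_DELAY_MS);
    }
}
```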


That is definitely good advice. I've seen some developers get overly obsessed with deduplicating code, and sometimes it really does just overcomplicate things. For example, it is easy to end up making things more brittle by moving a somewhat generic method with very specific pre-conditions and post-conditions into a generic utility library. It is even worse if those pre and post conditions were documented in the original location, but not necessarily copied over to the new shared module.


Wouldn't this name change be immediately caught by the compiler?


If you have one, yes.

But I meant when you change the definition of that binding (the right-hand side of the equal sign).


I assume wereHamster is referring to the value of the constant being changed, rather than the name.


This whole potential problem is one of the reasons the particular example isn't that great. It's overhyped in Java-land where some people treat naked strings like lepers but in places like Clojure-land you'd get a lot of people saying "Just use a keyword" for both places. The keyword's name is its value. It's immutable. An IDE should be able to find all uses of it, or there's always 'ag'. If you really want that coupling behavior, change the interface of the thing you're calling to require conformance to a schema. The simplest way in Java is probably just by taking an enum instead of a string, but there are richer ways (especially outside of Java) worth looking into if you're needing a string value eventually in the implementation since you'll run into the other Java-ism of making a complicated enum class instead of just using enums like keywords with their string value being their actual name value.
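The enum suggestion might look like this in Java (a minimal sketch; the names and the `doStuff` method are hypothetical stand-ins for the earlier example):

```java
// Instead of doStuff("Some Magic Code"), make the parameter an enum so the
// compiler rejects arbitrary strings. The raw string value lives in one place.
public class EnumInsteadOfString {
    enum MagicCode {
        SOME_MAGIC_CODE("Some Magic Code");

        private final String value;
        MagicCode(String value) { this.value = value; }
        String value() { return value; }
    }

    static String doStuff(MagicCode code) {
        // The implementation can still reach the raw string when it needs it.
        return "handled:" + code.value();
    }

    public static void main(String[] args) {
        System.out.println(doStuff(MagicCode.SOME_MAGIC_CODE));
        // doStuff("Some Magic Code");  // does not compile: String is not a MagicCode
    }
}
```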


I'd bet dollars to donuts the situations where you want it to change outnumber the ones where you don't by more than 10:1.


"If that triggers an "ah-ha!" moment for you then you've likely got bigger problems."

Or you're just not as experienced. Remember, every day there's someone born who hasn't seen the Flintstones.


It seems like "inexperienced" should fall under the umbrella of "bigger problems". It also (hopefully) falls under the umbrella of "temporary problems".


It's worse than that. We have met the Enemy and the Enemy is us.

Java, as introduced, claimed that nearly everything was an Object. What we got instead was nearly everything is a String.

The Real WTF in this code is that all of the important information is passed around as Strings. The iterator and its source hint at this but she fixes the wrong problem. Ever has this been the way with Java. Despite having a statically typed language nobody turns the data into information and instead your modules just vomit strings at each other. Sometimes in groups of four or more.


One reason for this is the tight coupling in developers' minds between what I'd call "types" and "representations" (these terms are very overloaded, so others may use them in different ways).

Just because, say, a function name and a string are represented in memory the same way, that doesn't mean they are the same type; in particular there are many strings which aren't function names, and there are many operations (e.g. append) which make sense for strings but not for function names.

I can think of two things to blame for this:

- Languages which make it complicated, verbose and/or inefficient to introduce new, incompatible names for existing representations (i.e. nominal typing). For example, having to declare a new class in a new file with a private field, a constructor method and a getter method; that's a lot of boilerplate, and those method calls will incur a runtime cost. Haskell's `newtype` is pretty good in comparison, e.g. saying `newtype FunctionName = FN String` lets me use `FunctionName` in type signatures, I'll get a type error if I try to use a `String` as a `FunctionName` or vice versa, and I can construct and destruct a `FunctionName` using `FN` (e.g. `let x = FN "foo"`)
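For comparison, a Java approximation of that `newtype` idea (a sketch; `FunctionName` and `describe` are hypothetical). A record is about the least boilerplate Java offers for a distinct nominal type, though unlike Haskell's `newtype` it does carry a small runtime wrapper cost:

```java
// Rough Java analogue of `newtype FunctionName = FN String`: a record gives a
// distinct type with one line of declaration, so a raw String can no longer be
// passed where a FunctionName is expected.
public class NewtypeSketch {
    record FunctionName(String name) {}

    // Accepts only FunctionName; passing a raw String is a compile error.
    static String describe(FunctionName fn) {
        return "function " + fn.name();
    }

    public static void main(String[] args) {
        FunctionName x = new FunctionName("foo");
        System.out.println(describe(x));
        // describe("foo");  // does not compile: String is not a FunctionName
    }
}
```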

- Developers only coding for the happy path. When writing a test suite, it's important to test that bad outcomes are prevented, as well as just testing known-safe inputs. The same should apply to types: types should be used such that meaningless expressions are ill-typed, as well as just allowing known-good expressions to be well-typed.

Unfortunately these sorts of considerations tend to get derailed by similar-but-unrelated issues, e.g. 'YAGN a `FunctionName` interface because there's only one implementation'.

The end result of all this is Web applications concatenating together a bunch of "strings" which are actually HTML, plaintext, SQL, user input and URL parameters :(


You can go further with the type/representation dichotomy.

An obvious example is printing: All of a sudden, you need to convert an object of that type to a representation that can be read. Read by what? Humans? Other code? Both to some extent? The representation is dependent upon both the original type and the intended recipient.

(In Lisp, "readable" means "acceptable to the read function, which parses Lisp expressions"; some objects, such as compiled functions, inherently cannot become "readable" in this sense, so they get printed in an unreadable form. Should that be a type error?)

You can even have layers of representation: length is a type of value; whether it's expressed in inches or centimeters or light-seconds is a representation, and whether it's stored in ints or floats or strings is another layer of representation. You can add inches to centimeters with the right conversion, much like you can add numerical values represented as ints and strings with the right conversions. The conversions just have to happen at the right layer of representation.

Your ideas sound a lot like the original Hungarian notation, BTW: if your language-level types are representations (as in, your type system says int and float, as opposed to semantic notions like pixels-from-edge or alpha-percentage), you can encode the real type information in variable names. People mutilated this into encoding language-level type information in variable names, which is utterly pointless and potentially harmful.


Regarding things like serialising/deserialising, I don't think there's much issue 'philosophically': in the case of Lisp's `read`/`write`, we can spot the ambiguity and deal with it by writing separate functions, for example:

- `serialise`/`deserialise` to produce/consume machine-readable data; must be mutually inverse (which rules out your example of functions).

- `pretty-print` for human consumption; has no inverse.

As an MVP, these could just be wrappers around `read` and `write` (macros would prevent any runtime overhead); the names convey the intended meaning. Later on we could start enforcing some checks, but since Lisp is dynamically typed, we'd have to do runtime tag-checking (e.g. "if the type of 'X' is 'function', throw an error"). A static type system/checker would be better.

More generally, each application should be responsible for parsing its input into a domain-specific model, with machine-checked types. A pair of communicating applications may choose to delegate that responsibility to some common library, but they should not assume that their input can be trusted because it's 'coming from that other system'; e.g. they shouldn't pass around a `String` of input as if it were a domain object, mangling it by pulling out particular characters, concatenating things, etc. That `String` should be parsed into its components ASAP, and those should be passed around.

You're right that "apps hungarian" is another possible approach: slightly better than just documentation, but still not machine-checked. I think the history of apps hungarian degenerating into systems hungarian is another example of this type/representation conflation.


It's not just Java! I see this all the time in C and C++ as well. Developers seem to be afraid to use types as they were intended. For example, look up any OpenGL question on StackOverflow and you'll see things like what should be an array of 3D vertices with separate x, y, and z components passed as a 1D array of floats.

In C++ we see quite a bit of abuse of std::pair<> instead of just making a fucking struct to hold 2 properly named, easy to read pieces of useful data. Nope! Apparently calling all your data members "first" and "second" is better to some people.
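The same abuse has a direct Java analogue in `Map.Entry` with its `getKey`/`getValue`. A sketch of the contrast (the exchange-rate example is hypothetical):

```java
import java.util.AbstractMap;
import java.util.Map;

public class PairVsStruct {
    // The pair-style version: what do getKey() and getValue() mean here?
    static Map.Entry<String, Double> pairStyle() {
        return new AbstractMap.SimpleEntry<>("EUR", 1.08);
    }

    // The named version: the same data, but the fields say what they are.
    record ExchangeRate(String currency, double rateToUsd) {}

    static ExchangeRate namedStyle() {
        return new ExchangeRate("EUR", 1.08);
    }

    public static void main(String[] args) {
        System.out.println(pairStyle().getKey() + " -> " + pairStyle().getValue());
        System.out.println(namedStyle().currency() + " -> " + namedStyle().rateToUsd());
    }
}
```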

I don't know if it's an educational problem, a language issue, or something else, but I see it so frequently that there must be some common underlying cause (or set of causes).


It's an educational problem. Chandler Carruth of LLVM gave a really good talk on POD type slicing and how useful it can be, and more to the point, for PODs it's supported in the standard. I do agree on the abuse of pair... pair is there for implementers primarily and shouldn't be used outside of deliberately templated code. Ditto tuple.

That said... with C++17 destructuring... it's a lot less painful since you can do the equivalent to a std::tie in a single line.


What was the title of the talk?


One problem in C++ is if you define a struct then you have to define a bunch of operators yourself (operator<, etc.) which gets annoying after you've realized you've needed them a bunch of times.


Does C++ Pair<> define operator< and operator>?

That's surprising, and to me looks completely uncalled for.

EDIT: Just looked into Haskell (because it's easier), and tuples do derive Ord there. Is there some universal convention I'm missing?


> Is there some universal convention I'm missing?

Tuples are always compared lexicographically, is the universal convention. Not sure if that's what you might be missing?


> Does C++ Pair<> define operator< and operator>?

It does.

std::sort(pairarray.begin(),pairarray.end()) works about how you'd expect, as long as operator< is defined for the constituent types of the pair.
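Java has no built-in pair type, but the same lexicographic convention (compare first components, break ties on the second) is easy to spell out with comparator chaining. A sketch, with made-up sample data:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class LexicographicPairs {
    // Sorts pairs lexicographically: by first component, ties broken on second.
    static List<Map.Entry<Integer, String>> sorted() {
        List<Map.Entry<Integer, String>> pairs = new ArrayList<>(List.of(
                Map.entry(2, "a"), Map.entry(1, "z"), Map.entry(1, "a")));
        pairs.sort(Map.Entry.<Integer, String>comparingByKey()
                .thenComparing(Map.Entry.comparingByValue()));
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(sorted()); // [1=a, 1=z, 2=a]
    }
}
```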


Agree. I think we're all Stringly-typed developers. We use strings everywhere. We throw strings across system boundaries at will.


If you're just using objects to represent groupings of strings and to encapsulate behavior around converting them back to strings for output at some point, why not just operate on the strings instead of all the serialization/deserialization? Maybe the quick stuff I do isn't enough to warrant more complex data structures, but I get pretty far into a problem with just dicts (in Python), and I generally get annoyed when something returns objects that have no easy str representation without finagling the object into giving me what I need. I think this is generally a bigger issue with libraries treating object metadata (things like time stamps, extra attributes, etc.) as of equal importance to the data it represents. If an object has one obvious attribute for what the thing is, make that the default thing you get when you refer to it, instead of making everyone call some weird method to turn the instance into the representation they most likely want.


This is one of the major reasons why I love Lisp so much. Serialization problems disappear in homoiconic languages. It eliminates an entire class of issues.


cat your_comment.txt | fold -w 80 -s | grep that | head -n 1 | sed "s/ have/'d be/" | sed 's/n/to/' | cut -d's' -f1-2


If you're going to be snarky, fix the useless use of cat.


The first time I ran into this was on a project that was passing contact information around like String, String, String, int. Back and forth across the system in four or five workflows with extra data passed in here and there.

Of course they left off Address Line 2, and rather than try to fix this in thirty places as I was told to, I made an Address object instead.

With very rare exception, a street Address (Address 1) is meaningless without at least the zip code. And god help us if you had two addresses.

Later it took someone a couple hours to fix it for 9-digit zip codes. No further changes unless a new state joins the union (this was for a state court system and they required a domestic mailing address; one of the few times I didn't eventually need to process international addresses).


"We use strings everywhere."

That's bad.

"We throw strings across system boundaries at will."

That's neither good nor bad, it's just inevitable. The point of not using "strings" is that strings generally do not have the semantics of whatever it is you are really dealing with, because strings are just raw sequences of bytes and are generally too permissive for the specific type. For instance, if you have a URI, that comes with a list of restrictions for validity such as "can't start with a colon". Programming-language strings aren't URLs because they permit starting with invalid scheme declarations, or being empty.

However, when crossing system boundaries, you are exposed in some sense to the raw truth of the hardware, which is that raw sequences of bytes are the only thing that really exists. When someone sends you something claimed to be a "URI", you really can't blindly count on that; if you want robust code you are going to have to validate it. Even if you work very hard and use higher-level abstractions to try to map things to higher-level types, you still have the problem that the semantics have to be an exact match for this to work: in one program the URI type only permits http and https, in another it permits arbitrary (valid) schemes, and there you go, there's a difference sufficient that you must validate.

To match semantics between two programs sufficiently that you can claim with a straight face that you really, truly aren't throwing "strings" across system boundaries requires incredibly tight coupling between all participating programs.


Tight coupling ... or just s-expressions ;)


S-expressions are irrelevant to my point. S-expressions are just a serialization format; the use of S-expressions does nothing to synchronize semantics between two programs. Labeling a string as a "uri" with an s-expression does nothing to solve the problem I mentioned about mismatched semantics between two programs. In fact, it does nothing to solve (uri "::::::////SUPERINVALID"). Behold the fundamental problem with semantics and why the semantic web has never worked out as its advocates hoped; labeling is the easy part of the problem.


S-expressions are much more than a serialization format.


Just FYI, they ran this example over some built-in reflection APIs. Hard to write code designed to get methods by string name without strings...


Exhibit A:

    The getLoadNames method returns a List of Strings, which we iterate over in order to see if a particular String value is in there. 

Even in describing the algorithm we aren't informed it's a list of function names.

Edit: from the comments in the code it's pulling field names from a Mongo result set, so it's member variables more so than function names. One half of that relationship is still deterministic at runtime. Mongo records can change whenever but your ORM is dealing with Classes and their contents are known at runtime.


Because for the purposes of example, it doesn't matter what the data actually is.


I'll bet you a six pack that hasName is called in a tight loop somewhere. The cost of generating the collection on every call is far greater than the cost of iterating it.

Can we at least agree that the chosen example has too many other confounding factors in it?


Unix started it.


Even C has types, and the UNIX API's make heavy use of them.

Awk and sed and grep and friends certainly encouraged strings, but I heard there was an even earlier thing called REPL. From what I've read it is not a coincidence that PERL is an anagram of it.


Linux /proc and /sys were the epicenter


It's obvious to some of us, but not all of us. I see things way worse than this on a fairly regular basis.


I guess - but I feel like you'd call it an "error" rather than a "smell". Better to just teach people data structures rather than which constructs they should be afraid of. Approaching it "sideways" like this just leads to people going "I heard that loops can be bad, so I copied and pasted that line 25 times instead".


It's a smell and not an error because it works perfectly fine, and it actually is perfectly fine for now, because the n for that linear search is low.


Sets have a very specific use case: they only contain unique values and (unless using a specific implementation) they don't have any defined order. Switching from arrays to sets just to get rid of an iteration doesn't sound like fixing a smell to me. Also, insertion, iteration and memory usage are less efficient for a set.


Importantly, Sets have different semantics than Lists. Creating a set is an information-losing process: a list converted to a set cannot be recovered, but a set converted to a list is easily recovered.

I don't think this is handled well in TFA. Because of some dubious iteration at the call site, the author changes the semantics of getLoadNames(), blithely assuming that duplicates should not be allowed and order is not important.

The author mentions that the function is called at two other places. For all we know the original author was aware of Sets but chose List because it more correctly matched the use of the structure.
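To make that information loss concrete, a minimal sketch (sample data invented):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ListToSetLosesInformation {
    public static void main(String[] args) {
        // Duplicates and order both carry information in a List...
        List<String> names = List.of("load", "save", "load", "close");

        // ...and converting to a Set discards both, irreversibly.
        Set<String> asSet = new HashSet<>(names);
        System.out.println(names.size()); // 4
        System.out.println(asSet.size()); // 3 -- the duplicate "load" is gone
    }
}
```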


It's obvious to anyone with a CS (or similar) degree, but may not be to someone who is learning to code from various tutorials. Having hints like this in your IDE can greatly improve the quality of your code and help you learn.


People with CS degree and decades of real world experience still benefit from automated code analyses. We deal with systems with millions of lines of code...


Exactly, this is pretty clearly aimed at beginner/intermediate programmers who haven't entirely figured out how to decide when to use List versus Set versus Map. It's probably obvious to the typical HN reader, but valuable for the target audience. Might not belong here, although the fact that it's currently at #3 on the front page suggests there's enough interest in it.


The "smell" here is that the API does O(n) allocations for a read-only check of a database schema?

Why not change the API to have a method that checks for containment directly (and hopefully delete the container-returning one entirely)?

Also, it probably makes sense to cache the schema data somewhere.

Finally, my gut tells me the check for the existence of this name probably can be moved earlier in execution (like during initialization of the called class), which will cause the system to fail earlier and be easier to debug (so it would be good to check that before touching it).


Agree. What makes a pattern a code smell is that it is immediately recognizable and is almost never a false positive. Either the immediate implementation is problematic, or something about the larger context of the implementation is problematic.

A functional programming diehard could assert that iteration is a code smell, because recursion is the preferred solution. But there is nothing inherently wrong with using iteration to implement a solution in Java.


Especially if you require higher performance and there are no tail call optimizations.


I'm not making the assertion :). But what I was getting at is that the larger-scope problem from this perspective would be the use of a non-functional language.


I think the blog post is not so much about advanced code smell mitigation strategies, but rather about highlighting the features of IntelliJ.


Many of the inspections IDEA does are very obvious once you're alerted to them. The nice thing is that it runs hundreds if not thousands of analyses and presents them to you as actionable feedback.


It is obvious, but the code examples in the original link scare me.


Naw man, code smells are anything that makes you wrinkle your face when you look at the code, like you would if a cow farted right in front of you.


The author proposes using data structures that use hashing without even mentioning the necessity of having a hash function and immutable data for the hashed objects, because in their case they were basically just using strings.

It isn't always that easy.


    boolean hasName(String storedName) {
        return getLoadNames().contains(storedName);
    }
> That’s it. No more looping, just a simple check.

And how is set.contains implemented?

https://github.com/openjdk-mirror/jdk7u-jdk/blob/master/src/...

wraps

https://github.com/openjdk-mirror/jdk7u-jdk/blob/master/src/...

    public boolean contains(Object o) {
        return map.containsKey(o); // map is HashMap
    }

    public boolean containsKey(Object key) {
        return getEntry(key) != null;
    }

And finally:

    final Entry<K,V> getEntry(Object key) {
        int hash = (key == null) ? 0 : hash(key.hashCode());
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
                return e;
        }
        return null;
    }
TL;DR: how did they imagine a general Set.contains would be implemented without a loop?

[ed: obviously it makes sense to use a datastructure that more closely match your intent, but the wording here struck me as a bit odd...]


But the loop you're pointing out is only over the hash bucket, not the entire set. Ideally it will only have a single element.


Good point :)


Something I see often and is a huge code smell to me is not using the most restrictive form of iteration.

If you see collection.map(...) you know that each iteration is simply a pure function from original element to transformed element, which is an immense help when reading the code.

If you can use only map / filter / takeWhile / join etc to express what you are doing, use those! If not, try and just use reduce / foreach. If not, try and just use for. Only use while if nothing else works!
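In Java terms, that "most restrictive construct" idea can be sketched with streams (the word list and helper name here are invented for illustration):

```java
import java.util.List;
import java.util.stream.Collectors;

public class RestrictiveIteration {
    // filter + map promise: no side effects, pure element-wise selection and
    // transformation -- the reader learns that at a glance.
    static List<Integer> lengthsOfShortWords(List<String> words) {
        return words.stream()
                .filter(w -> w.length() <= 4)   // selection only
                .map(String::length)            // pure transform per element
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(lengthsOfShortWords(List.of("map", "iterate", "fold")));
        // [3, 4]
    }
}
```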


> If you see collection.map(...) you know that each iteration is simply a pure function from original element to transformed element, which is an immense help when reading the code.

You'd think so, but I've had colleagues who managed to fuck that up and use map or list comprehension solely for side-effects.


I saw some code like this:

    def update(): Try[Unit] = {
      parser.parse(...).map(result => updateState(result))
    }
And I thought it was abusing the map() call for side effects. However, it is still shorter than writing it out as follows:

    def update(): Try[Unit] = {
      parser.parse(...) match {
        case Success(result) =>
          updateState(result)
          Success(())
        case f@Failure(_) =>
          f
      }
    }
So I didn't have a strong opinion either way since semantically both do the same (and in the case of Scala, the first one is potentially more performant since it relies on the JVM doing virtual dispatch as opposed to calling unapply() and matching, not to mention potentially less garbage being generated).


Even Excalibur can be used to mince garlic, and even a supercomputer can be used to play Zork.


Yeah I mean my point is "trust but verify", I love restricted iteration construct, but just because they're being used does not mean they're being used "properly" unless the language ensures it.


This. I've seen FP features (and incidentally OOP features as well) misused so hard, I'm thankful for every plain loop (or struct for that matter).


What is the "most restrictive" form? The answer to this question is highly context dependent (for example, programming language / architectural framework) and expensive to give. Remember that everything has a cost. And especially overzealous formalism.

Personally I find simple C index-based for loops consistent and refreshing. And it's typically not a huge deal. If it is, the procedure might be doing too many things at once. (But yes, I use simple "for x in y" style loops in Python or C++ when they make the lion's share of the loops).

Many different types of loops in a single file, over a single datastructure, always remind me of odd syntax highlighting (for example in vim) in so many different colors that it's only a distraction. I don't care to make so many distinctions. I try to focus on the distinctions that we have to make to get a program done.


Map-style iteration carries more semantic information than a C-style index based loop. An index based loop might be doing anything with the data being iterated over. When you do collection.map, already at first glance you get an idea about what the code is doing.


Yep - and nobody cares. Write clean code instead. Much harder than simply changing the looping style.


I like this kind of article not necessarily because it's revolutionary or groundbreaking (because it's not), but because it serves as a helpful reminder sometimes, and can help stick an "observation bias" bug in my brain to notice more often the sorts of examples it calls out.


Yup, I already know where sets and maps are better than lists, but this article now primes my brain to look carefully the next time I write more code.


Reading this article, I got a bit bothered.

I agree with the sentiment: people repeat entirely too much code. There are very few cases where a for-loop is the right thing to write. Using generic methods which can be tested and shipped in isolation is basically always better.

But the article seems to imply there are performance concerns in some cases with its talk of using "a data structure". This irked me, because it's not like Set will beat linear time in balanced read-writes and without care the hash tables end up being non-constant as well.


> There are very few cases where a for-loop is the right thing to write.

I'm trying to get my head around that, do you mean any for loop is bad or many nested loops?


Indexed loops are just a bad problem waiting to happen. They introduce arithmetic bugs, are the least efficient way to traverse most any data structure but a sequential memory segment, maximize the opportunity for error, and are often harder for compile-time optimizers to work through (since it's harder to prove what they do in situ).

Abstracting away your iteration is important.


Like for (int i = 0; i < 10; i++) { // todo }? OK, well yes, there can be bugs using the index, and even C++ back when I was doing it had the Boost library, which had for_each. In Java we got something similar back in Java 5 or 6, and we have fancy stream operations now in Java 8, which took me the longest time to come to appreciate. The only downside is reduced compatibility with other languages like C, C++, and C#, locking the app into a specific language more.

There is on rare occasion, however, sometimes a problem that is solved more clearly or handily using indexing, similar to step indexing in BASIC.

One use case where I've seen stepped indexing useful is in the field of robotics, which uses step characteristics for synchros and servos.


> The only down side is removed compatibility with other languages like C and C++ and C#, locking in the app to a specific language more.

Using a Stream instead of a for loop does very little to make your code incompatible with C. It is already pretty much incompatible (modulo JNI).

What I mean is: if you want something that looks like C just use that, there is no gain in not learning new constructs just because older languages did not have them.


I really don't like the term "code smells". It just seems to herald finicky criticism that ignores wider context and real-world pressures. Outside of academia, you can't release a build without creating a few smells.


Hashing is indexing, and there is no faster or more memory-efficient index than an array (not a vector) of sorted items. Iteration is best if you can organize your data for it.


Not sure what you're trying to get at. Indexing by surrogate index, yes. Indexing by value? That doesn't help. The best you can do is binary search -- O(log n). Hash-based structures are average amortized constant time -- O(1). This works because the key is a function of the value, rather than a surrogate key which is unrelated to the value.

Straight from Wikipedia:

"In many situations, hash tables turn out to be more efficient than search trees or any other table lookup structure. For this reason, they are widely used in many kinds of computer software, particularly for associative arrays, database indexing, caches, and sets."

https://en.wikipedia.org/wiki/Hash_table
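The trade-off both comments are circling can be shown directly (a small sketch with invented data): `Arrays.binarySearch` on a sorted array is O(log n) but very compact, while `HashSet.contains` is average O(1) at the cost of extra memory and no useful iteration order.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class SortedArrayVsHash {
    public static void main(String[] args) {
        // Sorted-array index: O(log n) lookups via binary search,
        // compact in memory, cache-friendly to iterate.
        int[] sorted = {2, 3, 5, 7, 11, 13};
        System.out.println(Arrays.binarySearch(sorted, 7) >= 0);  // true
        System.out.println(Arrays.binarySearch(sorted, 8) >= 0);  // false

        // Hash index: average O(1) lookups, more memory, unordered.
        Set<Integer> hashed = new HashSet<>(Arrays.asList(2, 3, 5, 7, 11, 13));
        System.out.println(hashed.contains(7));  // true
    }
}
```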


To say that iteration should be "considered a code smell" is not a good thing to do for the programming community.

Present-day CPUs operate on structures by iterating. In a typical high-level programming language, such operations are at some point expressed as either (a) iteration or (b) recursion. If the compiler does not perform automatic tail call optimization, then (b) recursion can't be made as optimal (in terms of speed and resources) as (a). The latest Java compiler does not perform tail call optimization and probably never will. I mention Java because the article is targeted at the Java crowd.

Thus, iteration IS a very important tool for enhancing performance, and real-world (production) systems might have very important performance requirements.

Iteration, in a code, should just be considered... iteration. No kind of smells.

In the past, a very important computer scientist named Edsger Dijkstra wrote a paper called "Go To Statement Considered Harmful", GOTO being considered a "code smell". Dijkstra is a very important guy; a unit of measure, the nanoDijkstra, was named in his honor.

However, even afterwards, new programming languages did include a GOTO statement (or some form), because there are special cases when they are needed for higher performance or (believe it or not!) producing more readable code.
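The tail-call point can be demonstrated concretely (a hypothetical sketch): on the JVM, even a tail-recursive method consumes a stack frame per step, while the equivalent loop runs in constant stack space.

```java
public class IterationVsRecursion {
    // Tail-recursive in form, but the JVM does not eliminate the call, so each
    // step consumes a stack frame; deep inputs throw StackOverflowError.
    static long sumRecursive(long n, long acc) {
        if (n == 0) return acc;
        return sumRecursive(n - 1, acc + n);
    }

    // The iterative version runs in constant stack space at any depth.
    static long sumIterative(long n) {
        long acc = 0;
        for (long i = 1; i <= n; i++) acc += i;
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(sumIterative(100));    // 5050
        System.out.println(sumRecursive(100, 0)); // 5050
        // sumRecursive(10_000_000, 0) would overflow the stack on a default JVM.
    }
}
```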


The autoplaying code editing animations are really annoying. By the time I've read the preceding text, it is somewhere in the middle and I don't know what it's doing.

Please, can we just agree that animations and videos (and audio) should only play when the reader initiates it? Also, a progress bar would be handy, although it may not always be necessary (for short animal videos and similar).


These examples discuss data structure choice, which is not what I thought the article might be about. I have encountered two major bugs, at two separate companies, both involving iteration and maps.

Both examples were written in Go, and although language shouldn't matter, iterating over a map there is non-deterministic. In each case the author intended to construct the returned data structure in a pre-defined order.

We had been suffering from poor caching performance and saw our Cassandra reads spiking as a result. Once we spotted the incorrect cache key construction, our reads to Cassandra dropped significantly allowing for better performance all around.

Perhaps others would have spotted this but it seems innocuous in code review. I know I will be more diligent in the future.
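The same trap exists in Java, since HashMap iteration order is likewise unspecified. A hedged sketch of the fix (the cache-key format and names are invented, not from the incident described above): sort the keys before concatenating, so the key is deterministic across runs.

```java
import java.util.Map;
import java.util.stream.Collectors;

public class DeterministicCacheKey {
    // A cache key built by plain map iteration can differ between runs
    // (Go even randomizes map order deliberately). Sorting the keys first
    // makes the resulting string deterministic.
    static String cacheKey(Map<String, String> params) {
        return params.entrySet().stream()
                .sorted(Map.Entry.comparingByKey())
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        System.out.println(cacheKey(Map.of("user", "42", "page", "3")));
        // page=3&user=42 -- same string no matter how the map iterates
    }
}
```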


Clickbait title; the article doesn't seem to back up this hyperbolic claim either.


[flagged]


Are you the guy in that same class that points out how "obvious" everything is, in an attempt to prove how much smarter you are to the professor?


Yes. He's in the discussion about building a Hash function in C doing the exact same thing.


So everyone here is required to have a CS degree now?

Lots of self taught developers, and people with business or quantitative-but-not-coding backgrounds read Hacker News, I think.


I expect a higher level of content from HN, not basic material. I think other sites are more appropriate for this.


I wasn't taught this in my class, I was always dealing with very small lists so iterating did not make much of a difference.



