Hacker News new | past | comments | ask | show | jobs | submit login
Understanding map, filter, and fold (dev.gd)
72 points by biesnecker on Dec 24, 2012 | hide | past | favorite | 38 comments



I liked this post, and find the use of map/filter/reduce one of the most fundamental changes to the way you can manipulate data. Truly eye opening.

Even though I have relatively little front-end experience, and use primarily python/ruby- I only started to use these with Javascript (via the underscore library).

The only thing I didn't quite agree with is " ... the syntax is obvious enough for most people to understand even if they don't quite grok Haskell".

Whilst the early examples were manageable, I was struggling with the syntax later on. The syntax looks clean, so it seems like Haskell is a neat language, but it wasn't enough to make it easy for me to fully 'get it'. Perhaps I'm too locked in a ruby/python mindset.


Point taken. I might redo this with Ruby at some point in the future -- the functions are all available and it might be interesting to see the differences between the two.


MDN also has pretty good documentation for map/filter/reduce:

https://developer.mozilla.org/en-US/docs/JavaScript/Referenc...

https://developer.mozilla.org/en-US/docs/JavaScript/Referenc...

https://developer.mozilla.org/en-US/docs/JavaScript/Referenc...

What's with the space shuttle / F22 hovering beneath the text right in the middle of the screen? :/


Haha, I've been playing with different background images... maybe that one isn't optimum for reading :-|


I liked the article. Just to let you know, when you zoom on an iPhone the background covers the text for a ~ second: https://www.dropbox.com/s/95f8ivqxodh2qnc/Photo%20Dec%2023%2...


Thanks for the head's up. I'm beginning to think the background image is more trouble than it's worth :(


If you are working with C# these functions are available in linq: map is Select, filter is Where, and reduce/inject/fold is Aggregate in Microsoft speak.

I use those methods all the time when I have to do C# code.

I do wonder why these operations lack a standard name. Still, I think map-reduce sounds a lot better than map-inject or Select-Aggregate or map-fold.


If you're already familiar with SQL (which your average C# programmer likely is), select/where/aggregate look familiar to you, so less of a mental leap to understand them. And being able to say 'it's like in-line SQL!' makes Linq more acceptable to pointy-haired-bosses at the same time.


Easily fixed with extension methods for aliases you like.


Going from java to ruby these functions were the most obvious improvement for my day to day coding.


I agree. These functions have completely changed how I approach algorithms and data manipulation.

I was super-excited to learn Go, but it was a showstopper to discover that Go's lack of support for generics effectively made this pattern of programming impossible. Major disappointment.


Genuinely curious, what do generics have to do with this?


The type of map is `(a -> b) -> [a] -> [b]`, which means that map takes a function from any type `a` to any type `b` and a list of `a`s, and returns a list of `b`s. Generics, aka parametric polymorphism, are required for this because the definition of map makes no assumptions on what types `a` and `b` will be instantiated to.


What does "from any type 'a' to any type 'b'" mean? That sounds like a range of types, but I don't see how that makes sense. Did you just mean that the map converts from 'a' to 'b'?


a and b are placeholders (type variables). They can be any type, for example Integer, or a Float. It doesn't matter.

[a] is a list of a, [b] is a list of b. Map takes a function that takes an a and returns a b, and applies it to every element of [a] in order to produce the [b]. An example usage is

  map lessThanThree [1,2,3,4]
In this case, map takes a function where a is an integer, and b is a boolean, then takes a list of integers and produces a list of booleans.

The result is [True,True,False,False]


It's also illuminating to note that haskell curries function application. This allows us to think of map as a function of one argument.

map :: (a -> b) -> ([a] -> [b])

so instead of thinking that map takes a function and a list and applies that function to the list, we can think that map takes a function and "lifts" that functions argument and result types to the list type.

This thinking leads us directly to Functors and fmap


I recently started a Java job after having avoided it in favour of better languages for some time. I've found the Google Guava library makes Java programming a lot more bearable: https://code.google.com/p/guava-libraries/

Thinks that will make your life easier

* Collections2.filter() -> this is just filter()

* Collections2.transform() -> this is map()

* ImmutableMap.of() -> for when you need to create a quick lookup

You still have to make an anonymous inner class every time you want to pass a function as an argument though.


Yeah, I actually wrote some similar functions for java when I first started experimenting with ruby on the side. I agree it's a big improvement on for loops but there's still a lot of line noise; the anonymous class(as you mention) and having to passing the collection into a static method which is never great. Importing Collections2 statically does make it a bit more pleasant though.


This is such a key insight core to many of the benefits of FP. Other resources on the matter that I like include

Meijer, E., Fokkinga, M., & Paterson, R. (1991). Functional programming with bananas, lenses, envelopes and barbed wire, 124–144.

Hutton, G. (1999). A tutorial on the universality and expressiveness of fold. Journal of Functional Programming, 9(4), 355–372.

Gibbons, J., Hutton, G., & Altenkirch, T. (2001). When is a function a fold or an unfold? Electronic Notes in Theoretical Computer Science, 44(1), 146–160.

(The third one is for theoretical interest mostly)


Good article but he makes the same mistake I did at first with foldl and foldr by saying they fold from the left and right respectively. Which leads to the mistaken thinking that foldr can't operate on infinite lists (it absolutely can, foldl can't).

It's better to think of them as left- and right-associative.


This was my one complaint as well, and it happens with the example given of any'

Defining any' as

any' f xs = foldl (\acc x -> f x || acc) False xs

Then trying to evaluate any' even [1..] runs forever, but if we modify it slightly to use foldr instead as

any' f xs = foldr (\x acc -> f x || acc) False xs

Then evaluating any' even [1..] terminates basically immediately with True.


Thanks much for the feedback. I clearly didn't understand them as well as I thought I did, though after reading http://stackoverflow.com/a/3085516/337184 I think I understand a bit more.

I'll update the post with a correction, and eventually write a "how I learned to stop worrying and love the difference between foldl and foldr" post. :-)


Nice to see you on HN, John!

I had previously learned these as map and reduce/inject (thank you, ruby). It's only this past few months when I took an algorithms course on Coursera that I realized how much of a pain it was to do some of this stuff in other langauges (like Java). To me it was just natural to write functions that operated on an array of "stuff", rather than arrays of ints or strings, etc... The course literally had us building identical functions for handling different data types and then finally solving the problem via generics and the use of an ugly typecast.


Doesn't building map with a fold hide its inherent parallelizability?

The nice thing about map, as opposed to fold, is that you don't need to know the result of the ith step to do the i+1th.


Not really. There's a really nice paper by Graham Hutton, "A tutorial on the universality and expressiveness of fold" which goes into a bit more depth than OP does. Your question is answered directly in section 3.2, "The fusion property of fold". Basically, a sequence of multiple folds can be fused to a single fold.

See http://www.cs.nott.ac.uk/~gmh/fold.pdf


The article you linked proves the universality of fold, that I never doubted for one second.

What I was trying to say is that map is not universal, in that all function evaluations can be performed in parallel, because you can't use the result of the ith to compute the i+1th. In this respect map is in fact a restriction of fold.


Awesome - I had the same question. I was basically wondering how to inform the system of the opportunity to parallelize.

BTW, Hutton's book on Haskell is just so excellent.


That depends on your map function, for most languages. If your map function includes external state (like a variable from an outer scope) then the execution sequence could change the map function's output. Eg, in C#:

    var i = 5;
    var result = input.select(item => { 
        var val = item.v + i; 
        i = val; 
        return val;
    });
This would be an awful programming practice, but it serves as an example that map isn't necessarily parrallelizable.


A small mistake: “In Ruby, these functions are map, filter” — is incorrect; filter in Ruby is called select.


Cheers, fixed (as soon as the cache updates)


Also Ruby has 'reduce' as a synonym for 'inject'


...and thank god for that. I find inject a really confusing name.


The Ruby names are copied from Smalltalk, as is much of Ruby.


Also: (++) :: [a] -> [a] -> [a]

Article implies: (++) :: [a] -> a -> [a]


Before people go off writing this `foldx foo initial bars` pattern:

http://www.haskell.org/ghc/docs/latest/html/libraries/base/D...


I ignored the existence of foldl1 and foldr1 because though they're available in Haskell and make a lot of things easier, they're not available in every language and not fundamental to understanding the concepts themselves.


> In Ruby, these functions are map, filter, and (confusingly!) inject

The ruby I'm using right now (1.8.7) does not understand "filter", but it does have the equivalent called "select".


Also, if you use map and you want to easily filter out values, and 'nil' is an ok sentinel for this, then just use compact after the map.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: