Data Analysis with Vector Functional Programming [video]

ingenter · on June 17, 2016

> "Only short programs have any hope of being correct"

Right, but I would minimize not the number of bytes in a program but the number of nodes in its AST: http://www.paulgraham.com/arcchallenge.html

Here's an exercise: parse your favorite APL program into an AST, and see how many nodes it has.
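As a rough illustration of that exercise, here is a sketch that compares byte count and AST node count using Python's standard `ast` module. It parses Python rather than APL, the helper name `count_ast_nodes` and both sample programs are made up for illustration, and the exact counts depend on the Python version's AST.

```python
import ast


def count_ast_nodes(source: str) -> int:
    """Return the number of nodes in the Python AST of `source`."""
    tree = ast.parse(source)
    return sum(1 for _ in ast.walk(tree))


# Two spellings of the same computation: the sum of 1..5.
terse = "sum(range(1, 6))"
verbose = (
    "total = 0\n"
    "for i in range(1, 6):\n"
    "    total = total + i\n"
)

for label, src in (("terse", terse), ("verbose", verbose)):
    size = len(src.encode("utf-8"))
    print(label, "bytes:", size, "nodes:", count_ast_nodes(src))
```

Renaming `total` to a longer identifier changes the byte count but not the node count, which is the distinction being argued for.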

dang · on June 18, 2016

Almost certainly still relatively few.

There's something to be said for being able to see more code at once, too. This is something the vector language people emphasize (also Chuck Moore, IIRC). The APL/J/K style is a different way of both writing and reading code, and the standard objections ("readability", "that looks like line noise") mostly come from the gap between that experience and the more mainstream way of experiencing code.

Btw we once had a long HN discussion about pg's suggestion that one should measure code size in tokens rather than lexically. I remember arguing in favor but being persuaded out of it by someone who was even more radical about small codebases than I am.

joe_the_user · on June 17, 2016

Has anyone written on the relation between Haskell and similar languages and APL, J and related languages?

michaelfeathers · on June 17, 2016

I've given some talks making the case that what we are doing with Java streams, LINQ, Rx, Ruby Enumerable, point-free style in Haskell, etc is prepping us for array languages.

They all emphasize transformation pipelines. What the array languages bring to the game is shape polymorphism and many more operators. I suspect we'll see both move into more mainstream languages over the next 5 years or so.
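To make "shape polymorphism" concrete, here is a minimal sketch in plain Python, assuming nested lists stand in for arrays. The helper `lift` is hypothetical, not taken from any library or array language, and real array languages have richer rank and agreement rules than this.

```python
from numbers import Number


def lift(op):
    """Make a two-argument scalar function apply element-wise to nested lists."""
    def lifted(a, b):
        a_scalar, b_scalar = isinstance(a, Number), isinstance(b, Number)
        if a_scalar and b_scalar:
            return op(a, b)
        if a_scalar:
            # A scalar extends across every element of the other argument.
            return [lifted(a, y) for y in b]
        if b_scalar:
            return [lifted(x, b) for x in a]
        if len(a) != len(b):
            raise ValueError("length mismatch")
        return [lifted(x, y) for x, y in zip(a, b)]
    return lifted


add = lift(lambda x, y: x + y)

print(add(1, 2))                                    # 3
print(add(10, [1, 2, 3]))                           # [11, 12, 13]
print(add([[1, 2], [3, 4]], [[10, 20], [30, 40]]))  # [[11, 22], [33, 44]]
```

The same `add` works on scalars, vectors and matrices without the caller writing a loop for each shape.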

joe_the_user · on June 18, 2016

Any links, videos, etc.?

michaelfeathers · on June 18, 2016

There's this one. It isn't deep. More about making people aware: https://www.youtube.com/watch?v=UX7xmhpUoi4

snaky · on June 18, 2016

Yes.

"J for Haskell Programmers" - http://crypto.stanford.edu/~blynn/haskell/jfh.html

Talking about data analysis

"Data Analysis using J" - http://csilo.com/static/DataAnalysisWithJ.html

J for IPython - https://github.com/Synthetica9/JMagic, https://github.com/adrian17/jkernel

and there's https://www.reddit.com/r/apljk/

yummyfajitas · on June 18, 2016

Having played a bit with J, and also used Haskell extensively, I'd say they aren't that close.

Probably the closest contemporary language to J is Python+Numpy+Pandas. I've gone about 1/3 of the way through Notation as a Tool of Thought (http://www.jsoftware.com/papers/tot.htm ) and translated most of it to idiomatic numpy. E.g. their first example is:

    +/⍳5

(read this as generate an array 1,2,3,4,5, then apply + as a reducer to it.)

In numpy this is:

    add.reduce(arange(1,6))

It may just be familiarity, but I do find the Python to be far more readable. (Hardly surprising since brevity is the core value of k/j/Q, but readability is the core value of python.)

eggy · on June 18, 2016

The examples in the paper are in APL, which uses a special, non-ASCII character set to express its syntax. J uses the standard ASCII character set. Readability comes with familiarity. If you want something closer to natural language for programming, see Inform [1]. I think readability in Inform hinders composability and adds up to some lengthy programs. Mathematicians take the time to familiarize themselves with Greek letters and other seemingly odd symbols, so that the notation can be understood across cultures and languages and stays succinct.

In APL, you can change indexing from 0-based to 1-based. In J it is 0-based, hence the need to increment the sequence with the increment verb '>:' as follows:

  +/\>:i.5

1 3 6 10 15

You could also rename or group and rename functions for readability if working with others not familiar with J:

  range =: >:@(i.)    NB. @ joins the two verbs increment (>:) and index (i.)
  range 5

1 2 3 4 5

  add_reduce =: +/\   NB. names +/\ : plus (+) inserted (/) over each prefix (\), i.e. a running sum
  add_reduce range 5

1 3 6 10 15

  +/ range 5   NB. This is a comment (from the Latin 'Nota Bene')

15
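For readers who do not know J, the session above can be approximated with the Python standard library. This is a sketch, not a claim about how J evaluates these verbs; the names `one_to`, `add_reduce` and `add_scan` are made up here, and the scan has its own name to separate it from the plain reduction.

```python
from functools import reduce
from itertools import accumulate
from operator import add


def one_to(n):
    """Counterpart of J's  >:@i.  : the integers 1..n."""
    return list(range(1, n + 1))


def add_reduce(xs):
    """Counterpart of J's  +/  : fold + over the list."""
    return reduce(add, xs, 0)


def add_scan(xs):
    """Counterpart of J's  +/\\  : running sums over each prefix."""
    return list(accumulate(xs, add))


print(one_to(5))              # [1, 2, 3, 4, 5]
print(add_scan(one_to(5)))    # [1, 3, 6, 10, 15]
print(add_reduce(one_to(5)))  # 15
```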

I am playing with Idris, and do not know Haskell much, but I still find composing functions in J more like mathematics than composing them in Idris/Haskell. I find it easier in Idris/Haskell to express proofs in the style of a proofs textbook, but I see J as more representative of the actual mathematical formulas. They are short, and can be easily manipulated without so much typing.

People sometimes criticize J/K/Q/APL the way they do when they write about Forth or Lisp, but it doesn't take away from them. The Rosetta lander had a lot of mission-critical code in Forth, but there are not many who like the syntax or way of composing programs.

J's learning materials have greatly improved from when I first looked at it [2].

kdb+/q beats the pants off of Spark, and Jd is the J programming language's answer to kdb+/q.

Array processing with a language that has arrays as its fundamental unit makes sense, and that is where it is all headed. Whether it is one of the existing array languages or a hybrid is the question. All those high salaries for programming in Q are not a myth, and companies don't pay for no return on salary.

FYI - The creator of Pandas, Wes McKinney, was studying or looking over J for his next venture. The link seems to have disappeared, so perhaps it is not in development, or it is being developed in secret!;)

[1] http://inform7.com/

[2] http://code.jsoftware.com/wiki/Guides/GettingStartedSerious

pklausler · on June 17, 2016

Written code, yes; blog posts, not yet.

If you think of an array as a function that maps a linearized index to an element value, you get most of APL for free in Haskell.
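A minimal sketch of that idea, written in Python rather than Haskell to match the other examples in this thread. The `Arr` type and the helpers around it are hypothetical, one-dimensional only, and not taken from any existing code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Arr:
    """An array modeled as a size plus a function from index to element."""
    size: int
    at: Callable[[int], object]

    def to_list(self):
        return [self.at(i) for i in range(self.size)]


def iota(n):
    """The indices 0, 1, ..., n-1 (like J's i. n)."""
    return Arr(n, lambda i: i)


def each(f, a):
    """Apply f to every element by composing it with the index function."""
    return Arr(a.size, lambda i: f(a.at(i)))


def zip_with(f, a, b):
    """Element-wise combination of two equally sized arrays."""
    if a.size != b.size:
        raise ValueError("length mismatch")
    return Arr(a.size, lambda i: f(a.at(i), b.at(i)))


def reverse(a):
    """Reversal only remaps indices; no elements are copied."""
    return Arr(a.size, lambda i: a.at(a.size - 1 - i))


def fold(f, a, init):
    """Reduce the array from left to right, starting from init."""
    acc = init
    for i in range(a.size):
        acc = f(acc, a.at(i))
    return acc


xs = each(lambda x: x + 1, iota(5))
print(xs.to_list())                                             # [1, 2, 3, 4, 5]
print(zip_with(lambda x, y: x * y, xs, reverse(xs)).to_list())  # [5, 8, 9, 8, 5]
print(fold(lambda acc, x: acc + x, xs, 0))                      # 15
```

Mapping, reversing and zipping become function composition on the index function, and elements are computed only when something like `to_list` or `fold` asks for them.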

seanmcdirmid · on June 17, 2016

Historically array programming and functional languages are peers with quite different evolutions.

textmode · on June 18, 2016

There are two other stories currently on the HN front page about IEX becoming the 13th stock exchange.

IEX uses k.

rar_ram · on June 17, 2016

I like the concepts introduced in the video, but the Q syntax hurts my eyes. I am personally not in favor of fitting my entire program in a tweet. Readability helps!

RodgerTheGreat · on June 17, 2016

Readability is subjective and largely colored by your experience. Verbosity or succinctness isn't the problem here; it's that the syntax and semantics of Q are unfamiliar to you. I think it's very important to take a moment to step back and consider how many choices in language design you may be casually taking for granted.

Many mainstream languages today- Java, Python, Ruby, C, PHP etc.- have very similar core semantics and syntax. Choices of keywords and type systems differ, but loads of ideas work the same, especially when you consider idiomatic everyday code- for loops, scalar variables, some superset of the algebraic rules for operator precedence you learn in math class. Whitespace in an expression tends to be irrelevant, but sometimes its absence is significant. To add a scalar to a list, you use a loop. Assignment operators flow right to left. Lists tend to be indexed from zero. How many of these choices are essential and how many are arbitrary? All of these languages are descendants of FORTRAN and ALGOL, sometimes with some ideas from the Lisp family thrown in. They share a common heritage.

Q, K, J, A+ and APL represent an entirely parallel course of evolution. Within this family there is a great deal of mutual intelligibility. I'm very familiar with K, but the Q dialect doesn't surprise me. Once learned, it's familiar, and therefore readable. What did source code look like to you before you learned to program?

The APL family isn't amazing because programs tend to be short; that's a side-effect of the positive properties of these languages. They teach you to write naturally parallel solutions to problems and offer a simple, consistent way to compound together and apply functions to large structures at once. Please don't say "this looks different than I'm used to" and then close your mind to what the paradigm has to offer.

lqdc13 · on June 18, 2016

Debugging poorly written or convoluted K programs is much more complicated than debugging equivalent C/C++/Java/Python programs.

It's decent for prototyping though, although I bet most people would prototype just as fast or faster in Python.

RodgerTheGreat · on June 18, 2016

I'll grant that the debugging story for K isn't as sophisticated as with many other languages. Having a REPL and preferring programs that are mostly pure functions helps, but there's room for improvement. Tooling is about the community, though- not the language. I think that making documentation, screencasts and generally encouraging the expansion of K's open-source ecosystem will, with time, close that gap.

IndianAstronaut · on June 18, 2016

This is functionally similar to dplyr in R, although the more SQL-like syntax of dplyr is much handier.

michaelsbradley · on June 18, 2016

And when the data gets bigger, there's data.table[1], which performs amazingly well at certain tasks (vectorized ops ftw!), though the syntax can get a little clunky (if you squint at it hard, it's SQL-ish). On my 2012 macbook pro, I'm able to do (some) transformations of tables containing 10s of millions of rows in only a few seconds (and sometimes faster).

It's possible to use dplyr and data.table together, as well, to good effect[2].

[1] https://github.com/Rdatatable/data.table/wiki

[&] https://github.com/Rdatatable/data.table/wiki/Benchmarks-%3A...

[2] http://stackoverflow.com/questions/21435339/data-table-vs-dp...

[&] https://twitter.com/hadleywickham/status/553169339751215104

IndianAstronaut · on June 18, 2016

Are you able to load data sets into data.table which are larger than memory?

michaelsbradley · on June 18, 2016

AFAIK, with data.table it's all in-memory; whereas dplyr has the option of working with a database backend.

On the other hand, data.table has robust support for modify and update ops by reference, which can be a big performance saver.

nickpeterson · on June 17, 2016

The problem with all variants is the incredibly minimal resources to learn them. I wish a skilled practitioner would do something larger than a blog post. Something like a pluralsight/oreilly course...

srpeck · on June 18, 2016

There is a lot of material on kdb+ on kx's website (http://code.kx.com/wiki/Tutorials), Q for Mortals (http://code.kx.com/wiki/JB:QforMortals2/contents) being the best introductory book.

Also some good resources in this Quora: https://www.quora.com/What-are-the-best-resources-to-learn-q...

And if your interest is in the k language more than kdb+/q, then I have found the docs in John Earnest's ('RodgerTheGreat) oK interpreter a succinct, example-focused introduction: https://github.com/JohnEarnest/ok/blob/gh-pages/docs/Manual.... Plus using his browser-based REPL (http://johnearnest.github.io/ok/index.html) may lower the barriers to entry, and iKe (http://johnearnest.github.io/ok/ike/ike.html) is great for experimentation...http://johnearnest.github.io/ok/ike/ike.html?gist=9c5f43baa4...

eggy · on June 18, 2016

I program in J and there are many books, examples, YouTube videos, a great help file, and several active forums. Just check jsoftware.com.

Rosettacode is also a good source for example J code [1].

The J for C Programmers book is good if you already have programming experience, especially in C [2].

[1] http://rosettacode.org/wiki/Category:J

[2] http://www.jsoftware.com/help/jforc/contents.htm

eggy · on June 18, 2016

Also, see this:

http://code.jsoftware.com/wiki/Guides/GettingStartedSerious

Avshalom · on June 18, 2016

are you a student and/or can you pretend to be one? http://www.timestored.com/kdb-training/free-student-access

or http://www.dyalog.com/mastering-dyalog-apl.htm and be a student to get a free (educational) dyalog license

PeCaN · on June 18, 2016

J has a large number of tutorials ("labs") included in the IDE. They're fun and easy to follow.

codygman · on June 18, 2016

Is there anything like Q that is open source? J seems kind of similar.

ksherlock · on June 18, 2016

There's an open source version of K(3) here: https://github.com/kevinlawler/kona

codygman · on June 21, 2016

Thanks! This is nice. I'm playing with J though since the resources seem very nice.

haddr · on June 17, 2016

It's a brilliant language, and all, but I don't buy it. Here's why:

1) Languages with this high a level of abstraction are very nice if your scenario maps perfectly to their usage (e.g. the Wikipedia analysis given in the video), where everything is nicely vectorizable. But if there is some quirk in your data, then sometimes you need to go the usual way, and Q is of no help (no better than any other language), with the difference that now you do some very inefficient things with those "stinky" loops.

2) It's a question of taste, but I find Q syntax a bit unusual. You probably need more time thinking about how to fix your simple problems with these clever one-liners than simply, well, solving them...

3) Legibility: all of us working in software development know how much time we waste on illegible code, finding bugs, etc. Here this is raised to a new level... of difficulty.

4) This is a bit exaggerated, but I don't see how I could use Q in something bigger. Is Q only a scripting language for one-off mini-batch programs? For instance, R has the problem of not having any well-defined project structure, which makes many things hard: bigger programs are hard to maintain and debug, and stream processing with R is a pain in the ass. Server-side stuff is a little bit shoehorned (Shiny server is cool, but then it's just a one-thread thing for serving filtered dataframes to ggplots).

It is a cool niche language, but for smart, simple analyses, nothing too complex, as it will abstract you away from the details to your loss.

srpeck · on June 18, 2016

I wanted to test 1 and 4 out, so I wrote an MMO in q/kdb+: https://github.com/srpeck/kchess (live server is down, but you can see some of the recorded action here: https://github.com/srpeck/kchess/blob/gh-pages/docs/kchessdo...). Some of my design decisions: https://news.ycombinator.com/item?id=10924316

I had previously built similar games using other technologies, but found developing in q to be far faster, even given the more primitive debugging capabilities.

I understand that this is a relatively small example, but building it was enough to convince me the APL/k/q approach is useful beyond its supposed niche.

PeCaN · on June 17, 2016

Regarding 2 & 3, I assume you don't know Q or any other array language. If you do, it's not really any less readable than any other language. Finding bugs isn't particularly hard due to how friendly the languages are to REPL development and how easy it is to trace array transformations.

I do agree with 1 & 4 though—Q (and to a lesser extent, K, APL, and J) are niche languages. However, they're really, really good at their niche.

haddr · on June 18, 2016

It has nothing to do with being an array language: R is vectorized and is slightly more legible than, say, Clojure, which is not an array language. It has to do with very high density, though.

leephillips · on June 17, 2016

Introduces Q, which is a proprietary, more verbose wrapper around K, another proprietary language, inspired by APL.