Data Analysis with Vector Functional Programming [video] (youtube.com)
88 points by srpeck on June 17, 2016 | 34 comments



> "Only short programs have any hope of being correct"

Right, but I would minimize not the number of bytes in a program but rather the number of nodes in its AST: http://www.paulgraham.com/arcchallenge.html

Here's an exercise: parse your favorite APL program into an AST, and see how many nodes it has.
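
As a rough way to try this kind of exercise without an APL parser, Python's standard `ast` module can count the nodes of a piece of Python code. This is only a sketch: the helper name `count_nodes` is made up for illustration, the sample inputs are the numpy expression `add.reduce(arange(1, 6))` that appears later in this thread and a hand-written loop, and the counts depend on how a given parser structures its tree, so they are only comparable within one language.

```python
import ast


def count_nodes(source: str) -> int:
    """Count every node in the AST of a piece of Python source."""
    tree = ast.parse(source)
    return sum(1 for _ in ast.walk(tree))


# The numpy one-liner versus an explicit loop computing the same sum.
short = "add.reduce(arange(1, 6))"
longer = "total = 0\nfor i in range(1, 6):\n    total += i"

print(count_nodes(short), count_nodes(longer))
```

The absolute numbers matter less than the comparison between two programs parsed the same way.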


Almost certainly still relatively few.

There's something to be said for being able to see more code at once, too. This is something the vector language people emphasize (also Chuck Moore, IIRC). The APL/J/K style is a different way of both writing and reading code, and the standard objections ("readability", "that looks like line noise") mostly are just because of the gap between that experience and the more mainstream way of experiencing code.

Btw we once had a long HN discussion about pg's suggestion that one should measure code size in tokens rather than lexically. I remember arguing in favor but being persuaded out of it by someone who was even more radical about small codebases than I am.


Has anyone written on the relation between Haskell and similar languages and APL, J, and related languages?


I've given some talks making the case that what we are doing with Java streams, LINQ, Rx, Ruby Enumerable, point-free style in Haskell, etc. is prepping us for array languages.

They all emphasize transformation pipelines. What the array languages bring to the game is shape polymorphism and many more operators. I suspect we'll see both move into more mainstream languages over the next 5 years or so.
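
One way to picture "shape polymorphism" is a function whose body works unchanged on a scalar, a vector, and a matrix. The sketch below uses numpy as a stand-in for an array language; `center` is a hypothetical name chosen for this example, not something from the talks.

```python
import numpy as np


def center(x):
    """Subtract the overall mean; no loops and no rank-specific code."""
    x = np.asarray(x, dtype=float)
    return x - x.mean()


print(center(3.0))                 # rank 0: a single number
print(center([1, 2, 3]))           # rank 1: a vector
print(center([[1, 2], [3, 4]]))    # rank 2: a matrix
```

The caller never says which rank it is passing; the same expression is applied at every shape.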


Any links, videos, etc.?


There's this one. It isn't deep. More about making people aware: https://www.youtube.com/watch?v=UX7xmhpUoi4



Having played a bit with J, and also used Haskell extensively, I'd say they aren't that close.

Probably the closest contemporary language to J is Python+Numpy+Pandas. I've gone about 1/3 of the way through Notation as a Tool of Thought (http://www.jsoftware.com/papers/tot.htm ) and translated most of it to idiomatic numpy. E.g. their first example is:

    +/⍳5
(read this as generate an array 1,2,3,4,5, then apply + as a reducer to it.)

In numpy this is:

    add.reduce(arange(1,6))
It may just be familiarity, but I do find the Python to be far more readable. (Hardly surprising since brevity is the core value of k/j/Q, but readability is the core value of python.)
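
For anyone who wants to run the numpy line above: as written it assumes the names were brought in with `from numpy import *`. A self-contained sketch with an explicit import (the variable names are just for illustration):

```python
import numpy as np

values = np.arange(1, 6)        # the integers 1 through 5
total = np.add.reduce(values)   # fold + over the array

print(values.tolist())          # [1, 2, 3, 4, 5]
print(total)                    # 15
print(values.sum())             # 15, the more common numpy spelling
```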


The examples in the paper are in APL, which uses a special, non-ASCII character set to express its syntax. J uses the standard ASCII character set. Readability comes with familiarity. If you want something closer to natural language for programming, then see Inform [1]. I think readability in Inform hinders composability and adds up to some lengthy programs. Mathematicians take the time to familiarize themselves with Greek letters and other seemingly odd symbols because the resulting notation is succinct and can be understood across cultures and languages.

In APL, you can change indexing from 0-based to 1-based. In J it is 0-based, hence the need to increment the sequence with the increment verb '>:' as follows:

  +/\>:i.5
1 3 6 10 15

You could also rename or group and rename functions for readability if working with others not familiar with J:

  range =: >:@(i.)    NB. @ joins the two verbs increment (>:) and index (i.)
  range 5
1 2 3 4 5

  add_reduce =: +/\   NB. name +/ (plus insert) applied with \ (prefix), i.e. a running sum
  add_reduce range 5
1 3 6 10 15

  +/ range 5   NB. This is a comment (from the Latin 'Nota Bene')
15
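
For readers more at home in numpy, here is a rough equivalent of the J session above. It is a sketch only: `range_from_one` and `running_sum` are names invented here and are not part of numpy or J.

```python
import numpy as np


def range_from_one(n):
    # J: >:@i. n  (the integers 1..n)
    return np.arange(n) + 1


def running_sum(x):
    # J: +/\ x  (the sum of every prefix of x)
    return np.add.accumulate(x)


print(range_from_one(5).tolist())               # [1, 2, 3, 4, 5]
print(running_sum(range_from_one(5)).tolist())  # [1, 3, 6, 10, 15]
print(np.add.reduce(range_from_one(5)))         # 15, like J's +/ range 5
```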

I am playing with Idris, and do not know Haskell much, but I still find composing functions in J more like mathematics than composing them in Idris/Haskell. I see Idris/Haskell as better suited to expressing proofs in the style of a proofs textbook, but I see J as more representative of the actual mathematical formulas. They are short, and can be easily manipulated without so much typing.

People sometimes criticize J/K/Q/APL the way they do when they write about Forth or Lisp, but it doesn't take away from them. The Rosetta lander had a lot of mission-critical code in Forth, but there are not many who like the syntax or way of composing programs.

J's learning materials have greatly improved from when I first looked at it [2].

kdb+/q beats the pants off of Spark, and Jd is the J programming language's answer to kdb+/q.

Array processing with a language that has arrays as its fundamental unit makes sense, and that is where it is all headed. Whether it is one of the existing array languages or a hybrid is the question. All those high salaries for programming in Q are not a myth, and companies don't pay for no return on salary.

FYI - The creator of Pandas, Wes McKinney, was studying or looking over J for his next venture. The link seems to have disappeared, so perhaps it is not in development, or it is being developed in secret! ;)

[1] http://inform7.com/

[2] http://code.jsoftware.com/wiki/Guides/GettingStartedSerious


Written code, yes; blog posts, not yet.

If you think of an array as a function that maps a linearized index to an element value, you get most of APL for free in Haskell.
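
A minimal sketch of that idea, written in Python rather than Haskell to stay consistent with the numpy snippets elsewhere in this thread. It is not the commenter's code: an "array" here is just a size plus a function from a linear index to a value, and the names `FnArray`, `iota`, `amap`, `zip_with`, and `reduce_with` are all invented for illustration.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable
import operator


@dataclass(frozen=True)
class FnArray:
    size: int
    at: Callable[[int], int]  # linear index -> element value

    def to_list(self):
        return [self.at(i) for i in range(self.size)]


def iota(n):
    """The indices 0..n-1, as an array defined by the identity function."""
    return FnArray(n, lambda i: i)


def amap(f, a):
    """Elementwise map is just function composition."""
    return FnArray(a.size, lambda i: f(a.at(i)))


def zip_with(f, a, b):
    """Elementwise combination of two arrays of equal size."""
    assert a.size == b.size
    return FnArray(a.size, lambda i: f(a.at(i), b.at(i)))


def reduce_with(f, a):
    """Fold a binary function over the elements, left to right."""
    return reduce(f, a.to_list())


one_to_five = amap(lambda x: x + 1, iota(5))
print(one_to_five.to_list())                                   # [1, 2, 3, 4, 5]
print(reduce_with(operator.add, one_to_five))                  # 15
print(zip_with(operator.mul, one_to_five, iota(5)).to_list())  # [0, 2, 6, 12, 20]
```

Nothing is stored until `to_list` is called, which is the sense in which map and zip come "for free" from ordinary function composition.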


Historically array programming and functional languages are peers with quite different evolutions.


There are another two stories currently on the HN front page about IEX becoming the 13th stock exchange.

IEX uses k.


I like the concepts introduced in the video, but the Q syntax hurts my eyes. Personally, I am not in favor of fitting my entire program in a tweet. Readability helps!


Readability is subjective and largely colored by your experience. Verbosity or succinctness isn't the problem here; it's that the syntax and semantics of Q are unfamiliar to you. I think it's very important to take a moment to step back and consider how many choices in language design you may be casually taking for granted.

Many mainstream languages today- Java, Python, Ruby, C, PHP etc.- have very similar core semantics and syntax. Choices of keywords and type systems differ, but loads of ideas work the same, especially when you consider idiomatic everyday code- for loops, scalar variables, some superset of the algebraic rules for operator precedence you learn in math class. Whitespace in an expression tends to be irrelevant, but sometimes its absence is significant. To add a scalar to a list, you use a loop. Assignment operators flow right to left. Lists tend to be indexed from zero. How many of these choices are essential and how many are arbitrary? All of these languages are descendants of FORTRAN and ALGOL, sometimes with some ideas from the Lisp family thrown in. They share a common heritage.

Q, K, J, A+ and APL represent an entirely parallel course of evolution. Within this family there is a great deal of mutual intelligibility. I'm very familiar with K, but the Q dialect doesn't surprise me. Once learned, it's familiar, and therefore readable. What did source code look like to you before you learned to program?

The APL family isn't amazing because programs tend to be short; that's a side-effect of the positive properties of these languages. They teach you to write naturally parallel solutions to problems and offer a simple, consistent way to compound together and apply functions to large structures at once. Please don't say "this looks different than I'm used to" and then close your mind to what the paradigm has to offer.
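
To make the earlier "to add a scalar to a list, you use a loop" point concrete, here is a small sketch in Python, with numpy standing in for the array-language behavior. The data and variable names are made up for the example.

```python
import numpy as np

prices = [10, 20, 30]

# Mainstream style: visit each element explicitly.
looped = []
for p in prices:
    looped.append(p + 5)

# Array style: the scalar is applied to the whole structure at once.
vectorized = np.array(prices) + 5

print(looped)               # [15, 25, 35]
print(vectorized.tolist())  # [15, 25, 35]
```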


Debugging poorly written or convoluted K programs is much more complicated than debugging equivalent C/C++/Java/Python programs.

It's decent for prototyping, although I bet most people would prototype just as fast or faster in Python.


I'll grant that the debugging story for K isn't as sophisticated as with many other languages. Having a REPL and preferring programs that are mostly pure functions helps, but there's room for improvement. Tooling is about the community, though- not the language. I think that making documentation, screencasts and generally encouraging the expansion of K's open-source ecosystem will, with time, close that gap.


This is functionally similar to dplyr in R, although dplyr's more SQL-like syntax is much handier.


And when the data gets bigger, there's data.table[1], which performs amazingly well at certain tasks (vectorized ops ftw!), though the syntax can get a little clunky (if you squint at it hard, it's SQL-ish). On my 2012 MacBook Pro, I'm able to do (some) transformations of tables containing tens of millions of rows in only a few seconds (and sometimes faster).

It's possible to use dplyr and data.table together, as well, to good effect[2].

[1] https://github.com/Rdatatable/data.table/wiki

[&] https://github.com/Rdatatable/data.table/wiki/Benchmarks-%3A...

[2] http://stackoverflow.com/questions/21435339/data-table-vs-dp...

[&] https://twitter.com/hadleywickham/status/553169339751215104


Are you able to load data sets into data.table which are larger than memory?


AFAIK, with data.table it's all in-memory; whereas dplyr has the option of working with a database backend.

On the other hand, data.table has robust support for modify and update ops by reference, which can be a big performance saver.


The problem with all variants is the incredibly minimal resources for learning them. I wish a skilled practitioner would do something larger than a blog post. Something like a Pluralsight/O'Reilly course...


There is a lot of material on kdb+ on kx's website (http://code.kx.com/wiki/Tutorials), Q for Mortals (http://code.kx.com/wiki/JB:QforMortals2/contents) being the best introductory book.

Also some good resources in this Quora: https://www.quora.com/What-are-the-best-resources-to-learn-q...

And if your interest is in the k language more than kdb+/q, then I have found the docs in John Earnest's (RodgerTheGreat) oK interpreter a succinct, example-focused introduction: https://github.com/JohnEarnest/ok/blob/gh-pages/docs/Manual.... Plus using his browser-based REPL (http://johnearnest.github.io/ok/index.html) may lower the barriers to entry, and iKe (http://johnearnest.github.io/ok/ike/ike.html) is great for experimentation... http://johnearnest.github.io/ok/ike/ike.html?gist=9c5f43baa4...


I program in J and there are many books, examples, YouTube videos, a great help file, and several active forums. Just check jsoftware.com.

Rosettacode is also a good source for example J code [1].

The J for C Programmers book is good if you already have programming experience, especially in C [2].

[1] http://rosettacode.org/wiki/Category:J

[2] http://www.jsoftware.com/help/jforc/contents.htm



are you a student and/or can you pretend to be one? http://www.timestored.com/kdb-training/free-student-access

or http://www.dyalog.com/mastering-dyalog-apl.htm and be a student to get a free (educational) dyalog license


J has a large number of tutorials ("labs") included in the IDE. They're fun and easy to follow.


Is there anything like Q that is open source? J seems kind of similar.


There's an open source version of K(3) here: https://github.com/kevinlawler/kona


Thanks! This is nice. I'm playing with J though since the resources seem very nice.


It's a brilliant language, and all, but I don't buy it. Here's why:

1) Languages with this high a level of abstraction are very nice if your scenario maps perfectly to their intended usage (e.g. the Wikipedia analysis given in the video), where everything is nicely vectorizable. But if there is some quirk in your data, you sometimes need to go the usual way, and Q is of no help (no better than any other language), except that now you are doing some very inefficient things with those "stinky" loops.

2) It's a question of taste, but I find Q syntax a bit unusual. You probably need more time to think about how to solve simple problems with these clever one-liners than to simply, well, solve them...

3) Legibility: all of us working in software development know how much time we waste on illegible code, finding bugs, etc. Here this is raised to a new level... of difficulty.

4) This is a bit exaggerated, but I don't see how I could use Q in something bigger. Is Q only a scripting language for one-off mini-batch programs? For instance, R has the problem of not having any well-defined project structure, which makes many things hard: bigger programs are hard to maintain and debug, and stream processing with R is a pain in the ass. Server-side stuff is a little bit shoehorned (Shiny Server is cool, but then it's just a single-threaded thing for serving filtered dataframes to ggplots).

It is a cool niche language, but for smart, simple analyses, nothing too complex, as it will abstract you from the details to your loss.


I wanted to test 1 and 4 out, so I wrote an MMO in q/kdb+: https://github.com/srpeck/kchess (live server is down, but you can see some of the recorded action here: https://github.com/srpeck/kchess/blob/gh-pages/docs/kchessdo...). Some of my design decisions: https://news.ycombinator.com/item?id=10924316

I had previously built similar games using other technologies, but found developing in q to be far faster, even given the more primitive debugging capabilities.

I understand that this is a relatively small example, but building it was enough to convince me the APL/k/q approach is useful beyond its supposed niche.


Regarding 2 & 3, I assume you don't know Q or any other array language. If you do, it's not really any less readable than any other language. Finding bugs isn't particularly hard due to how friendly the languages are to REPL development and how easy it is to trace array transformations.

I do agree with 1 & 4 though—Q (and to a lesser extent, K, APL, and J) are niche languages. However, they're really, really good at their niche.


It has nothing to do with being an array language: R is vectorized and is slightly more legible than, say, Clojure, which is not an array language. It has to do with very high density, though.


Introduces Q, which is a proprietary, more verbose wrapper around K, another proprietary language, inspired by APL.



