I have spent a lot of time experimenting with the Haskell bindings for TensorFlow and a little time with HLearn. Sorry for a negative opinion, but as much as I don’t particularly like Python, I would suggest to almost anyone that they stick with Python for deep learning for now. In a year or two, I might recommend TensorFlow implemented turtles all the way down in Swift, but let’s wait and see how that project progresses.
I have only spent a few evenings playing with Clojure and mxnet and while I appreciate the efforts of the Clojure mxnet subproject team, I think you are still better off for now with Python, TensorFlow, and PyTorch.
A little off topic: I had a deep learning example for Armed Bear Common Lisp (implemented in Java) and DeepLearning4j in the last edition of my Common Lisp book. In the latest edition of my book, I removed that example and the chapter that went with it and replaced it with two examples of Lisp code using REST services written in Python, SpaCy, and TensorFlow - I think that is more practical right now; the situation may change in the future.
EDIT: I also added REST examples using Python, SpaCy, and TensorFlow to the second edition of my Haskell book.
Community size always outranks the pros/cons of languages. Still, communities are never entirely static, and it always takes some bold people to push them towards something new and (hopefully) better. Although more often than not things stay the same.
You might find the approach I used to implement neural networks in spaCy interesting, if you haven't seen it: https://github.com/explosion/thinc/
Basically it's all functional programming. Python is actually not a great language for this, I'm sure it would be better in a typed functional language.
For what it's worth, Python is a strongly typed language[0] and is what the manual calls "multi-paradigm," i.e. you can write purely functionally in it[1] all you want, though it is a fair point to observe that the community, PEPs, Guido et al. do not encourage it, and the language implementations are probably not optimized for it.
In some sense, sure. But it's not statically typed, which is (in my opinion) much more important. Reasoning about function composition in Haskell is straightforward because you just have to make the types line up correctly. Technically this is true in Python, but you don't have any mechanism (by default) to check whether your implementation is correct other than writing enough tests. Yes, you can use type hints and an external type checker, but that's still not a great solution compared to any statically typed language.
> you can write purely functionally in it all you want
Except that Python does not optimize tail calls, which means that implementing a simple recursive solution to a problem requires reasoning about the stack (which it shouldn't, if we're doing things functionally). There are hacks to optimize for some tail calls via decorators and explicitly checking the stack, but this decreases runtime performance and does not work for things like mutual tail call recursion (which is optimized in functional-first languages). To me, lack of TCO is a huge dealbreaker. I mean I still write my Python code semi-functionally because I find FP easier to reason about, but it's more work than doing it in Haskell.
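To make the tail-call point concrete, here is a minimal sketch, assuming CPython with its default recursion limit. The function names (`is_even`, `is_odd`, the `_t` variants, and `trampoline`) are made up for illustration, and the trampoline is just one well-known workaround, not the decorator-based hack mentioned above. The plain mutually recursive pair runs out of stack, while the trampolined pair handles the same input at the cost of an extra call per step.

```python
import sys


def is_even(n):
    # Plain mutual tail recursion: CPython adds a stack frame per call.
    return True if n == 0 else is_odd(n - 1)


def is_odd(n):
    return False if n == 0 else is_even(n - 1)


def is_even_t(n):
    # Trampolined variant: return a zero-argument thunk instead of calling.
    return True if n == 0 else (lambda: is_odd_t(n - 1))


def is_odd_t(n):
    return False if n == 0 else (lambda: is_even_t(n - 1))


def trampoline(result):
    # Keep invoking thunks in a loop until a non-callable value comes back.
    while callable(result):
        result = result()
    return result


# An even number well past the interpreter's recursion limit.
deep = sys.getrecursionlimit() * 10

try:
    is_even(deep)
    overflowed = False
except RecursionError:
    overflowed = True

trampolined = trampoline(is_even_t(deep))
```

The trampoline only works because every function is rewritten to return thunks, which is exactly the kind of extra reasoning a language with built-in tail call optimization spares you.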
And don't even get me started on the lack of support for features like algebraic data types, pattern matching, etc.
---
I love Python; it's my most-used language by far, I think. And I love functional programming. And I even try to write my Python code in a functional style! But Python is not what I would call a "good" language for functional programming.
Tail call recursion optimization just exists so people can write 'for loops' in a manner that looks 'functional' on the surface.
Because that is what tail call optimization does: it turns your recursive function into a loop.
So every time you write a for loop in Python just tell yourself that it's actually just a brilliantly optimized recursive function and you can get the same benefit.
Functions are first class objects in python so you can, if you really want, write functional code. It just all kinda goes to shit when you start importing other people's libraries that demand classes and methods being used. Those libraries are just a major part of what benefit you get from using python.
So I am not saying that any of this invalidates haskell or anything like that.
> Tail call recursion optimization just exists so people can write 'for loops' in a manner that looks 'functional' on the surface.
Uh... sorry, but I don't think that's right.
The purpose of writing functions with tail recursion isn't "I want to write a for-loop but I want it to 'look' functional." The purpose of tail recursion is to make certain kinds of implementations easy.
If you have a recursive data structure, using tail recursion over that structure is significantly more straightforward than writing iteratively. I actually wrote a comment about this recently: [0]. When using recursively-defined data, it makes more sense to write the traversal functions recursively. Functional programming makes significant use of recursion because, when dealing primarily with functions and immutable data, recursion is the simplest tool for the job.
> Functions are first class objects in python so you can, if you really want, write functional code. It just all kinda goes to shit when you start importing other people's libraries that demand classes and methods being used. Those libraries are just a major part of what benefit you get from using python.
I guess that's one way of looking at it, yeah. But if Python had better support (and encouragement) of functional programming, it seems to me that we'd have more functionally-styled libraries.
> If you have a recursive data structure, using tail recursion over that structure is significantly more straightforward than writing iteratively. I actually wrote a comment about this recently: [0].
That example isn't tail recursive, though. The Python version is more difficult to read because you're using a manual stack instead of relying on the built-in one. An iterative algorithm, whether written using lexical recursion or a for loop, would entirely remove the use of a stack, not just hide the stack in your language implementation. Converting an iterative algorithm between the two forms is a simple syntax transformation, and doesn't introduce bookkeeping like that. Converting a body recursive function to iterate with an in-language stack introduces a lot of noise even if you use tail recursion to do the iteration.
The tail recursive Haskell version of your Python isn't much better:
    sumTree :: BTree -> Int
    sumTree t = sumTree' [t] 0
      where
        sumTree' [] total =
          total
        sumTree' (Leaf v : rest) total =
          sumTree' rest (v + total)
        sumTree' (Branch v l r : rest) total =
          sumTree' (l : r : rest) (v + total)
Oh damn, you're completely right. My mistake! I guess I should've actually re-read my comment before linking here since they were about slightly different things haha. Oh well.
> The Python version is more difficult to read because you're using a manual stack instead of relying on the built-in one.
For a sufficiently large input, Python will error due to its maximum stack recursion depth (default is 1000, I think), so a generic solution requires using an in-language stack or else modifying your Python configuration to circumvent the maximum. (Or using a different language, I suppose.)
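As a rough illustration of the recursion-limit problem, here is a sketch that assumes trees are represented as nested `(value, left, right)` tuples with `None` for an empty subtree. That representation and the helper names are invented for this example, not taken from the earlier comment. A deliberately lopsided tree deeper than `sys.getrecursionlimit()` makes the naive recursive sum fail, while the explicit-stack version still completes.

```python
import sys


def make_chain(depth):
    # Build a degenerate tree that only ever branches to the left.
    node = None
    for _ in range(depth):
        node = (1, node, None)
    return node


def sum_recursive(node):
    # Body-recursive sum: uses one interpreter stack frame per tree level.
    if node is None:
        return 0
    value, left, right = node
    return value + sum_recursive(left) + sum_recursive(right)


def sum_explicit_stack(root):
    # Iterative sum: pending subtrees live in an ordinary list, not the call stack.
    total, stack = 0, [root]
    while stack:
        node = stack.pop()
        if node is not None:
            value, left, right = node
            total += value
            stack.extend((left, right))
    return total


depth = sys.getrecursionlimit() * 5
tree = make_chain(depth)

try:
    sum_recursive(tree)
    hit_limit = False
except RecursionError:
    hit_limit = True

explicit_total = sum_explicit_stack(tree)
```

Raising the limit with `sys.setrecursionlimit` is the other option mentioned above, but it only moves the ceiling rather than removing it.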
> The tail recursive Haskell version of your Python isn't much better
I do find that less readable than the non-tail-recursive version I had written previously, but still easier to read than the Python version with an explicit stack.
An alternative approach might be to use a zipper instead of a list, but that's probably (definitely) over-engineering the problem haha.
Actually for fun I put together a zipper-y solution. It takes a lot of scaffolding, but if we assume the zipper stuff was already implemented then the `sumTree` solution itself is nice enough:
    data BTree
      = Leaf Int
      | Branch Int BTree BTree

    data Path
      = Top
      | InLeft Int Path BTree
      | InRight BTree Int Path

    type Location = (BTree, Path)

    data Direction
      = Up
      | Down

    goDownLeft :: Location -> Location
    goDownLeft (Leaf _, _) = error "down of Leaf"
    goDownLeft (Branch v l r, p) = (l, InLeft v p r)

    goDownRight :: Location -> Location
    goDownRight (Leaf _, _) = error "down of Leaf"
    goDownRight (Branch v l r, p) = (r, InRight l v p)

    goUp :: Location -> Location
    goUp (_, Top) = error "up of Top"
    goUp (l, InLeft v p r) = (Branch v l r, p)
    goUp (r, InRight l v p) = (Branch v l r, p)

    sumTree :: BTree -> Int
    sumTree t = sumTree' Down (t, Top) 0

    sumTree' :: Direction -> Location -> Int -> Int
    sumTree' Down l r = case l of
      (Leaf v, _) -> sumTree' Up l (v + r)
      _ -> sumTree' Down (goDownLeft l) r
    sumTree' Up l r = case l of
      (_, Top) -> r
      (_, InLeft{}) -> sumTree' Down (goDownRight (goUp l)) r
      (_, InRight _ v _) -> sumTree' Up (goUp l) (v + r)
I feel like there's maybe a way to implement this more efficiently (like with a better choice of zipper constructors), but I really don't have a ton of experience with zippers so this is what I came up with. Maybe you know a better way?
Well it isn't built-in, so that's an obstacle to its use right there. You also have to rely on two sets of developers to ensure you can stay up-to-date, and this introduces more potential for errors and bugs.
Furthermore, the static type systems supported by, e.g., MyPy do not perfectly reflect the dynamic semantics of the Python language. So what you really have is some subset of Python that you're allowed to use fully. (This is probably not a big issue for many people, to be fair.)
Another problem is that systems like MyPy require you to actually write the type hints in a lot of places. Manifest type systems are not favored because requiring users to write more is almost never something they're happy with. One of the most common complaints about Java is that you have to write so many types everywhere, and while you don't need that many in statically typing your Python code, you do need quite a few. A statically typed language with good type inference definitely wins here.
Type hints are also not very expressive because they're purely nominal, but Python's type system is actually structural (which is true for most dynamically typed languages, as far as I know). I haven't experimented with this too much because I don't use a type checker like MyPy, but I seem to remember that it treated types nominally. (I think I've also seen some indication that they're working to extend type hints to support structural types, but last I checked it wasn't right around the corner or anything.)
What I'm getting at here is that while external type checkers like MyPy can be considered an improvement over the dynamically typed language itself, they come with a lot of caveats that make me uncomfortable. There are few cases where I truly prefer dynamic types to static ones.
> Another problem is that systems like MyPy require you to actually write the type hints in a lot of places. Manifest type systems are not favored because requiring users to write more is almost never something they're happy with.
As I have grown as a developer I have shifted from favoring dynamic fly-by-the-seat-of-your-pants typing to static explicit typing, even with the added verbosity[1] because... writing `int` takes a moment, but typing based bugs last a lifetime. Strict typing is worth the extra code overhead as it can drastically reduce code maintenance costs when coupled with a good architectural design. That said, I still hate hate haaaaate java-doc style dressed up comments and out of band enforcement in general - I want a first party tool to verify my typing and for that typing to be confirmed at run time or compile time depending on the language style (interpreted vs. compiled). If typing isn't correctly used everything should break.
On that topic, I've been loving PHP's move toward explicit typing since PHP7.
1. I still don't like C++ (pre C++11 which was the last I used... so auto may change my thoughts if I get a chance to play with it industrially) because the verbosity was super excessive.
> Well it isn't built-in, so that's an obstacle to its use right there.
Yeah, no doubt.
> You also have to rely on two sets of developers to ensure you can stay up-to-date, and this introduces more potential for errors and bugs.
[Edit: Some of] The maintainers of mypy are python core developers, so it's not quite two sets of developers. :-)
> Another problem is that systems like MyPy require you to actually write the type hints in a lot of places. Manifest type systems are not favored because requiring users to write more is almost never something they're happy with.
One of the benefits of optional typing is we can automatically add type annotations from runtime observation. Then we get the documentation benefits of typing without the work of writing them. https://monkeytype.readthedocs.io is a good tool for this.
> (I think I've also seen some indication that they're working to extend type hints to support structural types, but last I checked it wasn't right around the corner or anything.)
Honestly I think this is my main issue with it. I have the same feelings about Liquid Haskell having a separate executable to check the liquid types. It's not the worst... but not ideal.
> The maintainers of mypy are python core developers, so it's not quite two sets of developers. :-)
Shows you what I know haha.
Still, there's no guarantee that it'll always be the same developers. And even then, it's two separate development cycles. MyPy could fall behind someday or something. Just something to think about!
> One of the benefits of optional typing is we can automatically add type annotations from runtime observation.
This has a high coolness factor but is not very sound, because it'll require writing good tests that allow you to make those observations in the first place, right? This is adding to the total test-writing workload of the user, instead of a statically typed language where you just get all of this for "free".
> Mypy supports structural subtyping currently.
I didn't realize this! That's good! Clearly I need to do more research before commenting haha.
This does appear to require an additional external dependency though (the `typing_extensions` module), which is too bad.
> Do people not run tests much in more-static languages?
You do, but the tests you need are different. You don't need to test that all inputs are valid by type (checking for runtime type errors in Python, for example), because type correctness is guaranteed when you compile.
So instead, you just have to write tests for functionality.
In a dynamically typed language, you need both to really ensure you're covered. (Or else you're just relying on invariants that people obey, which is... risky.)
> `typing.Protocol` is in Python 3.8.
Ah! Excellent. Glad to hear it! I will certainly be making use of that haha.
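For anyone who hasn't seen it, this is roughly what structural typing with `typing.Protocol` looks like. It's a small sketch, and the `SupportsArea` and `Square` names are made up for illustration. `Square` never inherits from or mentions the protocol; it matches purely by having the right method.

```python
from typing import Iterable, Protocol, runtime_checkable


@runtime_checkable
class SupportsArea(Protocol):
    # Any object with a compatible area() method satisfies this protocol.
    def area(self) -> float: ...


class Square:
    # Note: no inheritance from SupportsArea anywhere.
    def __init__(self, side: float) -> None:
        self.side = side

    def area(self) -> float:
        return self.side ** 2


def total_area(shapes: Iterable[SupportsArea]) -> float:
    # A static checker accepts Square here because its structure matches.
    return sum(shape.area() for shape in shapes)


result = total_area([Square(2.0), Square(3.0)])
```

The `@runtime_checkable` decorator is optional and only enables `isinstance` checks, which look at whether the method exists, not at its signature. The static check is what a tool like mypy performs.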
> Do people not run tests much in more-static languages?
Sure we do :-) but most projects would be lucky to reach something like 70% test coverage. You can't derive 100% type coverage from that. You can get 100% type coverage with a good typechecker.
Also, PHP has a nice approach here, type declaration errors can (essentially) be run-time file parse errors depending on your settings, so even if there aren't tests covering everything, if the coverage is decent enough to hit every file (I'll roughly throw out ~40% if it's a mature code base) then you're probably safe.
I mean...Come on, it's not like that's news to me :p.
The mypy stuff in particular is somewhat promising, but it's a pretty awkward retrofit. A lot of the key APIs (e.g. numpy) were not designed around type declarations, so numpy will very often return either a float or an array depending on the input operations, keyword arguments, etc. This sort of thing is super common.
I just love the Hacker News guideline that comments get more thoughtful and substantive as a topic gets more divisive, I think we all benefit a lot from that aspect of this site.
Honestly I only posted my comment to stir deeper public discussion on the topic, while trying to be clear at the end that it's possible but I certainly don't think it's encouraged or any fun at all. I figured it wasn't news to you, but thought the topic worth opening, perhaps news to someone else. Cheers.
> numpy will very often return either a float or an array depending on the input operations, keyword arguments, etc
That makes it harder to find good types, but it doesn't rule it out.
Either you can tell what the shape of the output should be without calling the function in question or you can't.
If you can't, you should be checking the shape of the output after the call, and the type of the output should be some kind of union so the type system encourages this.
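A small sketch of that "union" case in Python type hints, using a hypothetical `total` function that stands in for a numpy-style API whose result shape depends on a flag (it is not a real numpy call). Because the return type is a `Union`, a checker pushes the caller to inspect the result before using it.

```python
from typing import List, Union


def total(values: List[float], keep_list: bool = False) -> Union[float, List[float]]:
    # Hypothetical API: returns a bare float, or a one-element list when asked.
    s = sum(values)
    return [s] if keep_list else s


def as_scalar(result: Union[float, List[float]]) -> float:
    # The isinstance check narrows the union, so both branches type-check.
    if isinstance(result, list):
        return result[0]
    return result


scalar_from_float = as_scalar(total([1.0, 2.0]))
scalar_from_list = as_scalar(total([1.0, 2.0], keep_list=True))
```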
On the other hand, if you can look at the arguments and easily know how the output should look, there's a fair chance you can push this knowledge into the type system. I've done similar things in flow. As a recent example, the download function in the google storage API returns a promise of the file's contents, unless the argument contains a 'destination' key in which case it returns a promise of void.
Describing that as the intersection of the two function types got me the checking I want. Amusingly, I don't think there's a way to write a function with this type without a cast through any, but that's not a problem for describing an untyped API.
All of that said, I don't know how well this works out with numpy and mypy in particular.
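For comparison, the closest thing in Python's type hints to an intersection of function types is probably `typing.overload`. The sketch below is a loose analogue of the storage example, not the real API: `download`, `FAKE_DISK`, and the byte string are all invented, and there are no promises involved. The overloads tell a checker that the return type depends on whether a destination was passed.

```python
from typing import Dict, Optional, overload

# Stand-in for a filesystem so the example has no side effects.
FAKE_DISK: Dict[str, bytes] = {}


@overload
def download(destination: None = None) -> bytes: ...
@overload
def download(destination: str) -> None: ...


def download(destination: Optional[str] = None) -> Optional[bytes]:
    # Hypothetical behaviour: return the contents, or "write" them and return None.
    contents = b"example contents"
    if destination is None:
        return contents
    FAKE_DISK[destination] = contents
    return None


in_memory = download()
written = download("out.bin")
```

With these overloads, a checker should infer `bytes` for `in_memory` and `None` for `written`, without the caller having to narrow an `Optional`.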
'you can write purely functionally in Python'. I wish https://github.com/tobgu/pyrsistent were more mainstream in the Python community and got some syntax love.
Thanks for the link for Lex’s interview. I watched it last week, a fascinating conversation. Jeremy is teaching a class with the Swift version of TensorFlow, which I experimented with; setup on my Mac sometimes worked, but an update broke it for me.
If you want static typing (mostly) on your TensorFlow, and are willing to explore a new language, I am making https://losttech.software/gradient.html - basically .NET binding to the entirety of TensorFlow. If you want functional, it works with F#, but syntax is a bit worse due to the lack of DLR support.
I have been using Neural Networks since the 1980s, and I just finished a job at Capital One managing a deep learning team. I work mostly with GANs and RNN.
HLearn looked good, but I didn’t spend much time with it. I suggest you try both HLearn and the Haskell TensorFlow bindings. If you don’t try to set up using a GPU then the setup is fairly easy. The Haskell TensorFlow MNIST example is very cool with terminal graphics, but I was unable to get an RNN working.
Package and infrastructure ecosystem development and maturity. It's debatable whether Haskell is the right language for such a package, but it's not debatable that a fully-fledged deep learning framework and the linear algebra + GPGPU infrastructure it would rely on would be significant undertakings of hundreds of thousands of combined man hours and many millions (maybe even billions) of dollars of resources, not to mention community adoption.
In order for a newcomer to beat the incumbents, it would have to be an order of magnitude better in some way -- perhaps ergonomics, perhaps performance, and it would have to do so in some way that the incumbent couldn't just conceptually backport. That's not to say that Haskell or FP won't eventually find usage in these domains or is theoretically unsuited to it. But, I do think that from a practical perspective, it's a risky choice.
From scratch, maybe. But nothing stops Haskell and Clojure from binding to the same C-libraries every other language binds to.
And I think you're overstating the difficulty of even a top-to-bottom new system. Re-implementing deep learning would be much, much easier than reinventing deep learning. Still probably not worth doing! But your estimate sounds about right for the latter, not the former.
I don't think they said they didn't believe. They were just asking for actual arguments, which seems fair to me.
As for Cabal hell, doesn't Stack [1] help with that? To be honest, I'm unhappy with dependency managers in most languages: Maven, the things Golang uses, or Python, or NPM (shudder!), etc. This really doesn't seem like a solved problem :(
Cabal hell is a solved problem these days. Solved three different competing ways --- nix, cabal new-build, Stack --- where ideally there'd just be one, but not the issue it was five years ago.
That doesn't mean dependency management is always easy. It's a hard problem in any language. But you won't find yourself in an inconsistent state that requires blowing away your installation if you use one of those three approaches.
I tried to play with tensorflow in Haskell, but I found, as with all nontrivial libraries with external bindings, getting it to install on my system is a headache and a half, better incompatibilities, and massive compile times and memory demands (8GB+ system) and hard disk usage / IO. Unless stuff is published as whole images for docker or virtual box, it's just exhausting to work with non-pure Haskell (and sometimes even pure Haskell with many dependencies).
As a full-time Python DL developer, I would say the major pain point isn't about passing the ndarrays/tensors to BTS runtime.
The major pain point of Python is the ... Python part of it, the part that is not related to ML itself. Like preprocessing data, sampling it, and feeding it to your downstream model.
This part is surprisingly effortful, because of Python's slow runtime, the GIL, etc. TBH for a lot of stuff there can be a workaround, but it is just manual and brittle and can't be further away from the supposedly Pythonic experience you would expect.
What Swift could bring to the scene is a set of powerful and easy to work with preprocessing primitives that you write once and run at both training and production time. That would be a revolutionary experience once it happened.
I can totally see the market there, and wouldn't mind being an early adopter of it.
With that being said, Python isn't going anywhere any time soon. People bitch about it, but there is still no other option on the market that is more productive than Python.
Python isn't perfect, but it isn't as bad as some posts here are trying to convince the audience it is.
Yes, the backend for Swift for Tensorflow will still be C++. The difference is that it will be the compiler (at compile-time) and not some library (at runtime) that will be mapping Swift native operations (instead of library calls) directly into the Tensorflow graph API. It's a much more sophisticated frontend, but there is no porting of the backend to Swift.
Parallelism in Clojure is not cheap if you are using ordinary threads, due to all the machinery necessary to set up and start an OS thread (the same applies to Java/C(++)). If you are using core.async, you will get cheap green threads.
However, parallelism in Clojure is extremely easy, especially switching between non-parallel and parallel implementations (map -> pmap or using reducers).
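The `map` -> `pmap` swap has a loose analogue in Python's standard library, sketched below. This is only an illustration of the "change one call" idea, not Clojure's semantics. `slow_square` is a made-up function, and with CPython's GIL a thread pool mostly helps I/O-bound work, so CPU-bound code would more likely use `ProcessPoolExecutor` instead.

```python
from concurrent.futures import ThreadPoolExecutor


def slow_square(x):
    # Stand-in for an expensive, independent per-item computation.
    return x * x


inputs = list(range(8))

# Sequential version: the built-in map.
sequential = list(map(slow_square, inputs))

# Parallel version: same function and inputs, only the map call changes.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(slow_square, inputs))
```

Like `pmap`, `Executor.map` returns results in input order, so the two versions are interchangeable as long as the mapped function has no side effects that depend on ordering.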
* (Lack of) Libraries and Limitations
This is a common misconception among Clojure novices or people wanting to start with Clojure. Clojure embraces Java and, using Clojure primitives, you can easily make your Java code functional and safe. The lack of ML libraries in pure Clojure just means that wrapping the TensorFlow Java code will require a couple more Clojure functions.
For me, the TensorFlow API isn't the friendliest API out there, and I have the impression it was specifically designed for Python by C++ programmers. You may want to check out the alternative, DL4J [1].
With fine-grained parallelism, programs don't tend to start new threads per bits of work. Instead there are a bunch of worker threads that get work delegated to them in various ways. It's this way in parallelism libs for all languages.
> due to all the machinery necessary to set up and start an OS thread
The machinery to start an OS thread is on the order of magnitude of nanoseconds. Starting an OS thread is far cheaper than printing a line to stdout, for example.
OS threads are only a problem in interpreted languages, because all of them have a global interpreter lock of some sort.
For compiled native languages OS threads are more efficient than green threads.
That microbenchmark is old and quite imperfect; I'm pretty sure it's not actually measuring what it claims to measure.
The point still stands. Creating threads (at least in a normal OS like a modern Linux) is really fast.
E.g., when processing a huge logfile it's faster to create one short-lived thread per line than it is to use a thread pool with the producer-consumer routine and condition variables.
> OS threads means context switching which means relinquishing the cpu from user space to kernel space for a time.
No. The OS will switch contexts regardless of how many threads you have. Even with zero threads the OS will be context switching between processes anyways. That's just how pre-emptive multitasking works.
What you're trying to say is that threads might get starved due to imperfect scheduling; but even that is wrong - you're never going to write a better scheduler than the one in the Linux kernel. I mean that seriously.
If you want a deterministic scheduling routine without unpredictable delays, then just use one of the realtime schedulers provided to you by the kernel. That's what they're there for.
> The OS will switch contexts regardless of how many threads you have
With IRQs pinned (via CPU mask given to the kernel at boot time) and application process pinning, there will be no preemption of the application process.
> Even with zero threads the OS will be context switching between processes anyways
No, this is what pinning is explicitly for.
> What you're trying to say is that threads might get starved due to imperfect scheduling
Context switch is switching between processes or threads that a CPU core runs. In parallel computation it's well known how to avoid that: keep the thread count equal to hardware core/thread count.
Green threads enable concurrency but not parallelism, so they are not applicable to accelerating computation.
(core.async isn't actually green threads, except in ClojureScript, but its thread pool is more for working around blocking calls than exploiting parallelism).
While I don't know that much about deep learning, last year I had a student who did, and he implemented a deep learning library in Futhark, a parallel functional language. Performance was decent on the small networks we ended up with, but I'm skeptical about the ability of functional languages to compete directly with specialised languages like TensorFlow (although I find TensorFlow's Python API to be bad). In particular, the assertion in TFA that implementing multicore parallelism in Haskell is "easy" is a gross oversimplification. It is by no means easy to implement deep learning directly in Haskell (or most other functional languages) with any kind of acceptable performance, and parallel computation in Haskell is in general a tricky subject (easy to get right, very hard to make it run fast).
In practice, people who do machine learning with Haskell seem to treat it more as an instrumentation language, for putting together building blocks written in other languages, kind of like Python is for TensorFlow.
I want to pitch in to suggest the book by Dragan Djuric [0]: "Deep Learning for Programmers: An interactive Tutorial with CUDA, OpenCL, MKL-DNN, Java and Clojure"
This article is poorly written, with mistakes and lack of domain knowledge.
> in deep learning, when functions are applied to the data, the data does not change
Except for in-place operations.
> Clojure doesn’t replace the Java thread system, rather it works with it. Since the core data structures are immutable, they can be shared readily between threads
Missing the point - basic ops in DL need to be optimised to an extreme level, and that implies using C.
Thread level parallelism is different from GPU parallelism. Different threads can perform completely independent operations at any time. GPU threads must do exactly the same operations, but on different memory locations, at all times. In exchange for this rigidity, we can pack a lot more of them on silicon than on a CPU. A CPU thread is like a complete individual that can do anything they want. A GPU thread is always part of a pack, and they all move together.
The nice parallelism allowed by Clojure is for CPU threads not GPU threads. It would still need to rely on an external library for tensor operations, for instance ATen [1], the C++ backend of PyTorch.
On the other hand, Functional Programming can be useful to describe the model at a higher level and better handle the scheduling of each component (Convolution, LSTM, etc) on GPU. When training a model, the batch size already allows near optimal usage of a GPU's cores; when doing evaluation, however, this becomes more relevant.
I would suggest using partition-all, not partition (in Clojure), unless you are sure that the number of items is an integer multiple of the partition size. Otherwise you are in for a surprise as some of your items will end up unprocessed.
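To show the difference without needing a Clojure REPL, here is a small Python sketch that mimics the behaviour described above. `partition` and `partition_all` are hand-written stand-ins named after the Clojure functions, not ports of them, and they only cover the simple two-argument case (no step or padding).

```python
def partition(n, items):
    # Mimics two-argument partition: a trailing incomplete chunk is dropped.
    items = list(items)
    return [items[i:i + n] for i in range(0, len(items) - n + 1, n)]


def partition_all(n, items):
    # Mimics partition-all: the last chunk may be shorter than n.
    items = list(items)
    return [items[i:i + n] for i in range(0, len(items), n)]


dropped_tail = partition(3, range(7))
kept_tail = partition_all(3, range(7))
```

With seven items and a chunk size of three, the first version silently loses the seventh item while the second keeps it in a final one-element chunk, which is the surprise the comment warns about.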
This is pretty much the reverse of what happened in mainstream deep learning. Most people learned with Tensorflow, which is arguably functional, and then decided PyTorch (which is like numpy, imperative) was a lot more straightforward to work with, to the point where Google decided to make Tensorflow 2.0 imperative by default in order to stop losing ground.
Someone please correct me if I'm wrong but this article suggests to perform training of DL models in production on the CPU?
Also why does the choice of language to specify my DL model matter that much if the low level number crunching is abstracted away anyway into a highly optimised component running on whatever processor / accelerator is most suitable?
> why does the choice of language to specify my DL model matter that much
because model specification is also code that needs to be amenable to change, tuning, refactoring. it needs to be expressive enough to test new ideas quickly without hassle.
basically just list all the reasons that drove language design in past half a century.
I was hoping that this would be related to Colah's "Neural Networks, Types, and Functional Programming" https://colah.github.io/posts/2015-09-NN-Types-FP/ (but it is not). In it, he establishes correspondences between different NN architectures and different FP tricks like `Zipped Left & Right Accumulating Map = Bidirectional RNN`.
Does anyone know any other resources that take this approach?
I get the overall idea why they make sense. But the fact that the author does not address GPU acceleration either means that she's not thought about that, or that she thinks its implementation is trivial.
Either way, I would need a deeper dive along those lines to be convinced that the argument has real-world merit, and can actually be implemented in practice.
FWIW my estimate is that 90% of production training loads in the wild are done on GPU. Please correct me if my assumption is wrong.
always wonder why functional languages or constructs are often described as being beautiful or using other aesthetic terms. just strange that they are the only ones we ascribe such attributes, among the few paradigms and the many coding styles out there.
Usually people who talk like this haven’t dealt with functional programming in the real world. I fell in love with the beauty of functional programming in grad school.
After using Haskell in a real business setting for many years, no thanks. The code and headaches and weird bugs are just as bad as with any other language or paradigm. The type checking doesn’t actually help you to create functionally correct systems at all (let alone the false claims that it makes this process faster). Putting designs into the type system doesn’t matter. It’s little more than a useful mind-expanding way to think about code, but ultimately just as applicable to other paradigms. Mutating state is very often the right tool for the job. Plenty of experienced functional programmers write immutable, pattern-focused functional programming implementations that are full of bugs, confusing misdirection, unsafe handling of data, etc.
Library support is still insanely poor compared with C/C++, Python and Java. You have to be insanely careful not to allow architectural astronauts to build out inscrutable systems that satisfy their weird personal opinions about aesthetics or interface design.
Hiring more engineers is super hard. Building in safety for when senior engineers choose to leave is even harder.
Integrations with emerging technologies like new database systems or new statistical packages always lags way behind other languages and usually exists in some half-finished form that depends on labor of love from some isolated hobbyist.
In terms of basic business decision making, I cannot recommend enough to avoid functional programming.
I don't agree with your conclusions, but that didn't stop me from upvoting your comment. You touch on all sorts of important issues management should consider while deciding whether to adopt an oddball programming language or technology as the basis of its business. I write this as a CTO that chose Clojure as the basis for our now-acquired and fully-metabolized-by-the-buyer startup.
You better make sure the person who is advocating for the oddball technology…
* Is going to stick around; even cofounders sometimes bail or get caught.
* Understands the opportunity costs i.e. what is being given up by not using boring tech.
* Has used the tech on the ground and not just in a comp sci lab or to implement a to-do app.
* Will be able to grow i.e. recruit additional members to the team as the needs of the business grow.
* Really is a 10X programmer -- assuming they exist.
* Can sell the decision (in a way that doesn't consume undue positive karma) to VCs, customers, acquirers, etc.
In my situation, Clojure worked for us because I'd already spent years writing Clojure (and Scheme and Lisp) code and I understood the problem domain (I was the head of product at the time) and I had made a personal commitment to the CEO to stick around for at least two years and I managed to not get arrested or do anything awkwardly disgraceful. We also got lucky in that Clojure had a bit of a moment as a smart person's programming language in the mid two thousand teens.
If I weren't hands-on, I probably wouldn't have advocated for Clojure and instead would have opted for something mainstream for backend infrastructure at the time e.g. Java, Node.js, Python.
> Integrations with emerging technologies like new database systems or new statistical packages always lags way behind other languages and usually exists in some half-finished form that depends on labor of love from some isolated hobbyist.
This really sets Clojure apart from Haskell, in that you can use any Java library with almost zero friction.
I don’t have experience in Clojure but I do have experience in Scala where this claim is also made often.
In Scala anyway, it’s totally false. Firstly there’s just a huge impedance mismatch between code relying on Scala features and code serving as a Java wrapper into Scala. But even just the basic usage of the libraries doesn’t always work.
I very nearly quit a job once just from the incredible frustration of trying to use JFreeChart in Scala.
For a while, a few years ago, I wrote a lot of scripts and personal tooling in Clojure, calling Java APIs. I don't remember really having any difficulties calling Java code from Clojure.
It's a very different world in Clojure. Data structures mutually satisfy the relevant interfaces, and there's no real pain about how a java.util.Map isn't immutable and so on.
> The type checking doesn’t actually help you to create functionally correct systems at all (let alone the false claims that it makes this process faster).
This is a red flag to me. Of course the tool doesn't guarantee the result, but your story smells of incompetence. I've introduced FP and a focus on types-as-a-tool at the places I've worked and hired accordingly. Quality and speed of development have gone up after minimal time investment. A battle of n=1 I guess?
Not really. People make all kinds of crazy claims about the benefits of static type checking or compiler enforcements. These are perfectly fine tools, but you can achieve the exact same thing with a low effort set of unit tests most of the time.
The teams I’ve worked on with functional programming always had super senior engineers, people who had previously spent serious time on core GHC development and people who had worked as compiler engineers at e.g. Lightbend. They displayed all the same sources of slowness, defects, deadline and/or feature creep, bad designs leading to required refactors or rework, etc., as any teams I’ve been a part of that do everything in Python.
Static type enforcement is just a tool like anything else. It locates a certain class of code problems at compile or type checking time, that’s all. In most large projects, speeding up the detection of this class of errors is just not very important, and can be extremely well approximated with low effort test cases and other techniques.
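To make the claim concrete, this is roughly the kind of low-effort test being described. It is only a sketch with hypothetical names (`mean`, `MeanContract`, `run_tests`), and it shows one narrow case rather than settling the argument either way.

```python
import unittest

def mean(values):
    """Average of a non-empty sequence of numbers."""
    return sum(values) / len(values)

class MeanContract(unittest.TestCase):
    def test_returns_float_for_ints(self):
        self.assertIsInstance(mean([1, 2, 3]), float)

    def test_rejects_strings(self):
        # A static checker could flag mean(["1", "2"]) before the program runs;
        # this test only finds the problem when the line actually executes.
        with self.assertRaises(TypeError):
            mean(["1", "2"])

def run_tests():
    """Run the test case and report whether everything passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(MeanContract)
    return unittest.TextTestRunner(verbosity=0).run(suite).wasSuccessful()
```

The trade-off is visible in the comment inside the second test: the test covers the call sites someone thought to write down, while a type checker covers every call site but only for the properties the types express.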
But on the other hand, it's been hard to put my finger on the tangible benefits of all that time spent on Haskell. I'm worried that I end up going deep into that rabbit hole without really benefiting from it.
I think that's fine, as long as it's your own decision from your own experience. On the other hand if you're struggling with anything in particular then we tend to be friendly and helpful over at https://www.reddit.com/r/haskell/. The monthly Hask Anything threads might be useful for you.
Do you have any thoughts on Elixir? I've been playing around with it and Haskell at the same time. Compared to Haskell, Elixir's dynamic type system and impurity make it a lot more practical to get stuff done, in my experience.
I'm just like you. I use Elixir for anything serious and Haskell for personal or small scale projects.
The way I see it is that Haskell abstracts away the runtime considerations. You spend the majority of your time modeling your business domain in Haskell. Or rather, retrofitting your business model into Haskell's type system.
In Elixir, in contrast, you spend the majority of your time building systems, in other words on runtime problems. All the tooling and even the whole OTP revolves around this. For anything that runs on the web, this is a major benefit compared to Haskell.
My suspicion is that people like declarative better than imperative. When a program reads as a definition it seems to be much more innocent than a bunch of commands.
Same split between descriptive/theoretical and applied science. The first just increases knowledge - pure innocence without side-effects - the second may build bombs.
(But in the end both functional programs and pure knowledge are made with the intention to go through some interpreter with side-effects (computer/human) anyway...)
The way I understand the parent comment (and I am open to correction), the question is why do we specifically describe functional programming with terms such as 'beautiful' - why do we not use say 'beautiful' for paradigms such as object-oriented, or aspect-oriented, or event-driven?
People find order and structure attractive. Functional programming is associated with that.
I feel like functional programming - maybe more from a categorical perspective - is good at generalising and capturing lots of ideas cleanly. For example you could think of object-oriented programming as using comonads - Haskell is an object oriented language! :).
I feel like that power to connect ideas, and see how they fit into some "larger" structure is what is beautiful. Being able to say what is "really" going on behind the scenes (yeah well really you are just using monads to do x...), being able to draw connections, and then use that to potentially discover new ideas is what is attractive.
Usually other programming paradigms are not constructed from a more rigorous mathematical perspective, but if it's a good idea, don't be surprised if you can describe it with FP.
this is a pet peeve of mine - the use and abuse of humanist terms in stem. you see this a lot in math - beautiful proofs, beautiful formulas, etc. I think it's a really weird, perverse even, thing to look at something purely formal (mechanical) and call it beautiful - like brutalist architecture - I think it says something about the person making the claim (that they appreciate structure to an inhuman extent).
Humans naturally associate aesthetic qualities with all sorts of things. An appreciation for the things around you makes life richer. Maybe you don't like the word beautiful in this context (or feel like it's overused), but I wonder if you'd argue with using the word elegant to describe something like quicksort or a cleverly composed fibonacci function? Beauty takes many forms, and getting hung up on nomenclature seems silly TBH. Characterizing this appreciation as perverse and inhuman is essentially taking offense to someone else's enjoyment in something they appreciate. Seems pointless.
elegant is fine. beauty and elegance aren't the same thing.
>Characterizing this appreciation as perverse and inhuman is essentially taking offense to someone else's enjoyment in something they appreciate. Seems pointless.
it's a humanist critique of the use of language for rhetorical purposes...? the purpose is exactly to investigate what the significance of that particular word choice is.
Does elegance describe well-defined observable properties that are independent of the observer?
The point being, that if not, we're still within the domain of the subjective. It seems arbitrary to allow one set of subjective descriptions and disallow another.
elegant: "the quality of being pleasingly ingenious and simple; neatness."
definitely less subjective than beautiful. and it's not arbitrary - i've alluded to an argument that can be made. i'm not going to spell it out because as usual no one on here is receptive to anything that doesn't valorize stem.
i'm sorry but what is the shape or form of a proof? a function maybe - and calligraphy is certainly an art but that's wholly different from what the function encodes.
I like elegance. Beauty is highly subjective, but I think elegance is more focused on simplicity and efficiency - though still tied up to beauty.
Basically; If someone says that something is beautiful, it's pretty much in the eye of the beholder. If someone shows that something is very simple but effective, it's much harder to argue against.
If a line of code can replace a block of code efficiently, while still maintaining readability and cleanness, then that's something I would call elegant code.
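A small illustration of that last point, with hypothetical function names: the same filter-and-transform written as a block and as a single line.

```python
def squares_of_evens_loop(numbers):
    """Block version: explicit accumulator and branching."""
    result = []
    for n in numbers:
        if n % 2 == 0:
            result.append(n * n)
    return result

def squares_of_evens(numbers):
    """One-line version: same behaviour as a list comprehension."""
    return [n * n for n in numbers if n % 2 == 0]

print(squares_of_evens(range(10)))
# [0, 4, 16, 36, 64]
```

Whether the shorter form still counts as readable is exactly the judgment call being discussed; stacking three or four conditions into one comprehension would arguably stop being elegant.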
I don't think it's clear at all that the natural grouping or division should be "STEM" vs "humanities". I think that mathematics belongs much closer to the humanities, especially philosophy and aesthetics, than to science or engineering.
Most colleges give you a BA for mathematics not a BS, btw (to reinforce your point). Which I think makes sense. Computer science for some reason usually nets you a BS though, despite not being related to science in the least bit (and almost the same as math).
Software is written and maintained by humans. Buildings are designed and lived in by humans (I happen to like many brutalist buildings). They are not purely formal, they need to have psychological appeal; pathos + logos.
Related: TVM's (https://tvm.ai) intermediate representation Relay is a functional, ML-like (ML as in ocaml) language. It's being actively developed here at UW, with constant improvements to support the latest models (eg dynamic models including language models and graph convolutional networks).
Disclaimer: I'm doing my PhD at the University of Washington and work with the Relay people.
The first version of Gorgonia[0] (obvs not called Gorgonia) was written in Haskell.
I found it difficult because Accelerate wasn't there yet when I wrote it. And cofounders found it difficult to understand (because Haskell was a strange language).
On the one hand it is easy to implement a computation graph in Haskell. On the other hand it is not so easy to implement an efficient kernel for which the computation of the values would run - you end up writing weird looking C that looks like Haskell, which I submit is not necessarily a good thing.
Working with Gorgonia IMO has been the clearest thing for deep learning for me. Granted it's a bit biased because I wrote the damn thing.
Nonetheless, writing it first in Haskell had its benefits, the structure was clear and hence it has imparted some design decisions in the Go library.
From time to time I still have to add new features to Gorgonia, so I still plot out roughly using Haskell. Nowadays I recommend Grenade for anyone wanting to play with deep learning in Haskell.
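On the "easy to implement a computation graph" half of that: a minimal reverse-mode sketch fits in a few dozen lines of Python. This is not how Gorgonia or Grenade is structured; `Node`, `add`, `mul` and `backward` are hypothetical names, and it deliberately ignores the hard part mentioned above, namely efficient kernels.

```python
class Node:
    """A value in the graph plus the local gradients toward its parents."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuple of (parent_node, local_gradient)
        self.grad = 0.0

def add(a, b):
    return Node(a.value + b.value, ((a, 1.0), (b, 1.0)))

def mul(a, b):
    return Node(a.value * b.value, ((a, b.value), (b, a.value)))

def backward(output):
    """Reverse-mode differentiation in reverse topological order."""
    order, seen = [], set()

    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += local * node.grad  # chain rule, summed over uses

x, y = Node(2.0), Node(3.0)
z = add(mul(x, y), x)  # z = x*y + x
backward(z)
print(z.value, x.grad, y.grad)
# 8.0 4.0 2.0
```

Everything here operates on scalars one at a time, which is fine for showing the structure and hopeless for speed.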
I wanted this article to convince me, but they really don't acknowledge the reason that everyone currently uses Python - the libraries. Do linear algebra using lightly wrapped C? numpy. What about NLP? spaCy. Implicitly specify a computation graph with high level code? PyTorch. Explicitly specify a computation graph using leaky C++ abstractions? TensorFlow. And using either PyTorch or TensorFlow, you get to interact with CUDA.
For now, if you want a functional language for doing deep learning, IMO that language needs Python interoperability. Long-term, I'm hopeful that GraalVM [0] can provide a way of calling Python from the JVM, but until then, I think the best option is coconut-lang [1], "a functional programming language that compiles to Python." You get pattern-matching, TCO, the pipe operator, and ADTs, all while being one AOT compilation step away from Python.
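For comparison, here is roughly what the pipe and ADT ideas look like when approximated in plain Python. This is not Coconut syntax; `pipe`, `Dense`, `Dropout` and `describe` are hypothetical names, and `isinstance` checks stand in for real pattern matching.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Union

def pipe(value, *functions):
    """Thread a value through functions left to right, like a pipe operator."""
    return reduce(lambda acc, f: f(acc), functions, value)

@dataclass(frozen=True)
class Dense:
    units: int

@dataclass(frozen=True)
class Dropout:
    rate: float

Layer = Union[Dense, Dropout]  # a small sum type

def describe(layer: Layer) -> str:
    """Dispatch on the variant; isinstance stands in for pattern matching."""
    if isinstance(layer, Dense):
        return f"dense({layer.units})"
    if isinstance(layer, Dropout):
        return f"dropout({layer.rate})"
    raise TypeError(f"unknown layer: {layer!r}")

layers = [Dense(64), Dropout(0.5), Dense(10)]
summary = pipe(layers, lambda ls: map(describe, ls), " -> ".join)
print(summary)
# dense(64) -> dropout(0.5) -> dense(10)
```

It works, but nothing checks that `describe` handles every variant, which is the kind of gap a language with real ADTs closes.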
For what it is worth, Apache MXNet has official Clojure bindings and you can use Deeplearning4j as well. There is also Neanderthal for CUDA/OpenCL programming in Clojure. Theoretically the ease of implementing DSLs in Clojure should be an advantage for the computation graph use case you are talking about.
I think the reason the parent mentions GraalVM instead of Jython is that Jython can't use C libs like numpy. But GraalVM lets you compile JVM to a native binary, so maybe there is some way to use Cython in JVM.
There is a lot here about performance - but I think a much more important aspect here is the training - in Python it's extremely easy to manipulate files and other data sources - sometimes it's quite ugly but it's simple, whereas using for example Haskell for data processing won't be as simple...
One of the most striking differences between Clojure and Haskell:
In Haskell, immutability is an abstraction, not an implementation. Clojure's data structures and its boxed integer and float object wrappers are expensive. The Haskell build toolchain produces heavily optimized code that preserves the functional semantics at the high level while compiling down to reasonably performant mutating code at the low level.
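CPython happens to show both halves of this, so here is a rough Python analogy (not a Clojure or Haskell measurement, and the exact byte counts are implementation-specific). `cumulative_sum` is a hypothetical example of an interface that looks immutable while mutating a local buffer underneath.

```python
import sys
from array import array

boxed = [float(i) for i in range(1000)]   # one heap object per number
unboxed = array("d", boxed)               # contiguous raw C doubles

boxed_bytes_per_item = sys.getsizeof(boxed[0])
unboxed_bytes_per_item = unboxed.itemsize

def cumulative_sum(values):
    """Pure from the caller's view; mutates a private buffer internally."""
    out = array("d")
    total = 0.0
    for v in values:
        total += v
        out.append(total)
    return tuple(out)  # hand back an immutable result

print(boxed_bytes_per_item, unboxed_bytes_per_item)
print(cumulative_sum((1.0, 2.0, 3.0)))
```

The difference is that here the programmer writes the mutation by hand behind a pure interface, whereas the comment's point is that a Haskell compiler is meant to do that lowering for you.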
As far as I know, Clojure usage is on the decline at Facebook. It was only used by an acquired subsidiary (Wit), and lack of internal support meant migrating to another supported language.
I certainly don’t know what the future holds for clojure, but I feel that when the community adopted Slack, they lost a fundamental learning resource and talent attractor.
Why do I say this? Because very interesting and deep and enlightening conversations and explanations and questions used to be in a forum environment where threads of conversation, topically organized, were easily followed and might be resurfaced and continued years after they had gone dormant.
All that is lost now. Slack was embraced, no doubt for the immediacy of response, and topical history is vanished. Everyone jumped onto slack to try it out, and people stopped checking the forum altogether.
The forum is a wasteland now populated with occasional version bump announcements. Nobody bothers to post there because nobody reads it anymore. And the wisdom of the community is vaporizing as soon as it forms.
It’s a massive loss, and deeply tragic. Beyond the insight and wisdom, the forum had a cohesive effect. It was a grounding, a center and base. Slack is a transient stream of consciousness.
Even with logs, the slack history is worthless by comparison.
Until a medium of exchange develops that the community wants to adopt, which can serve as a useful knowledge aggregator, I think clojure is on the path to obscurity.
The author seems to be confused and uninformed about how parallelism is implemented in ML libraries. Libraries don't mostly rely on spawning threads for parallel execution! Parallelization happens at the hardware level using SIMD by running the calculations for backprop on a GPU/TPU/DSP. Not by creating new threads on the CPU. Because of this, you get the same type of isolated execution you would have with a functional language.
Which leads to the second point. Libraries like TF are already designed around a data flow execution model and provide functional APIs. It is nonsense to assert these libraries are missing something fundamental that Clojure and Haskell would magically fix.
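A toy way to see the "isolated execution" point: an elementwise op has no shared state and no dependence between elements, so the order in which elements are processed cannot matter. The sketch below uses plain Python with hypothetical names and simulates "any order" by shuffling indices; it does not use any real ML library.

```python
import random

def relu(x):
    """Per-element kernel: depends only on its own input."""
    return x if x > 0.0 else 0.0

def apply_in_any_order(kernel, xs, seed=0):
    """Apply the kernel to every index in a shuffled order."""
    order = list(range(len(xs)))
    random.Random(seed).shuffle(order)
    out = [None] * len(xs)
    for i in order:
        out[i] = kernel(xs[i])
    return out

xs = [-2.0, -0.5, 0.0, 1.5, 3.0]
sequential = [relu(x) for x in xs]
assert apply_in_any_order(relu, xs) == sequential
print(sequential)
# [0.0, 0.0, 0.0, 1.5, 3.0]
```

Because every element can be computed independently, hardware is free to compute many of them at once without locks, which is the property functional languages get from purity.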
This is beyond my level of understanding but I believe there are physical implementation paradigms in all languages that make them either capable or not of performing certain things. While you can always bend a turing-complete language to do the logic, it doesn't mean your abstractions carry all the way down to hardware — there is "overhead" or "loss of efficiency" if you will. You get the convenience, and perhaps organize the safety; but it only goes so far.
For instance Go was designed based (notably) on CSP, a theoretical language designed by Tony Hoare in 1978 which implements a workable and fairly efficient concurrent paradigm. At the compiler then runtime level (low-level things you'll never be able to 'change' with libraries), when the program is built then executed, the implementation is directly able to translate into the physical topology of multi-core machines (Go design began in the late 2000's, shortly after Intel released the Core Duo, one of the first mainstream multi-core CPUs).
Now you can do parallel and/or concurrent in other languages, of course, but since they were not designed as such, it's a convoluted programming exercise to say the least. For proof, observe how little software is able to multi-thread efficiently today, how very little is built in a concurrent approach — it's just a mess in most languages, the added complexity not worth the cost.
This is an example of how you can get the functionality of a fundamental feature but the cost, the implementation curve is so steep that it doesn't really fly — reality, money, skill pool, all these things.
On topic, functional languages like Haskell do indeed "fix" or rather implement some features in more efficient ways than e.g. Python; it's always a trade off you know. No best tool, only tools best suited to a given case.
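For readers who have not met the CSP style mentioned above, here is a rough approximation in Python, using `queue.Queue` as a bounded channel and threads as processes. All names are hypothetical, and CPython threads do not map onto cores the way goroutines can, so this only shows the communication pattern and makes no performance claim.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of a stream

def producer(out_ch, items):
    """Send every item down the channel, then signal completion."""
    for item in items:
        out_ch.put(item)
    out_ch.put(DONE)

def squarer(in_ch, out_ch):
    """Receive numbers, send their squares, and forward the sentinel."""
    while True:
        item = in_ch.get()
        if item is DONE:
            out_ch.put(DONE)
            return
        out_ch.put(item * item)

def run_pipeline(items):
    numbers, squares = queue.Queue(maxsize=1), queue.Queue(maxsize=1)
    workers = [
        threading.Thread(target=producer, args=(numbers, items)),
        threading.Thread(target=squarer, args=(numbers, squares)),
    ]
    for w in workers:
        w.start()
    results = []
    while True:
        item = squares.get()
        if item is DONE:
            break
        results.append(item)
    for w in workers:
        w.join()
    return results

print(run_pipeline([1, 2, 3, 4]))
# [1, 4, 9, 16]
```

The stages share no mutable state and only talk through channels, which is the discipline Go bakes into the language and most other languages leave to convention.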
The point was that the concurrency model of these ML libraries is not dependent on a CPU for backprop (ideally), so discussing language level concurrency as an advantage is irrelevant. The parallelism is not implemented at the language level, nor does it rely on language constructs for synchronizing shared memory. The parallelism happens at the hardware level (e.g. with SIMD). And in the model architecture (e.g. with convolutions).
So the advantages you posited in your lengthy diatribe are meaningless, because you failed to take into account the problem domain, along with utterly failing to grasp what gets calculated during backprop. Plus, backprop doesn't get calculated in Python.
The argument about functional vs imperative languages is old, tired and is not relevant to the subject. Please at least learn some fundamentals about NNs and ML frameworks before contributing a long, uninformed response.
> the concurrency model of these ML libraries is not dependent on a CPU
Gotcha, my bad.
> Please at least learn some fundamentals about NNs and ML frameworks before contributing a long, uninformed response.
My apologies. It's not like I enjoy wasting people's time, starting with mine. It was an honest mistake, point taken, and thanks for explaining.
If you have a somewhat 'definitive' book or chapter/resource to share on "understanding concurrency"... please do so. I'm eager to learn, and I'd wager many reading such a thread.
I don't think it touches upon ML, but it is relevant for understanding the difference between GPUs and CPUs. You can go through the intro to backprop lectures on deeplearning.ai for ML.
> While you can always bend a turing-complete language to do the logic, it doesn't mean your abstractions carry all the way down to hardware — there is "overhead" or "loss of efficiency" if you will. You get the convenience, and perhaps organize the safety; but it only goes so far.
Well, hmm, I think you're right about the "hardware bending" part but unfortunately you are completely wrong about the "Turing complete" part (I'm not sure where that comes in, tbh). Indeed, a big factor with all this is that a C/C++ vector extension is a pretty good reflection of a GPU's operations (and I'd assume a TPU's, it being similar).
To wit: if you look at the assembly code of a GPU (Nvidia PTX), it appears reasonably similar to the machine code of CPUs, but with some extra sorts of primitives to allow communication between kernels. Otherwise, the code is "SIMD" (or SIMT, as Nvidia says); most of a program is a sequence of imperative instructions that differ from CPU instructions in that they're acting on a huge "vector" of values. You really don't have "threads" in the fashion that CPUs have, as the GP says. What you're talking about is a system that's absolutely "Turing complete" (or as Turing complete as assembler or C on a CPU) but with extra facilities. The parallelism involved is absolutely different from the parallelism of current CPU systems. This is why just about everything is translated to C++, which is then translated to PTX.
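To show what "the same imperative instructions acting on a huge vector" looks like from the programmer's side, here is a sequential Python stand-in for a GPU-style launch. The index arithmetic mirrors the usual CUDA idiom (block index times block size plus thread index), but `saxpy_kernel` and `launch` are hypothetical names and nothing here runs on a GPU.

```python
def saxpy_kernel(block_idx, thread_idx, block_dim, a, x, y, out):
    """Body that every 'thread' executes; only the indices differ."""
    i = block_idx * block_dim + thread_idx  # global element index
    if i < len(x):                          # guard threads past the end
        out[i] = a * x[i] + y[i]

def launch(kernel, n, block_dim, *args):
    """Sequential stand-in for a grid launch over n elements."""
    grid_dim = (n + block_dim - 1) // block_dim  # ceil(n / block_dim)
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, thread_idx, block_dim, *args)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0] * 5
out = [0.0] * 5
launch(saxpy_kernel, len(x), 4, 2.0, x, y, out)
print(out)
# [12.0, 14.0, 16.0, 18.0, 20.0]
```

With five elements and a block size of four, two blocks are launched and three of the eight "threads" fall through the bounds guard, which is the usual price of rounding the grid up.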
'Turing complete' was my uneducated way of saying "a program in language A can functionally mimic the behavior of any program in language B" [I guess that part's correct] but this doesn't mean at all that performance is identical in both cases. Ergo other languages can perform better or be easier to program than Python. I'm learning Go these days and I find the approach and implementation refreshing.
But this whole discussion is way over my head, I'm half-wrong and half-trivial.
Thanks again, I'll dig deeper into the concepts you mentioned. I'll confess my primary motive is to become a better programmer and I sadly don't have time to learn everything! ;-)