In all honesty, this sounds to me like a whole lot of BS and hype. The name is pretentious, the quotes are ridiculous ("Deep Learning est mort. Vive Differentiable Programming."; "there will be a need for the creation of a whole new set of new tools, such as a new Git, new IDEs, and of course new programming languages"). Maybe I am just ignorant and fail to grasp the importance of such capabilities in a machine learning context, but I have to say, the grandstanding is a bit grating.
Who knows, perhaps this will become the greatest thing since Lisp... What do I know.
A particularly salty developer I knew ages and ages ago once said (apocryphally, apparently) that there was an old Inuit proverb, “everyone likes the smell of their own farts.”
Google is still taking themselves very seriously while everyone else is starting to get bored.
The problem with being 25 is that you have about 8 years ahead of you before you figure out how full of shit everyone is in their twenties, and maybe another 8 before you figure out that everyone is full of shit and stop worrying quite so much about it.
>The problem with being 25 is that you have about 8 years ahead of you before you figure out how full of shit everyone is in their twenties, and maybe another 8 before you figure out that everyone is full of shit and stop worrying quite so much about it.
Boy there is some truth right here. It wasn't until long after I graduated undergrad that I realized just how much bullshit is out there. Even in the science world! When I started actually reading the methodology of studies with impressive sounding conclusions, I realized that easily 30-60% were just garbage. The specific journal really, really matters. I'd say 90% of science journalism targeting laymen is just absolute bullshit.
I started actually chasing down wikipedia citations and OMG they are bad!! Half are broken links, a large fraction don't support the conclusions they're being used for, and a massive fraction are really dubious sources.
I realized that so many people I respected are so full of shit.
I realized that so many of MY OWN OPINIONS were bullshit. And they STILL are. I hold so few opinions that are genuinely well-reasoned and substantiated. They are so shallow.
Yet, this is just how the world works. Human intuition is a hell of a drug. A lot of the people I respect tend to be right, but for all the wrong reasons. It's SOOOO rare to find people that can REALLY back up their thinking on something.
Steve Jobs had a slightly more optimistic way to say it: “Everything around you that you call life was made up by people that were no smarter than you.”
Tight integration with the language may be beneficial in making it simpler to write, but it's not like you can't do this already in other languages. Baking it into the language might be useful to make it more popular. Nobody should be doing the chain rule by hand in the 21st century.
If we already have useful phrases like "embedded programming", "numerical programming", "systems programming", or "CRUD programming", etc., I'm not seeing the pretentiousness of "differentiable programming". If you program embedded chips, people often call it "embedded programming"; likewise, if you write programs where differentials are a first-class syntax concept, I'm not seeing the harm in calling it "differentiable programming" -- because that basically describes the specialization.
>the quotes are ridiculous ("Deep Learning est mort. Vive Differentiable Programming.";
The whole point is you’re supposed to have the same “something” on both sides (X is dead, long live X), to indicate it’s not a totally new thing but a significant shift in how it’s done
The most well known one:
> The King is dead. Long live the King!
If you change one side, you’re removing the tongue-in-cheek nature of it, and it does sound pretty pretentious.
Not really. The sentence with the King was used to mean that the new King immediately took office.
> Le Roi (Louis ##) est mort. Vive le Roi (Louis ## + 1) !
Using the sentence with Deep Learning and Differentiable Programming just suggests that Differentiable Programming is the heir/successor/evolution of Deep Learning. It does not imply that they are the same thing.
As a French person used to the saying, LeCun probably meant that.
It does, actually. What matters is that there is a king, not who is the king.
The saying works because, as you say, it suggests a successor, but the successor has to use the same title: what people want is a new king, so that nothing changes and they can live as they did before the king died, not a revolution and a civil war stained with blood.
If you do deliberately change the title, it's because you think the new one will be better, which is pretentious.
> Using the sentence with Deep Learning and Differentiable Programming just shows how Differentiable Programming is an evolution. It does not imply that they are the same thing.
... where did I imply it means they're the same thing?
From my comment:
> indicate it’s not a totally new thing but a significant shift in how it’s done
You could say, an evolution?
-
A snowclone is a statement in a certain form. The relevant form that English speakers use (I'm not a French person, and this is an English article) is "X is dead, long live X", where both are X.
That's where the "joking" the above comment is referring to comes from, it sounds "nonsensical" if you take it literally.
If you change one X to Y, suddenly there's no tongue-in-cheek aspect, you're just saying "that thing sucks, this is the new hotness".
I suspect the author just missed that nuance or got caught up in their excitement, but the whole point of a snowclone is it has a formula, and by customizing the variable parts of that formula, you add a new subtle meaning or tint to the statement.
They’re kinda fucked because "differential cryptanalysis" was coined long ago to describe a set of techniques for attacking bad cryptography.
Differential programming would be less flashy but may be confusing.
I wouldn’t actually be interested in this topic much except for the top level comment complaining about them wanting a new version control system for this, and now I’m a bit curious what they’re on about this time, so I will probably get sucked in.
"differential" and "differentiable" are different words. Both are being used correctly. Is the problem only that the two words look kind of similar? That seems like an impractical requirement.
God I hate the trend of using a prefix on its own as a stand in for '<prefix><thing>'. I think it's a symptom of politicians trying to sound stupid to avoid sounding elitist.
What do you think will have more impact on the economy, crypto or cyber?
I think using the shortened version is perfectly valid, provided the context is clear.
If you are talking to someone about cryptocurrency, referring to it as "crypto" later in the conversation is perfectly valid and doesn't lessen the meaning.
I do, however, agree with you that outside of its context these shortened names are horrible and effectively buzzwords.
Why would a language where it is possible to manipulate certain functions to get their derivatives require a new revision control system or IDE?
And speaking of Lisp - wasn't symbolic differentiation a fairly common thing in Lisp? (basically as a neat example of what you can do once your code is easy to manipulate as data).
Symbolic differentiation is a fairly common exercise but it is inefficient (the size of the derivative grows exponentially in the size of the original expression). "Automatic differentiation" is the term for the class of algorithms usually used in practice, which are more efficient while still being exact (for whatever "exact" means when you're using floating point :-)
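To make the contrast concrete, here's a minimal forward-mode AD sketch in Python using dual numbers (my own toy illustration, not any particular library's API): instead of building a symbolic expression for the derivative, we just propagate (value, derivative) pairs through ordinary code.

    class Dual:
        """Toy dual number: carries a value and the derivative w.r.t. one input."""
        def __init__(self, value, deriv=0.0):
            self.value, self.deriv = value, deriv

        def _wrap(self, other):
            return other if isinstance(other, Dual) else Dual(other)

        def __add__(self, other):
            other = self._wrap(other)
            return Dual(self.value + other.value, self.deriv + other.deriv)
        __radd__ = __add__

        def __mul__(self, other):
            other = self._wrap(other)
            # product rule: (uv)' = u'v + uv'
            return Dual(self.value * other.value,
                        self.deriv * other.value + self.value * other.deriv)
        __rmul__ = __mul__

    def f(x):
        # ordinary code, nothing here knows it is being differentiated
        return 3 * x * x + 2 * x + 1

    y = f(Dual(2.0, 1.0))          # seed dx/dx = 1
    print(y.value, y.deriv)        # 17.0 and f'(2) = 6*2 + 2 = 14.0

The derivative never grows as an expression; reverse mode (what PyTorch and TensorFlow use) is the same idea run backwards, which is what keeps gradients of functions with millions of inputs cheap.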
AD still explodes for “interesting” derivatives: efficiently computing the adjoint of the Jacobian is NP-complete. And, naturally, the Jacobian is what you want when doing machine learning. There are papers from the mid-'90s discussing the difficulties in adding AD Jacobian operators to programming languages to support neural networks. This article is just rehashing 25-year-old problems.
Correction: Finding the optimal algorithm (minimal number of operations) for computing a Jacobian is NP-complete, but evaluating it in a multiple of the cost of a forward evaluation is standard.
Also, many optimizers that are popular in ML only need gradients (in which case the Jacobian is just the gradient vector). Second order methods are important in applications with ill-conditioning (such as bundle adjustment or large-scale GPR), but they have lots of exploitable structure/sparsity. The situation is not nearly as dire as you suggest.
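As a rough illustration of that point (using JAX since it comes up later in the thread; the functions here are made up): for a scalar loss, reverse mode gives you the whole gradient in roughly one backward pass, while a full Jacobian of a vector-valued function costs one pass per output row.

    import jax
    import jax.numpy as jnp

    def f(x):                          # R^3 -> R^2, vector-valued
        return jnp.stack([jnp.sum(x ** 2), jnp.prod(x)])

    def loss(x):                       # R^3 -> R, the common ML case
        return jnp.sum(jnp.sin(x))

    x = jnp.array([1.0, 2.0, 3.0])
    print(jax.grad(loss)(x))           # gradient vector, shape (3,)
    print(jax.jacrev(f)(x))            # full Jacobian, shape (2, 3)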
Around 2000 I was accidentally inventing a Bloom filter variant (to this day I don’t know how I missed the Google papers at the time) for doing a large set intersection test between two machines.
Somehow, I ended up with a calculus equation for determining the right number of bits per entry and rounds to do to winnow the lists, for any given pair of machines where machine A found n entries and machine B found m. But I couldn’t solve it. Then I discovered that even though I did poorly at calculus, I still remembered more than anyone else on the team, and then couldn’t find help from any other engineer in the building either.
Eventually I located a QA person who used to TA calculus. She informed me that my equation probably could not be solved by hand. I gave it another day or so and then gave up. If I couldn’t do it by hand I wasn’t going to be able to write a heuristic for it anyway.
For years, this would be the longest period in my programming career where I didn’t touch a computer. I just sat with pen and paper pounding away at it and getting nowhere. And that’s also the last time I knowingly touched calculus at work.
(although you might argue some of my data vis discussions amount to determining whether we show either the sum or the rate of change of a trend line to explain it better. The S curve that shows up so often in project progress charts is just the integral of a normal distribution, after all)
Thanks - I didn't think it was literally doing symbolic differentiation (I don't work in the area so literally had no idea) - but the basic idea that you apply some process to your code to get some other code doesn't sound that surprising to anyone who has used lisp (and I used to write tools in lisp to write numerical engineering simulations - admittedly a long time ago)
It is different. DVC is a serverless management tool that helps you organize and link to your storage backends, and move data from those backends to your workspace. Git LFS requires a dedicated server, and you can store data only on that server, instead of moving data between a multitude of storage backends (like Google Drive, S3, GCP, or a local drive).
Mind you - DVC seems to be a platform on top of git rather than a replacement for git. So I'd argue that it's not really a new revision control system.
For the IDE part they have written some cool debuggers that help you understand the differentiation part and catch bugs in it. But I'm not sure why you couldn't just use the debugger instead of a whole new IDE, much less why you would need a new RCS.
In essence, there are cases outside the well-developed uses (CNNs, LSTMs, etc.), such as Neural ODEs, where you need to mix different tools (ODE solvers and neural networks), and the ability to do Differentiable Programming is helpful; otherwise it is harder to get gradients.
The way I can see it being useful is that it helps speed up development work so we can explore more architectures, again Neural ODEs being a great example.
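A minimal sketch of that point (hand-rolled Euler steps in JAX rather than a real ODE suite; the dynamics, parameter, and numbers are made up): because the whole rollout is ordinary differentiable code, the gradient flows through the solver and into the parameter, which is exactly what gets awkward without differentiable programming support.

    import jax

    def euler_solve(theta, y0, dt=0.01, steps=100):
        # toy dynamics dy/dt = -theta * y, integrated with fixed-step Euler
        y = y0
        for _ in range(steps):
            y = y + dt * (-theta * y)
        return y

    def loss(theta):
        target = 0.5
        return (euler_solve(theta, 1.0) - target) ** 2

    # gradient of the loss w.r.t. the ODE parameter, straight through the solver
    print(jax.grad(loss)(0.3))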
Differential programming is about building software that is differentiable end-to-end, so that optimal solutions can be calculated with gradient descent.
Probabilistic programming (which is a bit more vague) is about specifying probabilistic models in an elegant and consistent way (which can then be used for training and inference).
So, you can build some kinds of probabilistic programs with differential programming languages, but not vice versa.
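As a tiny example of the "differentiable end-to-end, optimized with gradient descent" part (again JAX, with made-up synthetic data): the model is just a parametrized function and the training loop is plain gradient descent.

    import jax
    import jax.numpy as jnp

    def model(params, x):
        w, b = params
        return w * x + b

    def loss(params, xs, ys):
        return jnp.mean((model(params, xs) - ys) ** 2)

    xs = jnp.linspace(0.0, 1.0, 20)
    ys = 3.0 * xs + 1.0                     # synthetic "truth" to recover
    params = (0.0, 0.0)
    for _ in range(500):
        grads = jax.grad(loss)(params, xs, ys)
        params = tuple(p - 0.1 * g for p, g in zip(params, grads))
    print(params)                            # approaches (3.0, 1.0)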
Swift was railroaded into Google by Chris Lattner, who has since left Google, and S4TF is on death watch. No one is really using it and it hasn't delivered anything useful in 2.5 years.
Does it really matter that not many people use it? Apple's carve-out of Objective-C from the broader C ecosystem spanned something like 25 years.
Sure, the 90's were a rough period for them, but I think a series of failed OS strategies and technical debt are more responsible for that than just what language they used.
You could argue that their ambitions re Swift scaling from scripting all the way to writing an entire OS might never grow substantially outside Apple, but there's also the teaching aspect to think about.
"Objective C without C" removes a whole class of problems people have in just getting code to run, and I'll bet it shapes their mind in how they think about what to be concerned about in their code v what's just noise.
Sometimes things take a little longer to develop. I don't know who will create it, but from my perspective, the need for a statically typed "differentiable" language is extremely high and C++ is not it.
> the need for a statically typed "differentiable" language is extremely high
This is not what Google has found, actually. Teams who wanted to use this for research found that a static language is not flexible enough when they want to generate graphs at runtime. This is apparently pretty common these days, and obviously Python allows it, especially with JAX, which traces code for autodiff.
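For what it's worth, a small sketch of what "traces code for autodiff" means in practice (the layer-ish function here is invented for illustration): the structure of the computation can be decided by ordinary Python at runtime, and JAX simply records whatever actually executes.

    import jax
    import jax.numpy as jnp

    def make_net(depth):
        # depth is an ordinary Python value decided at runtime
        def f(x):
            for _ in range(depth):       # plain Python loop, unrolled during tracing
                x = jnp.tanh(x) * 2.0
            return x
        return f

    print(jax.make_jaxpr(make_net(3))(1.0))   # the traced 3-"layer" graph
    print(jax.grad(make_net(3))(1.0))         # ...and its derivative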
Or has at least found that existing solutions for statically typed "differentiable" programming are ineffective, and I'd agree.
But having some way to check types/properties of the tensors you are operating on would really help to make sure you don't get one hidden dimension accidentally swapped with another or something. Some of these problems are silent and need something other than dynamic runtime checking to find them, even if it's just a bolt-on type checker for Python.
There are a lot of issues with our current approach of just using memory and indexed dimensions. [0]
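To make the "silent" part concrete, here's a NumPy-only illustration (the arrays and the check_shape helper are hypothetical, not any library's API): a (32,) vs (32, 1) mix-up broadcasts instead of erroring, so the loss is quietly wrong -- exactly the kind of thing you'd want a shape-aware checker to flag before the code ever runs.

    import numpy as np

    preds = np.random.rand(32)          # shape (32,)
    targets = np.random.rand(32, 1)     # shape (32, 1), e.g. straight from a DataFrame

    diff = preds - targets              # silently broadcasts to (32, 32) -- no error
    loss = (diff ** 2).mean()           # plausible-looking, but wrong

    def check_shape(name, array, expected):
        # crude bolt-on runtime check: fail loudly instead of broadcasting silently
        assert array.shape == expected, f"{name}: got {array.shape}, expected {expected}"

    check_shape("diff", diff, (32,))    # raises AssertionError, surfacing the bug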
Flexible enough, or just... former statisticians have enough on their plate without learning more programming, so let's use the simplest popular language in existence?
The article gives specific and reasonable motivations for why you might want a new language for machine learning. There are already new tools. Like Pytorch. If you have never used Jupyter notebooks instead of an IDE, give it a try for a few weeks. It was the biggest boost I've seen to my coding productivity in literally decades. A new Git? I don't quite get that one. But considering the author arguably already got three out of his four claims right, maybe there is some reasoning behind that Git claim too.
> If you have never used Jupyter notebooks instead of an IDE, give it a try for a few weeks. It was the biggest boost I've seen to my coding productivity in literally decades
Really? I seem to shoot myself in the foot a lot with jupyter notebook. I can't count the number of times my snippet was not working and it was because I was reusing some variable name from some other cell that no longer exists. The amount of bugs I get in a notebook is ridiculous. Of course, I'm probably using it wrong
If you can't do a "restart kernel and run all cells" without errors, your notebook is not in a good state. But somehow people don't seem to do this regularly and then complain that notebooks are terrible, when it's their own process that is shooting them in the foot.
Imo people’s love of Jupyter notebooks is another one of those “this is My Thing and I love it despite the flaws” situations.
Jupyter notebooks are painful to read, they allow you to do silly stuff all too easily, they’re nightmarish to debug, don’t play well with git, and almost every. Single. One. I’ve ever seen my teammates write eschewed almost every software engineering principle possible.
You’re not using them wrong; they shepherd you into working very “fast and loose”, and that’s a knife edge that you have to hope gets you to your destination before everything falls apart at the seams.
Instead of building things to make Notebooks play nicely with git, why not relegate notebooks to explicitly local exploratory work and when something needs to be deployed have it be turned into proper scripts/programs?
That's on your teammates, not the technology. Like any code, if you want it to be readable and robust you have to spend the time cleaning up and refactoring it. Lots of notebooks are easy to read and run reliably.
Did you happen to see a thread on here a week or so ago about “It’s not what programming languages let you do, it’s what they shepherd you to do”? (Strongly paraphrased there)
That’s my issue with Jupyter notebooks, between them and Python they implicitly encourage you to take all kinds of shortcuts and hacks.
Yes, it’s on my teammates for writing poor code, but it’s on those tools for encouraging that behaviour. It’s like the C vs Rust debate right: yes people should write secure code, and free their memory properly and not write code that has data races in it, but in the majority of cases, they don’t.
I didn't see that thread. Based on my experience, I don't really buy the premise. I'm not saying different languages can't somewhat nudge you a tiny bit towards better practices. Or simply not allow you to do certain things, which isn't really shepherding, is it? But the vast majority of great engineering I have seen is mostly about the team and the many decisions they have to make in the course of one day. Which quickly adds up.
Quality engineering mostly comes from people, not languages. It is about your own personal values, and then the values of the team you are on. If there were a magic bullet programming language that guided everyone away from poor code and it did not have tradeoffs like a hugely steep learning curve (hi Haskell) then you would see businesses quickly moving in that direction. Such a mythical language would offer a clear competitive advantage to any company who adopted it.
What you are looking at really is not good vs. bad, but tradeoffs. A language that allows you to take shortcuts and use hacks sounds like it could get you to your destination quicker sometimes. That's really valuable if your goal is to run many throw-away experiments before you land on a solution that is worth spending time on improving the code.
Yeah really, but I'm a very experienced dev so what I'm getting from it is likely very different from your experience. Consider looking into git or some other version control. If you are deleting stuff that breaks your code, you want to be able to go back to a working version, or at least look at the code from the last working version so you can see how you broke it.
Hi, author here! The git thing is regarding model versioning. The managing of a ton of very large and slightly different binary blobs is not git's strong point imo.
There are a ton of tools trying to fill this void, and they usually provide things like comparisons of different metrics between model versions, which git doesn't provide.
Differentiable programming allows one to specify any parametrized function and to use optimization to learn its parameters for a given objective.
There is definitely some need for an EDSL of some sort, but I think a general method is pretty useless. Being able to arbitrarily come up with automatic jacobians for a function isn't really language specific, and usually much better results are obtained using manually calculated jacobians. By starting from scratch you lose all the language theory poured into all the pre-existing languages.
Be hard on the article, but easier on the concept - I think there is a lot of potential for differential programming w/ Swift, but this article is not a good advocate.
I would add that verbatim text is sometimes hard to read because it doesn't wrap long lines and small screens require the reader to scroll horizontally so try not to use it for large/wide blocks of text.
Also, bullet lists are usually written as separate, ordinary paragraphs for each item, with an asterisk or dash as the paragraph's first character.
Reading through the top-level comments, they are all a form of surface-level aversion to the unfamiliar. It really highlights that many industry trends are based on hot takes and not any sort of deep analysis. That explains why JavaScript and C++ will remain entrenched favorites despite their technical flaws.
For those who actually spent time with Swift, and realize its value and potential, consider yourself lucky that large portions of the industry have an ill-informed aversion to it. That creates opportunity that can be taken advantage of in the next 5 years. Developers who invest in Swift early can become market leaders, or run circles around teams struggling with the slowness of Python or the over-complexity of C++.
Top three comments paraphrased:
> 1) "huh? Foundation?"
but you have no qualms with `if __name__ == "__main__"` ?
> 2) "The name is pretentious"
is that an example of the well-substantiated and deep technical analysis HN is famous for?
> 3) Swift is "a bit verbose and heavy handed" so Google FAILED by not making yet another language.
You're completely misinterpreting (I'm the author of (1)).
C++ and JavaScript are languages of professional software engineers, and there are many, many more such languages with various pros and cons.
Python has been the de facto standard in scientific/data/academic programming for decades. The only other language you could say rivals it would be MATLAB, which is even more simplistic.
My point is that simplicity and clarity matter to people who don't care that much about programming and are completely unfocused on it; they are just using it to get data for unrelated research.
'if __name__ == "__main__"' is not in the example code, nor is it a required part of a Python program, so I'm not really sure what your point is here.
> Python has been the de facto standard in scientific/data/academic programming for decades
In my experience (Genomics) this is simply not true. Python has caught on over the last 5 or so years, but prior to that Perl was the de facto language for genetic analysis. It's still quite heavily used. Perl is not a paragon of simplicity and clarity.
I feel like trying out various languages/frameworks would affect compsci labs a lot less than other fields, since the students probably have some foundational knowledge of languages and have already learned a few before getting there. Might be easier for them to pick up new ones.
(a) While I'm being honest that my observations are based on the fields I have experience in, there is no such justification in your comment for the claim that "it is true broadly for computation in academia".
(b) Interpreting "niche" as "small" (especially given your "true broadly" claim): Computational genetics is huge in terms of funding dollars and number of researchers.
I have; my impression when doing an applied math degree more than 10 years ago was that Python was by far more prevalent than R. I know through my wife that Python is much more prevalent in bioinformatics and bioengineering too.
Doesn't really change my argument though: R is also a slow but simple language that is popular among academics but not professional software engineers. My whole point is that Swift is never going to be popular with academics because the syntax isn't simple enough.
Hasn't been my experience. I was programming mostly in python in 2007 in applied math, oceanography, and physics classes. It had already been established in those fields (at least at my university) in 2007 so it's been at least 15 years.
>4)"Google hired Chris Lattner and he forced Swift down their throat."
Does anyone force anything on Google? This seems to express little confidence in the competence of Google and their people. Perhaps Google chose Swift and brought Lattner in for his obvious expertise.
The biggest drawbacks of Swift are the legacy relationships it has to the Apple ecosystem.
Yet swift is open source, and Apple and the community can fork it if they so choose. This is great news for me personally as an iOS developer and an ML noob who doesn't want to write Python. I can't comment on Julia because I have no experience with it, but I applaud the efforts to build the Swift ecosystem to challenge Python.
I think a lot of the criticisms so far are that it's early days for Swift in ML, and that's one point the author is emphasizing.
I have spent 2 years of my life with Swift and I would say that I have a very well informed aversion to the language.
Some ideas in the language or parts of its core concepts are really good. First class optionals and sum types, keyword arguments, etc., I liked all of those.
Unfortunately, by and large, Swift is lipstick put on a pig.
I have never used any other language that regularly gave me type errors that were WRONG. Nowhere else have I seen the error message "expression is too complex to be type-checked", especially when all you do is concatenating a bunch of strings. No other mainstream language has such shoddy Linux support (it has an official binary that works on Ubuntu... but not even a .deb package; parts of Foundation are unimplemented on Linux, others behave differently than on macOS; the installation breaks system paths, making it effectively impossible to install Python afterwards[1]). Not to mention, Swift claims to be memory-safe but this all flies out of the window once you're in a multithreaded environment (for example, lazy variables are not threadsafe).
In addition, I regularly visited the Swift forums. The community is totally deluded and instead of facing the real problems of the language (and the tooling), it continues to bikeshed minor syntactic "improvements" (if they even are improvements) just so the code reads "more beautifully", whatever that is supposed to mean.
But the worst thing is how the community, including your post, thinks Swift (and Apple) is this godsend, the language to end all language wars, and that everyone will eventually adopt it en masse. Even if Swift were a good language, that would be ridiculous. There was even a thread on that forum called "crowdfunding world domination". It has since become awfully quiet...
>Developers who invest in Swift early can become market leaders, or run circles around teams struggling with slowness of python, over-complexity of c++.
While other people will do the sensible thing and learn Rust. Because it runs circles around Swift, it offers many paradigms and can be used in almost any industry, operating system and product, not just developing apps for Apple's ecosystem.
Swift will take over the world when Apple takes over the world, which it is safe to assume will never happen.
I am not saying at all that it is bad to learn Swift and use Swift, but have correct expectations about it.
I think you misunderstood the criticism about the name differential programming and the idea that building a gradient operator into a language is somehow a breakthrough that warrants the label "software 2.0".
This is not really about swift. Swift seems to have been chosen because the creator was there when they picked the language, even though he left.
I think my point stands that the criticism on this thread is mostly a surface level reaction and hung up on meaningless slogans like "software 2.0" or "breakthrough".
Your use of the word "seems" is very apt here.
Have you considered that Google might have hired Lattner precisely because he is the founder of LLVM and Swift, and they hoped to leverage his organizational skills to jump start next generation tooling? We know google is heavily invested in llvm and C++, but dissatisfied with the direction C++ is heading [0]. They also are designing custom hardware like TPUs that isn't supported well by any current language. To me it seems like they are thinking a generation or two ahead with their tooling while the outside observers can't imagine anything beyond 80s era language design.
I'm a deep learning researcher. I have an 8 GPU server, and today I'm experimenting with deformable convolutions. Can you tell me why I should consider switching from Pytorch to Swift? Are there model implementations available in Swift and not available in Pytorch? Are these implementations significantly faster on 8 GPUs? Is it easier to implement complicated models in Swift than in Pytorch (after I spend a couple of months learning Swift)? Are you sure Google will not stop pushing "deep learning in Swift" after a year or two?
If the answer to all these questions is "No", why should I care about this "new generation tooling"?
EDIT: and I'm not really attached to Pytorch either. In the last 8 years I switched from cuda-convnet to Caffe, to Theano, to Tensorflow, to Pytorch, and now I'm curious about Jax. I have also written cuda kernels, and vectorized multithreaded neural network code in plain C (Cilk+ and AVX intrinsics) when it made sense to do so.
I've taken Chris Lattner / Jeremy Howard's lessons on Swift for TensorFlow [0][1]. I'll try to paraphrase their answers to your questions:
There aren't major benefits to using Swift4TensorFlow yet. But (most likely) there will be within the next year or two. You'll be able to do low level research (e.g. deformable convolutions) in a high level language (Swift), rather than needing to write CUDA, or waiting for PyTorch to write it for you.
You can't. It won't be available for at least a year I'm guessing.
Even then I'm not sure what granularity MLIR will allow.
On the other hand you can do it in Julia today. There is a high-level kernel compiler and array abstractions but you could also write lower level code in pure Julia as well. Check out the Julia GPU GitHub org
If it's not ready I don't see much sense in discussing it. Google betting on it does not inspire much confidence either. Google managed to screw up Tensorflow so bad no one I know uses it anymore. So if this Swift project is going to be tied to TF in any way it's not a good sign.
As for Julia, I like it. Other than the fact that it counts from 1 (that is just wrong!). However, I'm not sure it's got what it'd take to become a Python killer. I feel like it needs a big push to become successful in a long run. For example, if Nvidia and/or AMD decide to adopt it as the official language for GPU programming. Something crazy like that.
Personally, I'm interested in GPU accelerated Numpy with autodiff built in. Because I find pure Numpy incredibly sexy. So basically something like ChainerX or Jax. Chainer is dead, so that leaves Jax as the main Pytorch challenger.
I was looking around for a language to write my own versions of convolution layers or LSTMs or various other ideas I have.
I thought I would have to learn c++ and CUDA, which from what I hear would take a lot of time.
Would this be difficult in Julia if I went through some courses and learned the basics of Julia?
This would really give me some incentive to learn the language.
You could just use LoopVectorization on the CPU side. It's been shown to match well-tuned C++ BLAS implementations, for example with the pure Julia Gaius.jl (https://github.com/MasonProtter/Gaius.jl), so you can follow that as an example for getting BLAS-speed CPU side kernels. For the GPU side, there's CUDAnative.jl and KernelAbstractions.jl, and indeed benchmarks from NVIDIA show that it at least rivals directly writing CUDA (https://devblogs.nvidia.com/gpu-computing-julia-programming-...), so you won't be missing anything just by learning Julia and sticking to using just Julia for researching new kernel implementations.
In that benchmark, was Julia tested against CuDNN accelerated neural network CUDA code? If not, is it possible (and beneficial) to call CuDNN functions from Julia?
That wasn't a benchmark with CuDNN since it was a benchmark about writing such kernels. However, Julia libraries call into optimized kernels whenever they exist, and things like NNlib.jl (the backbone of Flux.jl) and Knet.jl expose operations like `conv` that dispatch on CuArrays to automatically use CuDNN.
I’m not telling you to switch. I don’t think the S4TF team is telling you to switch anytime soon. At best you might want to be aware and curious about why Google is investing in a statically typed language with built in differentiation, as opposed to python.
Those that are interested in machine learning tooling or library development may see an opportunity to join early, especially when people have such irrational unfounded bias against a language, as evidenced by the hot takes in this thread. My personal opinion, that I don’t want to force on anyone, is that Swift as a technology is under-estimated outside of Apple and Google.
Please read the article. It answers your question pretty straightforwardly as "no, it's not ready yet."
But it also gives reason it shows signs of promise.
So you should get involved if you are interested in contributing to and experimenting with a promising new technology, but not if you're just trying to accomplish your current task most efficiently.
Google hopes you will be using their SaaS platform to do ML, not just use your own server. This is one of the reasons they push hard to develop some instruments.
> To me it seems like they are thinking a generation or two ahead with their tooling while the outside observers can't imagine anything beyond 80s era language design.
Given the ML (the programming language) and Modula-3 influences in Swift, and the Xerox PARC work on Mesa/Cedar, it looks like quite 80s-era language design to me.
But I'd like to point out that while Google has some of the top C++ experts working for them, is heavily involved in the C++ standardization and compiler-writing process, and in 2016 claimed to have 2 billion lines of C++ running their infrastructure...
.. and yet they don't suffer from familiarity bias or the sunken cost fallacy I hear in your comment.
Instead, Google C++ developers are sounding an alarm over the future direction of the language and its crippling complexity:
Google really messed up here, they had an unprecedented opportunity to create a new language for numeric computation and rally the scientific community around that. They hired Chris Lattner and he basically steamrolled that dream by forking Swift.
I don’t see people running over here to write numerical libraries like you see in Julia; that’s largely because of the crowd around Swift. The language is also a bit verbose and heavy-handed for what data scientists would prefer. Lattner was too close to Swift to understand this. The blame really falls on Google project management.
Google creating a new language? To shut it down in 3 years? Swift will be around in 15 years. It would be Microsoft's F#, or Google's Dart all over again. It's a monumental task to create a language that people want to use, and an even bigger task to create tools around it (IDEs, language specifications, cross-platform frameworks, package management).
I know this is a hot take but... I doubt Google has the capability, to be frank. They created Dart and Go (a.k.a. generics ain't necessary). They created Tensorflow 1 which is totally different from Tensorflow 2.
Swift may not be the best, but Swift is starting to become such a large part of Apple that it will have backing no matter the internal politics.
The language is not where the battle will be, it will be the tooling.
I would disagree. First, Go has been tremendously successful and, yes, will have generics soon. The ML community really needs a better language than Python; the current alternatives (Julia, Nim, R) are alright but seem to miss the mark in this arena. I see few data scientists excited about Swift; it's too heavy-handed and deeply embedded in the Apple iOS community.
People are searching for a better language in this space and it's something that often needs corporate backing. Google is aware of this problem and hired Chris Lattner to fix it; it's just a bit of unfortunate oversight. I guess we'll keep using Python for now.
Nim is a language that has good performance, and I had a good experience porting an enterprise Python application to Nim (for the performance gain). For a new user the risk obviously is the newness of Nim, but the Nim team was very helpful and prompt whenever I posted a question. It's a very complete and surprisingly issue-free language.
Hopefully Arraymancer will help increase its reach, I wish the implementors all the best.
I don't know that the ML community necessarily _needs_ a better language than Python for scripting ML model training. Python is decent for scripting, and a lot of people are pretty happy with it. Model training scripts are pretty short anyway, so whatever language you write them in, it's just a few function calls. Most of the work is in cleaning and feature-engineering the data up front.
Perhaps a more interesting question is whether the ML community needs a better language than C++ for _implementing_ ML packages. TensorFlow, PyTorch, CNTK, ONNX, all this stuff is implemented in C++ with Python bindings and wrappers. If there was a better language for implementing the learning routines, could it help narrow the divide between the software engineers who build the tools, and the data scientists who use them?
I think the ML community really needs a better language than Python, but not because of the ML part; that works really well. It's because of the data engineering part (which is 80-90% of most projects), where Python really struggles because it is slow and lacks true parallelism (multiprocessing is suboptimal).
That said, I love Python as a language, but if it doesn't fix its issues, in the (very) long run it's inevitable the data science community will move to a better solution. Python 4 should focus 100% on JIT compilation.
I've found it generally best to push as much of that data prep work down to the database layer, as you possibly can. For small/medium datasets that usually means doing it in SQL, for larger data it may mean using Hadoop/Spark tools to scale horizontally.
I really try to take advantage of the database to avoid ever having to munge very large CSVs in pandas. So like 80-90% of my work is done in query languages in a database, the remaining 10-20% is in Python (or sometimes R) once my data is cooked down to a small enough size to easily fit in local RAM. If the data is still too big, I will just sample it.
It's an argument that Python being slow / single-threaded isn't the biggest problem with Python in data engineering. The biggest problem is the need to process data that doesn't fit in RAM on any single machine. So you need on-disk data structures and algorithms that can process them efficiently. If your strategy for data engineering is to load whole CSV files into RAM, replacing Python with a faster language will raise your vertical scaling limit a bit, but beyond a certain scale it won't help anymore and you'll have to switch to a distributed processing model anyway.
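A small sketch of that difference (pandas, with a hypothetical file and column name): a faster language still falls over if it loads the whole file into RAM, whereas streaming it in chunks keeps memory flat regardless of file size.

    import pandas as pd

    # keep only a small running aggregate instead of the whole table in RAM
    total, count = 0.0, 0
    for chunk in pd.read_csv("events.csv", chunksize=100_000):   # hypothetical file
        total += chunk["latency_ms"].sum()                       # hypothetical column
        count += len(chunk)
    print("mean latency:", total / count)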
Can you get things done in Python/C++? Sure, but the two-language problem is a well-known issue, and Python has a number of problems. People certainly want a better option, and Google investing as much as they did validates that notion.
Yes, so to me, the key question is not whether Swift can replace Python's role, but whether it can replace C++'s role, and thereby also making Python's role unnecessary and solving the two-language problem in the process.
I think we can all agree that C++ is a dragon that needs to be slain here. Swift could potentially get close to that for most of the needs, but I still wouldn't bet data scientists would write swift.
As a data scientist, most of my projects have been individual--I'm generally the only person writing and reading my code. No one tells me which language I have to use. Python and R are the most popular, and I use either one depending on which has better packages for the task at hand. I don't use Julia because I don't see enough of a benefit to switching at this point. But I really don't care, they're just tools, and I will use any language, Julia, Swift, whatever, if I see enough of a benefit to learning it. I would just take a day or two and learn enough of it to write my scripts in it.
So I think that's the good news--because of the more independent nature of the work, you generally can win data scientists over to a new language one at a time, you don't necessarily need to win over an entire organization at once.
Getting a company or a large open-source project to switch from C++ to Swift or Rust or whatever, seems much harder.
Ideally they'd get behind a strict subset of typed python that could be compiled the same way that cython is. Numba, PyTorch JIT and Jax are already handling a decent chunk of the language.
RPython is not intended for humans to write programs in, it's for implementing interpreters. If you're after a faster Python, you should use PyPy not RPython.
Numba gives you JIT compilation annotations for parallel vector operations--it's a little bit like OpenMP for Python, in a way.
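A minimal example of the kind of annotation meant here (the function itself is made up): Numba compiles the loop to machine code and, with parallel=True, splits the prange across threads, much like an OpenMP parallel-for.

    import numpy as np
    from numba import njit, prange

    @njit(parallel=True)
    def scaled_sum(x, y, alpha):
        out = np.empty_like(x)
        for i in prange(x.shape[0]):     # iterations run in parallel across threads
            out[i] = alpha * x[i] + y[i]
        return out

    x = np.random.rand(1_000_000)
    y = np.random.rand(1_000_000)
    print(scaled_sum(x, y, 2.0)[:3])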
I just look forward to having a proper JIT as part of regular Python, as PyPy still seems to be an underdog, and JIT research for dynamic languages on GraalVM and OpenJ9 seems more focused on Ruby; hence I kind of hope that Julia puts some pressure on the ecosystem.
For what it's worth, I think learning Julia would be a fantastic investment. I think it made me a much better programmer because it substantially lowers the barrier between 'developers' and 'users'.
I also don't think I could ever go back to using a language that doesn't have multiple dispatch, and I don't think any language out there has a comparable out-of-the-box REPL experience.
Julia is really nice; the only problems are 1) its focus on being used from the REPL. While you can use a .jl script from the CLI, it feels wrong because of 2) its lack of a good AOT compilation/runtime/static binary option. Its JIT compiler is really good, but you pay a price on startup and first runs, hence Julians usually just have long-running REPLs they don't close. 3) The ecosystem is still immature: there are some amazing parts but also a lot of empty or poor parts still.
If I'm a person who wants to do some data science or whatever and I have very little software background I want there to be libraries that do basically everything I ever want to do and those libraries need to be very easy to support. I want to be able to Google every single error message I ever see, no matter how trivial or language specific, and find a reasonable explanation. I also want the environment to work more or less out of the box (admittedly, python has botched this one since so many machines now have a python2 and a python3 install).
Julia punches well above its weight in libraries, especially for data science, and has the best online community I've ever been a part of. Googling an error in Julia definitely won't give you nearly as many StackOverflow hits, but the community Discourse, Slack and Zulip channels are amazingly responsive and helpful.
I think a big advantage of Julia is that it has an unusually high ratio of domain experts to newbies, and those domain experts are very helpful, caring people. It's quite easy to get tailored, detailed, personalized help from someone.
This advantage will probably fade as the community grows, but at least for now, it's fantastic.
I've a lot more experience with Julia than any other language (and am a huge fan/am heavily invested). My #2 is R, which has a much more basic type system than Julia.
So -- as I don't have experience with languages with much richer type systems like Rust or Haskell -- it's hard to imagine what's missing, or conceive of tools other than a hammer.
Mind elaborating (or pointing me to a post or article explaining the point)?
I found multiple dispatch to be odd at first, but after adapting my mindset a bit I really like it. It makes it really easy to just drop your functions specialized for your types into existing libraries, for example. It's a win for code reuse.
What do you mean by "types are too shallow"?
Yes, jit can be slow to boot, but I think this is an area they're going to be focusing on.
"the tooling is poor" Not sure I agree here. I think it's great that I can easily see various stages from LLVM IR to x86 asm of a function if I want to.
Boot times still aren't ideal, but I find it takes about 0.1 seconds to launch a Julia REPL now. First time to plot is still a bit painful due to JIT overhead, but that's coming down very aggressively (there will be a big improvement in 1.5, the next release, with different compilation levels for different modules), and we now have PackageCompiler.jl for bundling packages into your sysimage so they don't need to be recompiled every time you restart Julia.
I also think the tooling is quite strong, we have an amazingly powerful type system, and I would classify discovering multiple dispatch as a religious experience.
> I see few data scientists excited about Swift, its too heavy handed and deeply embedded in the Apple iOS community.
Which is unfortunate, because it would probably be the best language if it were controlled by a non-profit foundation like Python. As it stands it's basically unusable.
Why do you think Swift would be the best language? I am doing a lot of C#, and so far have not seen anything in Swift that would make it feel better. In fact, at this moment even Java is making leaps forward, so it will quickly catch up on syntax.
And C# and Java have a benefit of JIT VM by default, meaning you only build once for all platforms, unless you need AOT for whatever rare reason (which they also have).
I'd say the culture is very, very different. Java/C#-heads are in love with OOP and create layers on layers everywhere, hiding as much state and methods as they can (you can't use this operator, you'll shoot yourself in the foot!) and rarely doing pure functions. It's just a long way from how math works.
Not saying it wouldn't work, it definitely would, but I think I'd rather switch profession than deal with Maven and Eclipse in 2020.
Swift culture is more about having non-mutable structs that in turn are extended via extensions, with heavy use of copy-on-write when mutable structs are needed. It's a small difference but it's there.
I fail to see how culture is related to the language.
You have a weird notion of mutable-by-default in either Java or .NET. The former is notorious for the builder pattern for that exact reason. Does Swift have special syntax for copy + update like F#: { someStruct with X = 10 }?
Never had problems with Maven. How is Swift different?
People have not been using Eclipse much for a while. There is IntelliJ IDEA for Java and Resharper for C#.
I might be wrong, but as I've understood it the Builder Pattern is mostly used as a solution to prevent mutable state from being accidentally shared. Which is duct-taping around the complexity instead of removing it.
I don't really know why but the coding patterns (what I call culture) that are popular for each language are very, very different even when they can support the same feature-set.
My understanding is C#'s behavior around structs is identical, except it is not called "copy-on-write". From what I see this behavior is identical to copy-always from a language-spec standpoint. Whether an actual copy is created is down to the code generator, but the semantics are the same.
Because JS wasn't designed for scientific computing like R and Julia are. Best case scenario is that you reimplement all the libraries Python has, but then you're just replacing Python with another generic scripting language instead of a language built for that purpose. Why would data scientists bother switching to JS when Python already has those libraries, and Julia and R have better numeric and data analysis support baked in?
And if Python, Julia and R don't cut it, then there's no reason to think another scripting language would. Instead you'd be looking at a statically typed and compile language with excellent support for parallelism.
JavaScript is a mess of a language in general. But even if that was not true, it is definitely not designed for numerical computation, nor symbolic manipulation, nor high performance, nor reliability.
Going from Python to JavaScript is a step backwards, not forward.
Feels like there's too much impedance mismatch, as integers aren't a first-class citizen in JS. You can use array buffers, but... I imagine you would want precise control over numerical representations everywhere to fully auto-differentiate code.
I tested this to a certain extent and it's not a toy. It's a well-thought-out product from a very talented team, and it has the ease of coding that we love about JavaScript. It can run in browsers!
That being said, we should note the strengths of a statically compiled language with easy installation and deployment (like Go, Rust, Nim, etc.) in enterprise-scale numerical computing.
I have similar thoughts. I think that TypeScript could allow for a lot of the "bolt-on" type checking that I find appealing with static languages, and most of these are just interfaces to the same C/C++ framework, so there's no reason you couldn't create TypeScript bindings.
I have to agree. Google suffers from a really bad case of ADHD when it comes to creating anything for consumption by outside developers...there is a long list and Dart and GWT are just two that stand out because there are large codebases out there that suffered because of these decisions.
Frankly, I'm surprised that Go has made it this far - I mean, it's a great language, I get it, but Google is fickle when it comes to things like this.
Dart's relevancy is heavily dependent on Flutter's success.
Seeing how the Chrome team pushes for PWAs, the Android team woke up and is now delivering Jetpack Compose, and we still don't know what is going to happen with Fuchsia, the question remains.
What makes you think Dart and Go have been unsuccessful?
Both are relatively young languages with rapidly growing adoption. Dart was on a downward trend but has seen a rejuvenation in the last few years thanks to flutter.
I was looking into Dart the other day because I guess Google is insisting on using it for Flutter and… it's someone in 2011's idea of a better JavaScript, except it's worse than modern ES2020 JavaScript? Why would anyone prefer Dart to modern JS/TypeScript? It's just not as good.
I just was looking at their code samples on the website, and I was really unimpressed with it. Why learn a new language if there’s nothing distinctive or better about it, you know? It’s just a better ES5 and a worse TypeScript.
I wonder if using Swift will seem like a good idea in retrospect. AFAIK NVIDIA has dropped CUDA support on the Mac, and outside the Mac and a bit of Linux, how much support does Swift have?
Even though the article talks about the "why not Julia", which is the highest comment at time of typing ... Choosing a cross-compatible language would have kept more people interested in the long-run. Why should I as a Windows user want to learn a language just to use it with Tensorflow; when I'm not sure if such language support and other tooling will generally come to Windows?
Yeah, it's a very closed ecosystem that is heavily Apple-centric. The Apple war with Nvidia is ridiculous; I guess they are just betting on TPUs or AMD to make up the difference. It is also strange that it's so tied to Tensorflow rather than being just a better numeric language. Julia is definitely a better option, but I still feel we can do better than that.
Swift is only 6 years old and is growing rapidly on Linux. Despite IBM bowing out, its server frameworks (Vapor and Kitura) continue their development. There are Swift compilers for many platforms.
I don't understand why Swift for Windows has not been updated for 2 years. I guess no one, especially Apple, cares about Swift becoming a general purpose language. For that reason alone, I'm skipping over it, although I hear interesting things about it.
A quick search on the Swift forums brings you an announcement that the Swift team is going to support the Windows port which is already mostly finished and will be available in Swift 5.3 [0] and above [1].
"Saleem Abdulrasool is the release manager for the Windows platform (@compnerd), is a prolific contributor to the Swift project and the primary instigator behind the port of Swift to Windows." Saleem's github, https://github.com/compnerd, lists swift-win32 repo, which is a "a thin wrapper over the Win32 APIs for graphics on Windows." So it's one person wrapping Win32. Not too promising yet, but it's early and there's room for Windows programmers to get involved.
I took that text from "5-3-release-process". I'm not talking about Swift compiling on Windows; I'm talking about the GUI situation, but I'll install it and hopefully be pleasantly surprised with a full-featured GUI SDK. But don't get me wrong, a supported compiler and std lib for Windows from Apple is a fantastic start.
I don't agree that Windows support affects Swift being general purpose or not.
If a Windows dev can target other platforms using Linux subsystem or containers, the only downside becomes an inability to target Windows desktops and servers, which are not hugely important targets outside of enterprise IT.
Enterprise IT is a user/customer category, not a purpose. And I didn't mean it to include apps. I meant that Swift is not suitable for IT depts that must use Windows servers, which is not that common anymore.
There are some very common purposes that are closely associated with enterprise IT. Writing line of business applications for Windows environments (server and/or desktop) is one of them.
But there is of course a sense in which Swift is a general purpose language as opposed to something like SQL if that's what you mean.
Unfortunately, right now Swift is not (yet) a pragmatic choice for anything other than iOS/macOS apps.
Web includes a very large number of Windows Server environments, either on premises or in the cloud. You really underestimate how dominant Windows is in many enterprises.
Agreed. It would have been better for them to back Julia as it was already a language for numerical computing. And it has an LLVM backend. With Google's backing Julia could be way ahead of where it is now.
I am now coming around to thinking Swift may be a very good language, but it may also happen to not be the best language for anything. And I am not sure if that is a good thing or a bad thing.
But the question is whether it's worth using it over Kotlin (which is also a better Java) - Swift is hamstrung because the primary implementation of its standard libraries is limited to a single vendor's platform. If you try to run it on Linux/Windows, you're forever going to be fighting compatibility quirks stemming from the fact that the main libraries only target the Apple implementation of dependencies. It's a similar situation to the one Mono users were stuck in.
Kotlin is a language by a small company, and it grew purely out of the world's sheer love for it. Even Google had to throw in the towel and officially support Kotlin. Flutter is a reflection of "creating a new language" - Dart.
Kotlin doesn't have the heavy-handed lock-in of Swift. Wouldn't Swift fundamentally be handicapped by Apple's stewardship? What if Lattner wants to add new constructs to the language?
Kotlin is an interesting option, especially now that it compiles to LLVM. One of the issues I see is the mind space these languages occupy. Kotlin is deeply linked to the Android community, which oddly feels like baggage.
Julia has done a great job at marketing itself as the language built for modern numerical computing by numerical computing people. They have effectively recruited a lot of the scientific community to build libraries for it. I think the language is flawed in some deep ways, but there is a lot to learn in how they positioned themselves.
I like Kotlin, but the garbage collector isn't really meant for numerical computing I'd guess, and I doubt having to think about LLVM and JVM and JS at the same time is going to work out well for it when it needs such a heavy, heavy focus on performance.
"Java / C# / Scala (and other OOP languages with pervasive dynamic dispatch): These languages share most of the static analysis problems as Python: their primary abstraction features (classes and interfaces) are built on highly dynamic constructs, which means that static analysis of Tensor operations depends on "best effort" techniques like alias analysis and class hierarchy analysis. Further, because they are pervasively reference-based, it is difficult to reliably disambiguate pointer aliases.
As with Python, it is possible that our approaches could work for this class of languages, but such a system would either force model developers to use very low-abstraction APIs (e.g. all code must be in a final class) or the system would rely on heuristic-based static analysis techniques that work in some cases but not others."
Their justification for picking Swift over Julia rings a bit false, unless one reduces it to "we're familiar with Swift, and that's why".
They can't argue for Swift over Julia due to community size, given that Julia is far more portable, and more familiar to users in the scientific domain. 'Similarity of syntax to Python' is another very subjective 'advantage' of Swift: Later in the same document they mention "mainstream syntax" - that is, Swift having a different syntax from python - as an advantage.
I wonder whether they just decided on the language in advance, which is totally fine, but we could do without the unconvincing self-justification.
Julia may be portable but it doesn't run well at all on smaller embedded devices like a Pi or Nano for example and its compiler will be an issue on most mobile devices outside of terminal emulators.
They seem to be saying, they could have picked Julia, but were just more familiar with Swift:
> and picked Swift over Julia because Swift has a much larger community, is syntactically closer to Python, and because we were more familiar with its internal implementation details - which allowed us to implement a prototype much faster.
I think it's very debatable to claim Swift is more similar to Python syntactically, as Julia looks more like a dynamic language to the user. Also, Julia is closer to languages like Matlab and R, which many mathematical and scientific programmers are coming from.
Swift has a much larger community, but it's not clear how big the overlap is between iOS app developers and Machine Learning developers. It probably would make deploying models on iOS devices easier, however.
Well yes, and precisely because the place of second best continues to belong to Python in many areas, which is the problem. It is not clear whether Swift is trying to compete with the best or the second best. It seems to me to have the wrong set of trade-offs.
IMO the first three lines of the program basically explain why academics and data programmers are never going to use Swift:
Python:
import time
for it in range(15):
start = time.time()
Swift:
import Foundation
for it in 0..<15 {
let start = CFAbsoluteTimeGetCurrent()
This is why people like Python:
- import time: clearly we are importing a 'time' library and then we clearly see where we use it two lines later
- range(15): clearly this is referring to a range of numbers up to 15
- start = time.time(): doesn't need any explanation
This is why academics and non-software engineers will never use Swift:
- import Foundation: huh? Foundation?
- for it in 0..<15 {: okay, not bad, I'm guessing '..<' creates a range of numbers?
- let start = CFAbsoluteTimeGetCurrent(): okay, I guess we need to prepend variables with 'let'? TimeGetCurrent makes sense but wtf is CFAbsolute? Also where does this function even come from? (probably Foundation? but how to know that without a specially-configured IDE?)
EDIT: Yes everyone, I understand the difference between exclusive and inclusive ranges. The point is that some people (maybe most data programmers?) don't care. The index variable you assign it to will index into an array of length 15 the way you would expect. Also in this example the actual value of 'it' doesn't even matter, the only purpose of range(15) is to do something 15 times.
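For what it's worth, here is a commented variant of the Swift loop (just a sketch; it swaps in Date(), which later comments in this thread suggest is the closer equivalent of Python's time.time()):

    import Foundation  // Apple's base library: dates, strings, file I/O, etc.

    for it in 0..<15 {                 // half-open range: 0, 1, ..., 14 (15 iterations)
        let start = Date()             // `let` declares a constant; Date() is "now" (wall clock)
        // ... work being timed ...
        let elapsed = Date().timeIntervalSince(start)  // seconds, as a Double
        print("iteration \(it) took \(elapsed)s")
    }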
In the Python version, you can mostly understand someone else’s code right off the bat, even as a mostly non-technical reader.
The details (e.g. that list indexes and ranges start at 0 by default and are half-open) are consistent and predictable after just a tiny bit of experience.
"Understanding" entails knowledge of the semantics. This means understanding the meaning of syntax, and the behaviour of the functions being called. This is true of both code fragments presented, so if you find one more intuitive than the other, that's your bias and not necessarily some objective feature of the syntax and semantics.
Maybe most people share your bias, and so it could qualify as a reasonable definition of "intuitive for most humans", but there's little robust evidence of that.
The first (and most important) level of “understanding” is understanding the intended meaning. Programming, like natural language, is a form of communication. The easier it is for someone to go from glancing at the code to understanding what it is intended to do, the more legible.
I claim that it is easier to achieve this level of understanding in Python than most other programming languages. (And not just me: this has been found to be true in a handful of academic studies of novice programmers, and is a belief widely shared by many programming teachers.)
Using words that the reader is already familiar with, sticking to a few simple patterns, and designing APIs which behave predictably and consistently makes communication much more fluent for non-experts.
There are deeper levels of understanding, e.g. “have carefully examined the implementation of every subroutine and every bit of syntax used in the code snippet, and have meditated for decades on the subtleties of their semantics”, but while helpful in reading, writing, and debugging code, these are not the standard we should use for judging how legible it is.
Languages that dump crap into the namespace on import are doing the wrong thing. Python has from x import * and every style guide says to never use it. Swift has a lot of other nice features, but the import thing is really a bungle. It is worse for everyone, beginners and experienced users alike. It is even bad for IDE users because you can't type imported-thing.<tab> to get the autocomplete for just the import. You're stuck with the whole universe of stuff jamming up your autocomplete.
>Is it? Does that range start at 0 or 1 or some other value? Does it include 15 or exclude it?
This is like reading a novel that says, "And then Jill began to count." and then asking the same questions. A non-technical reader does not need to know these details. The smaller details are not required to grok the bigger picture.
>Doesn't it? Is that UTC or local time? Or maybe it's CPU ticks? Or maybe the time since the start of the program?
When is the last time someone asked you, "Know what time it is?" and you responded with, "Is that UTC or local time?" Same thing, these details do not and should not matter to a non-technical reader.
Keep in mind, the audience is non-software-engineers: everyone from people who barely know how to code to people who do not know how to code at all but still need to be able to read at least some of it.
> Does that range start at 0 or 1 or some other value?
What does range mean? Is it English? Attacking every single possible element of a language is not compelling. The de facto standard is 0-indexing. The exceptions index at 1.
> - start = time.time(): doesn't need any explanation
People often oversimplify the concept of time. However, on average, the cognitive load for Python is lower than most. Certainly less than Swift. In one case I would look up what time.time actually did and in the case of Swift, I would throw that code away and work on another language with less nonsensical functions, like PHP. /s
> The de facto standard is 0-indexing. The exceptions index at 1.
"De facto standards" are meaningless. The semantics of any procedure call are completely opaque from just looking at an API, let alone a code snippet, especially the API of a dynamically typed language, and doubly so if that language supports monkey patching.
So the original post's dismissive argument claiming one sequence of syntactic sugar and one set of procedure calls is clearer than another is just nonsense, particularly for such a trivial example.
> However, on average, the cognitive load for Python is lower than most.
Maybe it is. That's an empirical question that can't be answered by any argument I've seen in this HN thread.
No, they aren't. A mismatch between what is expected and what actually happens, within a specific context, contributes to cognitive load. "Intuitive" is a soft term, but it has a basis in reality. The only language (Quorum) that has made an effort to do such analyses was largely ignored. Usability differences between languages exist, with or without the numbers you wish for. Swift is less usable than some languages and more usable than others.
The use of `let` to declare immutable values is well-established in programming languages. Academics have no problem with this (and, indeed, prefer it -- at least, everybody I've talked to about it in the PL research community seems to prefer it). The same or a similar form is used in Scala, OCaml, JavaScript, Lisp, Scheme, etc. Some of these languages provide mutable contrasting forms, such as `var`. Tracking mutability so explicitly allows for more advanced static analyses.
Using `..<` and `...` is pretty simple to figure out from context. The former produces an exclusively-bounded interval on the right, while the latter is an inclusive interval. This is functionality that more languages could stand to adopt, in my opinion.
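A minimal sketch of both points in plain Swift:

    let limit = 15        // constant: reassigning `limit` is a compile-time error
    var total = 0         // variable: may be reassigned

    for i in 0..<limit {  // half-open: 0 through 14
        total += i
    }
    for i in 1...3 {      // closed: 1, 2, 3
        total += i
    }
    print(total)          // 105 + 6 = 111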
I agree that the names themselves are not very transparent. However, they become less opaque as you learn the Swift ecosystem. Admittedly, this makes them not as immediately user-friendly as Python's simple names, but it's not as though they're some gigantic obstacle that's impossible to overcome.
Personally, I like Swift a lot (even though I never use it). It has a syntax that has improved on languages like Java and Python, it's generally fast, it's statically typed, and it has a large community. The fact that implicit nullable types are discouraged directly by the syntax is phenomenal, and the way Swift embraces a lot of functional programming capabilities is also great. If it weren't so tied to Apple hardware, I would likely recommend it as a first language for a lot of people. (I know that it runs on non-Apple hardware, but my understanding is that support has been somewhat limited in that regard, though it's getting better.)
> However, they become less opaque as you learn the Swift ecosystem
IMO that's essentially the problem. Most people* don't want to have to learn the ecosystem of a language because it's not their focus.
The other issue is that when you start googling for information about the Swift ecosystem, you're not going to find anything relevant to academic, mathematical, or data-science programming. All the information you will find will be very specific to enterprise-grade iOS and macOS development, which will be a huge turn-off to most people in this community.
Writing off a language/syntax/toolset because you couldn’t be bothered doing < 5 minutes of searching to figure out something that will probably yield net-benefits in the future is an incredibly myopic thing to do in my opinion.
Like you said: people in PL research. They specifically work on researching programming languages. But that is just a tiny fraction of what academic world has.
Are academics born with Python knowledge? You still need to learn that range(10) is exclusive of the number ten, and that 'time' itself is not a function. Julia, for example, is much further from 'natural language' programming and seems quite popular.
It's more important that the language can accurately and succinctly represent the mental model for the task at hand, and the whole point of this article is that Swift can offer a syntax that is _more_ aligned with the domain of ML while offering superior performance and unlocking fast development of the primitives.
Julia is similar to matlab by design, which makes it easier for science and engineering folks who are already familiar with it.
I think functional programming advocates underrate simplicity of procedural languages. Programming is not math, algorithms are taught and described as a series of steps which translate directly to simple languages like Fortran or Python.
I think ML is great, but I’m skeptical if it is a big win for scientific computing.
They are proven with math, but their implementation in code certainly isn't. If it were that simple, we would be using languages like Coq and TLA+ for writing software. But we usually don't, because math does not cleanly translate into usable programs; it needs a human to distill it into the necessary steps the computer must follow.
No, really, they are math themselves. Algorithms have nothing to do with implementation. The algorithms in the whole CLRS book are written in pseudocode. By your logic, Turing machines and many other models of computation are not math. Just because something is imperative doesn't mean it's not mathematics.
Plenty of excellent programmers are not mathematicians. How would that work if programming were just math? That’s like saying physics is just math while ignoring all of the experimental parts that have nothing to do with math.
Range is a concept from mathematics, so an academic should know it regardless if they know Python or not.
Most of the concepts in Python come from academics and mathematics, so it's an easy transition. I don't think math has a time concept in a straightforward way, so time is an edge case in Python.
Have you ever come across a bug where range(10) doesn't get to 10? Even if it is assumed knowledge, it doesn't seem to me to even approach the level of assumed knowledge of time coming from a 'Foundation' library rather than... you know... a time library.
CACurrentMediaTime() / CFAbsoluteTimeGetCurrent() are first of all not deprecated (just check CFDate.h / CABase.h) but return a time interval since system boot so they are guaranteed to be increasing. It's just a fp64 representation of mach_absolute_time() without needing to worry about the time base vs seconds.
Date() / NSDate returns a wall clock time, which is less accurate and not guaranteed to increase uniformly (ie adjusting to time server, user changes time etc)
Oops, you're right on the deprecation point. CFAbsoluteTimeGetCurrent is not itself deprecated but every method associated with it is [1].
Also CFAbsoluteTimeGetCurrent explicitly calls out that it isn't guaranteed to only increase. CACurrentMediaTime is monotonic though.
CFAbsoluteTimeGetCurrent also returns seconds since 2001 and is not monotonic, so there's really no reason to use it instead of Date().timeIntervalSinceReferenceDate. The most idiomatic equivalent to the Python time method is definitely some usage of Date(), as time in Python doesn't have monotonic guarantees either.
Repeated calls to this function do not guarantee monotonically increasing results. The system time may decrease due to synchronization with external time references or due to an explicit user change of the clock.
Python's time.time() call is also going to be affected by system time changes and thus not guaranteed to increase uniformly. So Date() in Swift and time.time() in Python are the same in that regard.
Python's time() call is returning the unix epoch wall clock time. Newbies (and most engineers TBH) are not going to know the subtleties and reasons why you'd use a monotonic clock or to even think of using one or another.
So for this comparison, it is better to use Date().
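For anyone timing code in Swift, a sketch of the two clocks being discussed - Date() (wall clock, like Python's time.time()) and DispatchTime (monotonic) - purely as an illustration:

    import Foundation
    import Dispatch

    // Wall-clock timing, comparable to Python's time.time(); can jump if the system clock changes.
    let wallStart = Date()
    // ... work ...
    let wallElapsed = Date().timeIntervalSince(wallStart)

    // Monotonic timing; unaffected by clock adjustments, so better suited to benchmarks.
    let monoStart = DispatchTime.now()
    // ... work ...
    let monoElapsed = Double(DispatchTime.now().uptimeNanoseconds - monoStart.uptimeNanoseconds) / 1e9

    print(wallElapsed, monoElapsed)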
"long deprecated" as in 20 years long; the CF APIs exist mostly for compatibility with Mac OS 9. The only time you really would need to use those functions nowdays is for interfacing with a few system services on Apple platforms like low-level graphics APIs and whatnot.
You're right; I quoted the parent comment even though "deprecated" was not an accurate word choice here, sorry. CF is not deprecated because it is needed on occasion when programming for Apple platforms, but the APIs are largely obsolete.
What's amusing is that -[NSDate timeIntervalSinceReferenceDate] is actually the older of the two, going back to NeXTStep's Foundation introduced with EOF 1994 and later made standard with OPENSTEP 4.0.
Also, does it start with 0 or 1 (or -3023 for that matter)? As a programmer you would assume of course it starts at 0, but since this thread talks about "non-programmer" academic types I think it's worth mentioning. What if I want a range of 1-15, or 20-50, can I still use range()? I can't tell from the Python example but I can tell exactly what I would need to change with the Swift one to make it work exactly how I'd want.
Very true, and this is especially important in data science, where the majority of languages other than Python are 1-indexed (Matlab, Julia, R, Fortran).
You learned it for one language. Now imagine that you're working with a handful of languages regularly, some of which have 1-based indexing, some 0-based, some of which may have closed ranges, others half-open ranges.
If you're anything like me, you'll end up spending quite a bit of time looking up the documentation to the range operator to remind yourself how this week's language works again.
You're kind of proving you've never used Swift before. The real problem with Swift has nothing to do with the Syntax or API. It has to do with the segmentation in documentation, training materials, best practices, and design patterns. The journey from 0 to "best way to do x" is awful with Swift compared to other languages. It's pretty damn telling that the best way to learn iOS dev is still piecing together random Ray Wenderlich blogs (written across different versions of Swift!).
The Swift manual is pretty good actually, and documentation around Foundation is fairly complete, if a bit sparse. But yeah... UIKit and the other libraries used for creating iOS applications are really not very well documented. For the last few years I've been copying code from WWDC demos to learn about new stuff. I tried to learn how to capture RAW data when the API was just out, so no Stack Overflow answers existed yet. It was hard as hell.
But anyway that's not a Swift problem. Swift itself is pretty easy to get into.
Ya it is UI bound (though you’d think that’s where a lot of the documentation would be!). I’d also say JSON handling and other data handling aspects are poorly documented, would you agree?
I agree Codable is one of the worst ways of dealing with JSON, apart from manually parsing through dictionaries. I mean, it Just Works on a very clean and consistent API, but if people start to mix snake_case with PascalCase, return "1" as a string, or write any other garbage they get away with when the only clients they have to care about are in JS, then you're typing a lot of unreadable boilerplate.
Since we have custom attributes now, I will soon investigate whether there's a nice framework around that can make it work a bit like the usual JSON libraries in, for example, C#.
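To make the complaint concrete, a small sketch (the type and field names are made up): uniform snake_case can be handled by a decoder setting, but anything messier forces hand-written CodingKeys per type.

    import Foundation

    struct User: Codable {
        let userId: Int
        let displayName: String

        // The boilerplate in question: needed as soon as the API's casing is inconsistent.
        enum CodingKeys: String, CodingKey {
            case userId = "user_id"
            case displayName = "DisplayName"
        }
    }

    let json = #"{"user_id": 1, "DisplayName": "Alice"}"#.data(using: .utf8)!
    let user = try! JSONDecoder().decode(User.self, from: json)  // force-try for brevity
    print(user.displayName)

(If every key were snake_case, setting JSONDecoder's keyDecodingStrategy to .convertFromSnakeCase would remove the need for CodingKeys entirely.)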
> - start = time.time(): doesn't need any explanation
Python's no saint when it comes to time stuff either. I had some code using time.strptime() to parse time strings. It worked fine. Then I needed to handle data from a source that included time zone information (e.g., "+0800") on the strings. I added '%z' to the format string in the correct place--and strptime() ignored it.
Turns out that if you want time zones to work, you need the strptime() in datetime, not the one in time.
BTW, there is both a time.time() and a datetime.time(), so even that line that needs no explanation might still cause some confusion.
Python isn’t alone in that. Designing a date&time library is so tricky that, even after having seen zillions of languages fail and then replace their date&time library with a better one, most new languages still can’t do it right the first time.
I think the main reason is that people keep thinking that a simple API is possible, and that more complex stuff can initially be ignored, or moved to a separate corner of the library.
The problem isn’t simple, though. You have “what time is it?” vs “what time is it here?”, “how much time did this take?” cannot be computed from two answers to “what time is it?”, different calendars, different ideas about when there were leap years, border changes can mean that ‘here’ was in a different time zone a year ago, depending on where you are in a country, etc.
I guess we need the equivalent of ICU for dates and times. We have the time zone database, but that isn’t enough.
I use Python a few times a month for some simple scripting usually. Every time I have to look up how to use `range()` correctly, usually because I forgot if it's inclusive or exclusive. Academics that are used to Matlab or Julia will also have to look up if it starts at 0 or 1.
Furthermore, it's obvious what `time()` does in this context, but if I was writing this code I would _absolutely_ have to look up the correct time function to use for timing a function.
That's exactly my point. The only people who use Java are professional software engineers mostly working at very large companies with teams in the hundreds. Almost nobody in academia uses Java.
In many fields of science Python is already the dominant language, in others (like neuroscience), the writing is on the wall for Matlab. Approximately all the momentum, new packages, and new student training in systems neuroscience that I’ve seen in the last 5 years is in Python.
I apologize :-/ I should have been more clear. What I was referring to was in a little more broader context than just Data Science or ML. Many of the Engineering and Math PhDs I work with typically use MATLAB or Mathematica.
It really depends heavily on the engineering field. I do work in optical/physical engineering (photonics, nonlinear optics, quantum computing) and essentially operations research (optimization theory) and almost everything we use is Python (as a C + CUDA wrapper/HDL-type thing) and Julia (which I'm trying to introduce for code reusability, even if it is only marginally slower than the former).
At least in my university, most people really do use Python + C and Julia for many, many cases and MATLAB and such are used mostly in mechanical and civil engineering, some aero-astro (though a ton of people still use Python and C for embedded controllers), and Geophysics/Geophysical engineering (but, thanks to ML, people are switching over to Python as well).
I think even these fields are slowly switching to open versions of computing languages, I will say :)
Yeah I know what you mean. I'm mechanical engineering (Controls) and the vast majority of them still use MATLAB, but they are slowly moving towards more open computing languages. I can only consider this a great thing! :)
The issue I see is with the undergraduate curriculum in many Universities. This is where I see the legacy use of MATLAB is really hurting the future generation of students. Many still don't know proper programming fundamentals because MATLAB really isn't set up to be a good starting point for programming in general. To me, MATLAB is a great tool IF you know how to program already.
Oh yeah, it’s a killer I’m not going to lie. I have the same problem with some classes here (though I haven’t taken one in years) and it’s quite frustrating since students are forced to pay for mediocre software in order to essentially do what a normal calculator can do anyways (at least at the undergrad level).
I work in a massive research institution with a lot of medical doctors. They almost all use R if they can program. I try to encourage the use of Python to help them slowly pick up better programming fundamentals so they don't miss out on whatever the next wave is in a decade. Learning R doesn't teach you much about other languages, but IMO learning Python can help you move between languages.
> Many of the Engineering and Math PhDs I work with typically use MATLAB or Mathematica.
Yes, and the government still needs COBOL programmers.
Going forward, I believe Python has far more momentum than either MATLAB or Mathematica. I think far more MATLAB and Mathematica users will learn Python than the other way around in the future, and far more new scientific programmers will learn Python than either of those.
MATLAB's foothold in academia is due to legacy familiarity, cheap (but not free) academic licensing, a nice IDE, and good toolboxes for certain niches (Simulink, Control Toolbox). I used MATLAB for 12 years in academia and would consider myself an advanced user.
However, when I left academia (engineering) 8 years ago, its use was already declining in graduate level research, and right before I left most professors had already switched their undergrad instructional materials to using Python and Scilab. I observed this happening at many other institutions as well. Anecdotally, this trend started maybe 10 years ago in North America, and is happening at various rates around the world.
I'm in industry now and MATLAB usage has declined precipitously due to exorbitant licensing costs and just a poor fit for productionization in a modern software stack. Most have switched to Python or some other language. My perception is that MATLAB has become something of a niche language outside of academia -- it's akin to what SPSS/Minitab are in statistics.
I'm not denying any of this and agree with your analysis about MATLABs use. I'm just saying that it's still used a lot more than people on Hacker News like to think.
The university I work at still teaches MATLAB to new engineering students.
Oh I understand, I was more responding to your original statement "None of them use Python either. A lot just use MATLAB" which would be an unusual state of affairs in this day and age, though I have no doubt it is true in your specific situation. It's just that your experience seems circumscribed and uncommon in academia today (well insofar as I can tell -- I don't know the culture at every university).
...nobody in academia uses python? I would strongly disagree. The whole point of this Swift library is to provide an alternative to PyTorch which is clearly very popular in the community.
Being in academia myself, I have to disagree as well. Academia has its own languages and tools it prefers. They have just recently started warming up to Python.
The important thing with syntax is to avoid the illusion of understanding. That's when the language user is confident that the syntax means one thing when it actually means something else. If the user is not sure what something means, they'll look it up in docs or maybe write a few toy examples to make sure it does what they think it does. Python's range() is ambiguous enough that I did this when I was learning the language. I was pretty sure it would create a range from 0 to 14, but I wanted to make sure it wasn't inclusive (0-15).
Examples of the illusion of understanding abound. These aren't all true for everyone, and HN users have been writing software long enough to have internalized many of them, but every language has them:
- Single equals as assignment. Almost every newbie gets bitten by it. They see "=" and are confident that it means compare (especially if it's in a conditional).
- x ^ y means xor, not "raise x to the power of y"
- "if (a < b < c)" does not do what newbies think it does.
- JavaScript's this.
Sometimes syntax can make sense on its own, but create the illusion of understanding when combined with another bit of syntax. eg: Python newbies will write things like "if a == b or c" thinking that it will be true if a is equal to either b or c.
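The `^` example reads the same way in Swift, for what it's worth (a tiny sketch):

    import Foundation

    let xorResult = 2 ^ 3        // bitwise XOR: 1, not 2 cubed
    let power = pow(2.0, 3.0)    // exponentiation needs pow(): 8.0
    print(xorResult, power)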
The illusion of understanding is the cause of some of the most frustrating troubleshooting sessions. It's the thing that causes newbies to say, "Fuck this. I'm going to do something else with my life."
>Examples of the illusion of understanding abound. These aren't all true for everyone, and HN users have been writing software long enough to have internalized many of them, but every language has them
>The illusion of understanding is the cause of some of the most frustrating troubleshooting sessions. It's the thing that causes newbies to say, "Fuck this. I'm going to do something else with my life."
About 14 years ago (give or take up to 4 years) I read about a study done at a prestigious CS university. Entering CS students were given tests at the beginning of the course to see who was okay with arbitrary logic and syntax and who was not. IIRC about 40% of the class would get hung up on "but why" and "it doesn't make sense" and would end up failing, while the ones who were able to cope with the arbitrary nature of things would graduate; the others would end up dropping out or changing studies.
About every couple of months I wish I could find that damn paper / study again.
NOTE: my memories of this study might also have been faded by the years, so...
Are you thinking of The Camel has Two Humps[1]? I don't think it was ever published in a journal and the author later retracted it.[2]
It seems like the conclusions of the study were overstated, but the general idea is correct: Those who apply rules consistently tend to do better at programming than those who don't. This is true even if their initial rules are incorrect, as they simply have to learn the new rules. They don't have to learn the skill of consistently applying rules.
I guess it is - it certainly seems like it - although my memory was that their main claim was that the ability to handle arbitrary mental models (not completely logical ones) was the differentiator between those who succeeded and those who didn't.
And embarrassingly this thing I've gone around believing for the last 14 years isn't so.
I think where JavaScript's `this` is concerned, it's different from the others - the others are just little tricky bits of syntax, while `this` is its own little problematic area of deep knowledge in the language.
It's more like saying people don't understand all of how pointers work in C.
I haven't read the OP yet, but I don't see what the issue is here. I honestly think what you see as issues is perhaps down to a lack of exposure to a wider range of languages?
CF is clearly a prefix for a library. I'll take an educated guess that it means Core Foundation? It's a pretty common pattern for naming things, give or take. And once you've seen it, it's just there, and you know precisely what it means. So, ten minutes of your life to learn what CFxxx() means.
Let. I like lets. Some don't. Surely we can coexist?
x..y is also not unique to Swift. It has a nice mathematical look to it, is more concise.
Btw, is that 'range' in Python inclusive or exclusive? It isn't clear from the notation. Must I read the language spec to figure that out? .. /g
`let` is not a terribly hard-to-understand keyword, especially if you've had exposure to functional programming. Most academics I knew actually started out programming the functional way, rather than OO. So I'm not sure I agree 100% with what you're saying.
It's not terribly difficult to understand `let` if you have a background in math, given that nearly every formal proof defines variables with `let x:=`
The "Foundation" and "CFAbsoluteTimeGetCurrent" are very easily fixable surface level details.
"range(15)" vs "0..<15" could go either way.
"let" vs "var" in Swift is indeed something that adds verbosity relative to Python, and adds some cognitive load with the benefit of better correctness checking from the compiler. Very much a static vs dynamic typing thing. That's where you'll see the real friction in Swift adoption for developers less invested in getting the best possible performance.
"Python is slow" argument just shows complete ignorance about the subject. (and there may be good arguments for not using python)
First of all, if you are doing "for i in range(N)" then you are already doing it wrong; for ML and data analytics you should be using NumPy's np.arange(). NumPy's arange doesn't even run in "Python" - it's implemented in C. So it may even be faster than Swift's '..<'. Let me know when you can use Swift with Spark.
This is actually one of the most frustrating parts about using python. You can’t write normal python code that performs well. Instead you have to use the numpy dsl, which I often find unintuitive and too often results in me needing to consult stack overflow. This is very frustrating because I know how I want to solve the problem, but the limitations of the language prevent me from taking the path of least resistance and just writing nested loops.
My point is that the benchmark is deceiving; again, if you are doing data analytics or ML, then you are already using numpy/pandas/scipy, so that's not a valid argument.
The way a range is defined in Swift looks scary to a programmer but immediately looks very natural as soon as you imagine you've forgotten programming and only know math.
time.time() (as well as datetime.datetime.now() and other stuff like that) has always looked extremely ugly to me. I would feel better writing CFAbsoluteTimeGetCurrent() - it seems tidier and makes much more sense once you calm down and actually read it.
Python is great for scripting or rapid prototyping because of this, but I can definitely understand why someone would want a more literal language like Swift. Even in your example you can glean more information from the Swift code.
”start = time.time(): doesn't need any explanation”
So, is that evaluated when that statement is run, when the value is first read (lazy evaluation, as in Haskell), or every time 'start' gets read? For example, Scala has all three:
val start = time.time()
(evaluates it once, immediately),
lazy val start = time.time()
(evaluates it once, at first use), and
def start = time.time()
(creates a parameterless function that evaluates time.time() every time it is called)
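Swift draws roughly the same distinctions, for what it's worth - a sketch (the struct here is made up):

    import Foundation

    struct Timing {
        let start = Date()               // evaluated once, when the instance is created
        lazy var lazyStart = Date()      // evaluated once, on first access
        var currentTime: Date { Date() } // computed property: re-evaluated on every access
    }

    var t = Timing()
    print(t.start, t.lazyStart, t.currentTime)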
My original attempt was accidentally more powerful, in that it allowed `cube` to be a stateful function like a closure.
To implement this, the only thing one needs to do in the `#[derivative]` macro is:
* parse the function into an AST
* fold the AST using symbolic differentiation
* quasi quote the new AST
Swift can probably differentiate functions in other translation units because it keeps the bytecode of all functions around. A proc-macro-based approach wouldn't be able to achieve that, at least not in general, but a Rust plugin could, since plugins can drive compilation to, e.g., compile whatever function is being differentiated and differentiate its bytecode instead of its Rust source - and, if the function happens to be in a different translation unit, access its bytecode there.
>Can't you do this in Rust by writing a #[derivative] proc macro without having to modify the language?
That question can be generalized to "since <Turing Complete language> can do <anything-we-want-via-library/function/macro/codegen/customcode/extension/whatever>, can't we just do that instead of modifying the core language?"
Well yes, but the advantage of having 1st-class language syntax includes:
+ more helpful compiler error messages because it has intelligence of the semantics
+ more helpful IDE because the UI has intelligence about the intentions of the programmer
+ interoperability, because programmers are all using the same syntax instead of Programmer Bob using his custom macro dialect and Programmer Alice using her idiosyncratic macro dialect.
Points 1 and 2 are about introspection, which are hugely language- and implementation-dependent. IME, the languages with great tooling and error messages didn't achieve it by integrating every possible feature into the compiler. (Usually, it's closer to the opposite.) Even different compilers for exactly the same language differ greatly in their ability to extract intention, and present it to users in a comprehensible way.
Point 3 is simply that programmers aren't sharing interfaces, which is indeed a shame, but that's independent of whether you choose to put the interface in the language or the library. Didn't TensorFlow fork the Swift compiler?
> That question can be generalized to "since <Turing Complete language> can do <anything-we-want-via-library/function/macro/codegen/customcode/extension/whatever>, can't we just do that instead of modifying the core language?"
Not really, since that usually requires using weird syntax or APIs, but the Rust proc-macro solution has the exact same syntax as the Swift one:
@differentiable
...cube function
let grad = gradient(at: cube);
vs
#[derivative]
...cube function....
let grad = gradient(cube);
>, but the Rust proc-macro solution has the exact same syntax as the Swift one
You're focusing on syntax, but I was talking about the higher-level semantics. When programmers write custom code (e.g. custom macros), as far as the compiler/IDE is concerned, it's just an opaque string that gets checked for valid syntax. Those tools have no "intelligence" about the higher-level concepts of derivatives, deltas, etc.
E.g. the IntelliJ plugin for Rust will have no idea what the higher-level semantics of a differentiable function are. It just sees "gradient()", which is just an arbitrary string of code to parse and might as well be spelled "abcdefgzyxwvut123()".
I.e. the blog post is talking about a symbiotic ecosystem of tools (compilers, IDEs, etc.) that treats differentiable programming as a 1st-class language feature. The ultra-flexibility of Rust or Lisp macros to reproduce the exact syntax doesn't really give you the semantics understood by that ecosystem of tools.
edit reply to: >You seem to be at least suggesting that if one adds differentiable programming as a first class language feature, IDEs would automatically be intelligent about it,
No, that's an uncharitable interpretation. I never claimed that IDEs will automagically understand higher level semantics with no extra work required to enhance the IDE's parser.
I'm only saying that your macro proposal only recreates the syntax but not the semantics -- so you're really not fully solving what the blog author is talking about.
You seem to be at least suggesting that if one adds differentiable programming as a first-class language feature, IDEs would either automatically be intelligent about it, or that making these tools aware of the feature would somehow be simpler.
That's not true. IDEs would need to add support for that feature, and whether the feature is a language feature, or a custom Rust macro, for Rust at least, the amount of work required is the same.
Rust IDEs already understand many ecosystem macros (cfg_if, serde derives, etc.). Somebody would just need to put in the work for a new macro, but that would need to be the case anyways.
The same would apply to Rust static analysis tools like clippy. Somebody would need to write lints, and whether those target an ecosystem macro or a keyword doesn't really matter with respect to the amount of work that this requires.
> I'm only saying that your macro proposal only recreates the syntax but not the semantics
What do you think these semantics are? The only semantics I see is being able to compute the derivative of a function at some particular point (for some particular inputs). The proc macro solution has this semantics: it automatically computes the derivative of a function, and gives you a way to query its value at a particular point.
> No, that's an uncharitable reading. I never claimed that IDEs will automagically understand higher level semantics.
So what are you saying? Because that's what I'm honestly understanding from your post.
That somehow this feature being a Rust macro means that it cannot be as good as if it were a first-class feature. You are making this claim in general, as if first-class features are always strictly better than user-defined features, yet this claim isn't true, and you haven't mentioned any particular aspect of automatic differentiation for which it is.
Swift doesn't have proc macros, so they have to make the feature first class. For languages with proc macros, the question one should be asking is: what value does a first-class language feature add over just using a proc macro?
I would love to see a cleaned-up implementation of that (and I do have notes on the subject but no time to try and implement them).
Note that to be useful you need reverse-mode (backward) differentiation - i.e. handling functions with several inputs but a single output - and support for conditionals and loops.
See the Zygote.jl library (Julia) for a working example in a different language (their whitepaper is a great starting point if you want to implement something like that).
Maybe we can also make a #[returns] macro. This macro parses the function into an AST, invokes magic, and determines if the function will return on any input.
What magic does need to be invoked by the #[derivative] proc macro ?
For parsing Rust into an AST you just use syn, which also supports doing AST folds, and for semi-quoting you just use the quote crate.
For the symbolic differentiation part, you can use any of the hundred symbolic differentiation libraries out there to obtain '3 * x * x' from 'x * x * x'. Pretty much every machine learning framework out there supports AD and symbolic differentiation in some form, using existing libraries. The Swift people are standardizing standard practice, not discovering the Americas.
If any of that looks like magic to you, then you might just be lacking the background.
I'm surprised by all the negativity in this thread. The need to write C++ components for models is pretty real when you need things to be fast and run on CPU. God forbid you ever want to deploy that on some sort of accelerator, necessitating a rewrite to CUDA or whatever language that accelerator speaks.
Maybe that's too niche of a subject for people to really care about, and maybe it's going to fail now that Lattner is gone (though there still seemed to be a bunch of activity last I checked), but the problem it is trying to solve is real.
The negativity is because they were never serious about it.
While C++ might not be the ideal language, choosing one that only works properly on Apple platforms, dismissing complaints about the lack of any solid roadmap for the platforms researchers actually use, and hand-waving usability problems away with "just use notebooks in Google's compute cloud" is not how a language gains adoption beyond its initial niche.
I don't want to be that guy, but if they were going with a heavy handed language anyway, why not Rust? It has a good multiplatform support and most of the same benefits of Swift.
Imo, Julia should’ve been the choice; I have never heard a single researcher use Swift for anything other than iOS coding (I certainly never have). But we will see how this plays out, I guess.
Probably, yes. My point is if they wanted a general purpose high performance multi-paradigm language like Swift. But in general I think Julia would have been a better choice, so I agree.
Their reasoning: "We love Rust, but it has a steep learning curve that may exclude data scientists and other non-expert programmers who frequently use TensorFlow." [1]
(I can't comment on that because I don't know Rust nor am I a data scientist!)
Yeah it just sounds like a cop out to me, Swift's learning curve is equally steep to Rust in my experience. But as mentioned elsewhere, Julia would probably have been the best fit.
But that's not what you do in Rust? That's more of a C++ thing. I think you'd be surprised, Rust is much more similar to Swift than you'd think. The biggest layer of complexity is fighting with the borrow checker in some cases, but that only really comes into play when you want to write something concurrent.
I love Rust, and I don't know much about data science, but from my understanding many data scientists are not hardcore programmers, in the same way that many programmers are not necessarily great at sysadmin. It's a tool, but it's not really the core of their job. In particular, they mention Python being very popular in these circles.
Switching from garbage collected and dynamically typed to manual memory management + static typing + borrow checker + trait-based generics is going to be a deep dive for these people. Maybe they'd end up better off in the end though, who knows. I personally think that dynamic typing and GCs are overrated and their benefits overstated but that's obviously not the consensus out there.
I guess the core of my argument is that none of these features need to be used. It's very easy to write basic scripts in Rust at least in my experience. Maybe you might encounter difficulty with the borrow checker occasionally, once in a blue moon now that non lexical lifetimes are a thing. Most of the other disadvantages also apply to Swift.
Either way, people will find a way to justify their preference. The people in question liked Swift, and there's nothing wrong with that.
Perhaps it could be this: if my value comes from running some crazy interesting high level Math derived machine learning model on data and I'm highly paid the opportunity cost to invest the time and effort to both become good at and write C++ is higher than if I can do the same thing in a few lines of python and just throw some more compute at the problem since that's a fixed cost.
Compute is most definitely a marginal cost rather than a fixed cost....
And when you start talking about buying 10s of millions in additional hardware to support the project you're trying to launch you start thinking pretty hard about whether you can speed things up.
And for practitioners most of this is not crazy high level math, it's first year linear algebra and multivariable calculus at most, and generally just lego blocks, intuition and data work.
> Perhaps it could be this: if my value comes from running some crazy interesting high-level Math-derived ML model on data, and I’m highly paid, then the opportunity cost of becoming good at C++ is higher than if I can do the same thing in a few lines of python and just throw some more compute at the problem, since that’s a fixed cost.
Add some hyphens and commas and shed a bit of cruft, and it's fine IMO. (I hope you don't mind my speculative editing.)
I appreciate the feedback. Defending wordy stream-of-consciousness writing style is not a hill I want to die on.
I did realize that if I have to throw a million cores at something vs. 1000, perhaps it would make sense to spend the effort, or buy the time of a C++ expert, so as to save on the compute cost. But then, what if those million cores are only needed for a day or an hour? Then Python or some other rapid-prototyping language would make a bit more sense imo.
Until Google actually offers Swift tooling comparable to Julia, Pytorch and ML.NET on Windows or Android, I will keep ignoring it.
Note that the recent TensorFlow Dev Summit had zero announcements related to S4TF, including the post-event blog posts.
From the outside it looks like there is a small group trying to push S4TF, without regards for usability outside Google Cloud, while everyone else is doing JavaScript, Python and C++ as usual.
Yeah, I agree that the tooling at the moment is inadequate, especially on Linux, but both Apple and Google are working towards improving it.
Regarding the TensorFlow Dev Summit, it was initially supposed to be two days long, with the S4TF talk taking place on the second day. One week before the summit, the whole of day 2 got scrapped due to COVID-19, though. So the intention was there at least.
"In a nutshell, differentiable programming is a programming paradigm in which your program itself can be differentiated. This allows you to set a certain objective you want to optimize, have your program automatically calculate the gradient of itself with regards to this objective, and then fine-tune itself in the direction of this gradient. This is exactly what you do when you train a neural network."
Isn't this just declarative programming as we know it from e.g. Prolog, SQL or other places where the programmer declares what their objective is, and it's left up to the interpreter, compiler or scheduler to figure out the best way to achieve that? And now that's being applied to ML (which probably makes sense, since it involves a lot of manual tweaking). Sounds like a great use case for a library, but hardly worthy of being called a new programming paradigm.
• Sure, it could come under the umbrella of "declarative programming", but that's an enormous umbrella, so that doesn't really say much.
• I fail to see how differentiable programming (the idea of expressing the desired computation in terms of differentiable objective functions) is any less of a "paradigm" than logic programming (the idea of expressing the desired computation in terms of logical predicates).
• Depending on the expressiveness of your programming language, every paradigm can seem like it's "a great use case of a library, but hardly worthy of being called a new programming paradigm."
Eh, differentiable programming is a lot simpler and more specific than declarative programming, and only marginally a "paradigm". It just means you can take any (reasonably deterministic and parametrized) function specified in the programming language and calculate its partial derivative wrt some parameter.
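A toy illustration of that idea - forward-mode AD with dual numbers, written in Swift only because it's the language under discussion; this is not how S4TF actually implements differentiation:

    // Each value carries its derivative along with it.
    struct Dual {
        var value: Double
        var derivative: Double
    }

    func *(a: Dual, b: Dual) -> Dual {
        // product rule: (ab)' = a'b + ab'
        Dual(value: a.value * b.value,
             derivative: a.derivative * b.value + a.value * b.derivative)
    }

    func cube(_ x: Dual) -> Dual { x * x * x }

    // Differentiate cube at x = 4 by seeding dx/dx = 1.
    let result = cube(Dual(value: 4, derivative: 1))
    print(result.value, result.derivative)   // 64.0 48.0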
Is the presumption that these parameters are floats, or at least numeric? It seems like if I took the functions from some random program (e.g., GNU diff), most would not be meaningfully differentiable. Or perhaps I'm missing something?
Yes, the parameters you're differentiating with respect to need to be floats.
Though there might be potential for extending the frameworks to e.g. differential cryptanalysis - I'm not knowledgeable enough about it to say how much differential cryptanalysis can be done programmatically.
It's not Prolog/SQL-style declarative programming; the part about setting the objective and calculating the gradient is an example use case, not the whole thing.
You'd like to be able to write your functions using the language's standard function syntax, but have access to both the function and its differentiated form. You can achieve it with macros (in an ad-hoc way) or as a custom language feature (what's being done here, again kinda ad-hoc), or you can use an algebra+interpreter style but at the cost of having to use a less natural syntax (Haskell do notation or similar). The thing that I'd say is closest to the ideal solution is something like stage polymorphism (which is genuinely an exciting new paradigm in my book: it squares the circle of macros versus strong typing), where you can write a function definition in natural function syntax and have access both to the function itself and the AST of the function in a much richer form than what a macro gets (which can be interpreted to produce a differentiated version of the function).
This is the declarative paradigm in the same way that python maintaining an abstract representation of your code that it then passes down to a C interpreter is a "declarative" paradigm.
Sure, there are some analogies, but it is stretching the definition a bit.
I'm no expert, but is there not cause for it being a language feature rather than a library? When you use a library you effectively have a DSL to construct your differentiable program, and lose the facilities of the host language.
It's also not quite declarative programming either. In the same way that you build ordinary executable programs out of smaller executable parts, you build differentiable programs out of smaller differentiable parts. You aren't declaring what outcome you want; you are simply restricting what you build your program out of so that it has properties that allow you to interpret the program differently from how it will run "normally".
I think I can explain a little, but caveat is that it's a little bit of the blind leading the blind, ie. this isn't my field, but I work with people who are working on this and they have talked to me a little about it.
Declarative programming is more or less orthogonal to differentiable programming. You're right that declarative as a paradigm leaves the implementation details up to the "compiler," and so you can have arbitrary implementations created in response to one declared specification. And often what that means is that under the hood it's possible that the compiler can be tweaked to output better results. But the thing is that those compiler changes and optimizations, are just that: arbitrary. You can't "know" or "prove" anything about them, and that means it mostly requires human creativity and intelligence to make progress.
That is fine and all, but differentiable programming is asking/answering the question: Ok, but what if we COULD prove something here? What if the units of computation could be guaranteed to have certain mathematical properties that make them isomorphic to other formalisms?
And why do we care about that?
Well, in math what happens is that some people will start with their favorite formalism and prove a bunch of stuff about them, and figure out how to do interesting calculations or transformations on them. Like "Ah, well if you have a piece of data that conforms strictly to the following limitations, then from this alone we can calculate this very interesting answer to a very interesting question."
But a lot of the time those mathematical paradigms can't talk to each other -- in programming terms, their apis just aren't compatible. Like raster vs vector images. Both image storage/display paradigms are "about" the same thing, but a lot of the operations you can do on one don't even make sense to try on the other, and our ability to translate back and forth between them is a little wonky. Math formalisms are a bit like that, a lot of the time.
So it's very interesting in math world when someone proves that a formalism in one paradigm can be transformed perfectly into a formalism from another paradigm. All of a sudden all the operations available in either paradigm become available in both paradigms because you can always just take your data, transform it, do the operation, then transform the answer back.
(Side note: this is why some people are excited about Category Theory: it's like a mathematical Rosetta stone. I.e. a lot of things can be translated into and out of category theory, and in turn all those things that were previously in separate magisteria become interchangeable.)
Ok so, back to differentiable programming. If you suddenly have a way to conceive of your program / unit of computation as a differentiable function, then right off the bat you get access to all the tools ever created for calculus. The optimization thing where you find the gradient of the program and follow it toward some target is just one of the things. You also get a huge suite of tools that let you enter the world of provably correct programs, for example.
You also get access to all the tools of all math that can translated to and from calculus, which is... a lot of them. I wish I knew more about math so I could rattle off the 100 ways that would help, but I can't, so instead I'll just say that I think it would be a game-changer for creating optimized, robust systems that work way, way better than our current tech.
The height for me was when Chris talked about the Python integration, somewhere after 0:24:30. Marco asks jokingly: "can you also do PHP integration?" Chris is silent for two seconds, then continues as though nobody said anything :-)
To be fair, he does answer the question a minute later.
Nice article; I learned a lot about Swift. I love the named parameters and argument labels: they make calling functions with many parameters much more reliable, and with labels a call reads like English, which is great for immediate comprehension. The lack of named parameters in JavaScript is, for me, a critical, glaring missing capability that should be fixed urgently.
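For readers who haven't seen Swift's argument labels, a tiny sketch (the function and its parameters are invented for illustration):

    // External labels make the call site read almost like a sentence.
    func resize(image name: String, to width: Int, by scale: Double = 1.0) -> String {
        "\(name) resized to \(Double(width) * scale)"
    }

    print(resize(image: "cat.png", to: 300))           // `by:` has a default, so it can be omitted
    print(resize(image: "cat.png", to: 300, by: 2.0))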
Indeed. What that pattern also solves, and AFAIK most static languages lack, is argument forwarding (though Python solves this even more elegantly than JS).
Syntax special-cased to function parameters/arguments is more elegant than reusing the same object&spread syntax everywhere to achieve the same effect? I don't think so.
JS parameters are named to some extent, as in you can refer to them by name within the function body - BUT (a critical shortcoming in my view) you can’t assign values by name when calling functions, you can only assign values by position.
AFAIK the first only works for positional (non-keyword) arguments. The second works, I guess? Not an expert in JS, especially not the "new parts", splats are definitely an improvement!
Yes, an object is better than nothing, but it's awkward syntax compared to proper named parameters, and I don't see it being used very often. In Postgres I find named parameters far more convenient than positional parameters for calling functions; they also convey intention much better and are great for defaults (you can use default=null to get around the ordering restriction Postgres has).
One thing to bear in mind with Swift is how difficult it is to stay up to date with the latest version. Swift versions are tied to Xcode versions and Apple has a history of leaving older versions of OS X in the dust. I'm on OS X Mojave, which is only one version behind Catalina, and I'm already denied access to Swift 5.2. Talk about building a community.
Even worse, macOS is tied to hardware years, so even though I've got 3 working Macintoshes here, only 1 of them can run the current version of Swift (and just barely). My quad-Xeon can't run the latest minor upgrade, even though the compiler is faster than the previous one.
I'm sick of this pointless and never-ending upgrade cycle. I started a new project this week, and it's in Clojure. I find it a much more productive language, and I'm confident I can run it on nearly any computer from the past 20 years.
(This is probably not my most productive comment ever, but I'm pretty frustrated right now with having chosen Swift in the past.)
Yes I too am getting tired of it. I have to get a new SSD every time Apple releases an OS because I don't want to run their latest OS and break all my audio hardware, but need something to build + release on for the app stores.
I have stuck with C++ but I did buy a book on Swift, which is now hopelessly out of date. People can say bad things about C++ (plenty of material!) but at least it is mostly backwards compatible.
> I have to get a new SSD every time Apple releases an OS because I don't want to run their latest OS and break all my audio hardware, but need something to build + release on for the app stores.
Yes that's what's occurring but I end up needing at least 2 for this version and the previous one because you can't always rely on the new OS to function properly for a few months.
(Still, I can't complain too much - it is a free entire OS upgrade unlike OSX upgrades of yesteryear)
This is only a problem if you're using Xcode-bundled Swift, right? It seems like you can easily run Swift on earlier versions of macOS completely independently of Xcode if you want to.
Technically true, though if I’m going to give up the advantages of the standard toolchain and the ability to ship App Store apps, I’m not sure why I’d use Swift at all.
Some people like Swift on its own merits but I do not. It’s a terrible fit for the kinds of programming I want to do.
Slightly more than 7 years, it seems. Younger than my car, my boots, my desk, my stereo, etc. I’m one of those weirdos who replaces something when something better comes along, not just because a certain amount of time has passed. And I can’t afford to spend thousands of dollars just to run a slightly newer version of a compiler that doesn’t even fix the bugs I’m hitting in the old one.
If it's merely about access to Swift as a language, not for Apple platform development, you can run swift docker images for pretty much any Swift version you want (and you can, of course, run them on Mac as well).
I agree that Xcode increases its macOS version requirement frustratingly quickly. But I imagine Swift for TensorFlow is often used outside of the macOS/Xcode context, where you aren’t using Xcode-bundled Swift anyway. Besides, you can download a newer Swift toolchain and tell Xcode to use that. [0]
You do need to use the Xcode-bundled toolchain when archiving for the App Store, but not for development or off–App Store releases. Even for the App Store, you could get away with just one computer on your team having the latest macOS and Xcode, or with using a separate build server instead of archiving on your workstation.
That is something Apple is limiting in their OS though; it's not really a language limitation. Swift is open source and various groups are working on making sure it can run on different platforms and not need Xcode.
Speculation time! Chris Lattner designed Swift while he was at Apple. Apple seems committed to it, and Chris seems to spend his time making it a language that isn't Apple-only. His stint at Tesla was maybe short because Tesla didn't want to adopt Swift. At Google, he got to start a project with it, but him leaving suggests it didn't catch on. SiFive is small and hardware ramps are slow, so maybe he's given up on the Swift-world-domination plan. Again, this is all speculation.
It's nice that they've actually demonstrated something in it. The differential physics engine is pretty freaking cool. Also, if you look at some of the details they talk about it's clear that just having a differentiable language isn't enough: it's still very easy to have gradients which don't work properly if you're not careful, especially in boundary conditions (the example they give is collision detection).
OP likes Swift and compared it to other languages: Python, Go, C++ & Rust, Julia. In the enterprise sphere that I am in, I haven't seen Swift, and it's hard to move a team in this direction at this point.
Python already drives a significant market and has captured talent in numerical differentiable programming, particularly because of the ease with which prototyping and tuning are done with it. Obviously, when we want to scale, we have the conversation about some other option. Personally it's a bit frustrating that Go support for TensorFlow or one of the competitors is not quite satisfactory, and this is surprising; I can't explain it.
Instead of inventing any new language, why doesn't one of you -- Python, Go, C++ & Rust, Julia, or Swift -- complete the job with end-to-end differentiable programming. 'Complete' to me means language-level seamless GPU (or related distributed/parallel architecture) support and language-level deployment ease (the kind of thing done with Kubernetes), plus integration with embedded hardware.
There is more to that rationale than being privately owned. In comparison to C#, Swift is not natively adopted by any of the three big cloud providers (AWS, Azure and GCP), despite being the go-to language for half of all mobile apps (which is a huge market). C# is supported despite being Microsoft-controlled. I believe that Swift does not have the confidence of the community due to Apple's radical technology choices.
IBM may have counter-balanced this.
What do others think is the reason for the slow backend adoption of Swift?
IBM has close to no momentum behind their cloud offerings compared to the big three.
I don't see how IBM will push for Swift in the future. They already pulled out (?) of the swift server working group, and didn't have the kind of pull to begin with.
IMO the most important reasons for the slow adoption are the lacking cross-platform functionality (currently no upstream Windows support + poor Linux tooling) and questionable API design. Both make it difficult for new developers to give Swift a try. Luckily, Windows support is supposed to come with the next version in a few months.
The API design is probably very subjective, but comparing the example in the blog post I don't understand how anyone could prefer:
To play the devil's advocate, I think time.time() is more vague than intuitive. What value does it have? 0? Current time? In what time zone? Is it monotonic?
C# seems like much better time investment if you want to target many platforms and areas.
Now I do business web apps, but previously I did games. Which we published on all app stores + web. I'm also doing web sites. I did cross platform mobile apps with Xamarin.
I can do even frontend web thanks to Blazor. I can use C# for embedded work - but that's not my thing. And you can use C# for machine learning, too.
With Swift I would be restricted. Which may be fine for some people who like to work only on certain kind of apps, but not for me.
Only Rust tempts me to learn and use as a new language.
For me as a long-term C# fanboy these modern times are hard.
C# has benefits now because of the community and wide reach. That was always the argument against C# ;)
I see it picking up traction. It probably won't happen in the Bay Area, but smaller shops worldwide are picking it because it has some advantages.
I don't expect it to take over the world, I just want it to have enough success to keep me earning money and be kept up to date. Which kind of happens. :)
> What do others think is the reason for the slow backend adoption of Swift?
It's not different enough. All the demos I've seen of it felt like it was mostly an "also-ran" language that modernised Obj-C into a significant improvement.
But I've never seen any argument for it being a huge enough improvement over C#, Kotlin or others to be worth it to switch ecosystems.
I feel differently. I read the docs when it was first released and felt it was potentially hitting the sweet spot for "static yet expressive" and "familiar yet modern". I was mainly using Python at the time, and the readability and clarity impressed me; those are areas where I have very high standards.
More radical languages all tend to look a bit "read only" to me but I am aware that might just be unfamiliarity on my part.
C# and Kotlin both have nice features but they are more conservative Java-likes than Swift. Swift had enough modern features and syntax to feel like it would reward the effort of learning it.
But I don't own an iOS device so I'm still waiting for a use case!
.NET and JVM also have Clojure, Scala, F#, C++ to choose from, metaprogramming runtimes, and AOT compilation support (although many seem to forget about this one).
So while Swift is a welcome improvement over Objective-C, it hardly brings anything worthwhile enough to justify dumping either the Java or .NET ecosystems.
> C# and Kotlin both have nice features but they are more conservative Java-likes than Swift. Swift had enough modern features and syntax to feel like it would reward the effort of learning it.
What does it have that Kotlin doesn't?
(As a Scala programmer, Swift and Kotlin look pretty much the same to me; as far as I can see they're both polished but conservative languages that don't fundamentally offer anything that wasn't in OCaml)
They're pretty similar on the surface, but Kotlin is a bit hindered by its Java legacy; generics are better in Swift (and, for example, C#) because of that. Swift has associated values with enums; Kotlin can assign a value to a variable with a when (switch) statement. Swift uses more pass-by-value, Kotlin does a lot more pass-by-reference. Kotlin has Java's form of garbage collection, Swift uses ARC. ARC is simpler to predict but can incur a performance penalty.
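(A rough sketch of what associated values look like on the Swift side; the enum and its cases are invented for illustration.)

    // Each enum case can carry its own payload, and `switch` binds those payloads by name.
    enum NetworkResult {
        case success(bytes: Int)
        case failure(code: Int, message: String)
    }

    func describe(_ result: NetworkResult) -> String {
        switch result {
        case .success(let bytes):
            return "got \(bytes) bytes"
        case .failure(let code, let message):
            return "error \(code): \(message)"
        }
    }

    print(describe(.failure(code: 404, message: "not found")))

Each case carries its own typed payload, which is what the sealed-class comparison further down the thread is about.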
I don't think Kotlin has any form of the "ugly C" API Swift has.
Swift is a bit more secure in terms of "it compiles, it works" than Kotlin, but Kotlin is definitely a huge improvement over Java.
As someone who writes Kotlin and Swift on a regular basis, the Java bits that leak through to Kotlin are quite frustrating at times. If you picked up the language without a feel for Java’s quirks (as I did), Kotlin feels somewhat finicky and temperamental compared to Swift.
In contrast, Swift has done an excellent job of making a real clean break from its Obj-C predecessor, despite sporting full Obj-C interop. It has for most practical purposes none of Obj-C’s quirks for pure Swift code, which is great.
I frequently find myself wishing that Kotlin shared Swift’s level of independence from its predecessor, or that I could just use Swift instead. While Kotlin isn’t a bad language, its ties to Java leave me less than enamored.
> generics are better in Swift (and for example C#) because of that
Disagree. Reified generics let you write functions that do different things based on the runtime type of an object, but such functions are confusing to a reader and best avoided.
> Swift has associated values with enums
Java/Kotlin style enums are my favourite from any language, since they're first-class objects; you can give them a value in a field if that's what you want, but you can also have them implement an interface etc.
> ARC is simpler to predict but can induce a performance penalty.
In the general case ARC has the same pause problems as traditional GC, since you can't tell when one reference is the last living reference to an arbitrarily large object graph.
> Swift is a bit more secure in terms of "it compiles, it works" than Kotlin
How so? If anything I'd have expected the opposite.
Fair enough. Looking into it, swift's "associated values" sound super weird; it seems like it means you can have multiple distinct instances for the same enum case, which breaks what I'd expect an enum to be. In fact an enum with associated values sounds more like a sealed class - both those concepts exist in Kotlin, but they're different and the difference is useful.
Like I said, you can write the same kind of thing very easily in Kotlin by using a sealed class - your example would translate line-for-line. It's a useful feature. But it's very confusing to call it a kind of enum rather than as a kind of class; as your other post acknowledges, it means Swift essentially has two different things that are differently implemented but both called "enum".
>C# and Kotlin both have nice features but they are more conservative Java-likes than Swift. Swift had enough modern features and syntax to feel like it would reward the effort of learning it.
What necessary modern features do you feel Swift has and C# doesn't?
>That said, like C#, it is strongly associated with Apple as a “private” language (despite being open-source).
You can't use Swift on Windows. You can practically target most operating systems with C#.
C# is not only open source, but an international standard: Ecma (ECMA-334) and ISO (ISO/IEC 23270:2018).
Although both started as proprietary languages, both are open source now and both are developed mainly by one big company; the difference is that Microsoft is interested in seeing C# on many platforms, while Apple doesn't have that interest.
I think Kotlin, Go, Rust have greater potential to rise in popularity than Swift.
Swift wasn't open source when it was first released, and first impressions matter. Reputation also matters, and Apple has a poor track record of being a proponent of open source.
I would've been excited about Swift if it weren't for these things.
Agreed, this is the most clickbait title of the week.
From my understanding, the author is someone from outside Google who doesn't know anything about Google's bets beyond what is publicly available, and the author is biased toward the Swift language.
JAX makes more sense, as it uses Python, but more importantly, Google works at a lower level of the stack with TPU, MLIR and XLA. It doesn't really matter what language is on top of that stack.
Does anyone here use Swift for deep learning work? I've yet to see any real advantage, and even if there were, or one were to be added, I think Julia would almost always be a better choice here.
As an outsider, I was disappointed they didn't pick Julia too. It seems very thoughtfully put together, has a good community and some high quality libs. And it's fast.
As the article points out, Google did consider it. IIRC it came down to Julia and Swift in the end. And, given Chris Lattner was leading the effort, there was only really going to be one answer. There's clearly some merit to that: they were expecting to make changes in the compiler (again, if I remember, for e.g. optimising GPU code). If you're going to change the compiler, it's pretty compelling to opt for the language that one of your team designed. And it's not clear (to me) what the implications of commits into the Julia master tree would have been.
That doesn't generate a community though. It's yet to be seen whether that will happen. It would take the level of resource that very few firms can afford to dedicate. Google is one of them: though its patchy record on committing to long term endeavours means it's definitely not a slam dunk. And Lattner leaving further detracts from that confidence.
I'll be interested to see what they come up with. Still think it's a pity they didn't choose Julia. But it's not my project so I don't get to choose.
Sad but true. The adoption of a rapidly-changing language (Swift) mostly under the control of a company that doesn't mind breaking things from one year to the next (Apple) being used by a company that abandons projects monthly (Google) sounds like a waste of time and effort for developer investment.
Well, why, though? Julia has plenty of numerical libraries going for it, and all of them can be used in a differentiable manner (see Neural ODEs, etc.). While Julia was built by a mix of people doing numerical linear algebra and PLs (MIT, UCSD...), Swift seems to be entirely the output of the latter group.
I wonder how important more superficial elements will be in terms of future adoption. As a Python programmer Swift looks much more familiar and if you can easily import familiar packages from Python then maybe this will help overcome the inertia required to switch, especially in industry.
Even if Swift wins over Julia it certainly won't be because of these two points (syntax and Python interop). Julia probably has the best Python interop of any language, and the syntax is fairly simple (although a bit ugly with the 'end' blocks and the allowed Unicode characters).
One of the things that people in my field (physics) love when I showcase Julia is the ability to use Greek letters as variables: this makes mathematical formulae so clean to read!
Don't you dare talk shit about my Unicode characters. Any language written after Unicode's adoption has no excuse not to use them. They're wonderful. They're magical. They let you say exactly what you mean, and keep it condensed enough that you can read it at a glance.
If you're in a field where symbols have existing meanings, it's asinine to make your code clunky and harder to read by not using those existing meanings.
'end' is subjective (I like it). The Unicode characters allow you to write syntax which is very close to mathematical formulae; this can help a lot with reading/understanding in some situations.
Python is my main language, and I struggled mightily to get my head around some aspects of Swift (e.g. randomly sticking question marks in different places until it was happy). As an aside, I also found the API incredibly verbose, the documentation poor, and Xcode to be a bad IDE.
In contrast, I enjoyed Julia - while the documentation isn't great either, after a single afternoon of learning (with plenty of Googling) I was able to port code over from Python and have it run perfectly.
Julia is compiled, unlike Python (at least by default). The moment you call Zygote, it will have to run through the entire program you want to differentiate and fail immediately if any type does not match (without the need for any type annotations besides those on method arguments for multiple dispatch and on structs, as Julia has type inference). As some people say, Julia will fail "Just Ahead of Time", and if you're experimenting with live code in the REPL or Jupyter (the recommended workflow) it could very well be just as good.
That said, for machine learning a static type system (as they are now) is definitely not as much of a boon as in most other areas. Most of it involves tensor operations, and embedding things like a tensor's shape in the generics will often lead to an exponential explosion of possible monomorphizations that the compiler will be forced to create. Even JIT languages like Julia will have trouble, even though they only need to compile a specialization when it's used (for example, StaticArrays will get to a point where the compiler takes so much time that it's not worth it anymore). And even then, I feel like most of the issues that actually take time are deeper than stuff like shapes and not naming dimensions: numerical instabilities, hyperparameter tuning, architecture search and other stuff that benefits more from a language allowing quicker exploration.
Python is also compiled to bytecode in the mainstream implementation. But that is an implementation detail and has nothing to do with the (lack of a) type system.
Julia is also "compiled" down to its (untyped) IR before even starting (when all macros are expanded), but since it's a dynamic language like Python, it can't know types at compile time, so this step can only catch parser errors. It could only type-check at this step if the type were in the definition and not in the variable inside (i.e. if it were static). And both Julia and Python are also strongly typed (there isn't really a language that lacks a type system).
The difference is that whenever a function is called in Julia, based on the types of the arguments the compiler can infer every type of every variable and argument of every subsequent function, immediately compiling them down to machine code (unlike Python, there is no such thing as a Julia virtual machine or even a Julia interpreter outside of the debugger). Whenever you enter a function in Julia it becomes a static program, with all the properties of a static language (for example, you can't redefine types, you can define functions but the program can't see them since they are not part of the compiled code, you can't import libraries, and all types were already checked before running). That's why Julia is fast, and the language was entirely designed to work this way (there are lots of things you can do in Python that you can't in Julia to make this work).
> And both Julia and Python are also strongly typed (there isn't really a language that lacks a type system).
No they are not. At least not for my definition of the term "strongly typed". Of course you can use a different definition, but if, as you say, "there isn't really a language" that does not fulfill your definition, you might want to reconsider its usefulness. The point of classifying languages is moot when every fan comes along and says "my favorite language also has that, if you tweak your understanding just a bit".
I used the more common definition (strong vs weak being orthogonal to static vs dynamic). Both Julia and Python are strong and dynamic (and duck typed).
This is me querying Julia for its typed intermediate representation of the function `foo` I defined if the input is an integer. As you may be able to see, this is a statically typed program. So while Julia is a dynamic language, the interior of function bodies can be static if type inference succeeds. However, if type inference fails, Julia is also perfectly happy to just let chunks be dynamic.
If you're having trouble understanding that code_typed output and are familiar with C, maybe the output LLVM instructions are helpful:
Also, I encourage you to note that nowhere did I manually put a type annotation in my definition of foo. foo is a generic function and will compile new methods whenever it encounters a new type.
You seem to confuse type checking with shape-driven optimization. Of course the latter can borrow a lot of terminology from the former and in a typed language you would be stupid not to use your types, but they are different topics. Any decent JavaScript or lisp compiler or interpreter will do the same.
Also what Julia does is not exactly type inference. It is rather a form of specialization. Type inference just gives you the type for an expression using just the syntactical representation. Nothing more.
As an example, if I was to invoke your function f with a floating point number, would Julia not happily specialize it for that too (assuming factorial was defined for floating points or taken out of the equation)? The thing here is, that a type checker needs to know about all the specializations a-priori.
Not sure what you mean by that example. If "f" accepts a float and you give it a float, it will work. If "f" accepts a number (or a subtype of number), it will work, since float is a subtype of number. If "f" only accepts int, then it will fail, since you can't convert a float to int (as in, there is no defined promotion/conversion rule from float to int). The type check works just like in any language.
Specialization will only occur if there are more than one "f" that handles your type (for example both "f" with a number and with float), and then the compiler will choose the most specialized (float). If you write your Julia program like you would with a static language (for example using over-restrictive types like just float and int) it will return the same error as the static language as soon as it can infer the types (when it leaves the global namespace and enters any function). Using over-restrictive types (and declaring return types for methods) is considered bad practice in Julia, but that's a cultural thing, not a limitation of the type checking.
It happens at compile time in Julia (required for the multiple dispatch to work), the compile time is just interlocked between runtime steps. Calling something with incorrect arguments will fail even if the function is never called during runtime (unless the compiler figures that out using only compile time information and decides not to compile it).
But I understand your point, it's not really like a static language checking as it can't do without running the program, so it's better than fully interpreted (a unit test would catch errors even in paths it didn't take) but worse than static checking (as it can't compile 100% of the valid program at once).
> As an example, if I was to invoke your function f with a floating point number, would Julia not happily specialize it for that too (assuming factorial was defined for floating points or taken out of the equation)?
Yes, but I could then define
bar(x::Int64) = x^2 - 1
and this would create a method which only operates on 64 bit integers.
Now I want to be clear that I'm not saying Julia is a static language, I'm just saying that once specialization occurs, things are much more like a static language than one might expect, and indeed it's quite possible to add mechanisms, for instance, to make a function error if it can't statically infer its return type at compile time.
> A data scientist could use Swift in much the same way as they use Python
Is this really true? Many data scientists don't come from a development background. Low-level Swift code (what's likely to be in TF) can be as obscure as C++.
Will the 'industry lag' without users having to dig into the library and understand the code?
Not all data scientists are the same. Sure, this probably isn't useful for someone without much coding experience whose only experience is with the Keras API, but for people who want to code custom architectures to solve their problem (which is also very common), the static type checking enabled by Swift, among other things, is very appealing.
I'm an iOS developer deeply familiar with Swift and on the path of learning ML. The irritation with "import Foundation" is understandable. Foundation is, wait for it, the Foundation of Objective-C programming and development for the iPhone. It's a library Swift makes use of and needs to inter-op with due to legacy concerns for iOS and Mac developers.
It's not an inherent part of Swift the language, and efforts like those at Google and the open source development of Swift can develop more modern and suitable replacements for those libraries.
I like python, but man I really don't want to make large systems in it. Swift is a great language, and imo the biggest thing holding it back is that it's intertwined with a lot of Apple code. But it doesn't need to stay that way, and for that reason I applaud the efforts to move it beyond just an "app creation" language.
One thing not mentioned in the original "Why Swift for TensorFlow" document, and a major source of conflict when the differentiable programming feature was formally proposed by the S4TF team as a standard Swift feature: Swift has no mechanisms for metaprogramming. This matters because automatic differentiation can be implemented 100% using metaprogramming; instead, the S4TF team had to create certain features internally for this, which is probably one of the reasons why it took so long to get the most basic stuff working.
In retrospect you can really say Swift was a bad choice for the project, because the time to market was much slower than it could have been vs. e.g. choosing Julia. The other thing they didn't take into account was the actual market: the data science ecosystem in Swift is non-existent. You have an excellent deep learning library standing alone, without a numpy, a pandas, a scipy, an opencv, a pillow, etc., which makes doing real applications with it nearly impossible.
That said, Swift as a language is amazing: doing parallel computation is so easy, and not having a garbage collector makes it super efficient. It's the kind of thing we need, but the language right now is not in the right state.
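(As a small illustration of the "parallel computation is easy" point, here is a sketch using Grand Central Dispatch; the data size and worker count are arbitrary, and each worker writes only to its own slot so no locking is needed.)

    import Dispatch

    let data = Array(1...1_000_000)
    let workers = 8
    let chunkSize = (data.count + workers - 1) / workers
    var partials = [Int](repeating: 0, count: workers)

    // Each worker reads the shared (immutable) array directly and writes its partial
    // sum into its own slot: shared memory, no copying between processes.
    partials.withUnsafeMutableBufferPointer { buffer in
        DispatchQueue.concurrentPerform(iterations: workers) { w in
            let lower = w * chunkSize
            let upper = min(lower + chunkSize, data.count)
            buffer[w] = data[lower..<upper].reduce(0, +)
        }
    }

    print(partials.reduce(0, +))   // 500000500000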
There are some really neat things in the Swift example that they compare to Python, like the range literal similar to Ruby's, and the fact that "+" is a function like in Lisp, although I'm really not a fan of the CFAbsoluteTimeGetCurrent().
But the python example doesn't make me trust the rest of the article. It is clearly a swift example, translated verbatim to python.
Idiomatic Python would be this:
    import time
    for it in range(15):
        start = time.time()
        total = sum((it,) * 3000)
        end = time.time()
        print(end - start, total)
Which is shorter, and way faster.
Now of course, Python is slower than swift (although a numpy version would not be, but I get it's not relevant in general purpose machine learning). But misrepresenting a language is not a good way to make a point.
Hi, author here, thanks for taking the time to read the article!
The objective of the demo was not to see which language could sum up a bunch of numbers the fastest. You could keep optimizing that until you are left with just `print(<the resulting number>)`. The objective was to have a simple example of looping over an array a bunch of times. The only reason I ended up summing the numbers in the array and printing them was so that LLVM wouldn't optimize it away and be unfair towards python. I actually wrote it first in Python tbh.
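(For reference, a rough Swift sketch of the same shape; this is not the article's exact code, which times with CFAbsoluteTimeGetCurrent, I'm just using Foundation's Date here.)

    import Foundation

    // Same idea as the Python version: build an array each pass, sum it, time it.
    for it in 0..<15 {
        let start = Date()
        let values = Array(repeating: it, count: 3000)
        let total = values.reduce(0, +)
        let elapsed = Date().timeIntervalSince(start)
        print(elapsed, total)
    }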
I find optional?s & protocol-oriented programming enable clear & concise mental models. I like swift's syntax better than Java/Kotlin, C++ & Rust imo. And it's damn fast.
"If you’ve ever used PyTorch, TensorFlow, or any of the other big machine learning libraries, they all support this feature, but only if you’re using their particular library specific operations. What’s more, working with gradients in these Python libraries is not as lightweight, transparent, or well integrated as it is in plain Swift."
This reminds me of concurrency and transactional memory in Clojure. You can have all of those things at a library level, but building it into the language... Well it kinda FORCES you to deal with them, for good and ill.
If you want nice DP, try Julia with Flux and Zygote. Zygote perhaps has a bit of a distance to go for performance, but it just feels totally natural to code with these libraries... if you assume the performance gaps will be closed (it's good enough).
Because for mobile devices you don't want to ship LLVM. Plus it's easier to package binaries than scripts which usually require Docker to deal with dependencies in a sane way.
I had to google "differentiable programming" to see if they were really touting automatic differentiation as some futuristic language feature. Honestly the only real response to this should be :trollface:
I've been doing software development in data science, large scale optimization and machine learning for over 15 years... I've needed automatic differentiation in my language .... exactly never. I mean, most of the languages I use regularly are capable of it, and it is a neat trick; it's just not that useful.
The best part of this article is Yann and Soumith tweeting that they need Lush back (and not because of automatic differentiation). I agree; it's still my all-time favorite programming language, and I don't even fool around with Deep Learning.
https://twitter.com/jeremyphoward/status/1097799892167122944
Sure: great - it can be used with other sorts of models as well, but it's really not that big a deal, and can be implemented in a lazy afternoon. I'm pretty sure lack of this as a first-class language feature is holding back exactly no DL frameworks or programming languages, and gluing it onto Swift isn't going to make DL people use it. I've never used Swift's REPL; I suppose that might actually be something that gets people on board.
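(To give a feel for the "lazy afternoon" claim, here is a minimal forward-mode autodiff sketch with dual numbers, written in Swift for illustration; this is not how S4TF actually implements differentiation.)

    // A dual number carries a value and its derivative, and the arithmetic
    // operators propagate both through the calculus rules.
    struct Dual {
        var value: Double       // f(x)
        var derivative: Double  // f'(x)
    }

    func + (lhs: Dual, rhs: Dual) -> Dual {
        return Dual(value: lhs.value + rhs.value, derivative: lhs.derivative + rhs.derivative)
    }

    func * (lhs: Dual, rhs: Dual) -> Dual {
        // product rule: (fg)' = f'g + fg'
        return Dual(value: lhs.value * rhs.value,
                    derivative: lhs.derivative * rhs.value + lhs.value * rhs.derivative)
    }

    let seed = Dual(value: 3, derivative: 1)   // differentiate at x = 3
    let result = seed * seed + seed            // f(x) = x*x + x
    print(result.value, result.derivative)     // 12.0 7.0  (f(3) = 12, f'(3) = 2*3 + 1 = 7)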
Hi, I'm working on making a tensor lib in Rust (think numpy + autodiff) to learn about these topics. There isn't much information online about how projects like numpy and autograd work under the hood.
Do you have any ideas/tips/resources about how it could be done?
Numpy is basically lapack. You'd be hard pressed to replace that with something nearing its performance. For autodiff, I dunno, how about John Mount's explanation?
It had a lot of cranky pieces when it made the 64 bit transition, and afaik it never got GPU support. I'm pretty sure it was ultimately abandoned because students didn't like it.
I agree that cpython threads are not parallelism, but python still comes with built-in support for multiprocessing and I've been earning my bread using that within the past two years, so unless you have to use in-process parallelism for some weird reason, python and your OS scheduler of choice has you covered there.
> have to use in-process parallelism for some weird reason
Needing shared memory paralellism is a weird reason now? Pretty much any parallel algorithm that's not embarassingly parallel is going to perform better with threads able to share memory than with message passing between processes.
Exactly; pretty much any parallel algorithm is embarrassingly parallel (gather or calculate a bunch of data, process it, and merge it together), so I question the continuous need for in-process parallelism instead of solving it trivially with processes.
Python is not great for fine-grained data parallelism (SIMD, GPU), which is increasingly the lion's share: non-starter for direct inline and pretty bad for DSLs. The result is runtime heroics for embedded ~dataframe DSLs (pyspark, rapids.ai) with high overhead.
OTOH, those heroics do happen, and have been OK so far. Accelerating differentiable programming is basically an extra transform layer on top of accelerating data-parallel programming. Thankfully, our team writes zero raw OpenCL/CUDA nowadays and instead writes fairly dense dataframe code. Similar to how adding async/await did a lot for web programming in Python, I'm curious what it'll take for data-parallel fragments (incl. differentiable ones). If it weren't for the language's resistance to UDFs + the overhead, and legacy libs built around blocking, we'd be happy.
I'm not sure what your domain is, but for machine learning that will be slow:
- Parallel matrix multiplication may be embarrassingly parallel, but you have a reduction step that is not trivial to parallelize across processes. You also need to take care of register tiling, L1 cache tiling and L2 cache tiling. It is way easier to do this in OpenMP.
- Parallel Monte Carlo tree search: it's much easier and more efficient to spawn/collect trees with a proper spawn/sync library.
Coming from a complicated background (Pascal, C++, Ruby, TypeScript, Delphi, Kotlin), I'm pretty new to Python myself. Python is dead in the water for trivial things like parsing datetimes and floats, or writing complex algorithms; C++ pretty easily beats it by orders of magnitude. However, it is pretty good at other things, such as (off the top of my head) running a regex over a bunch of text to find subgroups, serializing/deserializing data, scipy's curve fitting, etc. So as long as you are using some module that calls natively compiled code, you are pretty much golden.
Looks like I'll stick to the wonderful world of C++ then! It seems that when a language has to resort to calling another language's compiled modules for speed, it has lost the fight for efficiency.
So what about Fortran if the modern languages in this space fail to satisfy? It's fast, statically typed, and often used for heavy duty number crunching on supercomputing architectures, so obviously it handles parallelism well.
There are several things that make me give this a thumbs down.
First: "All these usability problems aren’t just making it more difficult to write code, they are unnecessarily causing the industry to lag behind academia."
Industry lag behind academia? Seriously, have you any idea of the amount of data that industry crunches these days, while some in academia still think half a GB is "big data"? Or the amount of money there is in the kind of ML that industry does (which is the main reason why anyone cares about this field at all).
Also, your whole post is about innovation going on at... Google. Then Apple. Not academia.
Secondly: "swift is fast", point taken. Then you go off on a drool about your favourite syntactic sugar. My own experience is that this is exactly the kind of thing you do not need in an enterprise-grade product.
While I'm a fan of Swift and am glad that it's getting a lot more attention outside of the iOS world, I'm not sure it's the best language to be used in the data/ML world. Yes, it's faster than Python, yes it's almost as easy to read, and yes its standard library is powerful. But when interacting with Swift as a framework, I can't imagine how much of a pain in the ass the strict type system will be for researchers. I get it. Types help developers write less error-prone code and blah blah. But the main target here is ML researchers. They aren't developers. They just want to write down their theory in code and run it to train their models. They aren't writing complex iOS apps here.
To me this seems attractive for those who want to put ML into production.
Having a solid type system and an easily interoperable language could lead to a lot fewer bad surprises than using Python.
Something of an aside: closures don't need to be "unnamed functions which capture their context", right? A function that "closes over" its context but is named is still a closure, unless I'm wrong.
You're correct, but there's a bit of "linguistic drift" at play, I guess. The use of the term "closure" in the Swift ecosystem has come to refer almost entirely to the anonymous kind.
(Side note, named functions in Swift are indeed also closures:
    let x = 10
    func f() {
        print(x)
    }
    f() // Okay; prints 10 as you'd expect
)
Yeah, that seems to be the case in the JS community as well – I see a lot of people referring to the arrow function syntax as "closures", even though named functions are closures in JS too.
Apple has little interest in supporting Swift outside of their platforms. Chris Lattner has left Google, so the future is uncertain. Maybe that's a pessimistic outlook though.
My gripe with Swift (and I've written a lot of it) for purposes requiring a lot of number crunching is the performance issues associated with the default use of copy-on-write types and the ease of accidentally writing O(N^2) code. In Objective-C the behavior was more obvious, with the NSMutable* types being front and center.
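(A contrived sketch of how that accidental O(N^2) happens: keeping a second live reference to an Array defeats copy-on-write's cheap path, so every mutation has to copy the whole buffer first.)

    // The only difference between these two is the extra live reference.
    func buildFast(_ n: Int) -> [Int] {
        var data = [Int]()
        for i in 0..<n {
            data.append(i)       // storage uniquely referenced: amortized O(1) append
        }
        return data
    }

    func buildSlow(_ n: Int) -> [Int] {
        var data = [Int]()
        var snapshot = [Int]()
        for i in 0..<n {
            data.append(i)       // after the first pass, `snapshot` shares the storage, so this copies the whole buffer
            snapshot = data      // re-share after every append: overall O(n^2)
        }
        _ = snapshot
        return data
    }

    let a = buildFast(50_000)
    let b = buildSlow(50_000)    // same result, dramatically slower
    print(a == b)                // true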
Depends to which extent working on Swift at Apple actually means Swift on Android, Linux and Windows, at an ecosystem and tooling level comparable to other programming languages.
The problem is the GIL prevents threads from making progress simultaneously, even with synchronization and locking primitives available.
But even so, Python is a little slow when you start thinking about threading. You'd be better off using Rust or Go or something rather than the half-baked support found in scripting languages.
> the half-baked support found in scripting languages.
It's a long time since I've heard that phrase used disparagingly. Didn't we all decide to use the term "dynamic languages" just to avoid the judgemental overtones associated with "scripting".
> It's a long time since I've heard that phrase used disparagingly. Didn't we all decide to use the term "dynamic languages" just to avoid the judgemental overtones associated with "scripting".
I didn't mean to disparage. While I did intend to scrutinize Python's multiprocessing (my experience with it was less than fun), my use of the term "scripting language" was entirely subconscious.
But I do have an anecdote.
A decade ago I was using Python and Flask for building web apps. Now I use Rust instead, and many of my colleagues are choosing to use Go.
I still use Python for scripting and now also use it for ML. But I wouldn't use it for the web anymore.
I think the landscapes and use cases are shifting since there are new tools available. Python is doing new things (pytorch, tensorflow), but less of the things I used to use it for (Flask, Django, ...)
The growth of a high-performing language for data will probably come from Nim. It has expressiveness like Python and speed like C, though the community needs to develop, and it should drop efforts like deploying to JS, etc.
No - a differentiable program would still be running on a real computer (which is a finite approximation of a Turing machine). So for a differentiable program to solve the halting problem, that would imply that the underlying machine could solve the halting problem (which it can't).
My read is that this is a technique that can make it easier to write certain types of machine learning applications - which means less coding, and more giving the application data so that it can fine-tune its behavior. There are also applications we don't know how to write, but we can train a program (a machine learning model) to behave that way.
There would be domains where this would be a major advantage and (probably) applications where it wouldn't be particularly helpful.
Best to think of it as another tool in the engineering toolbox.
Obviously "more powerful" doesn't imply the capability of solving the halting problem. What exactly do you want to say? You are not impressed with "differentiable programming"?
It is misnamed. It makes it sound like some kind of Turing-complete approach, like when DeepMind invented differentiable Turing machines. This post just describes differentiating mathematical functions, which has been around forever, and which you can find in Matlab, Mathematica, Python libs, explained in SICP, etc. Why doesn't Google just buy a Matlab license and be done with it? Why bother embedding it in a whole new programming language?
Technically, RNNs, including LSTMs, are already Turing complete. Neural Turing Machines mostly decoupled memory from the hidden layers so the memory size can grow without a quadratic increase in the number of parameters, which helps with the unbounded part of Turing machines; they also helped inspire many attention-based models that came after. Also, Matlab isn't really related here, and automatic differentiation is different from symbolic differentiation; the direct comparison is with those Python libs like TensorFlow, PyTorch and JAX.
Differentiable programming, in the end, is just a way of making something that you can already do better (just like you could create neural networks long before Theano/TensorFlow/Torch, but it was not as streamlined). With a differentiable programming approach you can get something as dynamic as PyTorch, with the performance optimizations and deployment capabilities of a TensorFlow graph, and with an easy way to plug in any new operation and its gradient by writing it in the same host language (so no need to learn, or restrict yourself to, the TensorFlow/PyTorch-defined methods/DSL).
You don't even need to change the compiler or define a new language for it. Julia's Zygote [1] is just a 100% Julia library you can import, to which you can add at any point any custom gradient even if the library creators never added it, and then run either on CPU or GPU (which you can also fully extend using just pure Julia [2]). And of course, you can also use a higher-level framework like Flux [3], which is also high-level Julia code.
I think the heart of differentiable programming is just another step in the evolution: from early (Lua) Torch-like libraries that gave you the high-level blocks to compose, to autodiff libraries that gave you easy access to low-level math operators to build the blocks, to the point where you can easily create your own operators to create the high-level blocks.