I can see on the page that the last PDL release was in February, okay. The presence of activity indicates that there are people using PDL and, more than that, maintaining it. Since PDL has existed for many years, these people presumably didn't jump into it yesterday. There may very well be large codebases that have been using it for a long time.
All the comments suggesting that this somehow shouldn't be and that people should move away from PDL are depressing, in that years of effort and thousands of lines of code that are probably doing good work are dismissed in a split second, just like that.
If a bank is still using Cobol, it is interesting and a testament to how a Cobol programmer can still make a good living on it. But if a scientist is using Perl to carry out calculations, this is somehow bad?
This is the sentiment I've been seeing online, especially here.
1/3 of the posts here are "<old tool that already exists in a stable, mature codebase> in {rust|go|whatevernewlanguagecomesnextweek} released v0.0.1"
Perl is a great language that does its job for many, many things, especially with CPAN, and it has been doing so for years. You can buy a 20-year-old book on Perl and 99.99% of the example code from that book still works, and the same goes for projects from that era (which cannot be said for Python, where developers and distro maintainers seem to enjoy removing usable, mature projects just because they're written for Python 2.7 and incompatible with 3+).
If I have to write a script once, one that I can forget about and just expect to run for years, Perl will always be my first choice.
> If a bank is still using Cobol, it is interesting and a testament to how a Cobol programmer can still make a good living on it.
No it isn't. It's a testament to how backward that bank is. You'll see upvoted contrary takes here, sure, but that's because middlebrow contrarianism is a good way to get upvoted on HN.
I loved using Perl for projects in the early days of the Web. For anything even remotely expressive or artistic, Perl was the way to go. But when it comes to communicating scientific insight, I have to maintain my doubts about using the write-only language. Still, if Inline::Python works as well as the comments above indicate, then I might be tempted again. Number crunching in Python, pretty pictures and presentation in Perl... Hmm.
Don't worry, I think Perl is good. I like the freedom you get when writing it. It's especially good at tasks that shell is just not quite expressive enough for. I used it heavily in a sysadmin job and it was way easier than writing ansible scripts.
The only thing holding me back is that if I want to use ${library}, I probably can't do so from Perl.
> If a bank is still using Cobol, it is interesting and a testament to how a Cobol programmer can still make a good living on it. But if a scientist is using Perl to carry out calculations, this is somehow bad?
Yes. The negative externalities of doing scientific analyses in Perl are much greater than a bank having a legacy COBOL codebase. Only a handful of engineers within the bank will ever see that COBOL codebase. Science is globally collaborative; many people at many different institutions across the world would have to deal with some idiosyncratic scientist’s decision to write their analysis in Perl.
Also, the bank only has a COBOL codebase because it’s reluctant to make major changes to an extremely important system that’s been working flawlessly since the early 60s. There’s absolutely no reason to start a totally brand new project in Perl (or COBOL, for that matter), when far superior alternatives exist.
> idiosyncratic scientist’s decision to write their analysis in Perl.
In my experience, looking at scientist-produced code, the programming language matters very little. It is not hard to produce something completely inscrutable and non-replicable in Python and R, the same way it's been done for ages using SAS, Stata, MATLAB, etc.
I still see people rolling out their regression computations using matrix inversion and calculating averages as `sum(x)/n`.
I really like PDL when I can use it. I have had problems building it from source on Windows in the past, but it is actually a very well thought out library.
Also worth mentioning, you can get a lot of mileage out of GSL[1].
> It is not hard to produce something completely inscrutable and non-replicable in Python and R the same way it's been done for ages using SAS, Stata, MatLab etc.
It’s possible to write inscrutable code in any language, but some languages sure make it easier.
Syntax issues aside, the main advantage of Julia/Python/R (the latter’s syntax might even be worse than Perl’s) for scientific computing is their ecosystems. A language for a particular use case is only as good as the packages available for that use case. The scientific package ecosystems for Ju/Py/R are far richer than that of Perl, simply because their userbases are much larger. Thus, a scientist using Perl would likely be forced to roll a lot of their own functions, which makes the code idiosyncratic and more likely to contain bugs. (To use one of your examples, people might implement OLS regression by manually computing the hat matrix because no stats package exists for the language they’re using. Now imagine their language lacks something more complicated, like a robust MCMC sampler package à la PyMC3 or STAN, and they have to roll that themselves. Yikes.)
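To make the hat-matrix point concrete, the contrast looks roughly like this in NumPy (a toy design matrix made up purely for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])   # toy design matrix with intercept
    y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.1, size=50)

    # Hand-rolled OLS straight from the textbook: hat matrix H = X (X'X)^{-1} X'
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    y_hat = H @ y                                   # fitted values
    beta_hand = np.linalg.inv(X.T @ X) @ X.T @ y    # coefficient estimates

    # The recognizable off-the-shelf route: a standard least-squares solver
    beta_lib, *_ = np.linalg.lstsq(X, y, rcond=None)

Both give essentially the same numbers on a well-behaved problem like this; the difference is that a reader has to check the first version by hand, while the second is a call everyone recognizes.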
And that’s not even getting into the value of the languages for interactive scientific computing, which is how most of it gets done these days. For instance, there’s no official Jupyter notebook support for Perl (although unofficial plugins exist, they don’t support inline graphics/dataframes/other widgets), and the REPLs for Julia/Python/R are much more modern and fully-featured than PDL2.
BTW, I agree the GSL is great for building standalone tools, but it’s totally irrelevant for any interactive work.
> For instance, there’s no official Jupyter notebook support for Perl
Not sure how official support would work in the Jupyter Project since anybody can write a kernel. I wrote the Perl one (IPerl) and that has existed since 2014 (when Jupyter was spun off from IPython). It supports graphics and has APIs for working with all other output types.
Now I do need to help make it work with Binder, but it does work.
---
The other point about MCMC samplers is valid. This is why I wrote a binding to R to access everything available in R and why I use Inline::Python sometimes. I should create a binding for Stan --- should not be hard --- at least for CmdStan at first, then Stan C++ next.
> Now imagine their language lacks something more complicated, like a robust MCMC sampler package à la PyMC3 or STAN, and they have to roll that themselves. Yikes.
How is that relevant? CPAN is a package repository, not a MCMC sampling library. Can you point me to a Perl library that implements an API for constructing a probabilistic graphical model and then performs inference on it via MCMC, like PyMC3 or STAN? Is it as robust and fully featured as either of those?
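For reference, the kind of API I mean looks roughly like this in PyMC3 (a minimal sketch with placeholder data):

    import numpy as np
    import pymc3 as pm

    data = np.random.normal(loc=1.0, scale=2.0, size=100)     # placeholder observations

    with pm.Model():
        mu = pm.Normal("mu", mu=0.0, sigma=10.0)               # prior on the mean
        sigma = pm.HalfNormal("sigma", sigma=5.0)              # prior on the scale
        pm.Normal("obs", mu=mu, sigma=sigma, observed=data)    # likelihood
        trace = pm.sample(1000, tune=1000)                     # MCMC (NUTS) inference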
Stan isn’t really written in any of those languages either.
The Python pystan is a wrapper that ships data to/from the Stan binary and marshals it into a Python-friendly form; I think Julia's is similar.
I’m not exactly volunteering to do it, but a PerlStan would not be that hard to implement. As for scientific communication, a point you raised above, I don’t think it’d be too bad. Most readers of a paper would be interested in the model itself, and that would be written in Stan’s DSL regardless.
Fine, STAN is a bad example since it’s written as a DSL parsed by a standalone interpreter.
But tons of other numerical methods are also missing from Perl. To use another stats example: in another comment I noted that PDL only supports random variable generation for the common distributions (e.g. normal, gamma, Poisson). Anything beyond the stats-101 level and you're on your own.
In bringing up CPAN, the other poster's point might have been that Matlab/Python/Octave don't generally contain native implementations of these either. A lot of MATLAB and NumPy is a wrapper around BLAS/ATLAS, for example.
One could do the same with Perl, and in fact, people have. If you need random variates from a Type 2 Gumbel distribution, for example, Math::GSL::Randist has you covered https://metacpan.org/pod/Math::GSL::Randist#Gumbel
Honestly, I'm not rushing to convert our stuff to PDL, but I did want to push back a little on the idea that python is The One True Way to do scientific computing. It's a fine language, but I think a lot of its specific benefits are overstated (or mixed in with the general idea of taking computing seriously).
Though when I do stats, I often reach for R and have done some work in the past to make PDL work with the R interpreter (it currently has some build bitrot and I need to fix that).
> sum(x)/n is much faster than using pythons built in mean function.
`statistics.mean` uses `_sum` which tries to avoid some basic round-off errors[1]. I think the implementation of `_sum` is needlessly baroque because the implementors are trying to handle multiple types in the same code in a not so type-aware language. Regardless, using `statistics.mean` instead of `sum(x)/len(x)` would eliminate the most common rounding error source.
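A tiny illustration of the difference, with values contrived to trigger the cancellation:

    import statistics

    x = [1e16, 1.0, -1e16]        # contrived: the 1.0 is lost when added to 1e16 in float64

    naive = sum(x) / len(x)       # 0.0 -- the naive running sum cancels the 1.0 away entirely
    robust = statistics.mean(x)   # ~0.3333 -- _sum accumulates without intermediate rounding, so the true mean 1/3 survives

    print(naive, robust)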
As for statistical modelling handled by directly inverting matrices, there the problem is singular matrices that appear to be non-singular due to the vagaries of floating-point arithmetic, in addition to failing to use stable numerical techniques.
The point remains. The detriment to science is people who convert textbook formulas directly to code instead of being aware of implementations with good numerical properties.
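That failure mode is easy to reproduce with a nearly collinear design; a quick sketch with made-up data:

    import numpy as np

    rng = np.random.default_rng(1)
    x1 = rng.normal(size=200)
    x2 = x1 + 1e-8 * rng.normal(size=200)          # almost an exact copy of x1: ill-conditioned design
    X = np.column_stack([np.ones(200), x1, x2])
    beta_true = np.array([3.0, 1.0, 1.0])
    y = X @ beta_true                              # noiseless, so the exact answer is known

    # Textbook route: invert the normal equations. This squares the condition number,
    # so the ~1e8 conditioning of X becomes ~1e16 and the answer degrades badly.
    beta_inv = np.linalg.inv(X.T @ X) @ X.T @ y

    # Stable route: an SVD-based least-squares solver working on X directly.
    beta_lsq, *_ = np.linalg.lstsq(X, y, rcond=None)

    print(np.linalg.cond(X), np.linalg.cond(X.T @ X))
    print(beta_inv)   # typically visibly off from [3, 1, 1]
    print(beta_lsq)   # close to [3, 1, 1]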
See also my blog post "How you average numbers matters"[2].
> Now, in the real world, you have programs that ingest untold amounts of data. They sum numbers, divide them, multiply them, do unspeakable things to them in the name of “big data”. Very few of the people who consider themselves C++ wizards, or F# philosophers, or C# ninjas actually know that one needs to pay attention to how you torture the data. Otherwise, by the time you add, divide, multiply, subtract, and raise to the nth power you might be reporting mush and not data.
> One saving grace of the real world is the fact that a given variable is unlikely to contain values with such an extreme range. On the other hand, in the real world, one hardly ever works with just a single variable, and one can hardly ever verify the results of individual summations independently.
Correct algorithms may be slower, but I am hoping that it is easy to understand why they ought to be preferred.
This is a big plus compared to Python. I dread installing Python for some project that needs it. Is it wheel, pip, pip3? pip3.7, apt install pip, poetry? Maybe pyenv? Good god what goes in my .bashrc? What in my PATH?
To be clear, you program using ed and the command line instead of ever using an IDE? That's the programming environment about which you said 'line editors are nice'?
This is the same nonsense game people play where someone says "that's like taking a ship to cross the ocean instead of a plane" and someone else says "I like taking ships, I want to get fresh air for three weeks of solitude instead of arriving the next morning".
I do for every language that isn't Common Lisp! I learned to program on a machine that had a copy of vi that refused to go into visual mode; instead of trying to fight it, I started using ed.
Everything about UNIX is oriented toward being a programming environment in itself. There are plenty of developers who just use UNIX as their IDE. Drew DeVault, as an example, is pretty notorious for it.
Complexity distracts and makes efficiency impossible. Modern IDEs are nothing but complexity. UNIX-as-IDE simplifies.
UNIX has all of the tools, and unlike IDEs, it allows you to seamlessly add them on. UNIX is more "cybernetic enhancement" to the IDE's "prosthetic limb."
ed is actually really useful if you're wanting to rapidly iterate on a program. There's a reason Ken Thompson advocated for it up to his retirement (which was very recently, mind).
This is mostly because of the compiler situation. Intel forbids redistribution of theirs, and GNU's isn't up to snuff, which continues to handicap the language.
I mean that someone using Perl for scientific computing makes it harder for others to collaborate on the project, which in turn makes the project less scientifically valuable.
This is due to two main reasons. For one, Perl’s incredible syntactical flexibility makes it easy to write “clever” one liners that are hard to comprehend. Speaking from experience, scientists tend to be the sort of people who value “clever” code over “clean” code.
Secondly, the Perl package ecosystem for numerical/scientific methods just isn’t as fully featured as the Julia/Python/R ecosystems. This leads to individuals having to reimplement methods themselves, which leads to idiosyncrasies and likely bugs. For instance, I see no way to generate Wishart random variables (or RVs from other distributions beyond the common ones) in PDL. Julia, Python (via SciPy), and R all have full featured support for many different distributions beyond the common ones.
A Perl user would thus have to implement this themselves. Someone else reading their code would have to both familiarize themselves with the custom function’s syntax (as opposed to immediately recognizing the standardized scipy.stats.wishart, which behaves like any other scipy probability distribution class) and likely check for any bugs, since a standard package is far more likely to be correct than some random one-off function. I’ve had the unfortunate experience of working with someone who refused to use off-the-shelf libraries for numerical methods, and unsurprisingly their code was not only hard to read (since there was no standardization) but also full of bugs.
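For what it's worth, the standardized call in question is about as short as it gets (a minimal sketch):

    import numpy as np
    from scipy import stats

    # Frozen Wishart distribution: behaves like every other scipy.stats distribution object
    w = stats.wishart(df=5, scale=np.eye(3))

    draws = w.rvs(size=1000, random_state=0)   # 1000 random 3x3 positive-definite matrices, shape (1000, 3, 3)
    logp = w.logpdf(draws[0])                  # log-density of one draw, same interface as the rest of scipy.stats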
> Someone using Perl for scientific computing makes it harder for others to collaborate on the project
Only for the people who never had the ambition or desire, or never took the time, to develop new, valuable skills. That is not necessarily a bad thing. Those people are time sinks.
In any case, you can decide whether or not to document your code, in any language. "; # this line does that" is not hard to write.
And different science teams collaborate, but they also compete for money. So everybody doing the same thing can lead to "the most dishonest takes all" and kill all the other teams. Sometimes it is useful to protect yourself from the people trying to backstab you and steal your best tools. Tools that took you decades to develop and polish. Not ALL is freely shared in science.
And yes, my Perl scripts were a triple headache to write, but they still work flawlessly after all these years.
Perl is just a tool to do something, and you should never use one (and the same) tool for everything unless your goal is to be a mediocre scientist sucking from other people's efforts all the time.
PDL also has support for many of those distributions beyond the common ones. All the GSL ones in fact. Except Wishart didn't get a binding because that was just added to GSL in 2018. So thanks! I'll add the one line needed to bind that to PDL now and check if others are missing.
These are mostly the negative externalities of using computer code in general, not of Perl in particular.
It is no harder to collaborate with Perl than with R (been there, done that) or Julia (have not done that).
PDL is addressing your point about libraries.
Horses for courses. If your research involves streams of text (a lot of things are streams of text) and transforming them then Perl is a likely contender as the best choice.