Perl Data Language: Scientific Computing with Perl

sivoais · on June 9, 2021

Hi, PDL core dev here. Feel free to ask me anything about it.

The last release wasn't in February, it was just last week! <https://metacpan.org/release/ETJ/PDL-2.050>.

I agree with many of the commenters here that Python has a lot of great libraries and is a major player for scientific computing these days. I also code in Python from time to time, but I prefer the OO modelling and language flexibility features of Perl.

Speaking for myself and not the other PDL devs, I don't think this is an issue for Perl-using scientists as Perl can actually call Python code quite easily using Inline::Python. In the future I will be working on interoperability between the two better specifically for NumPy / Pandas. This is also the path being taken by Julia and R.

enriquto · on June 9, 2021

Looks great! I used perl a lot when I started programming and it is lovely to see it alive and kicking with scientific computing!

As a "heavy" user of scientific computing, I must say that the name "data language" is a bit disheartening... It echoes of useless "data frames" not of cool "sparse matrices" which is what I actually need. Does PDS support large sparse matrices? I grepped around the tutorial and the book and the word "sparse" is nowhere to be found. Yet it is an essential data structure in scientific computation. Are there any plans to, e.g., provide an interface into standard libraries like suitesparse?

sivoais · on June 9, 2021

I plan to improve that, but will need to figure out the design (perhaps with something from Eigen). There is <https://metacpan.org/pod/PDL::CCS>, but it is not a real full PDL ndarray and is actually a wrapper around the PDL API.

1996 · on June 9, 2021

Very interesting, thank you!

Do you have a tutorial and some examples? If not, could you write one?

I sometimes deploy perl code at large scale for financial computing where only performance matters: with XS the overhead is low while gaining language flexibility.

Even in 2021, this is usually faster than alternatives by orders of magnitude.

PDL could be a good addition to our toolset for specific workloads.

sivoais · on June 9, 2021

Here is a link to the PDL book <http://pdl.perl.org/content/pdl-book-toc.html>.

I can share some examples of using PDL:

- Demos of basic usage <https://metacpan.org/release/ETJ/PDL-2.050/source/Demos/Gene...>

- Image analysis <https://nbviewer.ipython.org/github/zmughal/zmughal-iperl-no...> (I am also the author of IPerl, so if you have questions about it, let me know. My top priority with IPerl right now is to make it easy to install.)

- Physics calculations <https://github.com/wlmb/Photonic>

- Access to GSL functions for integration and statistics (with comparisons to SciPy and R): <https://gist.github.com/zmughal/fd79961a166d653a7316aef2f010...>. Note how PDL can take an array of values as input (which gets promoted into a PDL of type double) and then returns a PDL of type double of the same size. The values of that original array are processed entirely in C once they get converted to a PDL.

- Example of using Gnuplot <https://github.com/PDLPorters/PDL-Graphics-Gnuplot/blob/mast...>.

---

Just to give a summary of how PDL works relative to XS:

PDL allows for creating numeric ndarrays of any number of dimension of a specific type (e.g., byte, float, double, complex double) that can be operated on by generalized functions. These functions are compiled using a DSL called PP that generates multiple XS functions by taking a signature that defines the number of dimensions that the function operates over for each input/output variable and adding loops around it. These loops are quite flexible and can be made to work in-place so that no temporary arrays are created (also allows for doing pre-allocation). The loops will run multiple times over that same piece of memory --- this is still fast unless you have many small computations.

And if you do have many small computations, the PP DSL is available for the user to use as well so if they need to take a specific PDL computation written in Perl, they can translate the innermost loop into C and then it can do the whole computation in one loop (a faster data access pattern). There is a book for that as well called "Practical Magick with C, PDL, and PDL::PP -- a guide to compiled add-ons for PDL" <https://arxiv.org/abs/1702.07753>.

---

I'm also active on the `#pdl` IRC channel on <https://www.irc.perl.org/>, so feel free to drop by.

zengargoyle · on June 9, 2021

Now you just need to port it to Raku. (Maybe you have).

sivoais · on June 9, 2021

I would really like to do some scientific computing in Raku. It has crossed my mind that I can maintain both Perl5 and Raku ports of some of the library code I'm writing. I just haven't worked through the tooling.

audit · on June 9, 2021

Thank you for your work. I used PDL early 2000 when working in bioinformatics area.

I did not know at the time any of the specialized languages, so intially approaching the project -- I was very concerned on how to deal with matrices, but as I got to understand the PDL better -- i was getting better and better at it.

If I may suggest someting (this is based on the old experience though) --

a) some 'built-in' way to seamlessly distribute work across processes and machines.

b) some seamless excel and libreoffice calc integration.

Meaning that I should be able to 'release' my programs as Excel/Libre Office files.

Where I code in PDL but leverage Spreadsheet as a 'UI' + calc runtime.

So that when I run my 'make' I get out a Excel/Libre office file that I can version and distribute into user or subsequent compute environments.

Where the PDL code is translated into the runtime understood by the spreadsheet engine.

I know this is a lot to ask, and may be not in the direction you are going, but wanted to mention still.

sivoais · on June 9, 2021

Good ideas!

a)

A built-in way would be good. There is some work being explored in using OpenMP with Perl/PDL to get some of that. In the mean time, there is MCE which does distribute across processes and there are examples of using this with PDL <https://github.com/marioroy/mce-cookbook#sharing-perl-data-l...>, but I have not had an opportunity to use it.

b)

Output for a spreadsheet would be difficult if I understand the problem correctly. This would more about creating a mapping of PDL function names to spreadsheet function names --- not all PDL functions exist in spreadsheet languages. It might be possible to embed or do IPC with a Perl interpreter like <https://www.pyxll.com/>, but I don't know about how easy that would be to deploy when distributing to users.

Am I understanding correctly?

Interestingly enough, creating a mapping of PDL functions would be useful for other reasons, so the first part might be possible, but the code might need to be written in a certain way that makes writing the dataflow between cells easier.

aduitsis · on June 8, 2021

I can see in the page that last PDL release was on February, okay. Probably the presence of activity indicates that there are people using PDL, and more than that, maintaining it. Since PDL existed for many years, I would speculate that these people presumably didn't jump into PDL yesterday. There can very well be large codebases using it for a long time.

All the comments suggesting that this somehow shouldn't be and that people should move away from PDL, are depressing in the way that, in a split second, years of effort and thousands lines of code that are probably doing good work are dismissed just like that.

If a bank is still using Cobol, it is interesting and a testament to how a Cobol programmer can still make a good living on it. But if a scientist is using Perl to carry out calculations, this is somehow bad?

ajsnigrutin · on June 8, 2021

This is the sentiment I've been seeing online, especially here.

1/3 of the posts here are "<old tool that already exists in a stable, mature codebase> in {rust|go|whatevernewlanguagecomesnextweek} released v0.0.1"

Perl is a great language, that does it's job for many, many things, especially with CPAN, and it has been doing so for years. You can buy a 20yo book on perl, and 99.99% of the example code from that book still works, and same goes for projects from that era (which cannot be said for python, where developers and distro mainanters seem to enjoy removing usable, mature projects, just because they're written for python2.7 and incompatible with 3+).

If I have to write a script once, that I can forget about, and just expect it to run for years, perl will always be my first choice.

anthk · on June 9, 2021

>99.99% of the example code

Orelly's Perl books from the CD bookshelf still work.

Just declare a variable with "my =" in front of it (just once), and everything will work as usual:

old:

    $num = 3;
    print $num;

new:

    my $num = 3;
    print $num;

ajsnigrutin · on June 9, 2021

It works without "my" too :)

$ perl

    $num = 3;

    print $num;

3

(perl v5.32.1, without "use strict" of course)

anthk · on June 9, 2021

Ah,TIL.

As I always used

use strict; use warnings;

as something like muscle memory, I didn't know this.

b2gills · on June 16, 2021

One of the things that `use strict` does is enable `use strict "vars"`.

    use strict;

    no strict qw(vars);
    $foo = $bar;

That's part of the reason `use strict` is recommended.

---

The other major reason is `"refs"`, which disable symbolic refs. Honestly this is *THE* main reason I recommend `use strict`.

Symbolic refs are how you did arrays of arrays prior to Perl5. (Among other uses.)

    use 4.0;

    @a = 'b','c';
    @b = (1, 2);
    @c = (3, 4);

    print $a[1]->[1], "\n"; # 4

That can be a security risk if an attacker can insert or change strings in `@a`.

It isn't (generally) needed anymore of course.

    use 5.0;
    my @a = ( \[1,2], \[3,4] );

    print $a[1]->[1], "\n"; # 4

(The only reason `@b` and `@c` existed was to symbolically reference them.)

ocschwar · on June 9, 2021

According to Google, a one off I wrote in Perl in 1998 is still in use at the lab I wrote it in.

lmm · on June 9, 2021

> If a bank is still using Cobol, it is interesting and a testament to how a Cobol programmer can still make a good living on it.

No it isn't. It's a testament to how backward that bank is. You'll see upvoted contrary takes here, sure, but that's because middlebrow contrarianism is a good way to get upvoted on HN.

ocschwar · on June 9, 2021

I loved using Perl for projects in the early days of the Web. For anything even remotely expressive or artistic, Perl was the way to go. But if you want to communicate scientific insight, using the write-only language is something I have to maintain my doubts. But, if Inline::Python works as well as the comments above indicate, then I might be tempted again. Number crunching in Python, pretty pictures and presentation in Perl.. Hmm.

worik · on June 9, 2021

How is Perl write only and Python not?

I am biased. I love Perl and hate Python. Makes me feel very old....

R0b0t1 · on June 9, 2021

Don't worry, I think Perl is good. I like the freedom you get when writing it. It's especially good at tasks that shell is just not quite expressive enough for. I used it heavily in a sysadmin job and it was way easier than writing ansible scripts.

The only thing holding me back is if I want to use ${library} I probably can't do so from Perl.

TurboHaskal · on June 9, 2021

What can I say, Perlphobia is real.

stabbles · on June 8, 2021

It's unlikely to perform well, so it may not be the right tool for the job.

natch · on June 8, 2021

Can't speak to this library but in general Perl stands out as being extremely performant, if that is the goalpost you want to go with.

pvaldes · on June 8, 2021

PDL was created to improve the perl performance when working with n-dimensional matrixes so should be even better than plain perl

MontyCarloHall · on June 8, 2021

> If a bank is still using Cobol, it is interesting and a testament to how a Cobol programmer can still make a good living on it. But if a scientist is using Perl to carry out calculations, this is somehow bad?

Yes. The negative externalities of doing scientific analyses in Perl are much greater than a bank having a legacy COBOL codebase. Only a handful of engineers within the bank will ever see that COBOL codebase. Science is globally collaborative; many people at many different institutions across the world would have to deal with some idiosyncratic scientist’s decision to write their analysis in Perl.

Also, the bank only has a COBOL codebase because it’s reluctant to make major changes to an extremely important system that’s been working flawlessly since the early 60s. There’s absolutely no reason to start a totally brand new project in Perl (or COBOL, for that matter), when far superior alternatives exist.

nanis · on June 9, 2021

> idiosyncratic scientist’s decision to write their analysis in Perl.

In my experience, looking at "scientist" produced code, the programming language matters very little. It is not hard to produce something completely inscrutable and non-replicable in Python and R the same way it's been done for ages using SAS, Stata, MatLab etc.

I still see people rolling out their regression computations using matrix inversion and calculating averages as `sum(x)/n`.

I really like PDL when I can use it. I have had problems building it from source on Windows in the past, but it is actually a very well thought out library.

Also worth mentioning, you can get a lot of mileage out of GSL[1].

[1]: https://www.gnu.org/software/gsl/

MontyCarloHall · on June 9, 2021

> It is not hard to produce something completely inscrutable and non-replicable in Python and R the same way it's been done for ages using SAS, Stata, MatLab etc.

It’s possible to write inscrutable code in any language, but some languages sure make it easier.

Syntax issues aside, the main advantage of Julia/Python/R (the latter’s syntax might even be worse than Perl’s) for scientific computing is their ecosystems. A language for a particular use case is only as good as the packages available for that use case. The scientific package ecosystems for Ju/Py/R are far richer than that of Perl, simply because their userbases are much larger. Thus, a scientist using Perl would likely be forced to roll a lot of their own functions, which makes the code idiosyncratic and more likely to contain bugs. (To use one of your examples, people might implement OLS regression by manually computing the hat matrix because no stats package exists for the language they’re using. Now imagine their language lacks something more complicated, like a robust MCMC sampler package à la PyMC3 or STAN, and they have to roll that themselves. Yikes.)

And that’s not even getting into the value of the languages for interactive scientific computing, which is how most of it gets done these days. For instance, there’s no official Jupyter notebook support for Perl (although unofficial plugins exist, they don’t support inline graphics/dataframes/other widgets), and the REPLs for Julia/Python/R are much more modern and fully-featured than PDL2.

BTW, I agree the GSL is great for building standalone tools, but it’s totally irrelevant for any interactive work.

sivoais · on June 10, 2021

> For instance, there’s no official Jupyter notebook support for Perl

Not sure how official support would work in the Jupyter Project since anybody can write a kernel. I wrote the Perl one (IPerl) and that has existed since 2014 (when Jupyter was spun off from IPython). It supports graphics and has APIs for working with all other output types.

Now I do need to help make it work with Binder, but it does work.

---

The other point about MCMC samplers is valid. This is why I wrote a binding to R to access everything available in R and why I use Inline::Python sometimes. I should create a binding for Stan --- should not be hard --- at least for CmdStan at first, then Stan C++ next.

anthk · on June 9, 2021

>. Now imagine their language lacks something more complicated, like a robust MCMC sampler package à la PyMC3 or STAN, and they have to roll that themselves. Yikes

CPAN predates all of those.

MontyCarloHall · on June 9, 2021

How is that relevant? CPAN is a package repository, not a MCMC sampling library. Can you point me to a Perl library that implements an API for constructing a probabilistic graphical model and then performs inference on it via MCMC, like PyMC3 or STAN? Is it as robust and fully featured as either of those?

mattkrause · on June 9, 2021

Stan isn’t really written in any of those languages either.

The python pystan is wrapper that ships data to/from the Stan binary and marshals it into a python-friendly form; I think Julia’s is similar.

I’m not exactly volunteering to do it, but a PerlStan would not be that hard to implement. As for scientific communication, a point you raised above, I don’t think it’d be too bad. Most readers of a paper would be interested in the model itself, and that would be written in Stan’s DSL regardless.

MontyCarloHall · on June 9, 2021

Fine, STAN is a bad example since it’s written as a DSL parsed by a standalone interpreter.

But tons of other numerical methods are also missing from Perl. To use another stats example, in another comment, I gave the example that PDL only supports random variable generation for common distributions (e.g. normal, gamma, Poisson). Anything beyond stats 101 level and you’re on your own.

mattkrause · on June 9, 2021

In bringing up CPAN, the other poster's point might have been that Matlab/Python/Octave don't generally contain native implementations of these either. A lot of Matlab and NumPy is wrapper around BLAS/ATLAS, for example.

One could do the same with Perl, and in fact, people have. If you need random variates from a Type 2 Gumbel distribution, for example, Math::GSL::Randist has you covered https://metacpan.org/pod/Math::GSL::Randist#Gumbel

Honestly, I'm not rushing to convert our stuff to PDL, but I did want to push back a little on the idea that python is The One True Way to do scientific computing. It's a fine language, but I think a lot of its specific benefits are overstated (or mixed in with the general idea of taking computing seriously).

sivoais · on June 9, 2021

Yep, there's more than one way to do things and PDL wraps all the same GSL functions <https://metacpan.org/pod/PDL::GSL::RNG#ran_gumbel1>.

Also note that PDL does automatic broadcasting of input variables so it does an entire C loop for an array of values being evaluated. See this example <https://gist.github.com/zmughal/fd79961a166d653a7316aef2f010...> for how that applies to all GSL functions that are available in PDL. Though I do notice that some of the distributions available at <https://docs.scipy.org/doc/scipy/reference/stats.html#contin...> are not in GSL.

Though when I do stats, I often reach for R and have done some work in the past to make PDL work with the R interpreter (it currently has some build bitrot and I need to fix that).

smabie · on June 9, 2021

tbf, sum(x)/n is much faster than using pythons built in mean function.

nanis · on June 9, 2021

> sum(x)/n is much faster than using pythons built in mean function.

`statistics.mean` uses `_sum` which tries to avoid some basic round-off errors[1]. I think the implementation of `_sum` is needlessly baroque because the implementors are trying to handle multiple types in the same code in a not so type-aware language. Regardless, using `statistics.mean` instead of `sum(x)/len(x)` would eliminate the most common rounding error source.

As for statistical modelling handled by directly inverting matrices, there the problem is singular matrices that appear to be non-singular due to the vagaries of floating point arithmentic in addition to failing to use stable numeric techniques.

The point remains. The detriment to science are people who convert textbook formulas directly to code instead of being aware of implementation with good numerical properties.

Note:

    >>> x = [1e9 + .1, 1.1] * 50000
    >>> sum(x)/len(x)
    500000000.60091573

whereas

    >>> import statistics
    >>> statistics.mean(x)
    500000000.6

See also my blog post "How you average numbers matters"[2].

> Now, in the real world, you have programs that ingest untold amounts of data. They sum numbers, divide them, multiply them, do unspeakable things to them in the name of “big data”. Very few of the people who consider themselves C++ wizards, or F# philosophers, or C# ninjas actually know that one needs to pay attention to how you torture the data. Otherwise, by the time you add, divide, multiply, subtract, and raise to the nth power you might be reporting mush and not data.

> One saving grace of the real world is the fact that a given variable is unlikely to contain values with such an extreme range. On the other hand, in the real world, one hardly ever works with just a single variable, and one can hardly every verify the results of individual summations independently.

Correct algorithms may be slower, but I am hoping that it is easy understand why they ought to be preferred.

[1]: https://github.com/python/cpython/blob/5571cabf1b3385087aba2...

[2]: https://www.nu42.com/2015/03/how-you-average-numbers.html

dolmen · on June 11, 2021

Thanks for this post.

As this is worth to be better known, I submitted it here: https://news.ycombinator.com/item?id=27470323

caslon · on June 8, 2021

Perl is shipped in virtually every Linux distribution; it's much closer to standard than, say, Node.

fmakunbound · on June 9, 2021

This is a big plus compared to Python. I dread installing Python for some project that needs it. Is it wheel, pip, pip3? pip3.7, apt install pip, poetry? Maybe pyenv? Good god what goes in my .bashrc? What in my PATH?

anthk · on June 9, 2021

Also, on Perl:

Install cpanminus and then install local::lib.

cpanm local::lib

cat >> ~/.profile << EOF eval $(perl -I ~/perl5/lib/perl5/ -Mlocal::lib) EOF

done.

aduitsis · on June 9, 2021

Please take a look at Carton too.

MontyCarloHall · on June 8, 2021

So is ed. That doesn’t mean it’s a better coding environment than modern IDEs, even though it’s far more “standard.”

caslon · on June 8, 2021

Actually, line editors are really nice, and UNIX arguably makes a better IDE than any purpose-specific one.

CyberDildonics · on June 9, 2021

To be clear, you program using ed and the command line instead of ever using an IDE? The said programming environment, you said 'line editors are nice'.

This is the same nonsense game people play where someone says "that's like taking a ship to cross the ocean instead of a plane" and someone else says "I like taking ships, I want to get fresh air for three weeks of solitude instead arriving the next morning"

caslon · on June 9, 2021

I do for every language that isn't Common Lisp! I learned to program on a machine that had a copy of vi that refused to go into visual mode; instead of trying to fight it, I started using ed.

Everything about UNIX is oriented toward being a programming environment in itself. There are plenty of developers who just use UNIX as their IDE. Drew DeVault, as an example, is pretty notorious for it.

Complexity distracts and makes efficiency impossible. Modern IDEs are nothing but complexity. UNIX-as-IDE simplifies.

thesnide · on June 9, 2021

Confirms that the brain is incredibly plastic.

It can learn to compensate for a missing tool, arm, eye, ...

And after a while it feels just natural.

caslon · on June 9, 2021

UNIX has all of the tools, and unlike IDEs, it allows you to seamlessly add them on. UNIX is more "cybernetic enhancement" to the IDE's "prosthetic limb."

ed is actually really useful if you're wanting to rapidly iterate on a program. There's a reason Ken Thompson advocated for it up to his retirement (which was very recently, mind).

anthk · on June 9, 2021

- ed/nvi

- bc|dc/perl/little of C

- Gnuplot

- entr + make

Amazing performance with very little usage of resources.

pklausler · on June 8, 2021

Or Fortran, ironically.

caslon · on June 8, 2021

This is mostly because of the compiler situation. Intel forbids redistribution of theirs, and GNU's isn't up to snuff, which continues to handicap the language.

worik · on June 9, 2021

"The negative externalities of doing scientific analyses in Perl"

What do you mean?

MontyCarloHall · on June 9, 2021

I mean that someone using Perl for scientific computing makes it harder for others to collaborate on the project, which in turn makes the project less scientifically valuable.

This is due to two main reasons. For one, Perl’s incredible syntactical flexibility makes it easy to write “clever” one liners that are hard to comprehend. Speaking from experience, scientists tend to be the sort of people who value “clever” code over “clean” code.

Secondly, the Perl package ecosystem for numerical/scientific methods just isn’t as fully featured as the Julia/Python/R ecosystems. This leads to individuals having to reimplement methods themselves, which leads to idiosyncrasies and likely bugs. For instance, I see no way to generate Wishart random variables (or RVs from other distributions beyond the common ones) in PDL. Julia, Python (via SciPy), and R all have full featured support for many different distributions beyond the common ones.

A Perl user would thus have to implement this themselves. Someone else reading their code would have to both familiarize themselves with the custom function’s syntax (as opposed to immediately recognizing the standardized scipy.stats.wishart, which behaves like any other scipy probability distribution class) and likely check for any bugs, since a standard package is far more likely to be correct than some random one-off function. I’ve had the unfortunate experience of working with someone who refused to use off-the-shelf libraries for numerical methods, and unsurprisingly their code was not only hard to read (since there was no standardization) but also full of bugs.

pvaldes · on June 9, 2021

> Someone using Perl for scientific computing makes it harder for others to collaborate on the project

Just to the other people that never have the ambition, desire or toke the time to develop new valuable skills. This is not a bad thing necessarily. Those people are time sinks.

In any case you can take the decision to document or not your code in any language. "; # this line does that" is not hard to write.

And different science teams collaborate, but also compete for money. So everybody doing the same can lead to "the more dishonest takes all" and kills all the other teams. Sometimes is useful to protect yourself from the people trying to backstab you and steal your best tools. Tools that toke you decades to develop and polish. Not ALL is freely shared in science.

And yes, my Perl scripts were a triple headache to write, but still work flawlessly after all this years.

Perl is just a tool to do something, and you should never use one (and the same) tool for everything unless your goal is to be a mediocre scientist sucking from other people's efforts all the time.

sivoais · on June 10, 2021

PDL also has support for many of those distributions beyond the common ones. All the GSL ones in fact. Except Wishart didn't get a binding because that was just added to GSL in 2018. So thanks! I'll add the one line needed to bind that to PDL now and check if others are missing.

worik · on June 10, 2021

These are mostly the negative externalities of using computer code.

It is no harder to collaborate with Perl than with R (been there done that) or Julia (have not done that.

PDL is addressing your point about libraries.

Horses for courses. If your research involves streams of text (a lot of things are streams of text) and transforming them then Perl is a likely contender as the best choice.

sega_sai · on June 8, 2021

I'm afraid this is ~ 10-15 years too late. I've used PDL somewhat in ~ 2005-2010 when python didn't have that many packages/numpy, and it did the job and could substitute IDL to some degree. But realistically, right now I don't see any reason to use PDL instead of Python.

bjornjajayaja · on June 8, 2021

Meh… Python zealots abound. Folks often say the same thing about Fortran but frankly, those are the engines that power ALL scripting languages math features. You’re running on C/C++/FORTRAN. If one needs better text-parsing features on top of that: use Perl; if you use a Python environment: use Python.

Perl is superior in many ways and I think using it for data-exploration still has tons of merit.

Folks “shoot themselves in the foot” when they don’t understand the language. In Perls case: list/scalar context is usually the culprit, but it is quite easy to understand. It’s more flexible and concise which in many ways makes Perl better at exploratory programming.

sega_sai · on June 8, 2021

I (and many in science) switched from C/PDL/IDL to Python in last 10 years because it was is the best tool, not for zealotry.

Python vs Perl for science has nothing to do with c/c++/fortran. C/C++/fortran have their own place for science computations. Perl does not. For anything involving any kind of numerics Python will be faster/better tested/will have more libraries/will have better visualisation capabilities, so there is no need for perl.

Sure if you are only working with strings, you can use perl, but that's hardly scientific computing (not at least the field that I work in).

pvaldes · on June 9, 2021

> if you are only working with strings ... that's hardly scientific computing

It depends on the field. RNA/DNA, Polypeptids and proteins are just long chains of text, therefore Perl can deal easily with the problems of finding things, manipulating them to build a new chain or translating the chain to a different format. This is a significant chunk of what Bioinformatics do all the time, and Bioperl can manage it.

Also a big part of astronomy is analyzing or finding stars in a tridimensional matrix of space. PDL can be useful with that. Is not dificult to extract a slice of interest in the space matrix and focus our research on it. The main problem could be the lack of experienced people available having exactly this problem to solve.

I don't know if original Perl is very good or bad for that, but there are several Math modules that could have what you want and be easily connected with the former stuff. In fact there are a lot of them to choose:

https://metacpan.org/search?q=Math

Raku at least has a Math::Model module to simulate Physics stuff. I ignore how developed is the module or how its perform would compare with Pyton's similar stuff

mr_toad · on June 8, 2021

> In Perls case: list/scalar context is usually the culprit

That’s just the surface. Working with matrices means nested data structures, and Perls syntax for nested data structures makes the list/scalar context look like child’s play.

zbentley · on June 9, 2021

I'm no Perl lover (spent many years programming on codebases far larger than they ever should have been allowed to grow before switching languages), but what's so hard about references? Definitely not one of the sharper edges in the language IMO.

citrin_ru · on June 9, 2021

Python has just one way to use list and dict and they are always passed by reference, but in Perl it can be either array/hash or arrayref/hashref. More options requires more thinking. Also Perl recently (in 5.20) introduced postfix dereference syntax which means that enough people were unhappy with existing ways to dereference. But it adds one more straw on camel's back - makes syntax harder to learn which is already a frequent complain about Perl.

Having said all than I have no problems using references in Perl, but see why a Pythonista can find Perl code hard to read.

7thaccount · on June 8, 2021

The question isn't about Perl vs Python (which I still think is a clear winner for Python, although there are some uses where I could perfectly understand reaching for Perl first).

The question is whether Perl's PDL is better than ScyPy/Numpy/Pandas/Cython, the Spyder IDE, bunch of plotting libraries...etc.

analog31 · on June 9, 2021

Indeed, I'm a scientific programmer, and I've noticed that when most scientists talk about "Python," they don't make a clear distinction between where the language ends and the libraries begin. If they even know. Most users, myself included, download some big installer like Anaconda or WinPython, and off we go.

7thaccount · on June 9, 2021

Same. Obviously a OO scripting language by itself isn't fast enough for most scientific coding, but when the C/Fortran libraries are tightly integrated with lots of stats/plots and other libraries with great IDE/database...etc it makes for a great and free/open scientific modeling platform. It's really the whole package and the language is just 1 component of that.

analog31 · on June 9, 2021

Indeed, and it's my understanding that supporting that kind of integration is a strength of Python. Since I use Python in the lab, I've also found that the whole ctypes thing is a life saver when dealing with hardware drivers that are only furnished with a C API.

mangecoeur · on June 8, 2021

People like other people using python because it makes it easier to share and exchange and built a community of knowledge and libraries. Science is a lot about ease of collaboration.

pvaldes · on June 8, 2021

Is this true or is a belief?

The fact is that most people don't even have an opinion about what is Python or Perl.

Python can seem easy to share after you copy part of a python script and miss one blank space somewhere at the end of a line or start copying in the wrong line. If you use a dumb text editor the script will easily turn into a ugly mess. Perl scripts don't have this problem, so some people could say that they are easier to exchange and share in fact.

Many Perl authors will be really glad to share your code with you. This is what they built an online community of knowledge and libraries called CPAN where you can find it easily. To find authors willing to help and explain obscure parts of their own scripts also if asked politely, is not uncommon or particularly difficult.

mangecoeur · on June 9, 2021

> part of a python script and miss one blank space somewhere at the end of a line or start copying in the wrong line. If you use a dumb text editor the script will easily turn into a ugly mess.

I think we can bury this 'Python has significant whitespace' criticism for good now. I have taught python to people from hugely diverse backgrounds (including literature and law) and not once has this been an issue (on the contrary it's a massive help to readability).

Also, no sane person teaches people to code python with a 'dumb' text editor - you give them Jupyter notebooks or VSCode or PyCharm (which has an excellent educational version) or Notepad++ or something.

bquinlan · on June 8, 2021

I think that the comment that you relied to meant that it is easier for scientists to collaborate using a single language rather than many. For some domains, Python has emerged as that single language. That doesn't mean that Python is the best language.

mangecoeur · on June 9, 2021

Exactly this. To some extent the language doesn't matter - although I think Python has become especially popular thanks to being one of the easiest languages to learn and use in practice. Then you get lots of network effects - e.g. grad students learn python, and when they become supervisors they teach their students python.

asdff · on June 9, 2021

You'd be hard pressed to find someone who only knows python, though. Where there is python data science there is usually also R, like smoke and fire. The syntaxes are similar enough, imo R a little simpler even.

pvaldes · on June 9, 2021

This is mediocracy and a bad strategy

anthk · on June 9, 2021

> People like other people using python because it makes it easier to share and exchange and built a community of knowledge and libraries. Science is a lot about ease of collaboration.

You realize CPAN did it first right?

mangecoeur · on June 9, 2021

That's not the point, people are using python so when they work with other people, it helps if everyone is using python. Maybe CPAN was first but it never became standard in the science community.

dolodoot · on June 8, 2021

> right now I don't see any reason to use PDL instead of Python.

I like python a lot and actually i wish less scientists used it. Perl/PDL is in my opinion much better suited to this (understandable) just getting the job done approach i often find in sciene or other areas in which writing software is not the primary goal.

flobosg · on June 8, 2021

Somewhat related, “How Perl Saved the Human Genome Project (1996)” : https://www.foo.be/docs/tpj/issues/vol1_2/tpj0102-0001.html

hoytech · on June 8, 2021

I really like PDL, even though I don't get to use it much anymore. Here's a fun application, FM radio demodulation in a page or so of pretty straightforward code:

https://hoytech.com/talks/fm-demodulation

anonu · on June 8, 2021

I think Perl missed the boat... too little way way too late.

Also - they can't even get their SSL certs straight.

I loved Perl - it was certainly my gateway to coding. Regex was beautiful in perl. But I shot myself in the foot so many times with that language.

hatsuseno · on June 9, 2021

Specifically how can't Perl get their SSL certs straight?

dima55 · on June 9, 2021

I used perl and PDL heavily, before moving to Python and numpy. Both have annoying issues, and oddly, their warts are complementary. Particularly, the core API in PDL is miles better than numpy's. Before I could tolerate actually using numpy, I had to write a library to patch away numpy's warts, by effectively writing a PDL compatibility layer. Check it out:

https://github.com/dkogan/numpysane/

Now the core numpy has usable broadcasting, concatenation and basic linear algebra. Kudos to the PDL team for the excellent core design.

denimboy · on June 8, 2021

See also integration between PDL and MXNet deep learning framework:

   https://mxnet.apache.org/versions/1.8.0/api/perl

codezero · on June 8, 2021

Wow what a throwback. I used IDL when I worked in astrophysics, and I loved it. It had a lot of fun capabilities and was super easy to build C libraries for.

Back then there was also an attempt at making GDL (gnu data language), but the tooling for IDL was so deep I doubt anything can directly replace it. Then again I've been out of this for twelve years.

wumpus · on June 9, 2021

I just got back into astrophysics after having been gone for 25 years, and no one uses IDL anymore.

codezero · on June 9, 2021

I’d love to know what it looks like now. I knew a few folks doing a lot of GPU work, but so many old folks still making charts and figures in IDL, myself included.

wumpus · on June 14, 2021

It's mostly Python most of the time, from my vantage point. Several of the GPU projects I'm watching have GPU kernels contained inside of python frameworks. So the typical user only has to change the python part of the code.

codezero · on June 14, 2021

Hah, that's fantastic. This reminds me of a talk I attended about how most of science/academia has a lot of fast-moving glue to stitch together all the old/hard bits.

https://www.easterbrook.ca/steve/2010/12/agu-session-on-soft...

0xbadcafebee · on June 8, 2021

Please read this page first (and note the date): http://pdl.perl.org/?docs=Philosophy&title=PDL::Philosophy

anthk · on June 9, 2021

Who cares about the date. AWK is old as heck, and with The C Programming Language 2e over "The Unix Programming Environment" (Overriding pre ANSI-C) you can do magic with very few resources.

Oh, you need to understand the underlying maths well in order to code your functions right? That's the biggest issue in data science management. Too much relying on specialized " biggies" like Numpy/CUDA with atrocious codebases where the calculations lasts months compared to a 1 hour chore with C or even Perl with Gnuplot.

__vim__ · on June 9, 2021

Could you give examples of these atrocious code bases and programs that are as slow as you say.

anthk · on June 9, 2021

http://phroxy.z3bra.org/bitreich.org:70/0/con/2020/rec/energ...

f6v · on June 8, 2021

Bioinformatics just finished transitioning from Perl to R and Python. The last thing I need is a Perl renaissance. Hope it never happens.

anthk · on June 9, 2021

On science and energy saving:

http://phroxy.z3bra.org/bitreich.org:70/0/con/2020/rec/energ...

It's C, well, but C, Perl and Unix are related cousins and prototyping in Perl it's really fast.

leephillips · on June 8, 2021

How interesting. I’m surprised that I’d never heard of this before.

the_only_law · on June 8, 2021

I remember many years ago, outside programming I had a moderate interest in biology. I recall looking into bioinformatics, and reading about Python, but also about how popular Perl was in field. This was 5-10 years ago, though the sources I was reading very likely could have been 15 years old or more.

gvurrdon · on June 8, 2021

Perl was extremely popular 20+ years ago in biology; I made the jump from the lab bench to bioinformatics with Perl and Bioperl (https://bioperl.org/). But, after a while I discovered Ruby, and later Python, and moved on, also switching to other fields. The presence of Perl and a "proper" terminal on Mac OS X back then was a big draw in encouraging bioinformaticians to use it (well, that is my recollection, anyway).

kingaillas · on June 8, 2021

This tickled a memory... I remember seeing a Bioinformatics for Perl book ~20 years ago. I found it on O'reilly's website: https://www.oreilly.com/library/view/beginning-perl-for/0596...

Searching more I see a Python one appears in 2009: https://www.oreilly.com/library/view/bioinformatics-programm...

And now there are several out. Plus expanded topics like Python and Machine Learning, Python and Data Analysis, etc.

gvurrdon · on June 9, 2021

That first was a good book - there's still a copy on my shelf in the office, I think, though I've not had chance to go there in person to check recently.

leephillips · on June 8, 2021

The first thing that occurs to me, and this may be superficial, is that bioinformatics has a linguistic character to it, with pattern matching of sequences and so forth, and this is something where Perl is more adept than most other languages.

f6v · on June 8, 2021

Bioinformatics today is much more about complex statistical models applied to sequencing data. There’s some pattern matching when doing low-level FASTQ processing for your fancy combinatorial barcoding. But other than that you need tools capable of processing huge datasets. Python ecosystem fits better in modern era.

MontyCarloHall · on June 9, 2021

Yup. I’ll occasionally still break out Perl for quickly parsing FASTQs/SAM alignments/VCFs on the command line when sed/awk/grep won’t cut it, but for everything else it’s Pandas, NumPy/SciPy, Torch, and TensorFlow.

warlog · on June 8, 2021

Which .env fits better, exactly?

etangent · on June 8, 2021

your hunch is correct

pvaldes · on June 8, 2021

Perl has its own branch for bioinformatics. The Bioperl modules.

https://bioperl.org/

FabHK · on June 8, 2021

Next in this series: Embedded Systems programming with HTML/CSS, Computer Graphics with SQL, and Text Processing with Matlab.

anthk · on June 9, 2021

Or scientific data processing with Python.

adenozine · on June 8, 2021

As if research scientists' code wasn't unfun enough to read already!

psychometry · on June 8, 2021

Please don't. Most of the Python/R/C code I've seen from scientists was terrible. 100% of the Perl code was, though.