What's wrong with creating a function that does those things? It would be less s...

modeless · 2024-03-29T15:52:41

What's wrong is that it obscures simple math expressions behind tons of dots and parentheses. The core of deep learning algorithms is usually very simple math. It's useful to be able to translate that math directly from research papers into straightforward expressions that mirror the structure in the paper, like a = b / c + d * e, rather than something less similar like a = b.divide(c).add(d.multiply(e)).

throwaway4aday · 2024-03-30T01:02:59

Depending on what you learned first, dots and parentheses are a lot simpler to understand than math expressions.

DanielHB · 2024-03-30T07:58:50

You could probably build a tagged template literal like:

a = e`${b} / ${c}`

Not ideal, but much better, and without magic preprocessors.
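
A rough sketch of what such a tag could look like, assuming operands are plain numbers or flat arrays of equal length. The names `e` and `apply` are hypothetical, not part of any real library, and this handles only + - * / and parentheses between interpolated values (no numeric literals inside the template).

```javascript
// Element-wise binary op with scalar broadcasting.
// Operands are numbers or flat arrays (an assumption for this sketch).
function apply(op, x, y) {
  const f = {
    "+": (p, q) => p + q,
    "-": (p, q) => p - q,
    "*": (p, q) => p * q,
    "/": (p, q) => p / q,
  }[op];
  const xa = Array.isArray(x);
  const ya = Array.isArray(y);
  if (!xa && !ya) return f(x, y);
  if (xa && ya && x.length !== y.length) throw new Error("shape mismatch");
  const n = xa ? x.length : y.length;
  return Array.from({ length: n }, (_, i) => f(xa ? x[i] : x, ya ? y[i] : y));
}

// Tag function: parses the literal parts as operators and the
// interpolated values as operands, with the usual precedence.
function e(strings, ...values) {
  const tokens = [];
  strings.forEach((s, i) => {
    for (const ch of s.replace(/\s+/g, "")) {
      if (!"+-*/()".includes(ch)) throw new SyntaxError(`unexpected "${ch}"`);
      tokens.push({ op: ch });
    }
    if (i < values.length) tokens.push({ value: values[i] });
  });

  let pos = 0;
  const peek = () => tokens[pos] && tokens[pos].op;

  // factor := value | "(" expr ")"
  function factor() {
    const t = tokens[pos++];
    if (t === undefined) throw new SyntaxError("unexpected end of expression");
    if (t.op === "(") {
      const v = expr();
      if (peek() !== ")") throw new SyntaxError("missing )");
      pos++;
      return v;
    }
    if ("value" in t) return t.value;
    throw new SyntaxError(`unexpected "${t.op}"`);
  }

  // term := factor (("*" | "/") factor)*
  function term() {
    let v = factor();
    while (peek() === "*" || peek() === "/") {
      const op = tokens[pos++].op;
      v = apply(op, v, factor());
    }
    return v;
  }

  // expr := term (("+" | "-") term)*
  function expr() {
    let v = term();
    while (peek() === "+" || peek() === "-") {
      const op = tokens[pos++].op;
      v = apply(op, v, term());
    }
    return v;
  }

  const result = expr();
  if (pos !== tokens.length) throw new SyntaxError("trailing tokens");
  return result;
}

// Usage: the expression keeps the shape it has on paper.
const b = [2, 4], c = [1, 2], d = [3, 5], g = 2;
const a = e`${b} / ${c} + ${d} * ${g}`;
```

The downside is that the expression is parsed at runtime on every call, and tooling can't type-check what's inside the template.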

QuadmasterXLII · 2024-03-29T11:03:31

In Python, when I try a new GPU-accelerated array library, to write norm_arr = arr / lib.sqrt(lib.sum(arr*2, axis=-1, keepdims=True)), I have to read the documentation for sum to see whether they use axis or dim.

In JavaScript, to write the same thing I need to read the documentation for a sum function, a broadcasted division function, and a multiply function. I can probably assume that the sqrt function behaves and is named as I would expect.
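
To make that concrete, here is a rough sketch of the JavaScript side over plain nested arrays. The helper names (`mulScalar`, `sumLastAxisKeepdims`, `sqrtEach`, `divBroadcast`) are made up for illustration and are not any real library's API; each one stands for a name and a convention you would have to look up.

```javascript
// Hypothetical helpers over 2-D nested arrays (rows of numbers); not a real library.

// Multiply every element by a scalar.
const mulScalar = (a, s) => a.map(row => row.map(x => x * s));

// Sum over the last axis, keeping it as a length-1 axis (like keepdims=True).
const sumLastAxisKeepdims = a => a.map(row => [row.reduce((acc, x) => acc + x, 0)]);

// Element-wise square root.
const sqrtEach = a => a.map(row => row.map(x => Math.sqrt(x)));

// Divide each row of `a` by the single value in the matching row of `b`.
const divBroadcast = (a, b) => a.map((row, i) => row.map(x => x / b[i][0]));

// Same computation as the Python one-liner, spelled out as function calls.
const arr = [[2, 6], [8, 0]];
const normArr = divBroadcast(arr, sqrtEach(sumLastAxisKeepdims(mulScalar(arr, 2))));
```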

throwaway4aday · 2024-03-29T14:30:03

If each of those operators were implemented as functions then you'd have different names for different implementations in order to avoid confusion over what type of division or multiplication they were performing. It's more verbose but that's a good thing since it prevents you from making incorrect assumptions about what's going to happen when you do a * b.

4hg4ufxhy · 2024-03-29T13:26:13

Why can you make assumptions about operator overloads but not functions?

foolswisdom · 2024-03-29T14:03:20

Because there is nothing to make assumptions about. In the example code, both multiplication and division have a scalar on one side, so there's no possible ambiguity of behavior. But there is the eternal question of terminology: do you specify dimensions by "axis" or "dim", and does your API actually use both terms in different places?

(that's what I think the GP meant, anyway).