I work with an extremely effective machine learning engineer, and the biggest thing I've learned is how far you can get with vibes, even in a more traditional ML situation.
He invests most of his time visualizing the inputs and outputs of his systems very carefully, instead of focusing super heavily on metrics; this is a lot more effective than I thought it would be, particularly in the early stages of a project.
Measuring things correctly is hard. Spending time measuring things before you know what should even be measured, or how it should be measured, is almost surely wasted effort early in a project.
That's just the default. You can set max_seq_len to 8k. From the readme [0]:
> All models support sequence length up to 8192 tokens, but we pre-allocate the cache according to max_seq_len and max_batch_size values. So set those according to your hardware.
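For reference, the repo's example scripts pass these when building the model; a minimal sketch (the checkpoint and tokenizer paths are placeholders for wherever you put the weights, and the repo's examples launch this via torchrun):

```python
from llama import Llama

# Pre-allocate a bigger KV cache; paths are placeholders.
generator = Llama.build(
    ckpt_dir="Meta-Llama-3-8B/",
    tokenizer_path="Meta-Llama-3-8B/tokenizer.model",
    max_seq_len=8192,   # up to 8192 per the readme
    max_batch_size=4,   # cache is allocated for seq_len * batch
)
```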
Tooling around embeddings has improved. Creating and fine-tuning custom embeddings for your tabular data should be easier and more powerful these days.
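For instance, a minimal PyTorch sketch of learned embeddings for the categorical columns of a table, concatenated with the numeric columns (the column counts and cardinalities here are made up):

```python
import torch
import torch.nn as nn

class TabularEmbedder(nn.Module):
    def __init__(self, cardinalities, dim=8, num_numeric=3):
        super().__init__()
        # one learned embedding table per categorical column
        self.embeds = nn.ModuleList(
            nn.Embedding(card, dim) for card in cardinalities
        )
        self.head = nn.Linear(dim * len(cardinalities) + num_numeric, 1)

    def forward(self, cat_idx, numeric):
        # cat_idx: (batch, n_cat) integer codes; numeric: (batch, n_num)
        parts = [emb(cat_idx[:, i]) for i, emb in enumerate(self.embeds)]
        return self.head(torch.cat(parts + [numeric], dim=1))

model = TabularEmbedder(cardinalities=[10, 50])
scores = model(torch.randint(0, 10, (4, 2)), torch.randn(4, 3))
```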
Not only is it not constant time, it's not even polynomial - it's psuedo-polynomial. Also it'll fail on negative numbers, right? You'll need something like `10000 * log(time - min(times) + 1)` to be linear in the bits used to represent the inputs.
> In computational complexity theory, a numeric algorithm runs in pseudo-polynomial time if its running time is a polynomial in the numeric value of the input (the largest integer present in the input)—but not necessarily in the length of the input (the number of bits required to represent it), which is the case for polynomial time algorithms.
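A sketch of that fix in Python (the scale constant and the log shift are just the idea above; note the log squeezes nearby values together, so this is even less reliable than plain sleep sort):

```python
import math
import threading
import time

def sleep_sort(xs, scale=0.01):
    """Each value gets a thread that sleeps proportionally to it, then
    appends itself: runtime grows with the values themselves, which is
    exactly the pseudo-polynomial behavior."""
    lo = min(xs)  # shift so negative numbers work
    out, lock = [], threading.Lock()

    def worker(x):
        # log keeps the sleep roughly linear in the value's bit length
        time.sleep(scale * math.log(x - lo + 1))
        with lock:
            out.append(x)

    threads = [threading.Thread(target=worker, args=(x,)) for x in xs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out

print(sleep_sort([3, -1, 4, 1, 5]))  # [-1, 1, 3, 4, 5] (usually)
```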
> Not only is it not constant time, it's not even polynomial
You understand that this is part of the joke, right?
If we really want to get down to the details and kill the joke, then you don't actually need to wait real time. Computational complexity is concerned with steps in a computational model, not how much time passes on a clock. Sleep sort uses OS scheduler properties and in a virtual time environment, time advances to the next scheduled event. That's what brings you back to actual polynomial complexity, if you assume this kind of thing as your computational model.
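A sketch of what that looks like: sleep sort under a toy discrete-event simulator, where "sleeping" just schedules a wakeup and virtual time jumps straight to the next event. The event queue is a heap, so it degenerates into heapsort, O(n log n) real steps:

```python
import heapq

def virtual_sleep_sort(xs):
    # each "thread" sleeps until virtual time x, i.e. schedules an event
    events = [(x, x) for x in xs]
    heapq.heapify(events)
    out = []
    while events:
        _, x = heapq.heappop(events)  # advance the clock to next wakeup
        out.append(x)
    return out

print(virtual_sleep_sort([3, 1, 4, 1, 5]))  # [1, 1, 3, 4, 5]
```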
> - it's psuedo-polynomial.
If you lecture people then please at least get your spelling right.
I still don't get it. Sleep sort still needs O(n) Haskell-steps, while sorting in bash, for example, is just 2 bash-steps: calling `sort` and receiving the output.
I fail to see the joke, really. I only see false and nonsensical statements, which could still be funny or interesting, but I don't see how.
The "constant time" is wall clock time, where it will be at least the biggest value times the constant microseconds plus like 5% for overhead like the garbage collector.
Complexity analysis deals with asymptotic cases where N gets arbitrarily large. Sleepsort would take the same wall time to sort [1,2] and [1,2,1,2] - but it would presumably take somewhat longer to sort an array of 1e10 ones and twos, because it's not a constant time algorithm.
On the joke part, sleepsort is intrinsically an extremely funny concept and I think everyone here gets that. But "constant time" has a rigorous/pedantic definition, which sleepsort doesn't meet, so I think for some readers calling it that kills the joke (in the same sort of way that it would kill the joke if TFA's code snippets used lots of invalid syntax).
I'd never seen sleepsort before, so I thought it was funny. ;-)
I like the idea of (ab)using the scheduler to sort numbers.
Now I'm inspired to make my own "sorting" routine, maybe touching filenames and then listing them with `ls -v`, or rendering triangles and then reading them back in z-order, or stuffing frames into a video and then playing it back.
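The first one might look roughly like this sketch (assuming GNU `ls`, whose `-v` flag does a natural numeric sort of names; note duplicates collapse, since filenames are unique):

```python
import os
import subprocess
import tempfile

def ls_sort(xs):
    # "sort" by creating empty files named after the values and letting
    # GNU ls -v list them back in natural numeric order
    with tempfile.TemporaryDirectory() as d:
        for x in xs:
            open(os.path.join(d, str(x)), "w").close()
        listing = subprocess.run(["ls", "-v", d],
                                 capture_output=True, text=True)
        return [int(name) for name in listing.stdout.split()]

print(ls_sort([30, 4, 100, 25]))  # [4, 25, 30, 100]
```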
Surely any pseudo-polynomial problem can be done in P, it's just a matter of encoding? As in, if you give me a box that computes whatever in pseudo-polynomial time, then I can give you back a box that takes a single input consisting of 1s to the length of each value, delimited by 0s, turns those back into integers (linear), uses your box, and returns its output - but now it's polynomial in the length of my input?
(I said integers, but I don't think that's significant, just a matter of encoding scheme - we could use a single 0 for a decimal point and two for delimiting inputs, say.)
But anyway isn't the joke that sleep-time doesn't count, because the computer could be doing something else? It's actually quite compelling in a 'this is stupid, I love it' sort of way.
You can indeed make the distinction between polynomial and pseudo-polynomial stop existing by enforcing that the inputs are in unary. But you haven't made anything faster, you've just made almost all of them worse (every input is now exponentially longer).
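Concretely, the unary re-encoding is trivial; the "speedup" comes entirely from the input length now growing with the values themselves (a sketch):

```python
def to_unary(nums):
    # n becomes n ones; values are delimited by a single 0
    return "0".join("1" * n for n in nums)

def from_unary(s):
    return [len(run) for run in s.split("0")]

xs = [3, 0, 5]
encoded = to_unary(xs)  # '1110011111'
assert from_unary(encoded) == xs
# len(encoded) grows with the *values*, so an algorithm polynomial in
# max(xs) is now "polynomial" in len(encoded)
```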
Some ways to make recipients feel more comfortable:
- You can suggest some other contribution. "Would you mind bringing snacks? / Would you mind handling music on the drive? / Would you mind giving X a ride?"
- You can allow them to reciprocate in less expensive situations, like taking the check when you are at a cheaper place.
True in this situation, but note that intermediate activations and gradients do take memory, and in other contexts that's the limiting factor. For example, purely convolutional image networks generally take fixed-size image inputs, and require cropping or downsampling or sliding windows to reach those sizes - despite the convolution parameters taking constant memory for any input image size.
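A quick PyTorch illustration: the conv layer's parameter memory is fixed, but its activation memory grows with the spatial size of the input (sizes here are arbitrary):

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 64, kernel_size=3, padding=1)
n_params = sum(p.numel() for p in conv.parameters())
print(f"parameters: {n_params}")  # constant regardless of input size

for side in (64, 256, 1024):
    x = torch.randn(1, 3, side, side)
    y = conv(x)
    # activation elements scale with H * W, and gradients scale the same way
    print(f"{side}x{side} input -> activation elements: {y.numel()}")
```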