That, or they were skeptical in dangerous ways. I'm thinking of the example of Richard Feynman questioning whether you should brush your teeth, which, while valid to challenge scientifically, is a risky habit to model. There are other examples of "smart people", experts in their own domains, openly applying their techniques to other fields to the detriment of observers.
In my mind it's important to have both, as long as the main issue is made clear and nitpicks are truly nitpicks (and thus acceptable in limited quantity, so they don't block a change).
Otherwise you'd get a "bombshell" request to refactor, then on the second review pass a handful of nitpicks that you could have addressed during the initial refactor.
Wow, I've never seen Lotus Improv but so far there's a lot to love.
If you're not aware, Excel has something similar to the formula system if you use tables, and in combination with LET and LAMBDA it's also pretty pleasant.
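For example, a small sketch of what I mean (the column names and values here are made up; `[@Amount]` is Excel's structured reference to the current table row):

```
=LET(
  net,   [@Amount] - [@Discount],
  taxed, LAMBDA(x, x * 1.2),
  taxed(net)
)
```

LET binds names to intermediate values so the formula reads top-to-bottom, and LAMBDA lets you define a reusable function inline (or name it in the Name Manager to call from any cell).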
I've got the rest of the video to watch, but thanks for your comment!
If I've understood this correctly, the test measures the safety finetune's performance. These commercial models have been finetuned so that they are "safe", and safe models should not blindly quote what they are told.
Under shorter context windows, this works as intended, but under longer context windows the "safety" brought about by the finetune no longer applies.
I often do this manually, using Handbrake before uploading screen recordings.
Screen recording software trades performance for file size, so re-encoding afterwards can result in much smaller files. I've personally had ~100MB recordings re-encode to ~10MB.
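If you'd rather script it than use Handbrake's GUI, the same idea looks roughly like this with ffmpeg (assuming it's installed; `input.mov` and `output.mp4` are placeholder names, and the exact CRF is a quality/size trade-off you'd tune yourself):

```shell
# Re-encode with x264 at a higher CRF (smaller file, slightly lower quality),
# copying the audio stream through untouched.
ffmpeg -i input.mov -c:v libx264 -crf 28 -preset slow -c:a copy output.mp4
```

Screen recordings compress especially well here because most of the frame is static between keystrokes.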
It's not about IP reputation, but about the bank detecting a payment in one country followed by a login in another. This is exactly what the bank would see if someone stole your card.