
Example: the Crab Nebula was the result of a supernova that was seen by Chinese astronomers about 1000 years ago: https://en.wikipedia.org/wiki/Crab_Nebula


One approach I've made work a couple of times is to capture all of the data going in and out of a system over some time period, then replay it after the refactor. That's given me a lot more confidence than unit tests.
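A rough sketch of that record-and-replay idea (the function names and the JSON-lines capture format are just illustrative assumptions):

    import json

    def record(system, inputs, path="capture.jsonl"):
        # Capture every input/output pair the live system produces.
        with open(path, "w") as f:
            for x in inputs:
                f.write(json.dumps({"in": x, "out": system(x)}) + "\n")

    def replay(system, path="capture.jsonl"):
        # After the refactor, re-run the captured inputs and diff the outputs.
        mismatches = []
        with open(path) as f:
            for line in f:
                rec = json.loads(line)
                got = system(rec["in"])
                if got != rec["out"]:
                    mismatches.append((rec["in"], rec["out"], got))
        return mismatches

An empty list from replay() is the "nothing changed" signal; as written this only works for deterministic systems with JSON-serialisable inputs and outputs.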


100% - our daily stand-ups are a waste of time usually, but we have fantastic team camaraderie and I think us shooting the breeze for 5 minutes every day is a huge part of this.


If your stand-ups are a waste of time, consider making them less status-y and more collaborative. We lengthened ours a bit and encouraged more dialogue rather than just focusing on the traditional 3 questions (what did you work on since the last standup, what will you do before the next standup, and where are you blocked), and it's been much more productive.


> While cosine similarity is invariant under such rotations R, one of the key insights in this paper is that the first (but not the second) objective is also invariant to rescalings of the columns of A and B

Ha, interesting! I wrote a blog post pointing this out a few years ago [1], including how we got around it for item-item similarity at an old job (essentially an implicit re-projection to the original space, as noted in section 3).

https://swarbrickjones.wordpress.com/2016/11/24/note-on-an-i...
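A small numpy illustration of the point (A and B are hypothetical factor matrices; D rescales the latent columns):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(5, 3))    # e.g. user factors
    B = rng.normal(size=(7, 3))    # e.g. item factors
    D = np.diag([1.0, 10.0, 0.1])  # rescaling of the latent dimensions

    # The reconstruction A @ B.T (what the objective fits) is unchanged...
    assert np.allclose(A @ B.T, (A @ D) @ (B @ np.linalg.inv(D)).T)

    def cos_sim(M):
        n = M / np.linalg.norm(M, axis=1, keepdims=True)
        return n @ n.T

    # ...but the item-item cosine similarities are not.
    print(np.abs(cos_sim(B) - cos_sim(B @ np.linalg.inv(D))).max())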


Interestingly, it used to be quite standard with 'small' language models to use a search algorithm to render a full block of text, the most basic being beam search; with more processing power you could widen the search for better results. This is not what the OP is talking about; it just means generating a larger number of candidate continuations. It's neither necessary nor optimal for newer LLMs, though, because it tends to steer the model into quite generic places, and it can get very repetitive.
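For anyone unfamiliar, a minimal sketch of beam search; log_probs here stands in for a hypothetical model call returning (token, logprob) pairs for the next step:

    import heapq

    def beam_search(log_probs, start, beam_width=5, max_len=20):
        beams = [(0.0, [start])]  # (cumulative logprob, token sequence)
        for _ in range(max_len):
            candidates = []
            for score, seq in beams:
                for token, lp in log_probs(seq):
                    candidates.append((score + lp, seq + [token]))
            # A wider beam searches more paths, at more compute cost.
            beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
        return beams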


Nope, this definitely fills a few gaps, thanks. I'm still too lazy to think about this whole O(n) time thing, even though I'm constantly wondering whether "more" or better results could be achieved by throwing CPUs at stuff, hahaha. I rarely think in terms of time in general, just about depth, breadth and clarity.


re: caching Python functions, is it referring to functools.lru_cache maybe?


Almost certainly. Since Python 3.9 it's also available as `functools.cache` (an unbounded `lru_cache`). The technique is known as memoisation, and it's a classic use of higher-order functions (aka decorators).


And memoisation is a common technique for trading memory for processing time when doing anything trivially recursive.

I don't agree with TFA's take: "Nor does it make any sense to have caching at the function level?"
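For concreteness, the classic memoisation demo with the stdlib decorator (functools.cache needs Python 3.9+; use functools.lru_cache(maxsize=None) on older versions):

    import functools

    @functools.cache
    def fib(n: int) -> int:
        # Naively exponential; the cache trades memory for time,
        # making it linear in n.
        return n if n < 2 else fib(n - 1) + fib(n - 2)

    print(fib(100))  # 354224848179261915075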


Yeah, but try getting e.g. Dall-E 3 to do photorealism; I think they've RLHF'd the crap out of it in the name of safety.


That's not safety. The safety RLHF is there because it will generate porn and people with three legs if you don't stop it.

It has the weird art style because that's what looks the most "aesthetic". And because it doesn't actually have nearly as much good data as you'd think it does.

Sora looks like it could be better.


well that's what you get with closed ai.


That's why we need open AI which scoops up all the data with its specific contexts and history and transforms it into a vast incomprehensible machine for us peons to gawk at while we starve and boil to death


low quality discourse imo


I don't think this will refute the article, but I would have found it more convincing if the benchmark had also included a single setitem assignment, to be sure the difference wasn't Python doing a lazy dict-or-set type assignment on {}.


I've also had this thought, but found that inspecting the type shows it's a dictionary by default, and that it's only interpreted as a set if you treat it as such (e.g. add comma-separated elements when instantiating):

    assert type({}) == type(dict())
    assert type({1, 2, 3}) == type(set())
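The literal's type is in fact decided at compile time, not lazily at first use. A quick way to check is the stdlib dis module (exact bytecode names vary across CPython versions):

    import dis

    dis.dis("{}")         # emits BUILD_MAP: empty braces are a dict
    dis.dis("{1, 2, 3}")  # emits BUILD_SET: a non-empty set literal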


FWIW I think df.column.value_counts() is better to use here in pandas.


It unfortunately also exceeded available memory.

A basic approach which worked was sequentially loading each df from the filesystem, iterating through the record hashes, and incrementing a counter; however, the runtime was an order of magnitude greater than my final implementation in Polars.
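A sketch of that sequential approach (the file paths and the hash column name are placeholders):

    from collections import Counter
    from pathlib import Path

    import pandas as pd

    counts = Counter()
    for path in Path("data/").glob("*.parquet"):
        # Load one dataframe at a time so only a single file is in memory.
        df = pd.read_parquet(path, columns=["record_hash"])
        counts.update(df["record_hash"])

    duplicates = {h: n for h, n in counts.items() if n > 1}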


I'm not sure that screen-shotting the image will work, FWIW: any rescaling interpolation when rendering the image on the page, or when loading it for a model, will likely reduce or nullify the effect.

Also, these perturbation-based adversarial attacks are often model-specific: you take the model's gradient at each pixel and iteratively perturb the image until the model becomes more and more confident that it's, e.g., a cat.
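A PGD-style sketch of that loop, assuming a PyTorch classifier (model, image, and target_class are placeholders):

    import torch
    import torch.nn.functional as F

    def perturb_towards(model, image, target_class, eps=1/255, steps=20):
        # Nudge each pixel along the sign of the gradient that raises
        # the model's confidence in `target_class` (e.g. "cat").
        x = image.clone().detach()
        for _ in range(steps):
            x.requires_grad_(True)
            loss = F.cross_entropy(model(x.unsqueeze(0)),
                                   torch.tensor([target_class]))
            grad = torch.autograd.grad(loss, x)[0]
            x = (x - eps * grad.sign()).clamp(0, 1).detach()
        return x

Because the perturbation rides on the victim model's own gradients, it often transfers poorly to other models, which is the "model specific" part.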


So what you're saying is: run your images through a filter/resizer before feeding them into your AI.


haha that's an interesting point! Recompress as a JPEG, maybe.

