More

tashbarg · 2024-10-19T09:43:47 1729331027

You can certainly not produce inversions. The data that is left in the hash is not enough to produce anything vaguely photorealistic.

However, you can fill the gaps and generate photorealistic photos that fit to the extremely reduced information you get from the hash. You are generating believable (as defined by the training data) photos that fit the hash.

That’s a huge difference.

Statements like yours are extremely dangerous. Without proper understanding of what GenAI can and can not do, people start relying on things that are not there.

Imagine your photorealistic inversion AI putting a mole or a wrinkle in the face of somebody without any foundation in the actual hash. Just because it fits better to the trained data. Explain that to the judge, when the person with just the right facial features sits in front of them.

fasa99 · 2024-10-20T14:21:29 1729434089

>Imagine your photorealistic inversion AI putting a mole or a wrinkle in the face of somebody without any foundation in the actual hash. Just because it fits better to the trained data.

Seeing as AI was trained on 99999999999999 images of 9999 people, if the image in question is of one of those people, it's well conceivable that the AI will implicitly ID the person and attach their corresponding mole. Or in other words, it's possible a good portion of PhotoDNA's database is in the AI training set, so in principle there are cases where the AI does know.

tashbarg · 2024-10-21T11:46:25 1729511185

There are only 144 Bytes in a PhotoDNA hash and they are used to identify the whole picture. This is definitely not enough data to identify a face reliably.

The proposed AI does not identify people and it will not report that it "found" the person in the training data. It does not know. And it won't tell you.

Assume twins, one is in the training data, one isn't. The one in the training data has a scar, the other one does not. We "invert" a picture of the twin without the scar and who is not in the training set. As you explained, the resulting image will have the twin from the data set including a highly detailed picture of the scar. And for some reason, that is a good thing.

You are attributing more to this AI than it conceivably can do. Even going as far as finding an excuse for putting false or unfounded data.

It is tremendously important to make clear: most (if not all) of current AI technology is not fit for forensic analysis beyond guiding humans in their own analysis.

jasonvorhe · 2024-10-19T09:53:15 1729331595

This modern narrative of people posting their opinions or assumptions somewhere being "dangerous" because someone could just believe it is much more dangerous because it can be applied to any opinion anywhere that was ever published.

No judge will ever rule on something based on a comment they read in the Internet.

tashbarg · 2024-10-19T20:41:33 1729370493

Judges usually rely on experts in forensic science who, of course, are infallible and absolutely not influenced by what they read online during their day.

https://innocenceproject.org/misapplication-of-forensic-scie...

It is dangerous to push the narrative that GenAI can "put information back" where it was once removed. Especially dangerous, because most GenAI is built to put something there that is extremely believable. And while an innocent comment on HN might not play the biggest role, the linked project claims exactly what it can - by definition - not do ("a PhotoDNA hash can be used to produce thumbnail-quality reproductions of the original image") and it looks scientific, too.

ogurechny · 2024-10-20T11:21:55 1729423315

You have already assumed that “judges” are somehow better suited to make such decisions than “regular people”, even though they are simply cogs in the wheels of social machines, and will mostly automatically approve anything up to mass murders if “general direction” of the society is like that. But it's convenient for you to believe that they have certain qualities.

Needless to say, when people are so brainwashed that they are ready to pray to actual machines, decisions of those machines won't be questioned. It would just be inconvenient.

zaptrem · 2024-10-20T02:06:35 1729389995

To be clear, my use of the word “photorealistic” instead of “accurate” was very intentional.

stogot · 2024-10-19T16:23:51 1729355031

Hold my beer: symmetric flip. Flip the photo horizontally and it’s essentially the “same” image without a hash collision

tashbarg · 2024-08-25T20:46:36 1724618796

MD5 is “broken” as a cryptographic hash function. It still is perfectly fine as a non-cryptographic hash function.

Vecr · 2024-08-25T21:01:08 1724619668

Not really, it's slower than truncated blake3 for no gain and much loss.

bigiain · 2024-08-26T02:42:59 1724640179

There's some gain to be had in that I can reliably expect md5 to be available and compatible with pretty much anything back as far as Perl4 or PHP from the 90s, right up to bleeding edge version of Rust or Clojure or exotic language de jour.

Whether that's actually worth anything for a particular use case is a good question, and the answer will mostly be "not just no but HELL NO!"

zerodensity · 2024-08-25T23:37:50 1724629070

But is it slower than sha1? Which is the alternative if you don't roll your own in V8.

Vecr · 2024-08-25T23:55:16 1724630116

About six times faster compared to sha1. Depends on the hardware/cache environment.

slaymaker1907 · 2024-08-25T21:35:06 1724621706

Yeah, if you really need non-guessability, you should be using the version that’s completely random anyways.

ozim · 2024-08-25T21:56:11 1724622971

If you rely on non-guessability you use it as a security measure? So your sentence doesn’t invalidate previous poster.

tashbarg · 2024-06-22T14:11:06 1719065466

Zebras come in colors? Ours are all kinda monochrome.

tashbarg · on March 7, 2023

So ... your solution is no tfa?

Putting second factor material in password managers is terrible advice. For reasons unknown to me, it might be the right solution for you. But in general, it defeats the two factor authentication purpose if you reduce the factors again to knowledge alone.

The whole point of tfa is, that the second factor is something you possess and not something you know (which is the first factor).

ydant · on March 7, 2023

There are multiple attack vectors that 2-factor helps with, and storing your 2-factor alongside your password does still help in some, just not all.

For the more common attacks I expect to encounter, namely a single password being leaked, a password manager is still based on something I "possess" (to an extent) - the decrypted password vault. It's separate from the single password that's likely to have been compromised in the most common scenario.

Of course, if my whole vault is compromised, then yes, storing my 2-factor in there made my life worse than the alternative. I just don't see that as anywhere near as likely a scenario as an individual account being compromised. Having 2-factor enabled in a less secure method is still better than not having 2-factor enabled at all.

Basically, there's nuance to this, it's not the extreme you present - a more in-depth comment on this: https://security.stackexchange.com/questions/150448/is-it-se...

wordyskeleton · on March 7, 2023

You're assuming a compromised password == compromised 1Password vault which is clearly not going to be the case most of the time

tashbarg · on Dec 30, 2022

It makes perfect sense if you consider the right abstraction. TCP connections are streams. There are no packets on that abstraction level. You’re not supposed to care about packets. You’re not supposed to know how large a packet even is.

The default is an efficient stream of bytes that has some trade-off to latency. If you care about latency, then you can set a flag.

Ferret7446 · on Dec 30, 2022

There is no perfect abstraction. Speed matters. A stream where data is delivered ASAP is better than a stream where the data gets delayed... maybe... because the OS decides you didn't write enough data.

The default actually violates the abstraction more because now you care how large a packet is, because somehow writing a smaller amount of data causes your latency to spike for some mysterious reason.

avianlyric · on Dec 30, 2022

> A stream where data is delivered ASAP is better than a stream where the data gets delayed

That depends on your situation, because as you say no abstraction is perfect. Having a stream delivered “faster” isn’t helpful if means your overhead makes up 50% of your traffic, exactly what nagle avoids.

Nagles algorithm is also pretty smart, it’s only going to delay your next packet until it’s either full, or the far end has acknowledged your preceding packet. If your got a crap ton of data to send, and you’re dumping straight into the TCP buffer, then Nagle won’t delay anything because there’s enough data to fill packets. Nagle only kicks in if you’re doing many frequent tiny writes to a TCP connection, which is rarely a valid thing to do if you care about latency and throughput, so Nagles algorithm assuming the dev has made a mistake is reasonable.

If you really care about stream latency, then UDP is your friend. Then you can completely dispense with all the traffic control processes in TCP and have stuff sent exactly when you want it sent.

tashbarg · on Dec 24, 2022

I guess parent is focusing on the point, that PDFs can render as perfectly human-readable documents, but can be completely non-machine readable at the same time.

tashbarg · on Dec 7, 2022

A modern 4GHz CPU is not only 40 times faster. It is a few thousand times faster than a 100MHz CPU back from the days. Probably not 20,000, but at least 2,000 times faster seems reasonable.

And responsiveness back then was so good, because your program was very close to hardware with very little in between if not running completely free from OS abstractions.

jameshart · on Dec 7, 2022

Can you show your working on this? Because a 100MHz CPU can do 100,000,000 things a second, and a 4GHz CPU can do 4,000,000,000 things a second, and if my math's right, that means the 4GHz CPU can do 40 times as many things a second at the 100MHz CPU.

Now, you might argue 'the 4GHz CPU is multicore!', and so sure, maybe we're up to 8 times 40, which is, I'm pretty sure, 320. And maybe you'll say that the cache is bigger, so you'll be able to keep the data pipelines full and get more done on the faster CPU. But how are you getting to 'at least 2,000'?

tashbarg · on Dec 7, 2022

Sure. I'll oversimplify a lot, but the feeling of how things work should be correct.

The clock frequency is not a good way of measuring performance. Never was. Even earlier designs as the 8086 did not do one thing (instruction) every cycle. They did far less.

Modern CPUs are extremely complex beasts that can take in a lot of instructions. They take a good look on those instructions, change them in a way that does not alter the result but makes some optimizations possible and then distribute those instruction to a bunch of internal workers that can work on those at the same time. More on this can be found in the wikipedia rabbit hole starting with instruction level parallelism.

One way to measure this is to look at how many of a selected set of instructions per cycle can be done. An 8086 could do 0.066. A 386DX did 0.134, a 486 could do 0.7. A Pentium 100 already could do 1.88, and so on. Modern CPUs get to 10, per core.

But wait, there's more. This comparison gives only a very rough idea of a CPUs capabilities since it focuses on a very specific thing that might have little to do with actual observed performance. Especially since modern CPUs have extremely specialized instructions that can do enormous amount of computations on enormous amounts of data in little time. And there we are in the wonderful world of benchmarks that may or may not reflect reality by measuring execution time of a defined workload.

Passmark does CPU benchmarks and their weakest CPU in the database seems to be a Pentium 4 @ 1.3GHz. Single Core, single thread. It comes in at 77 (passmarks?). An i7-13700 is rated with 34,431. Does that make it 500 times faster than the 1.3GHz P4? Hard to tell, but its a hell of a difference. And from the P4 to a Pentium or even a 486 running at 100Hz ... at least another hell of a difference.

We can also try Dhrystone MIPS, another benchmark. Wikipedia has - strangely enough - numbers for the Pentium and the 486 at 100MHz: 188 MIPS for the Pentium, 70 MIPS for the 486. The most modern (2019!) desktop cpu entry comes in around 750,000 MIPS. A Threadripper from 2020 over 2,300,000 MIPS.

So, how much more can a modern CPU do than an ancient one? A lot. And especially a lot more than you would expect from the faster frequency alone. Even with only one core, it can do several hundred times the workload. And we got a lot of cores.

seized · on Dec 7, 2022

While it's harder to calculate, that 4Ghz CPU comes with vastly faster RAM, busses, and disk. Not many 100 MHz systems around with NVMe or even SATA...

ChuckNorris89 · on Dec 7, 2022

>your program was very close to hardware with very little in between if not running completely free from OS abstractions

This! It also meant that it was very very easy for any program or misbehaving driver to completely crash your system. Not to mention all the security implications of every app having direct hardware access.

Dylan16807 · on Dec 7, 2022

But when I go look at my text editor being slow, I can see that the amount of CPU time spent dealing with the kernel is less than a tenth of it. So that's not the reason.

tashbarg · on Dec 7, 2022

That's not how latency or responsiveness works.

Dan Luu did cool experiments with input lag (https://danluu.com/input-lag/).

Dylan16807 · on Dec 7, 2022

It's a much better estimate than hand waving about memory isolation.

If we want to talk about how things work directly, my program can get things to the GPU in far less than a millisecond. The safety layers are not the problem.

tashbarg · on Aug 19, 2022

Mathematicians do not have funding for „large scale“. A 10-year old mid-range server is exactly the kind of system I would expect Magma to run on in the average case. Perhaps even just a desktop pc.

Source: worked with algebra researchers using Magma.

czbond · on Aug 19, 2022

I was being a bit facetious, but not by much. Maybe because they're mathematicians and had found a theorem - but a pen tester wouldn't have.

It costs less than a few hundred bucks to do numerous, multi compute AWS server spot instances for cracks on large dictionaries with large hash rates, on random seed password lists (where each password has it's own seed).

If it was trying to crack a quantum-safe where by design the classical computer shouldn't be able to even solve it (except for potentially with a theorem hole) - you'd think they'd start higher.

tashbarg · on July 10, 2022

Back of the envelope: for 20-30 kWhs of capacity less, you put in an ICE that can burn gas for 10kWh/kg.

tashbarg · on June 1, 2022

No, it does not. Not _this_.

Erlang has, however, excellent support for distributed computing using its own kind of processes.