'sha1:' (0x736861313a) and other self-describing labels can be represented in he...

stouset · on April 4, 2015

So... it's readable in one out of dozens of possible encodings? And realistically, in that encoding (ASCII), the part following the readable prefix is unreadable garbage. In exchange for this convenience, you add four bytes of unreadable prefix instead of one.

I'm not sure I see the point.

I deal with outputs from crypto functions on a daily basis. Never once have I intentionally rendered one as 8-bit ASCII. They're most often in hex. Mixing and matching encodings (e.g., prefixing a hex string with an ASCII string) for a single blob of data is silly and just causes headaches whenever you need to change encodings, either by having some data double-encoded or by having to handle the prefix separately.
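
To make the double-encoding complaint concrete, here is a rough Python sketch. The input `b"hello"` and the variable names are made up for illustration, and it assumes the label is the literal ASCII string 'sha1:' from the title.

```python
import hashlib

digest = hashlib.sha1(b"hello").digest()  # 20 raw bytes

# Mixed encoding: an ASCII label glued onto a hex-encoded digest.
mixed = "sha1:" + digest.hex()

# Naively hex-encoding the whole string encodes the digest a second time.
double_encoded = mixed.encode("ascii").hex()

# Avoiding that means splitting off the prefix and handling it separately.
label, hex_digest = mixed.split(":", 1)
uniform = (label + ":").encode("ascii").hex() + hex_digest

print(len(mixed), len(double_encoded), len(uniform))
```

Only `uniform` decodes back to the original bytes in one step, and getting there required knowing where the prefix ends.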

gojomo · on April 4, 2015

My hunch is that the overwhelmingly dominant and important use case is where these identifiers appear in URLs, including URL fragments (and potentially in brand-new protocols). A few other important cases are also where they're visible to people, as in the example (or hypothetical) command-lines that kicked off this thread. In those cases, explicitness-at-a-glance helps, as will having one canonical encoding (such as b32, b64, or b58).

Compared to that, adaptation to other constrained systems is a case-by-case issue. And if those systems are already capable of squeezing in these non-native slightly-longer hash-like-strings, then a few more bytes usually won't hurt, or if they do then whatever deep-in-the-bits coder (like yourself) who's shoehorning things in can handle compactification. The display/exchange format should be as casually readable as possible.

Such an ASCII-name-inspired prefix is readable in all encodings. In some, it's just a magic number (but one that any coder can make educated guesses about); but in the one encoding that's likely most important for mutual comprehension between humans (user-exchanged strings and URLs), it's super-duper-readable.
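
As a rough illustration of that claim, the Python sketch below prepends the ASCII bytes 'sha1:' to two digests and looks at the start of each identifier under a few encodings. The helper name `identifier` and the sample inputs are assumptions for the example, not part of any proposal in this thread.

```python
import base64
import hashlib

PREFIX = b"sha1:"

def identifier(data: bytes) -> bytes:
    """Hypothetical identifier: ASCII label followed by the raw SHA-1 digest."""
    return PREFIX + hashlib.sha1(data).digest()

a = identifier(b"first")
b = identifier(b"second")

# Raw bytes: the label is directly readable as ASCII.
print(a[:5].decode("ascii"))           # sha1:

# Hex: the label becomes a fixed magic number.
print(a.hex()[:10])                    # 736861313a

# Base32: five prefix bytes fill exactly eight characters, so the
# leading eight characters are identical for every identifier.
print(base64.b32encode(a)[:8] == base64.b32encode(b)[:8])

# Base64: only the first three bytes ("sha") fill whole characters,
# so just the leading four characters are guaranteed to be stable.
print(base64.b64encode(a)[:4] == base64.b64encode(b)[:4])
```

In the non-ASCII encodings the prefix is no longer a word, but it is still a constant that a coder can recognize or look up.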

And yes, the prefix should in general have special handling: as the multihash project README notes in an important "warning", the prefix lacks the same distribution as the other bytes. Treating the whole thing as an opaque-but-still-reliable identifier invites indexing misoptimizations right off the bat, and then other later bugs, if any of the hash functions become deprecated, or a new hash is added with variant semantics.
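
A minimal sketch of the indexing pitfall, in Python. The prefix table, `split_identifier`, and `bucket` are hypothetical names invented here (this is not the multihash format or API); the point is only that the leading bytes of a prefixed identifier are constant, so bucketing should use the digest bytes.

```python
import hashlib

# Hypothetical table of ASCII labels and their expected digest lengths.
KNOWN_PREFIXES = {b"sha1:": 20, b"sha256:": 32}

def split_identifier(blob: bytes) -> tuple[bytes, bytes]:
    """Separate the label from the digest, rejecting unknown or malformed input."""
    for prefix, length in KNOWN_PREFIXES.items():
        if blob.startswith(prefix):
            digest = blob[len(prefix):]
            if len(digest) != length:
                raise ValueError("digest length does not match prefix")
            return prefix, digest
    raise ValueError("unknown prefix")

def bucket(blob: bytes, n_buckets: int = 256) -> int:
    """Choose a bucket from the digest bytes, never from the prefix."""
    _, digest = split_identifier(blob)
    return digest[0] % n_buckets

ids = [b"sha1:" + hashlib.sha1(str(i).encode()).digest() for i in range(200)]

naive_buckets = {blob[0] for blob in ids}       # always the byte for 's'
digest_buckets = {bucket(blob) for blob in ids}

print(len(naive_buckets), len(digest_buckets))
```

Keeping the label explicit in the parsed form also leaves room to treat a deprecated or newly added hash differently later.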