I would be in favor of this idea if a little more thought were put into how the hashing function works.
As it stands, someone typing in:
food in Chicago
will get a different URL than:
food in Chicago
And the same goes for: Chicago food, chicago food, food near Chicago, etc.
Every one of them, differing by even a single character (an extra space, a capitalization difference, a regional spelling like theatre vs. theater) or just by word order, will result in a different hash.
You've now made 'humanized URLs' into 'no one will guess your domain'.
It's an interesting approach to avoiding search engines, but it doesn't solve the problem that search engines actually do solve: many similar-but-different queries resolving to the same "appropriate"/top website result.
With this approach, not even face book, Facebook, and facebook would result in the same .com (and please don't suggest just purchasing a billion domains and redirecting them all).
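To make that concrete, here's a quick Node sketch (SHA-256 is just an assumption on my part; the proposal doesn't say which hash it would use):

```js
// Assumption: the extension hashes the raw query with SHA-256 to pick a domain.
const { createHash } = require('crypto');

const queries = [
  'food in Chicago',
  'Food in Chicago',   // capitalization difference
  'food in  Chicago',  // extra space
  'Chicago food',      // different word order
];

for (const q of queries) {
  const digest = createHash('sha256').update(q, 'utf8').digest('hex');
  console.log(JSON.stringify(q), '->', digest.slice(0, 16));
}
// Every line prints a completely different prefix, i.e. a completely different "domain".
```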
You could add a normalizing step before hashing in the extension, similar to what's often done with email addresses typed in by users (rough sketch after this list):
- Remove duplicate spaces and punctuation
- Lowercase the entire query (just like DNS)
- Detect and normalize homographs (is this an impossible problem, or are there solutions out there already?)
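Something like this (again assuming SHA-256, which the proposal doesn't actually specify; the homograph step here is just Unicode NFKC normalization, which only catches some cases):

```js
const { createHash } = require('crypto');

// Rough normalization pass applied before hashing.
function normalizeQuery(q) {
  return q
    .normalize('NFKC')                  // fold compatibility forms; catches some homographs, not all
    .toLowerCase()                      // lowercase the whole query, just like DNS
    .replace(/[^\p{L}\p{N}\s]/gu, ' ')  // strip punctuation
    .replace(/\s+/g, ' ')               // collapse duplicate whitespace
    .trim();
}

function queryToDigest(q) {
  return createHash('sha256').update(normalizeQuery(q), 'utf8').digest('hex');
}

// 'food in  Chicago', 'Food in Chicago!' and 'food in chicago' now produce the same digest;
// 'Chicago food' still doesn't, since nothing here normalizes word order.
```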
Are you sure? I just pasted both into my UTF-8 Linux terminal running `od -tx1` and got the same hex octets. (I also tried `"<paste string>".split("").map(function(s) { return s.charCodeAt(0); })` in Chrome's JS console.) It's quite probable I don't know what I'm doing, so I'd appreciate knowing how to do it properly.
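For reference, a slightly more careful version of what I pasted, using `Array.from` and `codePointAt` so astral characters don't get split into surrogate pairs (the two literals are just the strings from the parent comment):

```js
// Compare two strings code point by code point.
const a = 'food in Chicago';
const b = 'food in Chicago';

const codePoints = s => Array.from(s, c => c.codePointAt(0).toString(16));
console.log(codePoints(a));
console.log(codePoints(b));
console.log(codePoints(a).join(',') === codePoints(b).join(','));  // true if genuinely identical
```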
Also, since these strings would be typed, I'm not sure the homograph attack applies. Why would someone slip in a Cyrillic letter or something while typing the URL themselves? If extended to clickable links that displayed the pre-hash text, I could see the issue, but pudquick specifically said "someone typing in" the two URLs.