The wrong syntax notwithstanding, this doesn't let you recursively use querySelector(All), e.g. to find children of a node like document.querySelector("#foo").querySelectorAll(".bar")
But I think the OP's jQuery replacement is also dropping features in the service of a small footprint. So this was my 80/20 contribution to the "smallest jQuery replacement" problem ;)
I'm always surprised that an API that is defined by matching 0-n DOM elements doesn't return a container that by default maps over them, list-monad style.
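A minimal sketch of the kind of wrapper I mean, assuming plain modern JS (the name $$ and the optional root parameter are mine, not the OP's):

    // Return a real array, so results can be mapped/filtered/chained directly.
    const $$ = (selector, root = document) =>
      Array.from(root.querySelectorAll(selector));

    // Scoped ("recursive") queries then compose naturally:
    const barTexts = $$("#foo")
      .flatMap(foo => $$(".bar", foo))   // .bar descendants of each #foo match
      .map(el => el.textContent.trim());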
It feels like unifying it with the ASCII i is the mistake here. There should have just been four Turkish characters in two pairs, rather than trying to reuse I/i.
It's not like we insist Α (Greek) is the same as A (Latin) or А (Cyrillic) just because they're visually identical.
But even with separate characters you aren't safe, because the ASCII "unification" isn't just Unicode's fault to begin with; in some cases it is historic/cultural in its own right. German ß has distinct upper- and lower-case forms, but also a complicated history in which, depending on locale, the upper-case form is "SS" rather than the dedicated capital ẞ. In many of those same locales the lower-case form of "SS" is "ss", not ß. Casing doesn't even try to round-trip, and that's somewhat intentional/cultural.
Uppercase ẞ has only existed officially (in German orthography) since 2017, so before that using SS as a replacement was the correct way of doing things. That is relatively recent when it comes to that kind of change.
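To make the non-round-tripping concrete, here is roughly how it shows up in JS, which follows the default Unicode case mappings:

    "ß".toUpperCase();   // "SS"  (default full uppercase mapping)
    "SS".toLowerCase();  // "ss"  -- not "ß"; the original letter is lost
    "ẞ".toLowerCase();   // "ß"   (the capital ẞ, U+1E9E, does lowercase back)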
This stems from the earlier Turkish 8-bit character sets like IBM code page 857, which Unicode was designed to be roundtrip-compatible with.
Aside from that, it's unlikely that authors writing both Turkish and non-Turkish words would properly switch their input method or language setting between both, so they would get mixed up in practice anyway.
There is no escape from knowing (or best-guessing) which language you are performing transformations on, or else just leave the text as-is.
When Unicode was being specced out originally I guess. There was more interest in unifying characters at that stage (see also the far more controversial Han unification)
Uh-huh. At that time, roundtrip compatibility with all widely used 8-bit encodings was a major design criterion. Roundtrip meaning that you could take an input string in e.g. ISO 8859-9, convert it to Unicode, convert it back, and get the same string, still usable for purposes like database lookups. Would you have argued to break database lookups at the time?
There's nothing about the ability to round-trip that through Unicode that required byte 0x49 in ISO-8859-9 to be assigned the same Unicode code point as 0x49 in ISO-8859-1 just because they happen to be visually identical.
There is a reason: ISO-8859-9 is an extended ASCII character set. The shared characters are not an accident, they are by definition the same characters. Most ISO character sets follow a specific template with fixed ranges for shared and custom characters. Interpreting that i as anything special would violate the spec.
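A small illustration of that shared range, using TextDecoder (assuming a browser or other runtime with the legacy encodings; note that the WHATWG "iso-8859-9" label actually maps to windows-1254 and "iso-8859-1" to windows-1252, which differ from the ISO sets only in the 0x80-0x9F range):

    const bytes = new Uint8Array([0x49, 0x69, 0xDD, 0xFD]);
    new TextDecoder("iso-8859-9").decode(bytes);  // "Iiİı"
    new TextDecoder("iso-8859-1").decode(bytes);  // "IiÝý"
    // Byte 0x49 decodes to U+0049 ("I") in both; only the "custom" bytes differ.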
Back in those days, people would store a mixture of ASCII and other data in the same database, e.g. ASCII in some rows, ISO-8859-9 in others. (My bank at the time did that: some customers had all-ASCII names, some had names with ø and so on.) If Unicode were only mostly compatible with the combination, it wouldn't have been safe to start migrating software that accessed databases/servers/… For example, using UTF8 for display and a database's encoding to access a DBMS would have had difficult-to-understand limitations.
You can fix all kinds of bugs if you're able to disregard compatibility with old data or old systems. But you can't. And that's why Unicode is constrained by e.g. the combination of a decision made in Sweden hundreds of years ago with one made in Germany around the same time. Compatibility with both leads to nontrivial choices and complexity; incompatibility leads to the scrap heap of software.
So, uh, is this actually desirable per the Turkish language? Or is it more-or-less a bug?
I'm having trouble imagining a scenario where you wouldn't want uppercase and lowercase to map 1-to-1, unless the entire concept of "uppercase" and "lowercase" means something very different in that language, in which case maybe we shouldn't be calling them by those terms at all.
My understanding is it's a bug that the case changes don't round trip correctly, in part due to questionable Unicode design that made the upper and lower case operations language dependent.
This Stack Overflow question has more details - but apparently Turkish i and I are not their own Unicode code points, which is why this ends up gnarly.
• Lowercase dotted I ("i") maps to uppercase dotted I ("İ")
• Lowercase dotless I ("ı") maps to uppercase dotless I ("I")
In English, uppercase dotless I ("I") maps to lowercase dotted I ("i"), because those are the only kinds we have.
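For reference, a sketch of how this surfaces in JavaScript's locale-aware case functions (the results below follow the standard Unicode mappings):

    "i".toLocaleUpperCase("tr");   // "İ" (U+0130)
    "I".toLocaleLowerCase("tr");   // "ı" (U+0131)
    "i".toLocaleUpperCase("en");   // "I" -- the default mapping collapses the pair
    "İ".toLocaleLowerCase("en");   // "i\u0307" -- i plus a combining dot above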
Ew! So it's a conflict of language behavior. There's no "correct" way to handle this unless you know which language is currently in use!
Even if you were to start over, I'm not convinced that using different Unicode code points would have been the right solution, since the rest of the alphabet is the same.
yup. lowercase and uppercase operations depend on language. It's rough.
In some APIs this distinction shows through - e.g. JavaScript's Intl.Collator is a language-aware sorting interface.
In practice, the best bet is usually to try not to do any casing conversions and just let the users handle uppercase vs lowercase on their own. But if you have to do case-insensitive operations, there are lots more questions about which normalization you should use, and if you want to match user intuition you are going to want to take the language of the text into consideration.
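To make both points concrete, a small sketch (the word list and the helper name eqTr are mine; Turkish alphabetical order puts dotless ı before dotted i):

    const words = ["ilk", "ırmak", "iğne"];
    [...words].sort();                                 // code-point order: ["ilk", "iğne", "ırmak"]
    [...words].sort(new Intl.Collator("tr").compare);  // Turkish order:    ["ırmak", "iğne", "ilk"]

    // One common (not universal) approach to locale-aware case-insensitive matching:
    const eqTr = (a, b) => a.toLocaleLowerCase("tr") === b.toLocaleLowerCase("tr");
    eqTr("İSTANBUL", "istanbul");  // true under Turkish rules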
Yeah, making a specific "Turkish lowercase dotted i" character which looks and behaves exactly like the regular i except for uppercasing feels like introducing even more unexpected situations (and also invites the next homograph attack)
I guess it's a general situation: If you have some data structure which works correctly for 99.99% of all cases, but there is one edge case that cannot be represented correctly, do you really want to throw out the whole data structure?
Indeed, the parent already gives one: flip_case(flip_case("ﬀ")) = "ff". (Since it's hard to tell with what I guess is default ligature formation, at least in my browser: the first is the 'ﬀ' ligature, U+FB00, and the second is two plain 'f's.)
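The same failure is easy to reproduce in JS, which applies the Unicode full case mappings:

    "\uFB00".toUpperCase();               // "FF" -- the ﬀ ligature expands to two letters
    "\uFB00".toUpperCase().toLowerCase(); // "ff" -- two plain f's, not the ligature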
> it's just that Unicode is a total and complete clusterfuck
[...]
> When design-by-committee gives birth to something way too complex, insecurity is never far behind.
Human writing is (and has historically been) a "clusterfuck". Any system that's designed to encode every single known human writing system is bound to be way too complex.
I almost always side with blaming systems that are too complex or insecure by design as opposed to blaming the users (the canonical example being C++), but in the case of Unicode there's no way to make a simpler system; we'll keep having problems until people stop treating Unicode text as something that works more or less like English or Western European text.
In other words: if your code is doing input validation over an untrusted Unicode string in the year of our Lord 2024, no one is to blame but yourself.
(That's not to say the Unicode committee didn't make some blunders along the way -- for instance the Han unification was heavily criticized -- but those have nothing to do with the problems described by Schneier).
If you tried to come up with a “lightweight” Unicode alternative it would almost certainly evolve right back into the clusterfuck that Unicode is. In fact the odds would mean it would probably be even worse.
Unicode is complex because capturing all the world's writing systems into a single system is categorically complex. Because human meatspace language is complex.
And even then, if you decided to “rewrite the world's language systems themselves” to conform to a simpler system, it too would eventually evolve right back into the clusterfuck that is the world's languages.
It’s inescapable. You cannot possibly corral however many billion people live on this planet into something less complex. Humans are too complex and the ideas and emotions they need to express are too complex.
The fact that Unicode does as good of a job as it does and has stuck around for so long is a pretty big testament to how well designed and versatile it is! What came before it was at least an order of magnitude worse and whatever replaces it will have to be several orders of magnitude better.
Whatever drives a Unicode replacement would have to demonstrate a huge upset to how we do things… like having to communicate with intelligent life on other planets or something, and even then they probably have just as big of a clusterfuck as Unicode to represent whatever their writing system is. And even then Unicode might be able to support it!
How could you ever make it simple given that the problem domain itself is complex as fuck? Should we all just have stuck with code pages and proprietary character encodings? Or just have people unable to use their own languages? Or even to spell their own names? It’s easy for a culturally blind English speaker to complain that text should be simple, must be due to design by committee that it isn’t!
Unicode is worse than design-by-committee. It's a design-by-committee attempt to represent several hundred design-by-culture systems in one unified whole. Design-by-culture is even messier than design-by-committee, since everyone in the culture contributes to the design and there's never a formal specification, you just have to observe how something is used!
Could you try an argument that unicode is insecure compared to roll-your-own support for the necessary scripts? You may consider "necessary" to mean "the ones used in countries where at least two of Microsoft, Apple and Sun sold localised OSes".
It's even stranger (to me) that they picked an existing protocol's name, given that they never even define "RTP" in their paper.
If it's just arbitrary letters, might as well avoid the collision. If it isn't arbitrary letters, please, if you're describing the protocol, you should start by describing the damn acronym.
Agree, calling it rtp is likely to cause confusion. Probably best to choose a four letter acronym. Assume all the good three letter ones have already been used.
I think if you don't insist on them making sense as an acronym, there are still some interesting 3-letter combinations free (if you avoid the obvious minefields, that is).
E.g., I've never heard of qqq:// . There is a QQQ Trust apparently, but no network protocol AFAIK.
I'd like a more precise definition of "misinformation" before judging this proposal.
There is nothing bad about strengthening people's skills in the scientific method, educating them about cognitive biases, and teaching them how to be aware of the sources and context of statements.
What worries me about the proposed method is that it does not rest on any kind of objective sense of truth: which statements you present as "truth" and which as "falsehood" is completely arbitrary - it just depends on the kind of weakened strawman argument you present. This makes it more of a political propaganda tool that can be used to present your side's narrative as the absolute truth. And that feels sinister and downright authoritarian.
(Even if you're a True Patriot for whatever side you're on and believe manipulating people like this is just and moral for the cause: Keep in mind that the other side could use the exact same inoculation strategy against you)
I think we should acknowledge that currently the world is in a deep epistemic crisis: We're back in a cold war situation where there are (at least) two deeply incompatible explanations of the world's power structures, each one seen as the "obvious truth" by billions of people. The differences have become so irreconcilable that we've already arrived at a point of open war between the two sides.
I think this situation should give you pause when thinking about absolute (political) truths.
The “Truth” is a perturbation of existential reality. And the “truth” is a figment of mind. And “integrity” is the consistency between the two. Notice that “truth” does not need to be identical to Truth, only an approximation that does not contradict.
Information is the removal of uncertainty, or the resolve of potential (as distributed among probabilities.) Notice potential and probability are different aspects of the same thing. Potential being an existential phenomena and probability as a mathematical construct.
Science is solely concerned with aligning the mental model of truth with existential Truth. Anything otherwise is the same cult worship as any branch of speculation (even constructive). Science is a guide, it is not the solution for practical understanding (that would be deductive and inductive reasoning.)
Misinformation, ignorance, and confusion are the DEFAULT STATE OF MIND (in a universe governed by entropy, that’s “potential distribution”, not “number of states”, as states are a measured resolve not the possibility of resolve.)
Can people be inoculated against misinformation? Only through self doubt, and doubt of anything that cannot be reliably measured as a resolve.
Undeceive the self! We are all ignorant, confused, and uncertain first, even the wise!
I know, this is the standard VC playbook: First growth at all costs to get everyone into the ecosystem, then pull up the net and monetize. The latter phase is usually when all the subscriptions, value-add nag screens, data sharing agreements and other enshittification goodies pop up.
And how would it work otherwise? You can't perpetually offer a product for free (in all senses of the word) AND satisfy exponential ROI expectations at the same time.
If this is what's going on here, I'm worried what the "monetization" phase will entail.
The other option I see is being owned by a behemoth like Microsoft, as in VSCode's case, which can pay millions per month in engineering salaries and PaaS while keeping it free.
They are so big they can monetize it using Copilot or not even monetize it properly, just to earn goodwill from Developers, Developers, Developers.
No, the other option is something like Sublime Text that has a small team working on it and is paid for by a very fair one-off payment by the customer.
I wouldn't exactly call a ytdl-style media downloader with a whole library of site-specific extractors and converters "dumb" but still cool that more projects like ytdl exist.
If it's instantly released, then yes. But there are reports in this thread where the offensive actions happened 15 years ago. After such a long time of "good behavior" it makes no sense to me to still keep the domain blocked/downranked.
Honestly, these days, with domains being nearly free compared to the profit potential of a single successful spammer grift, I'm not sure I even see the point of blacklisting domains at all. 25 years ago, maybe a spammer would be devastated that he had to “start all over and buy a new domain and build up its reputation.” Now, spammers launch and abandon what, a million new domains a day? Google or anyone spitefully holding onto hard feelings about what a domain “did” years ago is pointless because the spammers will move on anyway. They wouldn't reuse abcqwertuiop26abc dot xyz, because it's safer to make up a new gibberish domain. Only people who acquire domains legitimately are hurt by this.
I would want to experiment judging them based on what they’ve been seen to do in the past month.
I'm imagining/advocating for blacklisting them for say, 12 months, and re-evaluating them at that point. This imposes the identical cost on the spammer as now (each "detection" costs them a year's domain registration) while allowing a reputation "reset" for innocent people who acquire haunted domains.
Yes, the spammers can sit on their domains once blacklisted, renew them, and redeploy their spam on them 12 months later, but they'd have nothing to gain from the reuse, since the names of their domains are just nonsense anyway.
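If you wanted to sketch the mechanism, it could look something like the following (purely hypothetical; the names blocklist, reportAbuse, and isBlocked are mine, and real blocklists are far more involved):

    const YEAR_MS = 365 * 24 * 60 * 60 * 1000;
    const blocklist = new Map();  // domain -> timestamp of last observed abuse

    const reportAbuse = (domain) => blocklist.set(domain, Date.now());
    const isBlocked = (domain) => {
      const lastSeenAbuse = blocklist.get(domain);
      // The listing "expires" 12 months after the last observed abuse.
      return lastSeenAbuse !== undefined && Date.now() - lastSeenAbuse < YEAR_MS;
    };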
I’m guessing that would complicate blacklist maintenance quite a bit, which is why we aren’t seeing it work that way.
Most of these blacklists (at least initially) were emergency type measures - ‘block these spammers’, then move on with life.
Blacklist maintainers would need to maintain date first seen/date last seen info, and purge/re-add correctly.
Technically, seems like an ‘append only’ type thing is what they’ve been doing for the most part.
As this evolves and it becomes more widely known that these lists need some kind of expiration (or we end up with more maintenance headaches), maybe, eh?
Or if there is some kind of legal rules around it.
The Ig Nobel prize in literature the police got awarded was a nice touch.
I still wonder how their DB was set up to accept this data in the first place. It makes sense to allow a person to be associated with multiple addresses - people move, sometimes a lot - but a person should not under any circumstances have multiple DoBs, should it?
(Unless I missed "Falsehoods programmers believe about personal data: People are born only once" or something)
The parents did not want the baby, so they left it on the doorstep; the date of birth was not known, so one was assigned and used in some legal documents. Later, the original parents changed their minds, and the real date of birth became known.
(For sanity's sake, I would just say choose one or flip a coin and be done with it, but at the same time I could imagine that some layer could take my sanity into account.)
A person can't, but there can be multiple people with the exact same name, with different birthdays (or even the same!), so DoB isn't guaranteed to be unique without some other identifier.
> “Do we want to make it easy to remove that water bottle, or that mic? Because that water bottle was there when you took the photo,”
I'm actually more concerned about the amount of hubris apparent in that conversation - and about the ease with which they discuss making certain actions deliberately more difficult, just because they have an opinion about them.
Maybe they need a reminder that they manufacture tools - which their users employ to put their intentions into reality.
I don't really want to have a philosophical debate with my microwave about whether or not heating this particular fast food item is really good for my health, for the environment and for society in general when I just want something for lunch.
In contrast, they seem to see themselves as policymakers who want to decide what kind of intentions their users should have in the first place.
> Maybe they need a reminder that they manufacture tools - which their users employ to put their intentions into reality.
Tools are manufactured with a purpose and intention in mind. It is perfectly reasonable and desirable for the creator of a tool to think about how it is going to be used.
Alfred Nobel invented dynamite and was dismayed by its use for war. Ethan Zuckerman apologised for the pop-up ad. John Larson invented the polygraph then spent four decades rallying against it. John Sylvan regretted having invented coffee capsules because of their environmental impact.
If anything, we need more consciousness put into the creation and distribution of tools, not less. Take responsibility for what you make.