Let me say first of all that I'm glad they're working on standardizing this. When making REST APIs, I find HTML form scaffolds incredibly useful, but it means that you probably have to accept both JSON (because JSON is reasonable) and occasional form-encoding (because forms), leading to subtle incompatibilities. Or you have to disregard HTML and turn your forms into JavaScript things that submit JSON. Either way, the current state is ugly.
Here's the part that I don't particularly like, speaking of subtle incompatibilities:
I've seen this ugly pattern before in things that map XML to JSON. Values spontaneously convert to lists when you have more than one of them. Here come some easily overlooked type errors.
I don't know of any common patterns for working with "a thing or a list of things" in JSON; that kind of type mixing is the thing you hope to get away from by defining a good API. But all code that handles HTML JSON is going to have to deal with these maybe-list-maybe-not values, in a repetitive and boilerplatey way.
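Concretely, every consumer ends up writing something like this (the helper name is my own invention):

```javascript
// The boilerplate every consumer of these forms ends up writing:
// coerce a maybe-list-maybe-not value into an array before use.
const asArray = (value) => (Array.isArray(value) ? value : [value]);

console.log(asArray("one"));          // [ 'one' ]
console.log(asArray(["one", "two"])); // [ 'one', 'two' ]
```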
I hope that a standard such as this will eventually be adopted by real-life frameworks such as Django REST Framework, but I also hope that they just reject the possibility of multiple fields with the same name.
Then add attributes to explicitly give the structure. Since this is a browser spec we can add attributes. Here's the same form but with some attributes giving the explicit encoding.
PHP handles lists like this by requiring you to use an input name like "bottle-on-a-wall[]". The brackets indicate that it should be a list. I don't hate this convention...
It's pedantic, but it's a tad clunky now: I'd use something like "wow/such/deep[3]/much/power/!" to differentiate child nodes from array indexes, and also strongly advise against a node named "!" on principle. Of course then you'd say "but I want '/' in my node name!", and then you get to bikeshed an escape sequence, but you'd need that anyway for a node name with '[' or ']' in it.
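To illustrate, here's a tiny parser for that hypothetical syntax (the path syntax is the suggestion above, not the spec's, and this sketch ignores the escaping question entirely):

```javascript
// Hypothetical path syntax: "/" separates child keys, "[n]" marks an
// array index. Not part of the actual draft.
function parsePath(path) {
  return path.split("/").flatMap((seg) => {
    const m = seg.match(/^([^\[\]]*)\[(\d+)\]$/);
    return m ? [m[1], Number(m[2])] : [seg];
  });
}

console.log(parsePath("wow/such/deep[3]/much/power/!"));
// [ 'wow', 'such', 'deep', 3, 'much', 'power', '!' ]
```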
However, what they've done is workable if all the browser vendors implement it.
Square brackets are a reasonably common way to access data in a hash-like object. Both JavaScript (objects) and PHP (associative arrays) use/allow square brackets for this.
I don't think multiple fields with the same name is designed to be a normal mode of operation. If you actually want an array, use the [] suffix.
But if you have a bug which produces a form with multiple fields with the same name, what should it do? I can think of two strategies: last one wins, and what the proposal does.
Both are probably going to lead to things going wrong in the backend. At least with the proposal you:
A) can detect it with boilerplate that looks for arrays where you expect primitives.
B) can figure out what the original data was if you need to try to repair it manually.
I'm sure we'll see lots of bad tutorials suggesting you just use the same field name to make an array though :(
Do any server-side environments currently present an array/array-like structure from request parameters with duplicate names (specifically without any kind of array indicator, like name[] in PHP requests, for example)?
Sure, yes. In Python land for example Django and Flask both present the request parameters in a dictionary-like data structure that supports lookup of the complete list of values passed for a given name. Regular lookups will give you just a single value, but they also present an interface to get the complete list.
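Outside Python land, JavaScript's standard URLSearchParams exposes the same dual interface, for comparison:

```javascript
// URLSearchParams: get() returns the first value for a duplicated
// name, getAll() returns the complete list.
const params = new URLSearchParams("a=1&a=2&b=3");

console.log(params.get("a"));    // "1"
console.log(params.getAll("a")); // [ '1', '2' ]
```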
In fairness, you have to look at how standards get somewhere - this is an editor's draft, which is a starting point for an idea rather than a done deal. Don't be surprised if the final product winds up significantly different from this - even better, get involved in the conversation to make it what we need. That's not to pour cold water on it: it's good as it is, but there are changes which could better explain the magic of form encoding and submission, and allow more adaptation and experimentation over time.
Why such an emphasis on "losing no information" when the form is obviously malformed?
You need only to look at the crazy ways in which MySQL mangles data to realize that silently "correcting" invalid input is not the way to go. The web has suffered enough of that bullshit, we seriously don't need another. Example 7 (mixing scalar and array types) gives me shudders. Example 10 (mismatched braces) seems to have a reasonable fallback behavior, though I'd prefer dropping the malformed field altogether.
If the form is obviously malformed, transmission should fail, and it should fail as loudly and catastrophically as possible, so that the developer is forced to correct the mistake before the code in question ever leaves the dev box.
Preferably, the form shouldn't even work if any part of it is malformed. If we're too timid to do that, at least we should leave out malformed fields instead of silently shuffling them around. Otherwise we'll end up with frameworks that check three different places and return the closest match, leaving the developer blissfully ignorant of his error.
While we're at it, we also need strict limits on valid paths (e.g. no mismatched braces, no braces inside braces) and nesting depth (most frameworks already enforce some sort of limit there), and what to do when such limits are violated. Again, the default should be a loud warning and obvious failure, not silent mangling to make the data fit.
This is supposed to be a new standard, there's no backward-compatibility baggage to carry. So let's make this as clean and unambiguous as possible!
1. Other form encoding types have discrete MIME type fields.
2. While data URIs are awesome, they force you to process that URI in some way (parse, regex, whatever) just to get the MIME type. They also add 13 bytes of redundancy every time.
> It lets you comment out a line without having to remove the trailing comma from the previous line
That would only be an advantage over comma-at-the-end for the last line. It really just moves the problem from the last line to the first one. Now you can't comment out the first line without removing a comma...
In the comma-first style, adding a new element at the end produces a one-line diff. With the comma-last style, adding a new element to the list gives a two-line diff.
It can make resolving merges just a little bit easier.
I think it's great when doing ad hoc analysis that only you are ever going to look at, though maybe I just need a better IDE. Another neat / silly trick is to add a truthy condition at the beginning of a WHERE clause:
select *
from foo
where 1=1
  and bar = 1
  and baz = 2
That way you can comment out any of your conditions without breaking the syntax.
If IE didn't barf on a trailing comma after the last value, we wouldn't have this problem. That's probably the biggest source of all my IE JavaScript bugs. To be fair, I'm not actually sure what the spec says, but the spec might be wrong :-)
IE9 and above should handle it fine, so it's just a matter of time before this problem goes away. In the meantime there are two things you can do: use an editor which highlights this as an error (e.g. WebStorm), or add a pre-commit hook to your code repository which disallows committing code that doesn't pass JSLint checks. We use both practices on a 200-kline JS codebase, and it has essentially eliminated JavaScript errors at the customer due to bad syntax (as well as those tricky = / == / === issues).
True about the comment. That's why I like the trailing comma style in pep8[1], which enables both commented out array elements and smaller, more meaningful diffs when making additions or deletions. If only trailing commas were valid JSON!
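For what it's worth, JSON.parse enforces this strictly - a trailing comma is a hard syntax error per the JSON grammar:

```javascript
// JSON.parse rejects trailing commas outright.
let err = null;
try {
  JSON.parse("[1, 2, 3,]");
} catch (e) {
  err = e;
}

console.log(err instanceof SyntaxError); // true
```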
> It lets you comment out a line without having to remove the trailing comma from the previous line. That'd be useful if JSON had comments.
It also lets you remove the line (for the same reason that you can comment it out) without modifying other lines. This is somewhat convenient if the file is one that you are going to manually edit (and even more useful if it is going to be the subject of line-oriented diffing tools, since the only changed lines will be the ones with meaningful changes.)
If you use alphabetically ordered keys (a pretty good practice anyhow), that advantage goes away. If you just develop a habit of adding stuff to the front where semantics do not matter, that advantage goes away.
I found it almost a necessity when I started generating larger DOM trees in JavaScript. No more games of hunt-the-missing-comma-on-the-ragged-edge: https://gist.github.com/insin/8e72bed793772d82ca8d (These syntax woes are one of the main problems React's JSX solves)
The comma first pattern is pretty common. I've seen it used in JavaScript (as well as JSON), SQL and Haskell; I'm sure there are others.
Some encourage this so you can comment out a line without having to add/remove commas. However, this doesn't help if you want to comment out the first line; in effect, it just moves the problem from the bottom of the list to the top. Other proponents say it's easier to spot typos (i.e., missing commas): with comma-first, every continuation line starts with a comma, so a missing one stands out as a visibly misaligned line.
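A sketch of the two layouts (both parse to the same object):

```javascript
// Comma-last: a missing comma hides on the ragged right edge.
const commaLast = {
  alpha: 1,
  bravo: 2,
  charlie: 3
};

// Comma-first: a missing leading comma leaves a visibly odd line.
const commaFirst = {
  alpha: 1
, bravo: 2
, charlie: 3
};

console.log(commaLast.bravo === commaFirst.bravo); // true
```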
Personally, I think it's ugly and, while I'm sure I've erroneously omitted commas before, they're caught by the parser. No example comes immediately to mind where missing a comma can be interpreted as something other than a parse error.
Concerned. Memes are in-group signalling one notch above the crudest kind such as football chants, a few notches below the more sophisticated kind like quoting Shakespeare, but all ultimately with the potential to exclude and confuse - which is definitely the opposite of what a technical spec should be trying to do.
It amazes me that we're now at the point of standardizing sticking array references inside strings and yet we're still not having a serious discussion about what comes after HTML.
It amazes me that you think we could have a serious discussion about what comes after HTML when essentially nobody is seriously considering replacing HTML. HTML is what it is, nothing else is what HTML is, and HTML is going to be around a good long while.
It amazes me that you think we could have a serious discussion about what comes after HTML when essentially nobody is seriously considering replacing HTML.
Of course we are. There are numerous new technologies competing to replace the role we have shoe-horned HTML into playing -- look at all the different templating technologies now in development, for example, and things like Web Components. There are also numerous technologies for marking up semantic content and/or styling such data for presentation.
The only thing that isn't changing right now is that we're still stuck with eventually reducing these alternatives to plain HTML for display in browsers, which is about as good an idea as insisting we reduce all styling to CSS and all programming to JavaScript. It's a historical accident, it's resulted in widespread dependence on tools that are nowhere near fit for the purposes they are now asked to serve, but there is so much momentum in the industry that building tools to accommodate the weaknesses as well as possible is the preferred strategy over completely starting over. See also: Almost everything about programming ever.
Fully agreed. A low(ish)-level, non-JS API to the sea of DOM "primitives" inside blink/gecko/webkit/etc. suitable for targeting in an FFI-like manner by any language is the standard I'd like to see. We don't need 3 languages: it's all there in C++ classes and C structs.
I've been writing a lot of ClojureScript/React lately and with that I only really touch HTML to include CSS and scripts. It's pretty glorious, highly productive, and makes a very strong case for opening up lower levels to allow new paradigms to evolve - with this set-up, HTML and JS only get in the way.
The browser is now (almost) an OS in a VM and, imho, the sooner we start treating it like that the better.
I guess it's because, at first, it was assumed to be one of the server's jobs to parse such structures based on their names.
Since those same servers failed to establish a consensus (Django using "a=1&a=2" as an array, whilst PHP uses "a[]=1&a[]=2", for example), someone has to start ...
Seems pretty decent. Also neat that the nesting style could be repurposed to support nested structures in regular form-encoded HTML forms.
Main limitation on actually being able to use this is that `GET` and `POST` continue to be the only supported methods in browser form submissions right now, so eg. you wouldn't be able to make JSON `PUT` requests with this style anytime soon.
Might be that adoption of this would swing the consensus on supporting other HTTP methods in HTML forms.
Isn't that kind of like saying "They're still working on HTML after 23 years"? Technically you're right, but version 1.1 of XForms was completed and published over 5 years ago.
That said, XForms is dead AFAIK, and that's not a bad thing.
Yeah, my point was more along the lines of they've been working for 10 years and gotten near zero adoption. Not saying it will fail again for sure, but if this had value, XForms would have found it. URL-encoded POST data works just fine.
The latest release of my jarg[0] utility supports the HTML JSON form syntax. Writing out JSON at the command line is tedious, this makes it a little nicer. The examples from the draft are compatible with jarg:
The JSON-based file upload would be nice (AFAIK there's no great way to do this ATM, but I haven't looked in over a year). The rest seems pretty weak tea, though. I can see multiple uses for more defined types (e.g. numeric rather than string values, null rather than blank strings), but without dealing with that stuff, this seems of extremely limited utility.
It's nice for small files, but base64 really is inefficient. I think it's still used in email, but really, you should only use it when you control all the uses that will be made of it.
In fact, HTML already has a solution for that in the form of multipart/form-data.
Instead of encoding the files directly inside JSON, you could add them as extra parts of the MIME message, and just have a reference in the JSON value (as in email, which uses a "cid:<part-identifier>" URI to inline images as such, as described in RFC2392).
You'd still need special support on the server, though.
I'm on mobile so I don't have a good way to test this myself... Any idea how good/bad base64+gzip is (i.e. gzipping the JSON before submitting it)? If it's within a few percent, then this probably isn't a bad solution!
The browser doesn't gzip _requests_ by itself, only the server does so with the _response_ if the user-agent (including browsers) states that it supports such content encoding. Of course you can implement gzip in JavaScript, but if you do that, you can already mangle the request and send the file to the server without Base64 encoding.
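For reference, base64's overhead is fixed: 3 input bytes become 4 output characters, about 33% growth before any compression enters the picture. A quick Node sketch:

```javascript
// Base64 encodes every 3 input bytes as 4 output characters,
// so 3000 bytes become exactly 4000 characters (~33% overhead).
const data = Buffer.alloc(3000, 0xab); // 3000 arbitrary bytes
const b64 = data.toString("base64");

console.log(b64.length); // 4000
```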
I humbly disagree. JSON Pointer is a syntax for specifying a location in a JSON object. Multiple form items with JSON pointer strings as names would map to the equivalent of a number of add operations in a PATCH call that start with an empty object.
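For illustration, here's a rough sketch (names invented) of that mapping - form items named with JSON Pointers applied as "add" operations against an empty object. A real implementation would follow RFC 6901/6902; this toy setter handles object keys only, with no array indices and no "~" escapes:

```javascript
// Form items as [JSON Pointer, value] pairs (illustrative data).
const items = [
  ["/name", "Bender"],
  ["/job/title", "Bending unit"]
];

// Minimal pointer-based "add": walks/creates intermediate objects.
function applyAdd(target, pointer, value) {
  const parts = pointer.split("/").slice(1);
  let node = target;
  for (const key of parts.slice(0, -1)) {
    node = node[key] ?? (node[key] = {});
  }
  node[parts[parts.length - 1]] = value;
  return target;
}

const result = items.reduce((obj, [p, v]) => applyAdd(obj, p, v), {});
console.log(result); // { name: 'Bender', job: { title: 'Bending unit' } }
```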
Wow, W3C at its best again. Non-modular, non-negotiable, JSON it is, take it or leave it. Well fuck you W3C. Base64-encoded files? Seriously? What if my app works better with msgpack-encoded forms? Or with XML? So you're going to support one particular serialization format, quite a horrible one - but that's subjective, and that's the whole point. Every app has different needs, and you should spec out a system that is modular and leaves the choice to the user, even at the price of "complicating things".
How would you deal with rendering arbitrary form encodings in the browser? A proposal adding support for form submission of arbitrary encodings could be valid, but it'd have to just be a single form input with the data included verbatim.
This proposal allows regular HTML forms with multiple input elements, but submitting over JSON. I can't see how you could define that for arbitrary encodings without first defining how the form fields map to the encoded data for all the encodings you'd want to support.
Eg: By referencing an encoder function using the standard on* attribute. Could be called onbeforesubmit=encodeMsgpack. This function would take a JS object, generated according to the W3C's JSON form spec, and return a pair of [string, arraybuffer]. String being the MIME content type, and arraybuffer containing the request body.
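A sketch of what such a hook might look like (purely hypothetical - "onbeforesubmit" and this calling convention are the suggestion above, not anything specced):

```javascript
// Hypothetical encoder hook: takes the JS object built per the JSON
// form rules, returns [MIME type, request body as an ArrayBuffer].
// Shown here encoding to JSON; a msgpack version would swap the body.
function encodeAsJson(formObject) {
  const bytes = new TextEncoder().encode(JSON.stringify(formObject));
  return ["application/json", bytes.buffer];
}

const [mime, body] = encodeAsJson({ name: "Bender" });
console.log(mime); // "application/json"
```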
I don't know if there's a proper name for this ability, but neither JSON nor YAML allow you to embed child tree nodes inside of node values. This ability requires named end tags. Example:
<p>This is <b>bold</b> text.</p>
"p": "This is ... uh ... nevermind."
You could make a compelling argument that you shouldn't do this (separate block level and inline elements into separate encodings), but remember that even the relatively minor HTML->XHTML movement to put a bit of sanity into single-use tags like <br> -> <br/> failed miserably.
Nodes have multiple children. In this case, you can split the <p>'s children into three: a "text node"; a <b> node; and another text node. You'd end up with something like this:
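For illustration, one plausible JSON shape for those three children (not any particular standard's encoding):

```javascript
// <p>This is <b>bold</b> text.</p> as a mixed array of text nodes
// and element nodes: [text, element, text].
const p = { p: ["This is ", { b: "bold" }, " text."] };

console.log(p.p.length); // 3
```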
(which is a lot easier for a machine to parse, at least.)
JSON and YAML are great for what they do: data serialization, but they're just not appropriate choices for text markup. You have to use the right tool for the job.
There was a time when the industry wanted to cram 100% of everything into XML, and that gave us XSLT, XAML, SOAP, etc. We don't want to go back to that, either.
Personally, I'm most fond of an extended Markdown syntax to replace HTML. But I'm not going to hold my breath waiting on web browser vendors to agree on such a syntax, so instead I have my HTTP server do the conversion to HTML for me.
But if you had to force it into JSON, then I would suggest using a different markup for inline elements, eg:
html
body
p: "This is [/italic/] text."
p: "This is [[google.com => a hyperlink.]]"
The WoW Armory website used to be written in XSLT on top of a horrendous Java stack. It was commonly referred to as the badmory and was slow as all hells. It had a pure-HTML fallback (rendered server-side) for unsupported browsers. It's one of the worst pieces of web I've ever seen... and I've seen some stuff.
It's no longer using all this, of course. They hired the guys from wowhead.com and had them redo the whole thing in a saner stack.
An interesting example I came across is documenting an XML schema by converting it to XHTML. I think it goes without saying that JavaScript and JSON make React a lot friendlier, though.