W3C HTML JSON form submission (w3.org)
262 points by ozcanesen on Nov 26, 2014 | 100 comments



Let me say first of all that I'm glad they're working on standardizing this. When making REST APIs, I find HTML form scaffolds incredibly useful, but it means that you probably have to accept both JSON (because JSON is reasonable) and occasional form-encoding (because forms), leading to subtle incompatibilities. Or you have to disregard HTML and turn your forms into JavaScript things that submit JSON. Either way, the current state is ugly.

Here's the part that I don't particularly like, speaking of subtle incompatibilities:

    EXAMPLE 2: Multiple Values
    <form enctype='application/json'>
      <input type='number' name='bottle-on-wall' value='1'>
      <input type='number' name='bottle-on-wall' value='2'>
      <input type='number' name='bottle-on-wall' value='3'>
    </form>

    // produces
    {
      "bottle-on-wall":   [1, 2, 3]
    }
I've seen this ugly pattern before in things that map XML to JSON. Values spontaneously convert to lists when you have more than one of them. Here come some easily overlooked type errors.

I don't know of any common patterns for working with "a thing or a list of things" in JSON; that kind of type mixing is the thing you hope to get away from by defining a good API. But all code that handles HTML JSON is going to have to deal with these maybe-list-maybe-not values, in a repetitive and boilerplatey way.
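In practice every consumer ends up with a little normalizing shim. A rough sketch of the boilerplate I mean (asList and payload are made-up names, nothing from the spec):

    // Coerce a maybe-list value into a list so the rest of the handler
    // can assume an array. Every field that might repeat needs this.
    function asList(value) {
      if (value === undefined) return [];
      return Array.isArray(value) ? value : [value];
    }

    // asList(payload['bottle-on-wall']) -> [1, 2, 3], or [1] if the form
    // happened to contain only one input with that name.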

I hope that a standard such as this will eventually be adopted by real-life frameworks such as Django REST Framework, but I also hope that they just reject the possibility of multiple fields with the same name.


Think that's a fair point, yup. Very easy to do unintentionally, and it's non-intuitive that it'll suddenly change the return type.

It's not clear if this style is really necessary. Why not just use the indexed syntax in this case?

I guess it may be the author's intention to mirror form-encoded submissions, which would send all of these values to the server.

I'd suggest raising the point on the issue tracker so that it at least gets sufficient design discussion.

https://github.com/darobin/formic/issues


Well, the safe way would be to mirror the current form spec and treat everything as arrays of strings by default.

    <form enctype='application/json'>
      <input type='text' name='fruit' value='apple'>
      <input type='number' name='bottle-on-wall' value='1'>
      <input type='number' name='bottle-on-wall' value='2'>
      <input type='number' name='bottle-on-wall' value='3'>
    </form>


    {
      "fruit": ["apple"],
      "bottle-on-wall": ["1", "2", "3"]
    }
Then add attributes to explicitly give the structure. Since this is a browser spec we can add attributes. Here's the same form but with some attributes giving the explicit encoding.

    <form enctype='application/json'>
      <input type='text' name='fruit' value='apple' enc='string'>
      <input type='number' name='bottle-on-wall' value='1' enc='array/number'>
      <input type='number' name='bottle-on-wall' value='2' enc='array/number'>
      <input type='number' name='bottle-on-wall' value='3' enc='array/number'>
    </form>

    {
      "fruit": "apple",
      "bottle-on-wall": [1, 2, 3]
    }
Dynamically varying JSON structures by form names and number of form elements is definitely going to be a source of bugs and security problems.


PHP handles lists like this by requiring you to use an input name like "bottle-on-a-wall[]". The brackets indicate that it should be a list. I don't hate this convention...


Yeah. I feel like the bracket syntax is better. The other example above isn't explicit enough, IMO.


the only problem with the bracket syntax is when it comes to nested things

        <form enctype='application/json'>
          <input name='wow[such][deep][3][much][power][!]' value='Amaze'>
        </form>

        // produces
        {
            "wow":  {
                "such": {
                    "deep": [
                        null
                    ,   null
                    ,   null
                    ,   {
                            "much": {
                                "power": {
                                    "!":  "Amaze"
                                }
                            }
                        }
                    ]
                }
            }
        }


What's the problem with that? If it's deeply nested, then it's deeply nested, no matter what.


It's pedantic, but it's a tad clunky now: I'd use something like "wow/such/deep[3]/much/power/!" to differentiate child nodes from array indexes, and also strongly advise against a node named "!" on principle. Of course then you'd say "but I want '/' in my node name!", and then you get to bikeshed an escape sequence, but you'd need that anyway for a node name with '[' or ']' in it.

However, what they've done is workable if all the browser vendors implement it.


square brackets are a reasonably common way to access data in a hash-like object. both Javascript (objects) and PHP (associative arrays) use/allow square brackets for this.


I don't think multiple fields with the same name is designed to be a normal mode of operation. If you actually want an array, use the [] suffix.

But if you have a bug which produces a form with multiple fields with the same name, what should it do? I can think of two strategies: last one wins, and what the proposal does (collect them into an array).

Both are probably going to lead to things going wrong in the backend. At least with the proposal you:

A) can detect it with boilerplate that looks for arrays where you expect primitives.

B) can figure out what the original data was if you need to try and manually repair it.

I'm sure we'll see lots of bad tutorials suggesting you just use the same field name to make an array though :(


do any server-side environments currently present an array/array-like structure from request parameters with duplicate names (and specifically without any kind of array indicator, like name[] in php requests for example)?


Sure, yes. In Python land for example Django and Flask both present the request parameters in a dictionary-like data structure that supports lookup of the complete list of values passed for a given name. Regular lookups will give you just a single value, but they also present an interface to get the complete list.


jersey[0], the reference implementation of JAX-RS[1] does this if you specify an incoming query string parameter as a `List`

[0] https://jersey.java.net/ [1] https://jax-rs-spec.java.net/


In fairness, you have to look at how standards get somewhere - this is an editor's draft, which is a starting point for an idea rather than a done deal. Don't be surprised if the final product winds up being significantly different from this - even better, get involved in the conversation to make it what we need. That's not to pour cold water on it: it's good as it is, but there are possible changes - for instance, explaining the machinery by which elements participate in form encoding and submission - which may be better and allow more adaptation and experimentation over time.


Why such an emphasis on "losing no information" when the form is obviously malformed?

You need only look at the crazy ways in which MySQL mangles data to realize that silently "correcting" invalid input is not the way to go. The web has suffered enough of that bullshit; we seriously don't need more of it. Example 7 (mixing scalar and array types) gives me shudders. Example 10 (mismatched braces) seems to have a reasonable fallback behavior, though I'd prefer dropping the malformed field altogether.

If the form is obviously malformed, transmission should fail, and it should fail as loudly and catastrophically as possible, so that the developer is forced to correct the mistake before the code in question ever leaves the dev box.

Preferably, the form shouldn't even work if any part of it is malformed. If we're too timid to do that, at least we should leave out malformed fields instead of silently shuffling them around. Otherwise we'll end up with frameworks that check three different places and return the closest match, leaving the developer blissfully ignorant of his error.

While we're at it, we also need strict limits on valid paths (e.g. no mismatched braces, no braces inside braces) and nesting depth (most frameworks already enforce some sort of limit there), and what to do when such limits are violated. Again, the default should be a loud warning and obvious failure, not silent mangling to make the data fit.

This is supposed to be a new standard, there's no backward-compatibility baggage to carry. So let's make this as clean and unambiguous as possible!


That's not the Web way. The Web is the lowest common denominator, all that talk of "correctness" goes over the head of most Web Developers.


I don't agree with Example 9; we should use the data URI scheme for file content:

    "files": [{
      "name": "dahut.txt",
      "src": "data:text/plain;base64,REFBQUFBQUFIVVVVVVVVVVVVVCEhIQo="
    }]
http://en.wikipedia.org/wiki/Data_URI_scheme


1. Other form encoding types have discrete MIME type fields.

2. While data URIs are awesome, they force you to process that URI in some way (parse, regex, whatever) just to get the MIME type. Also, this adds 13 bytes of redundancy every time.
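E.g. just getting the type back out means a little parsing step along these lines (a rough sketch; mimeFromDataUri is a made-up helper name):

    // Recover the MIME type from a data URI like the "src" value above.
    // With a discrete MIME type field none of this would be needed.
    function mimeFromDataUri(src) {
      var match = /^data:([^;,]*)/.exec(src);
      return (match && match[1]) || 'text/plain'; // the data: scheme default
    }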


    {
      "name":   "Bender"
    , "hind":   "Bitable"
    , "shiny":  true
    }
Who puts commas at the start of a continuing line? What good could that possibly do?


It lets you comment out a line without having to remove the trailing comma from the previous line. That'd be useful if JSON had comments.

I've seen people do this in the SELECT portion of SQL queries too.

Personally, I hate this.


> It lets you comment out a line without having to remove the trailing comma from the previous line

That would only be an advantage over comma-at-the-end when it comes to the last line. It really just moves the problem from the last line to the first one. Now you can't comment out the first line without removing a comma...


In the comma-first method, adding a new element at the end produces a one-line diff. With the comma-last method, adding a new element to the list gives a two-line diff.

It can make resolving merges just a little bit easier.


I think it's great when doing ad hoc analysis that only you are ever going to look at, though maybe I just need a better IDE. Another neat / silly trick is to add a truthy condition at the beginning of a WHERE clause:

   select * 
   from foo
   where 1=1 
     AND bar = 1
     AND baz = 2
That way you can comment out any of your conditions without breaking the syntax.


If IE didn't barf on a trailing comma after the last value, we wouldn't have this problem. That's probably the biggest source of all my IE JavaScript bugs. To be fair, I'm not actually sure what the spec says, but the spec might be wrong :-)


IE9 and above should handle it fine, so it's just a matter of time before this problem goes away. In the meantime there are two things you can do: use an editor which highlights this as an error (e.g. Webstorm), or add a pre-commit hook on your code repository which disallows committing code that doesn't pass jslint checks. We use both practices on a 200 kline js codebase, and it has essentially gotten rid of javascript errors at the customer due to bad syntax (as well as those tricky = / == / === issues).


True about the comment. That's why I like the trailing comma style in pep8[1], which enables both commented out array elements and smaller, more meaningful diffs when making additions or deletions. If only trailing commas were valid JSON!

[1] https://dev.launchpad.net/PythonStyleGuide


I do it in UPDATES

    UPDATE FOO
    SET BAR = 1
    ,   BAZ = 2
    ,   QUUX = 3
I like it that way. It makes the relationship between the continuing lines and their parent clear. Just the natural extension of

    WHERE ALICE = 1
    AND BOB = 2
    AND CHARLIE = 3


That seems backwards to me. I put the operator on the previous line so I know the expression has more parts coming.

In your example, I don't know that SET BAR = 1 isn't the end until I read the next line.


Different strokes I guess. I've tried both ways and I've found this one pleasant for scanning down an expression.

I mean, it's kind of the SQL analogue to the

  MyObject
  .Child
  .Grandchild
  .ItsMethod(stuff)
  .HeyThatReturnedAnotherObject()
  .MoreObject(moreStuff)


> It lets you comment out a line without having to remove the trailing comma from the previous line. That'd be useful if JSON had comments.

It also lets you remove the line (for the same reason that you can comment it out) without modifying other lines. This is somewhat convenient if the file is one that you are going to manually edit (and even more useful if it is going to be the subject of line-oriented diffing tools, since the only changed lines will be the ones with meaningful changes.)

I'd agree that it's less readable, though.


it also lets you add a line at the end (the most common case) without modifying the previous line.


If you use alphabetically ordered keys (a pretty good practice anyhow), that advantage goes away. If you just develop a habit of adding stuff to the front where semantics do not matter, that advantage goes away.


It also lets you find missing commas more easily.


I found it almost a necessity when I started generating larger DOM trees in JavaScript. No more games of hunt-the-missing-comma-on-the-ragged-edge: https://gist.github.com/insin/8e72bed793772d82ca8d (These syntax woes are one of the main problems React's JSX solves)


It would also make removing a line a 1-line diff instead of 2. But I still fully agree with the hate.


The comma first pattern is pretty common. I've seen it used in JavaScript (as well as JSON), SQL and Haskell; I'm sure there are others.

Some encourage this so you can comment out a line without having to add/remove commas. However, this isn't the case if you want to comment out the first line; so in effect, it just moves the problem from the bottom to the top of the list. Other proponents say it's easier to spot typos (i.e., missing commas). For example:

    var something = {
      foo: 'bar',
      abc: '123'
      quux: null,
      xyz:  456
    };
vs:

    var something = {
      foo: 'bar'
    , abc: '123'
      quux: null
    , xyz:  456
    };
In the second one, it's clearer that there's a comma missing.

Personally, I think it's ugly and, while I'm sure I've erroneously omitted commas before, they're caught by the parser. No example comes immediately to mind where missing a comma can be interpreted as something other than a parse error.


To me, it is so much better. The comma is really a particle of the next item, not the previous.

Here's an item there's going to be another one,

Or

Here's an item

, I'm another one


That's how the amazing TJ Holowaychuk declares his imports.

https://github.com/mochajs/mocha/blob/master/lib/suite.js


You'll see it like this:

    { "name": "Bender"
    , "hind": "Bitable"
    , "shiny": true }


It makes diffs look nicer when you add and remove items


Send a pull - github for the win! http://darobin.github.io/formic/specs/json/


I'm not sure whether to be heartened or concerned that the W3C is referencing the doge meme in its specifications... see Example 6.


Concerned. Memes are in-group signalling one notch above the crudest kind such as football chants, a few notches below the more sophisticated kind like quoting Shakespeare, but all ultimately with the potential to exclude and confuse - which is definitely the opposite of what a technical spec should be trying to do.


I don't see why this would exclude or confuse. It's essentially an in-joke, and shouldn't affect anyone reading the spec who is unaware of the meme.

I'm not a fan, but I don't think we should make it out to be more than it really is.


It amazes me that we're now at the point of standardizing sticking array references inside strings and yet we're still not having a serious discussion about what comes after HTML.


It amazes me that you think we could have a serious discussion about what comes after HTML when essentially nobody is seriously considering replacing HTML. HTML is what it is, nothing else is what HTML is, and HTML is going to be around a good long while.


> It amazes me that you think we could have a serious discussion about what comes after HTML when essentially nobody is seriously considering replacing HTML.

Of course we are. There are numerous new technologies competing to replace the role we have shoe-horned HTML into playing -- look at all the different templating technologies now in development, for example, and things like Web Components. There are also numerous technologies for marking up semantic content and/or styling such data for presentation.

The only thing that isn't changing right now is that we're still stuck with eventually reducing these alternatives to plain HTML for display in browsers, which is about as good an idea as insisting we reduce all styling to CSS and all programming to JavaScript. It's a historical accident, it's resulted in widespread dependence on tools that are nowhere near fit for the purposes they are now asked to serve, but there is so much momentum in the industry that building tools to accommodate the weaknesses as well as possible is the preferred strategy over completely starting over. See also: Almost everything about programming ever.


Fully agreed. A low(ish)-level, non-JS API to the sea of DOM "primitives" inside blink/gecko/webkit/etc. suitable for targeting in an FFI-like manner by any language is the standard I'd like to see. We don't need 3 languages: it's all there in C++ classes and C structs.

I've been writing a lot of ClojureScript/React lately and with that I only really touch HTML to include CSS and scripts. It's pretty glorious, highly productive, and makes a very strong case for opening up lower levels to allow new paradigms to evolve - with this set-up, HTML and JS only get in the way.

The browser is now (almost) an OS in a VM and, imho, the sooner we start treating it like that the better.


it'll be like when they replaced paper with the internet...


What's wrong with a human-readable serialization format?


Nothing - I was talking about this:

    <input name='kids[1]' value='Thelma'>
    <input name='kids[0]' value='Ashley'>


I guess that it's because, at first, it was assumed that it was one of the server's jobs to parse such structures based on their names.

Since these same servers failed to establish a consensus (Django using "a=1&a=2" as an array, whilst PHP uses "a[]=1&a[]=2", for example), someone has to start ...


Whoa. I missed that. Probably by clicking the link to the serialization algorithm.

That's bad. Really, really bad.


What structure are we suggesting? This?:

    <inputgroup name='kids'>
        <input value='Thelma'>
        <input value='Ashley'>
    </inputgroup>


QML?


Seems pretty decent. Also neat that the nesting style could be repurposed to support nested structures in regular form-encoded HTML forms.

Main limitation on actually being able to use this is that `GET` and `POST` continue to be the only supported methods in browser form submissions right now, so eg. you wouldn't be able to make JSON `PUT` requests with this style anytime soon.

Might be that adoption of this would swing the consensus on supporting other HTTP methods in HTML forms.
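For now a JSON `PUT` still means dropping down to script; roughly something like this (sketch only, the endpoint is made up):

    // Status quo: <form method> only honours GET and POST, so a JSON PUT
    // has to go through XMLHttpRequest (or similar) by hand.
    var xhr = new XMLHttpRequest();
    xhr.open('PUT', '/bottles/42');   // example endpoint, not from the spec
    xhr.setRequestHeader('Content-Type', 'application/json');
    xhr.send(JSON.stringify({ 'bottle-on-wall': [1, 2, 3] }));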


They're still working on XForms after 10 years http://www.w3.org/MarkUp/Forms/


Isn't that kind of like saying "They're still working on HTML after 23 years"? Technically you're right, but version 1.1 of XForms was completed and published over 5 years ago.

That said, XForms is dead AFAIK, and that's not a bad thing.


Yeah, my point was more along the lines of they've been working for 10 years and gotten near zero adoption. Not saying it will fail again for sure, but if this had value, XForms would have found it. URL-encoded POST data works just fine.


XForms had a bigger scope and was tied to XHTML. Two good reasons why this might succeed where XForms failed.


Work on XForms 2.0 is ongoing! Odds of it ever getting implemented in a browser are slim, however.


My XForms implementation (XSLTForms) works in a browser using XSLT and Javascript. The next version will even just require Javascript!


The latest release of my jarg[0] utility supports the HTML JSON form syntax. Writing out JSON at the command line is tedious; this makes it a little nicer. The examples from the draft are compatible with jarg:

    $ jarg wow[such][deep][3][much][power][!]=Amaze
    {"wow": {"such": {"deep": [null, null, null, {"much": {"power": {"!": "Amaze"}}}]}}}
[0]: http://jdp.github.io/jarg/


Kind of nice, basically turns form submission into a bare-bones API call.


Which they pretty much already were. The only value I can imagine this adding is a way to encode forms with nested structure.


The changes they make solve the problem of corner cases submitting weird input to an API that expects only JSON.

In 2014, I think this is a good idea.

It also solves the issues with ambiguous syntax surrounding arrays of values.


Am I the only one who is worried about the fact that the output can be exponential in the size of the input name?

  <input name="field[1000000]">
Will generate a request that is ~5MB.


I don't see why that should be worrying. What's the scenario you're foreseeing?


So, y'know, raise it on their issue tracker. :)


The JSON-based file upload would be nice (AFAIK there's no great way to do this ATM, but I haven't looked in over a year). The rest seems pretty weak-tea though. I can see multiple issues with more defined types (e.g. numeric rather than string values, null rather than blank string), but without dealing with that stuff, this seems of extremely limited utility.


It's nice for small files, but base64 really is inefficient. I think it is still used in mail, but really, you should only use it when you control everything that will be done with it.


Submitting files with this form encoding is of course going to have the base64 overhead, but otherwise this looks great!


In fact, HTML already has a solution for that in the form of multipart/form-data.

Instead of encoding the files directly inside the JSON, you could add them as extra parts of the MIME message, and just have a reference in the JSON value (as in email, which uses a "cid:<part-identifier>" URI to reference inline images, as described in RFC 2392).

You'd still need special support on the server, though.
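Roughly, from the client side, the idea could look something like this (purely a sketch: the part name and endpoint are made up, fileInput is assumed to be some file input element, and no browser or spec does this today):

    // Hypothetical: put the JSON in one multipart part and the file in
    // another, and let the JSON refer to the file part instead of
    // embedding base64.
    var fd = new FormData();
    fd.append('json', JSON.stringify({
      name: 'dahut.txt',
      src: 'cid:file-1'               // refers to the part appended below
    }));
    fd.append('file-1', fileInput.files[0], 'dahut.txt');

    var xhr = new XMLHttpRequest();
    xhr.open('POST', '/upload');      // made-up endpoint
    xhr.send(fd);                     // multipart/form-data, no base64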


> Submitting files with this form encoding is of course going to have the base64 overhead

HTTP connections typically use compression with any modern browser and server, so there is likely to be little overhead in practice.


Yea this makes me wish there was a better way to deal with this. JSON is popular because it's simple, but as a result sucks for a bunch of use-cases.


I'm mobile so don't have a good way to just test this myself... Any idea how good/bad base64+gzip is (ie gzipping the json before submitting it)? If it's within a few percent then this probably isn't a bad solution!


The browser doesn't gzip _requests_ by itself, only the server does so with the _response_ if the user-agent (including browsers) states that it supports such content encoding. Of course you can implement gzip in JavaScript, but if you do that, you can already mangle the request and send the file to the server without Base64 encoding.


A discussion about the implementation of the spec in jQuery. It started on June 21:

https://github.com/macek/jquery-serialize-object/issues/24



A new standard for referencing a point in a JSON object? I wonder if they considered RFC 6901 and rejected it.

I personally prefer this new square bracket notation, but being a standard already gets more points.


JSON Pointer isn't quite the same thing.


I humbly disagree. JSON Pointer is a syntax for specifying a location in a JSON object. Multiple form items with JSON pointer strings as names would map to the equivalent of a number of add operations in a PATCH call that start with an empty object.
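To illustrate what I mean (my own sketch, not anything from the draft): a single field named with an RFC 6901 pointer could expand into RFC 6902 "add" operations applied in order to {}, parents first:

    // <input name='/wow/such/deep' value='Amaze'> as JSON Patch ops
    var ops = [
      { op: 'add', path: '/wow',           value: {} },
      { op: 'add', path: '/wow/such',      value: {} },
      { op: 'add', path: '/wow/such/deep', value: 'Amaze' }
    ];
    // '/' and '~' in key names get escaped as ~1 and ~0, so the escaping
    // question is already settled by the pointer spec.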


Wow, W3C at its best again. Non-modular, non-negotiable, JSON it is, take it or leave it. Well fuck you W3C. Base64-encoded files? Seriously? What if my app works better with msgpack-encoded forms? Or with XML-encoded? So you're going to support one particular serialization format, quite a horrible one, but that's subjective and that's the whole point. Every app has different needs, and you should spec out a system that is modular and leaves the choice to the user, even at the price of "complicating things".


How would you deal with rendering arbitrary form encodings in the browser? A proposal adding support for form submission of arbitrary encodings could be valid, but it'd have to just be a single form input with the data included verbatim.

This proposal allows regular HTML forms with multiple input elements, but submitting over JSON. I can't see how you could define that for arbitrary encodings without first defining how the form fields map to the encoded data for all the encodings you'd want to support.


Eg: By referencing an encoder function using the standard on* attribute. Could be called onbeforesubmit=encodeMsgpack. This function would take a JS object, generated according to the W3C's JSON form spec, and return a pair of [string, arraybuffer]. String being the MIME content type, and arraybuffer containing the request body.
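Purely as a hypothetical sketch of the shape I mean (neither the attribute nor encodeMsgpack exists anywhere; JSON.stringify stands in for a real msgpack encoder):

    // Receives the object the JSON form encoding would have produced and
    // returns [contentType, body] for the browser to submit.
    function encodeMsgpack(formObject) {
      var bytes = new TextEncoder().encode(JSON.stringify(formObject));
      return ['application/x-msgpack', bytes.buffer];
    }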


If you're going to run custom JS code, why not simply submit it through JS HTTP requests? What have you gained by this new API?


I think the spec you are looking for is JavaScript.


Guessing that it's interesting when using a uniform endpoint for both forms and js-driven requests.


Can we just get rid of HTML and replace it with JSON while we're at it?


I don't know if there's a proper name for this ability, but neither JSON nor YAML allows you to embed child tree nodes inside node values. This ability requires named end tags. Example:

    <p>This is <b>bold</b> text.</p>
    "p": "This is ... uh ... nevermind."
You could make a compelling argument that you shouldn't do this (separate block level and inline elements into separate encodings), but remember that even the relatively minor HTML->XHTML movement to put a bit of sanity into single-use tags like <br> -> <br/> failed miserably.


Nodes have multiple children. In this case, you can split the <p>'s children into three: a "text node"; a <b> node; and another text node. You'd end up with something like this:

    {
      p: [
        { text: 'This is' },
        { b: { text: 'bold' }},
        { text: 'text.'}
      ]
    }
Not that I don't agree with you -- my suggestion is pretty messy and doesn't even go into how we'd deal with tag attributes -- but it's doable.


This would have to be pre-processed somehow; if people don't like writing HTML by hand, they could write that instead.


Yeah, and that's not really equivalent HTML either. That's more like:

    <p>
      <text>This is</text>
      <b><text>bold</text></b>
      <text>text.</text>
    </p>
(which is a lot easier for a machine to parse, at least.)

JSON and YAML are great for what they do (data serialization), but they're just not appropriate choices for text markup. You have to use the right tool for the job.

There was a time when the industry wanted to cram 100% of everything into XML, and that gave us XSLT, XAML, SOAP, etc. We don't want to go back to that, either.

Personally, I'm most fond of an extended Markdown syntax to replace HTML. But I'm not going to hold my breath waiting on web browser vendors to agree on such a syntax, so instead I have my HTTP server do the conversion to HTML for me.

But if you had to force it into JSON, then I would suggest using a different markup for inline elements, eg:

    html
      body
        p: "This is [/italic/] text."
        p: "This is [[google.com => a hyperlink.]]"


JSON rendered with React using JSX feels to me like the future of HTML.


I've heard that before...

https://en.wikipedia.org/wiki/XSLT


Isn't the Blizzard website largely written in XML and XSLT? I can imagine quite a few scenarios where that can be a good approach.


The WoW Armory website used to be written in XSLT on top of a horrendous Java stack. It's commonly referred to as the badmory and was slow as all hells. It had a pure-HTML fallback (rendered server-side into HTML) for unsupported browsers. It's one of the worst pieces of web I've ever seen... and I've seen some stuff.

It's no longer using all this, of course. They hired the guys from wowhead.com and had them redo the whole thing in a saner stack.


An interesting example I came across is documenting an XML schema by converting it to XHTML. I think it goes without saying that JavaScript and JSON make React a lot friendlier, though.


> wow[such][deep][3][much][power][!]

And there goes my interest in this submission. Don't use overused memes in a submission. Liking the idea though.


I enjoyed this. I also liked the Bender reference.

It reminded me of that show that makes me laugh on Netflix.

It also illustrated the point that he was trying to make. I prefer warm examples, rather than ones using book titles.



