Mark – A simple and unified notation for both object and markup data (github.com/henry-luo)
158 points by henryluo on Feb 5, 2018 | 169 comments



So this is literally Lisp, just with curly braces instead of parenthesis :).

I don't understand this part of the readme:

> The advantage of Mark over S-expressions is that it is more modern, and can directly run in browser and node.js environments.

Does this mean I'm so out of date with JS that this syntax is actually legit JS? Or does Mark run its own parsers, at which point it's just like sexps, except it uses the "more modern" curly brace instead of "less modern" parenthesis?


EDN, in addition to being simpler, also has an optimized transport abstraction, transit, which serializes to application/transit+msgpack (binary) or application/transit+json (often vastly faster deserialization than msgpack, because it hits the language-native JSON parsers in web browsers, Python, etc.). It surprised me how big a deal hitting native JSON parsers is: EDN was at the top of our profiler for 100kb payloads, but transit+json is zippy. The abstraction also handles value de-duplication and has a human-readable verbose writer.

http://blog.cognitect.com/blog/2014/7/22/transit http://cognitect.github.io/transit-tour/

cool project though, good to see s-expressions become more popular


> So this is literally Lisp

No, it's close to S-expressions, but it's not a programming language like Lisp.

> just with curly braces instead of parenthesis

Well, and more syntax than S-expressions: it's got both objects and arrays as fundamental structures instead of just lists, and it has commas as noise characters.


> Well, and more syntax than S-expressions: it's got both objects and arrays as fundamental structures instead of just lists

Well, Common Lisp has both of those as fundamental atoms:

    #S(foo :bar 3)

    #(1 2 3)
The former creates an instance of a FOO struct with its BAR slot set to 3; the latter is a 3-item array.


And then CLOS objects, hash tables, regular expressions, etc. can be added with reader macros.


It's cool and I love it, but it's irrelevant in the context of a universal data exchange language. In such case, you'd want to have those primitives defined in the exchange language spec itself - even if you'd end up implementing them as reader macros in CL (which I don't recommend - reader macros turn your READs into EVALs, which you obviously shouldn't do on untrusted input).


> you'd want to have those primitives defined in the exchange language spec itself

I agree with this: certain things need to be in the spec.

> even if you'd end up implementing them as reader macros in CL (which I don't recommend - reader macros turn your READs into EVALs, which you obviously shouldn't do on untrusted input).

This I don't agree with, because technically #\( & #\" are reader macros … they're just very well-defined reader macros. Presumably a spec which defined hash tables, regular expressions or whatever would define them as well as the Lisp spec defines lists and strings (and if not, well — it's a bad spec!).


> I agree with this: certain things need to be in the spec.

That was my main point. In the second part of the comment I didn't mean to discourage use of reader macros - it was more of an aside that the general facility of CL-style reader macros literally makes READ "shell out" into EVAL, so you need to (diligently) disable it for untrusted input (or reimplement a limited READ by hand). So we can't say "oh, but S-exps in Common Lisp can have anything through reader macros". Presumably if hash table literals were specified as a part of basic syntax, we could depend on it being standard and part of the safe subset of READ's duties; as it is however, we can't depend on it for arbitrary inputs.


Yeah, didn't mean it that literally.

> it's got both objects and arrays as fundamental structures instead of just lists

I'm a bit sad that the various Lisps never standardized on a format for this; had they, then maybe we would have S-expressions as a popular data interchange format.


They did. It's called Common Lisp.


Common Lisp doesn't have a standard representation of hash tables, unfortunately. Also, CL didn't clean up the Lisp space completely; right now, there are Schemes, there's Clojure, LFE, Hy, and a bunch of other niche Lisps, each with their own idiosyncrasies around syntax.


> Common Lisp doesn't have a standard representation of hash tables, unfortunately.

Ah, but you mentioned objects & arrays, not hash tables grin. Agreed that hash tables would have been nice, although that does then get into issues such as canonical representations (which matter e.g. for hashing).

> Also, CL didn't clean up the Lisp space completely; right now, there are Schemes, there's Clojure, LFE, Hy, and bunch of other niche Lisps, each with their own idiosyncrasies around syntax.

It would be nice if folks who want to use Lisp would use Common Lisp rather than reïnventing various forms of more-or-less round wheel. It's a remarkably well-engineered language (not perfect, of course: argument order, upcasing, pathnames & environments all leap to mind as problematic areas), and so far as I can tell quite a bit better than any of the alternatives.

In particular, it'd be really nice if people using Schemes for serious engineering work used Lisp instead. Scheme is just not well-suited to writing large systems, except by grafting on an ad-hoc, informally-specified, potentially bug-ridden subset of Lisp.


> Well, and more syntax than S-expressions: it's got both objects and arrays as fundamental structures instead of just lists

Which can easily be added to sexps (see EDN).


The S-expressions of mainstream Lisp dialects have objects and arrays. E.g. Common Lisp:

  ;; structure:
  #s(type :slot1 value1 :slot2 value2 ...)

  ;; vector:
  #(1 2 3 4)


Perhaps more like EDN, since it doesn't have a runtime. But yeah, it's s-exps with curly braces. Which, in my opinion, look worse than round parentheses … but that's just opinion.

https://github.com/edn-format/edn https://learnxinyminutes.com/docs/edn/


I really like edn. I wish it were more widely used.

It hits a sweet spot for me between yaml and json. Yaml is easy to type/read, but I feel it's a bit too complex on the parsing side. And json is a pain to type, so I'm reluctant to use it for human entered configuration files.


"More modern" is a euphemism for "not made by someone who had a gray beard in 1975". It's just another form of ageism.


Even by that criterion, there's EDN, which is basically sexps with more built-in types (and extensibility).


Being 'more modern' means Mark takes a JS-first or web-first approach in its design. Whether we like it or not, JS has dominated the web. JSON is successful, partly because it takes a JS-first approach. Mark inherits this approach.


The only ageist thing here is that you assumed the author's age, or that it had anything to do with the claim of modernity.


Your statement is factually incorrect.


It's sad that you feel that improving older, outdated technologies is a form of ageism.


It is sad that so much junior talent is wasted on attempts at improvement via blind reiteration. This is one possible consequence of a mentorship vacuum: bright minds look for challenges, even ones that have been adequately overcome long ago. Imagine the good that would come of directing such energy with a clear purpose.


There is nothing sad in that. Yes, it might be non-optimal, but by trying out your own approach you will find out its limitations first-hand and understand the problem and reasoning behind alternate approaches much better.

Junior talent doesn't become senior talent by just doing the right things, but by doing the wrong things and learning from them.


Junior talent becomes senior talent by virtue of acquiring experience. Doing the wrong things and learning is one way of acquiring experience, but being directed by a mentor is a much more efficient and productive way of acquiring that seniority. A lot of lessons learned from reinventing wheels can be distilled down into a conversation or a pair programming session, but in the absence of senior leadership, it becomes a week long hacking session on a library that will ultimately rot in Git forever because it's foundationally unsound.


It becomes interesting when later said library is picked up and put into production by similar novices.


Yup. There really is nothing wrong with reinventing the wheel for educational purposes. The problem starts when that reinvented wheel gains a good README/webpage, and gets picked up by an ecosystem driven by novices.


>Yes, it might be non-optimal, but by trying out your own approach you will find out its limitations first-hand and understand the problem and reasoning behind alternate approaches much better.

Hmm... I think that’s how PHP was made.


The problem here is that "modern" is usually a justification for regression, relative to older technologies, usually done by people who never bothered to look at the old technologies before declaring them obsolete.


Reminds me of a recent talk at FOSDEM based on this premise.

The circuit less traveled. Investigating some alternate histories of computing: https://www.youtube.com/watch?v=jlERSVSDl7Y


Wow, that was a surprisingly insightful talk; thanks for linking!

I'm intrigued in particular by the talk's conclusion about the (I guess, again) disappearing distinction between volatile and non-volatile storage. To date, I've been a vocal advocate of hierarchical filesystems (not UNIX, but just as a unit of user-facing abstraction). The talk sent me on the way of reflecting on whether I'm not just supporting another historical "wrong path". Lots more thinking ahead of me here. So thanks.


"Mark" does not improve on anything.


I would rather have said it's like the QML language (the basis for Qt's QML UI language). Can't find the link to the reference for plain QML. QML is like JSON, but with typed objects, and objects can have children in turn, so you can create tree structures from objects. It's actually very nice; I wish it had parsers for more languages.


It runs its own parser. Which means mark is a big silly ol string until you parse it.

var obj = Mark.parse(`{div {span 'Hello World!' }}`);


Doesn't seem like valid JS; you wouldn't need mark.js if that were the case.


Yes you would. JSON is valid JS, and executing JSON in a browser is a recipe for disaster.


JSON isn’t valid JS – its representation of strings allows U+2028 and U+2029 to appear unescaped, but JavaScript string literals don’t.

Not sure how else executing (valid) JSON in a browser would be a recipe for disaster? `eval` was the standard way to parse JSON from trusted sources for a long time.
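To make the U+2028 point concrete, here is a small sketch: the string below is a legal JSON document, but its contents pasted into a JS string literal were a SyntaxError until ES2019 relaxed the rule (this thread predates that change).

```javascript
// U+2028 (line separator) appears unescaped inside the JSON string below.
// JSON.parse accepts it; pre-ES2019 eval() of the same text would not.
const json = '"a\u2028b"';       // a legal JSON document
const value = JSON.parse(json);  // parses without complaint
console.log(value.length);       // 3
```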


> Does this mean I'm so out of date with JS that this syntax is actually legit JS? Or does Mark run its own parsers, at which point it's just like sexps, except it uses the "more modern" curly brace instead of "less modern" parenthesis?

It has its own parsing and stringify library it looks like: https://github.com/henry-luo/mark#markjs


Firstly, I highly respect Lisp personally, and I have no intention of downplaying it. As some have seen, the Lisp spirit is actually in the design of Mark.

Secondly, to clarify what I mean by 'being more modern'. Of course, it does not mean that changing from () to <> or {} will make it more modern or somehow better.

Being 'more modern' means Mark takes a JS-first or web-first approach in its design. Whether we like it or not, JS has dominated the web. JSON is successful, partly because it takes a JS-first approach. Mark inherits this approach.

Being JS-first means there'll be the least adoption barrier on the web.

Being JS-first, of course, does not mean JS-only. Mark is designed to be generic and usable from other programming languages, like JSON is.


I don't understand it either.

Looks like s-expressions to me and it isn't legit JS.


It's literally Common Lisp, down to the use of pre-expressions with no binding as pragmas. It's like Lisp someone dropped on a chair, and now all the parentheses have a funny bump.

But it's not literally lisp in the sense that the meta-syntactic stuff isn't there.


Genuine question, my impression is that Lisp is dynamically typed, but this uses type declarations. Would that make it different than Lisp?


Descriptions of typing are about as unevenly used as descriptions like "pass by reference." (So, practically useless.)

In Lisps:

- values have types (bool, symbol, number, list, array, structs, functions, ...)

- variables have by default one type: union of all the above

- in Common Lisp, you can restrict the types of values allowed in a variable

> but this uses type declarations

I must have missed this. If it's not just what struct-like entities are allowed in the markup, where did you see that?


Lisp has strong typing, so 1 is 1, not "1" or #\1. Unless Mark has a built-in way of annotating types, that doesn't give it any advantages over s-expressions.


Lisp expressions also don't have any annoying type clutter that you have to have at every node in the syntax. Like (1 "1") is just a list of two things; we don't need the word "list" anywhere.


It's a superset of JSON, which is valuable.


One major weakness of JSON is lack of a corresponding "infoset"; that is, an equivalence predicate. When are two JSON blobs "the same"? There's no sign of anything like this here.
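One way such an equivalence predicate could work is to canonicalize object key order before comparing. This is only an illustration of what a JSON "infoset" might decide, not any existing spec:

```javascript
// Sketch: recursively sort object keys, then compare serialized forms.
function canonicalize(v) {
  if (Array.isArray(v)) return v.map(canonicalize);
  if (v !== null && typeof v === 'object') {
    const out = {};
    for (const k of Object.keys(v).sort()) out[k] = canonicalize(v[k]);
    return out;
  }
  return v; // strings, numbers, booleans, null pass through
}

function jsonEqual(a, b) {
  return JSON.stringify(canonicalize(JSON.parse(a))) ===
         JSON.stringify(canonicalize(JSON.parse(b)));
}

console.log(jsonEqual('{"a":1,"b":2}', '{"b":2,"a":1}')); // true: key order ignored
```

Even this toy version forces decisions the JSON spec never made: does key order matter, does 1 equal 1.0, and so on.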

Another is the lack of support for binary data. There's no sign of support for binary data here.

Finally, there's this claim:

> The advantage of Mark over S-expressions is that it is more modern, and can directly run in browser and node.js environments.

Is it more modern? I don't think I care.

Can it directly run in browser and node.js environments? What does that mean? It seems to need a parser. But then, S-expression parsers certainly directly run in browser and node.js environments.

---

IMO, SPKI SEXPs are much more sensible than this design and many, many other designs:

https://people.csail.mit.edu/rivest/Sexp.txt


> IMO, SPKI SEXPs are much more sensible than this design and many, many other designs

Yes, yes, ten thousand times yes! I really don't understand why, over two decades hence, the world has stuck with XPKI & ASN.1, and has invented XML & JSON, when SPKI solved the PKI problem for good & canonical S-expressions solved the flexible- and human-readable–data-exchange problems for good.


Since you both seem to know the spec: how would you encode key/value pairs? Or would you have to have a list of nested lists, like

    (my_dict (key value) (key value) (key value))
Un-ordered qualities for data can be useful (e.g. they allow you to reorder data to stream "important" stuff first), but I don't see it anywhere in here.


With canonical S-expressions, unordered sets are a problem because part of the point is to be able to have a single canonical sequence of bytes, which can be hashed or compared bytewise for equality.

In general, I'd resist specifying data as arbitrary key-value pairs, but if I decided that I indeed needed them, I'd do exactly as you suggest — and I'd mandate that they be sorted lexicographically by their keys.
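A sketch of that sorted encoding, so equal maps always serialize to the same byte sequence (the pair-list shape mirrors the `(my_dict (key value) ...)` example above):

```javascript
// Emit key/value pairs in lexicographic key order; the output is then
// independent of insertion order, which is what canonical forms need.
function canonicalPairs(obj) {
  return Object.keys(obj).sort().map(k => [k, obj[k]]);
}

console.log(JSON.stringify(canonicalPairs({ b: 2, a: 1 })));
// [["a",1],["b",2]], the same output regardless of insertion order
```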


Each existing format has advantages and disadvantages for particular purposes.

Benefit of HTML: you can actually write it by hand and easily see where each element begins and ends, even when the document is longer than a screenful. Mark has the "}}}}}" problem with larger documents, so it is not as suitable for human-written markup.

It is not clear to me how mixed content like <cite>Hello <i>world</i></cite> is expressed in Mark. I expect it will be pretty convoluted.

Benefits of JSON: it maps directly to simple data structures: lists, dictionaries, and simple values. Similar data structures are supported in almost any language. Mark has "type names" and anonymous text content, which complicates serialization and deserialization a lot, and is sure to cause interoperability (and perhaps security) problems.

So - worst of both worlds? Instead of trying to be an overall worse alternative to all the formats, they should rather focus on a specific niche where Mark can be a better alternative.

Take configuration files, for example. They don't have large amounts of textual content like HTML, and they don't need to be transferred between disparate systems.

   {size width:100 height:100}
vs

   <size width="100" height="100"></size>
vs

  {"size": {"width":100,"height":100 }}
In this case, the Mark syntax is simpler and cleaner. Mixed content is not needed, which would make the format simpler. Yeah it is basically the same as S-expressions, but that is not a bad thing.


> Mark has the "}}}}}" problem with larger documents, so it is not as suitable for human-written markup.

And HTML has a problem of </span></li></ul></div></div></div></div></body></html>, all spread over nine different lines, one tag per line.

Take a look at https://github.com/keithj/alexandria/blob/master/definitions... which is Lisp code styled in a standard manner. I don't see any problem there.


The HTML example is much better than "}}}}}" though, since you can e.g. add a new item at the end of the list without needing a specialized editor to locate the right position. This is one of the reasons for the redundancy of repeating the tag name in the end-tag. In theory Lisp should have the same problem, but code (hopefully) rarely has nested blocks larger than a screen, so it is not a big issue in practice, even if )))) looks ugly. The bottom line is that code has a different structure than typical hypertext documents, so just because a notation is suitable for one does not mean it is suitable for the other.


But when an s-expression is used to represent a document, not a program, then it is also not free to refactor deeply nested content. So s-expressions are no better than XML/HTML/JSON/Mark when encountering deeply nested content.


It is exactly the same as Mark, but I am arguing it is worse than XML/HTML for these scenarios.


> without needing a specialized editor to locate the right position

Paren-matching is a commodity today in all sane programming editors. It is no longer anything you could call "specialized".


HTML has the problem of </div></div></div></div></div></div></div>. Lisp has the problem of '))))))))'. JSON has the problem of '}}}}}}'. And YAML has the problem of deep indentation.

When it comes to the worst-case scenario, no one wins. :-(


for completeness: json5

  size: {width:100, height:100 /*yay*/}

> It is not clear to me how mixed content like <cite>Hello <i>world</i></cite> is expressed in Mark.

  {cite "Hello" {i "world"}}


Oh, that is pretty cool. I didn't know about json5. This would also be quite nice for config files. Regular json is not nice for config files due to lack of comments.

Json5 is still not as "editable" as it looks, though. You need to separate values with commas (except after the last value), so there is more syntactic noise. So you get:

  {
     size: {width:100, height:100 /*yay*/},
  }
This is not an issue when the text is machine-generated (as JSON typically is), but it is an issue when it is edited by hand, as config files often are.


YAML is nicer still for config files


Yeah, nicer to read and write. More complex to parse, though. S-expressions are incredibly simple to parse. But I guess every language has a YAML parser these days.


I do prefer to write YAML, but the complexity is so bad every parser in every language is broken:

http://matrix.yaml.io


but for config you can stick to a self-documented subset of YAML that works

so in practice you don't notice any brokenness

it's not like a web browser that has to work on a diversity of third party sources

I'm not even sure how to read that matrix anyway, and it does say: > The YAML Test Suite currently targets YAML Version 1.2. ... some frameworks implement 1.1 or 1.0 only


What's the point in using YAML if you're using an ad-hoc subset!?


Comparing {mark} to XML, it doesn't seem to support namespaces which makes the claim to be extensible somewhat dubious. How am I supposed to add custom objects without risking name clashes? Namespaces also make XML kind of fully typed without being tied to a single programming language.

Another strength of XML is support for mixed content which seems rather awkward in {mark}. The following

    <p>Some <b>bold</b> text</p>
apparently needs to be written as

    {p 'Some' {b 'bold'} 'text'}
It would be more honest to mark support for mixed content as "verbose" in the feature table.

Besides, the name {mark} seems like a bad idea. How could you find relevant results when searching for {mark} using a search engine?


The current Mark design does not enforce a namespace standard. Namespaces can easily be captured in Mark, e.g. {'ns:elmt' 'xml:attr':'value' ...}

XML Namespaces seem to have a lot of issues, thus Mark does not want to enforce something that exactly follows them.

Namespaces in Mark are currently left up to the application user to define.

We might be able to come up with a better way to define namespaces.

As for the name, you can just use Mark. I use '{mark}' as an alternative name, to make it more graphical, more impressive.


Please don't.

XML Namespaces is syntactic vinegar.

Less is more.


it also "...does not have all the legacy things like DTD."

Ok sure, but does it have schematron,rng, or some sort of validation? How about transformations? Xpath?


Yes, I explicitly cut out DTD. I'm planning to develop a schema language for Mark, improving on prior art like XML Schema, JSON Schema, etc.

There's already a transformation library - Mark Template (https://github.com/henry-luo/mark-template) in beta release.

Mark at the moment supports CSS selectors. I'm also thinking about a new Mark-specific selector.

Mark is very new. A lot to be done!


I'm enough of a type-safety bigot that I would have started with schema-first, as I want schemas for all the things. :-)

FWIW, I would suggest avoiding the (IMO) mistake of using your markup language for the schema.

E.g. like json-schemas where we need a "properties" map, "type": "string" (how many times do I have to type "type"), all sorts of syntactical overhead.

Personally, I think IDLs are much cleaner, as you can design a purpose-specific grammar. More work up front, and you don't get a parser for free, but again personally I think it's more pleasant in the long-run for developers to read and write.

Granted, not sure how that jibes with your lisp/etc. way of thinking, but my two cents.

Good luck!


Hmm... while I'm not seeing any great advantage for {mark}, both of these appear to be 28 characters long. How is one more "verbose" than the other?


I think 'less verbose' just means 'no end tags'. Which I guess is great if you don't mind a long string of brackets at the end of your document.

While it would be cool to have something that was like JSON but could deal with complex documents, I also don't see how this is a huge improvement over XML.


Needs a "Why was Mark created?" section because this appears more 'neat' than 'useful'.


Yes, I'll do that.


> The advantage of Mark over S-expressions is that it is more modern, and can directly run in browser and node.js environments.

There seems to be a ton of s-expression parsers in npm already, that can run in browser and in node.js: https://www.npmjs.com/search?q=s-expression

Besides being able to run in js environments, what else does {mark} bring over s-expressions?


"Whoever does not understand Lisp is doomed to reinvent it"

- A wise man on the Internet once said


Defining the type of objects is a must when you want to exchange things in a strongly typed environment (Java on the server, TypeScript on the client, for example). So +1 for {mark}. Do you handle multiple typing? (We use that a lot in Neo4J, and we think it is really neat.)

Another comment: coming from a Semantic Web background, and using N3 as the exchange format and N3.parse() as my client-side lib, I would advise having a UID parameter to uniquely identify objects, and a refId syntax, so any parameter can reference other objects in the data structure. That helps when you want to transmit a graph [1].

My humble 2 cents.

[1]: I would add that it is also useful when you retrieve some refIds that are not defined in the current data structure. You can then ask the server to dereference these refIds, and send another (portion of the) graph, that you can connect with the existing data structure.
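The uid/refId idea can be sketched in plain JSON (the field names "uid" and "refId" are illustrative here, not part of any spec): nodes carry a uid, and edges point at uids instead of nesting copies, so even cyclic graphs can be transmitted.

```javascript
// Two people who know each other: a cycle, which plain nesting cannot express.
const graph = JSON.parse(`[
  {"uid": "p1", "name": "Ada",  "knows": {"refId": "p2"}},
  {"uid": "p2", "name": "Alan", "knows": {"refId": "p1"}}
]`);

// Resolve refIds with a lookup table built from the uids.
const byUid = new Map(graph.map(o => [o.uid, o]));
console.log(byUid.get(graph[0].knows.refId).name); // "Alan"
```

An unresolved refId is then exactly the "ask the server to dereference" case from the footnote.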


FYI, an old discussion on N3: https://news.ycombinator.com/item?id=14475501


Can you clarify what is meant by "multiple typing"?


Let's say you transmit an object of type Person, that is also a Student and a MartialArtist. Your inheritance graph may define that a Student is also a Person. So not sending the Person type could be fine. But would you define a common subtype for Student+MartialArtist, just because your data serialization handles only one single type per object? Obviously no! You want to send your object with types "Student" and "MartialArtist". I.e multiple types.
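In a plain-JSON serialization, multiple typing can be carried by an array-valued type field (the "@type" name below is an illustration, not from Mark or any spec in this thread):

```javascript
// One object, two types: no artificial Student+MartialArtist subtype needed.
const wire = JSON.stringify({ '@type': ['Student', 'MartialArtist'], name: 'Ada' });

const obj = JSON.parse(wire);
console.log(obj['@type'].includes('Student'));       // true
console.log(obj['@type'].includes('MartialArtist')); // true
```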


If you want a better JSON, try binary-json "Concise Binary Object Representation" RFC 7049 http://cbor.io/


You either go with JSON because everything talks JSON or you go with something that doesn't have an explicit parsing step like flatbuffers or capnproto.

If you don't care about parsing CPU efficiency then gzipped JSON beats protobuffers, CBOR, etc when you care about bytes sent over the wire.

If you care about CPU efficiency then protobuffers, CBOR, etc are worse than flatbuffers or capnproto.

There is not a lot of space for a new standard between these two existing categories.


> gzipped JSON beats protobuffers, CBOR, etc when you care about bytes sent over the wire.

Gzipped JSON does not beat gzipped Protobufs in message size. Comparing gzipped JSON to uncompressed Protobuf doesn't make sense.


Thanks for suggesting CBOR. Mark shall definitely have some binary representation, like BSON or CBOR, in the future.


My vote is for CSON: https://github.com/bevry/cson


I'd heard of CSON before. I didn't remember hearing about CBOR (but I had an implementation of CBOR already starred on GitHub apparently). However, given the following:

> CBOR is defined in an Internet Standards Document, RFC 7049. The format has been designed to be stable for decades.

I see no reason to go with CSON over CBOR. In fact just the opposite.


I'd suggest something like CSON for display and editing (with Link items for binary data), CBOR for transmission.


Interesting project but a little overboard with the self back-patting in the README.


> The advantage of Mark over S-expressions is that it is more modern

Is it more modern because it is newer? There is mention of how adoption is limited, but wouldn't the adoption of a completely new syntax be even more limited :-)

There is even a canonical representation using length prefixes: https://en.wikipedia.org/wiki/Canonical_S-expressions


Being 'more modern' means Mark takes a JS-first or web-first approach in its design. Whether we like it or not, JS has dominated the web. JSON is successful, partly because it takes a JS-first approach. Mark inherits this approach.


The biggest problem with these ideas is that json is already supported in the browser.

There might be a use case where your data is better represented in LDIF because it's hierarchical, but there's no built in LDIF support, so now you're importing a ton-o-javascript just to parse some new format.

At this point, we should realize JSON isn't meant to be human-readable anyway. If you need to hunt through it, you put it into some type of JSON viewer so you can see the tree and query it. It's an interchange format that's more compact than XML.

If you're shipping data between non-browser things like backend services, there are already binary formats like protobuff that have typing and can be optimized for small payloads.


Looks awfully similar to Clojure's EDN


I rarely code in Clojure, but I do use EDN and whenever I see new standards I always compare it to EDN.


Actually EDN is much better by default (sets, extensibility, etc)


Besides the nonsensical "advantage over S-expressions" statement in the README, the biggest issue I have with this is that Mark maps only to JavaScript, not to other languages where dicts/maps/hashes and arrays/slices/lists are two different things. Makes me wonder if it just has not occurred to the author that there are languages != JS.


If all other languages have no problem supporting XML, they'll have no problem supporting Mark.

It's just that in languages like JS and Lua, where an object can be a map and a list at the same time, they'll have the convenience of mapping a Mark object onto just one object, instead of many.


Another way to support Mark in other languages is just to use a map for both properties and contents. E.g. in Java, the key in a map can be an integer. Of course, the performance will not be as good as a primitive array, but it can serve as a quick-and-dirty solution.

General JS arrays (not TypedArrays) are actually maps under the hood.
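A quick demonstration of that last point: a general JS array is an object keyed by strings, so sparse assignment behaves like inserting a map entry.

```javascript
const a = [];
a[2] = 'x';

console.log(a.length);       // 3: length is highest index + 1, not entry count
console.log(Object.keys(a)); // ['2']: only the assigned key actually exists
```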


Thanks for the several comments pointing out the unclearness of what 'more modern' means.

I've updated the README to be: "The advantage of Mark over S-expressions is that it takes a more modern, JS-first approach in its design, and can be more conveniently used in web and node.js environments."

Hope it's clearer now.


Anyone who doesn't know Greenspun's 10th Rule is doomed to rhyme with it.


The nice thing about standards is that you have so many to choose from.

- Andrew Tanenbaum, Computer Networks, 2nd ed., p. 254.

I think all developers go through some experience where they want to just "unify" everything because that will supposedly make it easier for them and other developers.

Over time, as you become more experienced (or I guess jaded), you realize that the reality of a "GUT" technology platform or programming language is a pipe dream, and the effort to get people to use said new format/language/tech is more than what you get in return.

Anyway, to be short about it: I think most should just pick the best tool for the job and stop rebuilding things that don't need it. And if you do, please make sure you have a plan for how you are going to replace all the old working stuff.


> I think most should just pick the best tool for the job and stop rebuilding things that don't need to.

I think you just contradicted yourself. Sometimes the best tool for the job is something new, something improved over what already exists.

I don't think the author intends to "replace all the old working stuff". But if this tool is better for new projects, then why not? I don't get all the negativity... do people here really love XML/JSON/YAML that much? There's a whole lot to complain about in all of those!!


I am not averse to new formats. I am averse to formats that try to “unify”.

And yeah, I don't have a problem with XML or JSON. Those two, combined with flatbuffers or other modern binary protocols, cover most of my use cases... like, really, what's with all the XML negativity?


"XML...Fully Typed: No."

XSDs don't count then? https://en.wikipedia.org/wiki/XML_schema


XML is only semi-structured/typed without a schema. JSON and Mark are always typed.

A full formal schema definition, as in XML, is often a burden for ad-hoc scripting, which is common in JS. JSON/Mark provide sufficient type info for these ad-hoc usages.


JSON is not "fully typed". It just happens to have different syntax for strings, numbers, and booleans. But the application code still needs to come up with a way to distinguish between timestamps, enums, different object types, etc.
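An illustration of that app-level convention: JSON itself carries only a string, and a reviver (an assumption about this particular app's schema, not anything in JSON) turns fields it recognizes into richer types like Date.

```javascript
const raw = '{"created": "2018-02-05T00:00:00Z", "note": "hello"}';

// The reviver encodes the app's convention: "created" holds a timestamp.
const obj = JSON.parse(raw, (key, value) =>
  key === 'created' ? new Date(value) : value);

console.log(obj.created instanceof Date); // true
console.log(typeof obj.note);             // 'string': untouched by the reviver
```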

XML uses the same syntax for strings, integers, and booleans, but it has mature schema/typing tools that make it easy to apply more precise typing, which you'd want to do anyway to identify timestamps, enums, and different object types.


Pros: At least its not JSONx :D

Cons: Not seeing any advantage over JSON. If you want a type for objects just add a type field and have your code read it. Then you can use any of the existing parsers.
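The "type field" approach suggested above can be sketched in a few lines of plain JS — the field name `kind` and the `greeting` payload are made up for illustration:

```javascript
// Tagging objects with an explicit type field keeps the payload plain JSON,
// so any stock parser can read it; the application dispatches on the field.
const payload = '{"kind": "greeting", "lang": "en", "body": "hello"}';
const obj = JSON.parse(payload);

function handle(o) {
  switch (o.kind) {
    case "greeting":
      return o.body.toUpperCase();
    default:
      throw new Error("unknown kind: " + o.kind);
  }
}

const result = handle(obj); // "HELLO"
```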


> It has clean syntax

You could remove every '{' with 0 loss of meaning.


Removing '{' would essentially turn Mark into YAML. But in order to be a superset of JSON, YAML adds support for JSON syntax. So '{' is back.


I was thinking this looked like YAML with {}


YAML does not have good support for mixed content.


I made my own little language called Geneva [0] for similar ideas but it acts as code and can be parsed as JSON. I also came up with a spec for doing this for HTML [1] (but no code to do this yet).

[0] https://github.com/smizell/geneva

[1] https://github.com/smizell/janeml



What he forgot to add:

Some disadvantages of Mark compared to JSON would be:

* Mark is insecure, JSON is secure.

* Mark is slower than JSON

Passing types directly to object.constructor is of course entirely insecure. https://github.com/rurban/Cpanel-JSON-XS/blob/master/XS.pm#L... (i.e. CVE-2015-1592)


Thanks for feedback on the security aspect. It is something that Mark definitely needs to consider carefully.

The current Mark implementation does not call arbitrary constructors during parsing. The constructors are created from scratch. But application users might want Mark to call their custom class constructors. I'm thinking of passing in a callback function to Mark.parse().
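That pattern already exists in plain JSON via JSON.parse's reviver, where the application (not the parser) decides which constructors are trusted. A sketch — the `$type` tag, the `Point` class, and the `trusted` whitelist are all made up for illustration, not Mark's actual API:

```javascript
// The parser never calls arbitrary constructors; the application supplies
// a callback that maps tagged values to classes it explicitly whitelists.
class Point {
  constructor(x, y) { this.x = x; this.y = y; }
}

const trusted = { Point }; // whitelist of allowed constructors

const obj = JSON.parse('{"$type": "Point", "x": 1, "y": 2}', (key, value) => {
  if (value && typeof value === "object" && trusted[value.$type]) {
    const { $type, ...args } = value;
    return new trusted[$type](args.x, args.y);
  }
  return value; // untagged or untrusted values pass through unchanged
});
// obj is a Point only because the application opted in, not because
// the payload asked for it
```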


What YAML did in this aspect is providing a whitelist of allowed classnames.


When used for mixed content, Mark is not necessarily always slower than JSON. Many existing JSON-based DOM solutions, like JsonML and virtual-dom, need to use several JS objects to represent one element, but Mark uses only one JS object.

However, I don't have time to do any benchmarking at the moment.


I like the idea, but I don't think the benefits outweigh the negative implications of adopting it.

I mean, JSON as a data format for API stuff is just enough as it is; you'd need some serious reason to switch away from JSON, and these reasons just don't cut it.


> The advantage of Mark over S-expressions is that it is more modern, and can directly run in browser and node.js environments.

… with the right translator to JavaScript, which also happens to be true of S-expressions.

His table is incorrect, incidentally: S-expressions support mixed content (if I understand what he means) and are also fully generic.

He doesn't have a good example of the benefits of his proposal over S-expressions: 'more modern' just means 'undiscovered bugs.'

I respect his enthusiasm and hard work, but I believe what the world needs is hard work on existing things rather than hard work reïnventing the wheel.


> Mark utilizes a novel feature in JavaScript that a plain JS object is actually array-like, it can contain both named properties and indexed properties.

Where can I read more about this feature of JavaScript?


JS objects are tables, with Arrays being a syntactic convenience with some extra properties. JS engines do heavily optimize for the array case with dense layout. https://docs.microsoft.com/en-us/scripting/javascript/object...

Calling it novel to JS is a stretch. Lua does this too, and I’m sure there are other languages.
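A quick illustration of that array-like behavior in plain JS (nothing Mark-specific; the element shape here is invented):

```javascript
// A plain object can carry named properties and indexed "contents" at once;
// given a length property, generic Array methods even work on it.
const el = { name: "div", class: "note" };
el[0] = "hello ";
el[1] = "world";
el.length = 2;

const text = Array.prototype.join.call(el, ""); // "hello world"

// Named properties and indexed contents live side by side:
const propNames = Object.keys(el)
  .filter(k => isNaN(Number(k)) && k !== "length"); // ["name", "class"]
```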


php does too


So it's basically just XML with curly braces or sexps...

And why bring YAML into the mix? YAML isn't used for transfer, I hope? It should be compared to TOML as well in that case; TOML seems a lot better than YAML, especially for configs: https://github.com/toml-lang/toml

Or msgpack? Which also seems useful. Why not protobuf? Or just s-exps which is basically what this is.


If I understand the grammar properly, all plain-text elements have to be quoted. That makes sense for object data, but isn't really markup-friendly.


Great initiative. I'd say, why not improve the leading format, that is, JSON ;-) ? I'm collecting all data markup flavors and extensions (HJSON, JSON 1.1, JSONX, SON, etc.) on the Awesome JSON - What's Next page @ https://github.com/json-next/awesome-json-next Cheers.


JSON has a strong selling point in that its syntax is compatible with JS.

It is very hard to make major extensions to JSON and still stay compatible with JS syntax. Minor changes are possible, as in JSON5.

Once it breaks JS compatibility, I don't think people will consider it JSON next any more.


Good point. If it's not JSON next but a completely new format, then you will have to compete with JSON, all the JSON next formats, and all other alternative formats. Good luck.


PS: Answering myself - the leading data format might actually be the humble Comma-separated values (CSV) format! Love it really :-) Let's make it better and improve it - let's welcome csv,version: 1.1 -> https://csvalues.github.io


Isn't this what the author is trying to do? Mark seems to be a superset of JSON.


Looks like Mark is more in the tradition of YAML, that is, it wants to be its own format (not just a humble extension); YAML is a superset of JSON too. For example, my better JSON format flavor is called JSON v1.1 to make it clear it's just humble JSON, but improved :-).


Reinventing LISP is a nice thing.

I like this project.


I like this syntax for this reason as well, though I have to add that the hell of parentheses would prevent it from being widely adopted.


Is that really worse than the "hell" of angle-brackets in XML/HTML? Because those are pretty widely adopted.


> Mark utilizes a novel feature in JavaScript that a plain JS object is actually array-like, it can contain both named properties and indexed properties.

wouldn't that make introspecting objects very annoying?


In the Mark implementation, care has been taken so that indexed contents are not enumerable. So, e.g., when you run a for ... in loop on a Mark object, you'll only see properties, not the contents.

This is one of the differences between a Mark object and an array. Array contents are enumerable by default.
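A minimal sketch of that technique in plain JS (assumed to mirror the approach described above, not Mark's actual source):

```javascript
// Indexed content defined as non-enumerable stays out of for...in,
// while named properties remain visible.
const node = { name: "span", class: "hint" };
Object.defineProperty(node, 0, { value: "some text", enumerable: false });

const seen = [];
for (const key in node) seen.push(key);
// seen: ["name", "class"] — index 0 is hidden from enumeration

// ...but the content is still reachable directly:
const content = node[0]; // "some text"
```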


Doesn’t have a native date/time, so no advance over JSON really


You might be interested in Zish:

https://github.com/tlocke/zish

It's a data serialization format with timestamp, bytes and decimal data types.


Heh, check out https://www.obj-sys.com/asn1tutorial/node15.html

ASN.1 dates from 1984. So much for Mark being “modern”!


Date can be represented as an object {date '1985-04-12'} in Mark.

Not perfect, but we don't have to introduce a lot of built-in types. The syntax can be kept simple.


What would you use for schema validation? Obviously http://json-schema.org/ won't cut it.


A new Mark-specific schema will need to be developed, based on the prior art of XML Schema, JSON Schema, etc.


This should probably be titled "Show HN:".


I'll do that next time.


I really like protocol buffers. If you've never used it, they're really worth checking out.


Not sure how many more xkcd 927s this thread will have but personally what bugs me the most is the thought that being slightly more versatile than JSON means that Mark is worth the trouble of adopting. Data structures are well represented by JSON, markup is mostly well represented with XML. I rarely really want to mix the two. Additionally, this doesn't feel like a language I would want to write documents in any more than I do XML.

I think we're fine with separate languages for data and markup.


Data and markup/document separation might not always be that clear cut.

The latest trend in CMS systems, pioneered by the latest content editors like Quill, Draft.js, ProseMirror, and Slate.js, is to use JSON to represent the content, instead of HTML or Markdown. Using object notation gives rise to a cleaner API and data model.

So the wall between data and markup, JSON | XML may collapse one day.


I wish the start of a comment could be made less verbose than “{!-- comment --}”


A comment can just be {!comment} or {#comment} in Mark, if you like.

In the README example, I deliberately made it resemble HTML comment, so as to make it easier for people to correlate.


huh. when i read the spec, i thought comments were C-like:

    begin_sl_comment ::= '//'
    begin_ml_comment ::= '/*'
    end_ml_comment   ::= '*/'
maybe that ebnf is out of date.


In Mark, there are two types of 'comment'. // comment and /* comment */ are lexical comments, which are stripped during parsing.

Then there are Mark pragmas, like {!pragma}, which are preserved in the data model. HTML comments are supported as Mark pragmas, not Mark comments.


Oh that's a really interesting distinction.


I would strongly suggest a renaming. Mark is so generic that you can already tell you want {mark}, which has a pronouncability issue, and the name is too similar to the big player that is markdown (like a bright star making it impossible to see a dimmer one right next to it), making "mark" sound to me more likely to be a markdown renderer than a JSON replacement. Given that both mark and markdown are markup languages of various sorts, the names are just too close to each other, i.e., it would be different if markdown was what it was and you were proposing "mark" as the name of a library that lets you put marks based on geographical criteria on a map or something totally different.

Mark reserving all number-only keys is statistically likely to become a problem as a project grows larger. I'd suggest finding a different way to get out-of-band data to be fully out-of-band, rather than trying to carve out a chunk of keys.

Somewhat similarly, defining a "pragma" as "something surrounded with braces that isn't a legal object" means that if you ever want to change the definition of an object in the future, you can't, because you will turn things that used to be pragmas into objects, or less likely (because you'll try to avoid this going forward) but still possible, vice versa. You need to concretely specify what a pragma is unambiguously, in a way that you can evolve either without affecting the other. It also means errors in generation become legal pragmas instead of errors, which will cause surprises, and on the flip side, errors in parsing objects can turn them into legal pragmas rather than parse errors.

I would reserve saying "Mark is a superset of JSON" for the case when you really can feed any JSON to a Mark parser and get a (roughly) equivalent structure. Alternatively, go through the documentation with a text find option and make sure every time you say "superset" it is qualified as a "feature superset". Especially in light of "(Mark does not support hexadecimal integer. This is a feature that Mark omits from JSON5.)" The word superset should either be qualified every time or mean a strict superset; "Mark is a nearly-feature-superset of JSON" would be more accurate.

In general, a review of http://seriot.ch/parsing_json.php may be appropriate; mark addresses only one serious issue, and the other fixes are ultimately fairly superficial (the trailing comma issue, for instance, is almost never a problem for me because ninety-nine-point-I-don't-know-how-many-nines percent of the time, JSON is a thing my tools generate; the cases where that is a serious issue have generally already moved on to another format like YAML, same for comments). Also, per my comment about parse errors turning objects into pragmas, if you expect this to become a big cross-language standard it is worth reviewing a snapshot of the variability in JSON parsers, which is a simpler format. A more complicated format should expect to see even more subtle divergences in its multiple implementations and things like "misreading an object as a pragma" to become even more likely at scale.


Hexadecimal integers are not part of JSON. They are a new syntax introduced in JSON5 (not JSON version 5). I don't think this feature is that useful, thus I did not incorporate it into Mark.


(I can't downvote direct replies, so it wasn't me.) I was not suggesting that you incorporate it. I was suggesting modifying your marketing copy to incorporate the fact that it will not be a superset anymore. Superset is a word we should guard and not let it become "sort of superset-ish, maybe, mostly", but should mean superset. If you don't have every feature, you should not say it's a superset. Since not only is there nothing wrong with feature elimination, but when done well is a downright good thing, it's not like this is some sort of major problem for the marketing or something; just say you used some taste in what you brought over.

And again let me emphasize, since you seem to be saying it again in some other replies, that "{mark} is a superset of JSON", if you mean that syntactically (as opposed to features wise), MUST mean that every valid JSON document will produce a valid {mark} parse. Nothing less than that qualifies it as a superset. Given that you reserve numeric keys I don't think that is the case; whether the grammar is a superset is harder to determine so I haven't tried. That would be something best served by taking a very complete JSON parser test suite from someone and validating that all their corner cases that are supposed to parse in JSON, parse in {mark}. Based on my own experience in the world of parsing, the odds of you passing that first try are very low; if you manage, major kudos to you as that would be a very difficult test. (Though I would imagine that since the grammar largely came from JSON a lot of the surprises would be the ways in which your parser turns out to deviate from the grammar rather than grammar errors.)


Looks like a worse HAML...


tangential: I love json-5 which is like json, but allows comments and doesn't need to quote the keys, so it's just 'dumped js'.


nope, data shouldn't include structural information (except maybe in the header) and markup shouldn't include style information.


Not a fan. I'd rather JS adopted EDN instead.


EDN FTW.


waiting for mark 2. No... seriously, it's enough already https://xkcd.com/927/


I don't know if it's useful to have a universal format; each format (YAML, XML) is suited for a specific purpose (human readability, completeness).

The headline reminds me of: https://xkcd.com/927/


Maybe we should stop when we reach the YA prefix.


Came here just to make the XKCD reference. Good stuff.


Thanks for the XKCD reference. Maybe I can use it for my next Mark blog's header graphics. :-)


This doesn't have a grammar? When will people realize this is completely unacceptable?


Grammar details in section 4 of https://mark.js.org/mark-syntax.html


Henry, did you use Gunther Rademacher's RR diagram generator?

Best of luck with {mark}.


I used grammarkit (https://github.com/dundalek/GrammKit), which further uses railroad-diagrams (https://github.com/tabatkins/railroad-diagrams) library to generate the RR diagrams. I don't know if they are related to Gunther Rademacher's RR diagram generator.


Cool. I'll check those out.


Oh good



