Mark – A simple and unified notation for both object and markup data (github.com/henry-luo)
158 points by henryluo on Feb 5, 2018 | 169 comments



So this is literally Lisp, just with curly braces instead of parenthesis :).

I don't understand this part of the readme:

> The advantage of Mark over S-expressions is that it is more modern, and can directly run in browser and node.js environments.

Does this mean I'm so out of date with JS that this syntax is actually legit JS? Or does Mark run its own parsers, at which point it's just like sexps, except it uses the "more modern" curly brace instead of "less modern" parenthesis?


EDN, in addition to being simpler, also has an optimized transport abstraction, transit, which serializes to application/transit+msgpack (binary) or application/transit+json (often vastly faster deserialization than msgpack, because it hits the language-native JSON parsers in web browsers, Python, etc.). It surprised me how big a deal hitting native JSON parsers is: EDN was at the top of our profiler for 100kb payloads, but transit+json is zippy. The abstraction also handles value de-duplication and has a human-readable verbose writer.

http://blog.cognitect.com/blog/2014/7/22/transit http://cognitect.github.io/transit-tour/

cool project though, good to see s-expressions become more popular


> So this is literally Lisp

No, it's close to S-expressions, but it's not a programming language like Lisp.

> just with curly braces instead of parenthesis

Well, and more syntax than S-expressions: it's got both objects and arrays as fundamental structures instead of just lists, and it has commas as noise characters.


> Well, and more syntax than S-expressions: it's got both objects and arrays as fundamental structures instead of just lists

Well, Common Lisp has both of those as fundamental atoms:

    #S(foo :bar 3)

    #(1 2 3)
The former creates an instance of a FOO struct with its BAR slot set to 3; the latter is a 3-item array.


And then CLOS objects, hash tables, regular expressions, etc. can be added with reader macros.


It's cool and I love it, but it's irrelevant in the context of a universal data exchange language. In such case, you'd want to have those primitives defined in the exchange language spec itself - even if you'd end up implementing them as reader macros in CL (which I don't recommend - reader macros turn your READs into EVALs, which you obviously shouldn't do on untrusted input).


> you'd want to have those primitives defined in the exchange language spec itself

I agree with this: certain things need to be in the spec.

> even if you'd end up implementing them as reader macros in CL (which I don't recommend - reader macros turn your READs into EVALs, which you obviously shouldn't do on untrusted input).

This I don't agree with, because technically #\( & #\" are reader macros … they're just very well-defined reader macros. Presumably a spec which defined hash tables, regular expressions or whatever would define them as well as the Lisp spec defines lists and strings (and if not, well — it's a bad spec!).


> I agree with this: certain things need to be in the spec.

That was my main point. In the second part of the comment I didn't mean to discourage use of reader macros - it was more of an aside that the general facility of CL-style reader macros literally makes READ "shell out" into EVAL, so you need to (diligently) disable it for untrusted input (or reimplement a limited READ by hand). So we can't say "oh, but S-exps in Common Lisp can have anything through reader macros". Presumably if hash table literals were specified as a part of basic syntax, we could depend on it being standard and part of the safe subset of READ's duties; as it is however, we can't depend on it for arbitrary inputs.


Yeah, didn't mean it that literally.

> it's got both objects and arrays as fundamental structures instead of just lists

I'm a bit sad that the various Lisps never standardized on a format for this; had they, then maybe we would have S-expressions as a popular data interchange format.


They did. It's called Common Lisp.


Common Lisp doesn't have a standard representation of hash tables, unfortunately. Also, CL didn't clean up the Lisp space completely; right now, there are Schemes, there's Clojure, LFE, Hy, and a bunch of other niche Lisps, each with their own idiosyncrasies around syntax.


> Common Lisp doesn't have a standard representation of hash tables, unfortunately.

Ah, but you mentioned objects & arrays, not hash tables grin. Agreed that hash tables would have been nice, although that does then get into issues such as canonical representations (which matter e.g. for hashing).

> Also, CL didn't clean up the Lisp space completely; right now, there are Schemes, there's Clojure, LFE, Hy, and bunch of other niche Lisps, each with their own idiosyncrasies around syntax.

It would be nice if folks who want to use Lisp would use Common Lisp rather than reïnventing various forms of more-or-less round wheel. It's a remarkably well-engineered language (not perfect, of course: argument order, upcasing, pathnames & environments all leap to mind as problematic areas), and so far as I can tell quite a bit better than any of the alternatives.

In particular, it'd be really nice if people using Schemes for serious engineering work used Lisp instead. Scheme is just not well-suited to writing large systems, except by grafting on an ad-hoc, informally-specified, potentially bug-ridden subset of Lisp.


> Well, and more syntax than S-expressions: it's got both objects and arrays as fundamental structures instead of just lists

Which can easily be added to sexps (see EDN).


The S-expressions of mainstream Lisp dialects have objects and arrays. E.g. Common Lisp:

  ;; structure:
  #s(type :slot1 value1 :slot2 value2 ...)

  ;; vector:
  #(1 2 3 4)


Perhaps more like EDN, since it doesn't have a runtime. But yeah, it's s-exps with curly braces. Which, in my opinion, look worse than round parentheses … but that's just opinion.

https://github.com/edn-format/edn https://learnxinyminutes.com/docs/edn/


I really like edn. I wish it were more widely used.

It hits a sweet spot for me between yaml and json. Yaml is easy to type/read, but I feel it's a bit too complex on the parsing side. And json is a pain to type, so I'm reluctant to use it for human entered configuration files.


"More modern" is a euphemism for "not made by someone who had a gray beard in 1975". It's just another form of ageism.


Even by that criterion, there's EDN, which is basically sexps with more built-in types (and extensibility).


Being 'more modern' means Mark takes a JS-first or web-first approach in its design. Whether we like it or not, JS has dominated the web. JSON is successful, partly because it takes a JS-first approach. Mark inherits this approach.


The only ageist thing here is that you assumed the author's age, or that it had anything to do with the claim of modernity.


Your statement is factually incorrect.


It's sad that you feel that improving older, outdated technologies is a form of ageism.


It is sad that so much junior talent is wasted on attempts at improvement via blind reiteration. This is one possible consequence of a mentorship vacuum: bright minds look for challenges, even ones that have been adequately overcome long ago. Imagine the good that would come of directing such energy with a clear purpose.


There is nothing sad in that. Yes, it might be non-optimal, but by trying out your own approach you will find out its limitations first-hand and understand the problem and reasoning behind alternate approaches much better.

Junior talent doesn't become senior talent by just doing the right things, but by doing the wrong things and learning from them.


Junior talent becomes senior talent by virtue of acquiring experience. Doing the wrong things and learning is one way of acquiring experience, but being directed by a mentor is a much more efficient and productive way of acquiring that seniority. A lot of lessons learned from reinventing wheels can be distilled down into a conversation or a pair programming session, but in the absence of senior leadership, it becomes a week long hacking session on a library that will ultimately rot in Git forever because it's foundationally unsound.


It becomes interesting when later said library is picked up and put into production by similar novices.


Yup. There really is nothing wrong with reinventing the wheel for educational purposes. The problem starts when that reinvented wheel gains a good README/webpage, and gets picked up by an ecosystem driven by novices.


>Yes, it might be non-optimal, but by trying out your own approach you will find out its limitations first-hand and understand the problem and reasoning behind alternate approaches much better.

Hmm... I think that’s how PHP was made.


The problem here is that "modern" is usually a justification for regression, relative to older technologies, usually done by people who never bothered to look at the old technologies before declaring them obsolete.


Reminds me of a recent talk at FOSDEM based on this premise.

The circuit less traveled. Investigating some alternate histories of computing: https://www.youtube.com/watch?v=jlERSVSDl7Y


Wow, that was a surprisingly insightful talk; thanks for linking!

I'm intrigued in particular by the talk's conclusion about the (I guess, again) disappearing distinction between volatile and non-volatile storage. To date, I've been a vocal advocate of hierarchical filesystems (not UNIX, but just as a unit of user-facing abstraction). The talk sent me on the way of reflecting on whether I'm not just supporting another historical "wrong path". Lots more thinking ahead of me here. So thanks.


"Mark" does not improve on anything.


I would rather have said it's like the QML language (the basis for Qt's QML UI language). Can't find the link to the reference for plain QML. QML is like JSON, but with typed objects, and objects can have children in turn, so you can create tree structures from objects. It's actually very nice; I wish it had parsers for more languages.


It runs its own parser. Which means mark is a big silly ol string until you parse it.

var obj = Mark.parse(`{div {span 'Hello World!' }}`);


Doesn't seem like valid JS; you wouldn't need mark.js if that were the case.


Yes you would. JSON is valid JS, and executing JSON in a browser is a recipe for disaster.


JSON isn’t valid JS – its representation of strings allows U+2028 and U+2029 to appear unescaped, but JavaScript string literals don’t.

Not sure how else executing (valid) JSON in a browser would be a recipe for disaster? `eval` was the standard way to parse JSON from trusted sources for a long time.
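To make the U+2028 point concrete, here is a small sketch: the string below is a legal JSON document, but its contents pasted into a JS string literal were a SyntaxError until ES2019 relaxed the rule (this thread predates that change).

```javascript
// U+2028 (line separator) appears unescaped inside the JSON string below.
// JSON.parse accepts it; pre-ES2019 eval() of the same text would not.
const json = '"a\u2028b"';       // a legal JSON document
const value = JSON.parse(json);  // parses without complaint
console.log(value.length);       // 3
```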


> Does this mean I'm so out of date with JS that this syntax is actually legit JS? Or does Mark run its own parsers, at which point it's just like sexps, except it uses the "more modern" curly brace instead of "less modern" parenthesis?

It has its own parsing and stringify library it looks like: https://github.com/henry-luo/mark#markjs


Firstly, I highly respect Lisp personally, and I have no intention of downplaying it. As some have seen, the Lisp spirit is actually in the design of Mark.

Secondly, to clarify what I mean by 'being more modern'. Of course, it does not mean that changing from () to <> or {} will make it more modern or somehow better.

Being 'more modern' means Mark takes a JS-first or web-first approach in its design. Whether we like it or not, JS has dominated the web. JSON is successful, partly because it takes a JS-first approach. Mark inherits this approach.

Being JS-first means there'll be the least adoption barrier on the web.

Being JS-first, of course, does not mean JS-only. Mark is designed to be generic and usable from other programming languages, like JSON is.


I don't understand it either.

Looks like s-expressions to me and it isn't legit JS.


It's literally Common Lisp, down to the use of pre-expressions with no binding as pragmas. It's like Lisp someone dropped on a chair, and now all the parentheses have a funny bump.

But it's not literally lisp in the sense that the meta-syntactic stuff isn't there.


Genuine question, my impression is that Lisp is dynamically typed, but this uses type declarations. Would that make it different than Lisp?


Descriptions of typing are about as unevenly used as descriptions like "pass by reference." (So, practically useless.)

In Lisps:

- values have types (bool, symbol, number, list, array, structs, functions, ...)

- variables have by default one type: union of all the above

- in Common Lisp, you can restrict the types of values allowed in a variable

> but this uses type declarations

I must have missed this. If it's not just what struct-like entities are allowed in the markup, where did you see that?


Lisp has strong typing, so 1 is 1, not "1" or #\1. Unless Mark has a built-in way of annotating types, that doesn't give it any advantages over s-expressions.


Lisp expressions also don't have any annoying type clutter that you have to have at every node in the syntax. Like (1 "1") is just a list of two things; we don't need the word "list" anywhere.


It's a superset of JSON, which is valuable.


One major weakness of JSON is lack of a corresponding "infoset"; that is, an equivalence predicate. When are two JSON blobs "the same"? There's no sign of anything like this here.
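One way such an equivalence predicate could work is to canonicalize object key order before comparing. This is only an illustration of what a JSON "infoset" might decide, not any existing spec:

```javascript
// Sketch: recursively sort object keys, then compare serialized forms.
function canonicalize(v) {
  if (Array.isArray(v)) return v.map(canonicalize);
  if (v !== null && typeof v === 'object') {
    const out = {};
    for (const k of Object.keys(v).sort()) out[k] = canonicalize(v[k]);
    return out;
  }
  return v; // strings, numbers, booleans, null pass through
}

function jsonEqual(a, b) {
  return JSON.stringify(canonicalize(JSON.parse(a))) ===
         JSON.stringify(canonicalize(JSON.parse(b)));
}

console.log(jsonEqual('{"a":1,"b":2}', '{"b":2,"a":1}')); // true: key order ignored
```

Even this toy version forces decisions the JSON spec never made: does key order matter, does 1 equal 1.0, and so on.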

Another is the lack of support for binary data. There's no sign of support for binary data here.

Finally, there's this claim:

> The advantage of Mark over S-expressions is that it is more modern, and can directly run in browser and node.js environments.

Is it more modern? I don't think I care.

Can it directly run in browser and node.js environments? What does that mean? It seems to need a parser. But then, S-expression parsers certainly directly run in browser and node.js environments.

---

IMO, SPKI SEXPs are much more sensible than this design and many, many other designs:

https://people.csail.mit.edu/rivest/Sexp.txt


> IMO, SPKI SEXPs are much more sensible than this design and many, many other designs

Yes, yes, ten thousand times yes! I really don't understand why, over two decades hence, the world has stuck with XPKI & ASN.1, and has invented XML & JSON, when SPKI solved the PKI problem for good & canonical S-expressions solved the flexible- and human-readable–data-exchange problems for good.


Since you both seem to know the spec: how would you encode key/value pairs? Or would you have to have a list of nested lists, like

    (my_dict (key value) (key value) (key value))
Un-ordered qualities for data can be useful (e.g. they allow you to reorder data to stream "important" stuff first), but I don't see it anywhere in here.


With canonical S-expressions, unordered sets are a problem because part of the point is to be able to have a single canonical sequence of bytes, which can be hashed or compared bytewise for equality.

In general, I'd resist specifying data as arbitrary key-value pairs, but if I decided that I indeed needed them, I'd do exactly as you suggest — and I'd mandate that they be sorted lexicographically by their keys.
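A sketch of that sorted encoding, so equal maps always serialize to the same byte sequence (the pair-list shape mirrors the `(my_dict (key value) ...)` example above):

```javascript
// Emit key/value pairs in lexicographic key order; the output is then
// independent of insertion order, which is what canonical forms need.
function canonicalPairs(obj) {
  return Object.keys(obj).sort().map(k => [k, obj[k]]);
}

console.log(JSON.stringify(canonicalPairs({ b: 2, a: 1 })));
// [["a",1],["b",2]], the same output regardless of insertion order
```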


Each existing format has advantages and disadvantages for particular purposes.

Benefit of HTML: you can actually write it by hand and easily see where each element begins and ends, even when the document is longer than a screenful. Mark has the "}}}}}" problem with larger documents, so it is not as suitable for human-written markup.

It is not clear to me how mixed content like <cite>Hello <i>world</i></cite> is expressed in Mark. I expect it will be pretty convoluted.

Benefits of JSON: it maps directly to simple data structures: lists, dictionaries, and simple values. Similar data structures are supported in almost any language. Mark has "type names" and anonymous text content, which complicates serialization and deserialization a lot, and is sure to cause interoperability (and perhaps security) problems.

So - worst of both worlds? Instead of trying to be an overall worse alternative to all the formats, they should rather focus on a specific niche where Mark can be a better alternative.

Take configuration files, for example. They don't have large amounts of textual content like HTML, and they don't need to be transferred between disparate systems.

   {size width:100 height:100}
vs

   <size width="100" height="100"></size>
vs

  {"size": {"width":100,"height":100 }}
In this case, the Mark syntax is simpler and cleaner. Mixed content is not needed, which would make the format simpler. Yeah it is basically the same as S-expressions, but that is not a bad thing.


> Mark has the "}}}}}" problem with larger documents, so it is not as suitable for human-written markup.

And HTML has a problem of </span></li></ul></div></div></div></div></body></html>, all spread over nine different lines, one tag per line.

Take a look at https://github.com/keithj/alexandria/blob/master/definitions... which is Lisp code styled in a standard manner. I don't see any problem there.


The HTML example is much better than "}}}}}" though, since you can e.g. add a new item at the end of the list without needing a specialized editor to locate the right position. This is one of the reasons for the redundancy of repeating the tag name in the end-tag. In theory Lisp should have the same problem, but code (hopefully) rarely has nested blocks larger than a screen, so it is not a big issue in practice, even if )))) looks ugly. The bottom line is that code has a different structure than typical hypertext documents, so just because a notation is suitable for one does not mean it is suitable for the other.


But when an s-expression is used to represent a document, not a program, then it is also not free to refactor deeply nested content. So s-expressions are no better than XML/HTML/JSON/Mark when encountering deeply nested content.


It is exactly the same as Mark, but I am arguing it is worse than XML/HTML for these scenarios.


> without needing a specialized editor to locate the right position

Paren-matching is a commodity today in all sane programming editors. It is no longer anything you could call "specialized".


HTML has the problem of </div></div></div></div></div></div></div>. Lisp has the problem of '))))))))'. JSON has the problem of '}}}}}}'. And YAML has the problem of deep indentation.

When it comes to the worst-case scenario, no one wins. :-(


for completeness: json5

  size: {width:100, height:100 /*yay*/}

> It is not clear to me how mixed content like <cite>Hello <i>world</i></cite> is expressed in Mark.

  {cite "Hello" {i "world"}}


Oh, that is pretty cool. I didn't know about json5. This would also be quite nice for config files. Regular json is not nice for config files due to lack of comments.

Json5 is still not as "editable" as it looks, though. You need to separate values with commas (except after the last value), so there is more syntactic noise. So you get:

  {
     size: {width:100, height:100 /*yay*/},
  }
This is not an issue when the text is machine-generated (as JSON typically is), but it is an issue when it is edited by hand, as config files often are.


YAML is nicer still for config files


Yeah, nicer to read and write. More complex to parse, though. S-expressions are incredibly simple to parse. But I guess every language has a YAML parser these days.


I do prefer to write YAML, but the complexity is so bad every parser in every language is broken:

http://matrix.yaml.io


but for config you can stick to a self-documented subset of YAML that works

so in practice you don't notice any brokenness

it's not like a web browser that has to work on a diversity of third party sources

I'm not even sure how to read that matrix anyway, and it does say: > The YAML Test Suite currently targets YAML Version 1.2. ... some frameworks implement 1.1 or 1.0 only


What's the point in using YAML if you're using an ad-hoc subset!?


Comparing {mark} to XML, it doesn't seem to support namespaces which makes the claim to be extensible somewhat dubious. How am I supposed to add custom objects without risking name clashes? Namespaces also make XML kind of fully typed without being tied to a single programming language.

Another strength of XML is support for mixed content which seems rather awkward in {mark}. The following

    <p>Some <b>bold</b> text</p>
apparently needs to be written as

    {p 'Some' {b 'bold'} 'text'}
It would be more honest to mark support for mixed content as "verbose" in the feature table.

Besides, the name {mark} seems like a bad idea. How could you find relevant results when searching for {mark} using a search engine?


The current Mark design does not enforce a namespace standard. Namespaces can easily be captured in Mark, e.g. {'ns:elmt' 'xml:attr':'value' ...}

XML Namespaces seem to have a lot of issues, thus Mark does not want to enforce something that exactly follows them.

Namespaces in Mark are currently left up to the application user to define.

We might be able to come up with a better way to define namespaces.

As for the name, you can just use Mark. I use '{mark}' as an alternative name, to make it more graphical, more impressive.


Please don't.

XML Namespaces is syntactic vinegar.

Less is more.


it also "...does not have all the legacy things like DTD."

Ok sure, but does it have schematron,rng, or some sort of validation? How about transformations? Xpath?


Yes, I explicitly cut out DTD. I'm planning to develop a schema language for Mark, improving on prior art like XML Schema, JSON Schema, etc.

There's already a transformation library - Mark Template (https://github.com/henry-luo/mark-template) in beta release.

Mark at the moment supports CSS selectors. I'm also thinking about a new Mark-specific selector.

Mark is very new. A lot to be done!


I'm enough of a type-safety bigot that I would have started with schema-first, as I want schemas for all the things. :-)

FWIW, I would suggest avoiding the (IMO) mistake of using your markup language for the schema.

E.g. like json-schemas where we need a "properties" map, "type": "string" (how many times do I have to type "type"), all sorts of syntactical overhead.

Personally, I think IDLs are much cleaner, as you can design a purpose-specific grammar. More work up front, and you don't get a parser for free, but again personally I think it's more pleasant in the long-run for developers to read and write.

Granted, not sure how that jibes with your lisp/etc. way of thinking, but my two cents.

Good luck!


Hmm... while I'm not seeing any great advantage for {mark}, both of these appear to be 28 characters long. How is one more "verbose" than the other?


I think 'less verbose' just means 'no end tags'. Which I guess is great if you don't mind a long string of brackets at the end of your document.

While it would be cool to have something that was like JSON but could deal with complex documents, I also don't see how this is a huge improvement over XML.


Needs a "Why was Mark created?" section because this appears more 'neat' than 'useful'.


Yes, I'll do that.


> The advantage of Mark over S-expressions is that it is more modern, and can directly run in browser and node.js environments.

There seems to be a ton of s-expression parsers in npm already, that can run in browser and in node.js: https://www.npmjs.com/search?q=s-expression

Besides being able to run in js environments, what else does {mark} bring over s-expressions?


"Whoever does not understand Lisp is doomed to reinvent it"

- A wise man on the Internet once said


Defining the type of objects is a must when you want to exchange things in a strongly typed environment (Java on the server, TypeScript on the client, for example). So +1 for {mark}. Do you handle multiple typing? (We use that a lot in Neo4J, and we think it is really neat.)

Another comment: coming from a Semantic Web background, and using N3 as the exchange format and N3.parse() as my client-side lib, I would advise having a UID parameter to uniquely identify objects, and a refId syntax, so any parameter can reference other objects in the data structure. That helps when you want to transmit a graph [1].

My humble 2 cents.

[1]: I would add that it is also useful when you retrieve some refIds that are not defined in the current data structure. You can then ask the server to dereference these refIds, and send another (portion of the) graph, that you can connect with the existing data structure.
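The uid/refId idea can be sketched in plain JSON (the field names "uid" and "refId" are illustrative here, not part of any spec): nodes carry a uid, and edges point at uids instead of nesting copies, so even cyclic graphs can be transmitted.

```javascript
// Two people who know each other: a cycle, which plain nesting cannot express.
const graph = JSON.parse(`[
  {"uid": "p1", "name": "Ada",  "knows": {"refId": "p2"}},
  {"uid": "p2", "name": "Alan", "knows": {"refId": "p1"}}
]`);

// Resolve refIds with a lookup table built from the uids.
const byUid = new Map(graph.map(o => [o.uid, o]));
console.log(byUid.get(graph[0].knows.refId).name); // "Alan"
```

An unresolved refId is then exactly the "ask the server to dereference" case from the footnote.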


FYI, an old discussion on N3: https://news.ycombinator.com/item?id=14475501


Can you clarify what is meant by "multiple typing"?


Let's say you transmit an object of type Person, that is also a Student and a MartialArtist. Your inheritance graph may define that a Student is also a Person. So not sending the Person type could be fine. But would you define a common subtype for Student+MartialArtist, just because your data serialization handles only one single type per object? Obviously no! You want to send your object with types "Student" and "MartialArtist". I.e multiple types.
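In a plain-JSON serialization, multiple typing can be carried by an array-valued type field (the "@type" name below is an illustration, not from Mark or any spec in this thread):

```javascript
// One object, two types: no artificial Student+MartialArtist subtype needed.
const wire = JSON.stringify({ '@type': ['Student', 'MartialArtist'], name: 'Ada' });

const obj = JSON.parse(wire);
console.log(obj['@type'].includes('Student'));       // true
console.log(obj['@type'].includes('MartialArtist')); // true
```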


If you want a better JSON, try binary-json "Concise Binary Object Representation" RFC 7049 http://cbor.io/


You either go with JSON because everything talks JSON or you go with something that doesn't have an explicit parsing step like flatbuffers or capnproto.

If you don't care about parsing CPU efficiency then gzipped JSON beats protobuffers, CBOR, etc when you care about bytes sent over the wire.

If you care about CPU efficiency then protobuffers, CBOR, etc are worse than flatbuffers or capnproto.

There is not a lot of space for a new standard between these two existing categories.


> gzipped JSON beats protobuffers, CBOR, etc when you care about bytes sent over the wire.

Gzipped JSON does not beat gzipped Protobufs in message size. Comparing gzipped JSON to uncompressed Protobuf doesn't make sense.


Thanks for suggesting CBOR. Mark shall definitely have some binary representation, like BSON or CBOR, in the future.


My vote is for CSON: https://github.com/bevry/cson


I'd heard of CSON before. I didn't remember hearing about CBOR (but I had an implementation of CBOR already starred on GitHub apparently). However, given the following:

> CBOR is defined in an Internet Standards Document, RFC 7049. The format has been designed to be stable for decades.

I see no reason to go with CSON over CBOR. In fact just the opposite.


I'd suggest something like CSON for display and editing (with Link items for binary data), CBOR for transmission.


Interesting project but a little overboard with the self back-patting in the README.


> The advantage of Mark over S-expressions is that it is more modern

Is it more modern because it is newer? There is mention of how adoption is limited, but wouldn't the adoption of a completely new syntax be even more limited :-)

There is even a canonical representation using length prefixes: https://en.wikipedia.org/wiki/Canonical_S-expressions


Being 'more modern' means Mark takes a JS-first or web-first approach in its design. Whether we like it or not, JS has dominated the web. JSON is successful, partly because it takes a JS-first approach. Mark inherits this approach.


The biggest problem with these ideas is that json is already supported in the browser.

There might be a use case where your data is better represented in LDIF because it's hierarchical, but there's no built in LDIF support, so now you're importing a ton-o-javascript just to parse some new format.

At this point, we should realize JSON isn't meant to be human-readable anyway. If you need to hunt through it, you put it into some type of JSON viewer so you can see the tree and query it. It's an interchange format that's more compact than XML.

If you're shipping data between non-browser things like backend services, there are already binary formats like protobuff that have typing and can be optimized for small payloads.


Looks awfully similar to Clojure's EDN


I rarely code in Clojure, but I do use EDN and whenever I see new standards I always compare it to EDN.


Actually EDN is much better by default (sets, extensibility, etc)


Besides the nonsensical "advantage over S-expressions" statement in the README, the biggest issue I have with this is that Mark maps only to JavaScript, not to other languages where dicts/maps/hashes and arrays/slices/lists are two different things. Makes me wonder if it just has not occurred to the author that there are languages != JS.


If all other languages have no problem supporting XML, they'll have no problem supporting Mark.

It's just that in languages like JS and Lua, where an object can be a map and a list at the same time, they'll have the convenience of mapping a Mark object onto just one object, instead of many.


Another way to support Mark in other languages is just to use a map for both properties and contents. E.g. in Java, the key in a map can be an integer. Of course, the performance will not be as good as a primitive array, but it can serve as a quick-and-dirty solution.

General JS arrays (not TypedArrays) are actually maps under the hood.
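A quick demonstration of that last point: a general JS array is an object keyed by strings, so sparse assignment behaves like inserting a map entry.

```javascript
const a = [];
a[2] = 'x';

console.log(a.length);       // 3: length is highest index + 1, not entry count
console.log(Object.keys(a)); // ['2']: only the assigned key actually exists
```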


Thanks for the several comments pointing out the unclearness of what 'more modern' means.

I've updated the README to be: "The advantage of Mark over S-expressions is that it takes a more modern, JS-first approach in its design, and can be more conveniently used in web and node.js environments."

Hope it's clearer now.


Anyone who doesn't know Greenspun's 10th Rule is doomed to rhyme with it.


The nice thing about standards is that you have so many to choose from.

- Andrew Tanenbaum, Computer Networks, 2nd ed., p. 254.

I think all developers go through some experience where they want to just "unify" everything because that will supposedly make it easier for them and other developers.

Over time, as you become more experienced (or I guess jaded), you realize that the reality of a "GUT" technology platform or programming language is a pipe dream, and the effort to get people to use said new format/language/tech is more than what you get in return.

Anyway, to be short about it: I think most should just pick the best tool for the job and stop rebuilding things that don't need it. And if you do, please make sure you have a plan for how you are going to replace all the old working stuff.


> I think most should just pick the best tool for the job and stop rebuilding things that don't need to.

I think you just contradicted yourself. Sometimes the best tool for the job is something new, something improved over what already exists.

I don't think the author intends to "replace all the old working stuff". But if this tool is better for new projects, then why not? I don't get all the negativity... do people here really love XML/JSON/YAML that much? There's a whole lot to complain about in all of those!!


I am not averse to new formats. I am averse to formats that try to “unify”.

And yeah, I don't have a problem with XML or JSON. Those two, combined with flatbuffers or other modern binary protocols, cover most of my use cases... like, really, what's with all the XML negativity?


"XML...Fully Typed: No."

XSDs don't count then? https://en.wikipedia.org/wiki/XML_schema


XML is only semi-structured/typed without a schema. JSON and Mark are always typed.

A full formal schema definition, as in XML, is often a burden for ad-hoc scripting, which is common in JS. JSON/Mark provide sufficient type info for these ad-hoc usages.


JSON is not "fully typed". It just happens to have different syntax for strings, numbers, and booleans. But the application code still needs to come up with a way to distinguish between timestamps, enums, different object types, etc.
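An illustration of that app-level convention: JSON itself carries only a string, and a reviver (an assumption about this particular app's schema, not anything in JSON) turns fields it recognizes into richer types like Date.

```javascript
const raw = '{"created": "2018-02-05T00:00:00Z", "note": "hello"}';

// The reviver encodes the app's convention: "created" holds a timestamp.
const obj = JSON.parse(raw, (key, value) =>
  key === 'created' ? new Date(value) : value);

console.log(obj.created instanceof Date); // true
console.log(typeof obj.note);             // 'string': untouched by the reviver
```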

XML uses the same syntax for strings, integers, and booleans, but it has mature schema/typing tools that make it easy to apply more precise typing, which you'd want to do anyway to identify timestamps, enums, and different object types.


Pros: At least its not JSONx :D

Cons: Not seeing any advantage over JSON. If you want a type for objects just add a type field and have your code read it. Then you can use any of the existing parsers.
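The "type field" approach suggested above can be sketched in a few lines of plain JS — the field name `kind` and the `greeting` payload are made up for illustration:

```javascript
// Tagging objects with an explicit type field keeps the payload plain JSON,
// so any stock parser can read it; the application dispatches on the field.
const payload = '{"kind": "greeting", "lang": "en", "body": "hello"}';
const obj = JSON.parse(payload);

function handle(o) {
  switch (o.kind) {
    case "greeting":
      return o.body.toUpperCase();
    default:
      throw new Error("unknown kind: " + o.kind);
  }
}

const result = handle(obj); // "HELLO"
```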


> It has clean syntax

You could remove every '{' with 0 loss of meaning.


Removing '{' would essentially turn Mark into YAML. But in order to be a superset of JSON, YAML adds support for JSON syntax. So '{' is back.


I was thinking this looked like YAML with {}


YAML does not have good support for mixed content.


I made my own little language called Geneva [0] for similar ideas but it acts as code and can be parsed as JSON. I also came up with a spec for doing this for HTML [1] (but no code to do this yet).

[0] https://github.com/smizell/geneva

[1] https://github.com/smizell/janeml



What he forgot to add:

Some disadvantages of Mark compared to JSON would be:

* Mark is insecure, JSON is secure.

* Mark is slower than JSON

Passing types directly to object.constructor is of course entirely insecure. https://github.com/rurban/Cpanel-JSON-XS/blob/master/XS.pm#L... (i.e. CVE-2015-1592)


Thanks for feedback on the security aspect. It is something that Mark definitely needs to consider carefully.

The current Mark implementation does not call arbitrary constructors during parsing. The constructors are created from scratch. But application users might want Mark to call their custom class constructors. I'm thinking of passing in a callback function to Mark.parse().
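That pattern already exists in plain JSON via JSON.parse's reviver, where the application (not the parser) decides which constructors are trusted. A sketch — the `$type` tag, the `Point` class, and the `trusted` whitelist are all made up for illustration, not Mark's actual API:

```javascript
// The parser never calls arbitrary constructors; the application supplies
// a callback that maps tagged values to classes it explicitly whitelists.
class Point {
  constructor(x, y) { this.x = x; this.y = y; }
}

const trusted = { Point }; // whitelist of allowed constructors

const obj = JSON.parse('{"$type": "Point", "x": 1, "y": 2}', (key, value) => {
  if (value && typeof value === "object" && trusted[value.$type]) {
    const { $type, ...args } = value;
    return new trusted[$type](args.x, args.y);
  }
  return value; // untagged or untrusted values pass through unchanged
});
// obj is a Point only because the application opted in, not because
// the payload asked for it
```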


What YAML did in this aspect is providing a whitelist of allowed classnames.


When used for mixed content, Mark is not necessarily always slower than JSON. Many existing JSON-based DOM solutions, like JsonML and virtual-dom, need to use several JS objects to represent one element, but Mark uses only one JS object.

However, I don't have time to do any benchmarking at the moment.


I like the idea, but I don't think the benefits outweigh the negative implications of adopting it.

I mean, JSON as a data format for API stuff is just enough as it is; you'd need some serious reason to switch away from JSON, and these reasons just don't cut it.


> The advantage of Mark over S-expressions is that it is more modern, and can directly run in browser and node.js environments.

… with the right translator to JavaScript, which also happens to be true of S-expressions.

His table is incorrect, incidentally: S-expressions support mixed content (if I understand what he means) and are also fully generic.

He doesn't have a good example of the benefits of his proposal over S-expressions: 'more modern' just means 'undiscovered bugs.'

I respect his enthusiasm and hard work, but I believe what the world needs is hard work on existing things rather than hard work reïnventing the wheel.


> Mark utilizes a novel feature in JavaScript that a plain JS object is actually array-like, it can contain both named properties and indexed properties.

Where can I read more about this feature of JavaScript?


JS objects are tables, with Arrays being a syntactic convenience with some extra properties. JS engines do heavily optimize for the array case with dense layout. https://docs.microsoft.com/en-us/scripting/javascript/object...

Calling it novel to JS is a stretch. Lua does this too, and I’m sure there are other languages.
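A quick illustration of that array-like behavior in plain JS (nothing Mark-specific; the element shape here is invented):

```javascript
// A plain object can carry named properties and indexed "contents" at once;
// given a length property, generic Array methods even work on it.
const el = { name: "div", class: "note" };
el[0] = "hello ";
el[1] = "world";
el.length = 2;

const text = Array.prototype.join.call(el, ""); // "hello world"

// Named properties and indexed contents live side by side:
const propNames = Object.keys(el)
  .filter(k => isNaN(Number(k)) && k !== "length"); // ["name", "class"]
```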


php does too


So it's basically just XML with curly braces or sexps...

And why bring YAML into the mix? YAML isn't used for transfer, I hope? It should be compared to TOML as well in that case; TOML seems a lot better than YAML, especially for configs: https://github.com/toml-lang/toml

Or msgpack? Which also seems useful. Why not protobuf? Or just s-exps which is basically what this is.


If I understand the grammar properly, all plain-text elements have to be quoted. That makes sense for object data, but isn't really markup-friendly.


Great initiative. I'd say, why not improve the leading format, that is, JSON ;-) ? I'm collecting all data markup flavors and extensions (HJSON, JSON 1.1, JSONX, SON, etc.) on the Awesome JSON - What's Next page @ https://github.com/json-next/awesome-json-next Cheers.


JSON has a strong selling point in that its syntax is compatible with JS.

It is very hard to make major extensions to JSON and still stay compatible with JS syntax. Minor changes are possible, as in JSON5.

Once it breaks JS compatibility, I don't think people will consider it JSON next any more.


Good point. If it's not JSON next but a completely new format, then you will have to compete with JSON, all the JSON next formats, and all other alternative formats. Good luck.


PS: Answering myself - the leading data format might actually be the humble Comma-separated values (CSV) format! Love it really :-) Let's make it better and improve it - let's welcome csv,version: 1.1 -> https://csvalues.github.io


Isn't this what the author is trying to do? Mark seems to be a superset of JSON.


Looks like Mark is more in the tradition of YAML, that is, it wants to be its own format (not just a humble extension); YAML is a superset of JSON too. For example, my better JSON format flavor is called JSON v1.1 to make it clear it's just humble JSON, but improved :-).


Reinventing LISP is a nice thing.

I like this project.


I like this syntax for this reason as well, though I have to add that the hell of parentheses would prevent it from being widely adopted.


Is that really worse than the "hell" of angle-brackets in XML/HTML? Because those are pretty widely adopted.


> Mark utilizes a novel feature in JavaScript that a plain JS object is actually array-like, it can contain both named properties and indexed properties.

wouldn't that make introspecting objects very annoying?


In the Mark implementation, care has been taken so that indexed contents are not enumerable. So, e.g., when you run a for ... in loop on a Mark object, you'll only see properties, not the contents.

This is one of the differences between a Mark object and an array. Array contents are enumerable by default.
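A minimal sketch of that technique in plain JS (assumed to mirror the approach described above, not Mark's actual source):

```javascript
// Indexed content defined as non-enumerable stays out of for...in,
// while named properties remain visible.
const node = { name: "span", class: "hint" };
Object.defineProperty(node, 0, { value: "some text", enumerable: false });

const seen = [];
for (const key in node) seen.push(key);
// seen: ["name", "class"] — index 0 is hidden from enumeration

// ...but the content is still reachable directly:
const content = node[0]; // "some text"
```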


Doesn’t have a native date/time, so no advance over JSON really


You might be interested in Zish:

https://github.com/tlocke/zish

It's a data serialization format with timestamp, bytes and decimal data types.


Heh, check out https://www.obj-sys.com/asn1tutorial/node15.html

ASN.1 dates from 1984. So much for Mark being “modern”!


Date can be represented as an object {date '1985-04-12'} in Mark.

Not perfect, but we don't have to introduce a lot of built-in types. The syntax can be kept simple.


What would you use for schema validation? Obviously http://json-schema.org/ won't cut it.


A new Mark-specific schema will need to be developed, based on the prior art of XML Schema, JSON Schema, etc.


This should probably be titled "Show HN:".


I'll do that next time.


I really like protocol buffers. If you've never used it, they're really worth checking out.


Not sure how many more xkcd 927s this thread will have but personally what bugs me the most is the thought that being slightly more versatile than JSON means that Mark is worth the trouble of adopting. Data structures are well represented by JSON, markup is mostly well represented with XML. I rarely really want to mix the two. Additionally, this doesn't feel like a language I would want to write documents in any more than I do XML.

I think we're fine with separate languages for data and markup.


Data and markup/document separation might not always be that clear cut.

The latest trend in CMS systems, pioneered by the latest content editors like Quill, Draft.js, ProseMirror, and Slate.js, is to use JSON to represent the content, instead of HTML or Markdown. Using object notation gives rise to a cleaner API and data model.

So the wall between data and markup, JSON | XML may collapse one day.


I wish the start of a comment could be made less verbose than “{!-- comment --}”


A comment can just be {!comment} or {#comment} in Mark, if you like.

In the README example, I deliberately made it resemble HTML comment, so as to make it easier for people to correlate.


huh. when i read the spec, i thought comments were C-like:

    begin_sl_comment ::= '//'
    begin_ml_comment ::= '/*'
    end_ml_comment   ::= '*/'
maybe that ebnf is out of date.


In Mark, there are two types of 'comment'. // comment and /* comment */ are lexical comments, which are stripped during parsing.

Then there are Mark pragmas, like {!pragma}, which are preserved in the data model. HTML comments are supported as Mark pragmas, not Mark comments.


Oh that's a really interesting distinction.


I would strongly suggest a renaming. Mark is so generic that you can already tell you want {mark}, which has a pronouncability issue, and the name is too similar to the big player that is markdown (like a bright star making it impossible to see a dimmer one right next to it), making "mark" sound to me more likely to be a markdown renderer than a JSON replacement. Given that both mark and markdown are markup languages of various sorts, the names are just too close to each other, i.e., it would be different if markdown was what it was and you were proposing "mark" as the name of a library that lets you put marks based on geographical criteria on a map or something totally different.

Mark reserving all number-only keys is statistically likely to become a problem as a project grows larger. I'd suggest finding a different way to get out-of-band data to be fully out-of-band, rather than trying to carve out a chunk of keys.

Somewhat similarly, defining a "pragma" as "something surrounded with braces that isn't a legal object" means that if you ever want to change the definition of an object in the future, you can't, because you will turn things that used to be pragmas into objects, or less likely (because you'll try to avoid this going forward) but still possible, vice versa. You need to concretely specify what a pragma is unambiguously, in a way that you can evolve either without affecting the other. It also means errors in generation become legal pragmas instead of errors, which will cause surprises, and on the flip side, errors in parsing objects can turn them into legal pragmas rather than parse errors.

I would reserve saying "Mark is a superset of JSON" for the case when you really can feed any JSON to a Mark parser and get a (roughly) equivalent structure. Alternatively, go through the documentation with a text find option and make sure every time you say "superset" it is qualified as a "feature superset". Especially in light of "(Mark does not support hexadecimal integer. This is a feature that Mark omits from JSON5.)" The word superset should either be qualified every time or mean a strict superset; "Mark is a nearly-feature-superset of JSON" would be more accurate.

In general, a review of http://seriot.ch/parsing_json.php may be appropriate; mark addresses only one serious issue, and the other fixes are ultimately fairly superficial (the trailing comma issue, for instance, is almost never a problem for me because ninety-nine-point-I-don't-know-how-many-nines percent of the time, JSON is a thing my tools generate; the cases where that is a serious issue have generally already moved on to another format like YAML, same for comments). Also, per my comment about parse errors turning objects into pragmas, if you expect this to become a big cross-language standard it is worth reviewing a snapshot of the variability in JSON parsers, which is a simpler format. A more complicated format should expect to see even more subtle divergences in its multiple implementations and things like "misreading an object as a pragma" to become even more likely at scale.


Hexadecimal integers are not part of JSON. They are a new syntax introduced in JSON5 (not JSON version 5). I don't think this feature is that useful, thus I did not incorporate it into Mark.


(I can't downvote direct replies, so it wasn't me.) I was not suggesting that you incorporate it. I was suggesting modifying your marketing copy to incorporate the fact that it will not be a superset anymore. Superset is a word we should guard and not let it become "sort of superset-ish, maybe, mostly", but should mean superset. If you don't have every feature, you should not say it's a superset. Since not only is there nothing wrong with feature elimination, but when done well is a downright good thing, it's not like this is some sort of major problem for the marketing or something; just say you used some taste in what you brought over.

And again let me emphasize, since you seem to be saying it again in some other replies, that "{mark} is a superset of JSON", if you mean that syntactically (as opposed to features wise), MUST mean that every valid JSON document will produce a valid {mark} parse. Nothing less than that qualifies it as a superset. Given that you reserve numeric keys I don't think that is the case; whether the grammar is a superset is harder to determine so I haven't tried. That would be something best served by taking a very complete JSON parser test suite from someone and validating that all their corner cases that are supposed to parse in JSON, parse in {mark}. Based on my own experience in the world of parsing, the odds of you passing that first try are very low; if you manage, major kudos to you as that would be a very difficult test. (Though I would imagine that since the grammar largely came from JSON a lot of the surprises would be the ways in which your parser turns out to deviate from the grammar rather than grammar errors.)


Looks like a worse HAML...


tangential: I love json-5 which is like json, but allows comments and doesn't need to quote the keys, so it's just 'dumped js'.


nope, data shouldn't include structural information (except maybe in the header) and markup shouldn't include style information.


Not a fan. I'd rather JS adopted EDN instead.


EDN FTW.


waiting for mark 2. No... seriously, it's enough already https://xkcd.com/927/


I don't know if it's useful to have a universal format; each format (YAML, XML) is suited for a specific purpose (human readability, completeness).

The headline reminds me of: https://xkcd.com/927/


Maybe we should stop when we reach the YA prefix.


Came here just to make the XKCD reference. Good stuff.


Thanks for the XKCD reference. Maybe I can use it for my next Mark blog's header graphics. :-)


This doesn't have a grammar? When will people realize this is completely unacceptable?


Grammar details in section 4 of https://mark.js.org/mark-syntax.html


Henry, did you use Gunther Rademacher's RR diagram generator?

Best of luck with {mark}.


I used grammarkit (https://github.com/dundalek/GrammKit), which further uses railroad-diagrams (https://github.com/tabatkins/railroad-diagrams) library to generate the RR diagrams. I don't know if they are related to Gunther Rademacher's RR diagram generator.


Cool. I'll check those out.


Oh good



