MSL: A New Programming Language for Text Editing and Fact Verification [pdf] (mimix.io)
95 points by dtagames on Jan 26, 2020 | 25 comments



OP here. MSL is a new language for representing research and text editing as programming language expressions. Our company's goal is to replace the word processor with an improved set of tools based on this language.

The history of MSL can be traced to the goals of computer science pioneers going back to the 1940s. These additional documents further explain the development process and goals of the MSL language:

Mimix Whitepaper.[0] Explains the historical precedents for moving beyond word processing.

Getting to Xanadu.[1] Explains why Nelson's Xanadu did not achieve commercial success.

Everything Old Isn't New Again (Yet).[2] Shows previous attempts at moving beyond word processing.

Recipes for Research.[3] Explains how a new language could record text editing and fact changes over time.

Nebula: Simple and Flexible Apps with (msl) Data.[4] Shows how (msl) applications can be deployed.

I welcome your questions, ideas, and suggestions.

-- D

[0] https://mimix.io/wp-content/uploads/2018/08/On-Mimix-v1.3.pd...

[1] https://mimix.io/getting-to-xanadu/

[2] https://mimix.io/retro/

[3] https://mimix.io/recipes/

[4] https://mimix.io/nebula/


This reminds me a lot of the semantic web attempts back in the pre-2010 years. How do you plan to succeed where so many others have failed? Or will this end up as a glorified note-taking app like Notion?


It didn't "fail"; it just didn't take off. Plenty of tendrils of the original ideas remain, and its main idea, that information should be explicitly typed so it's more precise and reusable, is likely to prevail eventually. The requirements were simply too heavy for the web environment of the 2000s. We're still stuck with a startup culture where everyone wants to get rich on a piece of the pie, so guessificial intelligence is pretending to fill the gap, but hopefully the problems with these stopgap approaches will finally push the mainstream back toward intentionally creating structured information.


First, I agree and would also like to say the semantic web idea is roughly 25 years old, which is such a short time period that saying something is dead/failed seems a little silly.

Second, I would say the idea is more realistic now than ever.

Now we have:

- Git so you can cheaply & collaboratively experiment + evolve both data encodings and schemas

- Faster, more reliable type checkers and code/data refactoring tools

- Deep learning agents that are showing promising gains on question-answering challenges, so I can see DLAs coauthoring semantic content alongside humans in the very near future

- As always, more programmers, faster bandwidth, more data, faster computers, etc.

I think technology is getting to the point where the semantic web could happen relatively quickly.

That being said, I still don't know if it will happen, because I'm not sure whether there are strong economic forces that would impede such a thing.


The semantic web ideas failed because the concept required open data integration. Companies don’t like freely exposing their data without branding or proprietary controls.


I think it's very cool. I didn't inspect it with a fine-tooth comb, but I did a few searches through the docs and it seems promising. (One silly surface-level check I do: do the words "car" and "cdr" appear? If so, I shake my head at the design choice of using words with much less meaning to pander to academics when better ones exist. Happy to see they do not appear in MSL.)

My biggest question of course is this: is there something built with this I can play with?

By my count there have been between 493 and 1323 Lisp-ish languages over the past ~60 years (roughly 10-20 new ones every year), but only a few end up lasting. So my dilemma is that I'd prefer to wait and try a product, to see if it's really great, before evaluating the language.


I read the first few pages of the language spec and skimmed the whitepaper, and I cannot find any compelling use case for the system. Even worse, some of the ideas presented are actively anti-features. For instance, the inclusion of all materials that were read: in theory it looks cool; in practice it is a copyright nightmare. Likewise, the ability to replay every edit is creepy as hell. As an academic (whose job is thus churning out papers), hiding how the sausage was made is a nice property of a PDF. What matters is the ideas laid out in a paper, not the exact stream of text that expresses them.


> Likewise, the ability to replay every edit is creepy as hell. As an academic (whose job is thus churning out papers), hiding how the sausage was made is a nice property of a PDF.

As somebody familiar with version control and open source, I can say this is common practice.

Perhaps the chain of custody of document edits scares people who do not value transparency, but open source demands that level of transparency and criticism.


Programming and writing documents are different activities. One reason you don't want people to see everything you've ever written is that some of it is wrong, or you changed your views in the process of writing, or it's no longer important to the point you're making. I don't see how transparency about the edits you've made to a document is relevant if you've explicitly decided that you don't agree with the part that was edited.


> One reason you don't want people to see everything you've ever written is that some of it is wrong, or you changed your views in the process of writing, or it's no longer important to the point you're making.

I'm not necessarily disagreeing with the main point, but all of these things happen with code too. I've had to counsel people who were scared to commit code until it was perfect, and I've had to drill into their heads that it's not bad to have Git commits they aren't proud of. Part of breaking them of that habit was teaching them to be OK with letting people see their bad code and bad ideas, but another part of it just came down to teaching them how `rebase` works.

On the text side of things, I already publicly commit any edits I make to blog posts, documentation, and any websites I make after publishing -- even if they're embarrassing. Occasionally I'll also publish rough drafts if they're not sensitive content. Part of that is getting over the fact that people might see bad writing. If I was worried about people seeing my bad ideas, I wouldn't publish anything in the first place. 10 years from now, I hope that I'll have matured enough that I can look back at my blog and be embarrassed about some of my writing. If I don't have anything to be embarrassed about, I probably haven't grown much as a person.

But of course, for particularly sensitive or complicated topics I generally don't push rough drafts. And that's where rebasing comes in. Version/edit tracking isn't just about public transparency. Even with sensitive topics, I still put rough drafts of blog posts in local branches and commit edits while I'm building them. My workflow with private and public content is exactly the same. It's just that when I eventually push to a public repository, I first squash the branch down into a single commit.
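
Concretely, the squash-and-publish workflow looks something like this (a rough sketch; the branch and remote names here are made up):

    # Draft privately, committing every rough edit (names are hypothetical).
    git checkout -b draft/my-post
    # ... edit and commit early and often ...

    # When it's ready to publish, collapse the draft history into one commit.
    git checkout main
    git merge --squash draft/my-post   # stages the draft's net changes only
    git commit -m "Publish: my post"   # one clean commit, no rough drafts
    git push public main               # only the squashed commit goes public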

Transparency with edits is relevant to me because edit history is a personal/org-wide organizational tool. Those edits don't need to be pushed to the public, sometimes they don't even need to leave my computer, but sometimes I like to look over my writing and see how a site/post/article has evolved.

History isn't necessarily static. It's very common in version control to rewrite history when needed, and to push different parts of history to different remotes where people have different levels of access.


> I've had to counsel people who were scared to commit code until it was perfect

This is a better problem to have than coworkers who push code that literally doesn't run, e.g. has syntax errors. /rant


That's a question for PR review, not about what intermediate states they can push to their own branches.


It's worth reading Vannevar Bush on the value of associative trails: https://en.wikipedia.org/wiki/As_We_May_Think#Concept_realiz...


> I cannot find any compelling use case for the system

"Fact checking" is probably the magic word du jour for getting grants and funding.


I would assume many high-profile publishers are concerned with getting the facts right. Especially when working with freelancers, there might be interest in seeing the creative process in more detail. Here's just one example, found via Google, of a case where a journalist had been faking stories [1].

Maybe there would also be use in areas like technical writing, rules and regulations, or contracts. Having errors in these can have big financial consequences.

[1] https://www.theguardian.com/world/2018/dec/19/top-der-spiege...


Relotius produced exactly the right facts; they just weren't true.


How does this compare to a wiki [1]?

[1] https://en.wikipedia.org/wiki/Wiki


A more accurate title would be 'MSL: A Lisp-like stream-based data format for history of text based on a hierarchical key-value store'.

The idea behind the format is that analysis can be done on it to uncover useful insights about the text.


Sure, but that's true of XML as well. There's got to be a point to the new thing, right?


The MSL specification is more specific on how the knowledge-containing text should be stored. It provides the 'schema' and the semantics, based on Lisp syntax. It looks like MSL is intended to be the foundation for an API that Mimix plans on building.


Is that so? After all, XML has Schema as well, and with XML you can have task-appropriate schemas.

Look, it's all S-expressions, and Lisp is certainly better than XSLT/XPath, so, sure. But there's a whole ecosystem around XML for this sort of usage that there isn't for Lisp. OTOH, Lisp is a general purpose language that lets you build DSLs as needed, so there's that advantage.


OP here, again. Thanks to everyone who read and commented! To answer a couple of questions:

Mimix doesn't force you to reveal everything included in your work ("how the sausage was made"); it's simply there if you want it. This feature is mostly for you, the author, to get to your "ah-ha" moments, as Vannevar Bush called them. Right now, we don't have software that does a good job of connecting facts inside documents.

MSL is a language for building tools which we hope can eventually replace the word processor. We are at an early stage with that software, which will be open sourced once it's ready.

The hybrid database on which MSL relies could be viewed as the ultimate version control system because it allows rewinding any single variable (a semantic assignment statement) to any value it had in its history.
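
As a toy illustration of that idea (my own sketch, not our actual implementation), think of a key-value store that never discards old values, so any key can be rewound to any point in its history:

    from collections import defaultdict

    class HistoryStore:
        """Toy key-value store that keeps every value a key has ever held."""
        def __init__(self):
            self._history = defaultdict(list)  # key -> values, oldest first

        def set(self, key, value):
            self._history[key].append(value)

        def get(self, key, version=-1):
            # version=-1 returns the latest value; 0 rewinds to the original.
            return self._history[key][version]

    store = HistoryStore()
    store.set("water-boiling-point", "100 C at sea level")
    store.set("water-boiling-point", "99.97 C at sea level")  # a correction
    print(store.get("water-boiling-point"))     # latest: 99.97 C at sea level
    print(store.get("water-boiling-point", 0))  # rewound: 100 C at sea level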

Mimix certainly draws from the semantic web, but that's not our secret sauce. Today, Google and Stanford already have good libraries for figuring out what is a noun, what is a verb, etc. Where MSL shines is, as @MiniMax42 pointed out, in marking up that text so that it can later be analyzed or transformed in ways that are difficult with word processing documents.
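
To give a flavor of that kind of analysis, here's a tiny part-of-speech example using the open-source spaCy library (purely illustrative; spaCy is not our stack, just one of the NLP tools freely available today):

    # Assumes spaCy and its small English model are installed:
    #   pip install spacy && python -m spacy download en_core_web_sm
    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("Mimix marks up text so that it can be analyzed later.")
    for token in doc:
        print(token.text, token.pos_)  # e.g. "Mimix PROPN", "marks VERB", ...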

While we hope to see this technology become ubiquitous, the best early use case is high-value documents like medical, scientific, legal and financial papers, as @jpalomaki said.

Thanks again! -- D


Mimix seems to be a system that incorporates word processing, AI-based (machine learning?) fact checking, and bibliography management. Is MSL a language for interacting with that system, or something more?


Interesting work.

How is MSL related to the concept of version control/revision control? I didn't see any reference to (centralized or decentralized) VCS, but that seems highly relevant to MSL.


This is an exciting idea. I would hope that we can ultimately dispense with annotation and have the computer simply read the sentence and comment, or implement, or what have you.



