A source of knowledge for Language Server Protocol implementations (langserver.org)
129 points by 0x4542 on Nov 16, 2017 | 59 comments



I still don't get it. We could create a parser framework that allowed a client to get some sort of generic AST and reference graph - which would enable all the use-cases of language servers but would also allow more use-cases in the future (e.g. IntelliJ-style inspections or language-aware diffs).

Instead, the standard requires running half a dozen processes with an API that doesn't provide anything except the exact data required for a number of handpicked use-cases.

Considering that even modern IDEs (again, see IntelliJ for examples) can do more, I don't see how this is a good way forward.


A generic AST and reference graph that captures a common denominator across languages is not as easy to build as a set of common operations and queries.

I think you may be underestimating the feasibility of what you propose.


You don’t need to do a generic AST. Have it be a protocol ... that runs in-process via an API. Because you know what, that’s what an API is ... a protocol for communicating with someone else’s code. But you don’t have to run extra processes or suffer context switches, and you don’t have to be in the business of debugging distributed systems in order to accomplish any tiny thing. Amazing!!!

This whole LSP thing is a mindbogglingly bad idea, brought to you by the same kinds of thought processes that created the disaster that is today’s WWW.


One of the design goals for LSP is that it is not in-process. Process isolation gives the text editor stability in the face of unstable plugins and allows multiple language servers to run concurrently without requiring the target language (which could be anything) to support threads and a C ABI - something which many languages have no need for.

Furthermore, many languages ship with a runtime which does not play nice with other runtimes. Try running a JVM, .NET VM, Golang runtime, and Erlang BEAM VM in the same process and see what happens. Better yet, try debugging it.
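
For what it's worth, the wire side of that design is deliberately boring: JSON-RPC 2.0 messages framed with a Content-Length header over the server's stdin/stdout (or a socket). Here's a minimal sketch in Go of an editor driving an out-of-process server; the server binary name is just a placeholder:

    package main

    import (
        "bufio"
        "fmt"
        "log"
        "os/exec"
    )

    func main() {
        // Spawn the language server as a separate process (placeholder name).
        srv := exec.Command("my-language-server")
        stdin, _ := srv.StdinPipe()
        stdout, _ := srv.StdoutPipe()
        if err := srv.Start(); err != nil {
            log.Fatal(err)
        }

        // Send a JSON-RPC 2.0 "initialize" request with LSP's Content-Length framing.
        body := `{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"processId":null,"rootUri":null,"capabilities":{}}}`
        fmt.Fprintf(stdin, "Content-Length: %d\r\n\r\n%s", len(body), body)

        // Read back the first header line of the response; a real client keeps
        // this read loop running for the whole editing session.
        header, _ := bufio.NewReader(stdout).ReadString('\n')
        fmt.Print(header)
    }

Anything that can read and write pipes can sit on either end of that, regardless of runtime.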


The downside to that approach is that it limits you to C calling conventions (most language communities are going to want to write in the target language) and means that the process will be as stable as the least stable plugin. Given how many bug reports were filed against every editor which has ever done that, it's easy to see the appeal of containment to the people working on an editor.

The other side of that is that we're not in the 90s with single core processors and there's been a lot of hardware & software optimization over time. Most people can run something like VSCode and a language server and still have multiple cores left over — and since the events which trigger language server interactions are generally made by a human with a keyboard it's not like you need to be in the microsecond range to keep up.


Try harder.

A PC full of programs built with these assumptions will probably grind to a halt all the time despite having very high-spec hardware.


That’s a lot of fearmongering with no evidence. Do you have profiler data showing that the microseconds needed to pass a message between processes are a significant limiting factor in a program which is rate-limited by human text entry?

I mean, taking your argument seriously would mean everything should be hand-tuned assembly. Obviously we figured out that other factors like time to write, security, portability, flexibility, etc. matter as well and engineering is all about finding acceptable balances between them. Microsoft has been writing developer tools since the 1970s and in the absence of actual evidence I’m going to assume they made a well-reasoned decision.


It seems like the person was suggesting that if all processes on your PC used a client/server model with message passing/RPC instead of the existing API model, the idle cores you speak of would not be idle.

While you're right that productivity versus performance is a trade-off, and an editor is not necessarily a high-performance application, it's not clear to me whether future optimizations would reduce the gap as much as optimizing compilers did vis-a-vis C and assembly.

In any case, that aside, the core guarantee of software stability with LSP remains to be seen.


> In any case, that aside, the core guarantee of software stability with LSP remains to be seen.

I don't follow this conclusion: haven't we already seen it with the way language servers crash and are just restarted without other side effects?


Fearmongering? That's a strange choice of words.

> Do you have profiler data showing that the microseconds needed to pass a message between processes are a significant limiting factor

What? I think you didn't get my point. Let me try again.

You can look at a single operation and say "oh, that's nothing, it's so cheap, it only takes a millisecond". Even though there's a way to do the same thing that takes much less time.

So this kind of measurement gives you a rationale to do things the "wrong" way, or shall we say the "slow" way, because you deem it insignificant.

Now imagine that everything in the computer is built that way.

Layers upon layers of abstractions.

Each layer made thousands of decisions with the same mindset.

The mindset of sacrificing performance because "well it's easier for me this way".

And it's exactly because of this mindset that we've ended up where we are.

Now you have a supercomputer that's doing busy work all the time. You think every program on your machine would start instantly because the hardware is so advanced, but nothing acts this way. Everything is still slow.

This is not really fearmongering; this is basically the state of software today. _Most_ software runs very slowly without actually doing that much.


I don't think this is true. Also, any solution that involves humans just "trying harder" is doomed to failure. History has demonstrated that over and over again.

The technologies that win are the ones that account for that.


Right, this seems obvious. Any idea why they opted not to go that way?

I guess the reason is that the original use-case was VS Code & TypeScript (?), and developing a lowest-common-denominator API for all clients would mean C, so they would have had to program to a C API even though both the client (VS Code?) and the server (Node?) are running JavaScript.

But then maybe the answer should be to provide better C invoke wrappers for high-level languages, not to use JSON-RPC over a pipe instead of C.


Knowing programmers, it's probably an absolutely irresistible concept that the provider for a given language must be easy (in the Hickey-ian sense) to write entirely in that language.


Not just a concept. Many languages are bootstrapped, and therefore the canonical tools for understanding that language's syntax and semantics are in that language's stdlib. You would have to replicate all of that in your IDE's language instead. It's DRY at work.


Seems a reasonable request to me.


I think the language server is not a solution to a technical problem; it's a solution to a social/political problem.

How do you support IntelliSense for a language once and have it work on many editors?


The best part is when I simulate bad network conditions by increasing latency and my autocompletion stops working (timeout 10ms) or becomes unusable.


Why would you need to simulate bad network conditions over localhost? It's probably your most stable network connection in any situation.

You are simulating a condition that will almost never happen unless your machine itself is dying in which case you have bigger problems than auto-completion to worry about.


You have a custom communication protocol over UDP and discover that with high latency, transferring large chunks takes a long time. You have a laptop available for testing. What do you do?

Sure, you could do some hyper-complex setup with multiple VMs, etc., or you could just run one command to temporarily increase latency on localhost.

If you know some better way of doing this, feel free to tell me.


Create a new loopback interface for testing? There's no reason you should be using your actual loopback for testing bad networking, especially since the loopback interface is used for far more these days than the one application you are testing.

Setting up a new interface is pretty easy on most Unix OSes, and then you can increase the latency on just that interface without impacting the interface most software on your machine expects to be blazing fast. And you can mess with that interface to your heart's content, knowing the only things affecting it are the applications you are running on it.


That's a good idea actually, that didn't occur to me.


You mean overestimating the feasibility.


I might be, but the general concept of formal grammars is one of the most basic achievements of computer science.

I'd claim that for most languages you could define a formal grammar in EBNF that would provide some basic utility.

Such a grammar would probably be overly permissive (it doesn't know anything about references, types or host environments) but it would provide you a baseline for syntax checking and autocompletion and could generate an AST that more language-specific rules could evaluate.


I think if you actually try this, you will find out why it's not a feasible concept (or why it's so vague as not to be useful). I think Hejlsberg and the C# team know all about grammars.

In particular, I know that using a BNF is not useful for shell, having ported the POSIX grammar to ANTLR.

http://www.oilshell.org/blog/tags.html?tag=parsing#parsing


Thinking that you can describe every programming language in BNF is similar to the RegEx problem: just as not every language is a Regular language that can be easily described as a single RegEx, not every programming language is (solely) a Context-Free Language with a grammar that can adequately be described in just BNF (or any other CFG description language) without ambiguity.

(Some programming languages are context-sensitive, some programming languages allow ambiguity in the main grammar and have precedence rules or "tie-breakers" to deal with those situations.)

"Universal grammar engines" and "Universal ASTs" are wondrous dreams had by many academics and like the old using RegEx in the wrong place adage: now you have N * M more problems to solve.


Right, CFGs are insufficient for most languages, and yacc is also insufficient for most languages. Yacc is simultaneously less powerful and more powerful than CFGs. Less powerful because it uses the LALR(1) subset of CFGs, and more powerful because you can embed arbitrary code with semantic actions.

I think the OP is looking for something more like the paper linked below. It's from the same author as Nix, so he has some credibility. But I think the paper is not well written, and I'm not sure about the underlying ideas either. (The output of these pure declarative parsers is more complicated to consume, as far as I remember.)

Still, the paper does show how much more there is to consider than "BNF". Thinking "BNF" will solve the problem is a naive view of languages and, as you point out, is very similar to the problem of not understanding which languages "regexes" can express.

http://eelcovisser.org/post/135/pure-and-declarative-syntax-...

> Mainstream parser generators pose restrictions on syntax definitions that follow from their implementation algorithm. They hamper evolution, maintainability, and compositionality of syntax definitions.


> I still don't get it. We could create a parser framework that allowed a client to get some sort of generic AST and reference graph - which would enable all the use-cases of language servers but would also allow more use-cases in the future (e.g. IntelliJ-style inspections or language-aware diffs).

The point is to move the use-cases to the server/protocol to avoid the "(m languages) x (n IDEs)" problem. The protocol is at https://github.com/Microsoft/language-server-protocol with contribution guidelines so it should "allow more use-cases in the future".
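
For a sense of what the protocol actually carries, the requests are small JSON-RPC messages. The rough sketch below uses hand-rolled Go structs (not from any library) whose field names follow the spec's textDocument/completion request - a document URI plus a zero-based line/character position:

    package main

    import (
        "encoding/json"
        "fmt"
    )

    // completionRequest mirrors the shape of an LSP textDocument/completion request.
    type completionRequest struct {
        JSONRPC string `json:"jsonrpc"`
        ID      int    `json:"id"`
        Method  string `json:"method"`
        Params  struct {
            TextDocument struct {
                URI string `json:"uri"`
            } `json:"textDocument"`
            Position struct {
                Line      int `json:"line"`
                Character int `json:"character"`
            } `json:"position"`
        } `json:"params"`
    }

    func main() {
        req := completionRequest{JSONRPC: "2.0", ID: 7, Method: "textDocument/completion"}
        req.Params.TextDocument.URI = "file:///home/me/project/main.go" // example path
        req.Params.Position.Line = 41
        req.Params.Position.Character = 12

        b, _ := json.MarshalIndent(req, "", "  ")
        fmt.Println(string(b))
    }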

> Instead, the standard requires running half a dozen processes with an API that doesn't provide anything except the exact data required for a number of handpicked use-cases.

Why half a dozen? Two seem enough (IDE + language server). It's also the minimum given that IDEs are using different languages/runtimes and that a lot of languages have most of their tooling written in the targeted language.


> Why half a dozen? Two seem enough (IDE + language server). It's also the minimum given that IDEs are using different languages/runtimes and that a lot of languages have most of their tooling written in the targeted language.

If you only work in a single language, yes. However, it's quite common that the same project includes multiple different languages, linked or nested into each other. E.g., a web project might have HTML, CSS, and (possibly nested) JS for the frontend (replace with CoffeeScript, React, Sass, etc. as needed) and PHP, Java, or yet another configuration of JS on the backend - not including build scripts and config files. So if you switch between them reasonably often, you'd have language server processes for each of them running in the background.

Alternatively you could have more coarse-grained servers that handle multiple languages (e.g. HTML+JS+CSS, node.js+npm, java+maven etc) but that would seem to make adoption even harder.


Even if it's 10 - is it a problem? Classical Unix shell tools and build tools spawn hundreds of processes and communicate with pipes between them. And still most people don't mind because the overhead is not noticeable. The amount of overhead for the few long-living language-server processes should be far lower than that.


I just wanted to second Matthias247's comment: why is this a problem? I'm sitting in front of a 4-year-old iMac. I have VSCode open currently with 3 windows containing 3 projects and code in 5 languages. The total CPU load for the aggregate 22-process group is less than 1%, and while it's not using no memory, it's not using more than, say, Chrome or Slack use.


I'm missing what's wrong with having language servers that support multiple languages.


Multiple language servers supporting multiple languages, each able to communicate over something more than the C ABI.

Meaning if your language's compiler and tooling are self hosted, you can write the plug-in in that language and leverage the very tooling used at build/runtime.

In other words, you can choose the right tool for the job. One that already exists and is tested, stable, and full featured. Or, you could rewrite everything in C or something that speaks it, run them all in the same process, meaning one bad plug-in brings the whole thing down.

And if you have to reimplement, that ups the chance that your version will behave subtly differently from the real thing.


Not sure if you're agreeing or disagreeing, but I had in mind (for example) a language server supporting multiple languages that run on the JVM and a separate language server supporting languages that compile to JavaScript, etc.

(This is assuming that reducing the number of processes improves performance; otherwise, a process per language might be simpler.)


And yes, sharing would be nice, and at a glance I don't see a reason you wouldn't be able to. I haven't implemented the spec, though.


Sorry for not being clear, I was agreeing and expanding on the pros of that approach.


While not a real parser framework, I think Google Grok (now: Kythe) is at least more ambitious than LSP, but unfortunately, it seems rather dead.

https://github.com/google/kythe

EDIT: Looking at the graphs, the project is obviously not "dead", but it seems it didn't get any traction outside Google.


Here is a 90-day window that shows how active it is, in greater detail:

https://public.gitsense.com/insight/github?r=google/kythe#b%...


> We could create a parser framework that allowed a client to get some sort of generic AST and reference graph - which would enable all the use-cases of language servers but would also allow more use-cases in the future (e.g. IntelliJ-style inspections or language-aware diffs).

I think a lot of infrastructure would be simplified if each language (or interpreter/compiler) would provide a flag or tool to convert raw syntax into some sort of s-expression (or JSON or XML if you prefer). This wouldn't be a "full AST", but just the "skeleton", enough for us to do tree traversals/transformations instead of having to lex, parse, etc.
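
As a rough illustration of the kind of output such a flag could emit, here's a sketch using only Go's standard library: parse a file and dump a nested s-expression of node kinds. A real skeleton dump would also want positions and identifier names; this only shows the flavor.

    package main

    import (
        "fmt"
        "go/ast"
        "go/parser"
        "go/token"
        "strings"
    )

    func main() {
        src := "package demo\n\nfunc Add(a, b int) int { return a + b }\n"

        fset := token.NewFileSet()
        file, err := parser.ParseFile(fset, "demo.go", src, 0)
        if err != nil {
            panic(err)
        }

        // ast.Inspect visits each node, then calls the callback with nil
        // once that node's children are done - handy for printing parens.
        depth := 0
        ast.Inspect(file, func(n ast.Node) bool {
            if n == nil { // finished a node: close its paren
                depth--
                fmt.Println(strings.Repeat("  ", depth) + ")")
                return false
            }
            fmt.Printf("%s(%T\n", strings.Repeat("  ", depth), n)
            depth++
            return true
        })
    }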


It would be both simple and very slow. I don't think many people are okay with the idea of waiting several seconds for the compiler to start. At the very least you have to keep the compiler process running, and you need a protocol to communicate with the compiler.

It already exists and is called Language Server Protocol.


> It would be both simple and very slow. I don't think many people are okay with the idea of waiting several seconds on starting the compiler.

Is there a specific reason you think it would be slow? Translating between text formats is pretty quick, and there's no need to do anything else that compilers normally do (e.g. building an AST; type-checking; resolving imports; optimising; generating code; etc.)

> At the very least you have to keep the compiler process running and you need a protocol to communicate with the compiler.

That may have a slight speed benefit, but for something as fast as parsing I think it would be over-engineering. Just pipe through stdio.

> It already exists and is called Language Server Protocol.

LSP is certainly interesting, although it's very new and seems rather limited.

I'm mostly curious why so many languages, after choosing (perfectly reasonably) to avoid an s-expression syntax, end up "throwing the baby out with the bathwater" by providing no "backend" syntax for tooling. Some tools represent code as e.g. XML for their own purposes, but I've never seen a tool-agnostic format, or a language "endorse" one of these as an interchange format. Instead, I've seen a huge amount of effort wasted on writing and maintaining a whole bunch of bespoke parsers and pretty-printers.


> We could create a parser framework that allowed a client to get some sort of generic AST and reference graph

I think this is assuming too much about how the language server is going to work. For example, does it parse everything greedily, or does it know how to parse stuff on demand? If the AST is in-memory in the editor, that might not work too well with giant projects that need lazy parsing. A language server might also want to do really fancy things like bust its file cache on inotify events.


So, we have a great concept and a bad implementation.

This is a text-book case for refactoring.


Why don't you implement it and see how it goes?


That's pretty interesting. I use Eclim (Vim + headless Eclipse) every day for my Java projects. Has anyone already tried the Java language servers, especially with Vim? How does it compare with Eclim?


This is my concern with the LSP. It's a wonderful idea, but its widespread adoption also implies abandoning many well-liked language-specific tools like Eclim.


It's not really meant to replace well-performing existing editor-language integrations with a large user base and active development.

What it's awesome for is new editors and languages. If editors implement this interface they immediately have decent support for a decent number of languages, and if a language implements this it immediately has basic support in all editors supporting LSP.

Microsoft originally developed this for their Visual Studio Code and TypeScript integration, both of which faced the typical challenges new editors and languages have with support.


I think they can work hand in hand: general support via LSP, plus tight coupling for additional augmentation.


I don't know how feasible it is, but I agree; it would be a pity to have to throw away Eclim because of LSP.


It's surprising that there's no Go language server with code-completion support.

Isn't most of the machinery to extract that information from source text present in the standard library? https://golang.org/pkg/go/
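
As a rough illustration, go/parser and go/ast alone will already list a file's top-level declarations - the raw material for a naive completion list. The hard parts (type information, scoping, cross-package resolution) are what a real completion engine has to layer on top. A quick sketch:

    package main

    import (
        "fmt"
        "go/ast"
        "go/parser"
        "go/token"
    )

    func main() {
        src := "package demo\nconst Answer = 42\ntype Server struct{}\nfunc Helper(x int) int { return x }\n"

        fset := token.NewFileSet()
        file, err := parser.ParseFile(fset, "demo.go", src, 0)
        if err != nil {
            panic(err)
        }

        // Collect top-level declarations as naive completion candidates.
        for _, d := range file.Decls {
            switch decl := d.(type) {
            case *ast.FuncDecl:
                fmt.Println("func", decl.Name.Name)
            case *ast.GenDecl:
                for _, spec := range decl.Specs {
                    switch s := spec.(type) {
                    case *ast.TypeSpec:
                        fmt.Println("type", s.Name.Name)
                    case *ast.ValueSpec:
                        for _, name := range s.Names {
                            fmt.Println(decl.Tok, name.Name) // const or var
                        }
                    }
                }
            }
        }
    }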


There are a couple of popular CLI tools for code-completion and other "IntelliSense"-ish functionality that are meant to be editor-agnostic --- i.e. "focus on just your editor-specific plugin plumbing and use these proven CLI tools for the underlying language intelligence".

(I made a list of useful, stably working ones about half a year ago when hacking on a custom VScode plugin, many but not all of them linters, haven't looked into latest developments: https://github.com/metaleap/go-util/blob/master/dev/go/devgo... )

Just that maybe none have fully implemented LSP. I guess even a from-scratch Golang LSP implementation might do well to just utilize these tools for underlying lang intel --- rather than these tools themselves bloating up by implementing some maybe-just-transient-maybe-the-future current-day MS-backed protocol.


I like the plethora of linters. It's generally straightforward to wire up custom linters and parse file:line:column from their stdout, since that's an age-old paradigm (overlap from compiler errors).
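
As a sketch of that wiring in Go (the go vet invocation is just an example; any tool that prints the usual file:line:col format works):

    package main

    import (
        "fmt"
        "os/exec"
        "regexp"
    )

    // diag matches the age-old "file:line:col: message" diagnostic format.
    var diag = regexp.MustCompile(`(?m)^(.+?):(\d+):(\d+):\s*(.+)$`)

    func main() {
        // Run a linter; the error return is ignored here because linters
        // usually exit non-zero when they find problems.
        out, _ := exec.Command("go", "vet", "./...").CombinedOutput()

        for _, m := range diag.FindAllStringSubmatch(string(out), -1) {
            file, line, col, msg := m[1], m[2], m[3], m[4]
            fmt.Printf("%s at %s:%s:%s\n", msg, file, line, col)
        }
    }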

Code completion has a much higher bar for acceptable UI, though. I trust a Language Server Client with tons of users/contributors to deliver non-idiosyncratic UI sooner than MVP UI integrations of specific tools.

A good example of this is "Tern" for JavaScript. It (the backend indexing/completion engine) is absolutely killer, but the tightly coupled editor plugin/package just for Tern is decidedly "meh". I suspect because hardly anyone is developing, or indeed using, that specific integration.

---

Regardless of the overall merit of the Language Server initiative, the "matrix vs column" problem they articulate is bang on the money in terms of delivering quality (non-idiosyncratic) UI.


> Code completion has a much higher bar for acceptable UI, though. I trust a Language Server Client with tons of users/contributors to deliver non-idiosyncratic UI sooner than MVP UI integrations of specific tools

Apt for the client-side. But as for "there's no Go language server"---

It's still more sensible for a, say, Golang LSP implementation (the server side) to rely on the existing `gocode` tool and related tooling under the hood to deliver auto-completion than to reinvent the wheel here. These have already been battle-tested and have encountered+fixed god-knows-how-many quirks that crop up when attempting to furnish the most-useful-for-current-cursor-context code completion.

Which kinda was my point. Anyone is free to wire up, say, a Golang LSP implementation without needing to redo all that the existing tooling already does well. If nobody has felt like doing such an LSP server to date, it's probably mostly because it's a young protocol still looking to find adoption beyond VS --- the ole' chicken'n'egg =)


Hey thanks. You prompted me to look into `gocode` again and I've got it working a lot better than when I last gave it a shot.


We are in the process of adding full support for the Language Server Protocol to our testing tool https://github.com/getgauge/gauge-vscode and it's been a great experience. In the past, we implemented these features for every IDE plugin (IntelliJ, Eclipse, Visual Studio), and they were tough to build, test, or debug. With a language server, we build all these features in our tool's language, i.e. Golang, and it works for any IDE that supports the protocol.


Notably and regrettably absent is Kotlin -

https://discuss.kotlinlang.org/t/any-plan-for-supporting-lan...

Java seems usefully far along however...


I wonder if this is a symptom of having a tools vendor (who only makes money via tools) create a language?


It looks like they maintain plugins for other IDEs - notably Eclipse https://github.com/JetBrains/kotlin-eclipse

The root cause seems to be the community: few Kotlin developers use anything but an IDE, so there's little motivation to bring tooling to other editors the way LSP would.


I suspect that they have a lot of bells and whistles in IntelliJ that are not possible over the Language Server Protocol. But yes, from a business perspective it doesn't make much sense to devote a lot of manpower to making competitors more capable.


Microsoft seems to currently be investing a lot of effort in making developing .NET server apps on a competitor's platform more capable, so I think it's not as cut and dried as that. If you develop a "good enough" LSP server for Kotlin, and have the "best of breed" tools in IntelliJ, then you potentially increase the pool of Kotlin developers, which creates network effects making Kotlin development more appealing, and then you see more IntelliJ use. There can be benefits to making your platform play well with your competitors, if you can still monetize the platform.



