It looks like an alternative to Jsonnet that adds schema validation and strict types. IMO, Jsonnet's syntax is much simpler, it already has integration with IDEs such as VSCode and IntelliJ, and it already has plenty of traction.
Cue seems like an e2e solution, so it's not only an alternative to Jsonnet; it also removes the need for JSON Schema, OpenAPI, etc. Given that it's a 5-month-old project, it still needs plenty of time to evolve and mature.
We're heavily using Jsonnet for our data modeling (https://github.com/rakam-io/recipes) and we're pretty happy with it. We also plan to add support for JSON Schema, which is adopted by many IDEs, so that VSCode makes us feel like we're writing Java rather than a Jsonnet file.
Cue is Google's 6th attempt and given that Jsonnet already has traction and works well out of the box, I would invest my time into Jsonnet at this time.
Okay, so I've spent some time trying to understand this, and I think it's
actually really cool, but I found the "About" and "Concepts" documents tumid and murky.
Here is my understanding of the basic concepts:
- It allows a schema-like set of constraints to be declared for JSON (and
therefore for YAML & TOML) in a syntax that is an extension of JSON.
- Cue deals with types (sets of values) where JSON deals with single values.
Ordinary JSON syntax for a primitive value denotes a set containing that
one value. For example, `a: 1` means that the set of possible values for
`a` is {1}. Cue calls this a concrete definition.
- Cue provides operators for union (`|`) and intersection (`&`) of sets, and
inequalities for ranges of numbers, and so on. `1 | 2` denotes `{1, 2}`.
- Built-in names provide the types `int`, `float`, `string`, etc.
- Cue "struct" types look like JSON objects, associating names with sets.
Each name/value pair is a constraint, and all constraints must be met.
For example, `{a: int}` denotes "the set of objects that have a property
`a` whose value is in `int`".
- Properties can be referenced by name; this allows a property defined in
one place to be used as a type (set) definition in multiple places.
- When a name is bound more than once, the sets associated with each binding
are intersected. This means that enforcing a schema reduces to simply
combining the schema definition with the "concrete" bindings and throwing
an error when an empty set is encountered (see the sketch just below this list).
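To make that concrete, here is a minimal sketch of what I believe a schema plus concrete data would look like (the field names are made up, and I may have syntax details wrong):

    // "schema": each binding constrains the set of allowed values
    Server: {
        host: string
        port: int & >=1 & <=65535
        mode: "dev" | "prod"
    }

    // "concrete" data: binding the same name again intersects the sets
    server: Server & {
        host: "example.com"
        port: 8080
        mode: "dev"
    }

    // server: Server & {port: 99999} would intersect down to the empty set, i.e. an error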
BTW, if I do have it all wrong and what I've described above is not an
accurate description of Cue, then I think I'll have to go build what I've
described.
Some reasons I'm afraid I'm off-base:
- What's with the lengthy discussion of lattices, and related terminology?
Sure, we can construct a lattice from the set of possible types and a
"subset of" operator, but that's another level of abstraction away from
the necessary concepts, so I don't see what value it adds.
- The `|` operator is described as constructing a sum type, when it seems
to me it must actually be a (non-discriminated) union. Elsewhere the
`|` operator is described as "computing the join", which to me would mean
finding an element in a lattice, but for this to make sense to me I have
to think of it as adding an element to the lattice (again all the lattice
or poset terminology serves only to obfuscate things).
> What's with the lengthy discussion of lattices, and related terminology?
It's how the author thinks about the values. All the operations move a value up (|) or down (&) the lattice, and those operations are associative, commutative, etc. Moving down past the concrete values gets you bottom, the error value, so 1 & 2 is _|_. It fits into things like default values, where (using # for * because HN formatting doesn't do escape sequences) a: int | #1 and a: int | #1 unify to 1, because #1 & #1 = #1; but adding a: int | #2 would result in a: int, because #1 & #2 = _|_, so there is no default anymore.
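As a sketch, the same point with the actual asterisk marker (assuming I'm reading the default semantics right):

    a: int | *1
    a: int | *1   // the defaults agree (*1 & *1 = *1), so a evaluates to 1

    b: int | *1
    b: int | *2   // the defaults conflict (*1 & *2 = _|_), so b is just int, with no default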
I don't think the extended discussion on lattices is particularly useful. A much better intro is the tutorial [1] plus the concepts page [2]
The motivation is clearly to build a tool for configuring kubernetes but I see the combination of data, validation, and order independence as being valuable outside that use. I've definitely had projects where it'd fit. The main reason I'd think twice is because it does add a LOT of concepts for something that can be pretty simple on most projects.
Some of the reasons I like the tool can't be found in the language spec. CUE (the tool) provides import facilities for existing configurations like YAML, JSON, OpenAPI, protobuf or even Go code into CUE. This helps with adoption and time spent porting existing configs. Another feature of the CUE tool is the ability to create small tools that operate on CUE definition files. https://github.com/cuelang/cue/blob/master/doc/tutorial/kube...
If configuration starts becoming more complex than looking up key value pairs, why not just write it in the programming language you are using? More languages / serialisation just adds another layer of complexity. Config as code is actually really neat.
Some web projects validate inputs in JavaScript, then in a possibly different backend language, then (unusually) in the database with CHECK statements. Even when it's all JavaScript the validations are different because the frontend and backend frameworks are different.
And some projects have multiple backend services written in multiple languages.
Yeah, and it is not really new either. ioquake3 did that, it had a header file for some values. If you changed them, you had to recompile the QVMs. So then we added cvars, and some values such as HP, DMG, etc. stayed in the header file. I learnt programming C by fiddling around with ioquake3 and its forks, Tremulous especially. Good times.
I feel like that validation feature could theoretically save a lot of people that occasional 1 hour of their time that was wasted because of a typo in a config file leading to a cryptic error message.
The website is under active development at the moment and there are incomplete parts. For the code links, please take a look at the tutorial on GitHub for now.
I found this comment: "In V3 the hobby field is explicitly disallowed. This is not backwards compatibly as it breaks previous field that did contain a hobby field" in https://cuelang.org/docs/usecases/datadef/
So if you add a field, you break existing code that doesn't know about the field.
This is wrong. CUE has optional closed schemas marked by a double colon. The V3 entity you’re talking about is explicitly declared to be a closed definition and therefore disallows unknown fields in entities that claim to accord to the V3 type. Not all definitions are closed.
Even for those that you choose to close, it’s a matter of having different code for different definitions. The claim that it just automatically breaks isn’t true even when closed definitions are used.
BTW, this feature speaks to CUE’s intended purpose as a configuration language. It is (or at least can be) nice to ignore unknown fields in transmitted payloads for forwards and backwards compatibility. But if I’m trying to configure some software and misspell a field, I probably want the configuration file to fail validation, not have the software run with an unintended configuration.
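Roughly, with the double-colon definition syntax (made-up field names, just to illustrate):

    V2: {        // an ordinary, open struct: unknown fields are allowed
        age: int
    }

    V3 :: {      // a closed definition: fields not declared here are disallowed
        age: int
    }

    a: V2 & {age: 3, hobby: "chess"}   // fine
    b: V3 & {age: 3, hobby: "chess"}   // error: hobby is not allowed by the closed V3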
Are you familiar with the concept of context? It's helpful because it allows easier communication by not requiring words to mean the same thing at all times.
In this case, it is helpful because we can scale our understanding of the poster's disappointment to allow us to not have to consider how it might be disappointing relative to, say, global thermonuclear war or a first kiss, but only need to think about it in relation to the other topics of discussion.
I highly recommend using context whenever communicating.
I can be disappointed when viewing a web page while simultaneously being disappointed in humanity's collective response to climate change. Context matters.
> A key thing that sets CUE apart from its peer languages is that it merges types and values into a single concept. Whereas in most languages types and values are strictly distinct, CUE orders them in a single hierarchy (a lattice, to be precise).
I can't keep upper and lower straight, but `"foo" | 5` would allow either "foo" or 5, and `"foo" & 5` would be "bottom" (also spelled "_|_", essentially an error)
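In other words, something like:

    x: "foo" | 5    // x may be either "foo" or 5
    y: "foo" & 5    // _|_ (bottom): the sets have no common element, so this is an error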
I'm feeling both really pissed and validated right now. I thought this was going to just be a normal thoughtless config language that would only be successful as a Google project. Then I looked at the theoretical basis page. I have no formal proof about this, but I've been talking about this type of a type system with my parents and high school CS teacher for a while now!
My idea sounds like this: a value is simply an actual binary string. The "type" that classifies any data is described as a formal grammar, and the set of binary strings it accepts is a formal language. If a given binary value can be parsed with that formal grammar, the value is an element of the set (type). This naturally leads to a structural type system which can be described using existing set theory and implemented using existing parsers and formal language theory. Of course, this leads to type -> type functions which describe dependent and refinement types naturally.
It's great to see this idea being broadcast on the front page here, as I can see it very clearly as a superior type system. I'm really regretting not writing a formal article about it sooner!
Why feeling pissed? It is unlikely this Cue project implements the very same semantics the way you think about them. I'd go ahead and just write that paper. Looking forward to seeing your reference implementation :-).
Yes, I mistyped that bit. I meant to say that because types (as a grammar) are values, they can be inputs and outputs of functions. The functions can fill in the hole that dependent/refinement types fill, by taking context into account (the grammars which describe simple types being context free).
Type -> Type functions fill in for type constructors like this infinite list:
InfiniteListOf = Type ->
Type & InfiniteListOf Type
That function just morphs a simple grammar into another grammar. Imagine now if we could calculate something in between:
IncrementingInfiniteListFrom = number ->
number & IncrementingInfiniteListFrom number+1
That's where the dependent types come in, naturally.
Seems really interesting after a quick read-through. Specs that allow range-based validation look useful, and the structural declarations also feel like they'll help reduce a lot of boilerplate and repetition. I wonder how this compares with Dhall and Jsonnet, both of which I've been looking into as a safer alternative to templated YAML. With Google putting its weight behind this I'm curious if it'll start finding its way into K8s.
Folks upset at the sibling comment weren't here at the time. The reaction was swiftly negative, and there are few charitable reasons to be found for that.
CUE improves on Jsonnet in primarily two areas, I think: making composition better (it's order-independent and therefore consistent), and adding schemas.
Both Jsonnet and CUE have their origin in GCL internally at Google. Jsonnet is basically GCL, as I understand it. But CUE is a whole new thing.
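A tiny sketch of the order-independence point (hypothetical field names): a field can be declared in several places, and the pieces unify to the same result no matter what order they appear in.

    cfg: {replicas: 3}
    cfg: {image: "nginx"}
    // cfg unifies to {replicas: 3, image: "nginx"} regardless of declaration order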
Jsonnet is basically GCL, but it came after something like five failed attempts to rewrite, replace, or fix GCL, which included creating a formal semantics of GCL pointing out all the problems. I believe there was a final successful attempt to fix it.
I had my own qualms with the language itself too. My team had very complex Borg configs, so complex they took more than a minute to evaluate. Luckily the GCL team at the time was working on a new interpreter (gclx? IIRC), which was indeed much faster (I wrote a mandelbrot PNG generator in pure GCL to prove the big speedup and convince my team that switching was worth the effort).
Unfortunately there was no formal spec of the GCL language, so the new impl was based IIRC on reverse engineering the spec from the first implementation of the interpreter. It turned out our configs hit several cases where the original behaviour was either unsound (and thus the new impl sacrificed backwards compat) or we hit a bug in the new impl.
The main problem with the language (as opposed as issues with the implementations) was that finding the root of those behaviour differences was very hard. It was very hard to follow where the variables came from. The GCL scoping rules (and lazy evaluation) were indeed very unfriendly for debugging.
This was an extreme case of a pain that was felt on a daily basis by a lot of people I've been talking to.
Talk to the borgcfg team, and let your organization's tech leaders know as well.
A few executives are not friendly to borgcfg. Their agenda, it appeared to me, has been to deprive it of resources so it can die from rot. That's bad engineering and totally unnecessary. A healthy BCL/borgcfg would be easier to kill off, because it would allow an easier path for migrating to something new.
I left the company half a decade ago.
Just to be clear I actually loved borgcfg, just shared a war story. I'm happy to hear that the tooling improved. I'm unsurprised to hear that many still have mixed feelings towards GCL and ecosystem.
You are probably right. It felt like the documentation for borgcfg was extremely hard to even find. Maybe I'm wrong about this. This is also probably not the place to discuss it.
But GCL made it so whatever variable was actually being used by borgcfg was obscured by layers upon layers of imports.
Well, if people tested their BCL the way they test their C++, things would be better. But you know what, if they did that, Google would officially be a company built on BCL...
I was going to say, this looks a lot like GCL. Dynamic scope, recursive lookup in parent scopes, templates [1], everything. GCL is neat and all, but I'd almost rather write my job configs by building thin python or lisp scripts to emit json or protos.
Well, why don't you? Job configs in borg are protobufs. GCL can produce the required protocol message but so can any other language or tool. You could use Guile or whatever your heart desires.
A lot of these systems ignore the querying side of a schema. E.g., in GraphQL you can define a schema and then query only certain parts of it, so only parts of the schema are enforced at runtime.
I don't think a GUI for working with cue files is a good substitute for a GUI that lets regular users configure whatever you are configuring, because it's unlikely to be able to define valid data to the same degree as an actual app.
If it's for a technical individual configuring your software, I don't know that such a GUI would be superior to your favorite editor.
There is a large difference between battle-tested tech maintained by a properly staffed team and a proof of concept built by someone in their spare time.
The title certainly made me think that it was the former, even though it's the latter.
Just because you don't like it doesn't mean it's not built and supported by a large team following some VP or PM vision. Boondoggles cost a lot of resources.
Perhaps I have a different definition of "maintained" than Google does. For me, it means that it's not just online, but also has bugs regularly fixed and features added.
Most Google services (off the top of my head: Voice, Talk, Reminders) seem to reach v1 and then stop dead in their tracks. They're online, but that's it. Thousands of people request fixes or features, and they go completely unheard.
Google employees on HN have confirmed this, saying that the company rewards new products that drive ads, but not the work involved in improving and maintaining existing products. That explains why Google's released the following products for messaging, and none has been amazing: Talk, Hangouts, Voice, Wave, Allo, Hangouts Chat, and Messages/RCS.
Most of these overlapped at some point, and if you've used any of them, you wouldn't describe them as "maintained". They're more like "abandoned without publicly announcing anything".
If there's a VP or PM vision anywhere at Google that lasts for more than a year, I'd love to know what it is. It seems like a company with a thousand committees and no real creative leadership.
According to Googlers, much of the battle-tested tech Google's critical systems run on is not maintained by a staffed team. Stuff that works well enough to not scare a VP and isn't a 10x moonshot doesn't get funding.
For 20% projects, Google typically doesn't make any commitments on the project's development. If you're just being sarcastic on Google's infamous product longevity, you should've used more than 3 words.
We used to joke that open-sourcing Borgmon would be an industry-disabling move, but Prometheus is very popular which just proves there's a lot of people with poor taste in software.
FWIW Piccolo has also leaked into the industry in the form of Pystachio.
I realize this was meant to be sarcastic. However, I whole-heartedly and unironically agree. For writing small bits of configuration, just about any language will be fine. For large amounts of configuration, as is required for... oh, I dunno, let's say deploying software in the cloud, all the commonly used languages are a disaster.
Data serialization languages (like JSON, INI, XML, etc) lack the power to describe large configurations. There's no way to define an abstraction and then use it in multiple places. There's no way to constrain what is considered correct. You end up with tens of thousands of lines of very, very repetitive structures that are very fragile and hard to change.
General purpose languages are also bad for writing configurations. Yes, they're very powerful. That's not a feature when it comes to configuration. If you have a program that emits a configuration, the only thing you can do with it is run it, then inspect the resulting configuration. You can't inspect or transform the program at the level of the configuration semantics. You can't ensure that the configuration will have specific properties. You can't even ensure that the program will, in fact, emit a configuration.
So yeah, we need new configuration languages that lie in between data serialization languages and general purpose computing languages. We're starting to see them. HCL is awful, but it was an attempt at solving this problem and a move in the right direction. Jsonnet is maybe better? I dunno, I've never tried it. Dhall is interesting, though difficult for non-Haskellers to approach. CUE is also interesting. Bravo!
We have all we need. If you want a full featured language, use that. Beyond that, we have JSON, YAML, INI and more. If you want something more complicated you can create your own DSL for your app.
- 'dumb data' (json, yaml, ...) is machine-, but not human-friendly. it's often tedious to write/read, and you can't abstract common parts out
- DSLs are something you have to write, debug and maintain yourself. is the bug in your config or is it in your DSL's implementation? who knows!
- full featured languages require a full blown interpreter and aren't tooling-friendly. as an example: with a Python package, it's not really possible to statically determine the dependencies, because its setup.py can declare anything it pleases depending on, say, the time of day. also you can't really run untrusted configs because they might launch some missiles
---
there's a decent middle ground – write a program that generates a 'dumb data' config. you write a small amount of code (friendly for humans) and run it to get a static, easy-to-process config (friendly for machines). however in practice (in most languages) the program won't be pretty – probably about as easy to read/write as an implementation of a macro that directly manipulates ASTs (i.e. not very). this can sometimes be ameliorated with some EDSL trickery, but that brings back all the problems DSLs have
and so generating configs is what projects like Dhall/Cue aim to improve. i'd say they're aiming to be something like the regex of config generation - do a limited amount of common&useful things, and make them easy to express.
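for instance, the 'abstract common parts out' point might look roughly like this in CUE (the service names and fields are invented):

    _base: {                 // a shared template, hidden from the output
        image:    "nginx"
        replicas: int | *2   // default of 2 replicas
    }

    services: {
        web: _base & {replicas: 4}
        api: _base           // picks up the default replicas: 2
    }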
We do indeed. XML is awful, YAML is a clusterfuck, JSON is often insufficient, and so is INI. TOML looks promising, but might not make it out of its niche.
If Cue is basically a better YAML with built-in schemas, that sounds pretty good.
I was hoping to like TOML, but my first experiments using a Python package for parsing it were disappointing. I found the syntax cranky, and the Python package I was using did a terrible job at providing diagnostics — very poor exception propagation.
And that seemed to be the most mature TOML parser that I could find for Python, so I decided TOML isn’t likely in my future.
TOML is extremely verbose for lists. It's ridiculous. Sure, it has fewer features than YAML, but it also looks considerably worse in all the common use cases. It looks like INI.
Yeah, I was fighting the list problem. And not being able to mix numeric and string data in a list just made the whole thing unwieldy and frustrating.
I’ve used JSON for configs, which is not great, but at least readable. The biggest problem is lack of comments, but I added a quick hack to ignore any dictionary key starting with an octothorpe. Not ideal, but is actually handy because it makes it easy to comment out keys temporarily as well as add arbitrary commentary.
Also, the Python standard library JSON parser is very flexible and yields precise exceptions. I was able to turn those parsing exceptions into meaningful error messages with precise line and column numbers. Which came as a bit of a shock to some of my users, because a different tool written before my time but used by the same people had exactly one error message for misspelled YAML: segfault.
I never really understood this. This seems to be an oft-repeated truism from the mid-2000s with little backing it up.
The only awesome thing that JSON did was lose type information as well. In fact, the only thing that I can see that JSON brought to the table was easier editing by those who didn't have IDEs, at the expense of losing type information.
XML and especially XML schema are hugely complex and almost laughably difficult to bind to any mainstream programming language. It took Java over a decade to produce a binding that could (perhaps) handle an arbitrary schema. Types boil down to sums and products, I'm not convinced the designers of XML/XSD understood this. XML has so many overlapping concepts: elements, attributes, enumerations, choices, unions, sequences, lists, element/attribute groups, substitution groups, facets, simple types, complex types etc etc
IMHO it's ad-hoc and ugly; we deserve better.
My main complaint with XML is that it’s far too complex for most use cases (like config files), and thus requires an unhealthy amount of tooling to work with.
I can’t just load the file and run it through a parser and get an easily accessible object structure back, I’ll need to navigate the document with DOM or XPath.
JSON with comments (aka JSONC) is in my opinion the best format in wide use today. It has structure and types, but not too many, and not a lot of magic, like YAML has, but for the most complex cases, JSON(C) falls short with its lack of extensibility and inheritance.
We could have a URI that gives you a schema and some tools to generate a skeleton for that schema for you. Then you could put a header in your request for what you want to do and send that payload.
Like a simple configuration access and management (SCAM) protocol.
Can you back up what you are saying? I personally think we don't. Most continuous integration pipelines need static configuration files, not languages. To be honest, I have no idea where this language fits in a software development process...
The problem is not that there are too many configuration languages. The problem is that none of them are any good. The way to solve that is to keep trying new ones until we find one that is good.
Or the premise is a false one, that a configuration language is not the right approach. And everytime yet another such language falls flat only serves to reinforce this very point.
Configuration languages succeed and work just fine most of the time, despite their inadequacies and quirks, but no one ever complains about them when they work.
Go ahead and write your configuration as a fully Turing complete sub-application in whatever language you like. It will do everything you could possibly want, and in a few years it'll grow so complex and hairy that it will need its own fully Turing complete sub-application to configure it, lather, rinse, repeat. If you're really clever, every configuration layer will be written in a separate language with its own dependency tree, test framework, and toolchain.
Personally, I prefer not having to recompile just to change some variables and settings. I'm fine with INI or JSON (although I prefer Lua tables) when they're appropriate. The problem is not that configuration languages are a bad idea, the problem is interminable Turing creep and developers wanting every aspect of their applications to be as flexible and powerful as possible.
The premise of having a separate config language is fine - somewhere, somehow, inevitably, you're going to need a read-only data store for globals and references to system settings. You can hardcode all of those variables in your application or put them elsewhere.
> Configuration languages succeed and work just fine most of the time, despite their inadequacies and quirks, but no one ever complains about them when they work
I'm genuinely intrigued to see a non-trivial example of this, a configuration in use by an organisation with complex needs, describing its applications and its cloud infrastructure, while avoiding the pitfalls you describe that bedevil the use of Turing complete languages.
Let’s raise a glass to our blue eyed comrades who’ll adopt this fully, only for google to decide in 2-3 years to completely and aggressively kill it again for no apparent reason.
Pardon for slamming my ignorance down on the table here: I know Google has a tendency of killing products but I’m not familiar with many of their engineering tooling being similarly sunset with comparable frequency.
Any named examples? I'm sure there are, and likely they just elude me at present; maybe seeing their names will jog my memory?
You don't hear as much about developer tools being sunset because they rarely impact as many people and that makes less interesting news.
Examples include: AngularJS (replaced by a rewrite of angular effectively), GWT (donated as 'open source' with minimal continued google involvement), basically 90% of all 20% projects by googlers that weren't official google stuff (too many examples to name, practically all of them), the xmpp api for google talk (and all associated library code, including some open source xmpp extensions, libjingle), tons of chrome and android libraries that were killed/deprecated as part of new versions not using them, ARC (https://en.wikipedia.org/wiki/Google_App_Runtime_for_Chrome), the caldav API for google calendar and any associated libraries...
I could go on, but the majority of the relevant examples are the 20% projects that never made it, and I'd rather not list any of those since they're largely single-person projects and it's kinda personal to comment on any of em.
How much support does a configuration language that basically converts to JSON need? If your <insert language here> supports JSON, then the support required is a library to convert cue -> json, or, at worst, if nobody cares about the format anymore, a program that converts once from cue -> <insert new format here>.
Worst case scenario it has a short life and is forgotten for a better OSS alternative but remains the default for several Google projects and a number of developers are forced to learn and maintain it as it becomes brittle with age.
With that as the worst case scenario, it's more than worth the try.
What I want to see is a company which will provide, as a service, basically devops deployment, monitoring and provisioning regardless of the cloud provider.
Such that I can say "deploy this" and it will eval, track and monitor the cost of the deployment across AWS, Azure, GCP, etc., and I can click controls to see/kill/scale wherever...
Thus, I don't care about the cloud provider's deployment language, etc...
If the authors would like to discuss how Tree Notation may be a better syntax for this language, please feel free to get in touch: breck7@gmail.com or yunits@hawaii.edu.
Fact: I've commented on fewer than 1% of HN posts today alone.
Fact: my post is very relevant to the OP and I'm offering to help them.
Fact: I've been a member of this site for over 12 years, and never comment on a post unless I think it adds value to the discussion or would be helpful to the parent.
That's a good suggestion. I didn't think of that. But thinking about it, I guess I don't want to bother the OP's inbox. If they are interested, they can get in touch.
> added literally nothing to the discussion.
I disagree. When I post a new language, and someone shares a link to a related language, those are often the most valuable comments.
Validating, defining, and using data: those sound like things Tree Notation syntax is perfect for. Cue's semantics, presentation, and execution are great; I just think a syntax switch is potentially worth exploring. I understand the strategy of being able to parse JSON as cue, and that's probably the way to go for now, but in the future Tree Notation syntax might offer compelling advantages.