Q – A Data Language (q-lang.io)
220 points by miloshadzic on March 3, 2014 | 81 comments




Also, naming a programming language with a single letter is not very search-friendly.


As someone learning and using R, I agree 100%.


I feel the same way about q.js.


Did you find Rseek? http://www.rseek.org/


Non-rhetorical question: Why do people continue to do this? The number of unique names still available for a programming language is basically infinite, yet we get stuck with Go and R and C#.

Google is pretty smart about categorizing searches for languages like Ruby and Java (both common words), but with R especially it's often quite difficult to get relevant results.


R predates Google, and it derives its name from S.


I vote for the APL derivative.


It is probably the most astonishing combination of simplicity, efficiency, and beauty that I've seen in computing.

Unfortunately, it's so out-there that nearly everyone dismisses it as unreadable and daft. In a better world, it would be taught and studied as an exemplar of design. (On the other hand, they've succeeded financially, so how cool is that.)


My experience with Q was somewhat less inspiring than yours. It has some powerful core primitives but is generally poorly designed and terribly implemented. Single letter function names (and error codes) don't actually make a language any more expressive.


I can understand why you'd say poorly designed, because the tradition it comes out of is so austere. But "terribly implemented"? That detracts from what you're saying. The thing achieves performance everybody else dreams of.

No doubt its error handling leaves a lot to be desired, but single-character names are a different matter. That's part of the APL style, and it's a mistake to reject it out of hand. The trouble is that nearly everybody does, because it's so beyond the pale.

Edit: Perhaps I should explain what I mean about the APL style.

Languages in this style are not "unreadable"; rather, they trade lexical readability for whole-program readability. Their emphasis is not on names, but on operators; specifically, operator composition. Their advantage is the astonishingly high-level power of their operator strings—sequences of composed operators, each of which passes its output to the next, like Unix pipes.

After a while, typical operator strings become recognizable as idioms, making them leap out at the reader. This allows one to comprehend quickly what would take many lines of code in most other languages. True, each of those many lines would be more lexically "readable" in the sense that you could grok its individual tokens more easily. But that's not how you comprehend what a program is doing. To do that requires grokking the intent of a bunch of lines together. In other words, program intelligibility is not primarily a matter of token readability but of whole-program comprehension. We're just so used to one way of doing it that we reject all others as "unreadable". We forget that variable and function names are not an end in themselves, but a means to an end. APL-based languages have a different means to that end.

If APL-style programs were to use long expressive names like other languages, the shape of their operator strings—the most important thing—would be obscured, and you would lose comprehensibility rather than gain it. The role that names play in such programs is different. They are placeholders. Like colored beads, they occupy certain positions in the operator strings and demarcate what the code is doing. If these names were even half as long as what you use in conventional languages, they would overwhelm the code so much that it would consist of nothing but names with a few operators scattered here and there. The program's structure would then be less accessible.

Another way of saying this is that names are so much shorter because everything is so much shorter, which is what makes the operator-oriented style so powerful.

It's true that such a language is symbolic, even cryptic, compared to most. But that is a bad reason to dismiss it, especially before one's eyes have adjusted. All programming languages are symbolic and cryptic before one has learned any. The reason why programmers take for granted that conventional languages are more "readable" is that these languages are related to each other and we all know one of them. We are like Spanish speakers saying that French is more readable than Russian. Since we all (to extend the analogy) know at least one Latin language, we take this view for granted when it's really just relative and—if one wants to push the point—false.

Now where are beagle3 and silentbicycle to back me up.
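
Edit 2: to make the pipe analogy concrete, here's a rough sketch in JavaScript (emphatically not APL; the names `pipe` and `top3` are mine). Each small operator feeds the next, and the whole string reads as one idiom. In q, if I remember the keywords right, the same idiom is roughly `3#desc distinct x`.

    // compose small array operators into one "operator string"
    var pipe = function () {
      var fns = Array.prototype.slice.call(arguments);
      return function (x) {
        return fns.reduce(function (v, f) { return f(v); }, x);
      };
    };

    // "top 3 distinct values, descending" as one operator string
    var top3 = pipe(
      function (xs) { return xs.filter(function (x, i) { return xs.indexOf(x) === i; }); }, // distinct
      function (xs) { return xs.slice().sort(function (a, b) { return b - a; }); },         // sort descending
      function (xs) { return xs.slice(0, 3); }                                              // take 3
    );

    top3([5, 3, 5, 9, 1, 9, 2]); // => [9, 5, 3]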


I think this article helps to better understand this point of view:

http://kaleidic.com/2010/readable-code/

J is a language, like Q and K, derived from APL [1].

Instead of depending on layers of abstraction built up through variable and function naming, this style has a fundamental vocabulary of operators that is insanely expressive in combination. This allows for concise formulation and expression of algorithms.

However, these languages do not excel in every domain. For example, they're incredible for manipulation of homogeneous data sets, but I haven't heard of an idiomatic way to do heavy socket programming (yet).

[1]: http://www.jsoftware.com/jwiki/Studio/TasteofJPart1


Interesting piece. This line:

More powerful programming languages rely less heavily on naming

... reminds me of section 1.1. of Compiling With Continuations:

The beauty of FORTRAN—and the reason it was an improvement over assembly language—is that it relieves the programmer of the obligation to make up names for intermediate results.

I remember going "Wha?" when I read that.


It makes a certain amount of sense, though: just think of all the times, in all seriousness, one uses `tmp` or, god forbid, `tmp2` as a variable name.
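
A tiny sketch of what that looks like (my example, not Appel's): in the "assembly style" every intermediate result gets a name, while expression syntax leaves the intermediates anonymous.

    var price = 10.0, qty = 3, taxRate = 0.21;

    // assembly style: name every intermediate result
    var tmp = price * qty;
    var tmp2 = tmp * taxRate;
    var total = tmp + tmp2;

    // expression style: the intermediates stay anonymous
    var total2 = price * qty * (1 + taxRate); // also 36.3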


> The thing achieves performance everybody else dreams of.

Citation needed? In my experience (implementing simpler ML algorithms across Q, Python, and Matlab), the performance of non-trivial programs (using multiple operators together) is middling at best, even for an interpreted array language, and orders of magnitude slower than an optimized native implementation.


Did you post your implementation to the mailing list?

I highly doubt Python or Matlab beats q, but I'm biased.

I'm not sure I understand the comparison to a "native implementation". You mean C? How about assembly? Why would you compare an interpreted language to those? Just use C, if you can.

If you don't tell us exactly what you were trying to do and how you approached it, then there's no reason to give any weight to your comment.

In other words, where's _your_ citation? Give us the problem you were trying to solve.

If you do not give anyone else a chance to offer a solution, then you never really know if you have tried the best approach.


Ok, clearly not everybody. But I'm surprised to hear that. You're the first person I've heard say anything like it.


Is implementation really about what you name the functions?

I thought naming functions was "programming style".

And implementation was about the math.

The quality of the latter being measured by profiling and performance.


I've found that APL/Q/J and vim share the same required mindset. If someone loves one, it is likely they will like the others.


I'm not sure (I'm actually not sure what I think of APL/Q/J yet). Maybe someone who likes Vim script would also like APL; I'm less sure about vi(m) the editor, or modal editors generally.


That's what I thought clicking through here.


There's also a tool named q to query text files: https://github.com/harelba/q


And a promise library: https://github.com/kriskowal/q


Kx's Q is used widely in finance. Pretty funky, but I used it a bit in my last job.


Q has a successor called Pure, which is a very interesting language: http://purelang.bitbucket.org/.


We are also releasing a platform soon, called Q:

http://github.com/EGreg/Q


Then it's not too late to change the name to something that people will actually be able to find information about. Be glad you read all this feedback today and not the day of your release.


I am pretty sure "Q platform" will be unique


Why downvoted?


STOP NAMING THINGS Q IS WHY.


NO!


I am surprised that EDN is not making more progress as a data interchange format, especially now that it has fantastic validators and coercion libraries, such as Prismatic's Schema:

https://github.com/prismatic/schema

This is, by far, the most intelligent data exchange system I have ever seen. It offers a richer language of types than JSON, it is as readable as YAML, and it is far more concise than XML. And yet it seems unable to break out of the world of Clojure. It deserves to be taken seriously as a data interchange language.


EDN [1] has one flaw. It is not JSON. I will just repost the response [2] I received when I mentioned wanting EDN support in PostgreSQL.

> So the same argument as (say) YAML, Lua tables or TNetStrings. Or, if you include binary representations, stuff like Thrift, XDR and ASN.1. You can embed a JVM in PostgreSQL and write the functions yourself (or use PL/Scheme I suppose), but I would be basically amazed if anyone decides to do it for you. JSON gets the nod because it's understood by billions of systems. Other formats are going to struggle.

While I do not agree with his position, it is shared by many people.

[1] https://github.com/edn-format/edn

[2] https://news.ycombinator.com/item?id=5548404


JSON is "understood" in the sense that it is easy to parse, yes. That doesn't mean actually working with the results is any easier. (1) Without a schema, doing so can actually be harder. (2) JSON leaves out many useful data types and is not extensible. So perhaps JSON has "succeeded" by lowering expectations of data interchange. :)
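
To illustrate (2) with a sketch (the EDN syntax is from memory, so double-check it against the spec): in JSON, dates and sets collapse into strings and arrays, and the receiver has to know out-of-band what they meant.

    // JSON: the types collapse; the schema has to live somewhere else
    var doc = JSON.parse('{"born": "1815-12-10", "tags": ["math", "cs"]}');
    typeof doc.born; // "string" -- was it a date? who knows

    // EDN equivalent, with the types carried in-band:
    //   {:born #inst "1815-12-10T00:00:00Z" :tags #{:math :cs}}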


I really like this. In general, a well-supported non-crappy schema language for modern data exchange formats is long overdue (e.g. JSON Schema is horribly un-human-friendly). The fact that they aim higher and allow for a much more detailed set of validations is great, IMO.

I hope that this becomes a success and gets implementations in many programming languages.


For "human-friendliness" (eg. hand-edited configs), I'm all in with YAML. AFAIC, JSON is strictly for machine to machine data transmission.

While I think the idea is neat, a data language should not include processing logic. It's a wash where the constraints get specified, and in my experience, a single point of implementation is far more maintainable.


> I really like this. In general, a well-supported non-crappy schema language for modern data exchange formats is long overdue (e.g. JSON Schema is horribly un-human-friendly).

I think I'm a bit confused: is Q the actual schema or a validator of the schema? I can also validate a JSON document using JavaScript and respond with true or false based on whether it meets certain criteria. So I think there's something amiss with your statement above, specifically that "JSON Schema is horribly un-human-friendly". Doesn't Q take JSON and let the user know if it's correct or incorrect? Wouldn't JavaScript, or any other language for that matter, do the same?


JSON Schema refers to http://json-schema.org/

I don't personally understand what is so "un-human-friendly" about it. It is easy to use and understand in my opinion.


I know what JSON is :)


JSON Schema != JSON.


Have you seen Orderly?

http://orderly-json.org


For data exchange, what is wrong with using an ASN.1-specified, BER-encoded file? Genuine question...


The binary encodings are cumbersome and difficult to perfect.


Isn't it simply a matter of delegating the encoding and decoding to a library? I have had occasion to use Java and Python libraries for this and faced no problems.


I got excited at first: I thought they were referring to the Q language from kdb. This reminds me quite a bit of Google's Protocol Buffers.


Same here. I'm interested in figuring out q/kdb, but it seems a bit opaque.


As someone who had to teach myself kdb/Q recently, I can recommend the Q for Mortals book: http://code.kx.com/wiki/JB:QforMortals2/contents


This is a bit of a plug but anyone interested in learning about kdb/Q can take a look here: http://www.timestored.com

They have some interactive, online kdb/Q tutorials with programming questions and answers to try out the language and get a feel for it.

There is also an IDE for day-to-day users of kdb/Q with lots of convenient features.


It is opaque, but somehow big banks and other clients dish out millions of dollars in licensing fees to bring it in. The reason is that it works really well as a time-series tick-data store.


A "data language", in this context, is a language for representing and validating data. I'm not clear on why it's called "Q", or why it calls marshalling/unmarshalling undressing/dressing. The language itself looks alright, it's dependently typed. It doesn't say anything about the speed of the existing JS/Ruby implementations.


Perhaps the name is based on the R language (which came from S), a major data/stats language. Data representation comes before analysis, hence Q?

As for the renamed marshalling, I couldn't tell you.


It's strange to me how dependent Q becomes on its host language. Instead of defining its own types in order to sit between various serializations and host systems, Q seems to simply augment the Host with a few extra types.


I agree. This is something that XML Schema got both right and wrong. The XML Schema basic datatypes [0] are well defined and for the most part fit nicely into host languages. However, they really screwed up with the complex types, making them difficult to map onto programming languages for multiple reasons (sequence/choice can't be easily modeled in most programming languages without intermediate types, extension versus restriction, derivation versus substitution groups). For a good write-up of the design philosophy behind XMLBeans, see:

http://davidbau.com/archives/2003/11/14/the_design_of_xmlbea...

[0] http://www.w3.org/TR/xmlschema-2/


In Clojure you can use schema (https://github.com/Prismatic/schema) which does the same thing without needing a separate language.


The point here is to have a separate language. Data interoperability is about having multiple languages, not having only one to rule the world.


I've done (limited) schema validation in Python. Schema exports reasonably well to EDN (it is just Clojure, after all), and one can read EDN pretty well. I don't have the full-blown Clojure interpreter that arbitrary Schema would require, but I can handle the core types as well in Python (or some other language) as I could in Clojure/ClojureScript.


This is the second project named 'Q' that's been on HN in the last week or so. All cool ideas, but some Googling first by all parties would probably ease confusion.[0]

[0] https://news.ycombinator.com/item?id=7290655


It's interesting, but is it powerful enough? There's a great talk by Zed Shaw about implementing authorization code, and I think the same issue holds here.

A language might work for simple and moderately complex type constraints, but at some point the constraints can become so weird that you really need a full-blown Turing-complete language to define them.

And that point might come sooner than you'd like. Why not, instead of using expressions in this constrained language, use something like Haskell?

It's got a super powerful type system, and you're certain that it will be able to express any constraint you throw at it.

That said, this language does look nice, and I haven't really tried to find out whether there are any obviously important constraints you wouldn't be able to build in it.


> Why not instead of using expressions in this constrained language, use something like Haskell?

Haskell types can't depend on values, so there are limitations. If you go that way, you might as well go whole hog and use a dependently typed language (e.g. Idris).


(a) As others have said, Haskell's type system is certainly now powerful enough to express any constraint you throw at it.

(b) Almost directly related to that: the less powerful your constraint language, the more analyzable it is. I don't know for certain that Q has these properties, but by constraining how complex the restrictions it encodes can be, it becomes more amenable to analysis. At first blush, it looks like you could probably throw an SMT solver at it quite easily.


> is certainly now powerful enough

Small typo there.


Ha, small vital one—obviously meant 'not'. Thanks!



Yes, that's the one :)


> It's got a super powerful type system, and you're certain that it will be able to express any constraint you throw at it.

You can go pretty far with, e.g., Java, if you're ready to pay the cost in verbosity, by making types for everything and having read-only objects. But Haskell is not dependently typed; you can't use the type system to make sure a value is in a given range. You could use phantom types to make sure a value has been validated before, but this kind of control is not really first-class.


You might want to embed this inside a larger language. That wouldn't necessarily preclude using a Turing-complete language, but you wouldn't want it to be a complicated one.


Given that you use a powerful RDBMS, such as Postgres, you don't need anything more than SQL constraints. For example, the constraint to check the temperature would be something like CHECK(temp BETWEEN 33.0 AND 45.0). To check that no two user accounts share the same alias, a simple UNIQUE(user_alias) will do. More complex constraints can be enforced using PL/pgSQL.

Then name your constraints. When the SQL layer throws them, it's trivial to remap them to user-readable error messages. This works especially well in newer Postgres versions, where the error reporting is much more detailed and the messages are easily machine-parsable.
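
A sketch of the remapping step (the constraint names and messages here are made up; node-postgres, for one, surfaces the violated constraint's name on the error object, if I recall correctly):

    // schema side (Postgres), with named constraints:
    //   ALTER TABLE readings
    //     ADD CONSTRAINT temp_range CHECK (temp BETWEEN 33.0 AND 45.0);
    //   ALTER TABLE users
    //     ADD CONSTRAINT user_alias_unique UNIQUE (user_alias);

    // app side: remap constraint names to user-readable messages
    var messages = {
      temp_range: "Temperature must be between 33.0 and 45.0.",
      user_alias_unique: "That alias is already taken."
    };

    function friendlyError(err) {
      return messages[err.constraint] || "Invalid data.";
    }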


SQL constraints are a given; always use them. However, it's best practice to also validate your data before it gets to the database for several reasons.

A) Depending on a database check-constraint failure is analogous to using a try/catch when a pre-emptive check would work; it's dirty.

B) You're putting unnecessary load on the hardest system to scale.

C) Users/systems will get instant feedback with a client-side validation. Depending on the data, this could be an enormous bandwidth savings.


I am not sure I would say "best practice" but I would rank it as "often useful." I'll explain.

I disagree with your characterization of using exceptions when a check condition would behave identically. It's not dirty; it's exception-oriented.

I disagree with your characterization of the database as the hardest system to scale. What's hard is fixing a performance problem in the database after an organization has effectively put the schema in a cast by developing 30 apps/scripts directly coupled to the core of the (broken!) schema.

I DO however agree that client-side feedback is important. Validation on the client might include a lot of data used for prompting, or perform validation only the user-interactive application can do. This exceeds the role of the data constraints you might implement in your database for obvious reasons.

I'm not sure Q does more than duplicate the constraints I would put in my database to start with. There's a lot more to collecting input than enforcing closed-form rules.


I disagree with most of what you said.

"exception oriented" programming is dirty. It's especially filthy when your expecting exceptions from a database for these reasons

a) Returning errors to the user is asking for a slipup handling the different exceptions, thus returning more info than desired. Cue the injection hacks!

b) Depending on the DBMS and its configuration, an upsert destined to fail may block reads and other writes from executing!

I had a feeling somebody would gripe about duplicated validation rules; it's a problem that needs an elegant solution on your own stack.


In real-world situations you already need to go to the database to do validation, for example to check that a username is not already taken. And because of the race condition, you still need to be prepared to handle a duplicate-key integrity error when actually trying to create the user row.
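
For instance (a sketch assuming a node-postgres-style client; 23505 is Postgres's unique_violation error code):

    function createUser(db, alias, done) {
      // the pre-check can pass for two concurrent requests,
      // so the INSERT itself must be ready for the integrity error
      db.query("INSERT INTO users (user_alias) VALUES ($1)", [alias],
        function (err) {
          if (err && err.code === "23505") { // unique_violation
            return done(new Error("That alias was just taken."));
          }
          done(err);
        });
    }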


I like it because you can practically use this to define a grammar for parsing a programming language:

1. Each data type can express its own value constraints inline (not just type constraints but value constraints), which is key to source-code parsing.

2. With subclassing you can combine data types into higher-level objects that inherit all of the lower-level value constraints. This is similar to how a recursive-descent parser drills down from a high-level statement into the different parts of each statement.

3. The data type value constraints support sequences, alternatives, and unions, which is most of what a parser needs to handle a language based on a grammar spec.

With these ingredients you can actually parse code, and thereby write a code interpreter (or at least a parser). You begin with a grammar rule (a high-level data type) that defines a statement; a statement is defined as a list of alternative forms; and each form is defined as a sequence of keywords and supported value types, and so on down to the language primitives.

... but I'm not sure it supports recursion, and thereby recursive-descent parsing. Can you say something like "Sum = [Sum, Operand, Sum]"? If not, you're limited to immediate values only, with no nested expressions.
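
For reference, here's the kind of recursion I mean, as a throwaway recursive-descent sketch in JavaScript for a grammar like Sum = Operand | Sum "+" Operand (with the left recursion turned into a loop):

    // parse single digits joined by '+', e.g. "1+2+3"
    function parseSum(s, i) {
      var r = parseOperand(s, i || 0);
      while (s[r.pos] === "+") {
        var rhs = parseOperand(s, r.pos + 1);
        r = { value: r.value + rhs.value, pos: rhs.pos };
      }
      return r;
    }

    function parseOperand(s, i) {
      if (!/\d/.test(s[i])) throw new Error("expected digit at " + i);
      return { value: Number(s[i]), pos: i + 1 };
    }

    parseSum("1+2+3").value; // => 6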


Am I missing something fundamental here? I don't think the home page should compare Q to just JSON, as they are in principle two different things. If I were to write a 'data language', or to extend an existing language to cater for data, wouldn't I write my validators in that language? If so, I would expect Q to be compared to the closest other 'data language', not to a document/schema.

For example, if I were doing it in JS, I'd say:

    var validDoc = true;
    var Temp = function (t) { return t >= 33.5 && t <= 45.0; };

then when I get the doc, I loop through my conditions, as such:

    validDoc = Temp(doc.temperature);
    // reject if !validDoc

From here, the benefit of Q becomes that it's much easier to do validations with it than to write them from scratch, considering that my example above only returns true or false; it doesn't have the error handling to let me know what went wrong and where.

One could also achieve some of what Q is doing by using an ORM if said data is going to a database/document store.

So, HN, am I missing something here?


Interesting... here is a tool that is also going towards a data-oriented approach:

"Drake - a kind of make for data"

http://blog.factual.com/introducing-drake-a-kind-of-make-for...


I can't count the number of times I have wanted to set up reusable model definition and validation but haven't been able to find a good portable library that isn't part of something much larger I don't need. This definitely addresses a space in need of more options.

My answer was modlr (https://github.com/jdc0589/modlr), which I have been ignoring, but Q looks like it could do well. I'm not super pumped about the name, though; it conflicts with a super popular promise library lots of people already use.


I'm not very good at naming things. I'm aware of the name clash with many other tools and would really like to find another name. If you have any better idea, just let me know.


How about Scheme! ...er, no wait... Or Grammeme, the well-known combination of grammar and meme (https://en.wikipedia.org/wiki/Grammeme)?


Oh dear, this might conflict with a SQL variant we cooked up at my old company, also called Q-Lang, where you could write a SQL statement, leave bits of the WHERE clause "blank", and it would generate a GUI interface with drop-downs and calendars and everything to let people fill in those bits without too much effort. Nothing terribly fancy, but it let you go from a working example SQL statement to an integrated query interface in the app in about 10 minutes.

It had lots of limitations, but most of the time it absolutely annihilated some of our competition's multi-month integration engagements.


Q is trying to do both data and validation, which are very different problem spaces. I think designing for both involves serious tradeoffs for the individual parts.


Finally a new language that does something remotely novel. I like the blur between data and code here; (very) vaguely lispy.


Where is all the theory behind q-lang coming from? Are there theoretical principles behind this new schema language?



