Hacker News new | past | comments | ask | show | jobs | submit login

I have just spent the last 13 months significantly reworking a compiler that turns Excel-like formulas into JavaScript. The compiler is used by an app designer, and the formula language is meant to be easy to pick up for people used to spreadsheets. A significant difference compared to Excel is that our compiler features static typing, enabling helpful error messages to be produced at build-time.

(Spreadsheets, on the other hand, tend to try very hard to make sense of types at runtime, using type coercion to turn values of the wrong type into something of the expected type they can work with. That produces surprising results and confuses users. =IF(2, 3, 4) will generally produce 3, because 2 -- which should be a logical/boolean value -- is interpreted as TRUE. We think it's a better idea to warn the user that the first parameter is wrong, and to do that as soon as possible.)

By far the largest challenge with reworking the compiler has been the type system. I have expanded our type system from the four standard spreadsheet types to encompass more than 80, as well as added support for lambdas (for use with functions like SUMIF and FILTER), type inference, parameterized types and union types. The biggest addition is probably an imperative mode, that executes formulas with side-effects in response to events being fired. This is all in service of features we have wanted to add for years, that have been held back by our (previously) simplistic type system.

I did take a compiler class at university, more than 15 years ago, but while that proved very useful when writing the formula parser (an operator-precedence parser, initially straight out of a textbook), type theory was not part of the class. I had to pick up bits and pieces by reading the Java Language Specification (JLS) and by experimenting.

The Cornell class doesn't seem to feature type theory either, which I think is unfortunate. Now that the pendulum has swung, yet again, firmly in the direction of static typing, perhaps static typing will become a bigger part of CS compiler curriculums.




Your project sounds like fun. What/for whom is the project? What level of Excel formula compatibility are you shooting for, e.g. just in the neighborhood like Google Sheets, reasonably close as in Libre Office, or full compatibility with the latest version? If the latter you must have one gnarly test suite. Would love to hear more about it.


OK, I see it's https://www.calcapp.net which looks like a smart product idea. My best to you. Would still love to hear more about the other questions.


Thanks!


The compiler is for Calcapp, an app designer for Excel-savvy users. Our aim is to offer a function library that is easy to pick up for our target audience. That means that we generally follow Microsoft's lead in terms of naming and parameters, but we often offer extensions. Here are a few examples:

* Functions like SUMIF operate on a range or an array and use a textual mini-language to determine which values to operate on. =SUMIF(B1:B2, ">3"), where B1 contains 2 and B2 contains 4, returns 4, because only B2 satisfies the ">3" condition. We're not big fans of that mini-language (which expects you to use the string concatenation operator & to handle dynamic values). We support it, through an overloaded function, but also offer a version which takes a lambda as the second parameter. In Calcapp, SUMIF({ 2, 4 }, Item > 3) and SUMIF({ 2, 4 }, E -> E > 3) (with a named lambda parameter) are equivalent to SUMIF({ 2, 4 }, ">3"). The lambda syntax enables much more complex conditions to be expressed.

* Some functions like SORTBY take a sort order parameter, which is an integer where 1 means sort in ascending order and -1 means sort in descending order. We support functions with integer sort order parameters, but also offer an overloaded version taking an enumerated value instead (SortOrder.ASCENDING or SortOrder.DESCENDING), which we think helps with readability.

* Excel offers logical functions like AND, OR and NOT. We like the C-style operators &&, || and ! better, so we support them in addition to Excel's functions. https://www.calcapp.net/blog/2017/09/16/logical-operators.ht...

* Excel's logical operators (like =, <> and <=) return arrays when applied to arrays and ranges. For instance, { 1, 2 } = { 1, 3 } returns { TRUE, FALSE }. (This is easier to see in recent versions of Excel, where array results "spill," meaning that a return array with two elements spills to occupy two cells of the grid. In earlier versions, as well as Google Sheets, you'll have to use INDEX to tease out the results.) We support = and !=, but also == and !=, where the two latter operators always return scalar (non-array) values. == enables users to determine if two arrays are identical, without having to use AND on the result array (which is the standard Excel trick). Also, our == operator is more traditional in that string comparisons are case-sensitive (unlike =).

* Experienced Excel users like to use double negation to turn logical arrays into number arrays. We have a specialized operator, --, for that, because allowing two unary operators to follow one another would introduce other issues. Here's an example: https://exceljet.net/excel-functions/excel-sumproduct-functi...

* The Excel operators * and + can be used with logical arrays and ranges to signify logical AND, as well as logical OR, respectively. (https://exceljet.net/formula/sum-if-cells-contain-either-x-o...) We have overloaded versions of those operators that take logical arrays for that reason. (Again, Calcapp uses static typing, which makes this a lot more complicated for us than it is for Excel.)

* The latest versions of Excel have great support for dynamically-allocated arrays (along with new array functions like FILTER and SORT). Their new calculation engine supports users providing arrays where, traditionally, non-array parameter values are expected. If arrays are provided for these parameters, then an array is returned. For instance, =SQRT({ 4, 16 }) returns { 2, 4 }. Microsoft internally calls this "lifting" parameters. We refer to these parameters as being "array-capable." In our type system, an array-capable type is a union type (T|Array<T>), with the notable difference that if an array is provided to an array-capable parameter, then the return type is no longer T, but Array<T>.

We do have a test suite for the runtime formula function implementation with a couple of thousand unit tests. Given the number of supported formula functions, though, we should probably have many more.

Most of what I have described here pertains to a new Calcapp version, which hasn't been released yet. In particular, while I am done implementing the compiler, it still needs to be documented (lots of Javadoc), user-facing documentation needs to be written and the runtime system needs to be implemented, so we still have a few (fun!) months to go before everything is ready.

Thanks for taking an interest in what we do.


Looks like you’re doing everything right. That is just a fun project. Thank you so much for sharing.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: