Clue: an ANSI C Compiler targeting high-level languages (sourceforge.net)
67 points by jasongullickson on May 3, 2010 | 17 comments



I can't help but wonder how much library/server-side legwork would be required to get a functional version of mutt running in a browser.


Their C backend, at least, uses doubles for all numbers. I initially thought many programs that make nontrivial use of ints would break, but thinking about it a bit more, I suppose it's not a problem for 32-bit ints, since doubles have 52 mantissa bits (53 bits of integer precision, counting the implicit leading bit). What happens if someone makes good use of uint64_t's, though? Seems like floating-point imprecision could bite you in the ass.
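To make the imprecision concrete (my own example, not taken from Clue): any uint64_t value above 2^53 silently loses its low-order bits on a round trip through a double.

  #include <stdio.h>
  #include <stdint.h>

  int main(void)
  {
      uint64_t big  = (1ULL << 53) + 1;  /* smallest integer a double cannot represent exactly */
      double   d    = (double)big;       /* rounds to 2^53, dropping the low bit */
      uint64_t back = (uint64_t)d;

      printf("original:   %llu\n", (unsigned long long)big);   /* 9007199254740993 */
      printf("via double: %llu\n", (unsigned long long)back);  /* 9007199254740992 */
      return 0;
  }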


Actually, IEEE-754 floating-point numbers (which C doubles are in most circumstances) use a sign bit plus magnitude, while most modern architectures and compilers use two's complement for negative integers, so even doing non-trivial sign-related things with signed 32-bit integers in C code could potentially break.
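To see the representational difference concretely (my own illustration, assuming a typical two's-complement machine with IEEE-754 doubles):

  #include <stdio.h>
  #include <string.h>
  #include <stdint.h>

  int main(void)
  {
      int32_t  i = -1;
      double   d = -1.0;
      uint32_t ibits;
      uint64_t dbits;

      memcpy(&ibits, &i, sizeof ibits);  /* two's complement: all bits set */
      memcpy(&dbits, &d, sizeof dbits);  /* IEEE-754: sign bit + biased exponent + mantissa */

      printf("int32_t -1 : 0x%08x\n", ibits);                         /* 0xffffffff */
      printf("double -1.0: 0x%016llx\n", (unsigned long long)dbits);  /* 0xbff0000000000000 */
      return 0;
  }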


True, but that's not that common, is it? The C standard doesn't let you assume two's complement, and while in practice you can almost always assume it, I also don't frequently see apps doing manual bit-twiddling that relies on the representation of negative numbers. Code that gets run through lint would also have that flagged.


One thing that actually is extremely common in C code, however, is using ints for bitmasks. I'm not sure how bitwise operators like &, |, and ~ would be implemented with doubles, but they'd need some fancy software implementation.


In the C backend at least, it appears to cast to an int, perform the bitwise operations, and then cast back.

Just trying it out now, the function:

  int foo(int bar)
  {
     return bar & 0x01;
  }
compiles to:

  clue_real_t _foo(clue_real_t fp, clue_optr_t stack, clue_real_t FLOAT0) {
  clue_real_t sp;
  clue_real_t FLOAT1;
  clue_real_t FLOAT2;
  sp = 0;
  sp =  fp +  sp;
  FLOAT1 = 1;
  FLOAT2 = (clue_realint_t) FLOAT0 & (clue_realint_t) FLOAT1;
  return FLOAT2;
  }
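So the pattern is essentially cast, mask, cast back. A minimal standalone sketch of the same idea (the typedefs here are my guesses based on the names in the output above, not taken from Clue's runtime):

  #include <stdint.h>

  typedef double  clue_real_t;     /* assumed: the universal number type */
  typedef int32_t clue_realint_t;  /* assumed: integer type used for bitwise ops */

  /* Bitwise AND on two "numbers": truncate to int, mask, widen back to double. */
  static clue_real_t clue_bitand(clue_real_t a, clue_real_t b)
  {
      return (clue_real_t)((clue_realint_t)a & (clue_realint_t)b);
  }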


JavaScript faces similar problems: it has no int type, and all numbers are represented as 64-bit doubles. Bitwise operations (&, |, and so on) are done by converting the double to a 32-bit int, performing the operation, then converting back to a double.


Just to be clear, when you say they are done like that: those operators are in the spec, and that's how they're specified to behave; it's not just a de facto implementation hack. So there is (for example) a ToInt32 function in the spec but not in the language. This surprised me. (I was playing with cryptography in JS and needed to do lots of bitwise operations on hand-made bigints. It was easier than I expected.)
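Roughly, ToInt32 truncates toward zero, reduces modulo 2^32, and maps the result into the signed 32-bit range (NaN and infinities go to 0). A rough C sketch of that behaviour, my own approximation rather than the spec text:

  #include <math.h>
  #include <stdint.h>

  /* Approximation of ECMAScript ToInt32. */
  static int32_t to_int32(double x)
  {
      if (!isfinite(x))
          return 0;                       /* NaN, +Inf, -Inf -> 0 */
      double t = trunc(x);                /* truncate toward zero */
      double m = fmod(t, 4294967296.0);   /* reduce modulo 2^32 (keeps t's sign) */
      if (m < 0)
          m += 4294967296.0;              /* bring into [0, 2^32) */
      if (m >= 2147483648.0)
          m -= 4294967296.0;              /* wrap into [-2^31, 2^31) */
      return (int32_t)m;
  }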


Can I see your work somewhere? I've been wondering about crypto in javascript. I checked your profile for an email, but it doesn't show.


It was an implementation of Trivium, an extremely simple stream cipher. I was also thinking of doing Salsa20, but I got distracted.

The code: http://basecase.org/trivium/trivium.js

Demo: http://basecase.org/trivium/

Honestly, I lost interest once I got it working with simple test strings. If you do something cool with it, let me know!

Edit: Looking at it again, my code is badly undercommented – sorry. But if you follow along with the Trivium specification (http://www.ecrypt.eu.org/stream/ciphers/trivium/trivium.pdf), which is a model of clarity, it should all make sense mod renaming of variables etc. The outstanding bug is that it assumes that any sequence of 16-bit values can be taken as a UTF-16 string, which is not true. Will e-mail you some general thoughts when I have a few minutes.


Yes, I don't think it's that common at all. In fact, the only place I would expect to find such things is the source code of '90s video games, where developers tried to squeeze out every bit of performance they could with a series of unholy tricks.


Yeah, the difference in representations could be a problem. I think a lot of it could be dealt with (with much difficulty) by the compiler -- for instance, you might be able to use the mantissa bits as a two's-complement int and synthesize proper arithmetic and comparison behavior out of unholy combinations of floating-point operations -- but in general, if someone depends on the bit pattern explicitly (compressed data, Bloom filters, bit flags, ...), you're going to have a hell of a time getting correctness, much less performance.
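As a taste of what that synthesis could look like (a sketch of my own, not anything Clue actually does): unsigned 32-bit wraparound arithmetic can be emulated on doubles by reducing modulo 2^32 after each step, keeping every intermediate value under the 53-bit exact-integer limit.

  #include <math.h>

  #define TWO32 4294967296.0  /* 2^32 */

  /* Emulate uint32 addition; a and b hold integer values in [0, 2^32). */
  static double u32_add(double a, double b)
  {
      return fmod(a + b, TWO32);          /* sum < 2^33, still exact in a double */
  }

  /* Emulate uint32 multiplication: split one operand into 16-bit halves so
     no intermediate product exceeds 2^48, which a double holds exactly. */
  static double u32_mul(double a, double b)
  {
      double hi = floor(a / 65536.0);                 /* top 16 bits of a */
      double lo = a - hi * 65536.0;                   /* bottom 16 bits of a */
      double part = fmod(hi * b, 65536.0) * 65536.0;  /* (hi*b mod 2^16) << 16 */
      return fmod(part + lo * b, TWO32);
  }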


It says it only supports C89, and the largest integer type C89 specifies is `long`, with a minimum size of 32 bits. So I presume it just doesn't provide `long long` (or `[u]int64_t`).


They could be a bit more flexible in their treatment of numbers. With Smalltalk bytecode, for example, they could actually compile to 16- and 32-bit ints. This would result in faster runtimes as well as more programs working correctly.


Also see Adobe's work on a C-to-ActionScript compiler: http://labs.adobe.com/technologies/alchemy/

I'd love to see an updated version of that benchmark table with scores for V8, Nitro, and TraceMonkey.


Newest file: clue-0.5.tar.bz2 (119.5 KB, 2008-12-14)


> Why? What do you mean 'why'?



