Brainfuck beware: JavaScript is after you (2012) (patriciopalladino.com)
123 points by SworDsy on June 2, 2014 | 37 comments



It's been a while since I last brushed up on the Geneva Conventions, but I think this violates most of them.


Yet another example of the horrible mess unprincipled automatic type conversion causes.

What prevents language designers from avoiding all these lurking bugs with a generic type-conversion operator? E.g. here's how it might look in a Python-like language:

    >>> x = 1
    >>> y = "2"
    >>> print(x + cast(y))
    3
    >>> print(cast(x) + y)
    "12"
    >>> print(cast(x) + cast(y))
    Exception: ambiguous type


How about something like this:

    >>> print(str(x) + y)
    12
    >>> print(x + int(y))
    3
    >>> print(x + y)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unsupported operand type(s) for +: 'int' and 'str'

That's already Python.


You can already do exactly that in Python with a bit of evil magic:

  $ cat evil.py
  class cast(object):
      def __init__(self, x):
          if isinstance(x, cast):
              self.x = x.x
          else:
              self.x = x

      def __add__(self, other):
          return other.__class__(self.x) + other

      def __radd__(self, other):
          return other + other.__class__(self.x)
  

  $ python3 -i evil.py 
  >>> a = 1
  >>> b = "2"
  >>> a + b
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  TypeError: unsupported operand type(s) for +: 'int' and 'str'
  >>> cast(a) + b
  '12'
  >>> a + cast(b)
  3
  >>> cast(a) + cast(b)
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "evil.py", line 9, in __add__
      return other.__class__(self.x) + other
    File "evil.py", line 9, in __add__
    ... repeated many times ...
    File "evil.py", line 9, in __add__
      return other.__class__(self.x) + other
  RuntimeError: maximum recursion depth exceeded


That's awesome. Have a unicode trophy (&#127942;) as a prize:

🏆


Idris has a type class that is pretty much this: https://github.com/idris-lang/Idris-dev/blob/master/libs/pre...

Here's an example interaction. There's a difference, though, since + is only for {Num instance,Fin} addition while ++ is {String,List,Vect} concatenation.

https://gist.github.com/reynir/b3d32f07d69366dd2bc3


> Yet another example of the horrible mess unprincipled automatic type conversion causes.

I think you can abuse nearly every language to write crazy code in one way or another.


In some dynamic weakly-typed languages you don't even need to try that hard to achieve the abuse.


Why is this preferable to a set of functions that return a particular type? Most uses of such a cast function would be something like

  cast(unknown_type) <op> known_type

The programmer could just write tostring or tonumber instead of cast, if the type of the other operand is known >_>


Automatic type casting remains popular because it reduces the syntactic and cognitive overhead. I'm suggesting that a better solution would be to retain explicit casting, but minimise the overheads.

So, to take things a step further, let's use, say, the $ operator for casting. Then we have:

    >>> x = 1
    >>> y = "2"
    >>> print(x + $y)
    3
    >>> print($x + y)
    "12"
    >>> print($x + $y)
    CastError: ...

What do you think?


The Python examples with str and int are IMHO much better. They require only a little more typing, but are very readable and explicit.


The proper way to do this is to use something like TryParse() and ToString(), handle the case where the string fails to parse as a number, and maybe even specify the locale for both operations. If you are absolutely sure that parsing cannot fail, you can use Parse() and omit the error handling.
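In Python terms, the TryParse()/Parse() split might look roughly like the sketch below. The names `parse_int` and `try_parse_int` are made up for illustration (they are not a standard library API), and locale handling is left out entirely.

```python
from typing import Optional


def parse_int(text: str) -> int:
    """Strict parse: raises ValueError if text is not an integer."""
    return int(text)


def try_parse_int(text: str) -> Optional[int]:
    """Lenient parse: returns None instead of raising on bad input."""
    try:
        return int(text)
    except ValueError:
        return None


x = 1
y = "2"

parsed = try_parse_int(y)
if parsed is None:
    print("could not parse", repr(y))
else:
    print(x + parsed)   # numeric addition
print(str(x) + y)       # string concatenation
```

The caller has to decide what a failed parse means, which is exactly the decision that automatic conversion hides.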


I'm curious about what the performance implications are. If I have some JavaScript on a webpage that I want to "obfuscate", would using this accomplish the task?


The performance seriously decreases. I haven't measured how much, but I guess it'd only work for very, very small scripts.



Once the eval() invocation is complete and the code has been generated and compiled, performance should be the same as for normally written code. Basically it's only a compiler from []{}!+ to JavaScript.


However, your js files would get really big, and download latency could actually be a performance drop.


True, but they should also compress very well with 'deflate'. The main concern is the initial parse by the JIT. At runtime, though, the scripts will be as good as any.
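A quick way to get a feel for the compression claim is Python's zlib, which implements deflate. This is only a sketch: the "script" here is the article's encoding of 4 repeated many times, which is far more regular than a real encoded script, so treat the ratio as a best case.

```python
import zlib

# The article's encoding of the number 4, repeated to stand in for a long
# encoded script (an assumption; a real one would be less regular).
fragment = "!+[]+!![]+!![]+!![]"
encoded_script = "+".join(["(" + fragment + ")"] * 500)

raw = encoded_script.encode("ascii")
compressed = zlib.compress(raw, 9)

print(len(raw), "bytes before deflate")
print(len(compressed), "bytes after deflate")
```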


This isn't really obfuscating, as it is very easy to get the code back into normal js.


I've seen this before here: http://www.jsfuck.com/.

It's nice that this guy actually explains how it works.


> For instance, here is 4: !+[]+!![]+!![]+!![].

Gotta love this language... It's a wonder we can get any work done with it. But, used with discipline, it's not too bad.
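For anyone wondering why that expression is 4, here is a rough Python model of the JavaScript coercions involved. The helper names are invented for this sketch, and it only covers the handful of values that appear in the expression, not JavaScript's full conversion rules.

```python
def to_number(value):
    """Rough model of JavaScript's unary plus for the values used here."""
    if value == []:          # +[] : [] -> "" -> 0
        return 0
    if isinstance(value, bool):
        return int(value)    # +true is 1, +false is 0
    return value


def js_not(value):
    """Rough model of JavaScript's ! operator."""
    if isinstance(value, list):
        return False         # every array is truthy, even an empty one
    return not value


def js_add(a, b):
    """Model of + when both operands are booleans or numbers."""
    return to_number(a) + to_number(b)


first = js_not(to_number([]))   # !+[] -> !0 -> true
other = js_not(js_not([]))      # !![] -> !false -> true
four = js_add(js_add(js_add(first, other), other), other)
print(four)
```

So the expression is just true + true + true + true, with each true coerced to 1 by the addition.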


I am not a programmer so this is part curiosity and part criticism. Why do programmers seem to enjoy creating programs that satisfy some syntactical constraint? Is it a fun mental exercise, or can you just admit that you're showing off on something that really does not matter?


«It is hard to write a simple definition of something as varied as hacking, but I think what these activities have in common is playfulness, cleverness, and exploration. Thus, hacking means exploring the limits of what is possible, in a spirit of playful cleverness. Activities that display playful cleverness have "hack value".»

https://stallman.org/articles/on-hacking.html

«The MIT group defined a hack as a project undertaken or a product built to fulfill some constructive goal, but also with some wild pleasure taken in mere involvement.»

http://en.wikipedia.org/wiki/Hacker_ethic


The author here: as mentioned in the article, it started as part of some security research, but once I started I couldn't stop myself from pushing it further to see how far I could get. So I guess it's curiosity and a fun mental exercise, and maybe proving to yourself that you can do something you thought was impossible.


Doesn't matter? There are still lots of places that try to filter user-submitted HTML and JavaScript to "sanitize" it. And a lot of those filters are blacklist-based rather than whitelist-based. But here we have an example of how it's possible to create any JavaScript program with purely non-alphanumeric input. I can guarantee you that such a result is immediately applicable to a lot of places around the web. It constitutes an attack vector that makes it possible to execute arbitrary JavaScript code in areas where that is allegedly blocked. Exactly how much of an impact that vector has is currently unknown, and hopefully not very large, because many devs have realized the futility of trying to filter such things.

However, if there were a magic wand you could wave that would show the maximum impact of a particular vulnerability, identify all of the sites in the world that were vulnerable, and notify all of the site owners instantly with a full report, I can guarantee you that there would be a lot of people out there with some sleepless nights and a lot of emergency work ahead of them.


I'd say both. But it does matter. It could, for example, be used in obfuscators. Brainfuck itself has a formal proof of being Turing complete, so the easiest way today to prove that a language is Turing complete is to implement a Brainfuck interpreter, which takes just a few lines of code instead of this: http://www.iwriteiam.nl/Ha_bf_Turing.html
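To give an idea of how small such an interpreter is, here is a minimal sketch in Python. It assumes balanced brackets, 8-bit wrapping cells, a fixed 30000-cell tape and a zero on end of input, which are common conventions rather than anything the article specifies.

```python
def run_brainfuck(program: str, input_data: str = "") -> str:
    """Interpret a Brainfuck program and return what it prints."""
    # Precompute the matching position for every bracket.
    jumps = {}
    stack = []
    for pos, ch in enumerate(program):
        if ch == "[":
            stack.append(pos)
        elif ch == "]":
            start = stack.pop()
            jumps[start] = pos
            jumps[pos] = start

    tape = [0] * 30000
    ptr = 0
    pc = 0
    inp = iter(input_data)
    out = []
    while pc < len(program):
        ch = program[pc]
        if ch == ">":
            ptr += 1
        elif ch == "<":
            ptr -= 1
        elif ch == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif ch == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif ch == ".":
            out.append(chr(tape[ptr]))
        elif ch == ",":
            tape[ptr] = ord(next(inp, "\0")) % 256
        elif ch == "[" and tape[ptr] == 0:
            pc = jumps[pc]   # skip the loop body
        elif ch == "]" and tape[ptr] != 0:
            pc = jumps[pc]   # go back to the loop start
        pc += 1
    return "".join(out)


# 8 * 8 + 1 = 65, the character code of "A"
print(run_brainfuck("++++++++[>++++++++<-]>+."))
```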


He did indicate he is a security researcher and that this technique would allow bypassing certain checks. There are no alphanumerics, but I would guess these scripts would be easy to detect as malicious anyway if someone wanted to, since most of this stuff is questionable in a standard script.


Waving aside the practical applications for a second, remember that this is an art to us. We have a canvas and a set of paints, and we make stuff out of it. Exploring the limits of that medium is part of the fun, but also part of understanding our art. What are the consequences of this? How far can this envelope be pushed?

There was a guy who wrote an entire novel that never used the letter "e" [1]. Is that showing off? Maybe, I guess. It's also a really interesting exercise in writing.

[1] https://en.wikipedia.org/wiki/Gadsby_(novel)


I'd say it must be a fun mental exercise. While constraints in a language aren't 'fun' for me, choreography of the CPU (i.e. the way the program is architected, written and executed) is.

Compare it to art where an artist only uses a pencil, or a limited colour palette, or limited materials.

Or even art from just a single, constant-weight line: http://www.ignant.de/2013/08/12/one-line-drawing/

There are other benefits to this program in general, one in the area of security and exploring the options for different attack vectors.


It is a fun mental exercise. I love this stuff, and that's the motivation.

I once wrote an emulator for a 4 bit microprocessor in Befunge (a 2D esoteric programming language). Then, I was definitely showing off something that really does not matter. 100% useless.

This is a little different. The motivation is the same, but it also proves that you cannot sanitise JavaScript by removing letters or words. It's very easy to assume that such sanitisation works, and such an assumption can be a security-critical mistake. I've actually read this article before because I needed to solve such a problem.
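A toy illustration of why that kind of sanitisation fails, sketched in Python. The word list and both filter functions are hypothetical stand-ins for a naive filter, not taken from any real product, and the "payload" is just the article's encoding of 4.

```python
import re

# Hypothetical denylist: a few identifiers a naive filter might block.
BLOCKED_WORDS = ("eval", "alert", "document", "function", "script")


def naive_filter_allows(user_input: str) -> bool:
    """Return True if the hypothetical denylist lets the input through."""
    lowered = user_input.lower()
    return not any(word in lowered for word in BLOCKED_WORDS)


def strip_letters(user_input: str) -> str:
    """Another naive approach: delete every letter and digit."""
    return re.sub(r"[A-Za-z0-9]", "", user_input)


plain = "alert(1)"
# The article's encoding of 4; a full payload is built the same way, only longer.
encoded = "!+[]+!![]+!![]+!![]"

print(naive_filter_allows(plain))           # blocked by the word list
print(naive_filter_allows(encoded))         # passes: no words to match
print(strip_letters(encoded) == encoded)    # unchanged: nothing to strip
```

Neither filter has anything to grab onto once the input contains no letters or digits at all.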


One of the original aims of the Brainfuck language was to implement an interpreter that had a tiny memory footprint (one Brainfuck interpreter took up 186 bytes of memory). This has been a big concern among programmers for decades, although not so relevant these days.

Apparently BCPL [0], which dates back to 1966, had a compiler that took up 16k of RAM, some bootstrapped. BCPL influenced B [1], B influenced C [2].

[0] http://en.wikipedia.org/wiki/BCPL

[1] http://en.wikipedia.org/wiki/B_(programming_language)

[2] http://en.wikipedia.org/wiki/C_(programming_language)


This can be used to get around some forms of antivirus checkers/content inspectors.


I once did something similar in PHP.[1] View the semi-useful writeup[2] as well.

[1]: https://gist.githubusercontent.com/nubs/5849633/raw/78bae58f...

[2]: https://gist.github.com/nubs/5849633


This method could also be used to evade XSS filters.

For example, I believe encoding code using something similar is the only way to solve this challenge: http://escape.alf.nu/9/


Encoder for a similar subset: http://utf-8.jp/public/jjencode.html


This was a brilliant article, I learned a ton, thanks.


{Brace} yourself before (and after) reading this.



