Google Sawzall now open source (code.google.com)
94 points by mokeefe on Nov 3, 2010 | 25 comments



I took a week-long Sawzall class at Google along with my team. Our manager had requested it because there were a lot of things we wanted to do with logs in the long term that were quickest to do with Sawzall. The people who taught the class were awesome, but I can't help feeling that they were a bit apologetic for 1) knowing so much about the language, 2) having to teach us, and 3) not being able to change the language in any way. But maybe that's hindsight and altered memories based on the dinner and drinks we shared after the week of classes.

The opacity of the language led to a shared conclusion among those in the class (some of whom were taking it as a refresher): all of the unique Sawzall code at Google had already been written in the first few months of its use, and everyone since had just been copying and pasting snippets from everyone else's scripts.

I can understand why Google released it: it's the start of a halfway decent map-reduce implementation, with low-overhead startup, quick runtime, etc. (compared to Python, which had been an initial logs processor but was punted for logs-processing mapreduces thanks to its relatively high startup cost compared to processing time). But with things like Hadoop (and its support for arbitrary languages for operations), I can't help but feel like this is a little late to the open source game.

Also, back at Google, I had started a project to translate a subset of Python to Sawzall so that people wouldn't have to suffer, and could potentially write better logs processing code. I left before even getting close to finishing it.


I'm wondering if this is the start of a series of open source launches of Google's core tools. AFAIK, Sawzall without MapReduce is like a car without the engine, but anyway I'm very pleased to read the code and start trying the language. Kudos to Google!


Google's implementation of MapReduce is so tightly bound to their internal infrastructure (GFS, BigTable, etc.) that open-sourcing it wouldn't do anybody much good.


Do you mean an engine without a car? It seems like you mean sawzall is only useful for mapreduce.


Sawzall as a language is quite a bit uglier than the vast majority of general purpose languages. Couple this with the read/emit nature of the language, and it's either useful as a stream processing language, or as a step in a mapreduce chain.

Given how easily other languages handle processing streams, tagging output, etc., and that Sawzall doesn't really have a notion of shared state between "records" (aside from data emitted), it's hard to find things that Sawzall is good at other than mapreduce.
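
To make the read/emit model concrete, here is a rough Python analogy (not Sawzall syntax). The `SumTable` class, the `process_record` function, and the two-field log format are all hypothetical names invented for this sketch; the only point is that each record is handled in isolation and the only thing that survives between records is what was emitted.

```python
from collections import defaultdict


class SumTable:
    """Hypothetical stand-in for a summing output table (an assumption, not Sawzall's API)."""

    def __init__(self):
        self.totals = defaultdict(int)

    def emit(self, key, value):
        # Aggregation happens here, outside the per-record logic.
        self.totals[key] += value


def process_record(line, table):
    # Each record is processed on its own; nothing carries over
    # between calls except what gets emitted to the table.
    fields = line.split()
    if len(fields) != 2:
        return  # skip records that do not match the made-up format
    path, size = fields
    table.emit(path, int(size))


def run(records):
    table = SumTable()
    for line in records:
        process_record(line, table)
    return dict(table.totals)


if __name__ == "__main__":
    logs = ["/index 120", "/about 40", "/index 80", "garbage"]
    print(run(logs))
```

Because `process_record` never looks at any other record, the loop in `run` could be split across machines and the tables merged afterwards, which is why this style fits the map step of a mapreduce so naturally.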


Also, the language was purpose-built to be used as a step in mapreduce, and not as a general purpose language.


You can always use it with Yahoo!'s s4 (released yesterday). http://wiki.s4.io/Manual/S4Overview


Or I could use any one of a dozen other languages that are more convenient to use, already available on my system, already work with S4, and have a syntax that doesn't make me want to cry :P


Buzz post from a Googler on the release: "We have inflicted Sawzall on the world."

http://www.google.com/buzz/100587561646339426146/2Ehn2s7Nf1D...


Sawzall Language spec: http://szl.googlecode.com/svn/doc/sawzall-spec.html

You might also want to take a look at Cascalog. It offers a much nicer API: just list the various predicates and it'll take care of the rest for you, e.g. to find all the guys Emily follows:

  (follows "emily" ?person) (gender ?person "m")
http://nathanmarz.com/blog/introducing-cascalog-a-clojure-ba...
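
For anyone who doesn't read Clojure, the two predicates above amount to a join on the shared `?person` variable. Here is a rough plain-Python equivalent, reading `(follows "emily" ?person)` as "emily follows ?person" (that argument order is my assumption). The relations and names in the data are made up for illustration, and this says nothing about how Cascalog actually executes the query.

```python
# Hypothetical in-memory relations; the data is invented for this sketch.
FOLLOWS = [("emily", "bob"), ("emily", "gary"), ("emily", "alice"), ("bob", "emily")]
GENDER = [("bob", "m"), ("gary", "m"), ("alice", "f"), ("emily", "f")]


def men_followed_by(user, follows, gender):
    # Predicate 1: (follows user ?person)
    followed = {person for follower, person in follows if follower == user}
    # Predicate 2: (gender ?person "m")
    males = {person for person, g in gender if g == "m"}
    # The shared ?person variable acts as the join key.
    return sorted(followed & males)


print(men_followed_by("emily", FOLLOWS, GENDER))
```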


blech. html with content-type: text/plain == :(


That's because it's being served as raw text data from the Subversion repo.


That's no excuse. You can always set the svn:mime-type property.


Haha, I think it's a great excuse. But I just don't see the value in telling my version control software about the MIME type of every single file in my project.



If you want to compile on a Mac and you use Homebrew:

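  # binutils from Homebrew installs objdump as gobjdump
  brew install binutils
  # symlink it under the name the build expects (assumes the default /usr/local prefix)
  cd /usr/local/bin && ln -s gobjdump objdump
  brew install icu4c
  brew link icu4c
  ./configure
  make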


It's worth noting that you can trace parts of Google Go's lineage to Sawzall.


Interesting. Can you elaborate?


R. Pike was involved in both projects, if that's what is meant by lineage. I am basing this on the author list in the Sawzall paper; I don't work at Google.


Why has Sawzall not experienced the Firebird moment yet?

http://news.cnet.com/2100-7344-5156101.html


Before Google's Go, there was another language named Go. "Chrome" is the name of the UI outside of the content in Mozilla's browser, email client, etc. Google has a smaller-scale ad-hoc SQL-inspired language and system for log queries called Dremel (which is awesome to use; I miss the hell out of it).

Sawzall as a physical tool falls into the same category as Dremel, which doesn't seem to have been an issue. Never mind the "Go" and "Chrome" names.


If you miss Dremel, you might be interested in BigQuery ;) It is essentially hosted Dremel. Check it out: http://code.google.com/apis/bigquery/


What happened to Dremel (software)?


As far as I know, nothing. But since I no longer work for Google, I would need to set up a Dremel-like system (which would minimally include needing to inject logs, etc., into a new or existing hosted system), which raises the barrier to entry until I have the time to do so.


Anyone got an idea how hard it might be to port this to work on top of Hadoop?



