Google Sawzall now open source

DrJosiah · on Nov 3, 2010

I took a Sawzall class for a week at Google along with my team. Our manager had requested it because there were a lot of things we wanted to do with logs in the long term that were quickest to do with Sawzall. The people who taught the class were awesome, but I can't help feeling that they were a bit apologetic for 1) knowing so much about the language, 2) having to teach us, and 3) not being able to change the language in any way. But maybe that's hindsight and altered memories based on the dinner and drinks we shared after the week of classes.

The opacity of the language led to a shared conclusion among those in the class (some of whom were taking it as a refresher): all of the unique Sawzall code at Google had already been written in the first few months of its use, and everyone since had just been copying and pasting snippets from everyone else's scripts.

I can understand why Google released it: it's the start of a halfway decent map-reduce implementation, with low-overhead startup, quick runtime, etc. (compared to Python, which had been an initial logs processor but was punted for logs-processing mapreduces thanks to its relatively high startup costs compared to processing time). But with things like Hadoop (and its support for arbitrary languages for operations), I can't help but feel like this is a little late to the open source game.

Also, back at Google, I had the start of a project to translate a subset of Python to Sawzall so that people wouldn't have to suffer, and could potentially write better logs processing code. I left before even getting close to finishing it.

andrix · on Nov 3, 2010

I'm wondering if this is the start of a series of open source launches by Google of its core tools. AFAIK, Sawzall without MapReduce is like a car without the engine, but anyway I'm very pleased to read the code and start trying the language. Kudos to Google!

michaelfairley · on Nov 4, 2010

Google's implementation of MapReduce is so tightly bound to their internal infrastructure (GFS, BigTable, etc.) that open-sourcing it wouldn't do anybody much good.

eof · on Nov 3, 2010

Do you mean an engine without a car? It seems like you mean sawzall is only useful for mapreduce.

DrJosiah · on Nov 4, 2010

Sawzall as a language is quite a bit uglier than the vast majority of general purpose languages. Couple this with the read/emit nature of the language, and it's either useful as a stream processing language, or as a step in a mapreduce chain.

Given how easily other languages handle processing streams, tagging output, etc., and that Sawzall doesn't really have an idea of shared state between "records" (aside from data emitted), it's hard to find things that Sawzall is good at other than mapreduce.
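To make the read/emit point concrete, here is a rough Python sketch of that processing model, not Sawzall itself and not how Google's runtime works. The `SumTable` class, the `process_record` function, and the three-field log format are all made up for illustration. Each record is handled in isolation, and the only thing that survives between records is whatever was emitted to an aggregator.

```python
from collections import defaultdict


class SumTable:
    """Hypothetical stand-in for an aggregator that sums emitted values per key."""

    def __init__(self):
        self.totals = defaultdict(int)

    def emit(self, key, value):
        self.totals[key] += value


def process_record(line, hits, bytes_sent):
    """Handle one log record on its own; the only output is what gets emitted."""
    fields = line.split()
    if len(fields) != 3:
        return  # skip malformed records
    _host, path, size = fields  # assumed format: "<host> <path> <bytes>"
    hits.emit(path, 1)
    bytes_sent.emit(path, int(size))


def run(lines):
    """Feed every record through process_record and return the aggregated tables."""
    hits, bytes_sent = SumTable(), SumTable()
    for line in lines:
        process_record(line, hits, bytes_sent)
    return dict(hits.totals), dict(bytes_sent.totals)


# Made-up sample input, including one malformed line.
sample_log = [
    "10.0.0.1 /index.html 512",
    "10.0.0.2 /index.html 256",
    "10.0.0.1 /about.html 100",
    "garbage",
]
hits, bytes_sent = run(sample_log)
print(hits)
print(bytes_sent)
```

Since `process_record` never looks at any other record, the calls could be spread across machines and the tables merged afterwards, which is roughly why this shape fits a mapreduce step and not much else.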

DrJosiah · on Nov 4, 2010

Also, the language was purpose-built to be used as a step in mapreduce, and not as a general purpose language.

eagleal · on Nov 4, 2010

You can always use it with Yahoo!'s s4 (released yesterday). http://wiki.s4.io/Manual/S4Overview

DrJosiah · on Nov 4, 2010

Or I could use any one of a dozen other languages that are more convenient to use, are already available on my system, already work with S4, and have a syntax that doesn't make me want to cry :P

msbmsb · on Nov 4, 2010

Buzz post from a Googler on the release: "We have inflicted Sawzall on the world."

http://www.google.com/buzz/100587561646339426146/2Ehn2s7Nf1D...

tav · on Nov 3, 2010

Sawzall Language spec: http://szl.googlecode.com/svn/doc/sawzall-spec.html

You might also want to take a look at Cascalog. It offers a much nicer API — just list the various predicates and it'll take care of the rest for you, e.g. to find all the guys following Emily:

  (follows "emily" ?person) (gender ?person "m")

http://nathanmarz.com/blog/introducing-cascalog-a-clojure-ba...
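For anyone unfamiliar with this style of query, here is a loose Python sketch of what the two predicates above amount to: an implicit join on the shared variable `?person`. This is plain Python, not Cascalog, and says nothing about how Cascalog actually plans or executes the query. The sample rows are invented, and the column order is assumed to mirror the argument order in the query.

```python
# Made-up sample data; column order mirrors the argument order in the query.
follows = [("emily", "bob"), ("emily", "alice"), ("david", "bob"), ("emily", "carl")]
gender = [("bob", "m"), ("alice", "f"), ("carl", "m")]


def query(follows, gender):
    """Return every ?person that satisfies both predicates."""
    # (follows "emily" ?person): bind ?person from rows whose first field is "emily"
    candidates = {person for who, person in follows if who == "emily"}
    # (gender ?person "m"): keep only bindings that appear with "m" in gender
    males = {person for person, g in gender if g == "m"}
    # Both predicates share ?person, so the result is the intersection.
    return sorted(candidates & males)


result = query(follows, gender)
print(result)
```

The appeal of the declarative version is that the join is never spelled out; it falls out of using the same variable name in both predicates.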

tedunangst · on Nov 3, 2010

blech. html with content-type: text/plain == :(

mccutchen · on Nov 3, 2010

That's because it's being served as raw text data from the Subversion repo.

mbreese · on Nov 4, 2010

That's no excuse. You can always set the svn:mime-type property.

mccutchen · on Nov 4, 2010

Haha, I think it's a great excuse. But I just don't see the value in telling my version control software about the MIME type of every single file in my project.

vomjom · on Nov 3, 2010

The paper: http://research.google.com/archive/sawzall.html

On a related topic about another Google project: http://sergey.melnix.com/pub/melnik_VLDB10.pdf

swah · on Nov 3, 2010

If you want to compile on a Mac and you use Homebrew:

  brew install binutils
  cd /usr/local/bin && ln -s gobjdump objdump
  brew install icu4c
  brew link icu4c
  ./configure
  make

tav · on Nov 3, 2010

It's worth noting that you can trace parts of Google Go's lineage to Sawzall.

supersillyus · on Nov 3, 2010

Interesting. Can you elaborate?

cdavid · on Nov 3, 2010

R. Pike was involved in both projects, if that's what is meant by lineage. I am basing this on the author list in the Sawzall paper; I don't work at Google.

chopsueyar · on Nov 4, 2010

Why has Sawzall not experienced the Firebird moment yet?

http://news.cnet.com/2100-7344-5156101.html

DrJosiah · on Nov 4, 2010

Before Google's Go, there was another language named Go. "Chrome" is the name of the UI outside of the content in Mozilla's browser, email client, etc. Google has a smaller-scale ad-hoc SQL-inspired language and system for log queries called Dremel (which is awesome to use, I miss the hell out of it).

Sawzall as a physical tool falls into the same category as Dremel, which doesn't seem to have been an issue. Never mind the "Go" and "Chrome" names.

jbeda · on Nov 4, 2010

If you miss Dremel, you might be interested in BigQuery ;) It is essentially hosted Dremel. Check it out: http://code.google.com/apis/bigquery/

chopsueyar · on Nov 4, 2010

What happened to Dremel (software)?

DrJosiah · on Nov 4, 2010

As far as I know, nothing. But since I no longer work for Google, I would need to set up a Dremel-like system (which would minimally include needing to inject logs, etc., into a new or existing hosted system), which increases the barrier to entry until I have the time to do so.

nlake44 · on Nov 4, 2010

Anyone got an idea how hard it might be to port this to work on top of Hadoop?