I took a Sawzall class for a week at Google along with my team. Our manager had requested it because there were a lot of things that we wanted to do with logs in the long term, and they were quickest to do with Sawzall. The people who taught the class were awesome, but I can't help feeling that they were a bit apologetic for 1) knowing so much about the language, 2) having to teach us, and 3) not being able to change the language in any way. But maybe that's hindsight and altered memories based on the dinner and drinks we shared after the week of classes.
The opacity of the language led to a shared conclusion among those in the class (some of whom were taking it as a refresher): that all of the unique Sawzall code at Google had already been written in the first few months of its use, and everyone else had just been copying and pasting snippets from everyone else's scripts.
I can understand why Google released it. It's the start of a halfway decent map-reduce implementation, having a low-overhead startup, quick runtime, etc. (compared to Python, which had been an initial logs processor, but which was punted for logs-processing mapreduces thanks to its relatively high startup costs compared to processing time). But with things like Hadoop (and its support for arbitrary languages for operations), I can't help but feel like this is a little late to the open source game.
Also, back at Google, I had the start to a project to translate a subset of Python to Sawzall in order to allow for people to not have to suffer, and potentially to write better logs processing code. Left before even getting close to finishing it.
I'm wondering if this is the start of a series of open source launches by Google of its core tools. AFAIK, Sawzall without mapreduce is like a car without the engine, but anyway I'm very pleased to read the code and start trying the language. Kudos to Google!
Google's implementation of MapReduce is so tightly bound to their internal infrastructure (GFS, BigTable, etc.) that open-sourcing it wouldn't do anybody much good.
Sawzall as a language is quite a bit uglier than the vast majority of general purpose languages. Couple this with the read/emit nature of the language, and it's either useful as a stream processing language, or as a step in a mapreduce chain.
Given how easily other languages handle processing streams, tagging output, etc., and that Sawzall doesn't really have an idea of shared state between "records" (aside from data emitted), it's hard to find things that Sawzall is good at other than mapreduce.
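For anyone who hasn't seen the read/emit model, here is a rough Python sketch of the idea, not actual Sawzall and not Google's implementation. The names `SumTable`, `process_record`, and `run`, and the "host size" record format, are all made up for illustration. The point is that the per-record function sees exactly one record and can only communicate by emitting to an aggregator.

```python
from collections import defaultdict


class SumTable:
    """Hypothetical stand-in for a summing output table (an aggregator)."""

    def __init__(self):
        self.totals = defaultdict(int)

    def emit(self, key, value):
        # The aggregator, not the per-record code, owns the accumulated state.
        self.totals[key] += value


def process_record(record, bytes_by_host):
    # Sees exactly one record and keeps nothing between calls;
    # the only way to pass information on is to emit it.
    host, size = record.split()
    bytes_by_host.emit(host, int(size))


def run(records):
    # Stand-in for the framework: feed records one at a time, then
    # return whatever the aggregator collected.
    table = SumTable()
    for record in records:
        process_record(record, table)
    return dict(table.totals)


print(run(["a.example 100", "b.example 50", "a.example 25"]))
```

Because `process_record` holds no state of its own, the records could be split across any number of workers and the tables merged afterwards, which is why the model fits mapreduce and not much else.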
Or I could use any one of a dozen other languages that are more convenient to use, are already available on my system, already work with S4, and have a syntax that doesn't make me want to cry :P
You might also want to take a look at Cascalog. It offers a much nicer API — just list the various predicates and it'll take care of the rest for you, e.g. to find all the guys following Emily:
Haha, I think it's a great excuse. But I just don't see the value in telling my version control software about the MIME type of every single file in my project.
R. Pike was involved in both projects, if that's what is meant by lineage. I am basing this on the author list in the Sawzall paper; I don't work at Google.
Before Google's Go, there was another language named Go. "Chrome" is the name of the UI outside of the content in Mozilla's browser, email client, etc. Google has a smaller-scale ad-hoc SQL-inspired language and system for log queries called Dremel (which is awesome to use, I miss the hell out of it).
Sawzall as a physical tool falls into the same category as Dremel, which doesn't seem to have been an issue. Never mind the "Go" and "Chrome" names.
As far as I know, nothing. But since I no longer work for Google, I would need to set up a Dremel-like system (which would minimally include needing to inject logs, etc., into a new or existing hosted system), which increases the barrier to entry until I have the time to do so.