Squeryl: A concise, type-safe Scala ORM and DSL

vog · on Sept 26, 2010

This seems to be a very promising and well-designed project. However, there is one part that puzzles me. On the one hand,

> SQL's declarativeness is preserved, not encapsulated in a lower level API that requires imperative and procedural code to get things done.

but on the other hand

> A significant part of optimizing a database abstraction layer is to choose for every situation the right balance between fine and large grained retrieval, and the optimal mix of laziness and eagerness.

Why is that even relevant? If the possibilities of SQL are preserved, why can't you simply let the database aggregate your data and make it return exactly what you need? ... which will be what you'll send to the user, after some formatting/templating, so the really wanted result can't be too big.

In other words, the difference between fine and coarse grained retrieval only matters if you have a big intermediate result that you have to process aside from the database because your ORM doesn't allow you to do that within the database. Since Squeryl claims to be different in that regard, why does the retrieval strategy even matter?
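To make the question concrete, here is a minimal sketch of the two approaches. It uses Python's built-in `sqlite3` module as a stand-in rather than Squeryl, and the `orders` table and both function names are hypothetical, made up for illustration.

```python
import sqlite3

# Hypothetical schema for illustration only; nothing here comes from Squeryl.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 10), ("alice", 5), ("bob", 7)],
)


def totals_in_database(conn):
    # The database aggregates; only one row per customer is returned.
    rows = conn.execute(
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()
    return dict(rows)


def totals_in_client(conn):
    # Every row is fetched and summed in application code instead.
    totals = {}
    for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
        totals[customer] = totals.get(customer, 0) + amount
    return totals
```

Both functions return the same totals. The difference is how many rows leave the database: one per customer in the first case, one per order in the second.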

barrkel · on Sept 26, 2010

I agree with you in the context of a single table. Introduce something a little more complicated, such as master/detail relations, however, and things aren't so straightforward. The mere fact of a foreign key in the child that the ORM can pull out of DB metadata (or have manually described to it, etc.) isn't enough to know if it's reasonable to be lazy or eager in retrieving child rows when getting the parent.

vog · on Sept 26, 2010

> I agree with you in the context of a single table.

I actually had complex joins in mind.

A good ORM (in the sense that most ORMs aren't good by this definition) should retrieve the child rows if and only if they are required for the final result. And it should request exactly the columns that are needed by the result. And, as already pointed out in my previous comment, it should hand over the aggregating to the database - that's one of the main things databases are good for.

ShabbyDoo · on Sept 26, 2010

"And it should request exactly the columns that are needed by the result"

Gavin King (the Hibernate guy) argued that all columns should be fetched by default because (A) the marginal cost of doing so is very low once the db has taken the disk hit for any data in the row and (B) an in-memory object cache is easy/efficient if entire objects (rows) are cached at once. I might be paraphrasing a bit much.

One could argue that, since some databases can return a subset of column data straight from an index (instead of eating the disk hit for the row data), limiting the columns fetched could be much more efficient.

There are probably very few areas of any app I've worked on where I would have gotten significant performance gains by limiting results to a subset of columns.

barrkel · on Sept 26, 2010

It depends. If there are a lot of columns and you're only interested in a handful, and there's an index that contains all those columns, it should be much cheaper just to select the interesting columns, as all the data needed can be retrieved from the index. Asymptotically, the size ratio between the columns selected and the columns in the table gives the speedup.
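A small way to see this is to ask the database for its query plan. The sketch below assumes SQLite through Python's `sqlite3` module; the `holdings` table, the index name and the `plan` helper are hypothetical, and other databases report index-only access in their own wording.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: two "interesting" columns plus a wide one we rarely need.
conn.execute(
    "CREATE TABLE holdings "
    "(id INTEGER PRIMARY KEY, symbol TEXT, qty INTEGER, notes TEXT)"
)
# The index contains every column the narrow query asks for.
conn.execute("CREATE INDEX idx_symbol_qty ON holdings (symbol, qty)")


def plan(sql):
    # Each EXPLAIN QUERY PLAN row ends with a human-readable detail string.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql)
    return " | ".join(row[-1] for row in rows)


narrow_plan = plan("SELECT symbol, qty FROM holdings WHERE symbol = 'ACME'")
wide_plan = plan("SELECT * FROM holdings WHERE symbol = 'ACME'")

print(narrow_plan)
print(wide_plan)
```

In SQLite the narrow query is reported as using a covering index, meaning it never touches the table rows. The `SELECT *` version still has to visit the table for `notes`.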

ShabbyDoo · on Sept 26, 2010

"a big intermediate result that you have to process aside from the database because your ORM doesn't allow you to do that within the database."

Processing in the DB isn't always desirable or even possible. Let's say that you are writing the Mint.com killer, and you want to fetch a user's portfolio and query a bunch of 3rd party systems (Yahoo finance, whatever) for stuff related to individual holdings. Let's say you have a table of holdings (security symbol as business key) along with a related set of buy/sell records for each holding.

Let's say that there's a 5% chance that something interesting is found from the 3rd party systems which requires the buy/sell history for evaluation (capital gains tax-related? I'm just contriving an example here). One strategy would be to do a big join fetch of the holdings and history table. This would be efficient from a DB perspective if you would eventually require a high enough percentage of the history records. But, if you only require a small fraction of them, then it would be better to lazily initialize the history set on a per-portfolio basis. The exact percentage at which you would be indifferent between strategies from a performance perspective would probably be dependent on so many things that you'd have to just experiment with both fetch plans.
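The two fetch plans can be sketched roughly as follows. This is plain `sqlite3` in Python rather than an ORM, and the schema, `fetch_eager`, `fetch_lazy` and the `is_interesting` callback are all hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical holdings/trades schema standing in for the portfolio example.
conn.executescript("""
    CREATE TABLE holdings (symbol TEXT PRIMARY KEY);
    CREATE TABLE trades (symbol TEXT REFERENCES holdings(symbol), qty INTEGER);
    INSERT INTO holdings VALUES ('AAA'), ('BBB'), ('CCC');
    INSERT INTO trades VALUES ('AAA', 10), ('AAA', -4), ('BBB', 3), ('CCC', 8);
""")


def fetch_eager(conn):
    """One join query: every holding arrives with its full trade history."""
    history = {}
    rows = conn.execute(
        "SELECT h.symbol, t.qty FROM holdings h "
        "LEFT JOIN trades t ON t.symbol = h.symbol "
        "ORDER BY h.symbol, t.rowid"
    )
    for symbol, qty in rows:
        history.setdefault(symbol, [])
        if qty is not None:  # a holding without trades yields a NULL qty
            history[symbol].append(qty)
    return history, 1  # (result, number of queries issued)


def fetch_lazy(conn, is_interesting):
    """One query for the holdings, plus one per holding that needs history."""
    queries = 1
    history = {}
    symbols = [
        row[0]
        for row in conn.execute("SELECT symbol FROM holdings ORDER BY symbol")
    ]
    for symbol in symbols:
        if is_interesting(symbol):
            queries += 1
            history[symbol] = [
                row[0]
                for row in conn.execute(
                    "SELECT qty FROM trades WHERE symbol = ? ORDER BY rowid",
                    (symbol,),
                )
            ]
    return history, queries
```

The eager plan always costs one query and transfers every trade. The lazy plan costs one query plus one per interesting holding, so it transfers less when few holdings are interesting and issues many round trips when most are.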

Unless an ORM tool is smart enough to observe code execution paths over time, profile them, and dynamically adjust fetch plans accordingly (a cool idea?), the ORM client code has to be able to specify the strategy it thinks is optimal.

Granularity becomes important when doing batch-y stuff. Throughput is significantly higher when you select or insert records by the thousands instead of individually. However, you do take a big memory hit because you're keeping thousands of records around at once. So, if using table X for some batch activity running in only one thread of your application, it's probably best to use coarse grained fetching. However, if you're reading/writing as part of user transactions and are concerned about allocation/deallocation, record-by-record processing might be better. My knowledge of DB internals is too weak to discuss the trade-offs in the db of fetch granularity decisions.
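As a rough illustration of the granularity knob, the sketch below writes a batch in one call and then reads it back in bounded chunks. It uses Python's `sqlite3` module; the `records` table, the `read_in_chunks` helper and the chunk size of 300 are hypothetical choices, not recommendations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, payload TEXT)")

# Coarse-grained write: one executemany call for the whole batch.
conn.executemany(
    "INSERT INTO records (payload) VALUES (?)",
    [("row-%d" % i,) for i in range(1000)],
)
conn.commit()


def read_in_chunks(conn, chunk_size):
    """Yield lists of at most chunk_size rows, so only one chunk is held at once."""
    cursor = conn.execute("SELECT id, payload FROM records ORDER BY id")
    while True:
        chunk = cursor.fetchmany(chunk_size)
        if not chunk:
            break
        yield chunk


chunk_sizes = [len(chunk) for chunk in read_in_chunks(conn, 300)]
print(chunk_sizes)
```

A larger `chunk_size` means fewer fetch calls but more rows held in memory at once. A chunk size of 1 is the record-by-record extreme.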

jsean · on Sept 27, 2010

Squeryl is really awesome as far as scala-orms go. I've used it in a couple of hobby projects and currently am using it in a bigger, and hopefully revenue bringing, project (a mix consisting of scala, wicket, squeryl and mysql. So far so good!)

Maxime, the guy behind squeryl, is also very very helpful. I've posted a few questions on squeryl's group and have never waited more than a day for a reply.

Perhaps it's because squeryl still is relatively unknown, but still, this goes to show that this project at least has a Human Interface which is always nice.

Lastly, hopefully squeryl will get some more attention now that Lift has given it some official attention.

Anyways, yeah. Squeryl. Cheers.

mhansen · on Sept 26, 2010

Needs a code example on the front page.

pjscott · on Sept 27, 2010

I was about to post the same thing. Code examples are the second thing I look at when I see the web page for some new library or framework. The first thing I look at is the one-sentence description right below the title, if any.

You may notice that this maps pretty well to how pages on Github are set up. That's because Github has really nice design.

rue · on Sept 26, 2010

Seems decent enough, I should reintroduce myself to Scala.

The website does not degrade well, though: no JS == no code listings.

vog · on Sept 26, 2010

It also uses a strange apostrophe replacement in words like won't and SQL's. That is, it uses an acute accent over a space (’) instead of a simple apostrophe (').