Google App Engine's Datastore Admin is Terribly Inefficient (marram.posterous.com)
48 points by marram on Dec 26, 2011 | 17 comments



This is something that a lot of GAE developers misunderstand: put()ing a datastore entity is not a single write operation. There are indexes to update - in your case, lots of them - and updating these indexes can require several write operations. A simple delete is one write per index, but changing a value can be two operations: one to delete the old index value and one to write the new one. And since each property has two indexes (ascending and descending), these numbers are doubled.

If you create your own bulk delete method, you will find that it takes exactly as many write ops as the admin console tool.

You have probably defined more indexes on your entities than you need; removing the unnecessary ones will likely make your app cheaper. Managing indexes carefully is a critical part of making apps affordable on GAE.
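
For example, something like this (a hypothetical model, not the OP's) skips the single-property index writes for properties you never filter or sort on:

    from google.appengine.ext import db

    class LogEntry(db.Model):
        # Queried and sorted on, so it keeps its ascending + descending indexes.
        created = db.DateTimeProperty(auto_now_add=True)
        # Never filtered or ordered by: indexed=False skips both single-property
        # indexes, so puts and deletes of this property cost no index writes.
        source_ip = db.StringProperty(indexed=False)
        # Text (and Blob) properties are never indexed at all.
        payload = db.TextProperty()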


I had "vaccumed" all indices referencing those entities before issuing a delete. Albeit, there was only once index per purged entity type. So this would not explain the 20x write operations.

Also, note that the deletions were through the "Datastore Admin" app, which was recently added. It is different from the classic Datastore Viewer.


You misunderstand how GAE indexes work.

There are two kinds of indexes:

* multi-property indexes which you configure via datastore-indexes.xml (or yaml). You can remove these by removing them from the xml/yaml and vacuuming.

* single-property indexes, which you decide on when you define your data model. You can't vacuum these, and they are defined on a per-entity basis. The only way to make them go away is to re-save the relevant entities without the index defined, as sketched below. Note: multi-property indexes require single-property indexes on all the properties they cover.

These single-property indexes are almost certainly causing your high write op counts. You really should examine your data model with this new understanding; by removing unnecessary single-property indexes, you may be able to dramatically reduce your bill.
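
A rough sketch of what "re-save the relevant entities without the index defined" looks like in practice (model and property names are made up): redeclare the property with indexed=False, deploy, then re-put the existing entities so their stale index rows get cleaned up. The re-save itself costs write ops once, but every put and delete after that is cheaper.

    from google.appengine.ext import db

    class Measurement(db.Model):
        # Previously a plain FloatProperty; redeclared unindexed so writes
        # stop maintaining its ascending/descending single-property indexes.
        raw_value = db.FloatProperty(indexed=False)

    def resave_some(batch_size=200):
        # Re-putting entities under the new definition removes their old
        # single-property index rows for raw_value.
        db.put(Measurement.all().fetch(batch_size))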


A GAE datastore delete takes multiple operations because it also updates indexes:

1 entity delete = 2 Writes + 2 Writes per indexed property value + 1 Write per composite index value

All from this page: http://code.google.com/appengine/docs/billing.html#Billable_... And more about why it is so: http://code.google.com/appengine/articles/life_of_write.html
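
Plugging in made-up numbers: deleting a single entity with 4 indexed property values and 1 composite index entry comes out to

    2 + 2*4 + 1*1 = 11 write ops

so a "single" delete fanning out into ten or more billable writes is expected behaviour, not a bug.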

Well, the OP is just another coder who can't read docs, but can write a blog.


Regardless of how decent Google's AppEngine documentation is, this is indeed a bug.

The correct behavior would be to recalculate the indices just once, instead of reindexing after every single delete operation.

It then becomes

    2*entities + 2*indexed property values + composite index values
operations to delete all entities in the datastore, instead of

    2*entities + 2*entities*indexed property values + entities*composite index values
operations.
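
With illustrative numbers - say 100,000 entities, each with 4 indexed property values and 1 composite index entry (reading "indexed property values" as a per-entity count) - that is roughly

    2*100000 + 2*100000*4 + 100000*1 = 1,100,000 ops

today, versus

    2*100000 + 2*4 + 1 = 200,009 ops

if the index cleanup were done in bulk.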


Deleting all entities should be free. Who cares about indexes? `rm -rf`, done.


In any case, good that something is pointed out that can and will be easily overlooked :)


I deleted all indices referencing those entities before starting the entity deletions.


I'm pretty sure this uses the map reduce API which has a lot of overhead in the datastore. In principle map reduce is nice because it could make very large jobs fast. But since Google engineers don't pay for anything, they optimized for time, not cost.

And as for your script, you can't just delete 3k keys in one request. If you want, I'll send you the script I've adapted for jobs that make large changes to the datastore.
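
For what it's worth, a minimal sketch of that kind of batched-delete job (not the actual script; it assumes the deferred library is enabled, and the kind name and batch size are placeholders):

    from google.appengine.ext import db, deferred

    BATCH_SIZE = 500  # keep each task well under the request deadline

    def purge_batch(kind, cursor=None):
        # Keys-only GQL query: fetching keys is cheap, and db.delete()
        # accepts a list of keys directly.
        q = db.GqlQuery("SELECT __key__ FROM %s" % kind)
        if cursor:
            q.with_cursor(cursor)
        keys = q.fetch(BATCH_SIZE)
        if keys:
            db.delete(keys)
            # Chain the next batch as a new task so no single request
            # has to delete everything before the deadline.
            deferred.defer(purge_batch, kind, q.cursor())

    # Kick it off once, e.g. from an admin handler:
    # deferred.defer(purge_batch, "MyEntityKind")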


In my experience, purging data via the MapReduce API uses a lot less write quota than the admin interface (with a bit of instance-hour overhead, which doesn't seem like a problem).

I can't remember the exact number, but it was about 10 times less than deleting via the admin interface, and it finished in 5 minutes rather than 3 hours.
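
For anyone curious, the mapper side of that is roughly this (handler name is made up; it also has to be registered in mapreduce.yaml with a DatastoreInputReader for the entity kind):

    from mapreduce import operation as op

    def purge(entity):
        # Called once per entity by the mapper framework; yielding a
        # Delete mutation lets it batch the datastore calls.
        yield op.db.Delete(entity)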


I meant that I needed 3k requests to finish the job, deleting 1k entities in each request :).


Should Google refund developers when they make an uninformed decision that costs them money?

One could argue it is a bug that GAE lets developers make an expensive mistake when they don't fully understand how something (fairly complicated) works.

Someone else could argue that we are all developers and we should know the costs associated with the systems we are building. There is a real cost associated with PaaS systems like GAE.

What do you think?


I ran into the same issue. If you want to purge all data from an app, it's much cheaper (and sometimes even faster) to start over and create a completely new app with an empty datastore than to use the Datastore Admin and delete the data from there.


The number of writes also depends on the number of indexes you have on the data.


Can somebody explain the article in layman's terms? For those not too familiar with GAE...


"Blogger misunderstands how indexes work on App Engine."


There are plenty of things that are wrong with Google App Engine. And there are plenty of bugs that exist that have cost me money.

Why don't you try filing a bug report/suggesting a warning and sending an email requesting something of a refund? They tend to be a friendly bunch who give refunds for obvious problems.

Moving to AWS will of course save you lots of money in the longer term, depending on what your hosting requirements are.




