Hacker News new | past | comments | ask | show | jobs | submit login

> 1. Use Bulk Operations

> If you need to insert a lot of rows at once in your database, you really should not use execute. The sqlite3 module provides a way to bulk insertions: executemany.

Note that this is actually part of DBAPI 2.0[0], so pretty much every Python database driver should have an executemany(). Worst case scenario, it's implemented on top of execute() and you get little to no benefits (that's the case for psycopg2 currently[1]), best case scenario it can use much more efficient methods (seems to be the case for sqlite3).

> 3. Cursors Can Be Iterated Upon

That is an optional extension but part of the DBAPI 2.0 spec as well[2], and most drivers should implement it

> for row in connection.execute('SELECT * FROM events'):

The return value of execute() is spec-undefined, so beware that specific method, it is absolutely not portable, neither is execute/executemany on the connection object.

> 7. Use Placeholders to Interpolate Python Values

Warning: the actual placeholder is driver-dependent and not portable, you can query the driver for the preferred param style[3], you can also ask it for its thread-safety guarantees (the sqlite3 module advertises module-level thread-safety but not connection-level let alone cursor-level)

[0] https://www.python.org/dev/peps/pep-0249/#executemany

[1] http://initd.org/psycopg/docs/cursor.html#cursor.executemany

[2] https://www.python.org/dev/peps/pep-0249/#next

[3] https://www.python.org/dev/peps/pep-0249/#paramstyle




ou get little to no benefits (that's the case for psycopg2 currently[1])

Yeah, this bit me on the backside using psycopg2 as an underlying driver for Pandas DataFrame.to_sql(). My hack? Dump the dataframe to a StringIO as CSV and load that instead...


In the past when I've had to insert many rows (for example), I wrapped it inside a transaction. Why and when would someone use executemany() instead? Or does executemany() do a transaction behind the scenes?


Why and when would someone use executemany() instead?

What kills you in this use case is roundtrips - send one row, wait for an acknowledgement (even if you are not committing it yet) then sent one more, and so on and so on. There is a lot of dead time spent in protocol overheads or just idle. A well implemented executemany() sends as many rows at a time as will fit in a network packet and keeps streaming them as fast as it can.


Transactions are a separate orthogonal concern.

What executemany can do is prepare the statement once and call it multiple times with different sets of parameters, whereas execute may be preparing the statement every time it is called (possibly hoping that the database system has a statements cache) or not preparing them at all (and the DBMS only does simple query planning).


For some of the executemany use-cases the following execute would work as well:

INSERT INTO myt (myc1, myc2) VALUES ('c11', 'c12'), ('c21', 'c22');


> return value ... undefined

I much prefer treating cursor.execute like list.append. Mutation methods should return None to emphasize their impurity.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: