Don't test with SQLite when you use Postgres in Production (robellard.com)
105 points by edofic on Aug 4, 2015 | 105 comments



Better: Don't test only with SQLite when you use Postgres in Production.

His points are all valid, you definitely shouldn't release something to production that you haven't tested thoroughly in a separate identical environment.

That doesn't mean you should never test with SQLite though.

A good pattern I see all the time is to have a final stage of system tests that run slowly but very accurately in a production-equivalent environment, and to also have a preceding test stage that uses SQLite instead. By using SQLite in memory it's much, much faster, easier to manage, easier to parallelize, etc., so you find out quicker if your code is fundamentally broken.

The goal of testing is to find out as quickly as possible (minimal context-switching, easy to check locally before commit) whether your application will work. That means checking the things that are most likely to break, most of the time, as early as possible. It's typically not going to be a complex database incompatibility issue that makes your tests fail. It's going to be the SQL query you wrote 30 seconds ago, or the leftover query that doesn't work at all against the table structure you've just migrated to, etc. These are things you can check perfectly effectively in a lot of cases, and much quicker, with SQLite and friends.

Quick test stages that catch all the obvious problems early, final definitive production-equivalent test stage that makes sure it'll definitely definitely definitely work in real life. If you do it well, you can set it up so they both use exactly the same test definitions, and just swap out the DB config.
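For illustration, a minimal sketch of the "same tests, swapped DB config" idea, assuming pytest and SQLAlchemy (the TEST_DB_URL variable name is my own invention):

    import os
    import pytest
    from sqlalchemy import create_engine, text

    @pytest.fixture
    def engine():
        # Fast stage: default in-memory SQLite. Final stage: point
        # TEST_DB_URL at a real Postgres instance; the tests don't change.
        url = os.environ.get("TEST_DB_URL", "sqlite://")
        eng = create_engine(url)
        yield eng
        eng.dispose()

    def test_insert_roundtrip(engine):
        with engine.begin() as conn:
            conn.execute(text("CREATE TABLE t (x INTEGER)"))
            conn.execute(text("INSERT INTO t VALUES (1)"))
            assert conn.execute(text("SELECT x FROM t")).scalar() == 1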


I'm sorry but I really don't agree with this. One of the points the OP made is that if you develop on SQLite and deploy to a PostgreSQL database, you discard all the features that PostgreSQL has over SQLite (data types and SQL queries, for example).

You wouldn't test your Facebook-consuming API code with a Twitter endpoint, so why apply the same logic to your DBMS?


Exactly. IIRC, it is possible to run PostgreSQL in memory if you configure it correctly. In the age of Docker and the like, having different concurrent instances of a DBMS should be easy enough, right?


I came here to say something along those lines. Our project has a SQLite backend for hassle-free local development, and we use Postgres in all remote servers. Tests run both in local and remote.


What hassle did you experience when using PostgreSQL locally? On the projects I have been on, we have run PostgreSQL in local development almost painlessly. PostgreSQL is quite developer-friendly, at least for my use cases.

I suggest using synchronous_commit=off to speed up the test suite, and using unix sockets (assuming you do not run Windows).
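For example, setting it per session (a sketch with psycopg2; the socket directory and database name are assumptions about your setup):

    import psycopg2

    # a host beginning with "/" means "connect over a unix socket"
    conn = psycopg2.connect(host="/var/run/postgresql", dbname="app_test")
    conn.autocommit = True
    cur = conn.cursor()
    cur.execute("SET synchronous_commit = off")  # commits stop waiting on fsync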


The problem with using PostgreSQL was the difficulty of making it work everywhere; we have at least one dev on each platform. I use the pure JS version of SQLite (sql.js on npm). The server is fairly small, with the bulk of the code in the client (SPA).

If it weren't for the MySQL support I had to implement after the SQLite work, I would say it's better to investigate the quickest way to configure a local Postgres DB. There are portable Windows binaries.


Was that something you rolled yourself, or is there an existing solution for it? It sounds interesting.


The comments here are full of people bucking against this advice. I've worked at TWO companies now where people said NO, we want to use SQLite in dev. Both switched in under a year despite fierce internal opposition. Everyone changed their tune once they hit growth in users and complexity. Why? Because being ideologically right is not as nice as being sure you're writing code that works.

The fact is, if you have a serious site there's a good chance SOME specific feature will start looking appealing (maybe array values, JSON, window functions, a pg extension...). These features are written because people use them; we aren't all writing dead simple CRUD apps. Or your ORM will interact with one slightly differently than the other. I don't expect to convince any of you doubters, but the first time you get a bug in prod that slipped through the testing process but really could have been caught in dev, you'll ask yourself why you weren't using pg there too.

There's no sane reason to complicate your life by running two databases unless you either have a dead simple app, or are one of the jabbering idealists I see on here. Productive programmers simplify the problem and minimize their tech stacks.


I am for both:

we run a brutally different dev environment than production: Windows, 32-bit, HSQLDB, Windows codepage against Linux, 64-bit, Postgres, UTF-8 codepage

we also have a beta environment that's a perfect mirror of production down to the vm vendor and package version and an alpha environment that's on a cheaper vendor and uses a more updated version of production os/packages (and has experimental features of our software enabled)

this maybe slows down development a little, but it catches a whole class of interesting errors that are normally hidden, waiting to happen the next time you need to upgrade something in the production environment

then again, we do have a sane lib that hides the horrible differences between databases, so there's a whole class of bugs that's managed by that layer (no, it's not an ORM, I hate those)


Seems that someone's running wild, downvoting every comment they disagree with.

I believe neither maintaining code that's compatible with multiple databases, nor using a single one, is a mistake per se. Whatever floats your boat is the right attitude; one just needs to be aware of the consequences (both good and bad) of the chosen path.

But, yeah, the linked article's warning is completely valid - testing on a single platform, then deploying to another is likely to encounter some issues one day.


> I believe neither maintaining code that's compatible with multiple databases

me too! but there are so many compatibility layers you can get and bolt on for free, and they also work around version issues and things like that... of course, if you need the latest version then you're in a different boat, but the generic problem of talking to a database is basically solved by now


What sort of constraints do you have that your dev environment can't be a VM running the same software as production?


people get accustomed to doing things a certain way; then we upgrade to a newer Postgres or a newer whatever, everything breaks down, and we have to track down dozens of little papercuts.

if anyone has ever tried to upgrade a Ruby app to a newer Ubuntu release, they should know that building no variance into the deployment environment is a recipe for never upgrading it. (no, I'm not using Rails currently, I have to build a thing that needs to live more than two years)


I don't understand the argument against this. I have had even small sites behave differently when moving between Postgres and SQLite. I <3 SQLite but it's not a golden hammer.


> I don't understand the argument against this

Write only pure, portable SQL. Using RDBMS extensions is evil. Enlightenment only comes to those who are pure(ly using SQL).


The problem is that it is not just the extensions that differ; it is also quite fundamental parts:

* SQLite is dynamically typed.

* The text types differ in meaning between databases (text, varchar, nvarchar, varchar2, ...). I also believe char works in different ways in different databases, but I do not use blank-padded strings so I am not sure.

* Time types and functions are very different.

* Oracle and PostgreSQL run at a different default isolation level than MySQL and MSSQL.

* MSSQL has read locks on rows, which can cause deadlocks which cannot happen in other databases.

* Autoincrement is not implemented in the same way in different databases.

* Text collations and text equality work differently in different databases. (Whether they care about trailing blanks, case, etc., and the order tuples are sorted in.)

And these are just the ones I can recall off the top of my head.
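The first point alone can hide real bugs. A quick demo with Python's stdlib sqlite3 module:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (n INTEGER)")
    conn.execute("INSERT INTO t VALUES ('not a number')")  # SQLite accepts this
    print(conn.execute("SELECT n FROM t").fetchone())      # ('not a number',)
    # PostgreSQL rejects the same INSERT with a type error.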


To add to your list:

* Oracle treats the empty string and null as equivalent; PostgreSQL, MySQL and MSSQL treat them as distinct.


I am not supporting the position.


Unfortunately, limiting yourself to only what is available in standard SQL is not practical. For example, doing idempotent inserts, insert-or-update, sequences, transactional behavior for DDL, data types, date/time manipulation, etc.

To paraphrase Tom Kyte from Oracle (Ask Tom): "Your company paid good money for Oracle and all of its features. Use them instead of wasting money reimplementing them poorly yourself."

There are many ways to achieve portability across databases, but limiting yourself to standard SQL is not the best way (at least for everything). And I say this as someone who develops an analytic database (Presto) that tries to follow standard SQL for everything.


> To paraphrase Tom Kyte from Oracle (Ask Tom): "Your company paid good money for Oracle and all of its features. Use them instead of wasting money reimplementing them poorly yourself."

This applies just as well to the open source world. You may not have paid money for PostgreSQL, but you did pay for it in that the time you spend maintaining it could be used for other things. Likewise, your time could be spent reinventing its features in your application or just using those features and spending your time building something better for your customers.


And if you keep using them, you will keep loading Oracle until you need to license another 8 cores/$400,000 (list, which admittedly nobody pays.)


We aren't discussing religion, we are discussing software development. There is no need to eschew extensions if that's the only database your company uses.


I'll buck against this advice any day. I've never used SQLite for mocking, but I've used HSQLDB in the past for functional tests of an extensive set of ORM-derived DAOs that talked to Oracle in production. It was pretty flawless. I.e., when running in Oracle emulation mode, HSQLDB was pretty close to the real thing for our purposes. For a few tiny differences we had an automatic script that would convert the Oracle schema creation file into an HSQLDB "Oracle" schema creation file any time changes to the real schema were introduced.


I was going to say this too: HSQLDB is, at least for me, a more suitable testing environment than SQLite.


For Java, see my other comment for how to run real PostgreSQL or MySQL from unit tests: https://news.ycombinator.com/item?id=10003789


I will tell a story that I think is very strange.

Last year I worked at a small startup that was focused on medical records. They used PostGres in production, but SQLLite in development. The frontend was pure AngularJS. They had a massive Python code base for pulling data from the database, turning it into JSON, and then sending it to the frontend.

But then things began to change.

PostGreSQL gained the ability to handle JSON natively. We began the preliminary process of switching over to use that ability. It would mean less work for the Python code.

Here is where it got strange: some of the team was so deeply committed to using SQLLite that they began hacking SQLLite to handle JSON natively. That is, any feature that PostGreSQL had, which we used, they wanted SQLLite to have.

On the one hand, the technology they developed was amazingly cool. On the other hand, it was a massive effort that had nothing to do with the goals of the startup.

I could not fathom why we didn't simply use PostGreSQL in both development and production.


Just to point out... there's no capital G in PostgreSQL's name. It's either "PostgreSQL" or "Postgres". ;)

Also to be super pedantic... it's SQLite, not SQLLite. ;)


You're not wrong. If I were managing that startup I would hit the ceiling about hacking JSON into SQLite. That is very unlikely to be relevant to the mission.


This is one of those things that I would hope goes without saying, but obviously doesn't... Always test against what you expect to see in production. If you test against something else first (in this case mocking through an in-memory DB) to make the testing of other parts faster/easier then that is fine, but once those tests are done you still need to do a final full test against the real stack(s) you expect to see in production.

On the mocking thing: I thought the point of that was to completely avoid DB calls for speed when testing other logic layers, so even a fast in-memory DB isn't needed and anything (i.e. just some code that manufactures a response object) that produces a result set in the right format will do? In that case even using an in-memory DB is less efficient than the other option so is at best a "half optimisation". Am I missing a point here?


Yeah, when not hitting any DB, you're not testing the ORM's runtime behavior. Got bitten by that.

.NET Entity Framework has a problematic limitation: you can't check whether SQL queries will be built properly without a DB. An example situation is trying to use an unmapped .NET method in your SQL-targeting queries. That's a runtime error you can't (yet) test for without a DB.

I'm cautious enough to believe other ORMs may have similar quirks, only testable with something to query on.


If the in-memory DB has the same feature set as the real DB, using the in-memory DB for tests should be a single switch you flip, while writing code to generate fake result sets for every test requires writing all of that code. 90% of the benefits for 1% of the cost.


But they do not have the same feature sets. And PostgreSQL can trivially be tuned to work almost like an in-memory database.


I've always preferred this (using Django):

1. Local development is done with the simplest of everything (local memory cache, SQLite3 database, console-based email backend, local static file storage (vs. S3, etc.)). The result is that there is very little overhead and everything is easy to work on and test quickly. This also gives me the ability to quickly wipe things out locally (erase and re-make migrations for trivial changes, delete the database file, clear the cache more quickly by just restarting the Django server, etc.).

2. Final testing takes place on a staging server (Heroku), which is free, and which can contain all the production-level backing services that production will have. This server will be treated like production (no in-place hot-fixes and database wipes / restarts). Separate dev accounts will of course be used for things like APIs, email, etc.

3. Production (Heroku).

This gives me the best of both worlds; the simplicity of local development with simple backing services, with the comprehensiveness of a staging server with better parity.
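Roughly, step 1 looks like this in settings.py (a sketch of the idea, not my exact config):

    # dev settings: everything local and trivially wipeable
    DATABASES = {
        "default": {
            "ENGINE": "django.db.backends.sqlite3",
            "NAME": "dev.sqlite3",  # delete the file to reset
        }
    }
    CACHES = {
        "default": {"BACKEND": "django.core.cache.backends.locmem.LocMemCache"}
    }
    EMAIL_BACKEND = "django.core.mail.backends.console.EmailBackend"
    # staging/production settings override these with Postgres, S3, etc.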


This makes sense when you use an ORM that will transparently work. You trust the ORM to insulate you. If you have needed to go direct to SQL for something then you're looking at putting in test-specific code in your production code to simulate/fake/disable the custom SQL bit...

Just run postgres on a ramdisk, or on an SSD with sync disabled and a scripted setUp/tearDown.
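A sketch of the scripted setUp/tearDown, assuming the createdb/dropdb CLI tools are on PATH and local auth is configured:

    import subprocess
    import unittest
    import uuid

    class PgTestCase(unittest.TestCase):
        def setUp(self):
            # fresh, uniquely named database per test
            self.dbname = "test_" + uuid.uuid4().hex
            subprocess.check_call(["createdb", self.dbname])

        def tearDown(self):
            subprocess.check_call(["dropdb", self.dbname])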


To expand on this further, my primary objective is to be able to develop without regard for backing services for which I have a proper abstraction layer (e.g., ORM). In my opinion, it is a mistake to think you can mirror production in a local env, because, no matter what, there will be at least one difference that catches you off guard, not to mention the cost of maintaining a golden setup and teaching devs how to use it (even with the advent of Docker, etc.). If you decouple your application from its backing services by design, you will save many man-months down the road.


We are mostly a PostgreSQL shop, and on the last few projects I wanted to make it possible to use SQLite for development and testing, but I've found that it is simply not possible: not because there are things that work on SQLite and break badly on PostgreSQL, but the other way around. Just introducing a Boolean or DateTime column with server_default into your SQLAlchemy model means you cannot use SQLite.


Postgres is relatively lightweight. We use Postgres heavily in integration tests and it's quite fast. I don't see that SQLite would speed things up significantly, while it could cause other potential issues due to being a different database.


You just made me wonder whether disabling fsync for postgres would make your tests behave non-identically to a database that does fsync, and how much faster they would be. Interesting question...


You can get most of the benefit of disabling fsync by just disabling synchronous_commit. Disabling synchronous_commit means that fsyncs are still done, but we do not wait for them to complete.


I'm pretty sure the behavior would be identical except for durability.


I think so too. Okay, fsync, off you go for dev.


I use MySQL in my system, but the advice should apply to Postgres as well: keep your test database on a ramdisk. By moving my MySQL data to a ramdisk I got almost a tenfold improvement when running tests on a build server, and a smaller (but still significant) improvement when running tests on my development machine.


We have benchmarked this at our company and noticed that using a ramdisk does not give much extra performance over just turning off synchronous_commit in PostgreSQL. Most of the slowdown from the database turned out to be latency from waiting on the background writer to fsync.


Why do you think there was a difference between tests on a build server and on your development machine? Was it just a case of the build server being configured for performance?


I'm pretty sure that it's down to hard disk performance. I don't recall exact numbers but difference in IOPS between my development machine and a build server in the cloud (Azure) was greatly in favour of the former.


I tend to abstract away my queries using database views. This way the query in the application becomes very simple (select * from view where $simplepredicate).

In my test database (SQLite, HSQLDB, Derby, ...) I create actual tables containing test data with the same definition as the views.

This allows me to have rather complex queries that are database-specific and still have fast-running tests.

The views themselves I test separately. They are also easier to fix in production than code releases, and can be replaced by materialized views if necessary.

Inserts/updates are typically simple too.
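A toy sketch of the pattern (the view/table names are made up; the test side is shown with stdlib sqlite3):

    import sqlite3

    # In production (Postgres) the complex query lives in a view, e.g.:
    #   CREATE VIEW active_orders AS SELECT ... FROM orders JOIN ... WHERE ...;
    # In the test database, create a plain table with the same shape instead:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE active_orders (id INTEGER, customer TEXT, total REAL)")
    conn.execute("INSERT INTO active_orders VALUES (1, 'alice', 9.99)")

    # The application only ever runs the simple predicate query:
    rows = conn.execute(
        "SELECT * FROM active_orders WHERE customer = ?", ("alice",)
    ).fetchall()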


NASA rule of thumb I read somewhere.

"Test what you fly. Fly what you test."


A few notes:

- All the concerns listed in this article are addressed by ORMs.

- SQLite supports CTEs[1] (subqueries).

- It's safer to use the exact same setup in development and production, but it's slower for applications with many tests. It's a trade-off and that's all there is to say.

[1] http://stackoverflow.com/questions/21819183/how-to-use-ctes-...


ORMs are not a panacea; for most applications it makes sense to bypass the ORM for operations which translate well to SQL but become convoluted or slow when expressed using the ORM's API. ORMs can't hide all of the differences between database implementations anyway, not without hiding some of the functionality that you actually want. It's unfortunate, but unavoidable.


ORMs are written by people and sometimes have bugs. Also, they don't always return the same result for all database managers.

Both of these facts can bite you in the ass if you are not careful. It's great to run development tests on SQLite, but your CI environment, staging, or whatever you have before you push your changes to production should mimic production as much as possible.


Testing with a different database would only be good if you:

a) Also test with your real production database

b) Restrict yourself to using the lowest common feature denominator of both databases (which is probably a pretty low figure)

Still, different databases behave differently in many respects. And despite how cool SQLite is, PostgreSQL is so much more advanced (especially in SQL querying) that I don't see the point of losing all those features. As I have already mentioned on HN, check this amazing presentation: http://use-the-index-luke.com/blog/2015-02/modern-sql before deciding to restrict yourself to a subset of the SQL supported by PostgreSQL.

Given that it's easy to start PostgreSQL from a unit test, and how lightweight PostgreSQL is, I see no real point in using SQLite for testing. Use PostgreSQL!


On the project I work on, we face a similar problem. We use H2 for an in memory test database, but now we're frequently running into the problem where we'd love to use a specific postgres feature (say, json support) that isn't supported by H2.

That, and occasionally we find syntax differences that cause a headache when doing a database setup/teardown. A single minor SQL difference requires us to create separate H2/Postgres flyway configurations.

I think a better option than H2 or SQLite, that we're currently investigating, is using Docker to bring up a local postgres instance for testing.

(All of that assumes a certain dedication to using Postgres. If you want to be database agnostic, then you may in fact be better off not using Postgres in dev/test just to force yourself to remain compatible with other DBs.)


I agree. Bite the bullet and learn to work with Pg in your dev environment. The dev ergonomics aren't worth all the other stuff listed here. Think about all the time you'll spend reasoning about the differences between the two. This has nothing to do with your project.


In the age of containers this whole thing is baffling to me. I have a couple of things that need Postgres in production, and a single script which launches two docker containers with tmux running postgres and apache with live scrolling logs, and which binds the postgres port to my local host.

I simply cannot understand where the friction is today. Even pre-Docker it's fairly easy to set up any of the big DBMSs in their own little environments (5 versions of MySQL on the same server for a migration project).


Plus, with a few tweaks to Postgres relaxing ACID compliance (i.e. no sync), and using an SSD, or better a ramdisk, for storage, performance will be fantastic for unit tests.


Yeah, same here, I don't see how avoiding Pg (or other "real" RDBMS) makes things easier.


Starting Postgres locally for unit testing is quite easy and can be done with a few commands (e.g. in PowerShell):

https://gist.github.com/tobiasviehweger/cbfd9a1a55bff0862f9e


When I work with PostgreSQL heavily, I don't ever take a server down if I don't have to. I just have test harnesses create fresh test databases from templates.
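Something like this, with psycopg2 (the template and database names are made up):

    import psycopg2

    conn = psycopg2.connect(dbname="postgres")
    conn.autocommit = True  # CREATE DATABASE can't run inside a transaction
    cur = conn.cursor()
    cur.execute("CREATE DATABASE test_run_1 TEMPLATE app_template")
    # ... point the test harness at test_run_1, run the tests ...
    cur.execute("DROP DATABASE test_run_1")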


I've run Linux as my desktop for a long time, so I've always found it natural to set up my desktop as closely matched to the deployment server environment as possible. It's very simple to set up MySQL on Debian or Ubuntu, and only slightly more trouble on OSX.

Using SQLite would actually be more trouble, and I've never seen a need to use it at all for development. Any time I've tried to use SQLite, it's been pointless, as some of my queries written to run on MySQL or Postgres fail due to lack of support for some feature or another. And I already have plenty of MySQL and pg running on my systems, so...


We're using Vagrant for developers' environments (with mongodb/redis/rabbitmq/consul.io).

I can't recommend it enough. Maybe a local memory database could be faster... until your environment grows beyond just a db


Extending his argument further: strive to have a local environment that is as close to production as possible. Use Memcache in production? Use Memcache for dev/testing too! Use Elasticsearch in production? Use Elasticsearch in dev/testing!

This is all really simple to set up, too. Think about how a new developer gets up to speed and starts coding at your company. Does it involve downloading and installing Postgres.app[0]? If it does, it's no good. Starting a local environment should require a single command. `vagrant up` is one option if you use Vagrant. Local environments should ideally use the same method of provisioning as production servers. It can however be simpler: one of our codebases has a bootstrap.sh file that sets everything up, and it works surprisingly well. No version conflicts, no weird bugs due to slightly varying versions between developers, and no fake service such as SQLite.

For the life of me I can't understand the test speed issues that people talk about. We have a pretty small codebase with some 2k tests (94% coverage). That takes about 6-7 minutes to run inside a Vagrant VM using all the same services as production. 6-7 minutes is a long time, but you shouldn't be running your whole test suite during development. During development, only run the one or two tests that are relevant to the feature you are building or the bug you are fixing. These are typically really fast even with a proper database. The whole test suite should be run before pushing, and by CI. If your database is still too slow, look at using a ramdisk or tweaking the configuration like people have suggested in the comments.

0: http://postgresapp.com/


While I agree testing well is crucial, running your tests on a fast RDBMS most of the time and on the real one less frequently (think "before merging into production") is a good compromise. PostgreSQL is slower here than SQLite, and if your tests take 20 minutes to run, your developers won't test as often as you would like them to.

It also prevents you from doing things that only one RDBMS does (I was bitten by this because SQLite supports timestamps with timezone data and, at that time, the MySQL we used in production didn't), making your app more "robust" (you are using an ORM for a reason, right?).

Imagine you only test your app in PostgreSQL and, because of that, your app makes assumptions about the environment only PostgreSQL satisfies. You simply can't move to anything else without extensive rewriting. Now, when the workload changes, you can't just change your storage backend to one that suits your workload. You need to make PostgreSQL do whatever your new needs are. PostgreSQL is probably a good enough solution, but it's not the optimal solution for all problems.


Aside from a few seconds to spin up a new PostgreSQL instance, it should not be any slower than SQLite. We have a wrapper [1] (forked from [2]) that starts a new PostgreSQL instance from Java unit tests. We have the same for MySQL [3]. This allows every developer to automatically run tests against the target database without needing to set anything up.

(The common alternative in Java is to use H2 or Derby, which are similar in concept to SQLite)

1. https://github.com/airlift/testing-postgresql-server

2. https://github.com/opentable/otj-pg-embedded

3. https://github.com/airlift/testing-mysql-server


Hmm, I'm actually on PostgreSQL right now using H2, so this might come in useful. Thanks.


Just add the Maven dependency and see their own tests for an example:

https://github.com/airlift/testing-postgresql-server/blob/ma...

https://github.com/airlift/testing-mysql-server/blob/master/...

Though you want to use @BeforeClass / @AfterClass or similar rather than creating one for each test.


No, I will do this.

Because it's so much easier and it still lets me test 95% of my code. And the alternative is not testing at all because there is limited time for testing and setting up a proper database for this is so much more trouble.

The choices are not good test vs bad test. They are test-with-issues vs. no test.

(Obviously you have to do SOME testing with the real DB but this article is talking about unit tests (or related))


How is it so much trouble? Install postgres & any extensions you use on your dev/test boxes; this is a one-off cost that takes minutes.

Have your test bootstrapper run "createdb somethingunique" and then "export APP_DBCONN_STR='somethingunique'". Adding this to your bootstrapper will take minutes.


I've found running the tests in Docker works quite well.


If you're writing unit tests to test the business logic of your app, you shouldn't need a database at all. You should write your business logic so that it isn't dependent on a database, so you can really test the business logic and you don't have to mock the database.

If you're talking integration tests, then of course you should use an environment as close to production as possible.


For some apps, business logic depends on database functionality. For example, I rely on Postgres to prevent duplicate records, or delete cascades.
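For example, the kind of rule that lives in the schema itself (a sketch; the table names are made up):

    import psycopg2

    conn = psycopg2.connect(dbname="app_test")
    with conn, conn.cursor() as cur:
        cur.execute("""
            CREATE TABLE accounts (
                id    serial PRIMARY KEY,
                email text UNIQUE NOT NULL   -- duplicates rejected by the DB
            );
            CREATE TABLE sessions (
                id         serial PRIMARY KEY,
                account_id integer REFERENCES accounts(id) ON DELETE CASCADE
            );
        """)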

In any case, using SQLite when doing TDD, and testing with postgres when you are done implementing is an acceptable trade-off for most use cases.


Unfortunately it is not always possible to maintain a clean separation between "business logic" and "data access logic." Considerations such as performance, what is and isn't supported, or data access semantics often mean that changing one necessitates changing the other.

Attempting to separate out the two often means that you end up miscategorising business logic as data access logic (resulting in poor test coverage) or data access logic as business logic (resulting in poor performance and unnecessary complexity).


I agree entirely.

As soon as your code touches the database (or a mock of it), it loses most of its re-usability. It suddenly becomes much harder to use it elsewhere. In most cases, the code is then tied to that particular framework (like Django, or Flask), and you cannot move it out into a separate independent module. This becomes a problem in large projects and, for example, micro-service architectures.

It also mixes up the levels of abstraction. Getting and instantiating objects is mixed right (hard coupled) in with the code that uses them.


SQLite is awesome for rapid prototyping and proofs of concept work. But you should pretty much always move to a real RDBMS once you start nearing beta. There's some advantages to using SQLite early on when you're still figuring things out, but once you've nailed down the schema and queries and done some early validation testing, you pretty rapidly run out of advantages and it's just better to go with something like Postgres.

It doesn't really take a lot of discipline or work to switch over, and once you're entering beta candidate territory, that's when features and performance tuning start to take over and that's where something like Postgres starts to shine in comparison.

Even if you keep with the same schema and queries, just moving over to Postgres on a separate box, you'll probably start to see immediate performance improvement and you'll get better scaling performance almost immediately.


My two cents on this topic - as usual, the answer is neither black nor white. At the end of the day, an in-memory SQLite database works very nicely as a stub for most standard SQL queries as part of a unit test mocking system.

It allows you to create, populate, test, and teardown an entire relational DB in hundredths of a second, which makes it ideal for unit tests where you want to clean the slate between tests to ensure that you aren't accidentally creating silent dependencies on a database state created from a previous test.
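E.g. a fresh in-memory database per test with the stdlib (the schema here is a stand-in):

    import sqlite3
    import unittest

    SCHEMA = "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)"

    class DbTestCase(unittest.TestCase):
        def setUp(self):
            self.conn = sqlite3.connect(":memory:")  # brand-new DB per test
            self.conn.execute(SCHEMA)

        def tearDown(self):
            self.conn.close()  # the in-memory DB vanishes; no state leaks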

On the other hand, when you're done doing your Red-Green-Refactor cycle for a new feature, you want to immediately run integration tests, which will exercise your program against not only your production model database, but the other APIs you had mocked out for regular unit testing.

Mocking is good for iterative development, and SQLite is a great 80% tool for mocking.


He claims SQLite doesn't have CTEs, but I believe it actually does.



This is a new feature as of 2014. I hadn't looked at SQLite actively since before then.


It would be nice if you could embed it and run it in memory so it's as convenient.

And yes, you can do CTEs in SQLite, but that's beside the point.


I also started using SQLite for testing (through an ORM), but at some point it became too painful to maintain and it didn't find all of our bugs. It's not even any faster than Postgres.


This depends on what the other options are. If you have

* test with SQLite

* test with Postgres

then using the same environment as production (Postgres) is of course better, but if the real options are

* test with SQLite

* don't test

SQLite is much better!


My experience with testing on SQLite and MySQL when the customer uses MS SQL is very bad. JDBC only theoretically abstracts the real DB...


Since when does JDBC abstract the real DB in theory? All JDBC does is provide a common interface to execute a statement and receive result sets.


What about if you're using an ORM? Wouldn't that fix the issues mentioned?


Absolutely not! Heck, with Hibernate, for example, you can't write a simple inner join on a non-id value.

For example this query is invalid with HQL

    SELECT p FROM person p INNER JOIN Invitation i ON i.email LIKE p.email

And that is one of MANY gotchas. Hibernate has its advantages though, especially when it comes to developer productivity. But JDBC is needed for some edge cases.

I often use Groovy SQL instead of raw JDBC; it comes with excellent transaction support and helps simplify your DB-specific code, so it makes sense for newcomers too.


No. It's perfectly possible to create queries via an ORM which work with one database engine but fail in another. Whilst not SQLite and Postgres, I've managed to do this several times with Entity Framework and Oracle, trying to use CROSS APPLY while thinking it was connecting to MS SQL Server.


It's perfectly possible to create queries that work with one configuration of a database and fail with another configuration of the same database.


Unfortunately not.

An ORM may allow you to pass through queries or other directives that have DBMS-specific behaviours, so if you use any features like that, the ORM can't protect you at all. Furthermore, the ORM might change the way it talks to the underlying DB depending on what it is, to make use of efficiency-enhancing features that one DB has but others may not; again, you are not testing like-for-like in this case. The ORM itself may have bugs that are only apparent when exposed to a given DB.

An ORM often protects you from needing to know the specifics of the storage engine underneath, but you still need to test against the same storage engine(s) as you expect to see in production.


Good question, but not necessarily. I worked for a company that did this with SQL CE in the integration environment and normal SQL Server in production, with NHibernate as the ORM. It turns out there are some subtle differences that can creep through the ORM abstraction, even with the extensive dialect abstraction in NH.


I have experienced plenty of issues with MySQL, many of which are fixed by changing the configuration. Using a different database doesn't make sense, unless you are in the very early stages.


Sadly not, since there are many cases where databases act differently. For example, SQLite is dynamically typed while other databases enforce the types of columns. An ORM protects against some common cases, but not against all the subtle differences between databases.


Can we all go through and s/SQLLite/SQLite? The inconsistency in the article and the comments is overwhelming.


Seriously. How is someone who does not even know the correct name of the software supposed to know enough about said software to know whether it is similar enough to be a good test replacement?

I know this is a bit too much against the person, but here that is directly relevant to the topic at hand.

Not that he is totally wrong, but he is wrong about the universality of his argument. It depends on what Postgres-specific features the software is using and how the communication with the database is handled. If it is handwritten SQL, it is not a good idea regardless. If it is an ORM that maps the two very well and there is nothing Postgres-specific about the software, it is unlikely to be a problem.


Really??!!


I reckon it's okay to do this in the right circumstances:

- It's a relatively simple app

- You're using an ORM

- You aren't using any advanced SQL features

- It's only to make local development easier

- There's still a full CI test run with your production database

That said, if your test suite is large enough that database performance is an issue during testing then either your app is too complex for the above to apply, or you are probably doing something else wrong.


So the alternative is to mock out every database call or use a full database to run the tests. Using an in-memory database is convenient, keeps your tests portable, and most importantly it helps catch a lot of bugs that might have been missed with mocking. It might not be as good as using the real DB, but it's better than nothing.


PostgreSQL will be close enough to an in-memory database if you turn off synchronous_commit. For most test suites I would suspect much more time will be spent planning the queries (which can be improved with prepared statements) than on disk IO, if you just reduce the durability requirements of your test database.


Data directory on tmpfs is close enough to an in-memory database.


Some comments on the arguments:

(1) SQLite never claimed to be as "complete" as other databases -- it is and will remain a "lightweight" database.

(2) Everybody with marginal knowledge of different databases should know that using different databases always puts you at risk and needs extra testing. You would also not recommend developing your application on Linux, using a cross-compiler, and shipping the product on Windows untested. With that said, most of the arguments can be deleted.

(3) When you are using an ORM, most of the arguments are obsolete, too.


> When you are using an ORM, most of the arguments are obsolete, too.

At the cost of pretty much everything you get out of using something that isn't SQLite.


[deleted]


I don't know what you're talking about; is there even a downvote button on Hacker News? I certainly don't see one.

Some features that ORMs make unnecessarily difficult (or impossible without simply avoiding the use of the ORM) to use, off the top of my head:

* transactional DDL

* temporary tables

* user-defined data types

* asynchronous notifications

* dynamic SQL

* non-serial keys

* multi-attribute keys

* temporal schema

* window functions (really all manner of analytics queries)

Those were just the ones off of the top of my head, ones that are not "independent of SQL features."

Enjoy your flat-files, because that's actually what ORM gets you close to: a procedural abstraction inversion over a rich and capable declarative interface and first-order-logic engine.


To answer the downvote tidbit: there is, and it shows up when your karma is high enough. That threshold is not disclosed, however.


[deleted]


You wrote, "when you are using an ORM, most of the arguments [against using a different DBMS for testing than production] are obsolete, too," which is false, unless the point of the ORM is to bring the production DBMS down to the test DBMS's level of capability.


This is way too simplistic. Frequently testing with the real database wastes a tonne of time. I've worked with a team where running a single test against the true database took a minute, and there were 200 such tests.

Thankfully, the only guarantee they needed was that provided by a key-value store, so a ConcurrentHashMap was used on dev machines.

Then the true database was used by the CI server, and commits only occurred when all of the tests passed with the true db.

All of the OP's reasons would (in many cases) pale in comparison to 'My tests take hours to run'.


If every test takes a minute then the database is probably not the culprit. I would suspect the test suite does something really stupid which a faster database would not fix.

We have a slow, unoptimized test suite (it does some really stupid things) which hits the database in most test cases and it still manages to complete 6166 tests in 4 minutes on my laptop (this includes setting up and tearing down the database).


When I last worked with a large PostgreSQL codebase, it ran ~400 PGTap tests in about 3 minutes. A typical test included 10-20 queries for setup and assertions.



