When we suggested to our bosses SQLite as a production solution for our large-scale servers, they thought we were kidding.

Could SQLite be appropriate for a high-throughput server (up to a billion requests a day) in a production environment?

We believed the answer is yes, if only for the specific use case we needed to solve.

Background

SQLite is generally not considered suitable for a full-blown production environment. Being light-weight, easy to utilize (no installation needed), to create, and to modify makes it wonderful for development and testing. But when scaling to production its virtues become its hindrances:
  • Its light-weight implementation is naive, so it does not handle high-speed locking well
  • It is not scalable
  • It is easily corruptible
Actually, sqlite.org itself lists when SQLite might not be appropriate, which covers most of big scale production use cases.

A production use-case where SQLite might still make sense

So, what were we thinking? What use case are we trying to solve which made us think of SQLite?

My colleague, Ariel, was the one who originally suggested this idea,  to make a local relational cache solution.

The problem that we are trying to solve is of an Ad Server – it gets a lot of requests, and is supposed to choose the best available ad to serve for the requesting user. It is a well-worn problem, where the server needs to get all the available ads, filter out all the ones which do not fit the request (the wrong size, not meant for the user’s locale, or language, etc.), sort them according to our ‘secret sauce’ and serve the best ads.

The solutions are also well-worn – the most naive is to connect to the backing repository (most likely a relational database cluster like Postgresql) each time, with a big query returning the relevant ads, and then sorting them in code. To prevent an internal network bottle-neck or overwhelming the repository itself, a caching layer is in order – like a local memcache or Redis service, which will retain for a timed period a snapshot of the available ads, and alleviate the stress on the backing repository.

Unfortunately, adding the caching layer, you forgo the relational advantages of the backing repository, and you need to unmarshal a large chunk of the cached data with each request, and filter it in code.

Would SQLite be able to be a stand-in for the classic cache layers, while retaining the relational advantages of the mothership?

It might, but how about its shortcomings? What assumptions do we need that will avoid the known problems of SQLite in high-stress production environment?

Well, the main assumption which makes this solution theoretically possible is that our SQLite database is totally read-only. Our server code should never change it. This would solve any locking problems, as there are no read locks. We could find nowhere on the internet anyone saying there is a problem in high-throughput reading of SQLite when there are no writes – it could be possible!

The other two shortcomings I listed above are also not critical for a caching solution since:

  • caching is local, so horizontal scale is not an issue
  • the order of magnitude of our data is capped to around 10s-100s of thousands of entries (rather than millions – what user data would amount to), so vertical scale is also not an issue
  • caching only holds data gleaned from other sources, so data corruption never results in data loss – simply rebuild it from the master data store!

Benchmarks

So, after the management laughed itself out, they wanted numbers – show us how well your solution works under stress, compared to the known solutions!
We created a Sinatra (ruby) solution which caches production data using SQLite, and then compared it to a similar solution using Redis, using siege to stress the two systems and then compared the numbers:
Throughput (rpm) Response time (ms)
SQLite 1800 6
Redis 250 150

Wow! This is a bit one-sided, isn’t it? How do we explain this?

Well, our data contained a couple of thousands of available ads, each having up to a couple of hundreds of creatives (each of a different size, language or design).

In our Redis solution we provisioned the cache according to a few facets (locale and provider, for example) to avoid going through the whole database for each request, but it still called for a big chunk of data being read from the cache each request. Since Redis objects are actually a text (or JSON) representation of our objects, the server was busy unmarshaling the data with each request resulting in a CPU bottleneck, which restricted the response time, as well as the throughput.

In SQLite this is not a problem, since the filters happened on the indices, before any data was unmarshaled, ultimately loading only a much smaller data-set, which is much more optimized to the current scenario.

To prove that point – constraining the number of creatives per ad to 10 improved the Redis response time to 25ms.

What next?

So we proved that reading from a local SQLite database could be done really fast and in high throughput, how would that translate to an end-to-end production solution?
Also, it’s not like we didn’t encounter any problems in our solution…

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s