Goodbye, CouchDB

May 10th, 2012 by Steven Hazel

Here at Sauce Labs, we recently celebrated the completion of a significant project to improve our service uptime and reliability, as we transitioned the last of our CouchDB databases to MySQL. We’d outgrown CouchDB, to the point that a majority of our unplanned downtime was due to CouchDB issues, so wrapping up this migration was an important milestone for us.

CouchDB was a very positive experience at first, and its reliability isn’t out of bounds for a database that is, after all, only on version 1.2. But our service is very sensitive to reliability issues: we strive to give our users 99.99% uptime for their Selenium testing. Ultimately we decided that this transition was the most efficient path forward for us.

Once we decided on MySQL (specifically, we’re now using Percona, with its InnoDB-based XtraDB storage engine), we rearchitected our DB abstraction layer and migrated all our databases, large and small, one by one. Our uptime improved dramatically over the past couple of months as we worked through the migration, and performance improved slightly in the bargain.

This post describes our experience using CouchDB, and where we ran into trouble. I’ll also talk about how this experience has affected our outlook on NoSQL overall, and how we designed our MySQL setup based on our familiarity with the positive tradeoffs that came with using a NoSQL database.

First, how did we get into trouble?

Everything Was Going to be Great

When we first started Sauce Labs back in 2008, we thought we were building something very different from the service we run today. We were excited to try a NoSQL db, having spent too many years using MySQL in ways that the designers of relational databases never imagined. CouchDB seemed well suited to our needs.

Our original product design featured a REST API for storing data on behalf of our customers, and the plan was to drop almost straight through to CouchDB’s already RESTful API. This let us get a prototype up and running in a hurry. It didn’t matter that CouchDB was new and not yet hardened by a lot of real-world usage, we thought, because our database I/O needs were meager, our app was naturally horizontally scalable, and our product was fault-tolerant. We could easily bridge any reliability gap just by keeping replicas and retrying things when they failed. What could go wrong?

What Could Go Wrong

As our little company grew, and we learned about the problems our customers faced, our product underwent several major changes. It made less sense over time to partition data so strictly by user. We came to rely more on database I/O performance. In general, we found ourselves using CouchDB very differently from how we’d originally imagined we would, and much more the way most web apps use databases. That was still a reasonable way to use CouchDB, but the margin of safety we thought we had when choosing it slowly evaporated as our product evolved. And, as it turned out, we needed that margin of safety.

Sauce Labs’ service is more sensitive to reliability issues than the average web app. If we fail a single request, that typically fails a customer’s test, and in turn their entire build. Over time, reliability issues with CouchDB became a serious problem. We threw hardware at it. We changed the way we used CouchDB. We changed our software to rely much less on the database and do much less database I/O. Finally, we decided the best next step was to switch.

Again, none of this speaks particularly badly about CouchDB. It’s a young database, and reliability and performance issues are to be expected. And in a way it’s too bad, because we have no love for classical relational databases. We’re convinced that NoSQL is the future. We’re just not convinced it’s the present.

Some Things We Really Liked about CouchDB

  • No schemas. This was wonderful. What are schemas even for? They just make things hard to change for no reason. Sometimes you do need to enforce constraints on your data, but schemas go way too far. With CouchDB, adding new fields to documents was simple and unproblematic.
  • Non-relational. Relational databases grew up solving problems where data integrity was paramount and availability was not a big concern. They have a lot of features that just don’t make sense as part of the database layer in the context of modern web apps. Transactional queries with 6-way joins are tempting at first, but just get you into trouble when you need to scale. Preventing them from day one is usually easy.
  • No SQL. It’s 2012, and most queries are run from code rather than by a human sitting at a console. Why are we still querying our databases by constructing strings of code in a language most closely related to freaking COBOL, which after being constructed have to be parsed for every single query?

    [Image: SQL in its natural habitat]

    Things like SQL injection attacks simply should not exist. They’re a consequence of thinking of your database API as a programming language instead of a protocol, and it’s just nuts that vulnerabilities still result from this poorly thought out 1970s design today.

  • HTTP API. Being able to query the DB from anything that could speak HTTP (or run curl) was handy.
  • Always-consistent, append-only file format. Doing DB backups just by copying files was simple and worry-free.
  • JavaScript as a view/query language was familiar and useful.
  • Indexes on arbitrary calculated values seemed like a potentially great feature. We never ran into a really brilliant way to use them, though it was straightforward to index users by email domain name.
  • Finally, it’s worth pointing out that even under stress that challenged its ability to run queries and maintain indexes, CouchDB never lost any of our data.
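The SQL injection point above can be made concrete with a small example. The following is a minimal sketch for illustration only: it uses Python’s built-in sqlite3 module as a stand-in database, and the users table and both function names are hypothetical. It contrasts a query assembled as a string with one where the value travels separately from the SQL text.

```python
import sqlite3


def find_user_unsafe(conn, email):
    # Hypothetical helper: builds the query by string concatenation,
    # so the input becomes part of the SQL text that gets parsed.
    query = "SELECT id, email FROM users WHERE email = '" + email + "'"
    return conn.execute(query).fetchall()


def find_user_bound(conn, email):
    # Hypothetical helper: the value is passed as a bound parameter,
    # so it is treated as data and never parsed as SQL.
    return conn.execute(
        "SELECT id, email FROM users WHERE email = ?", (email,)
    ).fetchall()


def demo():
    # In-memory database with two illustrative rows.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany(
        "INSERT INTO users (email) VALUES (?)",
        [("a@example.com",), ("b@example.org",)],
    )
    hostile = "nobody' OR '1'='1"
    return find_user_unsafe(conn, hostile), find_user_bound(conn, hostile)
```

With the hostile input, the concatenated version matches every row in the table, while the bound version matches none.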

The Problems We Encountered with CouchDB

Availability:

  • In our initial setup, slow disk performance made CouchDB periodically fail all running queries. Moving to a much faster RAID setup helped, but as load increased, the problems came back. By contrast, Percona is not breaking a sweat at this load level: our mysqld processes barely touch the CPU, we have hardly any slow queries, the cache is efficient enough that we’re barely doing disk reads, and our write load is a very comfortably small percentage of the capacity of our RAID 10 arrays.
  • Views sometimes lost their indexes and failed to reindex several times before finally working. Occasionally they’d get into a state in which they’d just reindex forever until we deleted the view file and restarted CouchDB. For our startup, this was agony. Surprise reindexing exercises were the last thing we needed as a small team already taking on a giant task list and fighting to impress potential big customers.
  • Broken views sometimes prevented all views from working until the poison view file was removed, at which point view indexing restarted its time-consuming and somewhat unreliable work. I don’t know how many times one of us was woken up by our monitoring systems at 4am to learn that our service was down because our database had suddenly become a simple key/value store without our permission.
  • Compaction sometimes silently failed, and occasionally left files behind that had to be removed to make it work again. This led to some scary situations before we tightened up our disk usage alarms, because we discovered this when we had very little space left in which to do the compaction.
  • In earlier versions, we ran into three or four different bugs relating to file handle usage. Bug reports led to quick fixes for these, and these problems were all gone by version 1.0.2.

Performance:

  • There’s really only one thing to say here, and that’s that view query performance in CouchDB wasn’t up to the level of performance we’d expect from roughly equivalent index-using queries in MySQL. This was not a huge surprise or a huge problem, but wow, a lot of things are quicker now, and our database machines are a lot less busy.

Maintenance headaches:

  • When CouchDB fails, it tends to fail all running queries. That includes replication and compaction, so we needed scripts to check on those processes and restart them when necessary.
  • View indexes are only updated when queried; insertion does not update the index. That means you have to write a script to periodically run all your views, unless you want them to be surprisingly slow when they haven’t been queried in a while. In practice we always preferred view availability to any performance boost obtained by not updating indexes on insertion, but writing reliable scripts to keep view indexes up to date was tricky.
  • The simple copying collector used for compaction can spend a lot of time looking at long-lived documents. That’s particularly bad news when a database has both long-lived and short-lived documents: compaction takes a long time, but is badly needed to keep disk usage under control. Plus, you have to run compaction yourself, and monitoring to make sure it’s working is non-trivial. Compaction should be automatic and generational.
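A view-warming script of the kind described above might look roughly like the following. This is an illustrative sketch, not the actual script: the function names, retry count, and timeout are assumptions. It relies only on CouchDB’s view URL layout (/db/_design/ddoc/_view/name) and the limit=0 query parameter, so that each request triggers an index update without returning rows.

```python
import urllib.request
from urllib.parse import quote


def view_url(base_url, db, design_doc, view):
    # Build the URL of a CouchDB view; limit=0 asks for no rows back.
    return "%s/%s/_design/%s/_view/%s?limit=0" % (
        base_url.rstrip("/"),
        quote(db, safe=""),
        quote(design_doc, safe=""),
        quote(view, safe=""),
    )


def http_get(url, timeout=600):
    # Assumed default: a generous timeout, since reindexing can be slow.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()


def warm_views(base_url, db, views, fetch=http_get, retries=3):
    """Query each (design_doc, view) pair; return the URLs that never succeeded."""
    failed = []
    for design_doc, view in views:
        url = view_url(base_url, db, design_doc, view)
        for _ in range(retries):
            try:
                fetch(url)
                break  # this view is warm, move on to the next one
            except OSError:
                continue  # network or HTTP error: try again
        else:
            # All attempts failed; report it so monitoring can alert.
            failed.append(url)
    return failed
```

Run from cron or a similar scheduler, a script like this keeps indexes fresh, and the returned list of failed URLs gives monitoring something to alert on. The fetch parameter exists so the retry logic can be exercised without a live CouchDB.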

Unfulfilled promise:

  • CouchDB’s design looks perfect for NoSQL staple features like automatic sharding, but this is not something it does.
  • What is the point of mapreduce queries that can only run on a single machine? We originally assumed this feature was headed toward distributed queries.
  • It was never clear to us what the CouchDB developers considered its core use cases. We saw development focus on being an all-in-one app server, and then on massive multi-direction replication for mobile apps. Both interesting ideas, but not relevant to our needs.

(We’re told that a few of these issues have already been addressed in the recently-released CouchDB 1.2.)

We were able to work with CouchDB’s performance, and over time we learned how to script our way around the maintenance headaches. And while we were worried that CouchDB seemed to be gravitating toward use cases very different from our own, it was the availability issues that eventually compelled us to switch. We talked about a number of possible choices and ultimately settled on a classic.

MySQL, the Original NoSQL Database

So why not switch to another document-oriented database like MongoDB, or another NoSQL database? We were tempted by MongoDB, but after doing some research and hearing a number of mixed reviews, we came to the conclusion that it’s affected by a lot of the same maturity issues that made CouchDB tough for us to work with. Other NoSQL databases tended to be just as different from CouchDB as MySQL — and therefore just as difficult to migrate to — and a lot less well known to us. Given that we had experience with MySQL and knew it was adequate for our needs, it was hard to justify any other choice.

We’re familiar with MySQL’s downsides: among other things, it’s terrible to configure (hint: the most important setting for performance is called innodb_buffer_pool_size), and its query engine, besides being SQL-oriented, guesses wrong about how to perform queries all the time. Experienced MySQL users expect to write a lot of FORCE INDEX clauses.

The InnoDB storage engine, on the other hand, is pretty great overall. It’s been hardened by heavy use at some of the biggest internet companies over the past decade, dealing with workloads that resemble those faced by most modern developers. At the lowest level, almost any database is built on the same fundamentals of B-trees, hashes, and caching as InnoDB. And with respect to those fundamentals, any new database will have to work very hard to beat it on raw performance and reliability in real-world use cases. But maybe they won’t all have to: Percona’s forward-thinking key/value interface is a good example of how the solid InnoDB storage engine might make its way into true NoSQL architectures.

In switching to MySQL, we treated it as much like a raw storage engine as we reasonably could. So now we’re back to using MySQL in the way that inspired so much NoSQL work in the first place:

  • We ported our CouchDB model layer to MySQL in a way that had relatively minor impacts on our codebase. From most model-using code, using MySQL looks exactly the same as using CouchDB did. Except it’s faster, and the DB basically never fails.
  • We don’t use foreign keys, or multi-statement transactions, or, so far, joins. When we need to horizontally scale, we’re ready to do it. (But it’ll be a while! Hardware has gotten more powerful since the days when sharding was invented, and these days you can go a long way with just a single write master.)
  • We have a TEXT column on all our tables that holds JSON, which our model layer silently treats the same as real columns for most purposes. The idea is the same as Rails’ ActiveRecord::Store. It’s not super well integrated with MySQL’s feature set — MySQL can’t really operate on those JSON fields at all — but it’s still a great idea that gets us close to the joy of schemaless DBs.
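The JSON-in-a-TEXT-column idea can be sketched in a few lines. This is a hedged illustration rather than our actual model layer: it uses Python’s built-in sqlite3 module as a stand-in for MySQL, and the jobs table, its columns, and the function names are all hypothetical. Fields that match real columns are stored in them, and everything else is serialized into the extra column and merged back in on load.

```python
import json
import sqlite3

# Hypothetical schema: two real columns plus a free-form JSON blob.
REAL_COLUMNS = ("owner", "status")


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs ("
        "id INTEGER PRIMARY KEY, owner TEXT, status TEXT, extra TEXT)"
    )
    return conn


def save_job(conn, fields):
    # Split the record into real columns and schemaless leftovers.
    real = [fields.get(name) for name in REAL_COLUMNS]
    extra = {k: v for k, v in fields.items() if k not in REAL_COLUMNS}
    cur = conn.execute(
        "INSERT INTO jobs (owner, status, extra) VALUES (?, ?, ?)",
        (*real, json.dumps(extra)),
    )
    return cur.lastrowid


def load_job(conn, job_id):
    row = conn.execute(
        "SELECT owner, status, extra FROM jobs WHERE id = ?", (job_id,)
    ).fetchone()
    if row is None:
        return None
    # Callers see one flat record, regardless of where each field lives.
    fields = dict(zip(REAL_COLUMNS, row[:2]))
    fields.update(json.loads(row[2]))
    return fields
```

Adding a new field then requires no schema change, at the cost noted above: the database cannot index or query inside the extra column, so anything that needs an index has to be promoted to a real column.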

It’s a nice combination of a proven, reliable database storage engine with an architecture on top of it that gives us a lot of the benefits of NoSQL databases. A couple months into working with this setup, we’re finding it pretty hard to argue with this best-of-both-worlds approach.

Comments

  1. UX-admin says:

    “And with respect to those fundamentals, any new database will have to work very hard to beat it on raw performance and reliability in real-world use cases.”

    You don’t know what you are writing about, plain and simple: MySQL has never been able to approach the performance of the Oracle RDBMS.

    And the fact that you picked MySQL, then NoSQL, then went to MySQL instead of Oracle or PostgreSQL tells me everything I need to know (and already suspected when I first started reading this essay).

    Another one bites the dust. You thought you could be smarter than ACID principles, and all those computer science doctors and scientists, and got busted. Maybe someday you’ll learn. And maybe they’re in the woods.

  2. The Gnome says:

    UX-admin, in what way is Oracle a “new” database?

    Also, why would a team want to move to PostgreSQL if it already had in-house expertise with MySQL?

    I think it’s time for you to stop taking whatever pills are making you so conceited.

  3. wow says:

    Mr. UX-admin was too biased to read the article the way it was written. “New database” means a 1.X database such as CouchDB or MongoDB, not Oracle RDBMS or PostgreSQL. The author indicated a reason for choosing MySQL was familiarity as well, and that makes for a quicker conversion that often runs with more stability.

    So, ux-admin or arrogant dumb-ass, I am not sure which handle best fits but choose one and apply it on your own, you obviously know all anyway.

  4. Jeff B. says:

    While I’m generally resistant to the whole NoSQL movement, I understand it has its uses and probably did for your business. It really seems that the problem is the maturity of the CouchDB engine and not the NoSQL concept in general. Unfortunately none of the other players in the NoSQL world are ready for prime time either, as you commented.

    To UX-admin, I should think that the cost vs. benefit of Oracle DBMS should make its non-consideration very obvious to you as it is to most of the rest of the non-enterprise world. While PostgreSQL may be a reasonable option, you really don’t explain why it would be a better fit than MySQL and install-base numbers aren’t in your favor.

    Please, elaborate on your position instead of dismissing people who disagree with you.

  5. Derek says:

    Not a popular or cheap solution, but Lotus Domino might have been another alternative to CouchDB.

  6. […] Goodbye, CouchDB. Steven Hazel shares his experience report with CouchDB. Like many relationships it all started great, but reliability, performance, and maintenance problems drove him into the arms of Percona MySQL. They use MySQL in NoSQL mode and in return they get better performance and a love that never fails. […]

  7. nah says:

    I agree with ux-admin on this one. Going from anything to mysql is a step backwards.

    “We already know it” does not make it the right choice for the job. Good engineers learn the tools needed to do the job right, not just how to hack together familiar pieces that they hope will work.

    In the free RDBMS space Postgres performs better, scales easier, and behaves far more consistently than mysql can ever hope. Percona and Xtradb are nice, but inadequate compensation for all the other hack idiocy in mysql. It is truly the PHP of databases (and its sponsor keeps it that way). That more people don’t see this and switch to Postgres is tragic.

    Outside the RDBMS space there are lots of great projects not even mentioned here, like Cassandra (on one end of the scale) and Redis (solving a different problem on the other end). Our architecture uses all three in various places (Cassandra, Postgres, and Redis).

    I don’t know enough about your needs to evaluate your persistence architecture, but mysql … well, best of luck. You’re going to need it (remember this in a few years).

  8. Frank LaRosa says:

    There’s a lot of truth to the idea that true high performance stems from good application design more than from any decision related to hardware or database products.

    Ultimately, the best applications in the world today, in terms of performance and scalability, are applications that take direct responsibility for their own data storage rather than farm it off to a database product. Meaning that the application is aware of scaling and sharding and its developers are forced to deal with those issues every time they interact with the storage engine.

    If you take such an approach, it hardly matters whether the ultimate storage is done with MySQL or Couch or Mongo or something else. True, SQL is philosophically designed to abstract storage details from the developer, but you can ignore that abstraction by eschewing things like foreign keys and joins as the developers have done here.

    Oracle may well outperform MySQL on a per-machine basis. As a developer working on a scalable solution, I shouldn’t depend on that to make or break my product. If I design something truly robust, and truly scalable, then I ought to be able to do what I need simply by increasing the number of nodes without regard to the performance of any individual machine. That is the mark of a successful high-scalability solution.

    Frank

  9. Chris Johnson says:

    The first 3 bullet points about why they liked CouchDB via criticizing relational databases show just how little Sauce Labs really knows about relational databases, and their general lack of maturity and wisdom in the field of computer science.

    “No schemas. This was wonderful. What are schemas even for? They just make things hard to change for no reason.” Right. I’m sure that all the work that Edgar F. Codd did to invent relational database schemas was just to make them hard to change, to piss off the Sauce Labs crew some 40 years later. You noobs obviously know nothing about relational calculus. http://en.wikipedia.org/wiki/Codd%27s_theorem

    “Non-relational. Relational databases grew up solving problems where data integrity was paramount and availability was not a big concern. They have a lot of features that just don’t make sense as part of the database layer in the context of modern web apps.” More of the same ignorance. Yeah, modern web apps don’t need stuff like ACID data persistence. Who cares if we lose the data, or if the data we get back is wrong but without any indication of error?

    The fact that little MySQL makes your data way more “available” than CouchDB could seems to make that statement even more ludicrous.

    “No SQL. It’s 2012, and most queries are run from code rather than by a human sitting at a console. Why are we still querying our databases by constructing strings of code in a language most closely related to freaking COBOL, which after being constructed have to be parsed for every single query?” Guess you clueless beginners have not heard of prepared statements — parsed once. And you conflate SQL with relational database — since there are relational databases that do not use standard SQL. Although why you’d not use one just other than your irrational fear of “Teh SQL!” it’s hard to imagine. Better start coding in binary machine language, because all that text you write as code is “strings of code in a language most closely related to freaking [some older language]”.

    N00bs. Punks. Come back and make this argument in 20 years.

  10. Kalpesh says:

    We recently went through the same pain you guys experienced, except we were using Berkeley DB instead of CouchDB. We are also using Cassandra, but we are facing similar issues with it and it will soon be replaced with MySQL. I thought it was worth linking our experience with yours to spread the MySQL success: http://neopatel.blogspot.com/2012/04/from-nosql-to-mysql-bdb-to-mysql-part1.html

  11. ChrisM says:

    @Kalpesh – please for the love of God and all that is holy take a long and proper look at Postgres. I used to be a MySQL devotee and found Postgres hard to set up and get into to begin with, but a few years ago we started to run into huge scalability issues with MySQL. This forced our entire business to reevaluate PostgreSQL and we haven’t looked back since.

    It has so many powerful features that you just do not get with MySQL, and we have found performance to be an order of magnitude greater once you start doing more complex work.

    For me the most telling statement in the rather ill-informed Saucelabs writeup is that in MySQL it’s par for the course to have to worry about FORCE INDEX statements. In Postgres you can simply let the planner get on with its job and it all just works.

    As others have stated SQL isn’t just some cumbersome mistake that’s there to get in your way. It provides a rich and powerful environment in which to work with your data. With powerful features such as windowing functions and WITH queries (which even let you wrap INSERTs, DELETEs, UPDATEs and SELECTs into single compound queries requiring a single round trip to the database eliminating some race conditions) you can access, query and manipulate your data in ways that the NoSQL crowd can only dream of.

    CHECK constraints and triggers can be used to ensure your data is always integral and internally consistent. This isn’t just a nice-to-have; it is vitally important as you scale and end up with multiple developers working on the same system and/or have multiple programs inserting and manipulating data. Knowing that bugs cannot lead to garbage data (at least structurally) gives amazing peace of mind and leads to fewer application level checks, also reducing the error handling that is required.

    Finally please do not listen to anyone who says that an application should store its own data and be directly responsible for storage performance. The application should make best use of the established tools that are available rather than reinvent the wheel. Writing a data storage engine that is more reliable, efficient, powerful or flexible than any of the SQL databases out there (or even the NoSQL databases) is a seriously non-trivial problem and is almost guaranteed to lead to data loss at one time or another as well as sucking up vital resources and massively delaying your project’s time to market. Scalability is almost always a nice problem to have; worry about building the best application you can first and resort to drastic action to scale your application only once you have exhausted all your other options.

  12. bn says:

    Lumping all NoSql solutions under one umbrella is missing the point. It’s not Sql vs NoSql, it’s Sql vs. key/value stores, vs document stores vs column stores vs structure servers…

    There’s at least as much difference between Neo4j and Redis as there is between Redis and MySql.

    You have to look at your problem space before deciding on a database technology. It’s pretty obvious from Sauce Labs’ statements on how they use MySql that they don’t need a relational db at all. They needed a reliable document store that scales without being I/O bound. They won’t run into any of the issues that cause most people to look at NoSql over relational – i.e. range based, ad-hoc or aggregate queries over very large datasets. These are some of the primary driving forces behind map/reduce and NoSql. They could probably get by with a really large SAN, fronted by Redis to hold indexes. Or if they can do hosted – go directly to DynamoDb.

    Seems to me the issue in Sauce Labs’ case was mischaracterization of the problem to be solved and selecting a solution without defining the requirements.

    I build real time data collection systems that collect a LOT of data and I find the best way to come to a decision about data storage is to define the requirements of my application with no storage in the picture at all. Once I do that it becomes obvious what I need to solve the problem.

  13. […] doesn’t want). If you’re saying “Buddy, if you’re going to write a scalable service, you’re going to use NoSQL,” let me point you over here. Other reasons why new technologies lose from the start […]

  14. John says:

    Nice article.

    Did you consider Postgres with hstore, or Redis? Would be interesting to hear what the pros and cons of these solutions were for you.

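  15. […] Sauce Labs once enthusiastically migrated its application to the CouchDB database, but has now migrated from CouchDB to the traditional MySQL database. They even published an article about it on their official blog: Goodbye, CouchDB! […]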

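  16. […] Recently Sauce Lab, a company that provides Selenium testing in the cloud, published a post on its official blog titled “Goodbye, CouchDB”. Drawing on the case of its own cloud platform, the post describes in detail why the company originally chose CouchDB and why it has now switched to MySQL. At a time when NoSQL is all the rage, why is Sauce Lab saying goodbye to NoSQL and returning to the embrace of a traditional relational database? […]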

  17. Thank you for all you have done to improve the efficiency of using.

  18. […] overhaul to the database underpinning the Sauce OnDemand service. We recently blogged about that here—well worth the read. Since rolling out the upgrade, customers have enjoyed in excess of 99.95% […]

  19. Again, I’d recommend looking carefully at PostgreSQL 9.2 here. One of the most important things coming in that release is the native JSON type, and PostgreSQL also has relatively mature XML support. This gives you the ability to write your software in such a way that you almost never touch SQL from your app code, and you can use schemaless storage with hstore and json (well, technically it’s a field in a schema that is largely free-form but checked as a key/value store or, in the case of json, as valid JSON). You can also write stored procedures in JavaScript to manipulate these values and even index the result.

  20. […] all available resources to re-architecting the Sauce cloud service infrastructure.  In May we wrote in detail about our experience.  Since then, Sauce users have enjoyed nearly four-nines (99.99%) availability […]

  21. […] and nural programming to name a few. Is NoSql a good choice for large scale implementations? http://sauceio.com/index.php/2012/05/goodbye-couchdb/ If planning a LargeScale application the includes mobile is Nsql a safe choice for […]

  22. Ryan Weiss says:

    Thanks for writing this outline Steven. I’ve been debating for weeks on which NoSQL or MySQL solution to choose for certain projects. I actually didn’t even know about Percona server and its ability to handle NoSQL “queries” (bypassing a FEW layers of the RDBMS and increasing performance) while still handling regular SQL queries. The benchmarks blow everything else out of the water, and it’s stable. It seems like a no-brainer, so THANK YOU for enlightening me to its existence.
