Goodbye, CouchDB

May 10th, 2012 by Steven Hazel

Here at Sauce Labs, we recently celebrated the completion of a significant project to improve our service uptime and reliability, as we transitioned the last of our CouchDB databases to MySQL. We’d outgrown CouchDB, to the point that a majority of our unplanned downtime was due to CouchDB issues, so wrapping up this migration was an important milestone for us.

CouchDB was a very positive experience at first, and its reliability isn’t out of bounds for a database that is, after all, only on version 1.2. But our service is very sensitive to reliability issues: we strive to give our users 99.99% uptime for their Selenium testing. Ultimately, we decided that this transition was the most efficient path forward for us.

Once we decided on MySQL (specifically, we’re now using Percona, with its InnoDB-based XtraDB storage engine), we rearchitected our DB abstraction layer and one by one migrated all our databases, large and small. Our uptime improved dramatically over the past couple of months as we worked through the migration, and performance improved slightly in the bargain.

This post describes our experience using CouchDB, and where we ran into trouble. I’ll also talk about how this experience has affected our outlook on NoSQL overall, and how we designed our MySQL setup based on our familiarity with the positive tradeoffs that came with using a NoSQL database.

First, how did we get into trouble?

Everything Was Going to be Great

When we first started Sauce Labs back in 2008, we thought we were building something very different from the service we run today. We were excited to try a NoSQL db, having spent too many years using MySQL in ways that the designers of relational databases never imagined. CouchDB seemed well suited to our needs.

Our original product design featured a REST API for storing data on behalf of our customers, and the plan was to drop almost straight through to CouchDB’s already RESTful API. This let us get a prototype up and running in a hurry. It didn’t matter that CouchDB was new and not yet hardened by a lot of real-world usage, we thought, because our database I/O needs were meager, our app was naturally horizontally scalable, and our product was fault-tolerant. We could easily bridge any reliability gap just by keeping replicas and retrying things when they failed. What could go wrong?

What Could Go Wrong

As our little company grew, and we learned about the problems our customers faced, our product underwent several major changes. It made less sense over time to partition data so strictly by user. We came to rely more on database I/O performance. In general, we found ourselves using CouchDB very differently from how we’d originally imagined we would, and much more the way most web apps use databases. That was still a reasonable way to use CouchDB, but the margin of safety we thought we had when choosing it slowly evaporated as our product evolved. And, as it turned out, we needed that margin of safety.

Sauce Labs’ service is more sensitive to reliability issues than the average web app. If we fail a single request, that typically fails a customer’s test, and in turn their entire build. Over time, CouchDB’s reliability issues became a serious problem. We threw hardware at it. We changed the way we used CouchDB. We changed our software to rely much less on the database and do much less database I/O. Finally, we decided the best next step was to switch.

Again, none of this speaks particularly badly about CouchDB. It’s a young database, and reliability and performance issues are to be expected. And in a way it’s too bad, because we have no love for classical relational databases. We’re convinced that NoSQL is the future. We’re just not convinced it’s the present.

Some Things We Really Liked about CouchDB

  • No schemas. This was wonderful. What are schemas even for? They just make things hard to change for no reason. Sometimes you do need to enforce constraints on your data, but schemas go way too far. With CouchDB, adding new fields to documents was simple and unproblematic.
  • Non-relational. Relational databases grew up solving problems where data integrity was paramount and availability was not a big concern. They have a lot of features that just don’t make sense as part of the database layer in the context of modern web apps. Transactional queries with 6-way joins are tempting at first, but just get you into trouble when you need to scale. Preventing them from day one is usually easy.
  • No SQL. It’s 2012, and most queries are run from code rather than by a human sitting at a console. Why are we still querying our databases by constructing strings of code in a language most closely related to freaking COBOL, which after being constructed have to be parsed for every single query?

    [Image: SQL in its natural habitat]

    Things like SQL injection attacks simply should not exist. They’re a consequence of thinking of your database API as a programming language instead of a protocol, and it’s just nuts that vulnerabilities still result from this poorly thought out 1970s design today.

  • HTTP API. Being able to query the DB from anything that could speak HTTP (or run curl) was handy.
  • Always-consistent, append-only file format. Doing DB backups just by copying files was simple and worry-free.
  • JavaScript as a view/query language was familiar and useful.
  • Indexes on arbitrary calculated values seemed like a potentially great feature. We never ran into a really brilliant way to use them, though it was straightforward to index users by email domain name.
  • Finally, it’s worth pointing out that even under stress that challenged its ability to run queries and maintain indexes, CouchDB never lost any of our data.
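As a rough illustration of the calculated-value index mentioned above (users by email domain), here is a Python analogue of what a CouchDB map function does. CouchDB's own map functions are written in JavaScript; the document shape (a `type` field and an `email` field) and the function names are assumptions made for this sketch, not our actual schema.

```python
from collections import defaultdict

def email_domain_key(doc):
    """Return the calculated index key for a user document, or None to skip it."""
    email = doc.get("email")
    if doc.get("type") != "user" or not email or "@" not in email:
        return None
    # Index on the part after the last "@", lowercased.
    return email.rsplit("@", 1)[1].lower()

def build_index(docs):
    """Group document ids by calculated key, as a view index conceptually does."""
    index = defaultdict(list)
    for doc in docs:
        key = email_domain_key(doc)
        if key is not None:
            index[key].append(doc["_id"])
    return dict(index)

# Hypothetical documents, including one that is not a user and gets skipped.
docs = [
    {"_id": "u1", "type": "user", "email": "ann@Example.com"},
    {"_id": "u2", "type": "user", "email": "bob@example.com"},
    {"_id": "u3", "type": "user", "email": "cat@other.org"},
    {"_id": "j1", "type": "job"},
]
index = build_index(docs)
print(index["example.com"])
```

The key is computed from the document rather than stored in it, which is the whole appeal: no extra field has to be kept in sync with the email address.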

The Problems We Encountered with CouchDB

Availability:

  • In our initial setup, slow disk performance made CouchDB periodically fail all running queries. Moving to a much faster RAID setup helped, but as load increased, the problems came back. By contrast, Percona is not breaking a sweat at this load level: our mysqld processes barely touch the CPU, we have hardly any slow queries, the cache is efficient enough that we’re barely doing disk reads, and our write load is a very comfortably small percentage of the capacity of our RAID 10 arrays.
  • Views sometimes lost their indexes and failed to reindex several times before finally working. Occasionally they’d get into a state in which they’d just reindex forever until we deleted the view file and restarted CouchDB. For our startup, this was agony. Surprise reindexing exercises were the last thing we needed as a small team already taking on a giant task list and fighting to impress potential big customers.
  • Broken views sometimes prevented all views from working until the poison view file was removed, at which point view indexing restarted its time-consuming and somewhat unreliable work. I don’t know how many times one of us was woken up by our monitoring systems at 4am to learn that our service was down because our database had suddenly become a simple key/value store without our permission.
  • Compaction sometimes silently failed, and occasionally left files behind that had to be removed to make it work again. This led to some scary situations before we tightened up our disk usage alarms, because we discovered this when we had very little space left in which to do the compaction.
  • In earlier versions, we ran into three or four different bugs relating to file handle usage. Bug reports led to quick fixes for these, and these problems were all gone by version 1.0.2.
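The disk-space scare described above suggests a simple alarm: warn while there is still room to compact. This is a minimal sketch, assuming compaction may need about as much free space as the database file it rewrites, since it copies live data into a new file. The function names and the 1.2 safety factor are hypothetical, not our production monitoring.

```python
import os
import shutil

def has_compaction_headroom(db_file_size, free_bytes, safety_factor=1.2):
    """True if free space covers a worst-case compaction of the database file.

    Worst case assumed here: the compacted copy is as large as the original.
    """
    return free_bytes >= db_file_size * safety_factor

def check_database(path):
    """Check a database file on disk against the free space on its volume."""
    size = os.path.getsize(path)
    free = shutil.disk_usage(os.path.dirname(os.path.abspath(path))).free
    return has_compaction_headroom(size, free)

# 40 GB database with 30 GB free: too late to compact safely.
print(has_compaction_headroom(40 * 10**9, 30 * 10**9))
# 40 GB database with 60 GB free: still fine.
print(has_compaction_headroom(40 * 10**9, 60 * 10**9))
```

An alarm of this kind fires on the ratio between database size and free space, not on a fixed "disk 90% full" threshold, which is the threshold that would have caught our situation too late.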

Performance:

  • There’s really only one thing to say here, and that’s that view query performance in CouchDB wasn’t up to the level of performance we’d expect from roughly equivalent index-using queries in MySQL. This was not a huge surprise or a huge problem, but wow, a lot of things are quicker now, and our database machines are a lot less busy.

Maintenance headaches:

  • When CouchDB fails, it tends to fail all running queries. That includes replication and compaction, so we needed scripts to check on those processes and restart them when necessary.
  • View indexes are only updated when queried; insertion does not update the index. That means you have to write a script to periodically run all your views, unless you want them to be surprisingly slow when they haven’t been queried in a while. In practice we always preferred view availability to any performance boost obtained by not updating indexes on insertion, but writing reliable scripts to keep view indexes up to date was tricky.
  • The simple copying collector used for compaction can spend a lot of time looking at long-lived documents. That’s particularly bad news when a database has both long-lived and short-lived documents: compaction takes a long time, but is badly needed to keep disk usage under control. Plus, you have to run compaction yourself, and monitoring to make sure it’s working is non-trivial. Compaction should be automatic and generational.
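The view-refresh scripts mentioned above can be sketched roughly as follows. This assumes CouchDB's standard view URL layout (`/db/_design/ddoc/_view/name`) and uses `limit=0` so that the query forces an index update without returning rows. The host, database, and view names are placeholders, and this is not the script we ran in production.

```python
import urllib.request
from urllib.parse import quote

def view_url(base, db, design_doc, view):
    """Build a URL that queries a view but asks for zero rows."""
    return "{}/{}/_design/{}/_view/{}?limit=0".format(
        base.rstrip("/"), quote(db, safe=""), quote(design_doc, safe=""),
        quote(view, safe=""))

def warm_views(base, views, fetch=None, timeout=600):
    """Query each (db, design_doc, view) once; return a list of failures."""
    if fetch is None:
        def fetch(url):
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                resp.read()
    failures = []
    for db, design_doc, view in views:
        url = view_url(base, db, design_doc, view)
        try:
            fetch(url)
        except Exception as exc:
            # Keep going so one broken view doesn't block the rest.
            failures.append((url, str(exc)))
    return failures

# Dry run with a stand-in fetch function, so nothing touches the network.
seen = []
failures = warm_views("http://localhost:5984",
                      [("users", "by_field", "email_domain")],
                      fetch=seen.append)
print(seen[0])
```

In real use the default `fetch` would be left in place and the script run from cron. The long timeout is deliberate: a view that has fallen far behind can take a while to answer, and that slow answer is exactly the work the script exists to absorb.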

Unfulfilled promise:

  • CouchDB’s design looks perfect for NoSQL staple features like automatic sharding, but this is not something it does.
  • What is the point of mapreduce queries that can only run on a single machine? We originally assumed this feature was headed toward distributed queries.
  • It was never clear to us what the CouchDB developers considered its core use cases. We saw development focus on being an all-in-one app server, and then on massive multi-direction replication for mobile apps. Both interesting ideas, but not relevant to our needs.

(We’re told that a few of these issues have already been addressed in the recently-released CouchDB 1.2.)

We were able to work with CouchDB’s performance, and over time we learned how to script our way around the maintenance headaches. And while we were worried that CouchDB seemed to be gravitating toward use cases very different from our own, it was the availability issues that eventually compelled us to switch. We talked about a number of possible choices and ultimately settled on a classic.

MySQL, the Original NoSQL Database

So why not switch to another document-oriented database like MongoDB, or another NoSQL database? We were tempted by MongoDB, but after doing some research and hearing a number of mixed reviews, we came to the conclusion that it’s affected by a lot of the same maturity issues that made CouchDB tough for us to work with. Other NoSQL databases tended to be just as different from CouchDB as MySQL — and therefore just as difficult to migrate to — and a lot less well known to us. Given that we had experience with MySQL and knew it was adequate for our needs, it was hard to justify any other choice.

We’re familiar with MySQL’s downsides: among other things, it’s terrible to configure (hint: the most important setting for performance is called innodb_buffer_pool_size), and its query engine, besides being SQL-oriented, guesses wrong about how to perform queries all the time. Experienced MySQL users expect to write a lot of FORCE INDEX clauses.

The InnoDB storage engine, on the other hand, is pretty great overall. It’s been hardened by heavy use at some of the biggest internet companies over the past decade, dealing with workloads that resemble those faced by most modern developers. At the lowest level, almost any database is built on the same fundamentals of B-trees, hashes, and caching as InnoDB. And with respect to those fundamentals, any new database will have to work very hard to beat it on raw performance and reliability in real-world use cases. But maybe they won’t all have to: Percona’s forward-thinking key/value interface is a good example of how the solid InnoDB storage engine might make its way into true NoSQL architectures.

In switching to MySQL, we treated it as much like a raw storage engine as we reasonably could. So now we’re back to using MySQL in the way that inspired so much NoSQL work in the first place:

  • We ported our CouchDB model layer to MySQL in a way that had relatively minor impacts on our codebase. From most model-using code, using MySQL looks exactly the same as using CouchDB did. Except it’s faster, and the DB basically never fails.
  • We don’t use foreign keys, or multi-statement transactions, or, so far, joins. When we need to horizontally scale, we’re ready to do it. (But it’ll be a while! Hardware has gotten more powerful since the days when sharding was invented, and these days you can go a long way with just a single write master.)
  • We have a TEXT column on all our tables that holds JSON, which our model layer silently treats the same as real columns for most purposes. The idea is the same as Rails’ ActiveRecord::Store. It’s not super well integrated with MySQL’s feature set — MySQL can’t really operate on those JSON fields at all — but it’s still a great idea that gets us close to the joy of schemaless DBs.
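The JSON-column idea can be sketched as below. This is a minimal illustration, not our model layer: it uses Python's built-in `sqlite3` as a stand-in for MySQL so that it runs anywhere, and the `jobs` table, the `extra` column, and the `Job` class are hypothetical names.

```python
import json
import sqlite3

REAL_COLUMNS = ("id", "owner", "status")  # columns the database knows about

class Job:
    """Model whose unknown attributes live in a JSON blob in the `extra` column."""

    def __init__(self, **fields):
        self.fields = dict(fields)

    def __getattr__(self, name):
        # Only called when normal lookup fails; real and JSON fields look the same.
        fields = self.__dict__.get("fields", {})
        if name in fields:
            return fields[name]
        raise AttributeError(name)

    def save(self, conn):
        real = [self.fields.get(c) for c in REAL_COLUMNS]
        extra = {k: v for k, v in self.fields.items() if k not in REAL_COLUMNS}
        conn.execute(
            "REPLACE INTO jobs (id, owner, status, extra) VALUES (?, ?, ?, ?)",
            real + [json.dumps(extra)])

    @classmethod
    def load(cls, conn, job_id):
        row = conn.execute(
            "SELECT id, owner, status, extra FROM jobs WHERE id = ?",
            (job_id,)).fetchone()
        if row is None:
            return None
        fields = dict(zip(REAL_COLUMNS, row[:3]))
        fields.update(json.loads(row[3]))
        return cls(**fields)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id TEXT PRIMARY KEY, owner TEXT, status TEXT, extra TEXT)")

# `browser` has no column of its own, so adding it needs no schema change.
Job(id="j1", owner="ann", status="done", browser="firefox").save(conn)
job = Job.load(conn, "j1")
print(job.status, job.browser)
```

The tradeoff is the one noted above: the database cannot filter or index on `browser`, so anything that needs to be queried has to be promoted to a real column.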

It’s a nice combination of a proven, reliable database storage engine with an architecture on top of it that gives us a lot of the benefits of NoSQL databases. A couple months into working with this setup, we’re finding it pretty hard to argue with this best-of-both-worlds approach.

Comments

  1. Jason says:

    So why did you go back to MySQL instead of moving to PostgreSQL? HSTORE and the builtin JSON data type with 9.2, which are all indexable through GIN, are a fantastic NoSQL solution, and fully queryable.

  2. Ben Atkin says:

    Nice article. I too like CouchDB but find myself using SQL. I noticed that you wrote about how you considered using MongoDB but decided against it. Was PostgreSQL ever in the running? I think HStore would be useful when migrating away from a document-oriented NoSQL database.

  3. Mike Curry says:

    We almost made the mistake of using CouchDB at the beginning, we dodged the bullet right off the start and used MySQL… although now we are kind of regretting not using Postgres.

    Mike/Verelo.com

  4. this post was very useful, and I thank you for posting it.
    I would be interested in a follow up: Why choose MySql among other Relational Databases?

  5. Sean Grove says:

    I’d agree that Postgres seems like the more natural progression here, especially with the features Jason mentioned. That said, great post, and thank you for sharing your experiences, and thanks for letting me experience some growing pains with CouchDB!

  6. Noah Yetter says:

    Relational is still the future. If you don’t think data integrity still matters, well, good luck to you.

    Maybe you should attempt to understand what schemas are *for* before dismissing them, and the relational model, as useless.

    And last but not least, try building some reports and then get back to me about how useless SQL is. BTW if you have problems with SQL injection it’s 100% due to your own poor programming practices (use bind variables! always! no excuses!).

  7. Father Time says:

    So glad you decided to grow up and join the real world.

    Schemas, dumbass, are one of the things that makes MySQL faster than your arbitrary document store disaster.

  8. Drew says:

    I’m curious too if PostgreSQL was ever in the running and if so, what made you decide not to go that direction.

    Thanks for sharing!

  9. Chris F says:

    We went with PostgreSQL, bad idea. Stay with MySQL, specifically the Percona XtraDB version. Percona has Percona Toolkit (formerly Maatkit) and excellent clustering abilities. PostgreSQL not so much.

  10. kelleysislander says:

    Thanks for the heads-up on couchDB. I too am wondering why mySQL instead of Postgres? IMHO, mySQL is the “Fisher-Price” version of a database while PG is “the real thing” in terms of features and functionalities. The hstore datatype column is exceedingly fast and would have served you well based on what you said in the article…

    Look at Instagram with 30 million users on a PG database, or skype that runs on a PG database…

  11. Jason Belich says:

    hrm… you lost me at “expect to write a lot of FORCE INDEX clauses”.. While it’s true the configuration of mysql is unmitigated hell, if you have to force index on a query, that’s definitive evidence either that you have _royally_ screwed up your schema, or you’re composing a query at the outer edge of mysql’s (or your own) capabilities.

  12. Don Park says:

    Great writeup. I use couchdb on my small project and I’ve seen some of the same problems. I have not seen couch die or have any reliability problems, but I’m not using it very hard.

    The two biggest problems I’ve seen with couchdb are:

    1. view indexes need to be kept up to date manually. this trade-off should have gone the other way: make the writer wait for a view index update instead of making the first reader wait.

    2. storing data on disk using strictly append-only sounds nice but it’s a big problem when only half the available space is actually available because a compaction run takes at least as much space as the data store being compacted.

    Couchdb 1.2 added support for automatic compaction, and now compresses the on-disk files for the database and view index which should be a huge space savings.

  13. I can understand the move from NoSQL to SQL. You change from apples to pears.
    But I can’t understand why NOT use PostgreSQL!
    Is it the lack of available DBAs and support consultants?
    Clearly there’s no technical reason to use MySQL. It was, maybe, 10 years ago. Or, am I missing something?

  14. greenlight says:

    kelleysislander: If you’re going to throw around examples with big numbers, Facebook runs (a highly tuned/customized, but open-sourced) MySQL.

  15. Nathan Rice says:

    While I can’t condone the nastiness of some other comments, they are correct in several ways:

    1. SQL is actually nice, if you take the time to learn it. People who complain about it tend to not understand it very well.

    2. Using MySQL with JSON CLOBs to avoid creating and maintaining a database schema is really just a bad idea. If you really have totally schema-less data you should look into using a key/value store together with a relational product. Postgres does this with HStore as others mentioned, but MySQL + Redis is a viable option as well.

  16. Bill Karwin says:

    SQL injection is an important issue, but NoSQL databases are not immune from code injection vulnerabilities. They are vulnerable when developers interpolate untrusted input into Javascript code at runtime, and then execute it as a mapreduce function. Injection vulnerabilities of either type are an application design failure, not a database failure.

    When you need schemaless data but also need it indexed, consider the inverted-index approach FriendFeed uses:
    http://backchannel.org/blog/friendfeed-schemaless-mysql

    Another site I’ve worked on uses Apache Lucene/Solr to index their document-oriented data, after storing it persistently in MySQL. They can fully index their data in a non-relational way, do faceted searches, even run their site off the Lucene store temporarily when they need to restart MySQL.

    I never use FORCE INDEX. If the optimizer isn’t choosing an index on its own, you’ve defined the wrong indexes. Your SQL queries are not so complex that the optimizer can’t handle them. Also, littering your code with FORCE INDEX will prevent your queries from using the more appropriate index when you finally do create that index.

  17. kelleysislander says:

    for Facebook mysql didn’t quite work out so they had to build the cassandra db:

    http://en.wikipedia.org/wiki/Apache_Cassandra

  18. Rob says:

    kelleysislander,

    Facebook only built Cassandra for a very small piece of their architecture – the inbox search. They also don’t use Cassandra anymore either, having moved that part of the architecture to HBase.

    My understanding, though, is that the core architecture of the site is, and always has been, MySQL. Very heavily modified MySQL, but MySQL nonetheless.

    People need to quit treating this stuff like it’s a religion. Use whatever tools are best for you.

  19. Bill Karwin says:

    @kelleysislander: Facebook still uses MySQL for most of their data. They are a huge operation, with multiple needs for specialized data handling. When they have specialized needs, they use other technologies too. They did use Cassandra at one time for inbox searching, but they don’t use it for everything.

    Recently, Facebook has been moving away from Cassandra and toward HBase. It’s natural that they rethink their architecture continually, since their needs keep expanding.

    https://www.facebook.com/note.php?note_id=454991608919

    Facebook is probably not the best example to base technology choices on, because their scale is unique, making them kind of a different use case than the rest of the world.

  20. Sean McQuillan says:

    Congrats guys! I’m happy to see that you’ve completed the transition.

    Nice blog post Steve, I bet you were happy to finally be able to write it :).

  21. Alexander says:

    > PostgreSQL not so much.

    Bollocks. Postgres has excellent replication support, and there are tons of add-ons (eg., PgPool) that provide additional clustering services such as sharding.

    I agree with the others. Postgres is a more natural choice these days. HStore, transactional DDL, functional and partial indexes, superb GIS support, GIN/GIST indexes, and _no_ legacy cruft.

  22. Any particular reason that you didn’t go w/ BigCouch (https://github.com/cloudant/bigcouch)? It deals with a lot of the issues that you described above.
    That, combined with about an afternoon of perl pretty much resolved all of the issues that you described above for us…

  23. Garry says:

    Alexander,

    Postgres’ replication support is only viable in 9. Slony was an utter failure. MySQL has had working replication for years.

    Postgres 9’s replication is quite good though, judging from the one large database we replicate; it tends to work out nicely.

  24. Matt Freeman says:

    How are you setting up indexing if MySQL doesn’t understand JSON? Do you have another out-of-band process that creates tables that emulate the ‘views’ of CouchDB?

  25. Jack8 says:

    You have made the right decision.

    We have been using InnoDB for more than 8 years and have never lost a single bit of data. Unplanned downtime is almost zero. This is just unbelievable!

    InnoDB is such an amazing product.

  26. Jack8 says:

    As for Postgres, they don’t even have a bug tracking system (check it out if you don’t believe me!). I won’t use it until that is fixed.

  27. KG says:

    @Jack8

    You won’t use a product unless it has a bug tracking system? Do you use Windows? Where’s their bug tracking system? I guess you should uninstall that and use only a Linux variant from here on out because, ya know, they have a bug tracking system. And you can skip on Mac OS X, too.

    How about MySQL? Have you looked at their “bug tracking system”? You assume because it’s there that they are actually going to look at your bug request, right? Is that what gives you the warm and fuzzies? When you file that bug, I hope you go to sleep knowing full well that the dev team is going to ignore the hell out of it as soon as it crosses their inbox.

    I kid only to make a point. The fact that PostgreSQL doesn’t have a “bug tracking system” in the traditional sense doesn’t mean the developers don’t actively fix bugs. THAT is absurd. PostgreSQL is a very mature product. It’s battle hardened and, in my (and many, many others’) opinion, a tank compared to MySql’s toy truck.

    I believe they don’t have a bug tracking system. Yet I didn’t actually write them off just because they don’t. If you actually used PostgreSQL, even though it (GASP!!!!) doesn’t have a bug tracking system, you’d probably shit yourself at how stable it is, despite the fact that there is no bug tracking system. How many times can I write bug tracking system in this post?!?

    Just use it. What have you got to lose? It’s free. It’s open source. It’s mature. It’s stable. I won’t say it’s better than MySql outright because, frankly, I haven’t done analysis to confidently say one way or the other. Just try it. It’s a great product.

    Sorry, your statement really left a bad taste in my mouth with how ignorant it came off.

  28. Jack8 says:

    @KG

    If you compare PostgreSQL with proprietary software then I have nothing to say.

    For open-source projects, the key is we all have the FREEDOM to choose. I will avoid open-source projects if they don’t have a bug tracking system, and so far only PostgreSQL meets this criterion.

    After all, this is just my choice and none of other people’s business.

    I never advise others not to use PostgreSQL, I just bring in more information: PostgreSQL doesn’t have a bug tracking system. Whether you use PostgreSQL is none of my business, too.

    TL;DR

    I don’t use PostgreSQL because it doesn’t have a bug tracking system. This is IMHO and you should make your own choice.

  29. pat says:

    From the post: “Indexes on arbitrary calculated values”

    FWIW, this is more a general database feature than a NoSQL or CouchDB feature. Every RDBMS I’ve used except MySQL has had it. (And yes, it is indeed a great feature! I hope MySQL adds it someday.)

    Noah said: “Relational is still the future. If you don’t think data integrity still matters, well, good luck to you.”

    The relational model is completely independent from data integrity. You can ensure data integrity with files on disk (where do you think your MySQL data lives?), and you can write relational code that can lose data left and right (and many people did, back in the bad old days of MyISAM). Please don’t take a universally good attribute of a data store and pretend that the only way to achieve it is to use your particular technology. Object databases, for example, are not relational, yet they have transactions and schemas and all that good stuff that you like.

  30. Alex says:

    Jack8: Have you found a bug in Postgres? I’ve occasionally wished for a new feature in Postgres, but I’ve never found a real bug in it. Their release process is rock solid.

    In contrast, I’ve run across several data integrity bugs in (latest, stable versions of) MySQL. Upon searching, I’ve found them dutifully filed in the bug tracker, sometimes over 5 years ago, with no action since then, except other people asking for status updates. (Part of the problem seems to have been with ownership: when Oracle bought MySQL, the whole release process changed, and version 6 kind of got dropped on the floor. Not that I blame them: Oracle buying you is enough to make any project stumble!)

    Is the feature-checkbox for “has a bug tracker” more important than actually fixing bugs as they’re found? I agree that it’s always better to have a bug tracker than not, but Postgres has been turning out annual major releases without issue for years, while MySQL in the recent past has had some trouble with this. I give credit to a project for having a bug tracker, but if they’re not using it to help drive quality, then what use does it serve anyone? We might as well be filing our bugs at /dev/null.

  31. Martin says:

    Leaving CouchDB for a “Key – JSON-values-over-MySQL” solution sounds very awkward. You left a document DB and its Map/Reduce indexer for a relational DB used as a bare key-value store. If the move was so easy, maybe your documents looked too much like tables?
    By the way, you didn’t give us any hint about your databases’ size.

    I’m also surprised by your masochistic DB approach: when it’s CouchDB you install it and expect unicorns, then you go back to MySQL and prepare for a tweaking fight…

    Like any other DB indeed, Couch relies on “B-trees, hashes, and caching”. But didn’t you read everywhere that JavaScript is slow and that views are 4 times faster in Erlang?

    You expected availability but didn’t even mention a look at clustering tools like BigCouch or Lounge?

    The article looks like you gave up the fight, quickly coming back to a feels-like-home MySQL. Fine, but don’t blame the software then.

  32. Joel Jacobson says:

    “No schemas. This was wonderful. What are schemas even for?” hahaha, hilarious

  33. KG says:

    @Alex – well stated.

    @Jack8,

    You are exactly right. It is your choice. And I respect your opinion. In the end, you are the one who has to make your own decision.

    The fact that you made a comment on someone else’s article (which I also happened to read and enjoy) about your choice puts it out there in the wild.

    Your statement about me comparing PostgreSQL to proprietary software (I assume you are referring to Windows and OS X) and having nothing to say is sad to me. You say you won’t use PG due to lack of bug tracking. @Alex makes a great point, in that bugs filed for MySql rarely get addressed. The exact same thing goes for Windows and OS X. Even worse in some cases because they are much larger scale projects. That is why I asked if you used Linux, as that is the only OS you should be using if you require bug tracking.

    Why does this not register to you as a ridiculous argument? The only reason I’m actually taking time out of my morning to address this is to give you more information about your decisions. As a developer (which I assume you are as you mention InnoDB and not losing data) I would hope you are constantly evaluating new tools and products, but I know lots of bias exists. And that’s fine.

    If you are comfortable with MySql, that’s fine. It’s a perfectly good product for some situations. However, when you take everything you’ve said into context, it appears you place a high value on data integrity. If this is the case, who cares about some bullshit bug tracking software? PG is solid software. We’ve been using PG in a high availability production environment for years. Prior to version 9.0, the only problems we encountered were with data replication, and some of that was hardware related in the end. After 9.0 (currently on 9.2 I believe) that issue has been eliminated. Like @Alex said, PG releases updates much more frequently than MySql.

    TL;DR: I use PostgreSQL because it doesn’t suck at its main purpose: storing and retrieving my data. If you are concerned at all with your product doing the same (doesn’t matter which), what difference is a bug tracking system going to make other than give you insight into how many bugs AREN’T getting addressed?

  34. kelleysislander says:

    Bill and Rob, many thanks for educating me / us on how Facebook handles their data…

    So it now appears that Facebook runs mySQL in the same sense that NASCAR runs Chevys then, eh? So “technically” it started out as the same mySQL we can all get, and they evolved it into something that they could actually use in their large environment.

    Since most of us do not have the luxury of such, we are stuck with the out of the box open source product, which is about as close to what FB uses as a Chevy Impala is to Dale Earnhardt Jr’s car, but still a “Chevy” nonetheless…

    Cheers

  35. HandlerSocket is astonishingly fast, faster than even redis and memcached even though it’s using MySQL’s underlying structure and it can integrate transparently with SQL interfaces too (e.g. write with handlersocket, read with SQL).

    Another thing worth a look is TokuDB, an InnoDB drop-in replacement (or supplement) with fractal indexes that in particular make for incredibly fast inserts, efficient disk use, lower replication latency and much less degradation over time, if that’s something that’s important to your app.

  36. Clarence says:

    kelleysislander: You are quite wrong on most conclusions. Firstly, Facebook releases its modifications as open source, so you can use that. Further, your NASCAR comparison is not very valid. Facebook’s patches are for bottlenecks that they discover at their huge scale and that hardly anybody else is troubled by. Same as they also patch menages to run at their scale. The fact is that almost anyone else is perfectly fine with the almost stock product, or better, the Percona patchset.

  37. eMBee says:

    btw: there are also some comments on this article here:
    http://lxer.com/module/forums/t/33194/

    one point that is brought up there about postgresql is the claim that they apparently do not support upgrading the data format if it changes, and require the use of pg_dump to upgrade. that can be expensive if not impossible if there is lots of data. this may not be enough reason to choose mysql over it, but still is something that has me worried…

    btw i had not heard about hstore before. it looks really interesting, and may be just what i need.

    greetings, eMBee.

  38. : INDEX mb says:

    [...] and saucelabs say no to NOSQL. [I am saving these for the next time someone suggests putting book data in a [...]

  39. Cl S says:

    I agree with Martin.

    This post made it sound like Sauce Labs gave up too early/easily.
    The constant indexing/compaction issue was fixed in version 1.1.1
    The compaction automation was added in version 1.2

    The post talked about issues, failures, problems, but it doesn’t have any substance that explained what exactly happened. How was the CPU and IO on the servers? What was running when the error occurred? How many CPUs do you have? Are they all high on iowait when Couch is doing something that is CPU bound?

    Have you tuned your filesystem IO? Do you know what your disk latency is?

  40. Quora says:

    Which nosql is suitable for backend of a HRM application?…

    CouchDB should work. I do not have experience with it and I also see articles like this http://saucelabs.com/blog/index.php/2012/05/goodbye-couchdb/

  41. M@ says:

    We have 4 massive webapps, with, I believe, higher use than SauceLabs. We evaluated several DB options including Couch, Mongo, Postgres, Oracle, MySQL, and settled on Mongo. There have been growing pains, but the company has been very responsive to change requests, and has kept pace with what we need. We serve tens of thousands of requests per minute, with 0 failed requests a day, and are heavily using the database for every request.

    Thought I’d share. YMMV.

  42. kelleysislander says:

    Hi Clarence, Thanks for letting me know where I was wrong… But the point about NASCAR is quite valid indeed: Were I faced with a production bottleneck like the ones they have faced, I, like the rest of us, certainly would not have the luxury of a huge expensive dev team to solve the problem on the fly as they did. NASCAR wants speed and can afford to develop it – I want speed too but cannot afford to develop it with my limited resources. So when FB “patched” mySQL they made a MAJOR improvement to it to the point that internally it is closer to Dale Earnhardt Jr’s Chevy than my Impala, in those respects.

    It is great that they shared their solutions and that we can all benefit – but again, they are running a heavily modified version just like NASCAR runs heavily modified cars that are still called Chevrolet. Yes they made it more robust because they had to – it was insufficient out of the box and for some reason of their own they made the decision to stick with mySQL. They made the product better for us all – but only after modifying it to suit their needs. Absent their involvement mySQL would still be where it was before they fixed it up which leads to the question:

    Do you select a technology for what it does today, or do you select it because you think somebody might make it better one day?

    mySQL remains a really decent database for a great majority of projects which is why it is so ubiquitous, and the fact that FB chose it lends it some more street cred, but there’s always a backstory and we may never really know why they chose mySQL over more worthy offerings such as Postgres. But in the end who cares because that is really the overarching value of open source – we all benefit!

    Cheers

  43. Angry Dude says:

    This wonderful video should have been watched by the team before choosing CouchDB, perhaps

    http://www.youtube.com/watch?v=b2F-DItXtZs

  44. Miguel says:

    I created docDB (https://github.com/marbdq/docdb) for similar reasons!

  45. Joerg says:

    The (non-)compatibility argument against PostgreSQL is kind of missing the point and highlights an interesting view of production systems. PostgreSQL indeed doesn’t support switching minor (x.y) versions without downtime for data migration. If that is a problem, it begs the question of why you want to do it in the first place. Because you can? Not something to do in a production environment. Because you need security updates / critical bugfixes? That should be an update on the release branch, so x.y.z, not to x.y+1. Those release branches exist for a reason. So the short version is, only change the minor version and associated release branch if you need the new functionality and can schedule the down time. Given all the testing normally required, it’s not that big a red flag…

    The user hitting this (accidentally) sounds more like a case of missing RTFM. Even the release notes have been explicit about this for ages.

  46. Greg says:

    Not many people talking about Riak (http://basho.com), which is what we use for production NoSQL. We tried CouchDB and MySQL in prototyping but rejected both. Riak’s clustering *is native and automatic*, and we even use it across multiple datacenters with no problems. I don’t work for Basho, but I can say we’ve had zero downtime with Riak.

  47. Arlo says:

    “Experienced MySQL users expect to write a lot of FORCE INDEX clauses.” is only true if the database was designed terribly. MySQL is very good at picking the right index and optimizing the query, you just have to set indexes up intelligently.

    Your entire use of MySQL seems to be kind of like shoving a square in a round hole. Why not just use it the way it was meant to be used? You will find it faster and more reliable.
