Spoiler Alert: If you read this article, you’ll be one of the first to hear about a previously unpublicized feature from Sauce. It’s like an easter egg!
It probably comes as no surprise that at Sauce we write a lot of Selenium tests. Our website needs good test coverage, just like our customers’ apps. We have a build that runs all of these tests (and many more unit tests besides) after every chunk of commits. If tests fail in our build, it stays “red” until someone commits a fix. During that time, we can’t deploy the new code, and it’s our custom to not even push more commits on top while the build is red, so the problem can be diagnosed and fixed without complicating matters.
In other words, it’s a big deal when the build breaks because it is potentially interfering with other developers’ workflows. That’s one of the reasons we pull out our hair and yell obscenities when we encounter flakes in our build. A flake occurs when a test that normally passes, or passes under normal conditions, fails non-deterministically (i.e., under seemingly random conditions). If we run the build again, that same test might pass, leaving us without a lot of information about what went wrong. Is something wrong in the code? Is something wrong in our build infrastructure? It leaves us uncertain whether we might actually have a problem with that functionality in production, too—if it’s failing 1 out of every 1,000 times in the build, is it affecting 0.1% of our customers?
On a recent Flakey Friday (a Friday dedicated to tracking down and eliminating flakiness from tests), we caught a test acting strangely, and failing one out of every ten or so runs. The test looked like this:
This is one test for our job* detail page. The `setUp` function for this test class handles creating a new random user and a new random job. The test logs the user in and goes to the page for this job. It then clicks a link designed to make the job “public” (i.e., viewable by anyone on the web), checks both the database and the website to make sure the AJAX-powered toggle did its trick, and finally makes sure we can toggle the job back to “private” in the same way.
This is a straightforward test built using Selenium best practices (creating a fresh random user object and a fresh random job, and using spin asserts to avoid race conditions), but every so often it would fail when Selenium checked that the link text had changed after the click—something that, in these cases, didn’t happen. Likewise, the job was not marked with the appropriate status in the database.
The first step was to reproduce the failure reliably, which I took care of in a rather brute-force way: I simply created 14 new versions of the same test, like so:
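That snippet is also lost, but the duplication could have been as blunt as copy-pasting the method, or stamping the aliases out in a loop, as in this sketch (class and method names assumed):

```python
class TestJobDetail:
    """Hypothetical test class holding the flaky test."""
    def test_can_publish_and_back(self):
        pass  # the real test body lives here

# Give the flaky test 14 aliases so a wildcard run picks up 15 copies
# and exercises them all in parallel.
for i in range(1, 15):
    setattr(TestJobDetail,
            "test_can_publish_and_back_%d" % i,
            TestJobDetail.test_can_publish_and_back)
```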
This way, I could run our custom version of the Nose test runner and have it pick up all and only the tests I was interested in using a wildcard match:
Then, I made use of a feature we have not yet publicized: programmatic Sauce Breakpoints. This is achieved by sending a special Selenium command that the Sauce Cloud understands to mean that you want the job breakpointed. For both Selenium RC and WebDriver, the special command is `sauce: break`. For Selenium RC, this command is sent as the `context` parameter for `setContext`. For Selenium WebDriver, it is passed as the `script` value of the `execute` command. Luckily, the Python WebDriver API implements these commands, so all I had to do was hack `sauce: break` into our main test class's `tearDown`:
if not self.passed:
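Only that one line of our `tearDown` survives above; here is a sketch of what the surrounding logic might look like, with a fake driver standing in for WebDriver. Only `self.passed`, `break_on_fail`, and the `sauce: break` string come from the article; the rest is assumed:

```python
class FakeDriver:
    """Records calls, standing in for a real Selenium WebDriver session."""
    def __init__(self):
        self.calls = []
    def execute_script(self, script):
        self.calls.append(script)
    def quit(self):
        self.calls.append("quit")

class SauceTestBase:
    break_on_fail = True

    def __init__(self, driver):
        self.driver = driver
        self.passed = False

    def tearDown(self):
        if not self.passed:
            # (traceback capture elided)
            if self.break_on_fail:
                # The magic command: Sauce breakpoints the job for inspection.
                self.driver.execute_script("sauce: break")
        # Report pass/fail status to Sauce and end the WebDriver session.
        self.driver.quit()
```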
The `tearDown` logic here says, “If the test didn’t pass, get a traceback and breakpoint the test if I’ve set `self.break_on_fail`. Then, report the status to Sauce, and close the WebDriver session.” With all of these modifications in hand, I was able to run the offending test multiple times in parallel like so:
nose --processes=15 test_can_publish_and_back*
Then, all I had to do was go to my Sauce Labs tests page and watch to see which tests turned up as breakpointed. I could navigate to the detail page for a breakpointed test and use the dev tools in Chrome to examine what was happening. In the case of this flake, I discovered that the AJAX request was not successful—it was receiving a 401 response from our test server. This meant the CSRF protection for the AJAX POST was misbehaving somehow. After a lot of website backend debugging, we determined that, under load, a new CSRF token sometimes took longer to save to our persistent data store than it took the website to hand it back to the browser, making the browser’s next (valid) request appear invalid to the server, which replied with a 401. Luckily, upgrading our backend code and making our session save synchronous took care of the problem.
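The race is easier to see in a toy model. In the sketch below (not our actual stack), the server mints a CSRF token and returns it immediately, but persists it to the session store either asynchronously or synchronously; with the asynchronous save, a fast follow-up POST finds no token on record and gets a 401:

```python
class SessionStore:
    """Toy persistent session store."""
    def __init__(self):
        self.tokens = set()

def handle_page_load(store, synchronous_save):
    """Mint a CSRF token and return it to the browser."""
    token = "csrf-token-123"
    if synchronous_save:
        store.tokens.add(token)   # token is on record before we respond
    # else: the save is queued and may land *after* the next request arrives
    return token

def handle_ajax_post(store, token):
    """Validate the CSRF token; 401 if it isn't on record yet."""
    return 200 if token in store.tokens else 401
```

With `synchronous_save=False`, an immediate POST gets a 401, which is exactly the symptom we saw; with `synchronous_save=True`, it gets a 200.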
The details I have shared about our particular flake are not the big story here. What is important is that we had a kind of flake that was nigh-impossible to pin down without a tool like Sauce Breakpoints. It allowed me (within the space of one parallel Sauce test run) to observe the bug in its natural habitat and open the dev tools on the problem session, where we found the first clue on the trail that eventually led to squashing the issue. We hope this strategy can also be useful to others who aren’t tolerant of mysterious flakes in their build. Let us know if you can think of any other testing practices that can be augmented by Breakpoints!
Addendum: Selenium RC
The example code for programmatic Sauce Breakpoints above is for Selenium 2, a.k.a. WebDriver. Breakpoints also work for Selenium 1 (a.k.a. Selenium RC) tests, but the code is different. Here is our `tearDown` function for Selenium 1 tests, which illustrates the use of the `sauce: break` context:
if not self.passed:
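As above, only the first line of that function survives; here is a sketch of a Selenium 1 version, with a fake RC client. The `set_context("sauce: break")` call follows the RC mechanism described earlier; the other names are assumed:

```python
class FakeRCClient:
    """Records calls, standing in for the old Selenium RC client."""
    def __init__(self):
        self.calls = []
    def set_context(self, context):
        self.calls.append(context)
    def stop(self):
        self.calls.append("stop")

class SauceRCTestBase:
    break_on_fail = True

    def __init__(self, selenium):
        self.selenium = selenium
        self.passed = False

    def tearDown(self):
        if not self.passed:
            if self.break_on_fail:
                # In Selenium RC, the breakpoint request rides on setContext.
                self.selenium.set_context("sauce: break")
        # Report status and shut down the RC session.
        self.selenium.stop()
```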
* At Sauce, we call an individual test run in our infrastructure by a customer a “job.”