How HotelTonight Leverages Appium for Mobile Test Automation

July 1st, 2014 by Amber Kaplan

We love this blog post written by Quentin Thomas at HotelTonight! In it, he explains how they use Appium to automate their mobile tests. He also walks readers through specifics, such as the RSpec config helper. Read a snippet below.

Thanks to the engineers at Sauce Labs, it is now possible to tackle the mobile automation world with precision and consistency.

Appium, one of the newest automation frameworks introduced to the open source community, has become a valuable test tool for us at HotelTonight. The reason we chose this tool boils down to Appium’s philosophy.

“Appium is built on the idea that testing native apps shouldn’t require including an SDK or recompiling your app. And that you should be able to use your preferred test practices, frameworks, and tools”.

-Quentin Thomas, HotelTonight, June 17, 2014

To read the full post with code, click here. You can follow Quentin on Twitter at @TheQuengineer.

Have an idea for a blog post, webinar, or more? We want to hear from you! Submit topic ideas (or questions!) here.

Bleacher Report’s Continuous Integration & Delivery Methodology: Test Analytics

June 24th, 2014 by Amber Kaplan

This is the final post in a three-part series highlighting Bleacher Report’s continuous integration and delivery methodology by Felix Rodriguez. Read the first post here and the second here.

Last week we discussed setting up an integration testing server that accepts a POST request and then kicks off a suite of tests. Now that we are storing all of our suite runs and individual tests in a Postgres database, we can do some interesting things, like tracking trends over time. At Bleacher Report we like to use a tool named Librato to store our metrics, create sweet graphs, and display pretty dashboards. One of the metrics that we record on every test run is our PageSpeed Insights score.

PageSpeed Insights

PageSpeed Insights is a tool provided by Google Developers that analyzes your web or mobile page and gives you an overall rating. You can use the website to get a score manually, but instead we hooked into its API in order to submit our page visit score to Librato. Each staging environment is recorded separately, so if any of them returns measurements that are off, we can attribute this to a server issue.

(Graph: average page speeds)

Any server that shows an extremely high rating is probably only loading a 500 error page. A server that shows an extremely low rating is probably some new, untested JS/CSS code we are running on that server.
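The rule of thumb above (a suspiciously high score may mean a bare 500 page, a suspiciously low one may mean new front-end code) can be checked automatically by comparing each environment with the others. The sketch below is a hypothetical approach, not Bleacher Report's code: it flags any environment whose score deviates from the median across environments by more than an assumed `tolerance` of 20 points.

```ruby
# Hypothetical outlier check for PageSpeed scores keyed by environment.
# `tolerance` (in score points) is an arbitrary assumption; tune it to the
# normal spread you see across your own staging servers.
def flag_outliers(scores, tolerance: 20)
  return {} if scores.empty?

  sorted = scores.values.sort
  mid = sorted.size / 2
  median = sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0

  scores.each_with_object({}) do |(env, score), flags|
    if score > median + tolerance
      flags[env] = :suspiciously_high # maybe only serving a 500 error page
    elsif score < median - tolerance
      flags[env] = :suspiciously_low  # maybe new, untested JS/CSS
    end
  end
end

p flag_outliers({ "stag1" => 78, "stag2" => 80, "stag3" => 100, "stag4" => 45 })
```

Using the median rather than the mean keeps one broken server from dragging the baseline toward itself.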

Below is an example of how we submit a metric using Cukebot:


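require_relative 'lib/pagespeed'

Given(/^I navigate to "(.*?)"$/) do |path|
  visit path
  # Assumed: the constructor argument is elided in the original post.
  pagespeed = PageSpeed.new(current_url)
  ps = pagespeed.get_results
  score = ps["score"]
  puts "Page Speed Score is: #{score}"
  # `host` (base URL of the environment under test) is defined elsewhere.
  metric = host.gsub(/http\:\/\//i, "").gsub(/\.com\//, "") + "_speed"
  begin
    pagespeed.submit(metric, score)
  rescue StandardError
    puts "Could not send metric"
  end
end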


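require 'net/https'
require 'json'
require 'uri'
require 'librato/metrics'

class PageSpeed
  def initialize(domain, strategy = 'desktop', key = ENV['PAGESPEED_API_TOKEN'])
    @domain = domain
    @strategy = strategy
    @key = key
    # The API endpoint and the remaining query parameters (which presumably
    # carry @key and @strategy) are elided in the original post.
    @url = "" + URI.encode_www_form_component(@domain)
  end

  # Fetch the PageSpeed Insights report and return it as a Hash.
  def get_results
    uri = URI.parse(@url)
    http = Net::HTTP.new(uri.host, uri.port)
    http.use_ssl = true
    http.verify_mode = OpenSSL::SSL::VERIFY_PEER # verify the API's certificate
    request = Net::HTTP::Get.new(uri.request_uri)
    response = http.request(request)
    JSON.parse(response.body)
  end

  # Send a value to Librato as a gauge, with 'cukebot' as the source.
  def submit(name, value)
    # The account email is elided in the original post.
    Librato::Metrics.authenticate "", ENV['LIBRATO_TOKEN']
    Librato::Metrics.submit name.to_sym => { type: :gauge, value: value, source: 'cukebot' }
  end
end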
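The step above builds the metric name by stripping `http://` and `.com/` from the host with `gsub`. A hypothetical alternative uses Ruby's standard `URI` library instead; it assumes staging hosts whose first DNS label identifies the environment (for example `stag4.example.com`), which may not match your naming scheme.

```ruby
require 'uri'

# Hypothetical helper: turn the URL under test into a per-environment
# metric name, assuming the first DNS label names the staging box.
def speed_metric_name(url)
  host = URI.parse(url).host
  raise ArgumentError, "no host in #{url.inspect}" if host.nil?

  "#{host.split('.').first}_speed"
end

puts speed_metric_name("http://stag4.example.com/articles/123") # prints stag4_speed
```

Parsing the URL rather than pattern-matching the string means paths, ports, and `https://` hosts don't leak into the metric name.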


Google’s PageSpeed Insights API returns results relatively quickly, but as you start recording more metrics on each visit (for example, results for both desktop and mobile), we suggest building a separate service that runs the desired performance test in response to a POST request, or at least running it in its own thread. That way the performance check won’t block the test or make it run long. Which brings us to our next topic.
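A minimal sketch of the "its own thread" option, assuming a callable that performs the PageSpeed request and metric submission. Here a hypothetical stand-in lambda replaces the real call so the example runs without network access. The step starts the check and returns immediately; the threads are joined once, for example from an exit hook.

```ruby
# Collects background performance checks so they don't block test steps.
class BackgroundChecks
  def initialize
    @threads = []
    @mutex = Mutex.new
  end

  # Start `check` in its own thread and return right away.
  def run_async(name, check)
    thread = Thread.new do
      begin
        check.call
      rescue StandardError => e
        warn "#{name} check failed: #{e.message}" # don't crash the suite
        nil
      end
    end
    @mutex.synchronize { @threads << thread }
    thread
  end

  # Wait for every outstanding check and return their results.
  def wait_all
    threads = @mutex.synchronize { @threads.dup }
    threads.map(&:value)
  end
end

checks = BackgroundChecks.new
fake_pagespeed = -> { sleep 0.1; 87 } # hypothetical stand-in for the API call
checks.run_async("pagespeed", fake_pagespeed)
puts "step continues immediately"
p checks.wait_all
```

A failing check is logged and yields `nil` instead of raising, so a flaky API never fails an otherwise green scenario.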

Tracking Run Time

With Sauce Labs, you are able to quickly spot a test that takes a long time to run. But when you’re running hundreds of tests in parallel, all the time, it’s hard to keep track of the ones that normally take a long time to run versus the ones that have only recently started to take an abnormally long time to run. This is why our Cukebot service is so important to us.

Now that each test run is stored in our database, we grab the run time that Sauce records for each test and store it with the rest of that test’s details. We then submit that metric to Librato and track it over time in an instrument. Once again, if all of our tests take substantially longer to run on a specific environment, we can use that data to investigate issues with that server.

To do this, we take advantage of Cucumber’s Before/After hooks to grab the time it took for the test to run in Sauce (or track it ourselves) and submit it to Librato. We use Ruby’s at_exit hook to record the total time of the suite and submit that as well.
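A rough sketch of the track-it-ourselves variant. The `RunTimer` class and its `submit` callable are hypothetical; in a real suite `submit` would wrap a Librato submission. The Cucumber wiring appears only in comments so the example runs as plain Ruby, and the allowed metric-name characters are an assumption.

```ruby
# Hypothetical timer driven from Cucumber hooks. Uses a monotonic clock so
# wall-clock adjustments don't skew durations.
class RunTimer
  def initialize(submit)
    @submit = submit # callable(metric_name, seconds)
    @suite_started = now
    @scenario_started = nil
  end

  def scenario_started
    @scenario_started = now
  end

  def scenario_finished(name)
    elapsed = now - @scenario_started
    @submit.call("#{metric_safe(name)}_run_time", elapsed)
    elapsed
  end

  def suite_finished
    elapsed = now - @suite_started
    @submit.call("suite_run_time", elapsed)
    elapsed
  end

  private

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end

  # Assumed metric-name character set: collapse anything else to "_".
  def metric_safe(name)
    name.downcase.gsub(/[^a-z0-9.:_-]+/, "_")
  end
end

# Possible wiring in features/support/hooks.rb (after authenticating):
#   TIMER = RunTimer.new(->(n, v) { Librato::Metrics.submit(n.to_sym => v) })
#   Before { TIMER.scenario_started }
#   After  { |scenario| TIMER.scenario_finished(scenario.name) }
#   at_exit { TIMER.suite_finished }
```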

Test Pass/Fail Analytics

To see trends over time, we’d also like to measure our pass/fail percentage for each individual test on each separate staging environment, as well as the pass/fail percentage of our entire suite. This would allow us to notify Ops about any servers that need to get “beefed up” if we run into a lot of timeout issues on a particular setup. It would also let us quickly decide whether to proceed with a deploy when a test that passes over 90% of the time is currently failing.

The easiest way to achieve this is to use a Cucumber After hook to query the Postgres database for the number of passed test runs on the current environment in the last X days, divide that by the total number of test runs on the current environment in the same period to get a percentage, store it, and then track it over time to analyze trends.
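The calculation itself is simple. This sketch works on an in-memory array of hashes shaped like the test records shown in the "Creating an Integration Testing Server" post, plus two assumed fields, `env` and `created_at`; against Postgres the same filter would be a `WHERE` clause with `COUNT(*)`. The 90% threshold comes from the paragraph above; the method names are hypothetical.

```ruby
# Fraction of runs (of one test, or of all tests when test_name is nil)
# that passed on `env` since `since`. Returns nil when there is no history.
def pass_rate(runs, env:, since:, test_name: nil)
  window = runs.select do |r|
    r[:env] == env && r[:created_at] >= since &&
      (test_name.nil? || r[:name] == test_name)
  end
  return nil if window.empty?

  window.count { |r| r[:passed] }.fdiv(window.size)
end

# Currently failing tests whose history says they usually pass: candidates
# for "probably flaky, consider deploying anyway".
def usually_passing_failures(current_failures, runs, env:, since:, threshold: 0.9)
  current_failures.select do |name|
    rate = pass_rate(runs, env: env, since: since, test_name: name)
    rate && rate > threshold
  end
end

now = Time.now
runs = 10.times.map do |i|
  { name: "Live Blog - Has no 500s", env: "4", passed: i != 0, created_at: now - i * 3600 }
end
puts pass_rate(runs, env: "4", since: now - 7 * 24 * 3600) # prints 0.9
```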


Adding tools like these will allow you to look at a dashboard after each build and give your team the confidence to know that your code is ready to be released to the wild.

Running integration tests continuously used to be our biggest challenge. Now that we’ve finally arrived at the party, we’ve noticed that there are many other things we can automate. As our company strives for better product quality, our team’s standards for what we choose to ship keep rising.

One tool we have been experimenting with and would like to add to our arsenal of automation is So far we have seen great things from them and have caught a lot of traffic-related issues we would have missed otherwise.

Most of what I’ve talked about in this series has been done, but some is right around the corner from completion. If you believe we can enhance this process in any way, I would greatly appreciate any constructive criticism via my Twitter handle @feelobot. As Sauce says, “Automate all the Things!”


Guest Post: Bridging the Test Divide – Beyond Testing For A Release

June 16th, 2014 by Amber Kaplan

This is the second of a three-part series by Matthew Heusser, software delivery consultant and writer.

When I start to think about testing, I think about it in two broad strokes: new feature testing and release-testing. New feature testing tries to find problems with something new and specific, while release-testing happens after “code complete”, to make sure the whole system works together, that a change here didn’t break something there.

Release-testing (which some call regression testing) slows down the pace of release and delays feedback from our customers. Release-testing also increases cycle time: the time from when we begin work on a feature until it hits production. Over time, as our software becomes more complex, the amount of testing we want to do during release-testing goes up.

Meanwhile, teams want to ship more often, to tighten the feedback loop.

Today I am going to talk about making release testing go away – or at least drastically reducing it.

It all starts during that tutorial in Spain I wrote about last time.

Two Worlds

Release frequency among the people in my tutorial varied widely, but two groups really struck me: the telecom that had a four-month test-release cycle, and the Latvian software team with the capability to deploy to production every single day.

That means arriving at the office in the morning, looking at automated test runs, and making a decision to deploy.

There is a ‘formula’ to make this possible. It sounds simple and easy:

  • Automate a large number of checks on every build
  • Automate deploy to production
  • Continuously monitor traffic and logs for errors
  • Build the capability to rollback on failure

That transforms the role of test from doing the “testing we always do” to looking at the risk for a given release, lining it up against several different test strategies, and balancing risk, opportunity, reward, and time invested in release-testing.

The trick is to stop looking at the software as a big box, and instead to see it as a set of components. The classic set of components are large pieces of infrastructure (the configuration of the web server, the connections to the database, search, login, payment) and the things that sit on top of them: product reviews, comments, static HTML pages, and so on. Develop at least two deploy strategies: one for audited and mission-critical systems (essential infrastructure, etc.) and another for components and add-ons.

We’ve been doing this for years in large IT organizations, where different systems have different release cycles; the trick is to split up existing systems, so you can recognize low-risk changes and make them more easily.

This isn’t something I dreamed up; both Zappos and Etsy have to pass PCI audits for handling payment card data, and Zappos is part of publicly traded Amazon. Both of these organizations have a sophisticated test-deploy process for parts of the application that touch money, and a simpler process for lower-risk changes.

So split the system into components that can be tested in isolation. Review the changes (perhaps down to the code level) to consider their impact, and test the appropriate amount.
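As a sketch of what "split the system into components and test the appropriate amount" might look like in a deploy script: map each changed file to a component, and let the riskiest component decide the strategy. The path prefixes, tiers, and strategy names below are hypothetical examples loosely based on the component list above, not a prescription.

```ruby
# Hypothetical component map: path prefix => risk tier.
COMPONENTS = {
  "app/payments/" => :critical,
  "app/login/"    => :critical,
  "config/"       => :critical,
  "app/reviews/"  => :standard,
  "public/about/" => :low
}.freeze

RANK = { low: 0, standard: 1, critical: 2 }.freeze

# Unknown paths are treated as critical: when in doubt, test more.
def risk_of(path)
  COMPONENTS.find { |prefix, _| path.start_with?(prefix) }&.last || :critical
end

# The riskiest changed file decides how much release-testing to do.
def deploy_strategy(changed_paths)
  worst = changed_paths.map { |p| risk_of(p) }.max_by { |r| RANK[r] } || :low
  case worst
  when :critical then :full_release_test # audited / mission-critical path
  when :standard then :targeted_checks   # automated checks plus focused exploration
  else :ship_it                          # e.g. static HTML and images
  end
end

p deploy_strategy(["public/about/jane.html", "public/about/jane.png"])
```

Defaulting unknown paths to the strictest tier keeps a new, unclassified component from sneaking through the lightweight path.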

This can free up developers to make many tiny changes per day as long as those changes are low risk. Bigger changes along a theme can be batched together to save testing time — and might mean we can deploy with still considerably less testing than a ‘full’ site retest.

But How Do We Test It?

A few years ago, the ideal vision of getting away from manual, documented test cases was a single ‘test it’ button combined with a thumbs up or down at the end of an “automated test run.”

If the risk is different for each release, and we are uncomfortable with our automation, then we actually want to run different tests for each release — exactly what thinking testers (indeed, anyone on the team) can do with exploratory testing.

So let the computers provide some automated checks, all the time. Each morning, or maybe every half hour, we get a report, look at the changes, and decide what is the right thing for this release. That might mean full-time exploratory testing of major features for a day or two, or it might mean emailing the team and asking everyone to spend half an hour testing in production.

The result is grown-up software testing, varying the test approach to balance risk with cost.

The first step that I talked about today is separating components and developing a strategy that changes the test effort based on which parts were changed. If the risk is minimal, then deploy it every day. Hey, deploy it every hour.

This formula is not magic. Companies that try it find engineering challenges. The first build/deploy system they write tends to become hard to maintain over time. Done wrong, continuous testing creates systemic and organizational risk.

It’s also a hard sell. So let’s talk about ways to change the system to shrink the release-test cycle, deploy more often, and reduce risk. The small improvements we make will stand on their own and not threaten anyone, and they allow us to stop at any time and declare victory!

A Component Strategy

When a company says that new programmers commit and push code to production on their first day, do they really mean modifications to payment processing, search, or display for all products?

Of course not.

Instead, programmers follow a well-written set of directions to … wait for it … add the new user to the static HTML ‘about us’ page that lists all the employees, along with an image. If this change generates a bug, that will probably result in an X over an image the new hire forgot to upload, or maybe, at worst, break a div tag so the page mis-renders.

A bad commit on day one looks like this – not a bungled financial transaction in production.

How much testing should we have for that? Should we retest the whole site?

Let’s say we design the push to production so the ‘push’ only copies HTML and image files to the web server. The server is never ‘down’ and always serves complete pages. After the switch, the new page appears. Do we really need to give it the full monty, the week-long burn-down of all that is good and right in testing? Couldn’t the developer try it on a local machine, push to staging, try again, and “just push it”?
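One common way to get that "never down, always complete pages" behavior for static files (an assumption about implementation; the post doesn't say how the push works) is to copy the new files into a fresh release directory and then atomically repoint a `current` symlink. On POSIX systems, renaming a new symlink over the old one happens in one step, so visitors see either the old page set or the new one, never a half-copied mix. All paths are hypothetical.

```ruby
require 'fileutils'

# Copy a static site into a new release directory, then atomically swap
# the `current` symlink the web server points at (POSIX only).
def publish_static(source_dir, releases_dir, current_link,
                   stamp: Time.now.strftime("%Y%m%d%H%M%S%L"))
  release = File.join(releases_dir, stamp)
  FileUtils.mkdir_p(release)
  FileUtils.cp_r(File.join(source_dir, "."), release) # copy contents, not the dir

  tmp_link = "#{current_link}.tmp-#{Process.pid}"
  FileUtils.rm_f(tmp_link)
  File.symlink(release, tmp_link)
  File.rename(tmp_link, current_link) # atomic replace of the old symlink
  release
end
```

Keeping old release directories around also gives you the cheap rollback from the ‘formula’ earlier: point `current` back at the previous one.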

Questions on how?

More to come.

By Matthew Heusser – for Sauce Labs

Stay tuned next week for the third part of this mini series! You can follow Matt on Twitter at @mheusser.


Re-Blog: JavaScript Multi Module Project – Continuous Integration

June 11th, 2014 by Amber Kaplan

Our friend Lubos Krnac describes how to integrate Sauce with Protractor in a quest to implement continuous integration in his JavaScript multi-module project with Grunt.

Below is a quote from his most recent blog post alongside some code.

Read the rest of his post to get the full how-to here.

An important part of this setup is Protractor integration with Sauce Labs. Sauce Labs provides a Selenium server with the WebDriver API for testing. Protractor uses Sauce Labs automatically when you specify your Sauce credentials. Credentials are the only special configuration in test/protractor/protractorConf.js (bottom of the snippet). The rest of the configuration was taken from the grunt-protractor-coverage example. I am using this Grunt plug-in for running Protractor tests and measuring code coverage.

// A reference configuration file.
exports.config = {
  // ----- What tests to run -----
  // Spec patterns are relative to the location of this config.
  specs: [
    // (spec patterns elided in the original excerpt)
  ],
  // ----- Capabilities to be passed to the webdriver instance ----
  // For a full list of available capabilities, see
  // and
  capabilities: {
    'browserName': 'chrome'
    //  'browserName': 'firefox'
    //  'browserName': 'phantomjs'
  },
  params: {
  },
  // ----- More information for your tests ----
  // A base URL for your application under test. Calls to protractor.get()
  // with relative paths will be prepended with this.
  baseUrl: 'http://localhost:3000/',
  // Options to be passed to Jasmine-node.
  jasmineNodeOpts: {
    showColors: true, // Use colors in the command line report.
    isVerbose: true, // List all tests in the console
    includeStackTrace: true,
    defaultTimeoutInterval: 90000
  },
  // Sauce Labs credentials, read from environment variables
  sauceUser: process.env.SAUCE_USERNAME,
  sauceKey: process.env.SAUCE_ACCESS_KEY
};

You may ask, “How can I use localhost in the configuration when a remote Selenium server is used for testing?” Good question. Sauce Labs provides a very useful feature called Sauce Connect. It is a tunnel that lets the remote Selenium server reach your machine, including services running on localhost. This is super useful when you need to get through a company firewall. It will be used later in the main project’s CI configuration.


Bleacher Report’s Continuous Integration & Delivery Methodology: Creating an Integration Testing Server

June 10th, 2014 by Amber Kaplan

This is the second of a three-part series highlighting Bleacher Report’s continuous integration and delivery methodology by Felix Rodriguez. Read the first post here.

Last week we discussed how to continuously deliver the latest version of your application to a staging server using Elastic Beanstalk. This week we will be discussing how Bleacher Report continuously runs integration tests immediately after the new version of our app has been deployed.

When our deploy is complete, we use a gem called Slackr to post a message in our #deploys chat room. This is simple enough and just about any chat software can do this. We chose to use Slack because of the built-in integration functionality.

We created an outgoing webhook that submits any post in our #deploys channel as a POST request to our Cukebot server. The Cukebot server searches the text for a “completed a deploy” message, then parses the message into a JSON object that includes the deploy_id, user, repo, environment, branch, and GitHub hash.
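The filtering step is easy to sketch in plain Ruby. This handler is hypothetical: it assumes the webhook payload arrives as a params hash with a `"text"` key (as in the `Parser.slack` method that follows) and ignores any message that is not a deploy notification before handing the rest to a parser.

```ruby
DEPLOY_MARKER = "completed a deploy".freeze

# Hypothetical webhook handler: ignore ordinary chatter in #deploys and
# pass only deploy-completion messages on to the parser.
def handle_deploy_webhook(params, parser)
  text = params["text"].to_s
  return nil unless text.include?(DEPLOY_MARKER)

  parser.call(params)
end

parsed = handle_deploy_webhook(
  { "text" => "ABCD1234: Dan has completed a deploy of br/master to stag_br5. Github Hash is 96dd307." },
  ->(params) { params.merge("deploy_id" => params["text"][/\A([^:]+):/, 1]) }
)
puts parsed["deploy_id"] # prints ABCD1234
```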

class Parser
  ## Sample Input:
  # OGUXYCDI: Dan has completed a deploy of nikse/master-15551-the-web-frontpage-redux to stag_br5. Github Hash is 96dd307. Took 5 mins and 25 secs
  def self.slack(params)
    text = params["text"]
    params["deploy_id"] = text.match(/^(.*):/)[1]           # "OGUXYCDI"
    params["branch"]    = text.match(/of\s(.*)\sto/)[1]      # "nikse/master-15551-..."
    params["repo"]      = text.match(/to.*_(.*?)\d\./)[1]    # "br"
    params["cluster"]   = text.match(/to\s(.*?)_.*\d\./)[1]  # "stag" (no leading space)
    params["env"]       = text.match(/to\s.*_.*?(\d)\./)[1]  # "5"
    # set_suite (not shown in the post) maps a repo to a test suite name
    params["suite"]     = set_suite(params["repo"])
    params["hash"]      = text.match(/is\s(.*?)\./)[1]       # "96dd307"
    puts params.inspect
    params
  end
end

Once parsed, we have all the information we need to submit and initiate a test suite run. A test suite and its tests are then recorded in our PostgreSQL database.

Here is an example of what this suite would look like:

{
  id: 113,
  suite: "sanity",
  deploy_id: "FJBETJTY",
  status: "running",
  branch: "master",
  repo: "br",
  env: "4",
  all_passed: null,
  cluster: " stag",
  failure_log: null,
  last_hash: "0de4790"
}

Each test for that suite is stored in relation to the suite like so:

[
  {
    id: 1151,
    name: "Live Blog - Has no 500s",
    url: "",
    session_id: "20b9a64d66ad4f00b21bcab574783d73",
    passed: true,
    suite_id: 113
  },
  {
    id: 1152,
    name: "Writer HQ - All Article Types Shown",
    url: "",
    session_id: "4edbe941fdd8461ab6d6332ab8618208",
    passed: true,
    suite_id: 113
  }
]
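A stripped-down sketch of how records like these might be written, using an in-memory store as a stand-in for the Postgres tables. The class, its methods, the shared ID counter, and the `"finished"` status value are hypothetical; the field names follow the example records above.

```ruby
# In-memory stand-in for the suites/tests tables.
class SuiteStore
  attr_reader :suites, :tests

  def initialize
    @suites = []
    @tests = []
    @next_id = 1 # one counter for brevity; Postgres would use one per table
  end

  # Record a new suite run from parsed deploy params.
  def start_suite(params)
    suite = {
      id: next_id, suite: params["suite"], deploy_id: params["deploy_id"],
      status: "running", branch: params["branch"], repo: params["repo"],
      env: params["env"], all_passed: nil, cluster: params["cluster"],
      failure_log: nil, last_hash: params["hash"]
    }
    @suites << suite
    suite
  end

  # Record one test result belonging to a suite.
  def record_test(suite_id, name:, url:, session_id:, passed:)
    test = { id: next_id, name: name, url: url, session_id: session_id,
             passed: passed, suite_id: suite_id }
    @tests << test
    test
  end

  # Close out a suite once all of its tests have reported.
  def finish_suite(suite_id)
    suite = @suites.find { |s| s[:id] == suite_id }
    results = @tests.select { |t| t[:suite_id] == suite_id }
    suite[:all_passed] = results.all? { |t| t[:passed] }
    suite[:status] = "finished"
    suite
  end

  private

  def next_id
    id = @next_id
    @next_id += 1
    id
  end
end
```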

This allows us to keep a record over time of every single test that was run and to which suite and deploy it belongs. We can get as granular as the exact code change using the Github hash and later include screenshots of the run. We also have a couple of different endpoints we can check for failed tests in a suite only, tests that have passed only, or the last test suite to run on an environment. We wanted to record everything in order to analyze our test data and create even more integrations.
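The query endpoints mentioned above boil down to a few filters. A hypothetical sketch over arrays of hashes shaped like the suite and test records shown earlier; the method names are illustrative, and Cukebot’s actual routes may differ.

```ruby
# Tests in a suite, optionally narrowed to only passed or only failed ones.
def tests_for(tests, suite_id, passed: nil)
  tests.select do |t|
    t[:suite_id] == suite_id && (passed.nil? || t[:passed] == passed)
  end
end

# The most recent suite run on an environment. "Highest id wins" assumes
# ids increase over time, as a Postgres serial column would.
def last_suite_on(suites, env)
  suites.select { |s| s[:env] == env }.max_by { |s| s[:id] }
end
```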

This helps us automatically listen for the completed-deploy messages we talked about earlier, and gives us a way of tracking those test runs later. After every test suite run, we post the permalink of the suite back into our #cukes chat room so that we have visibility across the company.
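Posting the permalink back can be as small as one HTTP request. This sketch assumes a Slack incoming webhook (which accepts a JSON body with a `text` field) and a hypothetical permalink format; the webhook URL would come from configuration, for example an assumed `CUKES_WEBHOOK_URL` environment variable. Building the request separately from sending it keeps the payload easy to inspect.

```ruby
require 'net/http'
require 'json'
require 'uri'

# Build the POST that announces a finished suite in #cukes.
def suite_announcement(webhook_url, suite, permalink)
  uri = URI.parse(webhook_url)
  request = Net::HTTP::Post.new(uri)
  request["Content-Type"] = "application/json"
  verdict = suite[:all_passed] ? "passed" : "FAILED"
  env_name = "#{suite[:cluster].to_s.strip}#{suite[:env]}"
  request.body = { text: "Suite #{suite[:suite]} on #{env_name} #{verdict}: #{permalink}" }.to_json
  [uri, request]
end

# Send it, e.g. announce(ENV["CUKES_WEBHOOK_URL"], suite, link) if the var is set.
def announce(webhook_url, suite, permalink)
  uri, request = suite_announcement(webhook_url, suite, permalink)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(request)
  end
end
```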

Another added benefit is that it allowed us to build a front end for non-tech-savvy people to initiate a test suite run on any environment.

Check it out for yourself; we just open sourced it.

Stay tuned next week for part three of this mini series! You can follow Felix on Twitter at @feelobot.


Guest Post: A Dialectical Theory of Software Quality, or Why You Need Independent QA

June 9th, 2014 by Amber Kaplan

Product quality, and in particular software quality, can be an ephemeral characteristic of the product. It may not be easy to define, but in a sense, it is the opposite of the definition of pornography. You may not recognize it when it’s there, but you know it when it’s not. I propose that anything in a software product, or for that matter any other product, that induces unnecessary aggravation in the user is a detraction from the quality of the product.

For those unfamiliar with the term “dialectical” or its noun form, “dialectics”, these terms can be very roughly defined as an approach to looking at things that sees them as dualities. For example, the concept of “night” is more meaningful when coupled with the concept of “day.” “Good” has more meaning when paired with the concept of “evil”. Creative and constructive processes can be thought of as dialectical, where there is a tension between opposing imperatives and the result of such processes can be thought of as the resolution of these tensions.

As applied to the discipline of software engineering, one dialectic that exists is that between the imperatives of developers and architects and those of users. In the development process, the imperatives of independent QA engineers are those of users and are theoretically opposite to those of developers. Developers are totally absorbed in the technical intricacies of getting from point A to point B. They work to some set of explicit or implicit product functionality items that make up a product requirements set. Their concern is in how to implement these requirements as easily as possible. They work from the inside out, and are intimate with the details of how the functionality requirements are implemented. Independent QA, on the other hand, works from the same set of defined or implicit functionality and requirements but, in theory, does not care about the details of the implementation. QA engineers are intimately concerned with all aspects of how to use the product. By exercising the product, they find the points of aggravation to which the developers may be completely oblivious. To the extent that their findings are heeded, the quality, defined as, among other things, the lack of aggravation, can be enhanced.

In a sense, any piece of software that is run by someone other than the person who wrote it is being tested. The question is not whether the software will be tested, but by whom, how thoroughly, and under what circumstances. Any shortcuts, data formats, dependencies, and so many other elements that a developer used to get their code to run that are not present outside of their development environment may cause a problem when someone else runs that code.

There are many types of software testing. One fundamental division of testing is that between so-called white-box testing and so-called black-box testing. White-box testing is testing carried out with knowledge of the internals of the software. Black-box testing emphasizes the exercise of the software’s functionality without regard to how it is implemented. Complete testing should include both types of tests. The emphasis in the text that follows is on black-box testing and the user experience, where the dialectical view of QA has the most relevance.

Bugs and other manifestations of poor quality cost money. There is a classical analysis that basically says that the cost of fixing a bug increases geometrically the later in the development cycle it is found. Having your customer base be your principal test bed can prove to be expensive. Another possible source of expense is supporting workarounds for bugs that are not fixed. I can give a personal example of this. Some time ago I purchased an inexpensive hardware peripheral which came with a configuration software package. This package had a bug that, on the surface, was very minor, but when I used it I had problems configuring the product correctly. It took two calls to their support team to resolve the problem. Given the low price of this peripheral, one may wonder if their profit from the sale of this unit was wiped out. If many people call with the same question, how does this affect their earnings? How much does a product that is difficult to use, buggy, or otherwise of poor quality increase the cost of selling the product? Repeat sales cost less to generate than new sales, and to the extent that poor quality impacts repeat sales, the cost of sales is driven up.

The scope of independent QA need not be limited to bug hunting. Test-driven development can be done at both the highest level and the unit level. QA can make an important contribution in the earliest phases of product specification by writing scenario documents in response to a simple features list before any detailed design is done. For example, in response to a single feature item such as “login”, a creative QA engineer may specify tests such as “attempt login specifying an invalid user name, attempt login specifying an incorrect password, begin login and then cancel, attempt login while login is in process, attempt login multiple times specifying invalid passwords”, and on and on. Another engineer, seeing this list of tests, may well think of other tests to try. The developer writing the login functionality can see from the list what cases they need to account for early on in their coding. When something is available to test, the QA engineer executes the tests specified in the scenarios document. Those scenarios that turn out to be irrelevant because of the way the login functionality is implemented can be dropped. Other tests and scenarios that the tester thinks of or encounters in testing can be added. Ambiguities encountered in this testing can be brought to the attention of development for resolution early on.

As more and more software is Web-based, runs in Web browsers and is available to more non-technical users, usability issues become more important. How often have you visited Web sites and been unable to do what you wanted, or had great difficulty doing it? There are all too many feature-rich Web sites based on some usage model known only to the designer. The simplest of actions such as logout may become difficult simply because the hyperlink for it is in some obscure spot in a tiny font. A vigilant QA engineer given the task of testing this Web page may well notice this user inconvenience and report it. A common user scenario such as placing an order and then cancelling it may leave the user unsure about whether or not the order has actually been cancelled. The developer may not have thought of this scenario at all, or if they did, thought only in terms of a transaction that either went to completion or was rolled back. A consideration that is trivial to the developer, however, may cause grave consternation to the end user. A transaction that did not complete for some catastrophic reason such as a connection being dropped unexpectedly could well leave the end-user wondering about the state of their order. The independent QA engineer may identify a need for a customer to be able to log back into the site and view their pending orders.

Current trends in software development such as Agile, as well as the move to continuous integration and deployment, do not negate the need for an independent QA function. Indeed, continually making changes to an application’s UI, functionality, or operating assumptions may prove unnerving to users. Assumptions of convenience, such as the idea that the user community will understand how to work with a new UI design because they are already familiar with some arbitrary user model supporting it, can easily creep in under an environment of constant change carried out by people who do not question these assumptions. Independent QA is still needed to define and execute user scenarios made possible by product change as well as old scenarios whose execution steps may be made different by UI changes. Automated unit testing, programmatic API testing, and automated UI tests created by development-oriented engineers cannot simulate the dilemmas of a user who is new to the product or is confused by arbitrary UI changes. A highly visible example of this is the failure of Windows 8 to gain widespread acceptance and the huge market for third-party software to bring back the Start menu familiar to experienced Windows users. Nor was the smartphone-style UI, based on a platform with more inherent limitations than the existing Windows desktop, a big hit with them.

The work of independent QA engineers can, among other things, serve as an “entry point” for tests that may later be added to an automated test suite. A set of steps, initially executed by an actual human doing ad-hoc or exploratory testing, that cause an operation to fail inelegantly, can lead to a test program or script that should be added to the suite that is executed in a continuous integration cycle.

None of these considerations invalidates the value of testing based on knowledge of the internals of a product. Unit testing, white-box testing, and anything else that one can think of to exercise the application may uncover bugs or usage issues. White-box testing may quickly uncover change-introduced bugs that black-box testing might only find with a great deal of time and effort, or not at all. In this context, automated tests automatically kicked off as part of a continuous integration cycle are an extension of an existing white-box regression test suite but not a replacement for actual hands-on, exploratory, black-box QA. You might say that white-box testing is the dialectical negation of black-box QA. It verifies that the individual pieces work, whereas independent, black-box QA verifies that the product works for the user. The two approaches to testing complement each other. Both are necessary for a more complete assessment of product quality.

By Paul Karsh for Sauce Labs


Guest Post: Test Lessons at ExpoQA

June 6th, 2014 by Amber Kaplan

This is the first of a three-part series by Matthew Heusser, software delivery consultant and writer.

Every now and again an opportunity comes along that you just can’t refuse. Mine was to teach the one-day version of my class, lean software testing, in Madrid, Spain, then again the following week in Estonia. Instead of coming back to the United States, I’ll be staying in Europe, with a few days in Scotland and a TestRetreat in the Netherlands.

And a lot of time on airplanes.

The folks at Sauce Labs thought I might like to take notes and type a little on the plane, to share my stories with you.

The first major hit in Madrid is the culture shock; this was my first conference where English was not the primary language. The sessions were split between English and Spanish, with translators in a booth making sure all talks were available in all languages.

The Testing Divide

Right now, in testing, I am interested in two major categories: the day-to-day work of testing new features, and the work of release-testing after code complete. I call this release testing a ‘cadence’, and, across the board, I see companies trying to compress the cadence.

My second major surprise in Madrid is how wide the gap is (and I believe it is getting wider) between legacy teams that have not modernized and teams starting from scratch today. One tester reported a four-month cycle for testing. Another team, relying heavily on Cucumber and Selenium, was able to release every day.

Of course, things weren’t that simple. The Lithuanian team used a variety of techniques to reduce risk, something like DevOps, which I can talk about in another post. The point here is the divide between the two worlds.

Large cadences slow down delivery. They slow it down a lot; think of the difference between machine farming in the early 20th century and the plow and horse of the 19th.

In farming, the Amish managed to survive by maintaining a simple life, with no cars, car insurance, gasoline, or even electricity to pay for. In software, organizations with a decades-long head start (banks, insurance companies, and pension funds) may be able to survive without modernizing.

I just can’t imagine it will be much fun.

Batches, Queues and Throughput

As at many other conferences, the first day of ExpoQA is tutorial day, and I taught the one-day version of my course on lean software testing. I expected to learn a little about course delivery, but not a lot, so the learning hit me like a ton of bricks.

The course covers the seven wastes of ‘lean’, along with methods to improve the flow of the team, for example decreasing the size of the measured work, or ‘batch size’. Agile software development gets us this for free, moving from ‘projects’ to sprints, and within sprints, stories.

In the early afternoon we use dice and cards to simulate a software team that has equally weighted capacity across analysis, dev, test, and operations, but high variability in work size. This slows down delivery. The fix is to reduce the variation, but that option is not part of the simulated project, so what teams tend to do is build up queues of work so that no role ever runs out of work.

What this actually does is run up the work-in-progress inventory: the amount of work sitting around, waiting to be done. In the simulation I don’t penalize teams for this, but on real software projects, ‘holding’ work creates multitasking, handoffs, and restarts, all of which slow down delivery.

My lesson: things that are invisible look free, and my simulation is far from perfect.
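
The dice exercise can be approximated in a few lines of Ruby. The assumptions here are illustrative, not the course’s actual rules: four stages (analysis, dev, test, operations), a fixed capacity of 4 points per role per round, one new card per round, and the effort a card needs at each stage drawn from a die. The sketch also counts capacity that goes unused when a role has nothing in its queue, which is the pressure that tempts teams to stockpile work.

```ruby
# Rough flow sketch: four roles with equal capacity, and work items whose
# effort at each stage is a die roll. All parameters are illustrative.
Item = Struct.new(:id, :sizes, :left)

def simulate_flow(rounds:, capacity: 4, die: 1..6, arrivals_per_round: 1, seed: 42)
  rng = Random.new(seed)
  stages = 4                          # analysis, dev, test, operations
  queues = Array.new(stages) { [] }   # work waiting at, or in progress in, each stage
  done = []
  released = 0
  idle_points = 0                     # capacity wasted because a queue was empty
  wip_history = []

  rounds.times do
    arrivals_per_round.times do
      sizes = Array.new(stages) { rng.rand(die) }
      queues[0] << Item.new(released, sizes, sizes[0])
      released += 1
    end

    # Work downstream stages first so an item advances at most one stage per round
    (stages - 1).downto(0) do |s|
      points = capacity
      while points > 0 && (item = queues[s].first)
        used = [points, item.left].min
        item.left -= used
        points -= used
        break unless item.left.zero?  # out of points; item stays in progress
        queues[s].shift
        if s + 1 < stages
          item.left = item.sizes[s + 1]
          queues[s + 1] << item
        else
          done << item
        end
      end
      idle_points += points
    end

    wip_history << queues.sum(&:size)
  end

  { released: released, done: done.size, wip: queues.sum(&:size),
    idle_points: idle_points, max_wip: wip_history.max }
end

high = simulate_flow(rounds: 200)              # effort 1..6 per stage
low  = simulate_flow(rounds: 200, die: 3..4)   # same 3.5 average, less spread
puts "high variability: #{high}"
puts "low variability:  #{low}"
```

Both runs use the same average effort per stage (3.5 points); only the spread differs, so comparing their idle capacity and peak work in progress is one way to look at the effect of variation on its own. The exact numbers depend on the seed.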

After my tutorial it is time for a conference day, kicked off by Dr. Stuart Reid presenting on the new ISO standard for software testing. Looking at the schedule, I see a familiar name: Mais Tawfik, whom I met at WOPR20. Mais is an independent performance consultant; today she is presenting on “shades of performance testing.”

Performance Test Types

Starting with the idea that performance testing has three main measurements (speed, scalability, and stability), Mais explains that there are different types of performance tests: front-end performance (JavaScript, waterfalls of HTTP requests, page loading and rendering), back-end performance (database, web server), and synthetic monitoring, which creates known-value transactions continuously in production to see how long they take. She also talks about application usage patterns: how testing is tailored to the type of user, and how each new release might carry new and different risks based on the changes introduced. That means you might tailor the performance testing to the release.
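
Synthetic monitoring, as described above, is essentially a loop: run a transaction whose correct result you already know, check the answer, and record how long it took. Here is a minimal Ruby sketch of that loop. The names run_probe and monitor_loop are hypothetical, and the block passed in is a local stand-in for a real production transaction such as an HTTP request.

```ruby
# Minimal synthetic-monitoring sketch: run a transaction with a known correct
# result, check it, and time it. The block stands in for a real transaction.
Probe = Struct.new(:name, :ok, :elapsed_ms)

def run_probe(name, expected)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  ok = begin
    result = yield
    result == expected
  rescue StandardError
    false                                  # an error counts as a failed probe
  end
  elapsed_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000.0
  Probe.new(name, ok, elapsed_ms)
end

# Repeat the probe on an interval; `iterations` keeps the sketch finite,
# where a real monitor would keep looping and record every sample.
def monitor_loop(name, expected, interval:, iterations:, &transaction)
  Array.new(iterations) do |i|
    sleep(interval) if i > 0
    run_probe(name, expected, &transaction)
  end
end

samples = monitor_loop("sum-check", 5050, interval: 0.1, iterations: 3) { (1..100).sum }
samples.each { |s| puts format("%s ok=%s %.3f ms", s.name, s.ok, s.elapsed_ms) }
```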

At the end of her talk, Mais lists several scenarios and asks the audience what type of performance test would blend efficiency and effectiveness. For example, if a release is entirely database changes, and time is constrained, you might not execute your full performance testing suite/scripts, but instead focus on rerunning and timing the database performance. If the changes are concentrated in the front end, you might focus on how long the user interface takes to load and display.

When Mais asks who in the audience does performance testing or manages it, only a handful of people raise their hands. When she asks who has heard of Firebug, even fewer do.

Which makes me wonder if the audience is only doing functional testing. If they are, who does the performance testing? And do they not automate, or do they all use Internet Explorer?

The talk is translated; it is possible that more people know these tools, but the translator was ‘behind’ and they did not know to raise their hands in time.

Here’s hoping!

Time For A Panel

At the end of the day I am invited to sit on a panel to discuss the present (and future) of testing, with Dr. Reid, Dorothy Graham, Derk-Jan De Grood, Celestina Bianco and Delores Ornia. The questions include, in no particular order:

  • Will testers have to learn to code?
  • How do we convince management of the importance of QA and get included in projects?
  • What is the future of testing? Will testers be out of a job?
  • What can we do about the dearth of testing education in the world today?

On the lack of education, Dorothy Graham points to Dr. Reid and his standards effort as a possible input for university education.

When it is my turn, I bring up ISTQB (the International Software Testing Qualifications Board): if ISTQB is so successful (“300,000 testers can’t be wrong?”), then why is the last question relevant? Stefaan Luckermans, the moderator, replied that with 2.9 million testers in the world, the certification had reached only 10%, and that’s fair, I suppose. Still, I’m not excited about the quality of testers that ISTQB turns out.

What I did not get to say, for lack of time, is that ISTQB is, after all, just a response to market demand for a two- to three-day training certification. What can a trainer really do in two or three days? At most, maybe, teach a single technical tool, turn on the lightbulb of thinking, or define a few terms. ISTQB defines a few terms, and it takes a few days.

The pursuit of excellent testing?

That’s the game of a lifetime.

By Matthew Heusser – for Sauce Labs

Stay tuned next week for part two of this mini series! You can follow Matt on Twitter at @mheusser.

Bleacher Report’s Continuous Integration & Delivery Methodology: Continuous Delivery Through Elastic Beanstalk

June 3rd, 2014 by Amber Kaplan

This is the first of a three-part series highlighting Bleacher Report’s continuous integration and delivery methodology by Felix Rodriguez.

I have been tinkering with computers since I was a kid, and I can remember playing Liero on DOS like it was the greatest game ever to exist. I started out building computers and websites, then got into tech support, and now I am a Quality Assurance technician at Bleacher Report, when I’m not cruising around California on my motorcycle, that is.

While working at Bleacher Report, I helped maintain the existing automation suite. I took it upon myself to revamp the collection of long, unrelated RSpec tests into a more object-oriented, Cucumber-based testing framework. Now we have an integration testing server, which I built, with an API to build suites and track tests over time.

We are starting to move some of our new services over to Elastic Beanstalk because we expected it to make managing our stacks and issuing deploys easier. Because Elastic Beanstalk is a rather new service, we were unable to find any out-of-the-box integration with Travis CI. After experimenting with some of the custom functionality Travis CI provides, we were able to issue commands that download the Elastic Beanstalk binaries to the VM that Travis spins up and create the files we need in order to issue an Elastic Beanstalk deploy command. This was far simpler and less time-consuming than trying to install our deployment software on a Travis VM.

When we demoed this to our Operations department, they were more than eager to have us switch new applications to Elastic Beanstalk, since developers get far more control over how the development environment is configured (think Heroku or Nodejitsu). On my own, I was able to build an application and the environment it runs in, and to make sure that after every successful Travis build the latest version was continuously deployed to a staging server, kicked off an integration suite, and returned the results of each step of the process. This was magic to us: it freed up a lot more time for Operations to focus on making sure our applications scale, allowed QA to focus on writing tests rather than running them, and let developers focus on coding their applications without being held back by old tool sets.

If you’re using Amazon’s Elastic Beanstalk service, or plan on building any new applications, I highly suggest this route to make your life much easier. If not, I would skip to “The Hard Way”, which allows you to use EB indirectly to update your apps.

The Easy Way

TravisCI unfortunately does not support Elastic Beanstalk out of the box, but with a clever hack you can automate the EB configuration and deploy cycle through a .travis.yml config. Keep track of your answer to each question the eb init prompt asks, so that you can preseed the responses in the “echo -e” command.

I got most of my inspiration from but I was unable to get it working completely, so I had to try something else.

- wget ""
- unzip ""
- AWS-ElasticBeanstalk-CLI-2.6.2/AWSDevTools/Linux/
- mkdir .elasticbeanstalk
- echo "[global]" >> .elasticbeanstalk/config
- echo "AwsCredentialFile=/Users/travis/.elasticbeanstalk/aws_credential_file" >> .elasticbeanstalk/config
- echo "ApplicationName=cukebot" >> .elasticbeanstalk/config
- echo "" >> .elasticbeanstalk/config
- echo "EnvironmentName=YOUR_STAGING_ENVIRONMENT_NAME" >> .elasticbeanstalk/config
- echo "Region=us-east-1" >> .elasticbeanstalk/config
- cat .elasticbeanstalk/config
- cat ~/.elasticbeanstalk/aws_credential_file
- echo "us-east-1" | git aws.config
- echo -e "$AWS_ACCESS_KEY_ID\n$AWS_SECRET_ACCESS_KEY\n1\n\n\n1\n53\n2\nN\n1\n" | AWS-ElasticBeanstalk-CLI-2.6.2/eb/linux/python2.7/eb init
- git aws.push

Now any time you push code to master and your Travis build succeeds, your new code is automatically deployed to the staging environment you created.
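
The echo -e line works because eb init reads its answers from standard input, one line per prompt, in order. Here is a small Ruby sketch of the same idea, in case you want to build the answer string programmatically: preseed joins the answers with newlines (an empty string sends a bare Enter), and run_with_answers pipes them into a command via Open3. Both helper names are hypothetical, the example uses a tiny Ruby program as a stand-in for eb init, and what each numbered answer selects depends on the prompts of your EB CLI version.

```ruby
require "open3"
require "rbconfig"

# Build the string that the `echo -e "...\n...\n"` step pipes into eb init:
# one answer per line, in prompt order. "" sends a bare Enter.
def preseed(answers)
  answers.map(&:to_s).join("\n") + "\n"
end

# Pipe the answers into a command's standard input and capture its output.
def run_with_answers(cmd, answers)
  output, status = Open3.capture2(*cmd, stdin_data: preseed(answers))
  [output, status.success?]
end

# Same answer sequence as the .travis.yml step (credentials come from ENV there)
answers = [ENV.fetch("AWS_ACCESS_KEY_ID", "KEY"), ENV.fetch("AWS_SECRET_ACCESS_KEY", "SECRET"),
           "1", "", "", "1", "53", "2", "N", "1"]

# Stand-in for eb init: a tiny Ruby program that counts the lines it receives
counter = [RbConfig.ruby, "-e", "puts STDIN.read.lines.size"]
output, ok = run_with_answers(counter, answers)
puts "stand-in read #{output.strip} answers (exit ok: #{ok})"
```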

The Hard Way

TravisCI supports a number of deploy services out of the box; unfortunately for us, we do not use any of those services to deploy our apps. The way we approached continuous delivery was through Travis’s custom webhooks.

First we build a small application that accepts POST requests from Travis when a build completes. Travis provides a sample Sinatra application to help you get started; we modify it a bit to add a JSON object we create to our Amazon SQS queue.

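# Inside the webhook handler; payload, repo, repo_slug, and user are set
# earlier in the sample app. DeployQueue is defined below.
puts "Received valid payload for repository #{repo_slug}"
queue = DeployQueue.new(
  :cluster => "stag",
  :repo => repo,
  :branch => "master",
  :user_name => user,
  :env => "1"
)
queue.send if payload["branch"] == "master"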
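
Stripped of Sinatra and the AWS SDK, the handler’s decision logic can be sketched with Ruby’s standard library. This assumes Travis sends the notification as a form field named payload containing JSON with branch and repository fields; deploy_message_for is a hypothetical name, and mapping user_name to committer_name is a guess. In the real app, Sinatra’s params does the form parsing for you.

```ruby
require "json"
require "uri"

# Hypothetical helper: turn a Travis webhook request body into the deploy
# message, or return nil when the build is not on master.
def deploy_message_for(form_body, cluster: "stag", env: "1")
  params = URI.decode_www_form(form_body).to_h   # Sinatra's params does this for you
  payload = JSON.parse(params.fetch("payload"))
  return nil unless payload["branch"] == "master"

  {
    cluster: cluster,
    repo: payload.dig("repository", "name"),
    branch: payload["branch"],
    user_name: payload["committer_name"],        # assumption: which field maps to user_name
    env: env
  }
end

# Example: a hand-built body shaped like a Travis notification
body = URI.encode_www_form({
  "payload" => JSON.generate({
    "branch" => "master",
    "committer_name" => "Felix",
    "repository" => { "name" => "cukebot" }
  })
})
p deploy_message_for(body)
```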

From there I added a DeployQueue class that accepts the information passed from the Travis payload:

require 'aws-sdk' # aws-sdk v1 (AWS:: namespace)
require 'json'

class DeployQueue
  def initialize(options = {})
    @queue_text = {
      :cluster => options[:cluster], # staging or production
      :repo => options[:repo],
      :branch => options[:branch],
      :env => options[:env],
      :user_name => options[:user_name]
    }
    @sqs = AWS::SQS.new # assumes the aws-sdk v1 SQS client, which provides queues.named
    @q = @sqs.queues.named("INSERT_NAME_OF_QUEUE")
    puts "Deploy sent to queue: #{options[:repo]}_deploy_queue: #{@queue_text}"
  end

  # Serialize as JSON so the worker can JSON.parse the message body
  def send
    @q.send_message(@queue_text.to_json)
  end
end
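
Because the worker parses each message with JSON.parse, the body placed on the queue needs to be JSON. The sketch below swaps SQS for a hypothetical in-memory FakeQueue that mimics only send_message (plus a pop for the consumer side), so the round trip can be checked without AWS. It also shows why Hash#to_s is not a substitute for to_json: to_s yields Ruby inspect syntax, which JSON.parse rejects.

```ruby
require "json"

# In-memory stand-in for the SQS queue (hypothetical; it only mimics the
# send_message call DeployQueue uses, plus a pop for the consumer side)
class FakeQueue
  def initialize
    @bodies = []
  end

  def send_message(body)
    @bodies << body
    body
  end

  def pop
    @bodies.shift
  end
end

message = { cluster: "stag", repo: "cukebot", branch: "master", env: "1", user_name: "felix" }
queue = FakeQueue.new

queue.send_message(message.to_json)   # producer side: serialize as JSON
data = JSON.parse(queue.pop)          # consumer side: what the worker does
puts data["repo"]

# Hash#to_s yields Ruby inspect syntax, which JSON.parse rejects
begin
  JSON.parse(message.to_s)
  puts "to_s happened to be valid JSON"
rescue JSON::ParserError
  puts "to_s output is not valid JSON"
end
```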

Then we can add the following to our .travis.yml:

notifications:
  webhooks:
    urls:
      - http://url/where/your/app/is/
    on_success: always
    on_failure: never

Amazon Elastic Beanstalk lets us build a worker, through an easy-to-use GUI, that handles each message in our Amazon SQS queue; the worker daemon POSTs each message to a local URL in your application. I created a quick video demonstration so you can see how easy it is!

Basically, all we have to do now is wrap our deploy script inside of a small Sinatra web application.

Create a Procfile with the following:

worker: bundle exec ruby app/worker.rb

Then create an app/worker.rb file:

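require 'bundler/setup'
require 'aws-sdk'
require 'json'
require 'sinatra'
require_relative '../lib/deploy_consumer'

enable :logging, :dump_errors, :raise_errors

# Configure aws-sdk credentials (assumes the v1 AWS.config API)
AWS.config(
  :access_key_id => ENV['AWS_ACCESS_KEY_ID'],
  :secret_access_key => ENV['AWS_SECRET_KEY'])

# The worker daemon POSTs each SQS message body to the HTTP path configured
# for the worker environment (here /deploy)
post '/deploy' do
  json = request.body.read
  puts "json #{json.inspect}"
  data = JSON.parse(json)
  ## Your Deploy CMD here (for example, hand `data` to your DeployConsumer)
  status 200 # a 200 response tells the daemon to delete the message
end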

The DeployConsumer is not necessary; it’s a script I made that takes the JSON object received from the queue and uses it to determine which environment to deploy to. Replace it with your own deploy script. If you are interested in what the consumer looks like, you can view it here:
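
The original consumer isn’t reproduced here, so what follows is only a hypothetical sketch of the job described above: parse the queued JSON and work out which environment to deploy to. The cluster mapping, the repo-cluster-env naming scheme, and the ./bin/deploy path are all assumptions to replace with your own conventions.

```ruby
require "json"

# Hypothetical consumer: read the queued JSON and decide where to deploy.
# The environment naming scheme and the ./bin/deploy command are assumptions.
class SketchDeployConsumer
  CLUSTERS = { "stag" => "staging", "production" => "production" }.freeze

  attr_reader :data

  def initialize(json)
    @data = JSON.parse(json)
  end

  # e.g. "cukebot-staging-1" for cluster "stag", repo "cukebot", env "1"
  def target_environment
    cluster = CLUSTERS.fetch(data["cluster"]) do |key|
      raise ArgumentError, "unknown cluster #{key.inspect}"
    end
    "#{data['repo']}-#{cluster}-#{data['env']}"
  end

  # Command line to hand to your own deploy script (placeholder path)
  def deploy_command
    ["./bin/deploy", target_environment, data["branch"]]
  end
end

consumer = SketchDeployConsumer.new('{"cluster":"stag","repo":"cukebot","branch":"master","env":"1","user_name":"felix"}')
p consumer.deploy_command
```

Inside the /deploy route you could then run system(*consumer.deploy_command) in place of the placeholder comment.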

Stay tuned next week for part two of this mini series! You can follow Felix on Twitter at .

Guest Post: Open Sauce Enables Plone to Focus on Robot Framework

May 16th, 2014 by Amber Kaplan

Our friends in the Plone community recently turned to Open Sauce for their testing needs to save time. The results have been stellar: with the time saved, they’re able to focus on improving Robot Framework, according to their release manager, Eric Steele.

Check out the rest of what they have to say below.

When I took over as release manager for the Plone CMS project, we ran our test suite nightly, but that only covered our Python code and some simple form submissions. The entire JavaScript layer remained largely untested, save a few click-arounds by hand before each release. The suspicion that some critical feature might have broken in a browser combination we hadn’t tried kept me up at night. As I began preaching the need for continuous integration and in-browser testing, I was surprised to find a whole team’s worth of people excited to obsess over running tests, improving coverage, and collecting a fleet of VMs to run the few Selenium tests we’d put together at that point. The latter proved to be our undoing; we spent more time managing our testing infrastructure than we did doing actual testing.

Thankfully, Sauce Labs’ Open Sauce came along to save us.

Open Sauce has freed up my testing team to do far more interesting things. We’ve put quite a bit of effort into helping Robot Framework grow. Robot’s Behavior-Driven Development abstraction seems to fit everyone’s heads a bit better and allows us to easily alter tests based on which features are active. Asko Soukka, previously featured on this blog, became Plone’s Person of the Year for 2013 based on the work he put into extending Robot Framework for our community.

Asko has created a set of Robot keywords to enable automated screenshots for our Sphinx documentation. This allows our documentation to show the Plone user interface in the same language as the document. Groups deploying Plone sites can regenerate our end-user documentation with screenshots featuring their own design customizations. It’s a huge win; users see examples that look exactly like their own site. Finally, in a bit of pure mad science, Asko has piped those image generation scripts through a text-to-speech program to create fully-automated screencasts.

The Plone community is currently at work on the upcoming release of Plone 5. With its new widgets layer and responsive design, there are so many new ways that bugs could creep into our system. Happily, that hasn’t happened. I get a nightly report full of screenshots of Plone in action across browsers, devices, and screen sizes. Basic accessibility gotchas are quickly caught. Content editing and management features are automatically tested on both desktop and mobile. Open Sauce allows us to focus on getting things done and done correctly. Finally, I can sleep soundly, or at least find something else to worry over. -Eric Steele, Release Manager. Read Eric’s blog here or follow him on Twitter.

Do you have a topic you’d like to share with our community? We’d love to hear from you! Submit topics here, feel free to leave a comment, or tweet at us any time.

Ask a Selenium Expert: Selenium Grids, Scaling, and Parallelization

May 7th, 2014 by Amber Kaplan

This is part 5 of 8 in a mini-series of follow-up Q&As from Selenium expert Dave Haeffner. Read up on the first, second, third, and fourth.

In our recent webinar, “Selenium Bootcamp”, Dave discussed how to build a well-factored, maintainable, resilient, and parallelized suite of tests that run locally, on a Continuous Integration system, and in the cloud.

Following the webinar, there were several follow-up questions, and Dave agreed to answer eight of them. Below you’ll find the fifth Q&A. Stay tuned next Wednesday for the next question.

5. Is Selenium Grid still relevant/useful for parallelization?

Selenium Grid is a great option for scaling your test infrastructure if you’re okay with handling the overhead of spinning up and maintaining a bunch of machines. On its own, it will not give you parallelization. That is to say, it can handle however many connections you throw at it, but you will still need a way to execute your tests in parallel. You can learn more about Selenium Grid on its main project page.
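
To make the point concrete: Grid supplies the browsers, but something on the client side still has to run tests concurrently. Here is a minimal Ruby sketch using a fixed-size thread pool, assuming each test is a self-contained callable that would open its own remote WebDriver session against the Grid hub (that part is omitted). run_in_parallel is a hypothetical helper, not part of Selenium.

```ruby
# Minimal sketch: run test callables concurrently with a fixed-size thread pool.
# Results come back in the original order; exceptions are captured, not raised.
def run_in_parallel(tests, workers: 4)
  jobs = Queue.new
  tests.each_with_index { |test, i| jobs << [i, test] }
  jobs.close                                # pop returns nil once drained
  results = Array.new(tests.size)
  lock = Mutex.new

  threads = Array.new([workers, tests.size].min) do
    Thread.new do
      while (job = jobs.pop)
        i, test = job
        outcome = begin
          test.call
        rescue StandardError => e
          e                                 # record the failure; keep the worker alive
        end
        lock.synchronize { results[i] = outcome }
      end
    end
  end
  threads.each(&:join)
  results
end

# Eight fake "tests" that each wait briefly, standing in for remote browser calls
fake_tests = Array.new(8) { |n| -> { sleep 0.1; "test #{n} passed" } }
puts run_in_parallel(fake_tests, workers: 4)
```

Threads suit this case in MRI because a thread blocked waiting on a remote browser releases Ruby’s global VM lock; CPU-bound work would call for separate processes instead.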

-Dave Haeffner, April 9, 2014

Can’t wait to see the rest of the Q&A? Read the whole post here. Get more info on Selenium with Dave’s book, The Selenium Guidebook, or follow him on Twitter or GitHub.
