Re-Blog: Add Some Sauce To Your IE Tests

September 4th, 2014 by Amber Kaplan

Sauce Labs hearts ThoughtWorks! And apparently the feeling’s mutual. Check out this great blog post mentioning Sauce Labs by Tom Clement Oketch.

See an excerpt below:


Appium Bootcamp – Chapter 6: Run Your Tests

August 7th, 2014 by Amber Kaplan

appium_logoThis is the sixth post in a series called Appium Bootcamp by noted Selenium expert Dave HaeffnerRead:  Chapter 1 Chapter 2 | Chapter 3 | Chapter 4 | Chapter 5 | Chapter 6 | Chapter 7 | Chapter 8

Dave recently immersed himself in the open source Appium project and collaborated with leading Appium contributor Matthew Edwards to bring us this material. Appium Bootcamp is for those who are brand new to mobile test automation with Appium. No familiarity with Selenium is required, although it may be useful. This is the sixth of eight posts; two new posts will be released each week.

Now that we have our tests written, refactored, and running locally it’s time to make them simple to launch by wrapping them with a command-line executor. After that, we’ll be able to easily add in the ability to run them in the cloud.

Quick Setup

appium_lib comes pre-wired with the ability to run our tests in Sauce Labs, but we’re still going to need two additional libraries to accomplish everything; rake for command-line execution, and sauce_whisk for some additional tasks not covered by appium_lib.

Let’s add these to our Gemfile and run bundle install.

# filename: Gemfile

source ''

gem 'rspec', '~> 3.0.0'
gem 'appium_lib', '~> 4.0.0'
gem 'appium_console', '~> 1.0.1'
gem 'rake', '~> 10.3.2'
gem 'sauce_whisk', '~> 0.0.13'

Simple Rake Tasks

Now that we have our requisite libraries let’s create a new file in the project root called Rakefile and add tasks to launch our tests.

# filename: Rakefile

desc 'Run iOS tests'
task :ios do
  Dir.chdir 'ios'
  exec 'rspec'

desc 'Run Android tests'
task :android do
  Dir.chdir 'android'
  exec 'rspec'

Notice that the syntax in this file reads a lot like Ruby — that’s because it is (along with some Rake specific syntax). For a primer on Rake, read this.

In this file we’ve created two tasks. One to run our iOS tests, and another for the Android tests. Each task changes directories into the correct device folder (e.g., Dir.chdir) and then launches the tests (e.g., exec 'rspec').

If we save this file and run rake -T from the command-line, we will see these tasks listed along with their descriptions.

> rake -T
rake android  # Run Android tests
rake ios      # Run iOS tests

If we run either of these tasks (e.g., rake android or rake ios), they will execute the tests locally for each of the devices.

Running Your Tests In Sauce

As I mentioned before, appium_lib comes with the ability to run Appium tests in Sauce Labs. We just need to specify a Sauce account username and access key. To obtain an access key, you first need to have an account (if you don’t have one you can create a free trial one here). After that, log into the account and go to the bottom left of your dashboard; your access key will be listed there.

We’ll also need to make our apps available to Sauce. This can be accomplished by either uploading the app to Sauce, or, making the app available from a publicly available URL. The prior approach is easy enough to accomplish with the help of sauce_whisk.

Let’s go ahead and update our spec_helper.rb to add in this new upload capability (along with a couple of other bits).

# filename: common/spec_helper.rb

require 'rspec'
require 'appium_lib'
require 'sauce_whisk'

def using_sauce
  user && !user.empty? && key && !key.empty?

def upload_app
  storage =
  app = @caps[:caps][:app]
  storage.upload app

  @caps[:caps][:app] = "sauce-storage:#{File.basename(app)}"

def setup_driver
  return if $driver
  @caps = Appium.load_appium_txt file: File.join(Dir.pwd, 'appium.txt')
  if using_sauce
    @caps[:caps].delete :avd # re:
  end @caps

def promote_methods
  Appium.promote_singleton_appium_methods Pages
  Appium.promote_appium_methods RSpec::Core::ExampleGroup


RSpec.configure do |config|

  config.before(:each) do

  config.after(:each) do


Near the top of the file we pull in sauce_whisk. We then add in a couple of helper methods (using_sauce and upload_app). using_sauce checks to see if Sauce credentials have been set properly. upload_app uploads the application from local disk and then updates the capabilities to reference the path to the app on Sauce’s storage.

We put these to use in setup_driver by wrapping them in a conditional to see if we are using Sauce. If so, we upload the app. We’re also removing the avd capability since it will cause issues with our Sauce run if we keep it in.

Next we’ll need to update our appium.txt files so they’ll play nice with Sauce.


# filename: android/appium.txt

appium-version = "1.2.0"
deviceName = "Android"
platformName = "Android"
platformVersion = "4.3"
app = "../../../apps/api.apk"
avd = "training"

require = ["./spec/requires.rb"]
# filename: ios/appium.txt

appium-version = "1.2.0"
deviceName = "iPhone Simulator"
platformName = "ios"
platformVersion = "7.1"
app = "../../../apps/"

require = ["./spec/requires.rb"]

In order to work with Sauce we need to specify the appium-version and the platformVersion. Everything else stays the same. You can see a full list of Sauce’s supported platforms and configuration options here.

Now let’s update our Rake tasks to be cloud aware. That way we can specify at run time whether to run things locally or in Sauce.

desc 'Run iOS tests'
task :ios, :location do |t, args|
  location_helper args[:location]
  Dir.chdir 'ios'
  exec 'rspec'

desc 'Run Android tests'
task :android, :location do |t, args|
  location_helper args[:location]
  Dir.chdir 'android'
  exec 'rspec'

def location_helper(location)
  if location != 'sauce'

We’ve updated our Rake tasks so they can take an argument for the location. We then use this argument value and pass it to location_helper. The location_helper looks at the location value — if it is not set to 'sauce'then the Sauce credentials get set to nil. This helps us ensure that we really do want to run our tests on Sauce (e.g., we have to specify both the Sauce credentials AND the location).

Now we can launch our tests locally just like before (e.g., rake ios) or in Sauce by specifying it as a location (e.g., rake ios['sauce'])

But in order for the tests to fire in Sauce Labs, we need to specify our credentials somehow. We’ve opted to keep them out of our Rakefile (and our test code) so that we can maintain future flexibility by not having them hard-coded; which is also more secure since we won’t be committing them to our repository.

Specifying Sauce Credentials

There are a few ways we can go about specifying our credentials.

Specify them at run-time

SAUCE_USERNAME=your-username SAUCE_ACCESS_KEY=your-access-key rake ios['sauce']

Export the values into the current command-line session

export SAUCE_USERNAME=your-username
export SAUCE_ACCESS_KEY=your-access-key

Set the values in your bash profile (recommended)

# filename: ~/*.bash_profile

export SAUCE_USERNAME=your-username
export SAUCE_ACCESS_KEY=your-access-key

After choosing a method for specifying your credentials, run your tests with one of the Rake task and specify 'sauce' for the location. Then log into your Sauce Account to see the test results and a video of the execution.

Making Your Sauce Runs Descriptive

It’s great that our tests are now running in Sauce. But it’s tough to sift through the test results since the name and test status are nondescript and all the same. Let’s fix that.

Fortunately, we can dynamically set the Sauce Labs job name and test status in our test code. We just need to provide this information before and after our test runs. To do that we’ll need to update the RSpec configuration incommon/spec_helper.rb.


# filename: common/spec_helper.rb

RSpec.configure do |config|

  config.before(:each) do |example|
    $driver.caps[:name] = example.metadata[:full_description] if using_sauce

  config.after(:each) do |example|
    if using_sauce
      SauceWhisk::Jobs.change_status $driver.driver.session_id, example.exception.nil?


In before(:each) we update the name attribute of our capabilities (e.g., caps[:name]) with the name of the test. We get this name by tapping into the test’s metadata (e.g., example.metadata[:full_description]). And since we only want this to run if we’re using Sauce we wrap it in a conditional.

In after(:each) we leverage sauce_whisk to set the job status based on the test result, which we get by checking to see if any exceptions were raised. Again, we only want this to run if we’re using Sauce, so we wrap it in a conditional too.

Now if we run our tests in Sauce we will see them execute with the correct name and job status.


Now that we have local and cloud execution covered, it’s time to automate our test runs by plugging them into a Continuous Integration (CI) server.

Read:  Chapter 1 Chapter 2 | Chapter 3 | Chapter 4 | Chapter 5 | Chapter 6 | Chapter 7 | Chapter 8

About Dave Haeffner: Dave is a recent Appium convert and the author of Elemental Selenium (a free, once weekly Selenium tip newsletter that is read by thousands of testing professionals) as well as The Selenium Guidebook (a step-by-step guide on how to use Selenium Successfully). He is also the creator and maintainer of ChemistryKit (an open-source Selenium framework). He has helped numerous companies successfully implement automated acceptance testing; including The Motley Fool, ManTech International, Sittercity, and Animoto. He is a founder and co-organizer of the Selenium Hangout and has spoken at numerous conferences and meetups about acceptance testing.

Follow Dave on Twitter – @tourdedave

[Re-Blog] Dev Chat: Vlad Filippov of Mozilla

July 28th, 2014 by Amber Kaplan

Last week Sauce Labs’ Chris Wren took a moment to chat with Vlad Filippov of Mozilla on his blog. Topics covered all things open source and front-end web development, so we thought we’d share. Click the image below to read the full interview, or just click here.

Dev Chat: Vlad Filippov of Mozilla


How Leverages Appium for Mobile Test Automation

July 1st, 2014 by Amber Kaplan

We love this blog post written by Quentin Thomas at HotelTonight! In it, he explains how they use Appium to automate their mobile tests. He also walks readers through specifics, such as the RSpec config helper. Read a snippet below.

Thanks to the engineers at Sauce Labs, it is now possible to tackle the mobile automation world with precision and consistency.

Appium, one of the newest automation frameworks introduced to the open source community, has become a valuable test tool for us at HotelTonight. The reason we chose this tool boils down to Appium’s philosophy.

“Appium is built on the idea that testing native apps shouldn’t require including an SDK or recompiling your app. And that you should be able to use your preferred test practices, frameworks, and tools”.

-Quentin Thomas, HotelTonight, June 17, 2014

To read the full post with code, click here. You can follow Quentin on Twitter at @TheQuengineer.

Have an idea for a blog post, webinar, or more? We want to hear from you! Submit topic ideas (or questions!) here.

Bleacher Report’s Continuous Integration & Delivery Methodology: Test Analytics

June 24th, 2014 by Amber Kaplan

This is the final post in a three part series highlighting Bleacher Report’s continuous integration and delivery methodology by Felix Rodriguez.  Read the first post here and the second here.

Last week we discussed setting up an integration testing server that allows us to post, which then kicks off a suite of tests. Now that we are storing all of our suite runs and individual tests in a postgres database, we can do some interesting things – like track trends over time. At Bleacher Report we like to use a tool named Librato to store our metrics, create sweet graphs, and display pretty dashboards. One of the metrics that we record on every test run is our PageSpeed Insights score.

PageSpeed Insights

PageSpeed insights is a tool provided by Google developers that analyzes your web or mobile page and gives you an overall rating. You can use the website to get a score manually, but instead we hooked into their api in order to submit our page visit score to Liberato. Each staging environment is recorded separately so that if any of them return measurements that are off, we can attribute this to a server issue.

average page speeds

Any server that shows an extremely high rating is probably only loading a 500 error page. A server that shows an extremely low rating is probably some new, untested JS/CSS code we are running on that server.

Below is an example of how we submit a metric using Cukebot:


require_relative 'lib/pagespeed'
Given(/^I navigate to "(.*?)"$/) do |path|
  visit path
  pagespeed =
  ps = pagespeed.get_results
  score = ps["score"]
  puts "Page Speed Score is: #{score}"
  metric = host.gsub(/http\:\/\//i,"").gsub(/\.com\//,"") + "_speed"
    puts "Could not send metric"


require 'net/https'
require 'json'
require 'uri'
require 'librato/metrics'

class PageSpeed
  def initialize(domain,strategy='desktop',key=ENV['PAGESPEED_API_TOKEN'])
    @domain = domain
    @strategy = strategy
    @key = key
    @url = "" + \
      URI.encode(@domain) + \

  def get_results
    uri = URI.parse(@url)
    http =, uri.port)
    http.use_ssl = true
    http.verify_mode = OpenSSL::SSL::VERIFY_NONE
    request =
    response = http.request(request)

  def submit(name, value)
    Librato::Metrics.authenticate "", ENV['LIBRATO_TOKEN']
    Librato::Metrics.submit name.to_sym  => {:type => :gauge, :value => value, :source => 'cukebot'}


Google’s PageSpeed Insights return relatively fast, but as you start recording more metrics on each visit command to get results on both desktop and mobile, we suggest building a separate service that will run a desired performance test as a post – or at least in its own thread. This will stop the test from continuing its run or causing a test that runs long. Which brings us to our next topic.

Tracking Run Time

With Sauce Labs, you are able to quickly spot a test that takes a long time to run. But when you’re running hundreds of tests in parallel, all the time, it’s hard to keep track of the ones that normally take a long time to run versus the ones that have only recently started to take an abnormally long time to run. This is why our Cukebot service is so important to us.

Now that each test run is stored in our database, we grab the information Sauce stores for run time length and store it with the rest of the details from that test. We then submit that metric to Librato and track over time in an instrument. Once again, if all of our tests take substantially longer to run on a specific environment, we can use that data to investigate issues with that server.

To do this, we take advantage of Cucumber’s before/after hooks to grab the time it took for the test to run in Sauce (or track it ourselves) and submit to Librato. We use the on_exit hook to record the total time of the suite and submit that as well.

Test Pass/Fail Analytics

To see trends over time, we’d also like to measure our pass/fail percentage for each individual test on each separate staging environment as well as our entire suite pass/fail percentage. This would allow us to notify Ops about any servers that need to get “beefed up” if we run into a lot of timeout issues on that particular setup. This would also allow us to quickly make a decision about whether we should proceed with a deploy or not when there are failed tests that pass over 90% of the time and are currently failing.

The easiest way to achieve this is to use the Cucumber after-hook to query the postgres database for total passed test runs on the current environment in the last X amount of days, and divide that by the total test runs on the current environment in the same period to generate a percentage, store it, then track it over time to analyze trends.


Adding tools like these will allow you to look at a dashboard after each build and give your team the confidence to know that your code is ready to be released to the wild.

Running integration tests continuously used to be our biggest challenge.  Now that we’ve finally arrived to the party, we’ve noticed that there are many other things we can automate. As our company strives for better product quality, this pushes our team’s standards with regard to what we choose to ship.

One tool we have been experimenting with and would like to add to our arsenal of automation is So far we have seen great things from them and have caught a lot of traffic-related issues we would have missed otherwise.

Most of what I’ve talked about in this series has been done, but some is right around the corner from completion. If you believe we can enhance this process in anyway, I would greatly appreciate any constructive criticism via my twitter handle @feelobot. As Sauce says, “Automate all the Things!”

Have an idea for a blog post, webinar, or more? We want to hear from you! Submit topic ideas (or questions!) here.

Guest Post: Bridging the Test Divide – Beyond Testing For A Release

June 16th, 2014 by Amber Kaplan

This is the second of a three part series by Matthew Heusser, software delivery consultant and writer. 

When I start to think about testing, I think about it in two broad strokes: new feature testing and release-testing. New feature testing tries to find problems with something new and specific, while release-testing happens after “code complete”, to make sure the whole system works together, that a change here didn’t break something there.

Release-testing (which some call regression testing) slows down the pace of release and delays feedback from our customer. Release-testing also increases cycle time – the time from when we begin work on a feature until it hits production. Over time, as our software become more complex, the amount of testing we want to do during release testing goes up.

Meanwhile, teams want to ship more often, to tighten the feedback loop.

Today I am going to talk about making release testing go away – or at least drastically reducing it.

It all starts during that tutorial in Spain I wrote about last time.

Two Worlds

The frequency of release for the people in my tutorial was very diverse, but two groups really struck me — the telecom that had a four-month test-release cycle, and the Latvian software team with the capability to deploy to production every single day.

That means arriving at the office the morning, looking at automated test runs, and making a decision to deploy.

There is a ‘formula’ to make this possible. It sounds simple and easy:

  • Automate a large number of checks on every build
  • Automate deploy to production
  • Continuously monitor traffic and logs for errors
  • Build the capability to rollback on failure

That transforms the role of test from doing the “testing we always do” to looking at the risk for a given release, lining it up against several different test strategies, and balancing risk, opportunity, reward, and time invested in release-testing?

The trick is to stop looking at the software as a big box, but instead to see it as a set of components. The classic set of components are large pieces of infrastructure (the configuration of the web server, the connections to the database, search, login, payment) and the things that sit on top of that – product reviews, comments, static html pages, and so on. Develop at least two de-ploy strategies — one for audited and mission-critical systems (essential infrastructure, etc) and another for components and add-ons.

We’ve been doing it for years in large IT organizations, where different systems have different release cycles; the trick is to split up existing systems, so you can recognize and make low-risk changes easier.

This isn’t something I dreamed up; both Zappos and Etsy have to pass PCI audits for financial services, while Zappos is part of Amazon and publicly traded. Both of these organizations have a sophisticated test-deploy process for parts of the application that touch money, and a simpler process for lower-risk changes.

So split off the system into different components that can be tested in isolation. Review the changes (perhaps down to the code level) to consider the impact of the change, and test the appropriate amount.

This can free up developers to make many tiny changes per day as long as those changes are low risk. Bigger changes along a theme can be batched together to save testing time — and might mean we can deploy with still considerably less testing than a ‘full’ site retest.

But How Do We Test It?

A few years ago, the ideal vision of getting away from manual, documented test cases was a single ‘test it’ button combined with a thumbs up or down at the end of an “automated test run.”

If the risk is different for each release, and we are uncomfortable with our automation, then we actually want to run different tests for each release — exactly what thinking testers (indeed, anyone on the team) can do with exploratory testing.

So let the computers provide some automated checks, all the time. Each morning, maybe every half an hour, we get a report, look at the changes, and decide what is the right thing for this re-lease. That might mean full-time exploratory testing of major features for a day or two, it might be emailing the team and asking everyone to spend a half hour testing in production.

This result is grown up software testing, varying the test approach to balance risk with cost.

The first step that I talked about today is separating components and developing a strategy that changes the test effort based on which parts were changed. If the risk is minimal, then deploy it every day. Hey, deploy it every hour.

This formula is not magic. Companies that try it find engineering challenges. The first build/deploy system they write tends to become hard to maintain over time. Done wrong continuous testing creates systematic and organizational risk.

It’s also a hard sell. So let’s talk about ways to change the system to shrink the release-test cycle, deploy more often, and reduce risk. The small improvements we make will stand on their own, not threaten anyway — and allow us to stop at any time and declare victory!

A Component Strategy

that_badWhen a company like says that new programmers commit and push code to production the first day, do they really mean modifications to payment processing, search, or display for all products?

Of course not.

Instead, programmers follow a well-written set of directions to … wait for it … add the new user to the static HTML ‘about us’ page that lists all the employees, along with an image. If this change generates a bug, that will probably result in an X over an image the new hire forgot to upload, or maybe, at worst, break a div tag so the page mis-renders.

A bad commit on day one looks like this – not a bungled financial transaction in production.

How much testing should we have for that? Should we retest the whole site?

Let’s say we design the push to production so the ‘push’ only copies HTML and image files to the webserver. The server is never ‘down’, and serves complete pages. After the switch, the new page appears. Do we really need to give it the full monty, the week-long burn down of all that is good and right in testing? Couldn’t the developer try it on a local machine, push to stag-ing, try again, and “just push it?”

Questions on how?

More to come.

By Matthew Heusser – for Sauce Labs

Stay tuned next week for the third part of this mini series! You can follow Matt on Twitter at @mheusser.

Have an idea for a blog post, webinar, or more? We want to hear from you! Submit topic ideas (or questions!) here.

Re-Blog: JavaScript Multi Module Project – Continuous Integration

June 11th, 2014 by Amber Kaplan

lubos-krnacOur friend Lubos Krnac describes how to integrate Sauce with Protractor in a quest to implement continuous integration in his JavaScript multi module project with Grunt.

Below is a quote from his most recent blog post along side some code.

Read the rest of his post to get the full how-to here.

An important part of this setup is Protractor integration with Sauce Labs. Sauce Labs provides a Selenium server with WebDiver API for testing. Protractor uses Sauce Labs by default when you specify their credentials. Credentials are the only special configuration in test/protractor/protractorConf.js (bottom of the snippet). The other configuration was taken from the grunt-protractor-coverage example. I am using this Grunt plug-in for running Protractor tests and measuring code coverage.

// A reference configuration file.
exports.config = {
  // ----- What tests to run -----
  // Spec patterns are relative to the location of this config.
  specs: [
  // ----- Capabilities to be passed to the webdriver instance ----
  // For a full list of available capabilities, see
  // and
  capabilities: {
    'browserName': 'chrome'
    //  'browserName': 'firefox'
    //  'browserName': 'phantomjs'
  params: {
  // ----- More information for your tests ----
  // A base URL for your application under test. Calls to protractor.get()
  // with relative paths will be prepended with this.
  baseUrl: 'http://localhost:3000/',
  // Options to be passed to Jasmine-node.
  jasmineNodeOpts: {
    showColors: true, // Use colors in the command line report.
    isVerbose: true, // List all tests in the console
    includeStackTrace: true,
    defaultTimeoutInterval: 90000
  sauceUser: process.env.SAUCE_USERNAME,
  sauceKey: process.env.SAUCE_ACCESS_KEY

You may ask “how can I use localhost in the configuration, when a remote selenium server is used for testing?” Good question. Sauce Labs provides a very useful feature called Sauce Connect. It is a tunnel that emulates access to your machine from a Selenium server. This is super useful when you need to bypass company firewall. It will be used later in main project CI configuration.

Have an idea for a blog post, webinar, or more? We want to hear from you! Submit topic ideas (or questions!) here.

Bleacher Report’s Continuous Integration & Delivery Methodology: Creating an Integration Testing Server

June 10th, 2014 by Amber Kaplan

Bleacher-report-logoThis is the second of a three part series highlighting Bleacher Report’s continuous integration and delivery methodology by Felix Rodriguez.  Read the first post here.

Last week we discussed how to continuously deliver the latest version of your application to a staging server using Elastic Beanstalk. This week we will be discussing how Bleacher Report continuously runs integration tests immediately after the new version of our app has been deployed.

When our deploy is complete, we use a gem called Slackr to post a message in our #deploys chat room. This is simple enough and just about any chat software can do this. We chose to use Slack because of the built-in integration functionality.

We created an outgoing webhook that submits any posts to our #deploys channel as a post to our Cukebot server. The Chukebot server searches the text, checks for a “completed a deploy” message, then parses the message as a Json object that includes the deploy_id, user, repo, environment, branch, and Github hash.

class Parser
  ## Sample Input:
  # OGUXYCDI: Dan has completed a deploy of nikse/master-15551-the-web-frontpage-redux to stag_br5. Github Hash is 96dd307. Took 5 mins and 25 secs
  def self.slack(params)
    text = (params["text"])
    params["deploy_id"] = text.match(/^(.*):/)[1]
    params["branch"] = text.match(/of\s(.*)\sto/)[1]
    params["repo"] = text.match(/to.*_(.*?)\d\./)[1]
    params["cluster"] = text.match(/to(.*?)_.*\d\./)[1]
    params["env"] = text.match(/to\s.*_.*?(\d)\./)[1]
    params["suite"] = set_suite(params["repo"]) 
    params["hash"] = text.match(/is\s(.*?)\./)[1]
    puts params.inspect
    return params

Once parsed, we have all the information we need to submit and initiate a test suite run. A test suite and its contained tests are then recorded into our postgresql database.

Here is an example of what this suite would look like:

  id: 113,
  suite: "sanity",
  deploy_id: "FJBETJTY",
  status: "running",
  branch: "master",
  repo: "br",
  env: "4",
  all_passed: null,
  cluster: " stag",
  failure_log: null,
  last_hash: "0de4790"

Each test for that suite is stored in relation to the suite like so:

  id: 1151,
  name: "Live Blog - Has no 500s",
  url: "",
  session_id: "20b9a64d66ad4f00b21bcab574783d73",
  passed: true,
  suite_id: 113
  id: 1152,
  name: "Writer HQ - All Article Types Shown",
  url: "",
  session_id: "4edbe941fdd8461ab6d6332ab8618208",
  passed: true,
  suite_id: 113

This allows us to keep a record over time of every single test that was run and to which suite and deploy it belongs. We can get as granular as the exact code change using the Github hash and later include screenshots of the run. We also have a couple of different endpoints we can check for failed tests in a suite only, tests that have passed only, or the last test suite to run on an environment. We wanted to record everything in order to analyze our test data and create even more integrations.

This helps us automatically listen for those completed deploy messages we talked about earlier, as well as to have a way of tracking those tests runs later. After every test suite run we then post the permalink of the suite back into our #cukes chat room so that we have visibility across the company.

Another added benefit is that it allowed us to build a front end for non tech savvy people to initiate a test suite run on any environment.

Check it out for yourself; we just open sourced it.

Stay tuned next week for part two of this mini series! You can follow Felix on Twitter at .

Have an idea for a blog post, webinar, or more? We want to hear from you! Submit topic ideas (or questions!) here.

Guest Post: A Dialectical Theory of Software Quality, or Why You Need Independent QA

June 9th, 2014 by Amber Kaplan

QAProduct quality, and in particular software quality, can be an ephemeral characteristic of the product. It may not be easy to define, but in a sense, it is the opposite of the definition of pornography. You may not recognize it when it’s there, but you know it when it’s not. I propose that anything in a software product, or for that matter any other product, that induces unnecessary aggravation in the user is a detraction from the quality of the product.

For those unfamiliar with the term “dialectical” or its noun form, “dialectics”, these terms can be very roughly defined as an approach to looking at things that sees them as dualities. For example, the concept of “night” is more meaningful when coupled with the concept of “day.” “Good” has more meaning when paired with the concept of “evil”. Creative and constructive processes can be thought of as dialectical, where there is a tension between opposing imperatives and the result of such processes can be thought of as the resolution of these tensions.

As applied to the discipline of software engineering, one dialectic that exists is that between the imperatives of developers and architects and those of users. In the development process, the imperatives of independent QA engineers are those of users and are theoretically opposite to those of developers. Developers are totally absorbed in the technical intricacies of getting from point A to point B. They work to some set of explicit or implicit product functionality items that make up a product requirements set. Their concern is in how to implement these requirements as easily as possible. They work from the inside out, and are intimate with the details of how the functionality requirements are implemented. Independent QA, on the other hand, works from the same set of defined or implicit functionality and requirements but, in theory, does not care about the details of the implementation. QA engineers are intimately concerned with all aspects of how to use the product. By exercising the product, they find the points of aggravation to which the developers may be completely oblivious. To the extent that their findings are heeded, the quality, defined as, among other things, the lack of aggravation, can be enhanced.

In a sense, any piece of software that is run by someone other than the person who wrote it is being tested. The question is not whether the software will be tested, but by whom, how thoroughly, and under what circumstances. Any shortcuts, data formats, dependencies, and so many other elements that a developer used to get their code to run that are not present outside of their development environment may cause a problem when someone else runs that code.

There are many types of software testing. One fundamental division of testing is that between so-called white box testing and so-called black-box. White-box testing is testing carried out with knowledge of the internals of the software. Black-box testing emphasizes the exercise of the software’s functionality without regard to how it is implemented. Complete testing should include both types of tests. The emphasis in the text that follows is on black-box testing and the user experience, where the dialectical view of QA has the most relevance.

Bugs and other manifestations of poor quality cost money. There is a classical analysis that basically says that the cost of fixing a bug increases geometrically the later on in the development cycle it is found. Having your customer base be your principle test bed can prove to be expensive. Another possible source of expense is the support for workarounds for bugs that are not fixed. I can give a personal example of this. Some time ago I purchased an inexpensive hardware peripheral which came with a configuration software package. This package had a bug that, on the surface, is very minor, but when I used it I had problems configuring the product correctly. It took two calls to their support team to resolve the problem. Given the low price of this peripheral, one may wonder if their profit from the sale of this unit was wiped out. If many people call with the same question, how does this affect their earnings? How much does a product that is difficult to use, buggy, or otherwise of poor quality increase the cost of selling the product? Repeat sales cost less to generate then new sales and to the extent that poor quality impacts repeat sales, the cost of sales is driven up.

The scope of independent QA need not be limited to bug hunting. Test-driven development can be done at both the highest level and the unit level. QA can make an important contribution in the earliest phases of product specification by writing scenario documents in response to a simple features list before any detailed design is done. For example, in response to a single feature item such as “login”, a creative QA engineer may specify tests such as “attempt login specifying an invalid user name, attempt login specifying an incorrect password, begin login and then cancel, attempt login while login is in process, attempt multiple login multiple times specifying invalid passwords”, and on and on. Another engineer, seeing this list of tests, may well think of other tests to try. The developer writing the login functionality can see from the list what cases they need to account for early on in their coding. When something is available to test, the QA engineer executes the tests specified in the scenarios document. Those scenarios that turn out to be irrelevant because of the way the login functionality is implemented can be dropped. Other tests and scenarios that the tester thinks of or encounters in testing can be added. Ambiguities encountered in this testing can be brought to the attention of development for resolution early on.

As more and more software is Web-based, runs in Web browsers and is available to more non-technical users, usability issues become more important. How often have you visited Web sites and been unable or have had great difficulty in doing what you wanted? There are all too many feature-rich Web sites based on some usage model known only to the designer. The simplest of actions such as logout may become difficult simply because the hyperlink for it is in some obscure spot in a tiny font. A vigilant QA engineer given the task of testing this Web page may well notice this user inconvenience and report it. A common user scenario such as placing an order and then cancelling it may leave the user unsure about whether or not the order has actually been cancelled. The developer may not have thought of this scenario at all, or if they did, thought only in terms of a transaction that either went to completion or was rolled back. A consideration that is trivial to the developer, however, may cause grave consternation to the end user. A transaction that did not complete for some catastrophic reason such as a connection being dropped unexpectedly could well leave the end-user wondering about the state of their order. The independent QA engineer may identify a need for a customer to be able to log back into the site and view their pending orders.

Current trends in software development such as Agile, as well as the move to continuous integration and deployment, do not negate the need for an independent QA function. Indeed, continually making changes to an application’s UI, functionality, or operating assumptions may prove unnerving to users. Assumptions of convenience, such as the idea that the user community will understand how to work with a new UI design because they are already familiar with some arbitrary user model supporting it, can easily creep in under an environment of constant change carried out by people who do not question these assumptions. Independent QA is still needed to define and execute user scenarios made possible by product change as well as old scenarios whose execution steps may be made different by UI changes. Automated unit testing, programmatic API testing, and automated UI tests created by development-oriented engineers cannot simulate the dilemmas of a user who is new to the product or is confused by arbitrary UI changes. A highly visible example of this is the failure of Windows 8 to gain widespread acceptance and the huge market for third-party software to bring back the Start menu familiar to experienced Windows users. Nor was the smartphone-style UI, based on a platform with more inherentlimitations than the existing Windows desktop, a big hit with them.

The work of independent QA engineers can, among other things, serve as an “entry point” for tests that may later be added to an automated test suite. A set of steps, initially executed by an actual human doing ad-hoc or exploratory testing, that cause an operation to fail inelegantly, can lead to a test program or script that should be added to the suite that is executed in a continuous integration cycle.

None of these considerations invalidate the value of testing based on knowledge of the internals of a product. Unit testing, white box testing, and anything else that one can think of to exercise the application may uncover bugs or usage issues. White-box testing may quickly uncover change- introduced bugs that black-box testing might only find with a great deal of time and effort, or not at all. In this context, automated tests automatically kicked off as part of a continuous integration cycle are an extension of an existing white box regression test suite but not a replacement for actual hands-on, exploratory, black-box QA. You might say that white-box testing is the dialectical negation of black-box QA. It verifies that the individual pieces work, where independent, black-box QA verifies that the product works for the user. The two approaches to testing complement each other. Both are necessary for a more complete assessment of product quality.

By Paul Karsh for Sauce Labs

Have an idea for a blog post, webinar, or more? We want to hear from you! Submit topic ideas (or questions!) here.

Guest Post: Test Lessons at ExpoQA

June 6th, 2014 by Amber Kaplan

This is the first of a three part series by Matthew Heusser, software delivery consultant and writer. 

Every now and again and opportunity comes along that you just can’t refuse. Mine was to teach the one-day version of my class, lean software testing, in Madrid, Spain, then again the following week in Estonia. Instead of coming back to the United States, I’ll be staying in Europe, with a few days in Scotland and a TestRetreat in the Netherlands.

And a lot of time on airplanes.

The folks at Sauce Labs thought I might like to take notes and type a little on the plane, to share my stories with you.

The first major hit in Madrid is the culture shock; this was my first conference where English was not the primary language. The sessions were split between English and Spanish, with translators in a booth making sure all talks were available in all languages.

The Testing Divide

Right now, in testing, I am interested in two major categories: The day to day work of testing new features and also the work of release-testing after code complete. I call this release testing a ‘cadence’, and, across the board, I see companies trying to compress the cadence.

My second major surprise in Madrid is how wide the gap is —and I believe it is getting wider —between legacy teams that have not modernized and teams starting from scratch today. One tester reported a four-month cycle for testing. Another team, relying heavily on Cucumber and Selenium, were able to release every day.

Of course, things weren’t that simple. The Lithuanian team used a variety of techniques I can talk about in another post to reduce risk, something like devOps, which I can talk about in another post. The point here is the divide between the two worlds.

Large cadences slow down delivery. They slow it down a lot; think of the difference between machine farming in the early 20th century and the plow and horse of the 19th.

In farming, the Amish managed to survive by maintaining a simple life, with no cars, car insurance, gasoline, or even electricity to pay for. In software, organizations that have a decades-long head start: banks, insurance companies, and pension funds, may be able to survive without modernization.

I just can’t imagine it will be much fun.

Batches, Queues and Throughput

Like many other conferences, the first day of ExpoQA is tutorial day, and I taught the one-day version of my course on lean software testing. I expected to learn a little about course delivery, but not a lot —so the learning hit me like a ton a bricks.

The course covers the seven wastes of ‘lean’, along with methods to improve the flow of the team – for example, decreasing the size of the measured work, or ‘batch size’. Agile software development gets us this for free, moving from ‘projects’ to sprints, and within sprints, stories.

In the early afternoon we use dice and cards to simulate a software team that has equally weighted capacity between analysis, dev, test and operations —but high variability in work size. This slows down delivery. The fix is to reduce the variation, but it is not part of the project, so what the teams tend to do is to build up queues of work, so any role never runs out of work.

What this actually does is run up the work in progress inventory – the amount of work sitting around, waiting to be done. In the simulation I don’t penalize teams for this, but on real software projects, ‘holding’ work created multitasking, handoffs, and restarts, all of which slow down delivery.

My lesson: Things that are invisible look free —and my simulation is far from perfect.

After my tutorial it is time for a conference day – kicked off by Dr. Stuart Reid, presenting on the new ISO standard for software testing. Looking at the schedule, I see a familiar name; Mais Tawfik, who I met at WOPR20.Mais is an independent performance consultant; today she is presenting on “shades of performance testing.”

Performance Test Types

Starting with the idea that performance testing has three main measurements: Speed, Scalability, and Stability, Mais explains that there are different types of performance tests, from front-end performance (javascript, waterfalls of HTTP requests, page loading and rendering) to back-end (database, webserver), and also synthetic monitoring – creating known-value transactions continuously in production to see how long they take. She also talks about application usage patterns – how testing is tailored to the type of user, and how each new release might have new and different risks based on changes introduced. That means you might tailor the performance testing to the release.

At the end of her talk, Mais lists several scenarios and asks the audience what type of performance test would blend efficiency and effectiveness. For example, if a release is entirely database changes, and time is constrained, you might not execute your full performance testing suite/scripts, but instead focus on rerunning and timing the database performance. If the focus on changes is the front end, you might focus on how long it takes the user interface to load and display.

When Mais asks if people in the organization do performance testing or manage it, only a handful of people raise their hands. When she asks who has heard of FireBug, even less raise their hand.

Which makes me wonder if the audience is only doing functional testing. If they are, who does the performance testing? And do they not automate, or do they all use Internet Explorer?

The talk is translated; it is possible that more people know these tools, it was just that the translator was ‘behind’ and they did not know to raise their hands in time.

Here’s hoping!

Time For A Panel

At the end of the day I am invited to sit on a panel to discuss the present (and future) of testing, with Dr. Reid, Dorothy Graham, Derk-Jan De Grood, Celestina Bianco and Delores Ornia. The questions include, in no particular order:

  •             Will testers have to learn to code?
  •             How do we convince management of the important of QA and get included in projects?
  •             What is the future of testing? Will testers be out of a job?
  •             What can we do about the dearth of testing education in the world today?

For the problem with the lack of education, Dorothy Graham points to Dr. Reid and his standards effort as a possible input for university education.

When it is my turn, I bring up ISTQB The International Software Testing Qualifications Board. – if ISTQB is so successful (“300,000 testers can’t be wrong?”) then why is the last question relevant? Stefaan Luckermans, the moderator, replied that with 2.9 Million testers in the world, the certification had only reached 10%, and that’s fair, I suppose. Still, I’m not excited about the quality of testers that ISTQB turns out.

The thing I did not get to say, because of time, that I want to do is point out that ISTQB is, after all, just a response to a market demand for a 2-3 day training certification. What can a trainer really do in 2-3 days? At most, maybe, teach a single technical tool, turn the lightbulb of thinking on, or define a few terms. ISTQB defines a few terms, and it takes a few days.

The pursuit of excellent testing?

That’s the game of a lifetime.

By Matthew Heusser – for Sauce Labs

Stay tuned next week for part two of this mini series! You can follow Matt on Twitter at @mheusser.

Have an idea for a blog post, webinar, or more? We want to hear from you! Submit topic ideas (or questions!) here.