Selenium tests are inherently slow, unreliable and flappy. They have been the bane of developers for every employer I've had. Do yourself a favor and write React and test your components without a browser driver in good ol' JS with the occasional JSDom shim. It removes almost the entire need for Selenium, which should be reserved for only the faintest of smoke tests. And please, if you have to use Selenium, use headless Firefox, because PhantomJS is very bad software.
I had a Rails consultancy (Makandra) recently work on a JS-heavy application that I happen to own, and they got Selenium singing on it, which had been beyond my capabilities for years. One of their tricks, which you can inspect the implementation of in their (public) utilities library [+], is using basically a vendored Firefox per project and VNCing into that Firefox to drive things around. It is thus off-screen and out of the way when you're using it, but apparently is more true-to-reality than headless.
The test suite they wrote has about 600 tests, and while they're slower than I'd like (2-3 minutes) they've been bulletproof since we got my dev environment configured properly. It includes some fairly complicated interactions, most relevantly around our calendar interface.
I had been using Firefox driver in Xvfb but wasn't happy with the performance/stability. So I built a Selenium driver out of Java only (using JavaFX's embedded WebKit) and used a headless JRE windowing toolkit (Monocle). My project is still a pre-release but the headless capability, Java-only system requirement, and its ajax handling might make it useful to some people currently: https://github.com/MachinePublishers/jBrowserDriver
1. Not quite sure. I've only used PhantomJS via Selenium Ghost Driver. From that usage they're similar. The main difference is that my driver uses only Java so under the hood the JRE is launching WebKit through JNI and everything runs in the same JRE process.
2. Current WebKit version depends on the JRE used. Oracle Java 1.8.0_45 has WebKit version 537.44.
3. Java maintainers will update WebKit periodically, including within a major version. E.g., here they update WebKit for the 1.8.0_60 JRE: http://openjdk.java.net/jeps/239 ... Other than that I'm not sure.
Honestly, if it was useful, I would probably use this in my side project, which is commercial but in no way competes with what you're doing (load testing). I might make changes or improvements, and I would generally contribute those back. Affero doesn't "mix well" with that, so it is pretty much a non-starter for me.
I've found this is true for a lot of projects and it seems like restrictive licenses prevent projects from going mainstream.
I have trouble with Selenium's failure rate too, so I ended up writing a test engine in JavaScript. It handles async JS calls with a callback when each one finishes, which gets rid of all the sleep-wait-retry logic in Selenium.
It works very well. I can run the 100+ test cases in IE/FF/Chrome/Safari and iOS/Android browsers without changing a line of JS/test code. It even runs fine from a desktop wired to a phone browser on a cell connection.
It also tests all the app's backend DB logic. The time/pass/fail info is submitted back to the test backend DB.
Can you give an example of how it works? Say, navigate to a page, fill in a form, click submit, and verify that some text is present after submitting the form.
Does it really? Does it test for whether a button you thought was present isn't actually clickable?
If you're going to write tests, I think it makes an insane amount of sense to emulate real world conditions as much as feasibly possible (making judgement calls on things that don't matter like speed of the mouse).
Most Selenium tests don't test that a button is actually clickable, though; they find things through the DOM, and if the button is off-screen or hidden they won't realise it.
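For what it's worth, you can ask Selenium to check at least part of this explicitly. A minimal sketch in Python (the URL and selector are made up for illustration); note that `is_displayed` catches hidden elements but not necessarily off-screen ones:

    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()
    driver.get("https://example.com/login")  # illustrative URL

    button = driver.find_element(By.CSS_SELECTOR, ".submit-button")

    # Finding the element only proves it exists in the DOM. These checks
    # assert it is visible and enabled before we try to click it.
    assert button.is_displayed(), "Button is in the DOM but not visible"
    assert button.is_enabled(), "Button is visible but disabled"
    button.click()

    driver.quit()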
This has been our experience as well. We invested a lot of time and money in making sure Selenium tests run reliably for our clients. Despite this, the best reliability we managed to achieve was 90% with tests that run for 40 minutes, which is obviously not acceptable.
Mostly these would be cases where the browser would, seemingly at random, end up in an unpredictable state and all subsequent test scenarios would fail because of this. (The page is white, or a completely unrelated website gets opened. We have seen lots of weird situations so far.)
This might be exacerbated by the fact that we use the remote Browserstack Selenium hosting service so that the tests can be executed automatically as a part of our deployment process.
This is pretty good actually. It sucks if you're relying on Selenium testing for verifying your code as you're writing it, but before and after deploys to staging and production? This isn't bad at all.
40 minutes from clicking a button to deploy is actually abysmal, especially when you need to worry about things like rollbacks, or deploying at the end of the day, or releasing quick hotfixes to users. In modern build processes, even 10 minutes seems too long.
For testing Mithril.js, I wrote a mock window object, which allows you to do things like simulate requestAnimationFrame, clicks, JSON-P calls and browser quirks from non-browser environments (e.g. from a Node.js script). So to test, you simply swap `window` with the mock and you can drive your fake browser however you wish.
You can cover a lot of ground with that approach and make an extremely fast test suite that is suitable for a save-refresh-test workflow and then you can put trickier tests in a secondary test suite that you only run once in a while (e.g. before a commit)
The extent of the testing I'm currently interested in is "load a page; does the JS on that page run without error?" It won't execute from a CLI, and everyone I talked to pointed me at Selenium.
Testing for a page load without JS error is a fine use case, and is an example of what I meant by the "faintest of smoke tests." It's a test that has very little chance to flap, fail, or force you to write hacky commands around Selenium's unreliable API.
I currently manage a rather large test suite (around 700 different tests) using Selenium, which is all written in Ruby and Rspec (although I've also used Cucumber), and uses the gems Capybara (an abstraction layer for querying and manipulating the web browser via the Selenium driver) and SitePrism (for managing page objects and organizing re-usable sections).
The entire suite runs in around 10 minutes on CircleCI, using 8 parallel threads (each running an instance of the Firefox Selenium driver), and it is rock solid stable.
It took us a while to get to this point, though.
The hard part is handling timing due to Javascript race conditions on the front-end. I had to write my own helper methods like "wait_for_ajax" that I sprinkle in various page object methods to wait for any jQuery AJAX requests to complete. I also use a "wait_until_true" method that can evaluate a block of code over and over until a time limit has been reached before throwing an exception. Once you figure out ways to solve those types of issues, testing things with Selenium becomes a lot more stable and easy.
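The same idea translates directly to other bindings; here is a rough Python analogue of those two helpers (the names mirror the Ruby ones purely for illustration, and the jQuery check only makes sense if the app actually uses jQuery):

    import time

    from selenium.webdriver.support.ui import WebDriverWait


    def wait_for_ajax(driver, timeout=10):
        # Poll until jQuery reports no in-flight AJAX requests.
        WebDriverWait(driver, timeout).until(
            lambda d: d.execute_script("return jQuery.active == 0")
        )


    def wait_until_true(block, timeout=10, interval=0.5):
        # Re-evaluate `block` until it returns a truthy value or the time
        # limit is reached, then give up with an exception.
        deadline = time.time() + timeout
        while time.time() < deadline:
            if block():
                return True
            time.sleep(interval)
        raise TimeoutError("Condition not met within %s seconds" % timeout)

Usage from inside a page object method is then something like `wait_until_true(lambda: "Saved" in driver.page_source)`.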
I have also used the exact same techniques (page objects, custom waiter methods for race conditions, etc) to test mobile apps on iOS and Android with Selenium.
It can be a challenge, but once you have a system down and you know what you are doing, it's not so bad.
The most annoying thing I found with Selenium was that it wouldn't wait for the browser to respond to click events and rerender.
The approach in the blog post (and I think elsewhere ... not sure) is to poll the DOM with a timeout.
Is there a better solution to be had with something like `executeScript`? You could run `requestAnimationFrame`, and then poll for an indicator that the click (etc.) handler has indeed finished. That way if it fails, you know about it pretty soon, without the need for long timeouts. This is all just a guess though.
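Something along those lines does work via `execute_script`; here is a minimal sketch of the flag-polling idea in Python (the `__afterClickRendered` name is purely illustrative, and `driver`/`button` are assumed to exist already). For real use you would ideally have the click handler itself set the indicator:

    from selenium.webdriver.support.ui import WebDriverWait

    button.click()

    # Flip a flag on the next animation frame after the click, so the test
    # can poll for "the browser has rendered at least once since the click"
    # instead of sleeping for an arbitrary amount of time.
    driver.execute_script(
        "window.__afterClickRendered = false;"
        "requestAnimationFrame(function () { window.__afterClickRendered = true; });"
    )

    WebDriverWait(driver, 5).until(
        lambda d: d.execute_script("return window.__afterClickRendered === true;")
    )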
Ruby's Capybara encapsulates Selenium and waits until elements appear on the page (the default timeout is 2 seconds). So you can write simple sequential code like clicking on a `bar` element and then immediately finding a `baz` element,
and it will work even if the baz element is injected into the page by an Ajax request to the server triggered by clicking on bar. I've been using it for many years, but I haven't checked how they implement it. Maybe a callback from a MutationObserver? https://developer.mozilla.org/en-US/docs/Web/API/MutationObs...
I have had some good results using the F# canopy library (http://lefthandedgoat.github.io/canopy/) for working with Selenium. It handles (almost) all the waits for you, so you don't have to scatter a bunch of sleeps in your tests, and it's pretty easy to work with.
> One developer designed a way to take a screenshot of our main drawing canvas and store it in Amazon’s S3 service. This was then integrated with a screenshot comparison tool to do image comparison tests.
I would also take a look at Applitools https://applitools.com/ — they have Selenium webdriver-compatible libraries that do this screenshot taking/upload and offer a nice interface for comparing screenshot differences (and for adding ignore areas). Way fewer false failures than typical pdiff/imagemagick comparisons.
(where `driver` is your WebDriver object, e.g. `webdriver.Chrome()` in the Python bindings).
Then to match that frame against a previously-captured "template" image,
you can use stb-tester's[1] "match" function[2] which allows you to
specify things like the region to ignore and tweak the matching
sensitivity.
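Roughly, the glue between the two could look like this in Python (treat the stb-tester keyword arguments as approximate, since they vary between versions; the template path, URL and region are placeholders):

    import cv2
    import numpy
    import stbt
    from selenium import webdriver

    driver = webdriver.Chrome()
    driver.get("https://example.com/app")  # illustrative URL

    # Grab a screenshot from Selenium and decode it into the OpenCV-style
    # BGR array that stb-tester's image-processing functions expect.
    png = driver.get_screenshot_as_png()
    frame = cv2.imdecode(numpy.frombuffer(png, dtype=numpy.uint8), cv2.IMREAD_COLOR)

    # Compare against a previously captured template, restricting the search
    # to a region of interest and tweaking the match threshold.
    result = stbt.match(
        "templates/drawing-canvas.png",
        frame=frame,
        region=stbt.Region(x=0, y=100, width=800, height=600),
        match_parameters=stbt.MatchParameters(match_threshold=0.9),
    )
    assert result, "Canvas no longer matches the reference screenshot"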
Everyone in the blogosphere (and at my own company) writing non-app-specific layers on top of Selenium suggests that there is scope for a higher-level framework that can be used on top of Selenium. Or that the Selenium API is too thin a layer over WebDriver.
I absolutely hated Robot Framework. The DSL was just horrible to use. It had weird, unnecessary syntax quirks and gave you the minimal amount of information if something failed (it wouldn't tell you which line number it failed on, for instance).
The tests were also flaky as hell, but that was more to do with poor environment management. That, admittedly, was also easier to fix in Python.
It provides a "page object model" implementation on top of Capybara, so you can define a model for each page you want to test, which stores the page's relative URL, and has references to all the elements on the page you care about, and methods for all the interactions you want to do with that page.
So for example, you might have a "LoginPage" model, which contains the following:
    class LoginPage < SitePrism::Page
      set_url "/login"

      element :username_input, '.username-input'
      element :password_input, '.password-input'
      element :submit, '.submit-button'

      def login(username, password)
        load                          # Load the page URL in the Selenium instance
        username_input.set(username)  # Fill in username
        password_input.set(password)  # Fill in password
        submit.click                  # Click submit
      end
    end
Then whenever you want to log in from one of your steps, you can just do something like `LoginPage.new.login(username, password)`.
I think it's a nice abstraction as it allows more experienced test automation developers to build the page model while less experienced ones can write steps just calling the methods. You still have to pay a lot of attention to things like appropriate use of "wait for element to appear" rather than "sleep", and ensuring tests use isolated data, to get it working reliably, but we've got it working pretty well at my current place.
I should write up how we have it set up at some point as we have our own app-specific framework on top of SitePrism which provides some useful abstractions to make it quicker to develop tests.
I'm just getting into Play Framework development, and they ship with FluentLenium, which seems to add a more friendly API and convenience functions. Nothing too fancy, but just looking at the pure-Selenium coffee examples people have posted here shows how dramatic the effect can be.
The one downside is that the developers only seem to tag official releases once in a blue moon; despite the GitHub repo being actively updated, the last push to Maven was more than half a year ago, and so it depends on a rather old version of Selenium.
I just write my own layers on top of Selenium (with Python).
This one is a rough test-automation script, mostly used for filling in forms etc. during development: http://kopy.io/LMBKt (an old one, but the one I have to hand). It's handy to be able to open a page, log in and fill in a form in a few seconds when doing it by hand would take minutes.
I find that way works as the abstraction is only one level removed and I can just throw in methods that relate to that project.
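For reference, the kind of layer I mean is nothing more than a handful of project-specific methods over the raw driver; a stripped-down sketch (selectors, URL and credentials are placeholders):

    from selenium import webdriver
    from selenium.webdriver.common.by import By


    class AdminSite(object):
        """A thin, project-specific wrapper: one level above raw Selenium."""

        def __init__(self, base_url):
            self.base_url = base_url
            self.driver = webdriver.Firefox()

        def login(self, username, password):
            self.driver.get(self.base_url + "/login")
            self.driver.find_element(By.NAME, "username").send_keys(username)
            self.driver.find_element(By.NAME, "password").send_keys(password)
            self.driver.find_element(By.CSS_SELECTOR, "button[type=submit]").click()

        def fill_contact_form(self, name, email, message):
            self.driver.get(self.base_url + "/contact")
            self.driver.find_element(By.ID, "name").send_keys(name)
            self.driver.find_element(By.ID, "email").send_keys(email)
            self.driver.find_element(By.ID, "message").send_keys(message)
            self.driver.find_element(By.ID, "send").click()


    # During development: open, log in and fill a form in seconds.
    site = AdminSite("http://localhost:3000")
    site.login("dev", "dev-password")
    site.fill_contact_form("Test User", "test@example.com", "Hello from Selenium")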
The PageObjects tip is a really good one. Previously, using Selenium we would end up with a complete maintainability nightmare.
I used Geb on a recent project, and I actually felt that the tests I built demonstrated a passable level of engineering discipline. However, Geb was really hard to learn (partly because the error messages were confusing or missing), and you're still on top of Selenium, so you still get wacky exceptions and edge cases.
Improve them how though? Speed? Reliability? If it's just a nicer API, that's all well and good, but until the key problems I face with Selenium are solved (slow and non-deterministic tests) then a nicer API to it is just rearranging deck-chairs on the Titanic.
It seems that you're trying to use Selenium 2, i.e. WebDriver, to run your unit tests.
Selenium is for browser tests, and by its nature it cannot run in milliseconds; its execution time is measured in seconds.
Even when using the PhantomJS webdriver.
It is an integration-testing approach, because it combines the execution of several JavaScript modules that run in a real browser.
Selenium has its purposes, but fast test execution is not one of them.
> It seems that you're trying to use Selenium 2, i.e. WebDriver, to run your unit tests.
Nope. Integration tests. But integration tests that start a Firefox instance from scratch and have to be rerun multiple times to pass due to non-determinism are slow.
Using Capybara alone one gets most of the stuff they had to implement (page, with, retries, ...) but I'll look into those gems you suggest.
Maybe the Scala ecosystem is still immature on the side of integration testing. They could implement them in Ruby if they are familiar with the language. I don't feel OK about using two languages but at least it could enforce strict separation between integration testing and the application.
Some very good information in this article.
It is true that Selenium has its quirks; retrying a failed test can sometimes result in a passing test.
Disclaimer: I work for https://testingbot.com : at my work we offer our customers automatic retries when a test fails.
Writing a Selenium test does take time, but once you run it in parallel across hundreds of browser and OS combinations, it's worth it.
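The mechanics of fanning one test out across combinations are straightforward with the remote driver; a hedged sketch (the hub URL and capability list are placeholders for whatever grid or hosted service you point it at):

    from multiprocessing.pool import ThreadPool

    from selenium import webdriver

    HUB_URL = "http://localhost:4444/wd/hub"  # placeholder grid/hosted endpoint

    CAPABILITIES = [
        {"browserName": "firefox", "platform": "WINDOWS"},
        {"browserName": "chrome", "platform": "WINDOWS"},
        {"browserName": "safari", "platform": "MAC"},
    ]


    def run_smoke_test(caps):
        # Each worker gets its own remote browser session.
        driver = webdriver.Remote(command_executor=HUB_URL, desired_capabilities=caps)
        try:
            driver.get("https://example.com")  # illustrative URL
            return (caps["browserName"], "pass" if "Example" in driver.title else "fail")
        finally:
            driver.quit()


    # Run the same smoke test against every browser/OS combination in parallel.
    print(ThreadPool(len(CAPABILITIES)).map(run_smoke_test, CAPABILITIES))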
BrowserMob, that was a sweet service (based on selenium). Does anyone know what happened to those guys after they sold? I've always wanted to learn more about their story.
I'm using it right now for my latest project, and it is a nightmare. I have 1100 tests that have to run every night, and I'm using PhantomJS.
It is such a mess!!!
> getWithRetry takes a function with a return value
>
>     def numberOfChildren(implicit user: LucidUser): Int = {
>       getWithRetry() {
>         user.driver.getCssElement(visibleCss).children.size
>       }
>     }
>
> predicateWithRetry takes function that returns a boolean and will retry on any false values
>
>     def onPage(implicit user: LucidUser): Boolean = {
>       predicateWithRetry() {
>         user.driver.getCurrentUrl.contains(pageUrl)
>       }
>     }
At first I didn't get the difference between `getWithRetry` and
`predicateWithRetry`, but then I noticed that the former throws an
exception whereas the latter returns false. I infer that `getWithRetry`
will handle exceptions thrown by the retried function.
In stb-tester[1] (a UI tool/framework targeted more at consumer
electronics devices where the only access you have to the
system-under-test is an HDMI output) after a few years we've settled on
a `wait_until` function, which waits until the retried function returns
a "truthy" value. `wait_until` returns whatever the retried function
returns:
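(A minimal sketch of that usage, borrowing `press`, `Key` and `miniguide_is_up` from the example further down; the timeout handling is elided:)

    press(Key.INFO)

    # `wait_until` polls `miniguide_is_up` until it returns something truthy
    # (or an internal timeout expires) and hands that value back, so the
    # test still asserts on the result explicitly.
    assert wait_until(miniguide_is_up), "Miniguide didn't appear after pressing INFO"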
Since we use `assert` instead of throwing exceptions in our retried
function, `wait_until` seems to fill both the roles of `getWithRetry`
and `predicateWithRetry`. I suppose that you've chosen to go with 2
separate functions because so many of the APIs provided by Selenium
throw exceptions instead of returning true/false.
> doWithRetry takes a function with no return type
>
>     def clickFillColorWell(implicit user: LucidUser) {
>       doWithRetry() {
>         user.clickElementByCss("#fill-colorwell-color-well-wrapper")
>       }
>     }
Unlike Selenium, when testing the UI of an external device we have no
way of noticing whether an action failed, other than by checking the
device's video output. For example we have `press` to send an infrared
signal ("press a button on the remote control"), but that will never
throw unless you've forgotten to plug in your infrared emitter. I
haven't come up with a really natural way of specifying the retry of
actions. We have `press_until_match`, but that's not very general. The
best I have come up with is `do_until`, which takes two functions: The
action to do, and the predicate to say whether the action succeeded.
It's not ideal, given the limitations around Python's lambdas (anonymous
functions). Using Python's normal looping constructs is also not ideal:
    # Could get into an infinite loop if the system-under-test fails
    while not miniguide_is_up():
        press(Key.INFO)

    # This is very verbose, and it uses an obscure Python feature: `for...else`[2]
    for _ in range(10):
        press(Key.INFO)
        if miniguide_is_up():
            break
    else:
        assert False, "Miniguide didn't appear after pressing INFO 10 times"
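The shape I have in mind for `do_until` is roughly the following (the signature and defaults here are illustrative, not a settled API):

    def do_until(action, predicate, max_attempts=10):
        # Perform `action`, then check `predicate`; repeat until the
        # predicate passes or we run out of attempts.
        for _ in range(max_attempts):
            action()
            if predicate():
                return True
        return False

    assert do_until(lambda: press(Key.INFO), miniguide_is_up), \
        "Miniguide didn't appear after pressing INFO"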
Thanks for the article, I enjoyed it and it has reminded me to write up
more of my experiences with UI testing. I take it that the article's
sample code is Scala? I like its syntax for anonymous functions.
Thanks for the comment. We actually originally had a waitUntil function that was used for basically all three of the cases I mentioned above. In some sections of the code it was just there to eat errors, in other sections it was used to get some text, and in yet others it was wrapped in an assert and needed to return a boolean. This led to chronic misuse around the code (I found 4-5 tests that simply forgot to wrap it in an assert effectively rendering the test completely worthless). The main benefit we got from splitting the methods out was making it clear to developers what each one did. Catching all the exceptions thrown by Selenium instead of returning booleans was just an added benefit.
And you are correct, we are using Scala. There are some really cool things about the language: case classes, pattern matching, first-class functions, and traits, just to name a few.
> This led to chronic misuse around the code (I found 4-5 tests that simply forgot to wrap it in an assert effectively rendering the test completely worthless).
Yes, I've been bitten by that too -- it's too easy to forget the "assert". This morning it occurred to me that I could write a pylint (static analysis) checker to catch that, so I've done just that: https://github.com/stb-tester/stb-tester/commit/5e5bdbb
I'm working for a startup that addresses this by means of a simple wrapper API: http://heliumhq.com. Human-readable tests with no more HTML IDs, CSS selectors, XPaths or other implementation details.