When you’ve been using Selenium for browser-based UI test automation as long as I have, you occasionally see the same questions come up again and again. Nearly any website of any complexity has functionality that relies on user authentication. Nowadays, that’s most often done with some sort of form-based UI, which creates a session that the browser tracks via a cookie. The variations on that theme are many, but they don’t require any additional tooling to handle via Selenium.
However, every now and again, someone will post to one of the Selenium mailing lists or on the IRC/Slack channel asking about a site they have to automate that relies on the browser itself asking for the credentials with which to attempt authentication. Selenium is very good at automating pages within a browser, but makes little or no attempt at automating the parts of the browser outside the page being displayed, like file download selection dialogs, print dialogs, or, most relevant here, browser-displayed credential dialogs. This usually means that the user tries to find another way to manipulate the dialog. Often that means a language-specific tool (like Java’s Robot class), a platform-specific one (like AutoIt), or an approach that once worked with browsers but has since been deprecated and disallowed by nearly all of them as a security risk (putting the user name and password directly in the URL). The challenge with these approaches is that they rarely scale well, and almost never work correctly in the remote or grid case.
This happens often enough that I’m going to start a series of posts about authentication, and how to effectively automate it with Selenium, without resorting to single-language or single-platform utilities. The last time I posted a series of blog posts about how to accomplish something using Selenium combined with other tools, one commenter told me they were “not impressed it [took] three blog articles to explain how to accomplish” the task.
The blog post series in question could easily have been done in one post, with a simple code example, but without any explanation of what the code was actually doing. I don’t believe that approach is worthwhile; in fact, I think it is actually detrimental to learning.
So let me lay this out ahead of time. The content of this post series could be done in a single post. It would be extraordinarily long, and a lot of it would end up saying, “go look at the sample code to get the full picture.” I’d rather not take that approach. I’d much rather take my time, and at least attempt to give the relevant details in smaller, more quickly digested chunks. If you find that approach lacking, and would rather “just give me teh codez,” I’ll respectfully suggest heading elsewhere. As I add posts to the series, I’ll try to keep an updated list of the parts of the series at the bottom of each post so that you can jump to the section you need.
One last thing: I’ll be showing code using C#. Assuming you have similar libraries available in your language of choice, the code can be ported to other languages and libraries, but doing so here is beyond the scope of this post series. Also, I’ll be using a more verbose coding style than modern versions of C# require. This is a stylistic choice for explicitness and clarity; you’re welcome to use more modern syntax in your own code.