Evaluating our evaluations – proving improvements in design

Here at Bunnyfoot, we’re big believers in using solid evidence – we want to be able to prove that every stage of the user testing we do improves the overall user experience of the projects we work on.

A lot of thought by professionals and academics has gone into producing ways of measuring something as intangible and hard to describe as an individual’s experience – and how it changes over time. I thought you might be interested in how this is done in our industry, so this blog post will give you an insight into how Bunnyfoot arrived at its own evaluation methods.

Keep reading to get a glimpse into the complex but fascinating world of website usability and evaluation.

What evaluation methods are out there?

Finding and fixing issues in the design of a website isn’t always enough proof of improvement. Our clients always want evidence to show how their, and our, efforts are continually improving the user experience of their site across multiple tests – from the initial interactive prototype, to the beta code, through to the final live site.

We’re often asked to use established evaluation questionnaires to measure the success of design improvements. These established questionnaires – like the System Usability Scale (SUS), the Software Usability Measurement Inventory (SUMI), or IBM’s After Scenario Questionnaire (ASQ) – are completed by participants following a usability testing session to capture their opinions and levels of satisfaction.

There are good reasons for using the well-known and respected survey tools mentioned above. However, at Bunnyfoot, we’re reluctant to use them during our user testing sessions for several reasons:

  • they require responses from more participants than are used in traditional usability testing before they can be considered reliable. SUMI, for example, recommends 30 responses.
  • they are time-consuming to administer and complete within a usability study, which typically lasts less than an hour.
  • we don’t know how reliable and valid they are for use with low-fidelity prototypes (site wireframes or an initial mock-up).
  • the language used in many of these assessment tools is pretty complex, which causes problems for participants with below-average English literacy levels and for non-native English speakers.

Introducing Bunnyfoot’s Experience Questions (ExQ)

We realised that in order to give our clients a reliable way of showing continual improvements in design we would need to create our own usability evaluation questionnaire – one that is succinct and quick to complete.

So with none of the existing evaluation surveys fitting our needs exactly, we found, critiqued, cropped and adapted questions from those existing tools to come up with a definitive set that works for us.

We tested our draft questions with groups of varying literacy levels, ages and internet skills, which helped us refine the phrasing and whittle the set down to just 10 statements that participants are asked to rate in terms of their experience.

These look deceptively simple, but a lot of work went into ensuring that we are asking exactly the right thing. Here are the 10 Experience Questions (ExQs) we present to participants in our website usability tests:

  1. I am satisfied with this website
  2. I felt frustrated while using this website
  3. This website did what I wanted it to do
  4. This website is wasting my time
  5. This website kept my attention
  6. The website style was wrong for me
  7. I was able to understand the text on the website
  8. It took too long to learn how to use this website
  9. The information on the website is laid out clearly
  10. Finding my way around this website was difficult

We are confident that our ExQs are understood by most participants (including those for whom English is a second language, and some people with below-average literacy levels) and leave participants feeling that they have given a rounded summary of their experience.

The results of our ExQs help us record how each iterative design has improved from the previous version.
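To give a flavour of how responses like these could be turned into a comparable score, here is a minimal sketch in Python. It is only an illustration, not our actual analysis: it assumes a 5-point agreement scale (1 = strongly disagree, 5 = strongly agree) and reverse-scores the even-numbered, negatively worded statements before averaging.

```python
# Illustrative scoring sketch (not Bunnyfoot's actual method): assumes each
# ExQ statement is rated on a 5-point agreement scale and that the
# even-numbered, negatively worded statements are reverse-scored.

NEGATIVE_ITEMS = {2, 4, 6, 8, 10}  # "frustrated", "wasting my time", etc.

def score_participant(responses: dict[int, int]) -> float:
    """Return a single 1-5 experience score for one participant.

    `responses` maps statement number (1-10) to the raw rating (1-5).
    """
    adjusted = []
    for item, rating in responses.items():
        if item in NEGATIVE_ITEMS:
            rating = 6 - rating  # flip the scale so higher is always better
        adjusted.append(rating)
    return sum(adjusted) / len(adjusted)

def score_round(all_responses: list[dict[int, int]]) -> float:
    """Average the per-participant scores for one round of testing."""
    return sum(score_participant(r) for r in all_responses) / len(all_responses)

# Example: comparing a prototype round against a later beta round.
prototype = [{1: 3, 2: 4, 3: 3, 4: 4, 5: 2, 6: 3, 7: 4, 8: 3, 9: 3, 10: 4}]
beta      = [{1: 4, 2: 2, 3: 4, 4: 2, 5: 4, 6: 2, 7: 5, 8: 2, 9: 4, 10: 2}]
print(score_round(prototype), score_round(beta))  # 2.7 vs 4.1
```

Averaging in this way gives each design iteration a single headline number that can sit alongside the detailed findings from the session.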

Nothing’s perfect. There are always issues with this style of design evaluation. Here are some of the main issues we’ve uncovered, and how we’re tackling them.

Participants tend to report having a better experience than our experts have observed. This effect appears with other questionnaires we’ve used too. If the over-estimation is consistent, we could introduce an ‘adjustment’ to the scores to make them more realistic. We’re investigating this possibility.
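If we did go down that route, the adjustment could be as simple as the following sketch – purely hypothetical, and assuming we had participant scores and matching expert benchmark scores on the same 1–5 scale from past studies:

```python
# Hypothetical illustration only: one way a consistent over-estimation could
# be corrected, assuming past studies where both a participant ExQ score and
# an expert benchmark score (on the same 1-5 scale) are available.

def estimate_offset(past_studies: list[tuple[float, float]]) -> float:
    """Average gap between participant scores and expert scores."""
    gaps = [participant - expert for participant, expert in past_studies]
    return sum(gaps) / len(gaps)

def adjust(raw_score: float, offset: float) -> float:
    """Pull a new participant score back towards the expert baseline."""
    return max(1.0, min(5.0, raw_score - offset))

# e.g. participants have historically scored roughly 0.5 higher than experts:
offset = estimate_offset([(4.2, 3.6), (3.9, 3.5), (4.4, 3.8)])
print(adjust(4.1, offset))
```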

A few participants don’t read the questions and assume they are all positive statements, so they sometimes give the inverse mark to the one they intended. At the moment our facilitators sit with participants as they complete the questionnaire and read out any answers that appear inconsistent with what the participant is saying – providing the opportunity for them to be changed if necessary.
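In principle, some of that checking could be automated. The sketch below flags answer pairs that look contradictory so the facilitator knows which ones to read back; the pairings are our assumption for illustration, not something built into the ExQ itself.

```python
# Rough screening sketch: flag pairs where a participant strongly agrees with
# both a positive statement and its roughly opposite negative statement.
# The pairings below are assumed for illustration only.

PAIRS = [(1, 2), (9, 10)]  # satisfied/frustrated, clear layout/hard to navigate
THRESHOLD = 4              # "agree" or stronger on a 1-5 scale

def flag_inconsistencies(responses: dict[int, int]) -> list[tuple[int, int]]:
    """Return statement pairs where both answers are >= THRESHOLD."""
    return [
        (pos, neg)
        for pos, neg in PAIRS
        if responses.get(pos, 0) >= THRESHOLD and responses.get(neg, 0) >= THRESHOLD
    ]

print(flag_inconsistencies({1: 5, 2: 4, 9: 4, 10: 2}))  # -> [(1, 2)]
```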

The questions may not be sensitive enough to measure a change in the experience where a change actually exists. For example, if a second usability test uncovers fewer UX issues than the first, will the questionnaire responses also show that change? We’re considering different ways of working out how sensitive the questions are in detecting changes to user experience.
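One option would be a simple statistical comparison of per-participant scores between two rounds of testing – for example with a Mann-Whitney U test, which copes reasonably well with small, ordinal-ish samples. This is just an illustration of the idea, not a method we have settled on:

```python
# One possible sensitivity check (an assumption, not a method named above):
# compare per-participant ExQ scores from two test rounds with a
# Mann-Whitney U test.

from scipy.stats import mannwhitneyu

round_1 = [2.9, 3.1, 3.4, 3.0, 3.3]  # illustrative per-participant scores
round_2 = [3.8, 4.1, 3.9, 4.3, 4.0]

stat, p_value = mannwhitneyu(round_1, round_2, alternative="two-sided")
print(f"U = {stat}, p = {p_value:.3f}")
# A small p-value suggests the questionnaire registered the change; a large
# one despite known design improvements hints at limited sensitivity.
```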

How this works for you

What really seems to work well with this approach is that the relationship between what your customers say and what needs to be improved is very clear. For example, if potential customers give low scores for being able to read the text on the website, then the clear action item is to improve the website copy. This can be backed up by specific examples from testing.

Find out what other tools we have up our sleeve to help you get the most from a user testing session.

Further reading

Interested in an in-depth look at usability testing and evaluation? Below are some of the research papers that have fed into the development of our Experience Question set:

  • Bangor, A., Kortum, P. T., & Miller, J. A. (2008). An empirical evaluation of the System Usability Scale (SUS). International Journal of Human-Computer Interaction, 24(6), 574–594.
  • Brooke, J. (1996). SUS: A ‘quick and dirty’ usability scale. In P. W. Jordan, B. Thomas, B. A. Weerdmeester, & I. L. McClelland (Eds.), Usability Evaluation in Industry. London: Taylor and Francis.
  • Finstad, K. (2010). Response interpolation and scale sensitivity: Evidence against 5-point scales. Journal of Usability Studies, 5(3), 104–110.
  • Lewis, J. R. (1995). IBM computer usability satisfaction questionnaires: Psychometric evaluation and instructions for use. International Journal of Human-Computer Interaction, 7(1), 57–78.
  • Lewis, J. R., & Sauro, J. (2009). The factor structure of the System Usability Scale. In Proceedings of HCI International 2009 (HCII 2009), San Diego, CA, USA.
  • Medlock, M. C., Wixon, D., Terrano, M., Romero, R., & Fulton, B. (2002). Using the RITE method to improve products: A definition and a case study. Presented at the Usability Professionals Association 2002 conference, Orlando, FL.
  • Oppenheim, A. (1992). Questionnaire Design, Interviewing and Attitude Measurement. London: Pinter.
  • SUMI: the Software Usability Measurement Inventory. http://sumi.ucc.ie/ (retrieved 27/9/2010).
  • Tullis, T., & Albert, B. (2008). Measuring the User Experience. Morgan Kaufmann.
  • Tullis, T., & Stetson, J. (2004). A comparison of questionnaires for measuring usability. Proceedings of UPA 2004. http://www.upassoc.org/usability_resources/conference/2004/UPA-2004TullisStetson.pdf
  • Virzi, R. A. (1992). Refining the test phase of usability evaluation: How many subjects is enough? Human Factors, 34(4), 457–468.