Observation involves watching, recording and then analyzing events and behaviour. Observation occurs in laboratory experiments and field studies, where users are watched while executing pre-set tasks, or real-life activities, respectively. While observing users, researchers should record what is done, how long tasks take, and reactions to events (verbal, facial and body language responses). Written logs, audio and video recordings are the most common methods of capturing what occurs in a usability test.
As with interviews, observations can be highly structured to fit a pre-determined framework, or adaptable to situations as they arise. When observations are highly structured and prior categorisations of behaviour are used, the process is really just a type of experimentation; the more flexible approach, where the participants' behaviour determines what is studied, is akin to action research.
Interaction with participants can be non-existent (e.g. if observers are simply counting and timing events), marginal (e.g. if observers talk to participants to capture qualitative data), complete (if the observer joins the group as a fully-fledged member working on the task at hand) or partial (somewhere in between the extremes of non-existent and complete participation, depending on the evaluation).
Observation in a field study is similar to observation in a controlled environment, except that there is generally a greater emphasis on observing interaction between people in the field. Field studies are important because computers are not used in isolation but within a particular leisure or work environment; so complexities and cultural nuances introduced by the real-life environment should be included in observation records.
Since writing is slow and distracts the observer, some backup recording equipment is advisable. Audio tapes are a comparatively unobtrusive backup, but take a long time to transcribe, particularly if the person doing the transcribing is not a researcher able to distinguish noteworthy points from insignificant comments. If a visual record is also required, photographs are usually adequate, and are cheaper and easier to process than video recordings. Particularly in a field study, photographs of artifacts produced (sketches, notes, etc.) are a useful adjunct to mere recording of activities. The presence of a video camera can intimidate subjects, and can make researchers focus too closely on the specific scene being filmed while neglecting other aspects. On the other hand, video is the most complete and least disputable record of whatever scene the camera is focused on. Tools such as Observer Video-Pro (Noldus 2000) record time on video tape, allow marked parts to be copied to an edit list, and can synchronise keyboard entries and other observational data with the video.
Computer-generated logs of activities and associated times also give objective evidence of what was done and when. They provide useful backup data which can be easily analysed by appropriate software, and can also be used in large studies involving a great many participants (e.g. on the Web). Since computer logging can be completely invisible to subjects, professional ethics require that researchers inform subjects what is being recorded and why.
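As a rough illustration of what such a log might contain, the following minimal sketch records timestamped actions for one participant. The class name `SessionLog`, its fields and the sample actions are all hypothetical, chosen only for illustration; they do not describe any particular logging tool.

```python
import json
import time
from dataclasses import dataclass, field, asdict


@dataclass
class SessionLog:
    """Timestamped record of one participant's session (hypothetical structure)."""
    participant_id: str
    events: list = field(default_factory=list)

    def record(self, action, detail="", timestamp=None):
        # Use the supplied time if given (handy for replaying data),
        # otherwise fall back to the wall clock.
        when = time.time() if timestamp is None else timestamp
        self.events.append({"t": when, "action": action, "detail": detail})

    def to_json(self):
        # Serialise the whole session so that other software can analyse it.
        return json.dumps(asdict(self), indent=2)


# Invented sample session: times are seconds from the start of the task.
log = SessionLog(participant_id="P01")
log.record("task_start", "search for product", timestamp=0.0)
log.record("click", "menu > catalogue", timestamp=4.2)
log.record("task_end", "search for product", timestamp=31.7)

# Time taken for the task, from the first to the last logged event.
elapsed = log.events[-1]["t"] - log.events[0]["t"]
```

Storing the participant identifier alongside the events makes it easier to tell subjects exactly what is being recorded about them.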
In theory, observers can participate fully in the process being observed, or they can play no part in the process at all and remain strictly observers only; but in practice something between these two extremes will often occur. It is worthwhile considering beforehand whether seeing the observer is likely to affect the behaviour of subjects, and whether a natural or artificial environment is most appropriate. This can be established by first conducting a short pilot study.
The location of equipment users will employ, of observers, and of cameras and microphones, needs to be decided and tested in advance. For example, one camera may point to the keyboard and another to the subject. Equipment must be checked to ensure it is functional and correctly used (with appropriate focus, volume, etc.). Users need to be found, a convenient time arranged, and consent forms produced and signed. Initial instructions need to be written out and checked. Find out a little about participants so that you can make them comfortable when they arrive by knowing something of what interests them. In the case of a field study, observers should acquaint themselves with the environment and task being studied beforehand; this will give them greater credibility in the community and also enable them to keep better records from the outset.
Once the task has been designed and the behaviour to monitor decided, a detailed report is needed, describing the purpose of each step in the process being observed, what to look for at each such step, and how to record this. This is especially important if a team of observers is employed, to ensure that they operate in the same way.
When there is a team of observers, they need to decide what and who each team member will observe. The fewer people or events one is watching, the more accurate and detailed one's notes can be, but it is also helpful to have more than one person observing the same thing so that they can compare notes. If there are enough people, the job can be divided up, not by the number of observers, but by the number of observer-pairs. Then each part of the observation job is done by a pair of observers and can thus benefit from having two perspectives and from observers having someone to discuss events/problems with. By varying the pairing over time, each observer ends up doing a good cross-section of the work and has shared experiences with a number of fellow-observers. The assignment of tasks to an observation team needs to be carefully documented in advance.
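One way to vary the pairing over time is a round-robin rotation, sketched below. This is only one possible scheme, not one prescribed here; the function name `rotate_pairs` and the observer names are hypothetical.

```python
def rotate_pairs(observers):
    """Return one list of observer pairs per session, using the round-robin
    'circle' method so that each observer works with every other once."""
    people = list(observers)
    if len(people) % 2:
        people.append(None)  # odd head-count: one observer sits out each session
    n = len(people)
    schedule = []
    for _ in range(n - 1):
        # Pair the first with the last, the second with the second-last, etc.
        pairs = [(people[i], people[n - 1 - i]) for i in range(n // 2)]
        # Drop the pair containing the placeholder, if any.
        schedule.append([pair for pair in pairs if None not in pair])
        # Keep the first person fixed and rotate everyone else by one place.
        people = [people[0]] + [people[-1]] + people[1:-1]
    return schedule


# Invented team of four observers: three sessions of two pairs each.
schedule = rotate_pairs(["Ann", "Ben", "Cara", "Dev"])
for session_number, pairs in enumerate(schedule, start=1):
    print(f"Session {session_number}: {pairs}")
```

The position of a pair within a session could then be mapped to a part of the observation job, and the resulting schedule written down in advance as the documented assignment.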
Since field studies do not have the benefit of structure, a framework is useful in keeping observers focused and providing some way of organizing their records. This comprises a checklist to adhere to, containing reminders such as: actors, goals/tasks tackled, activities, events, feelings [Goetz and LeCompte 84, Robson 93]. Such a framework guides researchers to know what and whom they should observe when, and what questions/aspects should be addressed in their notes. This framework should take into account the real-world context and include both detail points and observations of the bigger picture.
References:
J.P. Goetz & M.D. LeCompte. Ethnography and qualitative design in educational research. Academic Press:1984.
C. Robson. Real World Research. Blackwell:1993.
Observers can be present in the test environment, or can observe through a one-way glass or via a remote screen. Observers should cause as little disruption as possible, even if they are marginally, partially or fully participating in what the subjects are doing. Audio, photographic and video equipment should also be located and handled as unobtrusively as possible.
The main source of information recorded during an observation is the notes and sketches observers make by hand (or perhaps using a laptop, but remember that battery charge does not last very long). In the course of observing subjects, it can be helpful to both parties if the observer shows the participant their notes at a convenient time during the study. Software tools such as NUDIST can sort and search field notes for words, phrases and categories, and perform content analysis of large bodies of text.
Quantitative observations are easily recorded as counts in a diagram or table (for example with separate rows for different characteristics/behaviours, and columns for different conditions under which they were observed). Qualitative information is harder to capture, and requires a judicious mix of adhering to the prepared plan and being guided by interesting developments that occur during the observation. When making notes, observers should flag uncertainties or issues they want to come back to. After processing observations and synthesizing them, it is best to go over this with the subjects, to detect any misconceptions. In a field study, this should be done daily to check that observers are interpreting correctly what they see happening.
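The tally table described above, with rows for behaviours and columns for conditions, can be sketched in a few lines. The behaviours, conditions and counts below are invented for illustration, and `format_table` is a hypothetical helper.

```python
from collections import Counter

# Each entry is one observed event: (behaviour, condition). Values are invented.
observed = [
    ("asks for help", "design A"),
    ("asks for help", "design A"),
    ("asks for help", "design B"),
    ("uses undo", "design A"),
    ("uses undo", "design B"),
    ("uses undo", "design B"),
    ("uses undo", "design B"),
]

tally = Counter(observed)
behaviours = sorted({behaviour for behaviour, _ in observed})
conditions = sorted({condition for _, condition in observed})


def format_table(tally, behaviours, conditions):
    """Lay the counts out with one row per behaviour, one column per condition."""
    width = max(len(b) for b in behaviours) + 2
    lines = ["".ljust(width) + "".join(c.ljust(12) for c in conditions)]
    for b in behaviours:
        # A Counter returns 0 for combinations that were never observed.
        cells = "".join(str(tally[(b, c)]).ljust(12) for c in conditions)
        lines.append(b.ljust(width) + cells)
    return "\n".join(lines)


print(format_table(tally, behaviours, conditions))
```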
Since it is impossible for any observer or observation equipment to know what a subject is thinking, a useful technique is to ask them to keep up a running commentary (spoken out loud) on what they are doing (including what they are looking at) and what they are thinking. The problem is that people will become silent sooner or later, typically just when the task becomes interesting or difficult. A request by the observer to keep talking is obtrusive. One helpful technique for limiting silences is to have subjects working in pairs, as it is more natural to talk to another person than to oneself.
In some user studies, participants are asked to keep diaries as a record of their activities and feelings, possibly supplemented by photographs of what they have done, if cameras can be supplied. A template for the diary helps to structure this information, which, particularly if it is electronic, can be relatively easily transferred into a text database.
In a field study it can be difficult to know when to stop if there are neither time constraints nor a need to observe until a specific task is completed. In these situations observers should be guided by the amount they learn each day; they can stop when this becomes insignificant.
It is best to process notes and recordings as soon as possible after the observations are made. Details can be checked, and observers can consult each other in sorting out ambiguities. Facts should be distinguished from opinions. Interesting and unanticipated events or trends should be identified and the possibility of exploring them further discussed. A multimedia database should be constructed to collect and protect all the different types of information.
Qualitative information is best handled by a team of researchers who discuss what they saw and what the observed behaviour says to them. Researchers should first skim through the observation data, looking for what stands out, for patterns, for answers to research questions, and for interesting or unexpected events. Such events typically occur when people are at a loss, have made a mistake, or have reached a conflict situation. If researchers are aware of body language, facial expression, silence, inactivity, voice tone and so forth, they are more likely to detect such incidents in audio/video material.
Another type of analysis sometimes performed on qualitative data (e.g. from chat rooms, bulletin boards and the like) is conversation analysis. This is highly subjective, as it relies on studying conversations in fine detail and interpreting how people use language.
The interpretation of qualitative observation data is best reported in written form but supplemented with statistical analysis, photos, audio or video footage where appropriate. This involves collecting sets of evidence illustrating each specific pattern, theme, trend, fact or opinion.
The written report documenting an observation project should combine quantitative and qualitative feedback, both for formative studies (done during development, to guide improvements) and summative studies (done at the end of a project, to evaluate its success or test a hypothesis).
It is possible to convert qualitative information obtained from notes, cameras and tapes into quantitative data by categorizing and counting events. Categorization of qualitative information makes it possible to analyse such data, as long as the categories are clear and non-overlapping. The inter-researcher reliability rating of a categorization scheme measures the percentage of cases categorized in the same way by two researchers. This should be measured in a pilot study with a small number of subjects, to test whether a practical categorization scheme has been defined. If this rating is too low, either the researchers were not adequately trained in categorizing what they observe, or else (more usually) the categorization is poor and needs to be revised. Discussion of pilot study cases can help identify problems with a proposed categorization.
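The reliability rating described above is a simple percentage of agreement, which the following sketch computes. The function names and the category labels are hypothetical, and the ten pilot cases are invented.

```python
def percent_agreement(codes_a, codes_b):
    """Percentage of cases that two researchers put in the same category."""
    if len(codes_a) != len(codes_b):
        raise ValueError("both researchers must categorize the same cases")
    if not codes_a:
        raise ValueError("at least one case is needed")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * matches / len(codes_a)


def disagreements(codes_a, codes_b):
    """Indices of the cases worth discussing when revising the scheme."""
    return [i for i, (a, b) in enumerate(zip(codes_a, codes_b)) if a != b]


# Invented pilot data: ten cases categorized independently by two researchers.
researcher_1 = ["error", "help", "error", "idle", "help",
                "error", "idle", "help", "error", "idle"]
researcher_2 = ["error", "help", "idle", "idle", "help",
                "error", "idle", "error", "error", "idle"]

rating = percent_agreement(researcher_1, researcher_2)
to_discuss = disagreements(researcher_1, researcher_2)
```

Note that a raw percentage does not correct for agreement that would occur by chance; chance-corrected measures such as Cohen's kappa are often used for that purpose.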
If observers use packages such as Observer Video-Pro (Noldus 2000) to mark videos when specific kinds of event occur, then these marks, as well as computer-generated logs, can be used to analyse the duration and frequency of events of interest. Occurrence counts are also useful (taken from such video packages or by counting observer marks in hand-written notes or audio tapes).
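To show how marked events might be turned into frequency and duration figures, here is a minimal sketch. It assumes a hypothetical mark format of (time in seconds, event name, "start" or "end"); this is not the export format of any particular video package.

```python
from collections import defaultdict


def summarise_marks(marks):
    """Count occurrences and total duration per event type.

    marks: iterable of (seconds, event_name, 'start' | 'end'), in time order.
    """
    open_since = {}               # event name -> time of its unmatched 'start'
    counts = defaultdict(int)
    total = defaultdict(float)
    for t, name, kind in marks:
        if kind == "start":
            open_since[name] = t
            counts[name] += 1
        elif kind == "end" and name in open_since:
            total[name] += t - open_since.pop(name)
    return {name: {"count": counts[name], "total_seconds": total[name]}
            for name in counts}


# Invented marks from one recorded session.
marks = [
    (10, "hesitation", "start"), (14, "hesitation", "end"),
    (30, "error", "start"), (45, "error", "end"),
    (60, "hesitation", "start"), (63, "hesitation", "end"),
]
summary = summarise_marks(marks)
```

An "end" mark without a preceding "start" is ignored here; a real analysis would probably want to flag such marks for checking against the recording.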
Quantitative data can be analysed using simple statistics: descriptive measures such as maxima, means and standard deviations, and significance tests such as t-tests.
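As an illustration, the sketch below computes these descriptive statistics with Python's standard `statistics` module and a t statistic by hand. The task times are invented, the helper names are hypothetical, and Welch's unequal-variance form of the t-test is an assumption made for the example, since no particular variant is specified here.

```python
import math
import statistics


def describe(values):
    """Maximum, mean and sample standard deviation of a set of measurements."""
    return {
        "max": max(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
    }


def welch_t(sample_a, sample_b):
    """t statistic for two independent samples with possibly unequal variances."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / standard_error


# Invented task completion times in seconds for two interface designs.
design_a = [30, 32, 34, 36, 38]
design_b = [40, 42, 44, 46, 48]

summary_a = describe(design_a)
summary_b = describe(design_b)
t_statistic = welch_t(design_a, design_b)
```

Turning the t statistic into a p-value requires the t distribution, which the standard library does not provide; a package such as SciPy (`scipy.stats.ttest_ind` with `equal_var=False`) can be used for that step.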