A Usability Evaluation Experiment of 3D User Interfaces for Elderly: a pilot study

Health professionals have used 3D user interfaces as support tools in the rehabilitation of the elderly, offering them fun and beneficial resources for practicing physical and cognitive activities. In this context, it is necessary to establish mechanisms to evaluate the usability of these interfaces, in order to achieve a balance between functionality, ease of use and sense of well-being. This paper reports a pilot usability study of a virtual reality game developed specifically for the elderly, as a means to identify the needs of this public regarding 3D user interface evaluation. An initial methodology was tested with two points of view in the game, first- and third-person, showing good results for seniors. However, we noted the need to include training periods and to run an evaluation with a heterogeneous group of seniors, in order to consolidate and optimize the proposed approach and to readjust the instruments used.


Introduction
Usability is the variety and the degree to which system features can be used efficiently so that the user can accomplish tasks effectively and intuitively (Karray et al., 2008). Interactive systems can only be useful and practical if they have good usability, and the real efficacy of a system is obtained when there is a balance between functionality and usability. Therefore, it is important to evaluate the usability of a system to reach the best results.
According to Nielsen (1993, 2012), the main usability characteristics to evaluate are ease and efficiency during task performance, ease of reusing resources, recovery of services after system faults, and the satisfaction experienced by the participant while using the system. Evaluating usability is fundamental to establish the relation between the quality of the interactive system and the quality of the interaction (Cockton, 2012). This author mentions that methods and metrics help to determine the extent of usability by measuring robustness, goals and reliance, when the usability evaluation points to the utility of a system or device. Because of this, it is necessary to use a method or protocol that covers and can reliably evaluate all these issues. Three-dimensional user interfaces (3DUI), such as Virtual Reality (VR) applications, are becoming popular in the game area and require a usability evaluation protocol to test their interaction process. Serious games for the elderly, for example, have been used in clinical rehabilitation interventions (Fiorin et al., 2014). This kind of game can help to stimulate the practice of activities beneficial to the human body and increase the patient's interest in the treatment, because traditional intervention is usually slow and painful (Broeren et al., 2008).
Applications for seniors require, like any other system, an evaluation method to test the quality of the interaction, in order to overcome this public's lack of access, lack of practice or fear (Carvalho and Ishitani, 2013). Besides that, it is important to ensure that the solution is suited to the elderly profile. This can motivate greater use and greater production of these solutions, especially given the trend that the population, in the near future, will be older and better prepared for new technologies.
With this in mind, this paper reports a pilot usability study of a virtual reality game developed specifically for the elderly, as a means to identify the needs of this public regarding 3DUI evaluation. In this way, it is possible to obtain input for the further development of an evaluation methodology specific to 3DUI used by the elderly. We chose to apply a methodology proposed by Simor (2016), used with seniors in 2D interfaces, making some adjustments to apply it in the 3DUI context. It consists of a sequence of steps to identify which interface features meet the foreseen specifications and which need review, contributing to the future design of an evaluation methodology.
This article is organized as follows: Section 2 presents the related work; Section 3 describes our approach; Section 4 demonstrates the approach validation; Section 5 presents the results of the experiment; Section 6 presents the discussion; and Section 7 presents our conclusions about this study.

Related Work
Sheu et al. (2015) address how to design a gesture-based serious game so that the elderly can play in a safe, convenient and pleasant way. The study used two games developed by the authors, EG I and EG II (the second being an optimized version of the first), using the Kinect device. Participants aged between 60 and 77 years interacted with the applications, executing all the tasks. Preliminarily, the authors showed that, on average, the tasks were performed more quickly in EG II than in EG I. For this reason, the users also obtained a better score, suggesting that the improvements in the EG II interface were satisfactory. According to the authors, the work also shows that the selection technique adopted for the games is tiring and not suitable for the elderly public, since it required moving both arms to move the cursor across the screen.
Fang et al. (2015) present a game for balance training in the elderly, and seek to verify whether the user's experience of the developed environment is positive or negative. To this end, the authors used the EFS (Evergreen Fitness System), a prototype game that combines six exercises selected by health specialists with the goal of training balance and strengthening the lower limbs of the elderly. The study found that the elderly participants (aged between 60 and 80 years) appreciated the game-based exercises and had a positive experience using the EFS. The tests provided important feedback on improving the system concept, on the adequacy of the six exercises, on the system operation and on the game design, and demonstrated the participants' willingness to use it. They also verified that the system may need to include navigation that requires less learning, corrective feedback, and timely warnings when idle.
Harrington et al. (2015) address the usability challenges of Kinect-based exergames for the elderly, pointing out which aspects of these programs are difficult for elderly people to assimilate. Tests with ten elderly people aged between 60 and 69 years and ten aged between 70 and 79 years used two prototypes of games that stimulate physical activities. The research showed the satisfaction of the participants, who acknowledged that exergames are beneficial to health and useful for encouraging the practice of exercises. However, the study also showed that usability problems in these applications increase as age advances. Most participants in the 60-69 group agreed that the interface is user-friendly, while most in the 70-79 group disagreed about the ease of use.
Palacio et al. (2017) evaluate the elderly's perception of usability when using games with different control devices. Twenty-four elderly people (12 women, 12 men; mean age 69 years) and eight children (mean age 8 years) participated in the study. The configuration included two Kinect motion sensors, three computers, three projectors, three video cameras, two audio devices, the games Angry Birds and Happy Sky, one Xbox 360, one Nintendo Wii, and one touch screen. The equipment was configured for play in pairs of one elderly person and one child. Data on user experience and social interaction were collected through individual and group interviews. In the interviews, questions were used to evaluate the perception and the apprehension of the participants while playing. During the tests, the authors evaluated efficiency and number of errors. This study suggested that game devices for the elderly may need to be adapted to accommodate their functional, sensory and cognitive limitations.
Considering the related work, we noted that the authors do not apply an evaluation method tailored to the particularities of the elderly. Since these studies are recent, we understand that offering a suitable evaluation process is of interest to designers and developers, helping them to understand and address the particularities of the elderly and to provide them with better experiences. In addition, the systematic review presented by Simor et al. (2016) points to similar results, showing the lack of human-computer interface evaluation mechanisms designed specifically for elderly users.

Methodology
Our approach consists of applying the methodology proposed by Simor (2016), with some adjustments for the 3DUI context: the use of a VR game and new questionnaires.
We defined three conventional stages: first, select candidates and collect relevant information according to the aim of the research (pre-test); second, run tests with participants using a 3DUI and collect performance data (test); and third, evaluate the 3DUI for the elderly, considering user preferences gathered through questionnaires and/or interviews, together with user performance and system performance data (post-test). In all these stages, data can be recorded on paper, in software, or in audio or video.
It is important to highlight that this work is an initial project to study whether this approach is useful for evaluating 3DUI, and to identify possible changes or needs observed in the experiment. Thus, we can contribute to the design of a methodology specifically for evaluations with elderly subjects.
For the experiment, we used a VR serious game at two moments: first, a preliminary evaluation with volunteers, in May 2016; and later, an evaluation with the elderly, in August 2016. The local research ethics committee approved this project (protocol number CAAE 53589116.8.0000.5342).

Pre-test
The Geriatric Depression Scale (GDS-15) is a brief screening instrument to verify whether a person shows some degree of depression. According to Sheikh and Yesavage (1986), a depressive subject tends to provide unreliable data, because the subject's psychological state may interfere with the results. Therefore, this test is useful to identify participants with severe degrees of depression, for whom continuing the experiment is not indicated.
The questionnaire asks about the participants' satisfaction with their life and themselves, and every question is answered only with "yes" or "no". We used a cut-off point of ≥ 5 to indicate clinically important depressive symptoms (Almeida and Almeida, 1999).
The Mini-Mental State Examination (MMSE) is a quick test (≤ 10 min) to evaluate a person's cognitive function. It does not require specific material and uses a point scale. Like the GDS-15, it is useful to identify outlier participants, for whom continuing the experiment is not indicated. The questionnaire covers spatial and temporal orientation, immediate memory and recall, calculation, language (naming, repetition, comprehension, writing) and copying a drawing. We used a cut-off point of ≥ 25 for literate and ≥ 19 for illiterate elderly (Lourenço and Veras, 2006).
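The screening rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Candidate` record and the `screening_flags` function are hypothetical names, the age threshold is taken from the inclusion criteria of the experiment, and we assume that a GDS-15 score at or above its cut-off and an MMSE score below its cut-off are the conditions that call for attention. The sketch returns flags for the researcher to review rather than deciding on exclusion by itself.

```python
from dataclasses import dataclass
from typing import List

# Cut-off values quoted in the text; how they are applied is an assumption.
GDS15_CUTOFF = 5             # score >= 5: clinically important depressive symptoms
MMSE_CUTOFF_LITERATE = 25    # literate participants
MMSE_CUTOFF_ILLITERATE = 19  # illiterate participants
MIN_AGE = 60                 # inclusion criterion of the experiment


@dataclass
class Candidate:
    """Hypothetical pre-test record for one candidate."""
    age: int
    literate: bool
    gds15: int  # higher score means more depressive symptoms
    mmse: int   # higher score means better cognitive function


def screening_flags(candidate: Candidate) -> List[str]:
    """Return the reasons why a candidate needs review (empty if none)."""
    flags = []
    if candidate.age < MIN_AGE:
        flags.append("age below 60")
    if candidate.gds15 >= GDS15_CUTOFF:
        flags.append("GDS-15 at or above cut-off")
    mmse_cutoff = (
        MMSE_CUTOFF_LITERATE if candidate.literate else MMSE_CUTOFF_ILLITERATE
    )
    if candidate.mmse < mmse_cutoff:
        flags.append("MMSE below cut-off")
    return flags
```

A candidate with no flags proceeds to the sociodemographic questionnaire; any flag is a prompt for the researcher, not an automatic decision.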
Our sociodemographic and background questionnaire aims to characterize the sample. The questions cover education, physical and cognitive disabilities, and familiarity with the technologies used during the tests.
The Informed Consent Form (ICF) explains the study to the candidate and records their permission before an intervention is conducted. It also safeguards the integrity of the individual and restricts the use of the collected information to research purposes.

Test
At this stage, the participant first receives and reads an overview document of the experiment, explaining the aims of the test, how to use the application resources (devices and interfaces), and the user tasks. This document also serves to reinforce that the study will evaluate only the software and the equipment, not the participant.
The document also informs that the user tasks should be performed naturally. Besides that, it must inform that records of the interaction process will be collected by the application itself, by screen-capture software and by filming, preserving the participant's personal image.
Each participant is instructed to verbalize their actions and thoughts during the interaction process, using the Think Aloud Protocol (Nielsen, 2012). This method helps the observer to perceive the user's difficulties or facilities during the interaction process. Afterwards, the observer must ask the participant whether any doubts remain, because questions cannot be answered during the test.
Then the experiment with the elderly begins. Each participant tests each level of the application using VR devices for thirty seconds. This time was defined in order to avoid mental and physical exhaustion (Simor, 2016), but it can be changed according to the approach. We adopted a two-minute rest interval between levels and between the uses of each piece of equipment. This time can also be changed, considering the effort over time.

Table 1: Statements of the usability evaluation questionnaire.
4 I felt good, the interaction with the game and the equipment did not cause discomfort, such as motion sickness, headache, dizziness or nausea
5 I felt oriented because the equipment provided a better visual perception of the 3D space
When I used the Smart TV 3D...
6 I felt comfortable during the interaction in the game, using different equipment
7 I felt immersed, the interaction with the game was transparent, arresting my attention
8 I felt present within the virtual scene of the game, as if I were part of it
9 I felt good, the interaction with the game and the equipment did not cause discomfort, such as motion sickness, headache, dizziness or nausea
10 I felt oriented because the equipment provided a better visual perception of the 3D space
About the 3D User Interface, it allowed...
11 To use and interact easily in the game
12 Clarity on the steps to be followed to perform the tasks in the game
13 Adequate and sufficient time for the execution of the tasks
14 To perform the tasks of the game naturally, without difficulties
15 To easily visualize, interpret and understand the interactive elements of the game (visual aspects)
16 To listen and assimilate easily the sound elements of the game
17 To easily pick up objects in 3D space
18 To have a fun experience
Other questions
19 The theme of the game is associated with your age
20 Rest intervals during the experiment were sufficient

Post-test
In this stage, the participant receives a usability evaluation questionnaire about the experiment. First, the usability evaluation covers questions about the visualization devices, considering the following aspects: comfort, immersion, presence, well-being and visual perception.
The next part of the questionnaire evaluates the 3DUI: ease of use of the application, ease of task performance, clarity of the procedure, adequate time to execute the task, quality of the visual and aural elements, connection between scene and task, and adequate rest interval. The questionnaire uses a 5-point Likert scale. Table 1 shows the statements. Our questionnaire considers the main usability characteristics defined by Nielsen (1993) and is based on the questionnaire developed by Simor (2016).
While filling in the questionnaire, the participant can comment openly on the test, allowing the observer to collect complementary information. At the end of the session, the observer thanks the participant for taking part.
The time required to apply our protocol is less than 40 minutes. Table 2 presents the order of the steps to apply it.

Experiment
In order to validate our approach, we carried out a pilot evaluation study using an exergame named Motion Rehab AVE 3D (Trombetta et al., 2017). According to Fiorin et al. (2014), it is software that helps health professionals in motor and cognitive rehabilitation activities for the elderly. Fig. 1 shows the game interface in first-person (left) and third-person (right). To interact, the user can wear a head-mounted display or use a Smart TV 3D, plus a Kinect motion sensor.
For this evaluation, each participant interacted twice with the same level of the game, using two different visualization devices: an Oculus Rift DK 1 (HMD) and a 46" Smart 3D TV (TV). The goal was to evaluate the usability difference between the experiments, with a group of people using the same game scenario on two different display devices. The evaluation also covered the methodology used in this study.
We decided to select a small group of elderly volunteers, balanced by gender. Based on Benyon (2014), the reduced number of participants is justified because the evaluation targets a relatively homogeneous group. As inclusion criteria, the individuals had to be literate, aged 60 years or more, without cognitive or motor impairment and without severe depressive symptoms. All the participants received guidance about the goals of the research and signed the ICF.

Subjects
For the experiment, twenty subjects (60+ years) from a reference and care center for the elderly (CREATI) volunteered to participate. This center offers programs and services to the elderly, with varied educational, physical, technical, mental, cultural, social, civic and affective activities.
In order to balance the comparison, we defined that half of the participants (ten subjects) would test the game using the first-person version, and the other half (ten subjects) the third-person version. In both cases, the two display devices were used (Oculus Rift and Smart TV 3D).
We also counterbalanced the order in which the participants used the visualization devices. For each version of the game tested, half of the participants in each group (five subjects) tested first with the HMD and then with the TV, and the other half (five subjects) in the reverse order: TV and then HMD. Table 3 shows the distribution of groups.
In this context, we defined the following groups for future analysis:
• HMD1 and HMD2: participants using the HMD, testing it as the first or second device during the experiment, respectively;
• TV1 and TV2: participants using the TV, testing it as the first or second device during the experiment, respectively;
• HMDTV: participants using the HMD first and the TV second;
• TVHMD: participants using the TV first and the HMD second.
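The counterbalanced distribution described above can be sketched as follows. The text does not state how individual participants were allocated to each cell, so the random shuffle with a fixed seed is an assumption, and the names `assign`, `group_label` and `cell_sizes` are hypothetical. The sketch only guarantees the structure reported in the text: two game versions, two device orders, and equal-sized cells (five subjects each for twenty participants).

```python
import random
from collections import Counter

VERSIONS = ("first-person", "third-person")
DEVICE_ORDERS = (("HMD", "TV"), ("TV", "HMD"))


def assign(participants, seed=0):
    """Map each participant to a (version, device_order) cell of equal size.

    Random allocation is an assumption; only the balance is from the text.
    """
    cells = [(version, order) for version in VERSIONS for order in DEVICE_ORDERS]
    if len(participants) % len(cells) != 0:
        raise ValueError("participants cannot be split into equal cells")
    shuffled = list(participants)
    random.Random(seed).shuffle(shuffled)
    per_cell = len(shuffled) // len(cells)
    return {p: cells[i // per_cell] for i, p in enumerate(shuffled)}


def group_label(device_order):
    """Return the analysis group name, e.g. ("HMD", "TV") -> "HMDTV"."""
    return "".join(device_order)


def cell_sizes(assignment):
    """Count how many participants fall into each cell."""
    return Counter(assignment.values())
```

For example, `assign([f"P{i:02d}" for i in range(1, 21)])` places five participants in each of the four cells, and `group_label` of a participant's device order gives the HMDTV or TVHMD group used in the analysis.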
For the statistical analysis, we applied the Shapiro-Wilk test to verify the normality, and the Mann-Whitney U to compare the samples and test the hypotheses.
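As an illustration of the comparison step, the sketch below computes the Mann-Whitney U statistic in plain Python and compares it with a tabulated critical value. The critical values are the ones reported in the Results section for the three sample sizes used in this study; treating them as two-tailed values at a 5% significance level is our assumption, as are the function names. The Shapiro-Wilk normality check is not covered by this sketch.

```python
# U-critical values reported in this paper, keyed by the two sample sizes.
# Assumption: two-tailed test at the 5% significance level.
U_CRITICAL = {(20, 20): 127, (10, 10): 23, (5, 5): 2}


def mann_whitney_u(sample_a, sample_b):
    """Return the Mann-Whitney U statistic (the smaller of U1 and U2).

    Tied values receive the average of the ranks they span.
    """
    pooled = sorted(
        (value, group)
        for group, sample in enumerate((sample_a, sample_b))
        for value in sample
    )
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        members_of_a = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum_a += average_rank * members_of_a
        i = j
    n_a, n_b = len(sample_a), len(sample_b)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b)


def is_significant(sample_a, sample_b):
    """True when U is at or below the tabulated critical value."""
    critical = U_CRITICAL[(len(sample_a), len(sample_b))]
    return mann_whitney_u(sample_a, sample_b) <= critical
```

With two groups of five Likert scores, for instance, `is_significant(hmd_scores, tv_scores)` reports a difference only when U is 2 or less. In practice, a statistics package such as SciPy (`scipy.stats.mannwhitneyu`, `scipy.stats.shapiro`) would normally be used instead of a hand-written routine.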

Task
The task consists of exercises from level 1 of the game. They encourage the use of the upper and lower limbs and the memorization of objects. During the interaction process, the subject remains standing. Fig. 2 illustrates the use of the game with the two devices.

Test Environment
The material and configurations, as well as the instructions for the smooth running of the experiment, are described in Table 4.

Results
The experiment was attended by 23 participants. From the sociodemographic and background data, we noted that three users did not meet the inclusion criteria, and their results were excluded. Therefore, the analyzed sample comprised 20 participants aged between 60 and 81 years, 16 females and 4 males. Of these, 25% (five subjects) had already played some computer game and 10% (two subjects) already knew one of the devices used in the test.
A group of ten people first played the game in the first-person version and then in the third-person version, and the other ten in the reverse order. Within each group, half of the participants (five) started with the HMD and then used the TV, and the other half did the reverse. We formulated four alternative hypotheses to analyze the results:
• There is a difference between using the HMD and the TV to play Motion Rehab AVE 3D (HA, General);
• There is a difference between using the HMD in the first experience and using the HMD in the second experience (HB, HMD1 x HMD2);
• There is a difference between using the TV in the first experience and using the TV in the second experience (HC, TV1 x TV2);
• There is a usability difference between the groups of the experiment (HD, HMDTV x TVHMD).
The following subsections present the results of the statistical analysis, which are later considered in the Discussion section.

Results independent of version for Motion Rehab AVE 3D
The tests applied to evaluate HA, HB and HC did not yield statistically significant results, so the alternative hypotheses were not supported (Table 5). This evaluation did not consider the game version (first- or third-person).
For HA (U-critical value = 127), the comparison considered the 20 participants. For the HB and HC hypotheses (U-critical value = 23), the comparison considered the order of the devices used (between groups of 10 participants).
Table 6 presents the results for the HD hypothesis (U-critical value = 23), checking for any usability difference in the evaluated game. It considers elements of the 3DUI (e.g. visual and aural feedback and game theme) using different devices. The tests did not yield statistically significant results. The evaluation compared the HMDTV and TVHMD groups of 10 participants.

Results for Motion Rehab AVE 3D: First-Person Version
For the first-person version, the analysis did not point to statistically significant results, as shown in Table 7. For HA (U-critical value = 23), the comparison considered 10 participants. For the HB and HC hypotheses (U-critical value = 2), the comparison considered the device used during first-person experiences (groups of 5 participants). Table 8 presents the results for the HD hypothesis (U-critical value = 2), checking for any usability difference in the evaluated game during the first-person experience, without statistically significant results.

Table 4: Material, configurations and instructions for the experiment.
• It is essential to calibrate the Kinect sensor and the Oculus Rift for each user, in order to provide a natural and enjoyable interaction.
• Illumination: to recognize the user movements correctly, the place must be well lit.
• Chair: it must be close to the participant for rest and support, at least 1.2 m from the motion sensor.
• Tables: one table is necessary to …

Results for Motion Rehab AVE 3D: Third-Person Version
The analysis also did not suggest statistically significant results for the third-person version (Table 9). The analysis method is the same as in Section 5.2, and the U-critical values remain unchanged. Table 10 presents the results for the HD hypothesis (U-critical value = 2). As in the previous analyses, the results are not statistically significant.

Discussion
The next sections discuss the results regarding our initial methodology proposal (Section 6.1) and the usability of Motion Rehab AVE 3D (Section 6.2).

Methodology Evaluation
As stated in Section 5, there are no statistically significant results for any of the hypotheses presented. These results suggest that, for the elderly, both devices (Oculus Rift HMD and Smart TV 3D) can offer a sense of comfort and well-being during the interaction process. The use of different devices did not interfere with the ease of use, the ease of executing the task, the clarity of the procedures, the rest time, or the interaction with the user interface. Palacio et al. (2017) report a similar picture, indicating that the elderly might not perceive any difficulty during their first experience with new equipment. They take more time to adapt to new technologies.
In the related work (Section 2), the authors used metrics such as speed of task execution and ranking points (Sheu et al., 2015), ease of use (Harrington et al., 2015), and accuracy (Palacio et al., 2017) to evaluate system usability. All these metrics refer to task performance and user preference during the use of the system, but the authors do not define a standard evaluation method for these requirements for the elderly. For this reason, we propose a protocol that considers these and other usability metrics noted by Nielsen (2012).
The test observers perceived that the elderly were not comfortable in a 3D space because they are not accustomed to this technology. This indicates that our protocol needs a training moment before each experiment, in order to familiarize users with the equipment. It is important to offer an adaptation period, because the elderly can then have a critical view of the experiment during the evaluation process. This can also help to improve user task performance. Another situation that highlights the need for a training session is that, for the large majority of the participants, the game was not intuitive (despite its simplicity). The elderly did not understand the game on first contact with the interface and asked for tips on how to interact with it.
In particular, 40% (eight) of the participants commented that, if they could have tried the game before the test, they could have performed better. This happens because of the lack of familiarity with the devices, and the lack of practice with and access to this type of technology (Carvalho and Ishitani, 2013). In addition, a training session can give new users gradual contact with, and curiosity about, the equipment, avoiding cybersickness and discomforts such as dizziness and tiredness (cited by some participants).
During the experiments, we perceived that the feedback of this kind of public can be better understood through interviews. Palacio et al. (2017) also used interviews (collective and individual) to obtain data, because this instrument allows social interaction to identify the lived experiences of the users.
We also identified that the use of a Likert scale with the elderly is complicated, because they need guidance on how to rate each statement using the points of the scale. In this context, it is very important to word each statement in a simple, straightforward way, in order to keep the participant focused. A glossary can also be offered, containing expressions and words less familiar to this age group (e.g., immersion, virtual reality, interface), to make the context easier to understand. Revisiting the related work, we noted that no questionnaire specific to the elderly is used.
During the experiment, we observed that abstract terms such as game, 3D and virtual environment are complicated for the elderly. They were simply lost when they read expressions like "I felt immersed", "3D user interface", "steps to be followed in order to execute game tasks", "interactive game elements" and "I felt oriented because the equipment provided a better visual perception of the 3D space".
These expressions sound easy in the entertainment context. For the elderly, on the other hand, it is important to take simple approaches in questionnaires and interviews. Explicitly, only two participants commented that the post-test questionnaire and the Likert-type scale are hard to understand. However, the observer indicated that 60% (12 participants) had difficulties in understanding the questionnaire.
To solve this problem, we suggest using a text description for each point of the scale and a glossary of the terms of the game. Another possibility to improve the descriptions of the questionnaire is to present them before the experiment, so that doubts about words and expressions can be explained by the researchers verbally or using videos.
In our experiment, no user was excluded by the GDS-15 or the MMSE, which makes the importance of these instruments in the Pre-test stage questionable. However, we kept them, because our sample was a homogeneous group of healthy elderly people participating in programs specific to their age. In any case, it is important to validate the importance of the MMSE and GDS-15 in this protocol with a heterogeneous group. The related work did not clarify the use of this kind of instrument.
For this reason, we suggest that, in future evaluations, the experiments be carried out in two sessions: one session only for the Pre-test stage and training, and another exclusively for the Test and Post-test stages.

Usability Evaluation
Although we identified points to improve in the proposed protocol, the process also allowed us to analyze the quality of the game, considering the usability evaluation of the 3DUI for two different visualization devices.
We identified that 20% (four participants) presented some difficulties in terms of spatial orientation. They did not capture the objects proposed during the task because they did not perceive that it was necessary to open their arms at a slightly wider angle (physically) to reach the objects. Another 20% (four participants) presented this same difficulty at the beginning of the interaction process but, over the course of the experiment, they perceived what should be done and were able to execute the test normally.
The tests also showed, subjectively, considering the users' opinions and the observer's notes, that visual and aural feedback must be more intuitive to help user understanding and produce more immersion. Moreover, also subjectively, we observed that the people who tested the game in first-person had more difficulty using the TV and more ease using the HMD, whilst those who tested in third-person had more ease using the TV than the HMD.
According to our observations, 25% (five participants) presented the best performance (hit everything) during the task, choosing all the selectable objects and ignoring the distractor objects. On the other hand, 20% (four participants) performed very poorly (missed everything), not being able to hit any of the selectable objects.
However, regardless of the users' task performance, all the participants demonstrated great interest in exergames and technologies designed for the elderly. They classified the experience as interesting and beneficial for their age.
In addition, although it is not part of the protocol, we suggest collecting user performance data (e.g. hits, errors, total time) in order to contribute to the protocol validation. We believe that these data could reduce the subjectivity of user satisfaction, considering aspects such as feeling of immersion, spatial orientation and 3DUI quality.
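A minimal sketch of how such performance data could be recorded is shown below. The record layout is hypothetical: the text only names hits, errors and total time, so the remaining fields, the interpretation of an error as a selected distractor object, and the `classify` labels (which mirror the "hit everything" and "missed everything" observations above) are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class TrialRecord:
    """Hypothetical performance record for one participant on one device."""
    participant_id: str
    device: str          # "HMD" or "TV"
    version: str         # "first-person" or "third-person"
    hits: int            # selectable objects captured
    errors: int          # assumed to mean distractor objects captured
    targets: int         # selectable objects presented in the level
    total_time_s: float  # duration of the trial in seconds

    def __post_init__(self):
        if self.targets <= 0:
            raise ValueError("targets must be positive")
        if not 0 <= self.hits <= self.targets:
            raise ValueError("hits must be between 0 and targets")

    def hit_rate(self) -> float:
        """Fraction of the selectable objects that were captured."""
        return self.hits / self.targets


def classify(record: TrialRecord) -> str:
    """Label a trial using the categories observed in the experiment."""
    if record.hits == record.targets and record.errors == 0:
        return "hit everything"
    if record.hits == 0:
        return "missed everything"
    return "partial"
```

Records of this kind could then be compared across devices and versions alongside the questionnaire answers.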

Conclusion
The evolution of our approach into a specific evaluation methodology for 3DUI can be useful for the creation of new VR solutions, such as games or simulators designed exclusively for the elderly. We also understood, from the results of our experiment, that this kind of approach helps to ensure that the interface at least meets the accessibility needs of this age group, since it makes the elderly user's experience with the interface explicit. Researchers and developers can also direct efforts to improve the quality of the technology and offer more comfort, well-being and satisfaction to the user. In addition, an evaluation methodology specific to 3DUI used by the elderly may be useful to evaluate new projects with the same purpose, either academic or professional.
We still intend to implement the suggestions indicated for our protocol and to evaluate the need for the GDS-15 and MMSE instruments. Regardless of these two instruments, an evaluation protocol tailored to the elderly is necessary because of age-related issues, such as the speed of understanding technological terms and the use of more recent equipment (HMD, motion sensors, smartphones, etc.).
It is important to highlight that, at the end of each experiment, all participants verbally classified it as a great and wonderful experience for their age. In order to identify elderly users' preferences, we suggest including an interview in the Post-test stage.
We concluded that this kind of experiment is not exhausting for the elderly and can trigger curiosity about new technologies. Even without understanding the game or being able to play and achieve a good performance, the participants showed interest in using games for fun, for physical and mental exercise, and for a healthy lifestyle. As future work, we intend to validate the protocol with different elderly groups, in order to present a final instrument to the academic and professional communities.