Volume 38, Issue 3 , Pages 390-400, September 2009
The Brief Fatigue Inventory: Comparison of Data Collection Using a Novel Audio Device with Conventional Paper Questionnaire
Article Outline
- Abstract
- Introduction
- Methods
- Results
- Participant Characteristics
- Global BFI Scores (Mean of 9 Items, Maximum Possible Value=10)
- Global Scores at First and Second Administrations of the BFI
- Mean Fatigue Severity Scores (Mean of BFI Items 1–3, Maximum Possible Value=10)
- Mean Fatigue Interference Scores (Mean of BFI Items 4–9, Maximum Possible Value=10)
- Fatigue Severity Classification (Severe/Nonsevere)
- Agreement Between Fatigue Severity Categories
- Time Taken to Complete Paper and EPIC-Vox Audio Forms of the BFI
- Evaluation Questionnaire
- Discussion
- Acknowledgments
- References
- Copyright
Abstract
An Electronic Portable Information Collection audio device (EPIC-Vox) has been developed to deliver questionnaires in spoken word format via headphones. Patients respond by pressing buttons on the device. The aims of this study were to determine limits of agreement between, and test-retest reliability of, audio (A) and paper (P) versions of the Brief Fatigue Inventory (BFI). Two hundred sixty outpatients (204 male, mean age 55.7 years) attending a sleep disorders clinic were allocated to four groups using block randomization. All completed the BFI twice, separated by a one-minute distracter task. Half the patients completed paper and audio versions, then an evaluation questionnaire. The remainder completed either paper or audio versions to compare test-retest reliability. BFI global scores were analyzed using Bland-Altman methodology. Agreement between categorical fatigue severity scores was determined using Cohen's kappa. The mean (SD) difference between paper and audio scores was 0.04 (0.48). The limits of agreement (mean difference ± 2SD) were −0.93 to +1.00. Test-retest reliability of the paper BFI showed a mean (SD) difference of 0.17 (0.32) between first and second presentations (limits −0.46 to +0.81). For audio, the mean (SD) difference was 0.17 (0.48) (limits −0.79 to +1.14). For agreement between categorical scores, Cohen's kappa = 0.73 for P and A, 0.67 (P at test and retest) and 0.87 (A at test and retest). Evaluation preferences (n = 128): 36.7% audio; 18.0% paper; and 45.3% no preference. A total of 99.2% found EPIC-Vox “easy to use.” These data demonstrate that the English audio version of the BFI provides an acceptable alternative to the paper questionnaire.
Key Words: Assessment, Brief Fatigue Inventory (BFI), data collection, electronic data capture, fatigue, health literacy, literacy, patient acceptance, reproducibility, questionnaires
Introduction
The Brief Fatigue Inventory (BFI) is a self-administered questionnaire developed to assess fatigue in cancer patients.1 It has been used with a variety of patient groups and in clinical trials.2 It is recommended that assessment tools for self-reported health care data should be “appropriate, valid, and reliable for the group concerned.”3 However, conventional paper questionnaires in English are inappropriate for patients with poor language or literacy skills. Sixteen percent of adults in England aged 16–65 years have a reading age of 11 years or below.4 Patients with poor language or literacy skills are excluded from conventional self-reported health assessment and from participation in research requiring self-administered questionnaires.5 The exclusion of such patients may bias research findings. The paper BFI has been validated in languages other than English, but literacy skills are still required. Some South Asian languages have no written form3 and so it is impossible to provide conventional health-assessment questionnaires in these languages. To address these problems, “innovative techniques … for collecting and evaluating health outcomes data” are required.6 Following a survey of Asian language and communications, the Communication and Survey Policy Studies Institute7 recommended the use of audio media for collecting data from ethnic communities.
In response, an Electronic Portable Information Collection audio device (EPIC-Vox) has been developed to present questionnaires in audio format. Patients listen to prerecorded questions through headphones and respond by pressing buttons on the device. Data are stored in an internal memory, with date and time of collection and patient identification code. Download to a computer database uses Universal Serial Bus key technology, enabling clinicians to retrieve data on any personal computer. A Personal Identification Number ensures data security. EPIC-Vox has been used successfully with the Brief Pain Inventory (BPI)8 and to assess depression in people with poor language and literacy skills.9
Electronic data collection (EDC) can eliminate missing, illegible, or invalid data and transcription errors and record time and date. Audio Computer-Assisted Self-Interviewing (ACASI) allows respondents to see and hear questions simultaneously and respond via a computer keyboard10 or touch screen.11 However, it uses expensive hardware and software and may require computer literate participants. EPIC-Vox performs a similar function but is a low power, portable, handheld device, which could be produced in volume at lower cost, requiring no computer skills or lengthy operating system booting.
Studies comparing data collected by ACASI and conventional survey methods show increased reporting of sensitive behaviors due to the confidentiality associated with ACASI.12 However, when nonsensitive health-related data are collected using validated questionnaires, it is essential that data are not affected by the method of collection. Several studies13, 14, 15, 16 have assessed the equivalence of data collected by electronic and paper methods. Measures of data concordance included the intraclass correlation coefficient (ICC), Pearson correlation coefficient, and proportion of responses in agreement.
With the exception of ACASI, EDC devices have provided questionnaires in visual format. Comparative studies have used tests of association to measure consistency between, and repeatability of, electronic and conventional data collection methods. Bland and Altman caution against using Pearson's correlation coefficient17 and ICC18 to determine agreement between two measurement methods or repeatability over time. Their method of determining agreement between two measurement methods17 is widely cited and appropriate for the present study. The primary aim was to determine limits of agreement between data collected using (1) the audio form of the English BFI via EPIC-Vox and (2) the paper version of the questionnaire. The secondary aim was to determine test-retest reliability of each method.
Methods
The study was approved by the local research ethics committee. Permission to use the BFI was granted by the copyright holders.
Sample Size
Sample size was calculated using Bland-Altman's “agreement between two methods of measurement” analysis.17 A difference of ±1 on the BFI rating scale was taken to indicate “agreement” between the paper and audio versions. To determine the 95% limits of agreement, the following formula was used:

Participants and Setting
Participants (≥18 years) were outpatients with sleep apnea, attending Leicester Sleep Disorders Service for annual review. Invitation letters were sent to patients two weeks before their appointments. Participants were of any ethnic origin but fluent in English. Those with illiteracy, learning difficulties, impaired fine motor control, or uncorrected visual or hearing impairments were excluded. Invitation letters continued to be sent out each week until 260 patients completed the study.
Materials
The BFI comprises nine items printed on an A4 sheet. The first three, assessing fatigue severity, require respondents to rate their fatigue “now” and “usual” and “worst” levels of fatigue during the past 24 hours. Responses are made on an integer scale of 0 (no fatigue) to 10 (“fatigue as bad as you can imagine”). The remaining six items assess the extent to which fatigue has interfered with everyday life during the past 24 hours, in terms of general activity, mood, walking ability, normal work, relations with other people, and enjoyment of life. For these items, 0 means “does not interfere” and 10 means “completely interferes.”
EPIC-Vox (Fig. 1) is a handheld, 9 V battery-operated device measuring 14 cm × 8 cm × 3 cm. Buttons on the top of the device are numbered from 0 to 10 and there is a start/repeat question button. The BFI was recorded by a female voice.
Distracter Task
Distracter tasks are commonly used in studies to disrupt short-term memory.19 In the present study, the aim was (1) to deter respondents from remembering responses to the first presentation of the BFI and (2) to standardize the interval between first and second presentations. A coding task was used, consisting of an A4 sheet of paper at the top of which were printed the numbers one to nine, with a corresponding letter below each number. On the remainder of the sheet, the nine letters were printed in a fixed random order in horizontal rows of 10, with an answer box below each letter. Participants were asked to refer to the key and then write the appropriate corresponding number below each letter, working as quickly as possible, for one minute. This task, a modified version of the Symbol-Digit Substitution Task from the Wechsler Adult Intelligence Scale,20 demands concentration and attention.
EPIC-Vox Evaluation Questionnaire
Patients who completed both paper and electronic versions of the BFI also completed an evaluation questionnaire asking about ease of use of EPIC-Vox and preference for audio or paper.
Design
Patients were allocated to one of four groups using block randomization techniques.21 Using a counterbalanced crossover design, Groups 1 and 2 completed paper and audio versions to determine agreement between the two data collection methods. Groups 3 and 4 completed either paper or audio versions to compare test-retest reliability.
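The allocation described above can be illustrated with a minimal sketch of permuted-block randomization. The article does not report the block size or the software used, so the block size of four (one slot per group), the function name `block_randomize`, and the seed are assumptions made purely for illustration.

```python
import random

# Presentation orders for Groups 1-4, as defined in the Procedure section.
GROUPS = ["P-A", "A-P", "P-P", "A-A"]


def block_randomize(n_patients, groups=GROUPS, seed=None):
    """Allocate patients in shuffled blocks that each contain every group once.

    Assumption: block size equals the number of groups; the study's actual
    block size is not reported.
    """
    rng = random.Random(seed)
    allocation = []
    while len(allocation) < n_patients:
        block = list(groups)
        rng.shuffle(block)  # random order within the block
        allocation.extend(block)
    return allocation[:n_patients]


allocation = block_randomize(260, seed=1)
counts = {g: allocation.count(g) for g in GROUPS}
print(counts)
```

With 260 patients and blocks of four, every group receives exactly 65 patients, which matches the group sizes in the study before the two exclusions from Group 1.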
Procedure
Patients were approached by a researcher on their arrival at clinic. Those who gave written, informed consent participated in the study after their consultations. Each patient, seated at a table in a quiet room with the researcher present, completed the BFI twice, separated by the distracter task. The order of presentation of the two forms of the BFI was as follows: Group 1: Paper (P)-Audio (A); Group 2: A-P; Group 3: P-P; Group 4: A-A.
The time taken to complete the paper BFI was recorded and completion time for the audio version was logged by EPIC-Vox. Patients in Groups 1 and 2 also completed the evaluation questionnaire.
Statistical Analysis
Statistical analyses were performed using SPSS version 14.0 (SPSS Inc., Chicago, IL, USA) and all P-values were two-tailed. “Global” fatigue scores (mean of all nine response scores) and categorical “fatigue severity” were calculated according to the methods used for validating the BFI.1 Fatigue severity is the score from Item 3 only, that is, “fatigue worst;” a score of 7–10 is “severe” and 0–6, “not severe.” Validation studies of the BFI in other languages22, 23, 24, 25 reported composite outcome scores in addition to global and “fatigue worst” scores. “Composite fatigue severity” is the mean score from the three severity items and “composite fatigue interference” is the mean score from the six interference items. Based on these reported outcomes, the present study used the following measures for data analyses: (1) global BFI score (mean of all nine BFI items); (2) mean fatigue severity score (mean of first three BFI items); (3) mean fatigue interference score (mean of BFI Items 4–9); and (4) categorical fatigue severity (severe/not severe) based on response to BFI Item 3 only.
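The four outcome measures listed above can be sketched as a small scoring function. This is an illustration of the definitions given in the text, not the authors' SPSS procedure; the function name `score_bfi` and the example responses are hypothetical.

```python
from statistics import mean


def score_bfi(responses):
    """Score one completed BFI.

    `responses` holds the nine item scores in questionnaire order:
    items 1-3 (fatigue now, usual, worst) then items 4-9 (interference).
    """
    if len(responses) != 9:
        raise ValueError("the BFI has exactly nine items")
    if any(not isinstance(r, int) or not 0 <= r <= 10 for r in responses):
        raise ValueError("each response must be an integer from 0 to 10")
    worst = responses[2]  # Item 3, "fatigue worst"
    return {
        "global": mean(responses),            # mean of all nine items
        "severity": mean(responses[:3]),      # mean of Items 1-3
        "interference": mean(responses[3:]),  # mean of Items 4-9
        "severe": worst >= 7,                 # 7-10 severe, 0-6 not severe
    }


# Hypothetical respondent, for illustration only.
scores = score_bfi([4, 5, 7, 3, 2, 1, 4, 2, 3])
print(scores)
```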
Using the methods of Bland and Altman,17 limits of agreement for paper and audio versions of the BFI were determined. Differences between each pair of scores were plotted against their means and limits of agreement were estimated from the mean difference ±2 SD of the differences. Ninety-five percent of the differences between measurement methods can be expected to occur between the upper and lower limits. Similar analyses were performed to compare test-retest reliability of the two BFI formats. Normality checks revealed that mean difference data were not normally distributed. Bland and Altman advise logarithmic transformation of such data; consequently log10 transforms were performed. The transformed data were still not normally distributed and on the advice of a medical statistician, untransformed data were used. Agreement between audio and paper categorical fatigue severity scores was determined using Cohen's kappa statistic.26
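As a rough illustration of the limits-of-agreement calculation described above, the sketch below computes the mean difference, its SD, and the mean ± 2 SD limits from paired scores. The standard error of each limit uses Bland and Altman's large-sample approximation, sqrt(3s²/n); whether the authors used exactly this approximation is an assumption. The function name and the example scores are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def limits_of_agreement(method_a, method_b):
    """Bland-Altman limits of agreement for paired measurements."""
    if len(method_a) != len(method_b):
        raise ValueError("measurements must be paired")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    n = len(diffs)
    d_bar = mean(diffs)  # mean difference (bias)
    s = stdev(diffs)     # sample SD of the differences
    return {
        "n": n,
        "mean_diff": d_bar,
        "sd_diff": s,
        "lower": d_bar - 2 * s,
        "upper": d_bar + 2 * s,
        # Approximate standard error of each limit (assumed method).
        "se_limit": sqrt(3 * s ** 2 / n),
    }


# Hypothetical global scores for five respondents, for illustration only.
paper = [3.0, 2.5, 4.0, 1.0, 5.5]
audio = [2.5, 2.5, 4.5, 1.5, 5.0]
result = limits_of_agreement(paper, audio)
print(result)
```

A 95% confidence interval for each limit is then the limit ± t × `se_limit`, with t taken from the t distribution on n − 1 degrees of freedom.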
Results
Participant Characteristics
Invitation letters were mailed weekly until 260 patients (204 [78.5%] male; mean age 55.7 years, range 28–81 years) had completed the study. To meet this target, invitation letters were sent to 435 patients (353 [81.2%] male; mean age 55.1 years, range 23–85 years) who were due to attend the clinic for annual review. Forty percent (175 of 435) of patients who were sent invitation letters did not take part in the study. Of the 175 nonparticipants, 83 (47.4%) did not participate because they either canceled their appointments or failed to attend. A further 69 of 175 (39.4%) patients declined to participate; approximately two-thirds (n = 48) of this group refused due to lack of time (often because of car-parking restrictions), 19 were not interested or gave no reason, and two said they were too nervous to participate. Fourteen of the 175 nonparticipants (8%) were not approached as the researcher was unavailable and nine (5%) met at least one exclusion criterion.
The gender distributions of participants (204 of 260 [78.5%] male) and nonparticipants (149 of 175 [85.1%] male) were not significantly different, χ² = 2.63 (df = 1), P = 0.105. The mean (SD) age of both participants and nonparticipants was 55 (10) years.
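The gender comparison above can be checked with a short, standard-library-only sketch. The article does not say whether a continuity correction was applied; the Yates correction is assumed here because it reproduces the reported statistic, whereas the uncorrected value is about 3.05. The function name is hypothetical.

```python
from math import erfc, sqrt


def chi_square_2x2_yates(a, b, c, d):
    """Chi-square test with Yates continuity correction for a 2x2 table.

    Table layout: [[a, b], [c, d]]. Returns (chi2, p) with df = 1.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    chi2 = sum(max(abs(o - e) - 0.5, 0) ** 2 / e
               for o, e in zip(observed, expected))
    # For df = 1, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p = erfc(sqrt(chi2 / 2))
    return chi2, p


# Rows: participants, nonparticipants. Columns: male, female.
chi2, p = chi_square_2x2_yates(204, 56, 149, 26)
print(round(chi2, 2), round(p, 3))  # about 2.63 and 0.105
```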
There were two extreme outliers in Group 1 (P-A), one of whom had a total BFI score (sum of the nine items) of 22 at first presentation and 55 at the second, whereas the other scored 36 and 67. Eight of the nine BFI items relate to the past 24 hours and the other is a rating of “fatigue now.” Therefore, because the two presentations were separated by only a minute, it was concluded that in both cases the data were unreliable. Consequently, the data from these two patients were excluded from further analyses.
Table 1 gives the age and gender distribution of participants by group. One-way between groups analysis of variance (ANOVA) showed that there were no significant differences between the groups with regard to age, F(3, 254) = 2.06, P = 0.11. A Chi-square test showed that there was no association between group and gender: χ² (3, n = 258) = 3.95, P = 0.26. There was an imbalance in the gender distribution of participants (203 of 258 [78.7%] male). A Chi-square test for goodness of fit showed that this difference was significant, χ² = 84.9 (df = 1), P < 0.001.
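The goodness-of-fit statistic can be reproduced by hand, assuming the null hypothesis was an equal split of men and women (129 expected in each category out of 258). The article does not state the expected proportions, so this is an assumption, but it yields the reported value.

```latex
\chi^2 = \sum \frac{(O - E)^2}{E}
       = \frac{(203 - 129)^2}{129} + \frac{(55 - 129)^2}{129}
       = \frac{2 \times 5476}{129}
       \approx 84.9
```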
Table 1. Age and Gender of Participants by Group (n = 258)
| Characteristic | Group 1 (P-A) (n = 63) | Group 2 (A-P) (n = 65) | Group 3 (P-P) (n = 65) | Group 4 (A-A) (n = 65) | All (n = 258) |
|---|---|---|---|---|---|
| Mean age in years (SD) | 57.9 (11.3) | 53.4 (10.0) | 55.4 (9.5) | 56.0 (10.7) | 55.7 (10.4) |
| Male, n (%) | 49 (77.8) | 54 (83.1) | 54 (83.1) | 46 (70.8) | 203 (78.7) |
Global BFI Scores (Mean of 9 Items, Maximum Possible Value=10)
No responses were omitted from either the paper or audio versions of the BFI.
For paper and audio global scores (Fig. 2a), the mean difference (SD) was 0.04 (0.48), with an upper limit of 1.00 (95% confidence interval [CI] from 0.85 to 1.15) and lower limit of −0.93 (95% CI from −1.07 to −0.78). For paper at test and retest (Fig. 2b), the mean difference (SD) was 0.17 (0.32), with an upper limit of 0.81 (95% CI from 0.67 to 0.95) and lower limit of −0.46 (95% CI from −0.60 to −0.32). For audio at test and retest (Fig. 2c), the mean difference (SD) was 0.17 (0.48), with an upper limit of 1.14 (95% CI from 0.93 to 1.35) and lower limit of −0.79 (95% CI from −1.00 to −0.58).
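The confidence intervals around the limits are consistent with Bland and Altman's large-sample approximation for the standard error of a limit of agreement. The worked example below assumes that approximation was used (the text does not say so explicitly) and takes the paper versus audio comparison, with s = 0.48 and n = 128.

```latex
\mathrm{SE}(\bar{d} \pm 2s) \approx \sqrt{\frac{3s^2}{n}}
  = \sqrt{\frac{3 \times 0.48^2}{128}} \approx 0.0735

\text{Upper limit: } 1.00 \pm t_{0.975,\,127} \times 0.0735
  \approx 1.00 \pm 1.98 \times 0.0735
  \approx 1.00 \pm 0.15
  \;\Rightarrow\; 0.85 \text{ to } 1.15
```

The same calculation with s = 0.32 and n = 65 gives a half-width of about 0.14, matching the interval of 0.67 to 0.95 around the paper test-retest upper limit of 0.81.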

Fig. 2. Limits of agreement between BFI global scores collected by (a) paper and audio (n = 128); (b) paper (test and retest) (n = 65); and (c) audio (test and retest) (n = 65).
Global Scores at First and Second Administrations of the BFI
Table 2 shows scores at the first and second administrations of the BFI by group. “Mixed between-within subjects” ANOVA27 was used to compare mean global scores at first and second administrations. This showed a statistically significant main effect for time, Wilks' Lambda = 0.88, F (2, 254) = 34.32, P < 0.0005, but no significant effect for group and no interaction between group and time. This indicates that mean global scores were significantly lower at the second administration of the BFI than at the first, regardless of paper or audio format.
Table 2. Mean Global Scores at First and Second Administrations of the BFI (n = 258)
| Group | n | Global Score at t1, Mean (SD) | Global Score at t2, Mean (SD) |
|---|---|---|---|
| Paper audio | 63 | 3.10 (2.37) | 2.92 (2.37)a |
| Audio paper | 65 | 3.16 (2.21) | 3.05 (2.35)a |
| Paper paper | 65 | 3.04 (1.95) | 2.86 (2.01)a |
| Audio audio | 65 | 3.03 (2.63) | 2.86 (2.73)a |
| All participants | 258 | 3.08 (2.29) | 2.92 (2.37)a |
aDifference significant at P
Mean Fatigue Severity Scores (Mean of BFI Items 1–3, Maximum Possible Value=10)
Table 3 shows the mean differences and limits of agreement for mean fatigue severity scores for paper and audio forms of the BFI, in addition to test-retest data for each version.
Table 3. Limits of Agreement for Mean Fatigue Severity Scoresa (n = 258)
| Group | n | Mean Difference (SD) | Upper Limit (95% CI) | Lower Limit (95% CI) |
|---|---|---|---|---|
| Paper audio/audio paper | 128 | 0.05 (0.96) | 1.98 (1.69 to 2.27) | −1.88 (−2.17 to −1.59) |
| Paper (test–retest) | 65 | 0.25 (0.61) | 1.46 (1.20 to 1.72) | −0.97 (−1.23 to −0.71) |
| Audio (test–retest) | 65 | 0.26 (0.84) | 1.93 (1.57 to 2.29) | −1.41 (−1.77 to −1.05) |
aMean severity scores from BFI Items 1–3.
Mean Fatigue Interference Scores (Mean of BFI Items 4–9, Maximum Possible Value=10)
Table 4 shows the mean differences and limits of agreement for mean fatigue interference scores for paper and audio versions of the BFI, as well as test-retest data for each version.
Table 4. Limits of Agreement for Mean Fatigue Interference Scoresa
| Group | n | Mean Difference (SD) | Upper Limit (95% CI) | Lower Limit (95% CI) |
|---|---|---|---|---|
| Paper audio/audio paper | 128 | 0.03 (0.55) | 1.14 (0.97 to 1.31) | −1.08 (−1.25 to −0.91) |
| Paper (test–retest) | 65 | 0.14 (0.35) | 0.84 (0.69 to 0.99) | −0.56 (−0.71 to −0.41) |
| Audio (test–retest) | 65 | 0.13 (0.48) | 1.10 (0.90 to 1.31) | −0.84 (−1.05 to −0.63) |
aMean score from BFI Items 4–9.
Fatigue Severity Classification (Severe/Nonsevere)
Fatigue severity is reported in the BFI literature in two ways. In some papers, it is based only on the response to Item 3, “Please rate your fatigue (weariness, tiredness) by circling the one number that best describes your WORST level of fatigue during past 24 hours.” The score from Item 3 can be classified as severe fatigue (7–10) or not severe fatigue (0–6). Other studies have reported a composite severity score, which is the mean of Items 1–3 (see Methods). Based on the response to Item 3 only, 93 of 258 (36%) participants in the present study were classified as having severe fatigue.
Agreement Between Fatigue Severity Categories
For this analysis, a score of 7–10 on Item 3 only of the BFI was taken to indicate “severe” fatigue, with 0–6 indicating “not severe” fatigue. The extent to which these fatigue severity categories agree between paper and audio, and each of the versions at test and retest, are shown by the Cohen's kappa statistic26 in Table 5. This statistic takes into account the probability of agreement between binary data occurring by chance. A Cohen's kappa value of 1 indicates perfect agreement, 0 indicates chance agreement, and a value of −1 indicates complete disagreement. Values of 0.61–0.80 can be interpreted as indicating “substantial agreement” and 0.81–0.99, “nearly perfect agreement.”28
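A minimal sketch of the kappa calculation for two binary classifications (severe versus not severe) is given below. It follows the standard definition, kappa = (p_o − p_e) / (1 − p_e), where p_o is the observed proportion of agreement and p_e the proportion expected by chance. The function name and the example classifications are hypothetical.

```python
def cohens_kappa(first, second):
    """Cohen's kappa for two paired lists of booleans (True = severe)."""
    if len(first) != len(second) or not first:
        raise ValueError("classifications must be paired and non-empty")
    n = len(first)
    # Observed proportion of pairs with the same category.
    p_o = sum(1 for x, y in zip(first, second) if x == y) / n
    # Chance agreement from each administration's marginal proportions.
    p_yes_1 = sum(first) / n
    p_yes_2 = sum(second) / n
    p_e = p_yes_1 * p_yes_2 + (1 - p_yes_1) * (1 - p_yes_2)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical severe/not-severe classifications for ten respondents.
paper = [True, True, True, True, False, False, False, False, False, False]
audio = [True, True, True, False, False, False, False, False, False, True]
kappa = cohens_kappa(paper, audio)
print(round(kappa, 2))  # 0.58
```

In this example eight of ten pairs agree (p_o = 0.8) and chance agreement is 0.52, so kappa is 0.28 / 0.48, or about 0.58.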
Table 5. Agreement between Fatigue Severity Categories (Severe Fatigue vs. Nonsevere Fatigue)a (n = 258)
| Group | n | Cohen's Kappa |
|---|---|---|
| Paper audio/audio paper | 128 | 0.73 |
| Paper (test–retest) | 65 | 0.67 |
| Audio (test–retest) | 65 | 0.87 |
aSevere fatigue
Time Taken to Complete Paper and EPIC-Vox Audio Forms of the BFI
Table 6 shows the time taken to complete each version of the BFI. Groups 1 and 2 completed each version once, whereas Group 3 completed the paper BFI twice and Group 4 completed the audio version twice. For Group 3 patients (paper test-retest), the mean times taken to complete the BFI were 92.6 seconds at the first administration and 48.0 seconds at the second. A paired t-test showed this time difference to be significant: t(64) = 14.7, P < 0.005. For Group 4 (A-A), the mean times taken to complete the BFI were 211.2 seconds at the first presentation and 209.5 seconds at the second. There was no significant difference in time taken to complete the two audio administrations (n = 65).
Table 6. Time in Seconds to Complete Paper and Audio Versions of the BFI
| BFI Version | n | Minimum | Maximum | Mean | SD |
|---|---|---|---|---|---|
| Paper (Groups 1 and 2) | 128 | 30 | 320 | 91.9 | 47.4 |
| Audio (Groups 1 and 2) | 128 | 115 | 248 | 204.0 | 11.1 |
| Paper (Group 3 first test) | 65 | 50 | 210 | 92.6a | 29.2 |
| Paper (Group 3 retest) | 65 | 10 | 105 | 48.0a | 20.0 |
| Audio (Group 4 first test) | 65 | 199 | 273 | 211.2 | 9.6 |
| Audio (Group 4 retest) | 65 | 202 | 253 | 209.5 | 8.9 |
aDifference significant at P
The mean (SD) time taken to complete the audio version of the BFI was 207.2 (10.7) seconds (n = 258). The mean (SD) time taken to complete the paper BFI was 81.0 (42.3) seconds (n = 258). An independent t-test showed that there was a significant difference between the mean times taken to complete paper and audio forms of the BFI, t = 46.4, P < 0.001.
Evaluation Questionnaire
Patients who completed both paper and audio forms of the BFI (n = 128) also completed an evaluation questionnaire. In response to the question “Did you find the EPIC-Vox easy to use?,” 127 of 128 (99.2%) participants said that they did. Respondents read five statements relating to ease of use of EPIC-Vox and were asked to indicate whether or not they agreed with the statements. All but one respondent agreed that “it was easy to listen to the questions” and 118 of 128 (92.2%) agreed that “it was easy to press the buttons.” One hundred twenty-two (95.3%) agreed that “the instructions were easy to understand” and 89 of 128 (69.5%) agreed that “it didn't take long.” Only 44 of 128 (34.4%) agreed with the statement “I like using electronic gadgets.”
Participants were also asked which version of the BFI they preferred using. Forty-seven respondents (36.7%) preferred audio, 23 (18.0%) preferred paper, and 58 (45.3%) had no preference. Thirty-nine respondents added an optional comment to explain their preference. Of these, 12 said that EPIC-Vox made it easier to understand or concentrate on the BFI. Five commented on the advantages of EPIC-Vox for people who had forgotten their reading glasses or had literacy problems. Five said that the audio version was clear, pleasant, more personal, comforting, or friendly. In contrast, 10 respondents who preferred the paper version said that it gave more time for a considered answer and four patients said that the audio version took too long. Three patients commented that they found the buttons on the device either too small or too close together.
Discussion
This study has demonstrated that data collected using the BFI in English spoken word format are in good agreement with those collected using the conventional paper-and-pencil questionnaire. The upper limit of agreement of 1.00 (95% CI = 0.85 to 1.15) and lower limit of −0.93 (95% CI = −1.07 to −0.78) indicate that 95% of BFI global scores collected by paper and audio methods can be expected to be within ±1. The secondary aim of the study was to determine test-retest reliability of both data collection methods. We showed excellent agreement between global scores at test and retest with the paper BFI, as demonstrated by the upper limit of 0.81 and lower limit of −0.46. Repeatability in the audio version was good, with limits of agreement of 1.14 and −0.79 for global scores on repeated testing. The main strengths of the present study were (1) the use of a novel, handheld, audio device to administer a self-report questionnaire, with direct data transfer to a computer database or spreadsheet and (2) simultaneous measurement of test-retest reliability of both methods of data collection and agreement between the two methods.
The original validation of the BFI in English1 reported reliability in the form of internal consistency but did not assess reliability at repeated administrations. The majority of validation studies of the BFI in other languages have not reported test-retest reliability statistics either. Exceptions are the validation of the BFI in German22 and Taiwanese,23 both of which reported Pearson correlation coefficients. The first of these studies evaluated test-retest reliability in 117 chronic pain patients before and after clinic consultations and reported, for total BFI mean scores, a correlation of 0.91 between the two administrations. The second study of 439 Taiwanese cancer patients reported test-retest reliability after a three-day interval in a subset of only 12 patients. No information was given regarding the criteria used to select this sample. The authors did not report test-retest data for total BFI scores but reported correlation coefficients of 0.89 and 0.91 for fatigue severity and fatigue interference scores, respectively. We are not aware of any studies that have reported test-retest reliability of the English BFI and the present study, therefore, makes an important contribution in this field.
Previous studies comparing electronic and paper collection of data using validated health questionnaires have used the visual medium only, reported measures of association rather than agreement, and have rarely reported test-retest reliability of each measurement method. The use of Bland and Altman's alternative approach is important because correlation does not necessarily imply agreement. Also, it is important to measure test-retest reliability in conjunction with agreement, because lack of repeatability of one method can result in poor agreement between methods.
There was a high rate of nonparticipation, with 175 of 435 (40%) patients who received invitation letters not participating. The reasons for nonparticipation were, in the majority of cases, not related to the study. To avoid sending more than one invitation to a patient, those attending the clinic for annual follow-up were invited to take part in the study. However, because appointments were made a year in advance, almost half (47.4%) of nonparticipants either failed to attend or changed their appointments.
One limitation of the present study was that completion of the audio BFI took considerably longer than the paper version and this could have influenced the difference in repeatability between methods. The elapsed time between repeated administrations of individual items of the BFI was on average two minutes longer for audio than it was for paper and this could account for the higher variability in the test-retest mean differences for audio than for paper. However, the additional time requirement should be weighed against the advantages that audio assessment confers. The long-term aim is to use the device to administer questionnaires in languages other than English, particularly where no written form exists; it also allows patients who speak a language but do not read it well to participate. Although data collection may take a little longer, the time required for analysis is greatly reduced because data are downloaded directly to a computer, avoiding transcription time and errors. It is acknowledged that in some circumstances, particularly where rapid assessment of an individual is required, the need for data download may be inconvenient. However, audio questionnaires are not intended to replace conventional assessment, only to provide a reliable alternative when conventional assessment is impossible.
The use of a one-minute distracter task was intended to standardize the time between administrations of the BFI, but the difference in completion times introduced an unforeseen variable. It could be argued that the distracter task was too short to be effective. However, when determining repeatability of a fatigue measure in patients with sleep disorders, there is a trade-off between elapsed time from first to second administration and change in levels of fatigue over time. An alternative approach would have been to ask patients to complete the BFI before and after their clinic consultations. However, if sleep-disordered patients discuss their sleepy conditions during the consultation, it is possible that this focus on sleepiness may result in higher ratings for “fatigue now” after the consultation than before. In fact, there was a significant decrease in global scores at the second administration of the BFI, regardless of audio or visual modality (Table 2). It is difficult to explain this finding, but it is possible that the distracter task not only discouraged recall of previous responses, but also distracted patients from their fatigue.
Another limitation of the study was that only about one-fifth of the participants were female, which might have created bias in the audio device evaluation findings. However, the participants had sleep apnea and the gender distribution reflects that of this patient population.
With regard to fatigue severity, Mendoza et al.'s original study1 provided evidence to support the classification of fatigue severity according to the “fatigue worst” score only. Using the same classification method, we found good agreement between fatigue severity category for paper and audio and for repeated presentations of the paper form of the BFI. We found excellent categorical agreement for repeated presentations of the audio BFI via EPIC-Vox (Table 5). With regard to the prevalence of fatigue severity, the findings of the present study were comparable with those from other studies. Severe fatigue (7–10 on “fatigue worst” item) was found in 36% of the sleep-disordered patient sample compared with 35%, 34%, and 31% in the English, Japanese, and Chinese BFI validation studies with cancer patients,1, 24, 29 respectively. Also, 35% of a German sample with cancer-related pain and 45% with other chronic pain had severe fatigue.22
The evaluation questionnaire showed that only 18% of the patient sample preferred the paper to the audio BFI, a finding that is in line with other studies comparing EDC with paper. A possible limitation of these studies generally is that blinding is not possible and respondents may express a preference for EDC to please the researcher. All but one of the respondents to the evaluation questionnaire reported that EPIC-Vox was easy to use, despite the finding that only about a third of respondents actually like using electronic gadgets. This emphasizes the fact that EPIC-Vox is easily used by a wide cross-section of patients, regardless of whether they normally enjoy using electronic devices.
The preference rate of 36.7% for the electronic device appears to be rather low when compared to the findings of Gaertner et al.,30 who reported that 83% of 24 patients preferred using an electronic pain diary (palm-top computer) to a conventional paper pain questionnaire. However, there are several differences between the studies, the most important being that the present study used an audio rather than visual medium and that the long-term aims of the studies were different. The EPIC-Vox was designed so that patients had to listen to each question before they could respond. Patients may have felt that this was unnecessarily time-consuming because they could have completed the paper version more quickly; this may have contributed to the lower preference for the audio version. In the paper test-retest group, the mean time taken to complete the BFI at the first presentation was 92.6 seconds and 48.0 seconds at the second presentation. This suggests that, once respondents were familiar with the BFI, they were able to complete it much more rapidly. It was not possible to decrease the time taken to complete the audio version in the same way—as shown by the mean times taken by the audio test-retest group of 211.2 seconds at first presentation and 209.5 seconds at repeat presentation. However, participants in the present study were able to use paper and audio versions of the BFI equally well; the long-term aim is to make health-assessment questionnaires available to patients who are unable to use conventional visual versions because of language or literacy problems. The high preference for the palm-top computer also may have been attributable to the fact that it had novelty value, but the disadvantage was that some patients were unable to participate in the study because cognitive or physical limitations prevented them from using a palm-top computer.
The eventual aim is to make the BFI, and other health-assessment questionnaires, available in spoken word format in languages other than English. However, before embarking upon this future work, it is essential to show that outcome data are not influenced by the use of an audio, rather than visual, collection method. This study has shown that there is an acceptable level of agreement between BFI data collected by audio and visual methods. Furthermore, the design, methods, and results of the present study have provided a foundation for this future work.
The M. D. Anderson Cancer Center already offers the BFI in audio format (interview or Interactive Voice Response system) as well as in conventional paper form.31 The present study (1) provides evidence that the EPIC-Vox audio form of the BFI in English is a reliable and acceptable alternative to the paper questionnaire, and (2) shows for the first time that the paper form of the BFI in English has excellent test-retest reliability.
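For readers who wish to check the test-retest limits of agreement quoted in the Abstract, the calculation (mean difference ± 2SD) can be sketched as follows. This is a minimal illustration that works from the rounded summary statistics in the Abstract rather than from the patient-level data, so the results may differ from the published limits by 0.01 through rounding; the function name `limits_of_agreement` and the parameter `k` are assumptions introduced here.

```python
from typing import Tuple


def limits_of_agreement(mean_diff: float, sd_diff: float, k: float = 2.0) -> Tuple[float, float]:
    """Return (lower, upper) limits of agreement as mean difference +/- k * SD.

    k = 2.0 follows the 'mean difference +/- 2SD' definition given in the Abstract.
    """
    half_width = k * sd_diff
    return mean_diff - half_width, mean_diff + half_width


# Test-retest summary statistics (mean difference, SD) from the Abstract.
paper_lower, paper_upper = limits_of_agreement(0.17, 0.32)
audio_lower, audio_upper = limits_of_agreement(0.17, 0.48)

print(f"Paper test-retest limits: {paper_lower:+.2f} to {paper_upper:+.2f}")
print(f"Audio test-retest limits: {audio_lower:+.2f} to {audio_upper:+.2f}")
```

The narrower interval for the paper form reflects its smaller SD of differences, which is the basis for describing its test-retest reliability as excellent.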
Acknowledgments
The authors would like to thank the patients and staff of the Leicester Sleep Disorders Service.
References
- The rapid assessment of fatigue severity in cancer patients: use of the Brief Fatigue Inventory. Cancer. 1999;85:1186–1196
- A multicenter, placebo-controlled study of modafinil augmentation in partial responders to selective serotonin reuptake inhibitors with persistent fatigue and sleepiness. J Clin Psychiatry. 2005;66:85–93
- Self reports in research with non-English speakers. BMJ. 2003;327:352–353
- DfES Research Report The Skills for Life Survey: A national need and impact survey of literacy, numeracy and ICT Skills. 490. The Stationery Office; 2003
- The impact of literacy on health-related quality of life measurement and outcomes in cancer outpatients. Qual Life Res. 2007;16(3):495–507
- Health outcomes assessment in vulnerable populations: measurement challenges and recommendations. Arch Phys Med Rehabil. 2003;84:S35–S42
- Asian language and communications survey. London: CSPSI; 1994
- Comparison of an electronic speaking data recorder with the short form brief pain inventory in chronic pain patients. Br J Anaesth. 2005;95:566–579
- Development of a method of collecting questionnaire data from people with mixed language and literacy skills: a tool for use in diabetes research. Diabet Med. 2006;23:S2
- Adolescent sexual behavior, drug use, and violence: increased reporting with computer survey technology. Science. 1998;280:867–873
- The talking touchscreen: a new approach to outcomes assessment in low literacy. Psychooncology. 2004;13:86–95
- Implementation of audio computer-assisted interviewing software in HIV/AIDS research. J Assoc Nurses AIDS Care. 2007;18:51–63
- Electronic pain questionnaires: a randomized, crossover comparison with paper questionnaires for chronic pain assessment. Pain. 2004;110:310–317
- A comparison of paper with electronic patient-completed questionnaires in a pre-operative clinic. Anesth Analg. 2005;101:1075–1090
- Does electronic implementation of questionnaires used in asthma alter responses compared to paper implementation? Qual Life Res. 2001;10:683–691
- Validation of electronic data capture of the Irritable Bowel Syndrome Quality of Life Measure, the Work Productivity and Activity Impairment Questionnaire for Irritable Bowel Syndrome and the EuroQol. Value Health. 2006;9:98–105
- Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;1(8476):307–310
- A note on the use of the intraclass correlation coefficient in the evaluation of agreement between two methods of measurement. Comput Biol Med. 1990;20:337–340
- Short-term retention of individual verbal items. J Exp Psychol. 1959;58:193–198
- Neuropsychological assessment. 3rd ed. New York: Oxford University Press; 1995
- Statistics notes: how to randomise. BMJ. 1999;319:703–704
- Validation of the German version of the Brief Fatigue Inventory. J Pain Symptom Manage. 2003;25:449–458
- Validation of the Taiwanese version of the Brief Fatigue Inventory. J Pain Symptom Manage. 2006;32:52–59
- Validation study of the Chinese version of the Brief Fatigue Inventory (BFI-C). J Pain Symptom Manage. 2004;27:322–332
- Validation study of the Korean version of the Brief Fatigue Inventory. J Pain Symptom Manage. 2005;29:165–172
- A coefficient of agreement for nominal scale. Educ Psychol Meas. 1960;20:37–46
- Using multivariate statistics. 3rd ed. New York: Harper Collins College Publisher; 1996
- Regional variability in use of a novel assessment of thoracolumbar spine fractures: United States versus international surgeons. World J Emerg Surg. 2007
- Validation study of the Japanese version of the Brief Fatigue Inventory. J Pain Symptom Manage. 2003;25:106–117
- Electronic pain diary: a randomized crossover study. J Pain Symptom Manage. 2004;28:259–267
- Symptom assessment tools. Available from http://www.mdanderson.org/departments/prg/display.cfm. Accessed October 10, 2007
The study was supported by the University of Leicester NHS Trust.
PII: S0885-3924(09)00538-7
doi:10.1016/j.jpainsymman.2008.11.015
© 2009 U.S. Cancer Pain Relief Committee. Published by Elsevier Inc. All rights reserved.

