Correspondence: Address correspondence to Nicholas G. Castle, PhD, RAND, 201 North Craig Street, Suite 102, Pittsburgh, PA 15213-1516. E-mail: CASTLE{at}RAND.org
| Abstract |
|---|
Key Words: Satisfaction, Response formats, Surveys
Despite these recent works, research on factors that can help improve the information collected from long-term care settings is still needed. Several authors have noted potential difficulties in obtaining survey information from elders. These include problems with response rates (El-Guebaly, Toews, Leckie, & Harper, 1983), cognition (Simmons et al., 1997), acquiescent response bias (Ross, Steward, & Sinacore, 1995), and lack of response variability (Pascoe & Attkisson, 1983).
In fact, a factor common to the results of many satisfaction surveys is a lack of response variability, which in the case of elders means the use of the upper end of a response format (Pascoe & Attkisson, 1983). On the one hand, a simple dichotomous scale (e.g., yes or no) is most prone to lack of response variability; this scale is also limited in that it gives no indication of the relative intensity of satisfaction or dissatisfaction. On the other hand, the inclusion of more categories (e.g., using a five-category Likert scale) does not appear to resolve the problem. Elders still tend to use the upper end of a response format. Some research with elders also suggests that adding categories may itself be problematic, as more categories can lead to confusion.
Clearly, for satisfaction information from elders to be used more effectively, it would be advantageous to increase the response variability. One way to do this may be to include an easily understood response format. Asking elders which response formats they prefer and which formats they find easiest to use may be a useful first step in increasing response variability. In addition, empirical analyses measuring the response variability between various formats may be informative for future and current satisfaction initiatives for elders. With these questions in mind, in this paper we first examine elders' preferences among five response formats, and then we examine the response variability of these five commonly used formats.
Background
Measures of satisfaction have been shown to be useful in many ways. First, results from satisfaction surveys are measures of accountability. Second, satisfaction measures provide customer-focused quality-of-care information that is generally not available from other sources. Third, satisfaction information can be used by providers for quality-improvement initiatives.
The stated goals of organizations promoting and monitoring quality stress the accountability of satisfaction measures. For example, the National Committee for Quality Assurance (NCQA) states that its mission "is to provide information that enables purchasers and consumers of managed health care to distinguish among plans based on quality, thereby allowing them to make more informed health care purchasing decisions" (NCQA, 2003). By publishing satisfaction scores from nursing homes, states such as Michigan, Ohio, and Vermont (Lowe, Lucas, Castle, Robinson, & Crystal, 2003) have also increased the accountability of providers to consumers.
Various structure, process, and outcome measures are addressed in existing facility surveys in long-term care (Cherry, 1991; Harrington, 1991). These are important, especially those measuring clinical outcomes. However, these outcomes may differ from outcomes defined as important by elders (Bond & Thomas, 1992). Moreover, consumers can find clinical outcomes difficult to understand (American Association of Retired Persons [AARP], 2001).
Satisfaction information can be used by providers. The information is useful in at least two ways. First, satisfaction information can be used by providers to compare the quality of care of their own facilities with that of others, although clearly this is most useful when providers are using the same satisfaction instruments. Second, satisfaction information can also be used by providers for quality-improvement initiatives (Harrington, 1991). Noelker, Ejaz, and Schur (2000) found that 98% of nursing homes collecting satisfaction information used the information for quality improvement.
These benefits of satisfaction information rely heavily on adequate and reliable discrimination between satisfaction scores. Clearly, satisfaction surveys with a lack of response variability will provide limited information for accountability, use by consumers, benchmarking, or provider quality-improvement purposes.
Lack of response variability is an issue inherent to health care surveys in general (Pascoe & Attkisson, 1983). Ware, Snyder, and Wright (1977), for example, found approximately 90% of consumers to be satisfied with health care services. Lack of variability would appear to be especially problematic in surveys involving elders, where most elders respond that they are satisfied (Ross et al., 1995). This, of course, is only truly an issue of concern if elders are dissatisfied with the quality of health care.
No gold-standard measure of resident satisfaction exists. However, two factors led us to believe that elders are dissatisfied with the quality of health care, and measurement problems are contributing to lack of response variability in surveys of elders. First, the high satisfaction scores found in most studies conflict with public opinion polls (AARP, 2001), government studies (Institute of Medicine [IOM], 2001), and research (Cherry, 1991) on the general quality of long-term care.
Second, the general survey literature has identified significant associations between the methods of measuring satisfaction and the results obtained. Ware and associates (1977) examined survey-response variability in hospital patients. They used five different measurement approaches for the same participants. The positive responses varied from 1% to 77%, depending on the scale used. In a similar manner, using seven different satisfaction instruments at a Veterans Administration outpatient clinic, Ross and colleagues (1995) found positive responses to vary from 63% to 82%. Counte (1979), examining responses from outpatients at a multiple sclerosis center, found lower satisfaction scores with a 5-point scale than with a 6-point scale. Ware and Hays (1988), using outpatient data from general medical clinics, compared the response variability, reliability, and validity of a 6-point response scale and a 5-point response scale. The 5-point response scale had significantly more variability. These studies would seem to indicate that methods of measuring satisfaction influence the results obtained. However, for the most part, these studies confound different measurement approaches with different response scales, and, as one would expect, given the diverse populations examined, few generalizations can be made from these study findings.
Ross and colleagues (1995) examined the variability in satisfaction evaluations by using seven different measurement methods. These authors conclude that "unreliability of measurement may be a significant problem in satisfaction measurement, especially for the oldest and most ill patients" (p. 392). Despite the conclusion of these authors, no study has examined the association between lack of response variability in surveys of elders and the satisfaction scale used.
The published literature on satisfaction instruments in long-term care that we identified is summarized in Table 1. The literature consists of 20 studies, but few generalizations can be made from this table. The number of items included in the surveys varies considerably, from as few as 10 (Mitchell & Kemp, 2000) to as many as 79 (Ejaz, Straker, Fox, & Swami, 2003). The mean number of items is 34. Likewise, the number of respondents is highly varied. One study had only 14 respondents (Smith & Sullivan, 1997), whereas the largest number was 1,374 (Pearson, Hocking, Mott, & Riggs, 1993). The mean number of respondents was 232. Six of these publications provided reliability or validity information of their survey instruments. The majority of studies we identified in our literature review were set in nursing homes. Most of these studies are reviewed in the text by Cohen-Mansfield and colleagues (2000), which includes a chapter summarizing published satisfaction scales (Kruzich & Cohen-Mansfield, 2000). Of most significance to this paper, we could not find any consensus on the type of response format. Several studies (seven) used a Likert format; however, the number of possible responses in these scales varied from three to five. Other formats commonly used include simple dichotomies (such as yes-no) and open-ended response formats.
| Methods |
|---|
Some authors have described types of response formats used in satisfaction surveys. For example, Krowinski and Steiber (1996) gave examples of four general categories of response formats: nominal, ordinal, interval, and ratio. We first tried using these conventional categories to classify the response formats of commonly used surveys. We found most response formats to fall into the ordinal category. Thus, because of the lack of discrimination between response formats using this approach, at least for our purposes, we more finely divided satisfaction surveys by using the type of rating scale used (also called scaled-response formats; Krowinski & Steiber, 1996).
We identified eight commonly used scaled-response formats: open ended, dichotomous (e.g., yes-no), Likert, evaluation, frequency, satisfaction, visual analogue, and Chernoff faces. The format of open-ended and dichotomous response formats is self-evident. Likert formats are a series of opinions (e.g., agree, strongly agree; see Likert, 1932). An evaluation format is a series of responses of ordinal value (e.g., poor, good, very good). A frequency format is a series of responses with ascending or descending value (e.g., all of the time, some of the time). A satisfaction format is a scale that determines the frequency of satisfaction (e.g., very satisfied, satisfied). A visual analogue format (also called graphic scaling) is a pictorial scale that usually has some interval value (e.g., scale from 1 to 10). Chernoff faces are a pictorial representation with Likert-scale or evaluation-scale types of values (faces have smiles and frowns). Examples of these eight types of scaled-response formats are given in the appendix.
Because it was not possible to include measures using all potential variations in response scales and response formats, we used five common types. Our criteria for choosing these were as follows: (a) they were commonly used in existing long-term-care surveys; (b) they were commonly used in health care surveys; (c) they were commonly used by mental health care providers; and (d) positive opinions of the scales or formats were published. We excluded open-ended questions because they can be problematic in creating empirical scores. We excluded dichotomous scales because they limit the degree of information obtained on satisfaction-dissatisfaction; for example, the number of satisfied residents can be determined, but no information is gained on how satisfied they are. We also excluded frequency scales because they were inappropriate for many questions we used to evaluate the one-time medical encounter (surgery center visit) investigated. Using this process, we identified five response formats.
First, we used a 5-point Likert format (5LF), anchored by 5 (strongly agree) and 1 (strongly disagree). An example of this was the following question: "I am satisfied with my overall experience in the surgery center" (strongly disagree, disagree, not sure, agree, strongly agree).
Second, we used a 5-point satisfaction format (5SF), anchored by 5 (very satisfied) and 1 (very dissatisfied). An example of this was the following request: "Please rate your overall satisfaction with our surgery center" (very dissatisfied, dissatisfied, neither satisfied or dissatisfied, satisfied, very satisfied).
Third, we used a 5-point evaluation format (5EF), anchored by 5 (excellent) and 1 (poor). An example of this was this request: "Please rate your overall experience in our surgery center" (poor, fair, good, very good, excellent).
Fourth, we used a four-face Chernoff format (4CF), anchored by 4 (excellent) and 1 (poor). The ratings excellent, very good, fair, and poor were printed under the faces. An example of this was the following request: "Please rate your overall experience in our surgery center" (poor, fair, very good, excellent).
Fifth, we used a visual analogue format (10VAF), anchored by 1 (very poor) and 10 (excellent). These scales themselves have a lot of variability. They vary in the anchors used and whether demarcations are included on the scale (Pascoe & Attkisson, 1983; Ross et al., 1995). We included demarcations from 1 to 10, and respondents were asked to mark on the scale the point that best identified their experience.
Data Sources and Sample Selection
Satisfaction and demographic data were self-reported by patients in four outpatient surgery centers during 1998 and 1999. In addition, self-reported questions of general health taken from the Short-Form 36 Health Survey (SF-36; Ware & Sherbourne, 1992) were included. The SF-36 has five questions that are combined into a general health perceptions scale. In this investigation only data from patients 65 years or older are reported.
We used six different survey instruments. Five instruments varied in response format, as already described (5LF, 5SF, 5EF, 4CF, 10VAF). In these five satisfaction questionnaires, we kept other factors, such as size of the font and the order of questions, constant. The wording of questions was also extremely similar in all cases. However, we could not use identical wording for the questions, as minor changes had to be made to make the question appropriately match the response scale. In the sixth survey instrument, we used questions from two satisfaction domains (art of care and global satisfaction) and asked each question multiple times by using the five different response sets. Then a series of questions asked for the respondents' preferences between the different response sets. We used the art of care and global satisfaction domains, because in pilot analyses these domains had the least missing data.
Surveys were given to participants from a randomly ordered stack. We randomized the order of the surveys by using a table of random numbers. All survey instruments had a face page, so the type of survey given to each patient could not be readily identified. We performed regular checks to ensure that the administration of the different survey instruments was random. In addition, we have no reason to believe that administration of the different survey instruments was biased because of staff preferences.
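The randomized stack described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the study used a table of random numbers, whereas the sketch substitutes a seeded pseudo-random generator; the label "PREF" for the sixth (preference) instrument, the function name `build_stack`, and the equal number of copies per instrument are all hypothetical.

```python
import random
from collections import Counter

# Labels follow the paper's abbreviations; "PREF" is a hypothetical label
# for the sixth (preference) instrument.
INSTRUMENTS = ["5LF", "5SF", "5EF", "4CF", "10VAF", "PREF"]

def build_stack(copies_per_instrument, seed):
    """Return a randomly ordered stack with equal copies of each instrument.

    Equal copies per instrument is an assumption, not something the paper states.
    """
    stack = [name for name in INSTRUMENTS for _ in range(copies_per_instrument)]
    rng = random.Random(seed)  # seeded so the order can be reproduced and audited
    rng.shuffle(stack)
    return stack

stack = build_stack(copies_per_instrument=50, seed=1998)
counts = Counter(stack)  # a simple check that every instrument is equally represented
```

Recording the seed plays the same role as keeping the page of the random-number table: the order can be regenerated later to check that administration followed it.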
Instrument Development and Administration
In the published literature, we were unable to find a short, valid, satisfaction instrument for use in an outpatient surgery center. Therefore, we developed our own survey by using five criteria from the Centers for Medicare and Medicaid Services (CMS) for assessing the validity of our instrument (construct validity, face validity, reliability, clinical validation, and applicability).
No attempt was made to assess satisfaction with all aspects of outpatient surgery center care. The main criterion was to simply collect reliable and valid information that could be used for quality-improvement purposes. However, in an emulation of previous studies, three dimensions of elder satisfaction were purposefully evaluated: art of care, technical quality, and efficacy. This was based on the work of Ware, Davis-Avery, and Stewart (1978).
The art of care assesses provider characteristics. In our case this includes courtesy and comfort in asking questions. The technical quality of care assesses competence of caregivers. We include the knowledge and explanations given by physicians and nurses. Efficacy assesses the degree to which patients feel they were helped by the care given. We include the care given by physicians and nurses. In addition, we included four questions regarding the amenities of the care environment and one global item. The amenities items included temperature, cleanliness, and comfort of the facility. The global item assessed overall satisfaction with the outpatient surgery center.
We report on the reliability of the instrument in the results section. To ensure face validity we used a multidisciplinary hospital-surgical center team including clinical and administrative staff to develop the survey. In addition, nurses in daily contact with patients were key participants in the project. From a comprehensive list of questions compiled by our team and from a literature search, we discussed which information we actually required. This, we believe, ensured the applicability of the survey.
We sought to use simple language and short questions. In all cases, a Flesch-Kincaid Grade reading level of 5.0 or lower was used (the average for all questions was 4.7). The resulting 17 questions were pilot tested with patients for one month (n = 112). This resulted in no changes to the questions and only minor wording revisions to the survey instructions given to the patient. The data from the pilot test are not included in the results we report.
| Survey Administration |
|---|
Patients were provided with instructions to deposit the completed questionnaire in a locked box in the waiting area, or mail it back with an attached postage-paid envelope. The instructions also described how the information would be used and that completing the survey was voluntary. No signed informed consent was used; however, patients were assured that no individuals would be identified in any use of the data. In fact, it is important to note that no identifying marks were used on the survey, and once it was deposited in the locked box the patient could not be identified.
One potential source of bias in satisfaction surveys is recall bias; that is, patients may have difficulty recalling their experience accurately when responding to some questions. Our survey administration process ensured that patients received a questionnaire when their experience was still memorable and thus decreased bias caused by potential time lags. Other biases may still exist. For example, patients may have a social desirability tendency. That is, patients may respond to the satisfaction questions as they think they are expected to respond. Patients may also have a fear of reprisal, although the anonymity of patients was an essential part of our survey process.
Analyses
We first provide some of the psychometric properties of the satisfaction instrument. This follows the work of McHorney, Ware, Lu, and Sherbourne (1994) and includes completeness of data, item-discriminant validity, and reliability of scale scores.
We determined the percentage of patients not providing responses for each question. This is important because a score for each scale cannot be confidently computed if a high number of individual items comprising that scale are missing (McHorney et al., 1994). Item-discriminant validity determines the degree to which items correlate within the scale compared with other scales (see McHorney et al. for a discussion of this technique). Finally, to examine the internal consistency, we calculated Cronbach's alpha for each scale.
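As an illustration of the internal-consistency step, Cronbach's alpha for a scale of k items can be computed from the item variances and the variance of the summed scale score, alpha = k/(k - 1) × (1 - sum of item variances / variance of total score). The sketch below is a minimal version assuming complete data (no missing items); the function name `cronbach_alpha` and the example responses are made up for illustration and are not data from the study.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for complete data.

    rows: list of respondents, each a list of k item scores (no missing values).
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("need at least two items")
    items = list(zip(*rows))                      # one tuple of scores per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])  # variance of the summed scale score
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Made-up responses: five respondents answering a three-item scale coded 1-5.
example_rows = [[4, 5, 4], [3, 3, 2], [5, 5, 5], [2, 3, 2], [4, 4, 5]]
alpha = cronbach_alpha(example_rows)
```

In practice, respondents with missing items would need to be dropped or handled explicitly before this calculation, which is why the completeness of data is examined first.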
Respondents' preferences for each response scale are presented. We use t tests, with Tukey corrections to account for multiple comparisons, to compare the significance of the difference in values between the groups.
Using a methodology similar to that of Ware and Hays (1988), we transformed the satisfaction scores from the 5LF, 5SF, 5EF, 4CF, and 10VAF formats to a common 0-100 scale, with higher scores reflecting greater satisfaction. For each scale, item responses in the lowest category were recoded as 0, and those in the highest category were recoded as 100. Item responses between the highest and lowest categories were recoded to give uniform divisions between 0 and 100. For example, using the 5EF, we recoded responses of poor, fair, good, very good, and excellent as 0, 25, 50, 75, and 100, respectively. We also calculated the coefficient of variation (CV) by dividing the standard deviation by the mean and multiplying by 100 (Ware & Hays, 1988). The CV is a unitless measure, and it enables us to compare the response variability of the different satisfaction formats used. There is no simple parametric distribution for a test statistic that would allow us to test the difference between two CV values (Rao & Bhatt, 1995). Therefore, we bootstrapped a 95% confidence interval for each of the CV estimates to determine whether the differences between the CVs are due to more than just sampling error (Davison & Hinkley, 1997).
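The recoding and CV calculation described above can be sketched as follows. This is a hedged illustration rather than the analysis code used in the study: the function names are hypothetical, the sample standard deviation is assumed, and a percentile bootstrap is assumed because the text does not say which bootstrap variant was used. The example responses are made up.

```python
import random
from statistics import mean, stdev

def rescale(response, n_categories):
    """Map a response coded 1..n_categories onto 0-100 in uniform steps."""
    if not 1 <= response <= n_categories:
        raise ValueError("response outside the scale")
    return 100 * (response - 1) / (n_categories - 1)

def coefficient_of_variation(scores):
    """CV = (standard deviation / mean) * 100; the sample SD is an assumption."""
    return stdev(scores) / mean(scores) * 100

def bootstrap_cv_ci(scores, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the CV (an assumed bootstrap variant)."""
    rng = random.Random(seed)
    cvs = []
    for _ in range(n_resamples):
        sample = rng.choices(scores, k=len(scores))  # resample with replacement
        if mean(sample) > 0:                         # CV is undefined when the mean is 0
            cvs.append(coefficient_of_variation(sample))
    cvs.sort()
    lower = cvs[int((1 - level) / 2 * len(cvs))]
    upper = cvs[int((1 + level) / 2 * len(cvs)) - 1]
    return lower, upper

# Made-up 5EF responses coded 1 (poor) to 5 (excellent); not data from the study.
raw_5ef = [5, 5, 4, 5, 3, 4, 5, 5, 2, 4, 5, 4]
scores = [rescale(r, 5) for r in raw_5ef]
cv = coefficient_of_variation(scores)
ci_low, ci_high = bootstrap_cv_ci(scores)
print(f"CV = {cv:.1f}, 95% CI = ({ci_low:.1f}, {ci_high:.1f})")
```

With `n_categories` set to 4 or 10, the same `rescale` function gives the uniform divisions for the 4CF and 10VAF. Under this approach, two formats whose bootstrap intervals do not overlap would be read as differing by more than sampling error.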
| Results |
|---|
A total of 2,450 valid questionnaires were received. Census records indicate that 3,122 elders received care in the outpatient surgery centers during the period of observation. This gives a response rate of 78.5%. In addition, response rates for the five different questionnaires were similar, varying from 76.3% (5SF) to 79.8% (10VAF). Loss of 672 elders may be representative of those with lower levels of satisfaction or more severe illness. As already discussed, some patients may have fear of reprisal in answering the questionnaire, and presumably they are dissatisfied. Other patients may not have responded as a result of their illness. Only 7% of the sample was from mailed questionnaires. In sensitivity analyses excluding these questionnaires, findings were similar to those presented. That is, values excluding these questionnaires were all within 1% of the values reported. In addition, response rates, use of the mail to return the questionnaire, levels of perceived physical health, and average age were not significantly different among the six groups of respondents that received different questionnaire formats (analysis not shown).
Descriptive statistics (not shown) show that, on average, patients rated their overall health as above average (M = 4.61). Only 11% of respondents rated their health as very poor. This would be expected, as patients in very poor health are more likely to receive surgery in a hospital setting. The average age of respondents was 69 years.
Table 2 presents the results for patient preferences for the survey response formats used. This information came from the sixth survey instrument, which used questions from two satisfaction domains (art of care and global satisfaction) and asked each question multiple times with the five different response sets. The 4CF was liked the least, with only 5% of the respondents preferring this format. The 10VAF was liked the most, with 39% of the respondents preferring this format.
| Discussion |
|---|
The CV for the satisfaction domains used in the 10VAF was also higher than those in identical domains using the other response formats. This would seem to indicate that the 10VAF is less prone to a ceiling effect than the other formats. In fact, the mean scores for the 10VAF were all approximately 5 points lower than those of the next best format, and the CV was about 3 points higher. We are careful to stress, however, that although the 10VAF would appear to be less prone to a ceiling effect than the other scales, this does not necessarily mean the 10VAF is immune from ceiling effects. It is also important to note that in all cases our results were skewed to the positive end of all response formats.
Our results should also be tempered by several limitations inherent to this investigation. First, we only have data from relatively well-functioning community-dwelling elders. Although we have little data to the contrary, one could envisage elders with low cognitive functioning to be confused with a 10-item format and the extra wording needed to introduce the format (the word stem "on a scale of 1 to 10, please rate ..."). Indeed, in nursing home residents with low cognitive functioning, dichotomous scales appear advantageous and are advocated by several researchers (e.g., Simmons et al., 1997; Uman et al., 2000). In addition, in most other long-term care settings, residents are older and frailer than those we investigated. Our results may have limited generalizability and may not be applicable to the institutionalized elderly population.
The second limitation of our research is that we only examine five different response formats. Our results are probably only representative of these specific formats. We cannot extrapolate from our results, for example, to state that the 10VAF would perform better than a three-item Likert format (not investigated). Nor can we presume that the 10VAF is the best visual analogue format. A similar format with 0 to 10 responses (rather than 1 to 10), a format with fewer responses such as from 1 to 5, or a format using different anchors may perform better. In addition, because four of the formats examined used 5 or fewer response options, and only one used 10 response options, it would seem especially appropriate to examine response options in between (e.g., a seven-item response option). In short, the response formats we examine vary on two dimensions: scale format and number of response options. With the number of response formats we tested, we cannot determine whether the 10VAF was favored because it used 10 response options or because it used a visual analogue format. Of course, we also do not investigate why participants preferred the 10VAF.
We examine common satisfaction questionnaire response formats. These response formats have face validity, but we have no way of knowing whether less commonly used response formats would perform better than the 10VAF. We also do not examine response formats that use a dual-rating system, that is, a format using a yes-no question first, followed by a specific response format (such as a Likert scale) if yes is given as an answer.
A further limitation of our research is that some bias may have come from patients' relatives, who may have helped in completing the questionnaire. We did not include a section on the survey to identify whether such help was used. In retrospect this may have been useful information to collect.
Several initiatives are currently underway to standardize the collection of satisfaction information in long-term-care settings. The most notable are the nursing home Consumer Assessment of Health Plans Survey (CAHPS; Agency for Healthcare Research and Quality, 2001) and the Ohio Department of Aging instruments (Straker, Ehrichs, Ejaz, & Fox, 2003; Straker & Ejaz, 2001). The full benefits of measuring and reporting resident satisfaction can only be realized with such standardized instruments. Benchmarking, accountability, and quality-of-care information especially require valid and consistent measures. The architects of these satisfaction instruments have a considerable number of pertinent issues to consider in order to ensure the validity of their instruments. These issues include, but are not limited to, which domains to measure, number of questions, length of questions, survey administration, and cognitive screening. We propose that one further consideration to be included in the development of future satisfaction surveys is the response format used.
| Footnotes |
|---|
Decision Editor: Linda S. Noelker, PhD
Received for publication December 3, 2002. Accepted for publication September 23, 2003.