a Division of Geriatrics and Gerontology, Department of Internal Medicine
b Division of Biostatistics, Washington University School of Medicine, St. Louis, MO
Correspondence: Ellen F. Binder, MD, Division of Geriatrics and Gerontology, Washington University School of Medicine, 4488 Forest Park Boulevard, Suite 201, St. Louis, MO 63108. E-mail: ebinder{at}imgate.wustl.edu.
Decision Editor: Eleanor S. McConnell, RN, PhD
| Abstract |
|---|
Key Words: Aging; Physical performance; Activities of daily living
Performance-based measures are well recognized as a tool for identifying community-dwelling older adults at risk for functional decline and death (Guralnik, Ferrucci, Simonsick, Salive, and Wallace 1995; Guralnik, Simonsick, and Ferrucci 1996; Reuben, Rubenstein, Hirsch, and Hays 1992; Reuben, Siu, and Kimpau 1992; Seeman et al. 1994) and have become an important component of geriatric assessment (Applegate, Blass, and Williams 1990; Branch and Meyers 1987). Brief physical performance measures can aid in the detection and delineation of physical limitations by providing a standardized measure of a particular physiologic domain. For example, timed chair rise has been used to assess skeletal muscle strength (Csuka and McCarty 1985), and the 6-min walk test has been used as a measure of endurance and cardiovascular fitness (Peeters and Mets 1996). For purposes of functional assessment, performance-based measures provide standardized observation of tasks that can serve as proxy measures for activities of daily living (ADL) tasks that require more time to complete, and therefore to observe. Physical performance measures offer other advantages over self-report and observer-rated measures of function with regard to validity, reproducibility, and sensitivity to change, and they may be less influenced by education, language, and caregiver expectation. They also allow for distinctions between an individual's usual function and the maximum level of function possible for a particular task (Guralnik, Branch, Cummings, and Curb 1989).
To date, the application of performance-based functional assessment to individuals with low levels of ADL function has been very limited. Patients with severe levels of disability, such as those in nursing homes (NH), are frequently unable to perform many available validated performance tests without assistance and therefore score at the "floor" (minimum score) for a number of items (Reuben and Siu 1990; Tinetti 1986). This makes it difficult to detect change in performance over time. The Physical Performance and Mobility Exam, which was developed with a sample of frail hospitalized patients, contains only mobility-related items (Winograd et al. 1994), and therefore does not provide more comprehensive functional assessment. Some more comprehensive performance-based instruments have been developed for NH residents with mild dementia and for community-dwelling individuals with dementia. The 54-item Physical Disability Index (PDI; Gerety et al. 1993) was developed with NH residents who had mild to moderate cognitive impairment. Although the PDI captures a wide range in performance ability, its application to research and clinical settings has been limited by the training and time required for test administration. Instruments that have been developed with community-dwelling older adults with dementia (Loewenstein et al. 1989; Mahurin, DeBettignies, and Pirozzolo 1991; Skurla and Sunderland 1988) include test items that rely primarily on cognitive abilities, are more relevant to tasks necessary for independent function in the community, and do not measure mobility function.
A brief performance-based instrument could be useful in the NH setting for interdisciplinary team assessment and care planning, and for baseline and outcome assessment in a variety of interventions aimed at improving physical functioning, including rehabilitation programs and clinical drug trials. Since 1991, the Minimum Data Set (MDS) instrument (Morris, Nonemaker, and Murphy 1995) has been mandated for use in federally certified NHs as a standardized functional assessment instrument. Although the MDS is a valid and reliable instrument when administered according to protocol (Hawes et al. 1995), mobility and ADL function are measured in relation to how the NH resident has performed the activity over the previous 7 days. Factors such as NH staffing limitations can make it difficult for NH residents to perform at or near maximal capacity, so that the MDS may reflect not what individuals are actually able to do, but rather what they are allowed to do by NH staff. The MDS does not include timed performance measures, which may be more sensitive to a change in a resident's condition. For example, information about speed of performance is not included in the MDS even though there is evidence that speed measures are highly predictive of health status (Gill, Williams, and Tinetti 1995). In addition, MDS measures may not detect subtle changes in performance that reflect new or worsened ADL disability (Snowden et al. 1999) and that may also be amenable to interventions.
The primary aim of this study was to develop and validate a multidimensional physical performance instrument for assessment of ADL and mobility function in the NH setting. Our intent was to develop an instrument that would allow for greater discrimination among individuals whose physical function is at the lower end of the spectrum. We also hypothesized that an instrument that includes measures of speed and endurance would allow greater sensitivity to change than existing measures of functional ability.
| Methods |
|---|
Participants and Sites
Participants were recruited from two not-for-profit urban community NHs. Eligibility criteria included the following: 65 years of age or older, ability to follow a one-step command, and absence of acute illness or unstable medical condition. Participants were excluded for the following criteria: enrollment in a hospice program or anticipated survival of less than 6 months, enrollment in a subacute care program, inability to follow directions due to sensory impairments, or chronically bed-bound or requiring Hoyer lift for transfers. Participants or a proxy provided informed consent for study participation as approved by the Washington University Human Subjects Committee.
Performance Testing Protocol
The Nursing Home Physical Performance Test (NHPPT) was administered twice 48 hr apart at baseline, 1 week later, and 6 months later. Two trials were performed for all test items except standing balance, sit-to-stands, chair transfer, and 6-m and 6-min walk/wheel, for which only one trial was performed. (The first 6 m of the 6-min walk were timed using the same preferred mode of mobility for both, e.g., walking or wheeling.) Testing was conducted in the resident's room except for the walking/wheeling procedure, which was performed in a predesignated low-traffic hallway. Two research staff members administered the performance test battery: one instructing, prompting, or assisting the participant, the other scoring and timing the items. The staff alternated these responsibilities across days of testing for each resident.
Scoring Procedures.
To create an instrument with discriminative power and sensitivity to change over time, task performance was measured in two ways. A score was assigned to describe the amount of assistance and/or prompting required to complete each task, ranging from no assistance and no verbal prompting (score = 4), to verbal prompting without any physical assistance (score = 3), minimal physical assistance (score = 2), maximal physical assistance (score = 1), or unable (score = 0). Assistive devices were allowed but not included in the assistance level scoring. Procedures for prompting, assistance, and scoring were defined a priori for each item and are described in the procedure manual (available from corresponding author on request).
Task performance was also measured in relation to the time required to complete the task or, in the case of sit-to-stands, the number completed or, for the 6-min walk/wheel, the distance traveled. These measures were not recorded for participants who were unable to perform the test item (assistance score = 0) or for participants who required maximal physical assistance (assistance score = 1).
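The two-part scoring rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the text, not the study's procedure manual: the dictionary keys, the function name `record_item`, and the example values are hypothetical.

```python
from typing import Optional

# Assistance-level scores as described in the text (0 = unable, 4 = independent).
# The label strings are illustrative, not taken from the procedure manual.
ASSISTANCE_SCORES = {
    "unable": 0,
    "maximal physical assistance": 1,
    "minimal physical assistance": 2,
    "verbal prompting only": 3,
    "no assistance or prompting": 4,
}


def record_item(assistance_level: str, measurement: Optional[float]) -> dict:
    """Return the two scores recorded for one test item.

    `measurement` is the time, count, or distance, depending on the item.
    It is kept only when the assistance score is 2 or higher, because the
    quantitative measure is not recorded for scores of 0 or 1.
    """
    score = ASSISTANCE_SCORES[assistance_level]
    quantitative = measurement if score >= 2 else None
    return {"assistance_score": score, "quantitative": quantitative}


# Hypothetical examples: a prompted but unassisted task keeps its time,
# a task needing maximal assistance does not.
print(record_item("verbal prompting only", 12.4))
print(record_item("maximal physical assistance", 40.0))
```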
Concurrent Measures
Because there is no "gold standard" to measure functional status and mobility in NH residents, we used construct and concurrent validation procedures. Three validated instruments of functional status were administered: the MDS Version 2.0 (Morris et al. 1995) ADL Long Form (or MDSADL; range 0–28; Morris, Fries, and Morris 1999), the Multidimensional Observational Scale for Elderly Subjects (MOSES; self-care subscale, range 0–36; Helmes, Csapo, and Short 1987), and the Katz instrument (range 6–18; Katz, Ford, and Moskowitz 1963). For all three instruments, a higher score is indicative of poorer functional status. Research staff followed standardized procedures for administration of these instruments.
The Short Blessed Test of Orientation, Concentration, and Memory (SBT; Blessed, Tomlinson, and Roth 1968; Katzman et al. 1983) was administered to assess participants' cognitive function.
The MDS, MOSES, Katz, and SBT were administered at baseline and at 6 months. NH records were used to collect information about demographics, medical history, and medication use.
Statistical Analyses
We analyzed the data using SAS statistical software (SAS Institute 1994). Intraclass correlation coefficients (ICCs) were calculated to examine interrater and test-retest reliability for each item. To evaluate concurrent validity, Spearman rho correlation coefficients were calculated between the NHPPT items and the MDSADL, Katz, and MOSES items. The internal consistency of hierarchical scales was evaluated by calculating Cronbach's coefficient alpha. Statistical significance was defined as p ≤ .05.
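As an illustration of the concurrent-validity statistic named above, the following Python sketch computes a Spearman rho as the Pearson correlation of rank-transformed scores, with tied values given average ranks. The study itself used SAS; the function names and the five-participant data set here are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: performance score (higher = better) against an ADL
# dependence score (higher = poorer function), so rho should be negative.
performance = [4, 3, 3, 1, 0]
adl_dependence = [0, 2, 1, 5, 6]
print(round(spearman_rho(performance, adl_dependence), 3))
```

Because higher scores on the MDSADL, MOSES, and Katz instruments indicate poorer function, a valid performance item scored so that higher is better would be expected to correlate negatively with them, as in this toy example.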
Scale Development
Because the distributions of most of the measures of time, distance, or number (hereinafter referred to as "quantitative") were skewed, statistical outliers were identified using a box plot and eliminated. A maximum value was assigned for each item on the basis of the distribution of values for that item. The maximum value was substituted when an individual's performance was greater than the maximum value (including individuals with outlying values) or when the individual had an assistance score of 0 or 1. Quartile ranges were calculated for each item (except standing balance), and a categorical score was assigned for each quartile ranging from 0 (unable, or measured performance greater than maximum value) to 4 (best performance).
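A minimal Python sketch of the quartile-based scoring for a timed item (where a shorter time is better) might look as follows. It rests on two assumptions that the text does not settle: that the quartile cut points are computed only from results at or below the maximum value, and that the maximum value has already been chosen from the item's distribution. The function name and the data are hypothetical.

```python
from statistics import quantiles


def quartile_scores(times, max_value):
    """Convert timed results (seconds, lower is better) to 0-4 scores.

    `times` holds one entry per participant; None marks a participant with
    an assistance score of 0 or 1, for whom no time was recorded. Such
    participants, and anyone slower than `max_value`, would be assigned the
    maximum value and receive a categorical score of 0.
    """
    # Assumption: cut points come from measured results within the maximum.
    measured = [t for t in times if t is not None and t <= max_value]
    q1, q2, q3 = quantiles(measured, n=4, method="inclusive")

    scores = []
    for t in times:
        if t is None or t > max_value:
            scores.append(0)  # unable, or slower than the maximum value
        elif t <= q1:
            scores.append(4)  # fastest quartile (best performance)
        elif t <= q2:
            scores.append(3)
        elif t <= q3:
            scores.append(2)
        else:
            scores.append(1)  # slowest quartile
    return scores


# Hypothetical times in seconds with a hypothetical 60-s maximum value.
times = [5.0, 8.0, 11.0, 14.0, 17.0, 20.0, 23.0, 26.0, None, 95.0]
print(quartile_scores(times, max_value=60.0))
```

For items where a larger value is better (number of sit-to-stands, distance in the 6-min walk/wheel), the comparisons would be reversed.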
Item Reduction
A factor analysis was performed using principal-components analysis with varimax rotation to determine which items to retain in the final instrument. The results of the factor analysis were interpreted using both the Kaiser-Guttman rule and examination of the scree plots of eigenvalues. Factor scores were constructed by summing the respective scores for items with a factor loading greater than 0.50. Additional criteria used for item reduction included internal consistency of items within the factor structure and redundancy. Redundancy was defined as pairs of items with a bivariate correlation coefficient greater than .80. A total NHPPT score was constructed by summing the factor scores.
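The three decision rules in this paragraph (the eigenvalue rule, the loading cutoff of 0.50, and the redundancy threshold of .80) can be sketched in Python as below. The principal-components extraction and varimax rotation are not reproduced; eigenvalues and loadings are taken as given inputs. All item names, loadings, and scores are hypothetical, and the use of a Pearson coefficient for the "bivariate correlation" is an assumption.

```python
from itertools import combinations
from statistics import mean


def retained_factors(eigenvalues):
    """Kaiser-Guttman rule: keep factors whose eigenvalue exceeds 1.0."""
    return [i + 1 for i, ev in enumerate(eigenvalues) if ev > 1.0]


def factor_score(item_scores, loadings, factor, cutoff=0.50):
    """Sum one participant's item scores over items loading above the cutoff."""
    return sum(
        score for item, score in item_scores.items()
        if loadings[item][factor] > cutoff
    )


def pearson(x, y):
    """Pearson correlation (assumed here for the bivariate coefficient)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5


def redundant_pairs(columns, threshold=0.80):
    """Return item pairs whose correlation exceeds the redundancy threshold."""
    return [
        (a, b) for a, b in combinations(columns, 2)
        if pearson(columns[a], columns[b]) > threshold
    ]


# Hypothetical rotated loadings on Factor 1 and Factor 2.
loadings = {
    "sit_to_stand": {1: 0.82, 2: 0.21},
    "walk_6m": {1: 0.77, 2: 0.30},
    "face_washing": {1: 0.25, 2: 0.80},
}
one_participant = {"sit_to_stand": 3, "walk_6m": 2, "face_washing": 4}

print(retained_factors([6.1, 1.8, 0.9, 0.4]))
print(factor_score(one_participant, loadings, factor=1))
print(redundant_pairs({
    "timed_sit_to_stand": [1, 2, 3, 4],
    "number_of_sit_to_stands": [1, 2, 3, 5],
    "telephone_use": [4, 1, 3, 2],
}))
```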
Assessment of Stability/Responsiveness to Change
To examine the stability of the NHPPT scale relative to other functional scales, paired t tests were performed for each scale (6-month value vs baseline value). Effect sizes were calculated for each scale (mean change divided by the standard deviation of the change; Cohen 1988), as well as the estimated sample size needed to detect the observed mean changes if one were testing the efficacy of an intervention in a controlled clinical trial (Hulley and Cummings 1988), using a two-tailed alpha of .05 and a power of .80.
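The effect size defined above, and one common way of turning it into a sample size, can be sketched as follows. The effect size follows the definition in the text. The sample size function uses the standard normal-approximation formula for comparing two group means, n per group = 2((z₁₋α/₂ + z₁₋β) / ES)²; this is an assumption, since the text cites Hulley and Cummings (1988) without stating the exact formula or tables used. Function names and data are hypothetical.

```python
from math import ceil
from statistics import NormalDist, mean, stdev


def effect_size(baseline, followup):
    """Mean change divided by the standard deviation of the change."""
    changes = [f - b for b, f in zip(baseline, followup)]
    return mean(changes) / stdev(changes)


def sample_size_per_group(es, alpha=0.05, power=0.80):
    """Approximate n per arm for a two-arm trial (normal approximation).

    Assumes a two-tailed test and that `es` is the standardized difference
    between arms; the formula actually used in the study is not stated.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) / es) ** 2)


# Hypothetical baseline and 6-month scores for five participants.
baseline = [10, 12, 9, 14, 11]
six_month = [8, 11, 9, 11, 10]
es = effect_size(baseline, six_month)
print(round(es, 2), sample_size_per_group(es))
```

Because the required sample size scales with 1/ES², a scale with twice the effect size of another needs roughly one quarter as many participants to detect the same change, which is why larger effect sizes are read as greater sensitivity to change.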
The analyses regarding validity, item reduction, and stability were performed using both the quantitative scores and the assistance level scores, separately and as combined (summed) factor and total NHPPT scores.
| Results |
|---|
18) comprised 37% of the sample.
The analyses described next were performed using the quantitative raw measurements and scores, the assistance level scores, and a summed score in which the two scores were added together. We found that the assistance level scores did not add significantly to the validity and internal consistency of the NHPPT and did less well in relation to stability and sensitivity to change. For this reason we elected to use only the quantitative measures for the NHPPT instrument; data related to assistance level scores are not presented.
Construct and Concurrent Validity
As expected, Spearman rho correlations between NHPPT items and MDSADL, MOSES, and Katz items were stronger for items that had face validity for similar ADL domains; their absolute values ranged from .40 to .77. For example, the correlations between scooping applesauce and the MOSES and MDSADL eating items and Katz feeding were -.63, -.40, and -.59, respectively. The correlations between chair transfer and MDSADL and MOSES toileting items were -.56 and -.57, respectively. The correlations between the put on/take off sweater item and the MDSADL and MOSES dressing items were -.56 and -.58, respectively. All coefficients differed significantly from zero (p < .001).
Item Reduction
Results from the factor analysis of the total 13-item NHPPT score are presented in Table 5. The 10-s standing balance task was excluded from this analysis because of redundancy with 60-s balance scores. Two factors had eigenvalues above 1.0. Transfer, mobility, and dressing items loaded on Factor 1, whereas the other items loaded on Factor 2. Validation of the two-factor structure was conducted by summing NHPPT item scores that had factor loadings greater than 0.5 (Factor 1 score and Factor 2 score) and then correlating the factor scores with item scores on the MDSADL, MOSES, and Katz scales (e.g., eating, grooming, transfer). As expected, correlations were higher between the Factor 1 score and transfer, mobility, and dressing item scores on the other scales and between the Factor 2 score and eating and grooming item scores.
Using objective criteria, we eliminated NHPPT items from each of the factors, with the aim of making the instrument as brief as possible without significantly reducing construct validity or internal consistency. Correlations between the timed sit-to-stand item and the number of sit-to-stands, standing balance, and chair transfer items were greater than .80. Because of the hierarchical testing protocol, participants who were unable to stand or required maximal assistance did not complete the number of sit-to-stands, standing balance, and chair transfer items. For these reasons, those three items were eliminated from the Factor 1 score. The eat applesauce item was eliminated because (a) it was highly correlated with the scoop applesauce item (r = .74), (b) a number of individuals either could not (n = 2) or refused (n = 25) to perform the item, and (c) it had slightly lower correlations than the scooping item with the MDSADL, MOSES, and Katz eating items. Other items were retained in the factor structure on the basis of the combination of items that produced the highest Cronbach's coefficient alpha. This resulted in a six-item battery that included the following measures: Factor 1 (three items): sit-to-stand (timed), put on/take off sweater, and 6-m walk/wheel; and Factor 2 (three items): scooping applesauce, face washing, and telephone use. Cronbach's coefficient alpha values were .83 for the Factor 1 score, .88 for the Factor 2 score, and .92 for the six-item NHPPT total score.
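For readers unfamiliar with the internal-consistency statistic used to select items, the following Python sketch computes Cronbach's coefficient alpha from its standard definition, α = (k / (k − 1)) · (1 − Σ item variances / variance of the total score), where k is the number of items. The three-item data set is hypothetical and is not drawn from the study.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's coefficient alpha for a list of item-score columns.

    `items` is a list of k lists, each holding one item's scores for the
    same participants in the same order.
    """
    k = len(items)
    # Total score per participant: sum across items, row by row.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(column) for column in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical 0-4 scores on three items for five participants.
items = [
    [4, 3, 2, 1, 0],
    [4, 3, 3, 1, 0],
    [3, 3, 2, 1, 1],
]
print(round(cronbach_alpha(items), 2))
```

Selecting the combination of items with the highest alpha, as described above, would amount to evaluating this function over candidate item subsets within each factor.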
Validity of Summary Scales
Construct validity for the factor scores and NHPPT total score was evaluated by calculating Spearman rho correlations with the total scores for the SBT, MDSADL, MOSES, and Katz scales (Table 6). The NHPPT total scores were moderately correlated with the SBT (range: rs = -.33 to -.42, p < .001) and were highly correlated with the functional scales (range: rs = -.72 to -.84, p < .001). The Factor 1 scores were poorly correlated with the SBT (range: rs = -.11 to -.13, ps = .25 to .35), whereas the Factor 2 scores were moderately correlated with the SBT (range: rs = -.42 to -.58, p = .0001), suggesting that Factor 2 tasks were more related to cognitive function.
| Discussion |
|---|
We were able to demonstrate concurrent and construct validity of NHPPT items and summary scores through comparisons with other measures of functional status appropriate for an NH population. The magnitude of the correlations between the NHPPT and ADL scales suggests that the NHPPT may offer information related to function that is not captured by the other ADL scales. Our factor analysis suggests that the NHPPT captures constructs related to gross motor function, balance, flexibility, fine motor coordination, and task sequencing that are required for mobility and ADL activities. As might be expected, the Factor 2 score, which included tasks requiring fine motor coordination and task sequencing, was more highly correlated with the SBT score. Conversely, the Factor 1 score, with tasks requiring gross motor function, flexibility, and balance, was poorly correlated with the SBT. The separate factor scores may enable clinicians to better distinguish between declines in these two domains.
Although the NHPPT captured a greater range in function than previous performance-based instruments, at least one third of the residents at our study sites did not meet our eligibility criteria. Most of these individuals had end-stage dementia such that they were not testable (e.g., they could not follow a one-step command) and required complete assistance in ADLs. Thus, although our instrument is not applicable to all NH residents, most of those who are unable to complete NHPPT items may not be testable using currently available performance-based methods. Further study is needed to identify whether other testing strategies such as environmental cueing might capture some of the population that was not testable in this study.
We measured performance in two ways (assistance level and quantitative measures) with the idea that the information from each type of measurement might better discriminate between individuals and provide greater sensitivity to change. We found, however, that the assistance level scoring did not improve the reliability, validity, or discrimination of the instrument. This may have been because the majority of individuals were either independent or completely dependent for task completion, particularly for the mobility items.
The effect sizes and sample size estimates for the NHPPT scales were better than those for the other functional scales, suggesting that the NHPPT may have greater sensitivity to change. Our ability to draw inferences is limited, however, because we did not have a gold standard to evaluate the clinical significance of the score changes observed. Further research will be necessary to evaluate the NHPPT's utility in documenting early changes in clinical status, the effects of interventions, and the ability of the NHPPT to predict functional decline in the NH setting.
The six-item NHPPT battery has construct validity that is nearly equivalent to the longer test battery, with a similar calculated effect size and a minimal decline in internal consistency. This version of the test takes approximately 15 min to complete and therefore has potential applicability to both clinical and research settings.
In summary, our work demonstrates the reliability and validity of the NHPPT as an objective measure of physical function among NH residents who do not have end-stage dementia. The NHPPT may prove useful as a tool for predicting subsequent functional decline and monitoring the effects of interventions in the NH setting. For example, because the NHPPT includes measures of speed, NH personnel may be able to detect improvements in function more readily than they could from changes in assistance levels, which occur slowly and are harder to achieve. Likewise, a lack of improvement over short periods of time or an abrupt decline in performance may alert staff that intervention strategies need to be instituted or modified. Future research and clinical applications will be necessary to clarify the utility of the NHPPT.
| Acknowledgments |
|---|
We recognize Dr. David Reuben, Dr. John Schnelle, and Dr. Martha Storandt for their suggestions and constructive comments during the course of this research and for review of versions of this article. We also acknowledge the technical assistance of Joan Hirst, Amy Davis, and Alexandra Georges.
| Footnotes |
|---|
Received for publication November 17, 2000. Accepted for publication June 25, 2001.