Measurement and statistics in exercise science
This is an excerpt from Introduction to Exercise Science With HKPropel Access by Duane V. Knudson.
By Matthew Mahar
Measurement is fundamental to professional practice and research in exercise science. Health professionals, including physical therapists, athletic trainers, coaches, and personal fitness trainers, use measurements to diagnose and monitor their clients and to prescribe exercise and rehabilitation programs. Without accurate and consistent measurement of the variables important in exercise science, we cannot place any confidence in the results or use them to guide our exercise prescriptions. This holds true for all subdisciplines within exercise science. The measurement and evaluation course that you will take will cover the validity and reliability of measurement. These are indispensable constructs; gaining a basic understanding of them now will help you start thinking like an exercise scientist.
Most, although not all, of the subdisciplines and research within exercise science have a quantitative focus, with an interest in measuring variables by assigning numbers to observations according to rules. For example, a researcher might be interested in whether time spent in vigorous physical activity is related to cognitive function. To examine this question, both vigorous physical activity and cognitive function must be measured. Vigorous physical activity might be quantified as the number of minutes spent per day in vigorous physical activity using an instrument known as an accelerometer. Cognitive function might be measured with software that quantifies attention during performance of a task on a computer. To answer the question about whether the variables of vigorous physical activity and cognitive function are related, researchers would apply a selected statistical analysis to the data they measured. Exercise science has a long, rich history of research on developing accurate tests, measurements, and statistical analyses of health and performance variables.
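As one illustration of "a selected statistical analysis," the sketch below computes a Pearson correlation coefficient between two measured variables. It is a minimal example under stated assumptions: the function name `pearson_r` and all of the data values are hypothetical and invented for illustration, and a real study would choose its analysis based on its design and data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable has no variance")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical, made-up data for eight participants (not from any study):
# minutes per day of vigorous activity (e.g., from an accelerometer)
vigorous_minutes = [5, 12, 20, 8, 30, 15, 25, 10]
# attention score from a computer-based task (arbitrary units)
attention_score = [48, 55, 61, 50, 70, 58, 66, 52]

r = pearson_r(vigorous_minutes, attention_score)
print(f"r = {r:.2f}")
```

A value of r near +1 or -1 indicates a strong linear relationship and a value near 0 indicates a weak one. A correlation alone does not show that one variable causes the other.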
In this chapter, you will learn historical highlights and benefits of measurement research in exercise science. Next, you will be introduced to essential concepts that help exercise science professionals document the quality of measurements and appropriateness of the interpretation of measurements. Basic descriptive statistics and statistical analyses commonly used to examine relationships among variables or differences among groups will also be presented. This chapter will not cover concepts used in analytical and qualitative research. These fields of exercise science use logical argument or systematic, qualitative judgments about observations in their research.
Benefits and History in Exercise Science
One of the oldest and most robust subdisciplines of exercise science is measurement, sometimes referred to as measurement and evaluation. Since the 1920s, researchers have focused on developing accurate tests and measurements of a wide variety of human movement, health, and performance variables. Many physiological and medical tests commonly used today are based on research and knowledge developed by measurement and evaluation scholars from exercise science. Here are just a few measurement innovations and how they are used in research and professional practice.
Measurement and evaluation researchers have made critically important contributions to the field of youth fitness assessment. Kraus and Hirschland (1954) published a study showing that European children scored higher on the Kraus-Weber test of minimum muscular fitness than American children. Although the Kraus-Weber test was not a true test of youth fitness, it was the motivating force behind the development of the first national youth fitness test in the United States, the American Association for Health, Physical Education, and Recreation (AAHPER) Youth Fitness Test, first published in 1958. This test included motor fitness items that measured running, jumping, and throwing; these items encouraged athletic excellence.
A major shift in youth fitness testing away from an athletic emphasis to a health-related emphasis occurred in the 1970s. A group of measurement specialists and exercise physiologists met to discuss the growing medical evidence that supported the role of physical activity and fitness in health. This meeting led to the development of the term health-related fitness and the publication of an important paper that influenced the trajectory of youth fitness testing (Jackson et al., 1976). In the United States, a youth fitness test battery called FitnessGram continues to provide the most scientifically validated tests of health-related youth fitness. Health-related fitness components have included aerobic capacity, body composition, muscular strength and endurance, and flexibility. Research evidence that muscle power is associated with bone health, that muscular strength is a good predictor of health and function, and that core endurance is linked to balance and spinal stability led the FitnessGram Advisory Board to include measures of lower-body power via the vertical jump (Mahar et al., 2022), maximal strength via a handgrip test (Saint-Maurice et al., 2018), and core endurance via the plank test (Laurson et al., 2022) in their youth fitness testing battery. As new research evidence accumulates to suggest changes to current practice, exercise scientists need to be willing to embrace new ways of doing things so their practices and decisions remain evidence based.
Regression analysis, a statistical concept addressed later in this chapter, has been used by measurement researchers to produce findings that have greatly influenced research and practice in exercise science. For example, Jackson and Pollock (1978) and Jackson and colleagues (1980) developed generalized regression equations to estimate percentage of body fat from the sum of skinfolds in men and women, respectively. A skinfold is a pinch of skin and the underlying subcutaneous fat and is measured with calipers. These measurement procedures have been used in thousands of subsequent studies to estimate percent fat and by practitioners throughout the world to provide feasible estimates of percent fat for their athletes, clients, and patients. Over time, other field-based measures of body composition, such as bioelectrical impedance analysis, have been developed to ease measurement for practitioners. These approaches to body composition estimation also use regression analyses to obtain estimates of percent fat and to quantify the accuracy of those estimates. Understanding the science behind the tests and measurements you choose will inform your decisions and communications to your clients and is part of what will make you a quality exercise science professional.
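To make the idea concrete, the sketch below fits a simple straight-line regression that predicts percent fat from the sum of skinfolds and then computes the standard error of estimate, one common way to quantify the accuracy of such predictions. This is not the published Jackson and Pollock equations; the function names and every data value are hypothetical and chosen only for illustration.

```python
import math


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x has no variance, so a slope cannot be estimated")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def standard_error_of_estimate(x, y, intercept, slope):
    """Typical size of a prediction error: sqrt(SSE / (n - 2))."""
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    return math.sqrt(sse / (len(x) - 2))


# Hypothetical, made-up data (not from any published study):
sum_skinfolds_mm = [30, 45, 60, 75, 90, 105]             # sum of skinfolds
criterion_percent_fat = [8.0, 12.5, 16.0, 20.5, 23.5, 28.0]  # laboratory criterion

b0, b1 = fit_line(sum_skinfolds_mm, criterion_percent_fat)
see = standard_error_of_estimate(sum_skinfolds_mm, criterion_percent_fat, b0, b1)

# Estimate percent fat for a new client with a 70 mm sum of skinfolds
estimate = b0 + b1 * 70
print(f"estimated percent fat = {estimate:.1f} (SEE = {see:.2f})")
```

The smaller the standard error of estimate, the closer the predicted values tend to be to the criterion measure in the sample used to build the equation.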
Measurement and evaluation researchers also have been at the forefront of understanding, developing, and using criterion-referenced cut points to interpret health status. After a test is administered and a score is obtained (i.e., measurement), we then need to make a value judgment about the measurement. That value judgment is an evaluation. To evaluate a score—for example, determining whether the value is in the healthy or unhealthy range—the score needs to be compared to something. When we compare the measurement with a predetermined standard, we are using criterion-referenced standards. Setting criterion-referenced standards for health status is a complex measurement issue that requires collaboration among measurement experts and content experts, like exercise physiologists, athletic trainers, and physical therapists, emphasizing the need to address measurement issues from a multidisciplinary perspective. For youth fitness, scientifically based criterion-referenced standards for aerobic capacity, body composition, muscular strength, muscular endurance, and muscular power have been developed using sophisticated statistical techniques, like lambda-mu-sigma (LMS) growth curves and receiver operating characteristic curve analysis.
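The sketch below shows, in simplified form, how a receiver operating characteristic style analysis can suggest a cut point: each candidate cut is scored by its sensitivity and specificity, and the cut with the largest Youden's J (sensitivity + specificity - 1) is selected. Youden's J is only one possible selection rule and is an assumption here, as are the function name `best_cut_point` and the made-up data; actual standard setting involves far more data and expert judgment.

```python
def best_cut_point(scores, at_risk):
    """Return (cut, sensitivity, specificity) that maximizes Youden's J.

    Assumes higher scores are healthier, so the test flags a person
    as at risk when score < cut.
    """
    positives = sum(1 for r in at_risk if r)
    negatives = len(at_risk) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one at-risk and one not-at-risk case")
    best = None
    for cut in sorted(set(scores)):
        # True positives: at-risk people correctly flagged by the test
        tp = sum(1 for s, r in zip(scores, at_risk) if r and s < cut)
        # True negatives: not-at-risk people correctly not flagged
        tn = sum(1 for s, r in zip(scores, at_risk) if not r and s >= cut)
        sensitivity = tp / positives
        specificity = tn / negatives
        j = sensitivity + specificity - 1
        if best is None or j > best[0]:
            best = (j, cut, sensitivity, specificity)
    return best[1], best[2], best[3]


# Hypothetical, made-up data: a fitness score and whether each person
# has a health risk factor according to a criterion measure.
scores = [30, 31, 32, 34, 35, 36, 38, 40, 42]
at_risk = [True, True, True, True, False, True, False, False, False]

cut, sens, spec = best_cut_point(scores, at_risk)
print(f"cut point = {cut}, sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```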
With a criterion-referenced framework based on measurements from youth fitness tests, participants are categorized into two or more fitness zones (e.g., Healthy Fitness Zone, Needs Improvement Zone, or Needs Improvement—Health Risk Zone). Setting the appropriate criterion-referenced standards for these zones is an important validity issue. Because validity is about the appropriate use of test scores, a diagnostically accurate criterion-referenced standard helps users of these tests, like physical education teachers or pediatricians, make valid decisions. If, for example, a participant is classified into the Needs Improvement—Health Risk Zone for aerobic capacity, an appropriate interpretation is that they have insufficient aerobic capacity. The exercise science professional might then provide a specific exercise prescription to improve aerobic capacity. Rikli and Jones (2013a, b) published the Senior Fitness Test, a widely used functional fitness test for older adults. They established clinically supportable, criterion-referenced standards for these tests to help test users evaluate the fitness of older adults and to subsequently provide appropriate exercise prescriptions to improve health and fitness of their clients.
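Criterion-referenced evaluation can be sketched as a comparison of a score against predetermined cut points. In the example below, the function name, the short zone labels, and the cut-point values are all hypothetical; they are not actual FitnessGram or Senior Fitness Test standards, which vary by age and sex.

```python
def classify_zone(score, healthy_min, risk_max):
    """Classify a higher-is-better score into one of three fitness zones.

    healthy_min: lowest score that falls in the healthy zone
    risk_max: scores at or below this value fall in the health-risk zone
    """
    if risk_max >= healthy_min:
        raise ValueError("risk_max must be lower than healthy_min")
    if score >= healthy_min:
        return "healthy"
    if score <= risk_max:
        return "health_risk"
    return "needs_improvement"


# Hypothetical cut points for aerobic capacity in ml/kg/min
# (illustrative only; not published standards).
HEALTHY_MIN = 40.0
RISK_MAX = 36.0

for vo2max in (42.0, 38.0, 35.0):
    zone = classify_zone(vo2max, HEALTHY_MIN, RISK_MAX)
    print(f"{vo2max:.1f} ml/kg/min -> {zone}")
```

The measurement is the score itself; the evaluation is the zone the score falls into once it is compared with the standard.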
Professional Issues in Exercise Science
Does It Matter How You Measure Physical Activity?
Physical activity can be measured in a number of ways, including by self-report and with objective measurement devices such as pedometers and accelerometers. Self-reported physical activity can be inaccurate because it is difficult for some people to accurately remember the type, time, and intensity of physical activity they did and because of substantial reporting bias. That is, people may tend to report more activity than they actually performed because they think it is socially desirable to be more active. Objective measurement devices, like wearable physical activity trackers, can assess steps taken and the intensity of movement. In a classic study, Troiano and colleagues (2008) demonstrated that how physical activity is measured matters a great deal. These researchers used accelerometer data to objectively measure physical activity in a large, nationally representative sample from the National Health and Nutrition Examination Survey (NHANES). The accelerometer data showed that fewer than 5% of adults obtained the recommended levels of physical activity. This can be contrasted with self-reported measures of physical activity from NHANES, in which approximately 51% of adults met physical activity recommendations. Although error no doubt exists in every measure of physical activity, such a large discrepancy between self-reported and objectively measured physical activity is likely due mainly to people greatly overestimating their own physical activity when asked via self-report. The lack of confidence in self-reported physical activity may have been one of the factors that led to the great explosion of commercially available wearable activity trackers such as Fitbit, Apple Watch, Garmin Vívoactive, and Polar Grit, among many others.
Troiano, R.P., Berrigan, D., Dodd, K.W., Masse, L.C., Tilert, T., & McDowell, M. (2008). Physical activity in the United States measured by accelerometer. Medicine and Science in Sports and Exercise, 40(1), 181-188. https://doi.org/10.1249/mss.0b013e31815a51b3