Dietary assessment and physical activity measurements toolkit



Validity is the extent to which a measurement instrument assesses the true exposure of interest.  Validity is a different construct from reliability and is more difficult to measure (Wareham & Rennie, 2000).  Validating dietary and physical activity measurements is difficult because the ‘truth’ is never known with absolute certainty.  Validity refers to the accuracy of a measurement: a valid dietary report is one that measures true intake during the study period, and a valid assessment of physical activity accurately measures the physical activity undertaken during the study period.  Poor validity is a result of systematic errors.

Traditionally there are three broad categories of validity:

  • Content validity: the degree to which a test’s content is tied to the instructional domain it intends to measure;
    Example 1
  • Criterion validity: the degree to which a test predicts some criterion;
    Example 2
  • Construct validity: the degree to which a test measures the theoretical construct it intends to measure.
    Example 3
    Example 4

There is an argument that construct validity is the only ‘type’ of validity, as criterion and content validity are estimation strategies for construct validity (Patterson, 2000).
Other definitions

  • Relative validity – an instrument is compared to an instrument of the same kind (there is a potential correlated error).
  • Absolute validity – an instrument is compared to the gold standard measurement.
  • Face validity – the degree to which a questionnaire or other measurement appears to reflect the variable it has been designed to measure.
  • Convergent validity – examined by having several different instruments measure the same construct; a high degree of agreement or concordance between instruments indicates good convergent validity (Macfarlane et al., 2006).
    Example 5

Assessing validity
Methods of assessment are often validated against another method of greater accuracy. In both diet and physical activity measurement, the comparison method should be an objective measure such as biomarkers, doubly labelled water, an accelerometer or a combined motion sensor.  Comparison with a non-objective method is, strictly speaking, a relative validity or calibration study rather than a validation study. To avoid the risk of correlated error, the comparison method should rely on a different means of obtaining data: comparing a 24-hour recall with an estimated food diary carries a risk of correlated under-reporting, and an activity diary and a questionnaire may be subject to similar misreporting by individuals.

In physical activity research, validation of assessment methods (e.g. accelerometry, heart rate monitoring or combined motion sensors) has been undertaken in controlled laboratory conditions.  This provides robust validation data for the activities tested; however, the data may be less robust for other activities.  Using a measure of fitness such as VO2 max as a reference is problematic, as fitness is not necessarily related to physical activity or energy expenditure.  Higher correlations with VO2 max are likely in studies of questionnaires assessing vigorous activity, where there is a link to fitness, than of questionnaires looking at total physical activity (Rennie & Wareham, 1998).  For this reason such studies have also been undertaken in semi-controlled field conditions encompassing a range of activities over a range of intensities.

Assessing absolute validity in dietary assessment is also problematic. To detect changes in usual food intake during a study, observations of actual food intake during, and either before or after, the study period should be compared (Gibson, 2005). This is time consuming and burdensome for both the participants and researchers.
Relative validity, however, can be assessed. The 'test' dietary method is evaluated against a 'reference' method which has a greater degree of demonstrated validity. The following should be taken into account:

  • The reference dietary method used in a validation study must have the same objective and measure dietary intake over the same time period (i.e. current, past or usual intake) as the test method.
  • Usually, the test method should be administered prior to the reference method and the two methods should be spaced apart enough so that completion of the test method does not influence the responses given in the reference method.

Example 6
Example 7

The populations in which validation or calibration studies are undertaken should be considered to ascertain whether the results are likely to be generalisable to the population under study.  This is known as external validity. Age, sex, ethnic origin and socio-economic status may all limit generalisability.

Statistical assessment of validity
There is a consensus that assessment tools used to measure exposures in epidemiology should be validated; the quality of the research is directly related to the quality of the instruments.

Discrepancies in the literature with respect to the statistical methods and recommendations for the validation of questionnaires have been comprehensively reviewed (Schmidt & Steindorf, 2006).  The authors of this paper also performed a systematic review of validations of physical activity questionnaires, which identified 46 papers.  The majority of these papers (89%) used Pearson’s or Spearman’s correlations to measure validity.

Criterion validity has typically been assessed by correlation coefficients in the dietary assessment literature also. This approach is problematic because a correlation coefficient is a measure of association, not agreement: the presence of correlated error will result in high correlation coefficients between two methods that share inherent inaccuracy.
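The problem of correlated error can be illustrated with a small simulation (a sketch with hypothetical data; the variable names, distributions and the shared under-reporting term are all assumptions made for illustration). Two self-report methods that share the same person-specific under-reporting can correlate more highly with each other than either does with true intake:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500

true_intake = rng.normal(2000, 300, n)   # hypothetical true energy intake (kcal/day)
shared_bias = rng.normal(-300, 150, n)   # person-specific under-reporting, common to both methods

# Two self-report methods sharing the same correlated error, plus independent noise
method_a = true_intake + shared_bias + rng.normal(0, 100, n)
method_b = true_intake + shared_bias + rng.normal(0, 100, n)

r_ab = np.corrcoef(method_a, method_b)[0, 1]         # inflated by the shared error
r_a_true = np.corrcoef(method_a, true_intake)[0, 1]  # agreement with the truth

print(f"method A vs method B: r = {r_ab:.2f}")
print(f"method A vs truth:    r = {r_a_true:.2f}")
```

The between-method correlation exceeds the correlation with true intake, even though both methods systematically under-report: a high correlation between two similar instruments is no guarantee of validity.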

Checking the agreement between a measurement and its reference measure graphically should be a mandatory step when investigating validity (Schmidt & Steindorf, 2006).  The limits of agreement proposed by Bland and Altman (1988) permit systematic and random errors to be described separately. Using this method the direction of error can be determined, and heteroscedasticity can be assessed. If the differences are approximately normally distributed and not related to the magnitude of the measures (homoscedasticity), the systematic bias is estimated by the mean of the differences (m) and the random error by the standard deviation of the differences (SD).  If there is heteroscedasticity, the data may be log-transformed.  Bland and Altman define the 95% limits of agreement as m ± 1.96SD.  The interpretation of these limits is that, for a randomly selected subject from the general population, the difference between the two assessments would be expected to lie within the limits of agreement with approximately 95% probability (Schmidt & Steindorf, 2006).

Bland-Altman plots are used by the authors of the following paper to validate a semi-quantitative food-frequency questionnaire used among 2-year-old Norwegian children: Example 8

Valid measurement of diet and physical activity remains a challenge, and invalid measures may yield erroneous or inaccurate data, which has serious implications.  The hope is that technological advances in both fields will yield methods with increased validity.

