

DAPA Measurement Toolkit

 

Harmonisation


Introduction

This section outlines two case studies illustrating some of the harmonisation concepts described on the general data harmonisation page. The first example comes from the InterConnect consortium, which has used existing data from different study cohorts across the world. The second example is a simulation of harmonisation using validation data. Further case studies are available on the corresponding diet (case study 1) and anthropometry (case study 4) harmonisation pages.

The InterConnect consortium aimed to investigate the association between physical activity during pregnancy and neonatal anthropometric outcomes using a federated meta-analysis approach. The research question required each participating study to provide meta-data specifically on physical activity. For this exposure, information was collected on:

  • The different dimensions of physical activity assessed
  • Duration, frequency, and intensity of activities
  • Type of work and employment status
  • Information on commuting to and from work

Table P.4.1 summarises the physical activity domains assessed in the participating studies. Only three studies had assessed household physical activity, and four had asked about physical activity relating to mode of travel. In contrast, all studies had assessed leisure-time physical activity, and most had assessed work physical activity or at least had information on occupation type.


Table P.4.1 Physical activity domains that were assessed in the participating studies.
Physical activity domain Study 1 Study 2 Study 3 Study 4 Study 5 Study 6 Study 7 Study 8
Leisure-time Y Y Y Y Y Y Y Y
Work (Y) Y Y Y Y Y Y
Household Y Y Y
Travel Y Y Y Y
Abbreviations: Y indicates the domain was assessed; (Y) indicates the domain was partially assessed.

After discussion, it was decided to focus on harmonising leisure-time physical activity, since all studies had assessed this domain. Leisure-time physical activity is also important from a public health perspective, as it is potentially more amenable to change than other physical activity domains. Within this domain, the following primary exposure target variables were defined (note: MET stands for metabolic equivalent of task):

  • Duration of leisure-time physical activity (hours per week)
  • Leisure-time physical activity energy expenditure (MET.hours per week)
  • Duration of moderate-vigorous physical activity (hours per week)
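Stated generically, the three target variables combine, for each activity i, its weekly duration d_i (hours) and its assigned intensity MET_i. The formulas below are a sketch. They assume the ≥3.0 MET threshold used in the creation rules of Tables P.4.3 and P.4.4 defines moderate-vigorous activity. Individual studies' rules can depart from it where no MET value is available, as with Study A's "other exercises".

$$
\begin{aligned}
\text{Duration of LTPA} &= \sum_i d_i && \text{(hours/week)}\\
\text{Duration of MVPA} &= \sum_i d_i \,\mathbf{1}\!\left[\text{MET}_i \ge 3.0\right] && \text{(hours/week)}\\
\text{EE of LTPA} &= \sum_i d_i \times \text{MET}_i && \text{(MET.hours/week)}
\end{aligned}
$$

For example, 2 h/w of brisk walking at 3.8 METs adds 2 h to both durations and 2 × 3.8 = 7.6 MET.hours/week to energy expenditure.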

Study-specific meta-data were reviewed in detail to establish which target variables each study could create. Where necessary, further clarification of the methods used was sought from the participating studies. Table P.4.2 shows the harmonisation potential across studies in the physical activity research exemplar.


Table P.4.2 Harmonisation potential across participating studies.
Target exposure variable Study 1 Study 2 Study 3 Study 4 Study 5 Study 6 Study 7 Study 8
Duration of LTPA Y Y Y Y Y Y Y N
Duration of MVPA Y Y Y Y Y Y (Y) Y
EE of LTPA Y Y Y Y Y Y (Y) (Y)
Abbreviations: Leisure-time physical activity (LTPA); moderate-to-vigorous physical activity (MVPA); energy expenditure (EE), expressed in MET.hours per week (MET: metabolic equivalent of task); Y = can be created; (Y) = can be created, but with more assumptions and less precision; N = cannot be created.

Having defined the exposure target variables and assessed harmonisation potential, the next task was to create processing rules to transform each study's variables into the common target-variable format. These processing rules were then explicitly agreed with each participating study. Tables P.4.3 and P.4.4 show examples of the available data from two studies, together with the algorithms (creation rules) developed to derive each of the three target variables above.

Table P.4.3 Pre-existing data and algorithms used to derive three target variables for study A.
Data available Target variable Units Creation rule
Q1. How much do you do the following at present?
• Jogging
• Aerobic
• Ante-natal exercises
• Keep fit exercises
• Yoga
• Squash
• Tennis/badminton
• Swimming
• Brisk walking
• Weight training
• Cycling
• Other exercises

Q1 units: >7h/w, 2-6h/w, <1h/w, never

Q2. Nowadays, at least once a week do you engage in any regular activity like brisk walking, gardening, housework, jogging, cycling, etc. long enough to work up a sweat?
 
Q2 units: hours/week
Target variable 1: Duration of leisure-time physical activity
Units: hours per week
Creation rule: Assign 7 h if >7 h/w, 4 h if 2-6 h/w, 0.5 h if <1 h/w and 0 h if never. Sum the hours/w for all activities.

Target variable 2: Duration of moderate-vigorous physical activity
Units: hours per week
Creation rule: Sum the hours/w for jogging, aerobic, keep fit, squash, tennis/badminton, swimming, brisk walking, weight training, cycling and other exercise.

Target variable 3: Leisure-time physical activity energy expenditure
Units: MET.hours per week
Creation rule: Assign intensity (METs): jogging=7; aerobic=6.5; ante-natal classes/yoga=2.5; keep fit exercises=5.5; squash=12; tennis/badminton=5.7; swimming=6; brisk walking=3.8; weight training=3; cycling=6.
For target variable 1, ranges from categorical data were recoded to single values; these were then summed.
For target variable 2, the duration data from target variable 1 were cross-checked with MET intensity data from the Compendium of Physical Activities. The duration of an activity was included in the total if its corresponding MET value was ≥3.0.
For target variable 3, the duration data from target variable 1 were combined with MET intensity data from the Compendium of Physical Activities by multiplication to derive the estimated energy cost of each activity. These were then summed.
Abbreviations: LTPA: leisure-time physical activity; MET: metabolic equivalent of task.
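The Study A creation rules can be expressed as a short Python function. This is a minimal sketch, not the consortium's actual processing code. The activity keys, the response labels and the function name `study_a_targets` are hypothetical. Table P.4.3 assigns no MET value to "other exercises", so the sketch counts that activity towards both durations but leaves it out of the energy expenditure sum.

```python
# Hypothetical sketch of the Study A creation rules in Table P.4.3.

# Q1 frequency categories recoded to single values (hours per week).
CATEGORY_HOURS = {">7h/w": 7.0, "2-6h/w": 4.0, "<1h/w": 0.5, "never": 0.0}

# MET intensities assigned for target variable 3.
# "other exercises" has no assigned MET value, so it is absent here.
MET = {
    "jogging": 7.0, "aerobic": 6.5, "ante-natal exercises": 2.5,
    "yoga": 2.5, "keep fit exercises": 5.5, "squash": 12.0,
    "tennis/badminton": 5.7, "swimming": 6.0, "brisk walking": 3.8,
    "weight training": 3.0, "cycling": 6.0,
}

# Activities summed for target variable 2: everything except ante-natal
# exercises and yoga, whose 2.5 METs fall below the 3.0 MET threshold.
MVPA_ACTIVITIES = {
    "jogging", "aerobic", "keep fit exercises", "squash", "tennis/badminton",
    "swimming", "brisk walking", "weight training", "cycling",
    "other exercises",
}


def study_a_targets(responses: dict[str, str]) -> dict[str, float]:
    """Map {activity: Q1 category} to the three target variables."""
    hours = {activity: CATEGORY_HOURS[cat] for activity, cat in responses.items()}
    return {
        # Target variable 1: hours/week summed over all activities.
        "ltpa_hours": sum(hours.values()),
        # Target variable 2: hours/week summed over moderate-vigorous activities.
        "mvpa_hours": sum(h for a, h in hours.items() if a in MVPA_ACTIVITIES),
        # Target variable 3: duration x MET, summed (MET.hours/week).
        "ltpa_met_hours": sum(h * MET[a] for a, h in hours.items() if a in MET),
    }


print(study_a_targets({
    "jogging": "2-6h/w", "yoga": "<1h/w",
    "other exercises": "<1h/w", "cycling": "never",
}))
# {'ltpa_hours': 5.0, 'mvpa_hours': 4.5, 'ltpa_met_hours': 29.25}
```

Encoding the MVPA list explicitly, rather than deriving it from the MET table, keeps "other exercises" in the duration total as the creation rule specifies.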

Table P.4.4 Pre-existing data and algorithms used to derive three target variables for study B.
Data available Target variable Units Creation rule
Q1. In your spare time did you:

A. Did you take walks for fun in the past week?

B. Did you ride a bicycle in the past week?

C. Did you play sports in the past week? (for example: tennis, handball, gymnastics, fitness, skating, and swimming)

D. Did you do any other physical exercise in your spare time in the past week, for example, working in the garden and doing odd jobs around the house (do not include household activities)?

For each question: At what pace do you usually do this?

• relaxed pace
• average pace
• brisk pace

Q1 units: minutes per week
Target variable 1: Duration of leisure-time physical activity
Units: hours per week
Creation rule: Using Q1 A-C, convert minutes to hours, then add up the hours of all activities (at any pace).

Target variable 2: Duration of moderate-vigorous physical activity
Units: hours per week
Creation rule: Using Q1 A-C, sum the hours/w for walking at an average or brisk pace, riding a bicycle at any pace and playing sport at any pace.

Target variable 3: Leisure-time physical activity energy expenditure
Units: MET.hours per week
Creation rule: Assign intensity (METs): walk relaxed pace=2.5; walk average pace=3.3; walk brisk pace=3.8; cycle relaxed pace=4; cycle average pace=6; cycle brisk pace=8; sport relaxed pace=3.8; sport average pace=5.5; sport brisk pace=7.
For target variable 1, minutes of all activities were converted to hours; these were then summed.
For target variable 2, the duration data from target variable 1 were cross-checked with MET intensity data from the Compendium of Physical Activities. The duration of an activity was included in the total if its corresponding MET value was ≥3.0. Responses to Q1 D were not included, as the intensity of that activity could not be established.
For target variable 3, the duration data from target variable 1 were combined with MET intensity data from the Compendium of Physical Activities by multiplication to derive the estimated energy cost of each activity. These were then summed.
Abbreviations: LTPA: leisure-time physical activity; MET: metabolic equivalent of task.
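For Study B the pace-specific MET values do the work. Applying the ≥3.0 MET threshold to them reproduces the explicit MVPA rule: relaxed walking (2.5 METs) drops out, while every other item and pace is included. The minimal Python sketch below uses hypothetical item labels and a hypothetical function name `study_b_targets`. Q1 D is left out because no intensity can be assigned to it.

```python
# Hypothetical sketch of the Study B creation rules in Table P.4.4.

# MET intensities by (Q1 item, reported pace), as assigned for target variable 3.
MET_BY_PACE = {
    ("walk", "relaxed"): 2.5, ("walk", "average"): 3.3, ("walk", "brisk"): 3.8,
    ("cycle", "relaxed"): 4.0, ("cycle", "average"): 6.0, ("cycle", "brisk"): 8.0,
    ("sport", "relaxed"): 3.8, ("sport", "average"): 5.5, ("sport", "brisk"): 7.0,
}
MVPA_MET_THRESHOLD = 3.0  # activities at >= 3.0 METs count as moderate-vigorous


def study_b_targets(items: list[tuple[str, str, float]]) -> dict[str, float]:
    """items: (activity, pace, minutes per week) for Q1 A-C; Q1 D is excluded."""
    ltpa = mvpa = met_hours = 0.0
    for activity, pace, minutes in items:
        hours = minutes / 60  # convert minutes/week to hours/week
        met = MET_BY_PACE[(activity, pace)]
        ltpa += hours  # target variable 1: any pace counts
        if met >= MVPA_MET_THRESHOLD:
            mvpa += hours  # target variable 2: relaxed walking drops out
        met_hours += hours * met  # target variable 3
    return {"ltpa_hours": ltpa, "mvpa_hours": mvpa, "ltpa_met_hours": met_hours}


print(study_b_targets([
    ("walk", "relaxed", 120), ("cycle", "average", 90), ("sport", "brisk", 60),
]))
# {'ltpa_hours': 4.5, 'mvpa_hours': 2.5, 'ltpa_met_hours': 21.0}
```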

The second example simulates harmonisation using validation data. The short EPIC Physical Activity Questionnaire (EPIC-PAQ) categorises individuals using a four-level “Cambridge index” derived from participant responses to two items (see Figure P.4.5). The index has been validated in 1,941 healthy individuals from ten European countries against a criterion of individually calibrated combined heart-rate and movement sensing [2]. The two item responses can also be used to derive a binary index (inactive or active) or a 16-level index, obtained by cross-tabulating the four levels of occupational physical activity with the four levels of leisure-time physical activity shown in Figure P.4.5. Three indices were therefore derived:

  • Binary (inactive or active)
  • Cambridge index (inactive, moderately inactive, moderately active, active)
  • 16-level index
Figure P.4.5 Derivation of the four-level “Cambridge index”.
Source: [2].

Mapping

The validation data were used to map each of the three indices to corresponding individual estimates of physical activity energy expenditure (PAEE) in kJ/kg/day, as shown in Figure P.4.6. This provided estimates of PAEE for each index category, which could then be assigned to responses from individuals who had not been assessed using the criterion.
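The mapping step described above can be sketched in Python. This is a minimal illustration under stated assumptions. It assigns each index category the mean criterion PAEE of the validation participants in that category; the text does not say whether a mean, a median or a model-based estimate was used. It also stratifies by sex, as in Figure P.4.6. The function names and toy records are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def build_mapping(validation):
    """validation: iterable of (sex, index_category, criterion PAEE in kJ/kg/day).

    Returns the mean criterion PAEE for each sex-specific index category.
    """
    grouped = defaultdict(list)
    for sex, category, paee in validation:
        grouped[(sex, category)].append(paee)
    return {key: mean(values) for key, values in grouped.items()}


def assign_paee(mapping, sex, category):
    """Give a questionnaire-only participant the mapped PAEE of their category."""
    return mapping[(sex, category)]


# Toy validation records (illustrative numbers, not study results).
validation = [
    ("men", "inactive", 20.0), ("men", "inactive", 30.0),
    ("men", "active", 50.0), ("women", "inactive", 18.0),
]
mapping = build_mapping(validation)
print(assign_paee(mapping, "men", "inactive"))  # 25.0
```

The same functions would be run once per index (binary, Cambridge and 16-level), giving one PAEE target variable per index.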

Figure P.4.6 Use of validation data to map categories from three indices to estimates of physical activity energy expenditure (kJ/kg/day) assessed by criterion method for men (top) and women (bottom).

Meta-analysis using mapped data

As proof of concept, Cox regression was used to examine the association between PAEE and type 2 diabetes in each InterAct cohort [1], using PAEE mapped from each of the three indices separately. The cohort-specific estimates were then meta-analysed across cohorts, separately for men and women (see Figure P.4.7).
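The pooling step can be sketched as a fixed-effect, inverse-variance meta-analysis of cohort-specific log hazard ratios. This is an assumption for illustration: the text does not say whether fixed- or random-effects pooling was used. The function name and input values are hypothetical. The sketch back-calculates each cohort's standard error from its 95% confidence interval on the log scale.

```python
import math

Z_95 = 1.96  # standard normal quantile for a two-sided 95% interval


def pool_fixed_effect(estimates):
    """estimates: (hazard ratio, lower 95% CI, upper 95% CI) per cohort.

    Returns the pooled hazard ratio with its 95% confidence interval.
    """
    weighted_sum = total_weight = 0.0
    for hr, lower, upper in estimates:
        # Approximate SE of log(HR) from the CI width on the log scale.
        se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
        weight = 1.0 / se**2  # inverse-variance weight
        weighted_sum += weight * math.log(hr)
        total_weight += weight
    pooled_log_hr = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (
        math.exp(pooled_log_hr),
        math.exp(pooled_log_hr - Z_95 * pooled_se),
        math.exp(pooled_log_hr + Z_95 * pooled_se),
    )


# Illustrative cohort estimates (not study results).
print(pool_fixed_effect([(0.80, 0.70, 0.91), (0.85, 0.72, 1.00), (0.78, 0.62, 0.98)]))
```

Running the pooling once for the PAEE mapped from each index, for men and women separately, gives estimates that can be compared across indices.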

Figure P.4.7 Hazard ratios of type 2 diabetes by physical activity energy expenditure exposure mapped from three questionnaire indices in men (top) and women (bottom).

Conclusions

  • Meta-analysed hazard ratios were consistent across the PAEE target variables derived from the three questionnaire indices, indicating that mapping using validation data achieved inferential equivalence.
  • This harmonisation method could be applied to any study for which validation data for the assessment method are available.

Limitations/further work

  • Measurement error correction not performed
  • Uncertainty propagation not performed
  • A method is needed for indirectly estimating uncertainty in the mapping procedure
  • Weighting of the meta-analysis by the validity of the primary variables
  • Separate modelling of the regression dilution ratio using repeated-measures sub-studies

References

  1. Langenberg C, Sharp S, Forouhi NG, Franks PW, Schulze MB, Kerrison N, et al. Design and cohort description of the InterAct Project: an examination of the interaction of genetic and lifestyle factors on the incidence of type 2 diabetes in the EPIC Study. Diabetologia. 2011;54(9):2272-82.
  2. Peters T, Brage S, Westgate K, Franks PW, Gradmark A, Tormo Diaz MJ, et al. Validity of a short questionnaire to assess physical activity in 10 European countries. Eur J Epidemiol. 2012;27(1):15-25.