Table 1 Examples of rules for recognition of study design, population, exposure, outcome, covariate and effect size in epidemiological abstracts

From: Mining characteristics of epidemiological studies from Medline: a case study in obesity

Columns: Characteristic (number of rules); Examples; Identified span (in bold)

Study design (16 rules)

Rule: [@st a(types)]
Example: Methods: This was a cross-sectional study of 214 overweight/obese …
Identified span: cross-sectional study

Population (119 rules)

Rule: a(totals) re(‘(of|on|in)’) [@stats a(clusters)]
Example: Sibling study in a prospective cohort of 208,866 men from …
Identified span: cohort of 208,866 men

Rule: @multiple re(‘with|in|on’)? [a(clusters) re(‘with|without’) @multiple]
Example: bone mineral density in patients with type 2 diabetes
Identified span: bone mineral density in patients with type 2 diabetes

Exposure (134 rules)

Rule: a(relations) eq(‘between’) [@multiple] eq(‘and’) @multiple
Example: … and analyze the association between body mass index and blood pressure in …
Identified span: association between body mass index and blood pressure

Rule: [@multiple] a(be) a(related) a(with) eq(‘onset’)? eq(‘of’)?
Example: Short sleep duration is associated with onset of obesity
Identified span: Short sleep duration is associated with onset of

Outcome (100 rules)

Rule: @factors eq(‘of’) [@multiple]
Example: Cardiovascular and disease related predictors of depression
Identified span: predictors of depression

Rule: @multiple a(be) a(adverbs) a(related) a(with) [@multiple]
Example: Conclusions coffee intake is inversely associated with t2dm in Chinese.
Identified span: coffee intake is inversely associated with t2dm

Covariate (28 rules)

Rule: a(adj) eq(‘for’) [@multiple]
Example: … after adjusting for age, smoking status, and clinical history of diabetes mellitus.
Identified span: adjusting for age, smoking status, and clinical history of diabetes mellitus.

Rule: eq(‘including’) [@multiple] eq(‘as’) @synonyms
Example: … including visceral adipose tissue (vat) and subcutaneous adipose tissue (sat) as covariates.
Identified span: including visceral adipose tissue (vat) and subcutaneous adipose tissue (sat) as covariates

Effect size (15 rules)

Rule: @multiple [a(preva) a(be) @perce]
Example: Hernia prevalence was 32.4%
Identified span: Hernia prevalence was 32.4%

Rule: @multiple @or @ci
Example: … more likely to have elevated blood pressure (or = 9.05, 95% ci: 1.44, 56.83)
Identified span: elevated blood pressure (or = 9.05, 95% ci: 1.44, 56.83)

  1. The rule components in square brackets are the extracted spans that denote the key characteristic; the rest of the rule (if any) specifies the context. The rules use explicit matching of spans (e.g., eq(‘onset’)), regular expressions (re) for matching specific verbs or prepositions (e.g., re(‘(of|on|in)’)), and vocabularies that contain either single words (e.g., a(types), which matches words indicating that a study was conducted, such as study, analysis, review) or multi-word terms (e.g., @st, a vocabulary of epidemiological study designs such as case control). totals contains words that suggest the participant population; stats is a dictionary of numbers and words that express numeric values (e.g., one hundred); clusters includes the different ways in which a population sample can be described (e.g., men, patients, individuals); multiple contains single or multi-word biomedical concepts (e.g., depression, type 2 diabetes); relations is a dictionary of single words that describe an association between concepts (e.g., relationship, link, association); factors contains single or multi-word terms that describe risk factors (e.g., risk factors, predictors); or is a dictionary of noun phrases in which the effect size “odds ratio” can be expressed, including the ways in which its numeric value is presented (e.g., odds ratio = 1.34, or = 2.56); ci follows a similar pattern for the confidence interval and its assigned numeric value (e.g., 95% ci = 0.91, 95% ci: 4.36, 5.48).
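To make the notation described above more concrete, the following is a minimal Python sketch of how three of the table's rules could be emulated with ordinary regular expressions. It is not the rule engine used in the study. The helper names (eq, rx, vocab, capture, compile_rule, extract), the tiny vocabularies, and the digit-only NUMBER pattern standing in for the stats dictionary are all hypothetical and chosen for illustration only.

```python
import re

# Hypothetical miniature vocabularies; the dictionaries used in the study
# are not reproduced here and are certainly much larger.
VOCAB = {
    "totals": ["cohort", "sample"],
    "clusters": ["men", "patients", "individuals"],
    "relations": ["association", "relationship", "link"],
    "factors": ["risk factors", "predictors"],
    "multiple": ["body mass index", "blood pressure", "depression", "type 2 diabetes"],
}
NUMBER = r"\d[\d,]*"  # crude stand-in for the stats dictionary (digits only)

def eq(word):
    """Explicit match of a literal span, like eq('between')."""
    return re.escape(word)

def rx(pattern):
    """Raw regular expression component, like re('(of|on|in)')."""
    return "(?:" + pattern + ")"

def vocab(name):
    """Match any term of a vocabulary, trying longer terms first."""
    terms = sorted(VOCAB[name], key=len, reverse=True)
    return "(?:" + "|".join(re.escape(t) for t in terms) + ")"

def capture(*parts):
    """Bracketed part of a rule: the span to extract."""
    return "(?P<span>" + r"\s+".join(parts) + ")"

def compile_rule(*parts):
    """Join components with whitespace and require word boundaries at both ends."""
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)

def extract(rule, text):
    """Return the bracketed span of the first match, or None."""
    m = rule.search(text)
    return m.group("span") if m else None

# a(totals) re('(of|on|in)') [@stats a(clusters)]
population_rule = compile_rule(
    vocab("totals"), rx("of|on|in"), capture(rx(NUMBER), vocab("clusters"))
)
# a(relations) eq('between') [@multiple] eq('and') @multiple
exposure_rule = compile_rule(
    vocab("relations"), eq("between"), capture(vocab("multiple")),
    eq("and"), vocab("multiple"),
)
# @factors eq('of') [@multiple]
outcome_rule = compile_rule(vocab("factors"), eq("of"), capture(vocab("multiple")))

print(extract(population_rule, "Sibling study in a prospective cohort of 208,866 men from …"))
# 208,866 men
print(extract(exposure_rule, "analyze the association between body mass index and blood pressure in …"))
# body mass index
print(extract(outcome_rule, "Cardiovascular and disease related predictors of depression"))
# depression
```

Compiling each rule to a single regular expression keeps the sketch short, but it skips optional components (the trailing ? in some rules) and any token-level processing, which a real implementation would need to handle.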