KneeTex: an ontology–driven system for information extraction from MRI reports
© Spasić et al. 2015
Received: 27 March 2015
Accepted: 21 August 2015
Published: 7 September 2015
In the realm of knee pathology, magnetic resonance imaging (MRI) has the advantage of visualising all structures within the knee joint, which makes it a valuable tool for increasing diagnostic accuracy and planning surgical treatments. Therefore, clinical narratives found in MRI reports convey valuable diagnostic information. A range of studies have proven the feasibility of natural language processing for information extraction from clinical narratives. However, no study has focused specifically on MRI reports in relation to knee pathology, possibly due to the complexity of knee anatomy and the wide range of conditions that may be associated with different anatomical entities. In this paper we describe KneeTex, an information extraction system that operates in this domain.
As an ontology–driven information extraction system, KneeTex makes active use of an ontology to strongly guide and constrain text analysis. We used automatic term recognition to facilitate the development of a domain–specific ontology with sufficient detail and coverage for text mining applications. In combination with the ontology, high regularity of the sublanguage used in knee MRI reports allowed us to model its processing by a set of sophisticated lexico–semantic rules with minimal syntactic analysis. The main processing steps involve named entity recognition combined with coordination, enumeration, ambiguity and co–reference resolution, followed by text segmentation. Ontology–based semantic typing is then used to drive the template filling process.
We adopted an existing ontology, TRAK (Taxonomy for RehAbilitation of Knee conditions), for use within KneeTex. The original TRAK ontology was expanded from 1,292 concepts, 1,720 synonyms and 518 relationship instances to 1,621 concepts, 2,550 synonyms and 560 relationship instances. This provided KneeTex with a very fine–grained lexico–semantic knowledge base, which is highly attuned to the given sublanguage. Information extraction results were evaluated on a test set of 100 MRI reports. The gold standard consisted of 1,259 filled template records with the following slots: finding, finding qualifier, negation, certainty, anatomy and anatomy qualifier. KneeTex extracted information with precision of 98.00 %, recall of 97.63 % and F–measure of 97.81 %, values that are in line with human–like performance.
Magnetic resonance imaging (MRI) is a technique used to visualise internal body structure by recording radio waves emitted by the tissues in the presence of a strong magnetic field. MRI differentiates between soft tissues better than X-ray imaging does; X-ray imaging uses high frequency electromagnetic waves that pass through soft parts of the human body to create a radiograph, an image resulting from the different absorption rates of different tissues. MRI can also produce three dimensional images. When it comes to diagnosing knee pathology, MRI has the advantage of visualising all structures within the knee joint, i.e. both soft tissue and bone. When used in conjunction with medical history and physical examination, this makes MRI a valuable tool for increasing diagnostic accuracy and planning surgical treatments [1–5]. For example, meniscal tears are a relatively common knee injury, having a prevalence of 22.4 % among all soft tissue injuries seen in a trauma department. The accuracy of diagnosing meniscal tears using individual physical tests is reported to be 74 %, but increases to 96 % when MRI is used. When MRI results are combined with clinical assessments (namely, locking, giving way and McMurray’s test), their diagnostic performance is, respectively, as follows: accuracy – 88.3 %, 89.9 % and 89.4 %; sensitivity – 95.7 %, 97.4 % and 97.4 %; specificity – 74.2 %, 75.8 % and 74.2 %; positive predictive value – 87.5 %, 88.4 % and 87.7 %; and negative predictive value – 90.2 %, 94.0 % and 93.9 %. More recently, the importance of MRI in diagnosis and treatment planning for cases of symptomatic early knee osteoarthritis has been highlighted. If an X–ray image of the knee is found to be normal, but clinical examination produces specific findings, then an MRI scan can be performed to establish a more accurate diagnosis.
MRI can then be used to identify an appropriate surgical or nonsurgical treatment target and to decrease the need for costly and invasive diagnostic arthroscopy [1, 7].
In clinical practice, radiology images (e.g. produced by X–ray or MRI) are usually accompanied by imaging reports (or radiology reports), which serve the purpose of conveying a specialist interpretation of images and relating it to the patient’s signs and symptoms in order to suggest a diagnosis. This information is then used by clinicians to support decision making on appropriate treatment.
In terms of research, MRI evidence is often used to support epidemiologic studies of knee pathology [9, 10]. In particular, MRI findings are indispensable features of longitudinal studies of knee osteoarthritis [11, 12], where lesions detected by MRI were found to precede the onset of clinical symptoms. However, many published research findings are probably false due to sampling bias and low statistical power. Small sample size is often the cause underlying these two concerns, although the relationship is neither simple nor proportional. Unfortunately, sample size is typically subject to funding and personnel constraints. Given the complexity and cost of manual interpretation of MRI evidence, it is, therefore, not surprising that the size of such epidemiologic studies has been limited to hundreds (e.g. 514, 710) or even dozens of cases (e.g. 20, 36). If interpretation of evidence described in MRI reports could be automated, it would overcome the size limitation that the need to sort through the evidence manually imposes on retrospective cohort studies.
We recently provided a critical overview of the current state of the art for natural language processing (NLP) related to cancer, where clinical narratives such as those found in pathology and radiology reports convey valuable diagnostic information that is predictive of the prognosis and biological behaviour of a disease process. The review highlighted the fact that a range of studies have proven the feasibility of NLP for extracting structured information from free text reports (e.g. [17–21]). For simpler information extraction tasks, human–like performance of automated systems can be expected. For example, when evaluated for the extraction of American College of Radiology utilisation review codes from radiology reports, M+, a system for medical text analysis, achieved recall, precision and specificity of 87, 85 and 98 % respectively. These results were comparable to the average recall, precision and specificity recorded by physicians, namely 88, 86 and 98 %. Comparably good results were achieved for more complex tasks such as translating radiology reports into a large database, where the Medical Language Extraction and Encoding (MedLEE) system achieved recall of 81 % and specificity of 99 % with a total of 24 clinical conditions (diseases, abnormalities and clinical states) being the subject of the study. Again, these results were comparable to the average recall (85 %–87 %) and specificity (98 %) achieved by expert human coders.
Typical processing steps taken in such NLP systems include text segmentation into words, sentences, paragraphs and/or sections, part–of–speech tagging, parsing, named entity recognition (NER), normalisation and negation annotation [17, 23, 24]. Recognition of named entities, i.e. phrases that are used to differentiate between entities of the same semantic type (e.g. Osgood-Schlatter disease is a name used to refer to a specific disease), followed by normalising the representation of their meaning (e.g. Osgood-Schlatter disease is also known as apophysitis of the tibial tubercle or OSD), is the crucial step towards semantic interpretation of clinical narratives. In order to disambiguate named entities and assert relationships between them (e.g. relate disease/disorder, sign/symptom or procedure to an anatomical site), domain–specific knowledge needs to be available in a machine–readable form. For example, the domain knowledge is specified in MedLEE using a table created manually based on domain expertise. Similarly, the Medical Text Analysis System (MedTAS) utilises external knowledge resources such as terminologies and ontologies. Alternatively, M+ uses Bayesian Networks to represent semantic types and relations within a specific medical domain such as that of chest radiology reports. Ideally, when a suitable ontology is available, it can be used to add an explicit semantic layer over text data by linking domain–specific terms, i.e. textual representations of concepts, to their descriptions in the ontology. This allows text to be mined for interpretable information about domain–specific concepts and their relationships.
Between January 2001 and May 2012, a total of 6,382 individuals with an acute knee injury attended the Acute Knee Screening Service at the Emergency Unit of the Cardiff and Vale University Health Board (C&V UHB). A subset of 1,657 individuals fulfilled locally agreed clinical criteria for an MRI scan. Both the clinical assessment and MRI findings for these individuals were stored in a clinical database on a secure server within the C&V UHB. This database was originally developed for the purposes of service evaluation and auditing practice. Out of 1,657 referred individuals, a total of 1,468 MRI scan visits were identified retrospectively from the database records. Following an MRI scan, the imaging results were summarised by a radiologist (from a team of five) in a diagnostic narrative report that conveys a specialist interpretation of the MRI scan and relates it to the patient’s signs and symptoms. These MRI reports formed the dataset used in this study.
All reports were anonymised by removing all identifiable information related to either patient or radiologist together with the attendance date and the links to the patient’s assessment and treatment details. The anonymised reports were transferred to an encrypted memory stick that was password protected and locked in a filing cabinet in a lockable room. Ethical approval for this study was obtained from the South East Wales Research Ethics Committee (10/MRE09/29).
The size of the overall dataset was 1,002 KB with a total of 13,991 sentences, 178,931 tokens, 3,277 distinct tokens and 2,681 distinct stems. On average, the size of an individual MRI report was 0.68 KB (±0.40 KB) with a total of 9.53 (±5.13) sentences and 110.81 (±64.60) tokens.1 We separated the data into training and testing sets. A test set was created by randomly selecting a subset of 100 MRI reports from the overall dataset. These reports were then removed from consideration so that the performance of the system could later be evaluated on unseen data. The remaining 1,368 reports formed a training set, which was used to inform system development.
We previously developed TRAK as an ontology that formally models knowledge relevant for the rehabilitation of knee conditions. This information includes classification of knee conditions, detailed knowledge about knee anatomy and an array of healthcare activities that can be used to diagnose and treat knee conditions. Therefore, TRAK provides a framework that can be used to collect coded data in order to support epidemiologic studies much in the way Read Codes, a coded thesaurus of clinical terms, are used to record observational data in the Clinical Practice Research Datalink (CPRD) – formerly known as the General Practice Research Database (GPRD). TRAK follows design principles recommended by the Open Biomedical Ontologies (OBO) Foundry and is implemented in OBO, a format widely used by this community. Its public release can be accessed through BioPortal, a web portal that provides a uniform mechanism to access biomedical ontologies, where it can be browsed, searched and visualised.
TRAK was initially developed with a specific task in mind – to formally define standard care for the rehabilitation of knee conditions. At the same time, it was designed to be extensible in order to support other tasks in the domain. For example, the knowledge about knee anatomy, which is cross–referenced to a total of 205 concepts in the Foundational Model of Anatomy (FMA), is directly applicable to the interpretation of reports describing knee MRI scans. However, in order to fully support semantic interpretation of this type of clinical narrative, the TRAK ontology needed to be expanded with other types of MRI–specific concepts.
In order to support semantic interpretation of the terminological content found in knee MRI reports, we needed to ensure that all relevant concepts are modelled appropriately in the TRAK ontology. The main aspect of this task was the expansion of a specific domain modelled by the ontology, for example, MRI–specific observations such as hyaline cartilage abnormality, bone bruise, cyclops lesion, etc. In order to support NLP applications of the ontology, its vocabulary also needed to be expanded to include term variants commonly used in MRI reports. Some term variants are confined to a specific clinical sublanguage and as such are typically underrepresented in standardised medical dictionaries such as those included in the Unified Medical Language System (UMLS). For example, collateral ligament was found to have no other synonyms in the UMLS. Yet, collateral ligaments are colloquially referred to as collaterals in clinical narratives. Indeed, out of 37 references to collateral ligaments in the training dataset, six (i.e. 16 %) used this informal variant of the term.
We devised four strategies for systematic expansion of the coverage of the TRAK ontology. Three of these strategies were data–driven. This was to ensure that the ontology is appropriate for the intended NLP applications on such data. Each data–driven strategy utilised a different approach to extracting the relevant terminology from the data either manually or automatically. The fourth strategy was based on integration of known concepts from other relevant knowledge sources. The two main aims of this strategy were: (1) to avoid overfitting the ontology based on limited data used in the data–driven strategies, and (2) to provide an initial taxonomic structure to incorporate new concepts.
Strategy 1: dictionary-based term recognition
Strategy 2: automatic term recognition
Using the UMLS to identify relevant concepts in text data has the advantage of providing not only their definitions and synonyms, but also their classification and a potential structure into which to embed them within the TRAK ontology. However, a previous lexical study conducted on a large corpus of various types of medical records (discharge summaries, radiology reports, progress notes, emergency room reports and letters) revealed that clinical narratives are characterised by a high degree of misspellings, abbreviations and non–standardised terminology. That study found that over 20 % of the words used were unrecognisable, i.e. were not recognisable medical words, common words or names, and could not be algorithmically or contextually converted to such words. Notably, almost 78 % of unrecognisable words were judged to be probably correctly spelled medical terms. These findings illustrate the challenges clinical narratives pose to dictionary–based term recognition methods such as that implemented by MetaMap.
In order to extract additional terms from the training dataset that were not found in the UMLS, we complemented the use of MetaMap with FlexiTerm, our own data–driven method for automatic term recognition from a domain–specific corpus. For the original publication, we thoroughly evaluated FlexiTerm on five biomedical corpora including a subset of 100 MRI reports from the dataset used in this study. The highest values for precision (94.56 %), recall (71.31 %) and F-measure (81.31 %) were achieved on this particular corpus.
FlexiTerm performs recognition of multi–word terms in two steps: linguistic filtering is used to select term candidates, followed by calculation of termhood, a corpus–based measure that combines strength of collocation with frequency of occurrence. Termhood values are used as evidence to select higher–ranked candidates as terms over the lower–ranked ones. In order to improve the statistical distribution of termhood values, which may be affected by term variation phenomena, FlexiTerm uses a range of methods to neutralise the main causes of term variation and thereby aggregate termhood values that would otherwise be dispersed across different variants of the same term. Firstly, FlexiTerm manages syntactic variation by processing term candidates using a bag–of–words approach. Further, orthographic and morphological variations are neutralised by stemming in combination with lexical and phonetic similarity measures. Consequently, FlexiTerm not only extracts terms from text, but also groups term variants such as infrapatellar fat pad, infra-patella fat pad and infra-patellar fat pad together. This allows not only the identification of new concepts (e.g. posterior horn, ranked seventh by FlexiTerm, was added as a new concept in TRAK), but also the identification of previously unknown names of existing concepts, which are easily mapped to a concept via its known names. For example, lateral femoral condyle was identified as a new synonym for a concept with identifier TRAK:0001037 previously known only as lateral condyle of femur.
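The grouping of term variants and the aggregation of termhood can be illustrated with a minimal sketch. It is not FlexiTerm's implementation: the toy suffix stripper stands in for FlexiTerm's stemming and lexical/phonetic similarity, and the C-value-style score (log of term length times frequency, discounted by the frequency of longer candidates that contain the term) is our assumed stand-in for termhood. All function names are hypothetical.

```python
import math
from collections import Counter

STOPWORDS = {"of", "the", "and"}
SUFFIXES = ("ar", "al", "es", "s", "a")  # toy suffix list, not FlexiTerm's stemmer


def crude_stem(token: str) -> str:
    """Strip one common suffix, keeping at least four characters of the stem."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 4:
            return token[: -len(suffix)]
    return token


def normalise(phrase: str) -> frozenset:
    """Bag-of-words key: lowercase, join hyphenated parts, drop stopwords, stem."""
    tokens = phrase.lower().replace("-", "").split()
    return frozenset(crude_stem(t) for t in tokens if t not in STOPWORDS)


def termhood(candidates):
    """Rank multi-word candidate groups with a C-value-style score (assumed measure)."""
    counts = Counter(normalise(p) for p in candidates)
    groups = [g for g in counts if len(g) >= 2]  # multi-word terms only
    # A group's frequency includes occurrences nested inside longer candidates
    freq = {a: sum(n for b, n in counts.items() if a <= b) for a in groups}
    scores = {}
    for a in groups:
        containers = [b for b in groups if a < b]
        nested = sum(freq[b] for b in containers) / len(containers) if containers else 0.0
        scores[a] = math.log2(len(a)) * (freq[a] - nested)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

With this normalisation, the three spellings of infrapatellar fat pad share one key, so their frequencies are pooled rather than split three ways. Likewise, apex of patella and patella apex reduce to the same bag of words.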
Strategy 3: manual data annotation
As part of developing and testing our information extraction system, we manually annotated the test set and a portion of the training set. The annotated test set was later used to create a gold standard to evaluate the system, whereas an annotated subset of 100 training documents was used not only to test the system during its development, but also to inform the expansion of the ontology with terms manually annotated in text. This strategy offers the potential to identify additional concepts and their names, particularly those that are non–standardised and occur less frequently in the training dataset. Recall that MetaMap identifies concepts based solely on the content of standardised medical dictionaries included in the UMLS. On the other hand, FlexiTerm may identify some non–standardised terminology, but in doing so it relies on the frequency of term occurrence. Moreover, FlexiTerm only extracts multi–word terms, thus ignoring concepts designated by a single word (e.g. fissure, ganglion, etc.). In addition to enabling us to identify relevant concepts overlooked by the previous two strategies, the annotation exercise allowed us to explore in detail how the terms were used in context. This helped disambiguate their meaning and, in turn, decide where to embed them in the existing ontology structure.
There is definite complex tearing of the posterior horn and body of the medial meniscus.
tearing represents the finding, definite its certainty, complex its qualifier, medial meniscus the anatomical entity affected, whereas posterior horn and body are anatomy qualifiers that provide a more specific location for the given finding.
A total of 484 unique phrases (not necessarily terms) with 2,071 occurrences were annotated as instances of finding, 113 unique phrases with 284 occurrences as instances of finding qualifier, 68 unique phrases with 202 occurrences as instances of certainty, 208 unique phrases with 1,232 occurrences as instances of anatomy, and finally 178 unique phrases with 469 occurrences as instances of anatomy qualifier. The fact that these phrases were pre–classified into four broad categories allowed us to focus on particular branches of the TRAK hierarchy one at a time. In addition, some categories (e.g. anatomy and anatomy qualifier) were already extensively covered by the TRAK ontology. Therefore, removing from consideration 137 known terms referring to 60 TRAK concepts greatly facilitated the manual curation and allowed us to consider all remaining phrases for potential inclusion in TRAK.
Strategy 4: manual dictionary search
So far, all three strategies for identification of new ontology concepts relied on the training dataset, from which candidates were selected using a combination of automatic and manual methods. These data–driven approaches run a risk of overfitting the ontology to the available data, which may result in incomplete coverage of the domain simply because some concepts (possibly the ones less frequently encountered in practice) were not mentioned in the available sample of MRI reports. In order to systematically cover the domain by including potentially relevant concepts that are not seen in the training dataset, we consulted two authoritative knowledge sources relevant for semantic interpretation of MRI reports.
magnetic resonance imaging of knee: acute osteochondral injury of posterior aspect of lateral femoral condyle
osteochondral injury represents the finding, acute its qualifier, lateral femoral condyle the anatomical entity affected, whereas posterior aspect is an anatomy qualifier, which provides a more specific location for the given finding. Most anatomical concepts were already covered in TRAK, so the manual curation process focused mainly on concepts related to findings and their qualifiers. The resulting list of 76 concepts was then manually curated.
Once coded, the extracted information can be searched systematically. For instance, note that in the given examples the equivalent phrases posterior horn and posterior third were mapped to the same concept, which allows the extracted information to be searched by its underlying meaning and not merely by its surface realisation in text. KneeTex is an IE system and as such does not include an interface to search through the extracted information. However, the JSON format of the extracted information allows it to be stored directly in a document–oriented database such as MongoDB, from which it can be easily queried.
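Searching by meaning rather than by surface text can be sketched with a couple of in-memory filters over JSON records. This is a minimal illustration under stated assumptions. The slot names follow the six gold-standard slots, but the record layout and the `HYP:*` identifiers are hypothetical, as are the function names. `TRAK:0001090` is the identifier quoted for medial meniscus.

```python
import json

# Hypothetical template records. The six slots mirror the gold standard
# (finding, finding qualifier, negation, certainty, anatomy, anatomy qualifier);
# the JSON layout and the HYP:* identifiers are illustrative assumptions.
records = [
    {"report": 1,
     "finding": {"text": "tear", "id": "HYP:tear"}, "finding_qualifier": None,
     "negation": False, "certainty": "definite",
     "anatomy": {"text": "medial meniscus", "id": "TRAK:0001090"},
     "anatomy_qualifier": {"text": "posterior horn", "id": "HYP:posterior_horn"}},
    {"report": 2,
     "finding": {"text": "tear", "id": "HYP:tear"}, "finding_qualifier": None,
     "negation": False, "certainty": None,
     "anatomy": {"text": "medial meniscus", "id": "TRAK:0001090"},
     "anatomy_qualifier": {"text": "posterior third", "id": "HYP:posterior_horn"}},
]


def find_by_concept(docs, slot, concept_id):
    """Search by meaning: match the concept identifier stored in a slot."""
    return [d["report"] for d in docs if (d.get(slot) or {}).get("id") == concept_id]


def find_by_text(docs, slot, text):
    """Search by surface form: match the literal phrase found in the report."""
    return [d["report"] for d in docs if (d.get(slot) or {}).get("text") == text]


# Round-trip through JSON, as the records would be when stored in a document database
stored = json.loads(json.dumps(records))
```

Searching the anatomy qualifier slot by concept returns both reports, whereas a text search for posterior horn returns only the first. In MongoDB, the concept search would correspond to a filter on an embedded field using dot notation, e.g. `{"anatomy_qualifier.id": "HYP:posterior_horn"}`.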
Previous lexical analysis of a large corpus of various types of clinical narratives (discharge summaries, radiology reports, progress notes, emergency room reports and letters) revealed that they are characterised by a high degree of misspellings, abbreviations and idioms. However, the analysis of our training corpus revealed a total of only 1,138 typographical errors, averaging 0.83 errors per document. This low rate of typographical errors was not expected to significantly hinder subsequent processing. Therefore, this module supports only traditional elements of linguistic pre–processing (i.e. tokenisation and sentence splitting). Typographical errors and spelling mistakes are instead handled by choosing a method for named entity recognition that is robust against such variations.
With the original TRAK ontology sufficiently expanded, its vocabulary can now be used to drive named entity recognition, whose aim is to automatically identify and classify words and phrases into predefined categories such as diseases, symptoms, anatomical entities, etc. In effect, NER is used here to identify candidates for slot fillers and as such represents the main vehicle of IE. The performance of dictionary–based NER approaches varies across different dictionaries and tools. A recent evaluation of three such state–of–the–art tools on a set of eight biomedical ontologies showed that their performance in terms of F–measure varied from 14 % to 83 %. ConceptMapper (a component of the Apache UIMA Sandbox) generally provided the best performance. Besides performance, we considered ease of use. While converting an OBO ontology to ConceptMapper’s dictionary format is straightforward, one must adopt the UIMA framework in order to use this particular component. For flexibility reasons, we opted to use PathNER as an alternative to ConceptMapper.
PathNER (Pathway Named Entity Recognition) is a freely available tool originally developed for systematic identification of pathway mentions in the literature. On a pathway–specific gold–standard corpus, PathNER achieved an F–measure of 84 %. It implements soft dictionary matching by utilising the SoftTFIDF method, a combination of term frequency–inverse document frequency (TF–IDF) and the Jaro–Winkler distance. This makes the dictionary lookup robust with respect to the problem of term variation commonly seen in biomedical text, which often causes dictionary lookup based on exact string matching to fail. Typical term variations include morphological variation, where the transformation of the content words involves inflection (e.g. lateral meniscus vs. lateral menisci) or derivation (e.g. meniscus tear vs. meniscal tear), and syntactic variation, where the content words are preserved in their original form (e.g. apex of patella vs. patella apex).
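The soft matching described above can be illustrated with a minimal SoftTFIDF sketch. It is written for illustration and is not PathNER's code. Each token of one name is paired with its most similar token in the other name under Jaro–Winkler. Pairs whose similarity exceeds a threshold θ contribute the product of their TF–IDF weights, scaled by that similarity. The smoothed IDF, θ = 0.9, the lowercase alphabetic tokenisation and all class/function names are our assumptions.

```python
import math
import re
from collections import Counter


def jaro(s: str, t: str) -> float:
    """Jaro similarity between two strings."""
    if s == t:
        return 1.0
    ls, lt = len(s), len(t)
    if ls == 0 or lt == 0:
        return 0.0
    window = max(max(ls, lt) // 2 - 1, 0)
    s_hit, t_hit = [False] * ls, [False] * lt
    matches = 0
    for i, ch in enumerate(s):
        for j in range(max(0, i - window), min(lt, i + window + 1)):
            if not t_hit[j] and t[j] == ch:
                s_hit[i] = t_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    k = mismatched = 0
    for i in range(ls):  # compare matched characters in order
        if s_hit[i]:
            while not t_hit[k]:
                k += 1
            if s[i] != t[k]:
                mismatched += 1
            k += 1
    m, transpositions = matches, mismatched // 2
    return (m / ls + m / lt + (m - transpositions) / m) / 3


def jaro_winkler(s: str, t: str, p: float = 0.1) -> float:
    """Boost the Jaro score by the length of the common prefix (at most 4)."""
    j = jaro(s, t)
    prefix = 0
    for a, b in zip(s[:4], t[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def tokenise(text: str):
    return re.findall(r"[a-z]+", text.lower())


class SoftTfIdf:
    """TF-IDF cosine in which tokens may match approximately (SoftTFIDF)."""

    def __init__(self, dictionary_names, theta: float = 0.9):
        self.theta = theta
        docs = [set(tokenise(name)) for name in dictionary_names]
        self.n = len(docs)
        self.df = Counter(tok for doc in docs for tok in doc)

    def _idf(self, token: str) -> float:
        # Smoothed IDF (our choice) so that unseen tokens still get a weight
        return math.log((self.n + 1) / (self.df[token] + 1)) + 1.0

    def _vector(self, tokens):
        tf = Counter(tokens)
        v = {w: math.log(c + 1) * self._idf(w) for w, c in tf.items()}
        norm = math.sqrt(sum(x * x for x in v.values()))
        return {w: x / norm for w, x in v.items()} if norm else v

    def score(self, s: str, t: str) -> float:
        vs, vt = self._vector(tokenise(s)), self._vector(tokenise(t))
        total = 0.0
        for w, weight in vs.items():
            # Closest token in t; it contributes only if similarity exceeds theta
            best = max(vt, key=lambda v: jaro_winkler(w, v), default=None)
            if best is not None:
                sim = jaro_winkler(w, best)
                if sim > self.theta:
                    total += weight * vt[best] * sim
        return total
```

With a dictionary containing lateral meniscus, the inflected variant lateral menisci still scores highly, because meniscus and menisci are close under Jaro–Winkler. Exact string matching would have rejected it outright.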
HISTORY Twisting injury, ACL rupture and medial meniscal tear.
HISTORY Twisting injury, <term id="0000513">ACL rupture</term> and medial meniscal tear.
HISTORY Twisting injury, <term id="0000049">ACL</term> <term id="0000211">rupture</term> and medial meniscal tear.
The <term id="0000049">ACL</term> appears chronically <term id="0000211">ruptured</term>.
To address these issues, we simply ignored those branches of the ontology that include composite terms and did not export them from OBO into PathNER’s internal dictionary format. At this point, we addressed two other problems associated with NER, namely ambiguity resolution and recognition of informal names. For example, we noticed that the term joint effusion (TRAK:0001410), defined in TRAK as “Increased fluid in synovial cavity of a joint”, was commonly used in our dataset to refer to its child node knee effusion (TRAK:0001411). Safely assuming that in the context of knee MRI reports joint effusion will always refer to knee effusion, we ignored the concept identified by TRAK:0001410 and did not export it into PathNER’s dictionary format. Instead, a dictionary entry was created to map joint effusion to TRAK:0001411 in order for PathNER to recognise its intended meaning within the given context.
There is a large <term id="0001089">lateral meniscal</term> <term id="0001396">cyst</term>.
lateral meniscal refers to lateral meniscus (TRAK:0001089) in which the finding, i.e. cyst (TRAK:0001396), is noted, but it would be incorrect to specify it formally as an official synonym of that term within the ontology. Instead, we encoded “unofficial” synonyms separately within PathNER’s dictionary, thus enabling the use of informal synonyms in NER while preserving the strict formality of the ontology. It was in this manner that the verb form ruptured was mapped to the term rupture (TRAK:0000211) in a previously discussed sentence. In total, the names of 128 concepts were ignored during ontology–to–dictionary conversion and 250 new entries were added to the dictionary.
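The ontology-to-dictionary conversion can be sketched as a small OBO reader plus an export step. This is a minimal illustration, not the KneeTex converter. It skips ignored concepts and appends "unofficial" synonyms that live only in the dictionary. The identifiers and names in the fragment are quoted in the text, but the fragment itself, the EXACT synonym scope and the (name, identifier) output format are assumptions. PathNER's actual dictionary format may differ. All function names are hypothetical.

```python
import re

# Illustrative OBO fragment: IDs and names are quoted in the text,
# while the stanza layout and the EXACT synonym scope are our assumptions.
OBO_FRAGMENT = """
format-version: 1.2

[Term]
id: TRAK:0001410
name: joint effusion

[Term]
id: TRAK:0001411
name: knee effusion
is_a: TRAK:0001410 ! joint effusion

[Term]
id: TRAK:0001037
name: lateral condyle of femur
synonym: "lateral femoral condyle" EXACT []

[Term]
id: TRAK:0000211
name: rupture
"""

SYNONYM = re.compile(r'^synonym:\s*"([^"]+)"')


def parse_obo(text):
    """Collect the id, name and synonyms of every [Term] stanza."""
    terms, current = [], None
    for line in (raw.strip() for raw in text.splitlines()):
        if line.startswith("["):
            current = {"id": None, "names": []} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
        elif current is None or not line:
            continue
        elif line.startswith("id:"):
            current["id"] = line[3:].strip()
        elif line.startswith("name:"):
            current["names"].append(line[5:].strip())
        else:
            match = SYNONYM.match(line)
            if match:
                current["names"].append(match.group(1))
    return terms


def build_dictionary(terms, ignored_ids, informal_entries):
    """Export (name, id) pairs, skipping ignored concepts and adding informal synonyms."""
    entries = [(name, t["id"]) for t in terms if t["id"] not in ignored_ids
               for name in t["names"]]
    return entries + list(informal_entries)


INFORMAL = [  # "unofficial" synonyms kept outside the ontology
    ("joint effusion", "TRAK:0001411"),
    ("ruptured", "TRAK:0000211"),
    ("lateral meniscal", "TRAK:0001089"),
]
dictionary = build_dictionary(parse_obo(OBO_FRAGMENT), {"TRAK:0001410"}, INFORMAL)
```

Ignoring a whole branch rather than a single concept would additionally require following the `is_a` links to collect all descendants before export.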
Named entity recognition
The <term id="0000045">menisci</term> are intact.
There are tears in the posterior horns of both <term id="0000045">menisci</term>.
There is some peripheral signal in both the medial and <term id="0001089">lateral meniscus</term> posteriorly.
This possibly represents a tiny peripheral vertical <term id="0001390">longitudinal tear</term>.
namely, longitudinal tear (TRAK:0001390), vertical tear (TRAK:0001388) and peripheral tear (TRAK:0001389), but only the rightmost one would be recognised by PathNER. Finally, in phrases such as medial meniscectomy, patellar tendinitis and prepatellar bursitis, PathNER will succeed in identifying terms referring to findings, i.e. meniscectomy (TRAK:0001511), tendinitis (TRAK:0000229) and bursitis (TRAK:0000225). However, it will not recognise implicit references to the anatomical entities affected, i.e. medial meniscus (TRAK:0001090), patellar tendon (TRAK:0000053) and prepatellar bursa (TRAK:0001054).
On the training subset of 100 documents, this approach resulted in 430 annotations in addition to the 4,439 generated by PathNER. These 430 annotations account for approximately 9 % of all named entities recognised.
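One of the patterns that complements PathNER, recovering the elided conjunct in a coordinated expression, can be sketched as a single regular expression. The rule below is hypothetical and covers only the meniscus example. Consistent with how the coordinated expression is treated later in the text, the whole phrase is annotated with the concept of the left conjunct. The identifiers are those quoted in the text. The rule, the lookup table and the function name are our assumptions.

```python
import re

# Full names of the compositional concepts; IDs are those quoted in the text
ANATOMY = {"medial meniscus": "TRAK:0001090", "lateral meniscus": "TRAK:0001089"}

# Hypothetical rule: "<side> and (the) <side> meniscus|menisci"
COORDINATION = re.compile(
    r"\b(medial|lateral)\s+and\s+(?:the\s+)?(medial|lateral)\s+(meniscus|menisci)\b",
    re.IGNORECASE,
)


def expand_coordination(sentence):
    """Annotate the whole coordinated phrase with the elided left conjunct."""
    annotations = []
    for m in COORDINATION.finditer(sentence):
        name = f"{m.group(1).lower()} meniscus"  # singular head used for lookup
        if name in ANATOMY:
            annotations.append((m.start(), m.end(), ANATOMY[name]))
    return annotations
```

Applied to "There is some peripheral signal in both the medial and lateral meniscus posteriorly.", the rule adds a reference to medial meniscus that PathNER alone would miss. The nested lateral meniscus annotation produced by PathNER is left untouched.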
<negation>No</negation> bone marrow <finding>lesion</finding> identified.
His patella lies tilted laterally though it has <negation>not</negation> <finding>subluxed</finding>.
There is general attenuation of the body of the medial meniscus <negation>without</negation> a discrete <finding>tear</finding>.
This represents residual vascularity <negation>rather than</negation> a <finding>tear</finding>.
There is a very large <finding>cartilage defect</finding> over the weight bearing surface of the medial femoral condyle. There is no further <finding>cartilage defect</finding>.
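The annotated examples suggest a simple trigger-based view of negation: a negation cue shortly before a finding. The sketch below checks for such a cue within a fixed token window to the left of the first mention of the finding. The trigger list is taken from the examples above. The window size, the tokenisation and the function name are illustrative assumptions, not KneeTex's actual negation rules.

```python
import re

# Triggers taken from the annotated examples; the list and the window
# size are illustrative assumptions, not KneeTex's actual negation rules.
TRIGGERS = ["rather than", "without", "not", "no"]
WINDOW = 4  # maximum number of tokens between trigger and finding


def is_negated(sentence, finding):
    """True if a trigger occurs within WINDOW tokens before the first mention of finding."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    target = finding.lower().split()
    for i in range(len(tokens) - len(target) + 1):
        if tokens[i:i + len(target)] == target:
            left = " ".join(tokens[max(0, i - WINDOW):i])
            return any(re.search(rf"\b{re.escape(t)}\b", left) for t in TRIGGERS)
    return False
```

A fixed window is the simplest possible scope heuristic. Longer sentences with several findings would need the scope to be limited further, for example to the text segment containing the trigger.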
Although their structure varied across the dataset, the given MRI reports generally tended to organise information under the following headings: mri of the left/right knee, indication, history, findings and conclusion. Their lexical and orthographic features were incorporated into a single pattern–matching rule designed to recognise a section heading as a sequence of upper case tokens from a list of fifteen.
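A heading rule of this kind can be sketched as a scan over leading upper-case tokens drawn from a fixed list. The text does not enumerate all fifteen tokens, so the list below is partial and hypothetical. The extra requirement that the run contain a section name is our own safeguard against sentences that merely begin with an acronym such as MRI. The function name is hypothetical.

```python
# Partial, hypothetical token list: the text does not name all fifteen tokens
HEADING_TOKENS = {"MRI", "OF", "THE", "LEFT", "RIGHT", "KNEE",
                  "INDICATION", "HISTORY", "FINDINGS", "CONCLUSION"}
SECTION_NAMES = {"KNEE", "INDICATION", "HISTORY", "FINDINGS", "CONCLUSION"}


def split_heading(line):
    """Split a leading run of upper-case heading tokens from the rest of the line."""
    tokens = line.split()
    i = 0
    while i < len(tokens) and tokens[i] in HEADING_TOKENS:
        i += 1
    # Require a section name so that e.g. "MRI shows ..." is not taken as a heading
    if not SECTION_NAMES.intersection(tokens[:i]):
        return None, line
    return " ".join(tokens[:i]), " ".join(tokens[i:])
```

For example, the line "HISTORY Twisting injury, ACL rupture and medial meniscal tear." is split into the heading HISTORY and the remaining clinical text.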
Named entity disambiguation
Once recognised, named entities are imported into a relational database and further scrubbed in order to disambiguate them. Semantic ambiguity may arise naturally from linguistic phenomena such as hyponymy, a relationship between a general term (hypernym) and its more specific instances (hyponyms), and polysemy, where a term may have multiple meanings. Multiple related interpretations may also arise from nested occurrences of named entities.
There is oedema superficial to the <term id="0000051">medial collateral ligament</term> consistent with a sprain but the <term id="0001027">ligament</term> is intact.
There is oedema superficial to the <term id="0000051">medial collateral ligament</term> consistent with a sprain but the <term id="0000051">ligament</term> is intact.
This type of ambiguity is resolved systematically by identifying coreferential named entities, i.e. those that refer to the same concept. Coreference resolution is applied to named entities recognised as one of the following concepts: meniscus (TRAK:0000045), ligament (TRAK:0001027) or tendon (TRAK:0000046). In such cases, coreference is resolved by looking for previous mentions of their ontological descendants.
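The coreference step can be sketched as a backward search over previous mentions. A generic mention (meniscus, ligament or tendon) takes the identifier of the closest preceding mention whose concept descends from it. The three generic identifiers are those quoted in the text. The `is_a` edges below are a simplified, hypothetical excerpt, since the real TRAK hierarchy may contain intermediate concepts. The function names are also hypothetical.

```python
# Hypothetical, simplified is_a edges (child -> parents); the real TRAK
# hierarchy may place intermediate concepts between these nodes.
PARENTS = {
    "TRAK:0000051": ["TRAK:0001027"],  # medial collateral ligament -> ligament
    "TRAK:0001089": ["TRAK:0000045"],  # lateral meniscus -> meniscus
}
GENERIC = {"TRAK:0000045", "TRAK:0001027", "TRAK:0000046"}  # meniscus, ligament, tendon


def ancestors(concept):
    """All concepts reachable from concept by following is_a upwards."""
    seen, stack = set(), [concept]
    while stack:
        for parent in PARENTS.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def resolve_coreference(mentions):
    """Replace a generic mention by the closest previous mention of a descendant."""
    resolved = []
    for text, concept in mentions:
        if concept in GENERIC:
            for _, previous in reversed(resolved):
                if concept in ancestors(previous):
                    concept = previous
                    break
        resolved.append((text, concept))
    return resolved
```

In the medial collateral ligament example, the second mention, ligament, is re-mapped from TRAK:0001027 to TRAK:0000051. A generic mention with no suitable antecedent keeps its original identifier.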
There is oedema in the soft tissues suggesting <term id="0000211">rupture</term> of the <term id="0000222">popliteal cyst</term>.
There is oedema in the soft tissues suggesting <term id="0001461">rupture</term> of the <term id="0000222">popliteal cyst</term>.
There is grade 3 <term id="0000211">rupture</term> of the MCL.
During dictionary lookup, PathNER will return the longest possible matches with similarity scores over a certain threshold. As a result, there will be no overlap between named entities recognised in this manner. However, pattern matching used in the second phase of NER may introduce nested annotations of named entities. For example, in the coordinated expression medial and lateral meniscus, PathNER will recognise two terms from the TRAK ontology: medial (TRAK:0000031) and lateral meniscus (TRAK:0001089). Pattern matching will subsequently recognise the coordinated expression as a reference to medial meniscus (TRAK:0001090). The nested occurrence of lateral meniscus should be retained as a valid reference to a named entity. However, the nested occurrence of medial represents an unsuccessful match to another named entity, medial meniscus, and thus should be removed. The choice between retaining and removing nested occurrences of named entities is based on their semantic types. For example, all nested occurrences of terms descending from the concept quality (TRAK:0000133), defined as “a dependent entity that inheres in a bearer by virtue of how the bearer is related to other entities”, are removed. This will remove the nested occurrence of medial in the previous example, but also the references to radial (TRAK:0001531) and vertical (TRAK:0000077) in the example shown in Table 4.
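The pruning rule can be sketched as follows. An annotation strictly contained in a longer one is dropped when its concept is, or descends from, quality (TRAK:0000133). The identifiers are those quoted in the text, but the direct `is_a` edges to quality are a hypothetical simplification. Annotations are assumed to be (start, end, concept) character spans, and the function names are hypothetical.

```python
# Hypothetical, simplified is_a edges (child -> parents) for the example
PARENTS = {
    "TRAK:0000031": ["TRAK:0000133"],  # medial -> quality
    "TRAK:0001531": ["TRAK:0000133"],  # radial -> quality
    "TRAK:0000077": ["TRAK:0000133"],  # vertical -> quality
}
QUALITY = "TRAK:0000133"


def is_a_descendant(concept, root):
    """True if concept is root or reaches root by following is_a upwards."""
    stack, seen = [concept], {concept}
    while stack:
        current = stack.pop()
        if current == root:
            return True
        for parent in PARENTS.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return False


def prune_nested(annotations):
    """Drop annotations nested inside a longer one if their concept is a quality."""
    kept = []
    for start, end, concept in annotations:
        nested = any(s <= start and end <= e and (s, e) != (start, end)
                     for s, e, _ in annotations)
        if nested and is_a_descendant(concept, QUALITY):
            continue
        kept.append((start, end, concept))
    return kept
```

For the string medial and lateral meniscus, the nested medial annotation is removed. The nested lateral meniscus annotation and the enclosing medial meniscus annotation are both kept.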
history Twisting injury, tender medial joint line, positive McMurray’s
There is oedema in the soft tissues at the posterolateral corner but the popliteal tendon is intact consistent with a sprain.
There is oedema in the soft tissues at the posterolateral corner
but the popliteal tendon is intact
consistent with a sprain
Segmentation greatly simplifies subsequent context analysis. When used in combination with the ontology to infer relationships between named entities, segmentation minimises the need for complex syntactic analysis. In fact, apart from the analysis of prepositional phrases, no syntactic analysis is performed as part of template filling in KneeTex. Syntactic parsing could alternatively be used to support text segmentation, but such an approach would be more computationally intensive and would not necessarily improve accuracy. Because sentences in clinical narratives are often ill formed, lexical rules may be more robust.
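A lexical segmenter of this kind can be sketched with a single regular expression. The cue list below (but, consistent with, suggesting) is a hypothetical selection inferred from the examples above; the actual KneeTex rule set is not reproduced here.

```python
import re
from typing import List

# Hypothetical cue phrases; KneeTex's actual lexical rules are not listed in the text.
SEGMENT_CUES = ["but", "consistent with", "suggesting"]

# Split on the whitespace that immediately precedes a cue, keeping the cue at
# the start of the new segment (the lookahead consumes no characters).
_CUE_RE = re.compile(
    r"\s+(?=(?:" + "|".join(re.escape(c) for c in SEGMENT_CUES) + r")\b)",
    re.IGNORECASE,
)


def segment(sentence: str) -> List[str]:
    """Split a sentence into segments at lexical cue phrases."""
    text = sentence.strip().rstrip(".")
    return [part.strip() for part in _CUE_RE.split(text) if part.strip()]


for seg in segment("There is oedema in the soft tissues at the posterolateral corner "
                   "but the popliteal tendon is intact consistent with a sprain."):
    print(seg)
```

This reproduces the three segments of the example sentence. A rule-based splitter like this is cheap and tolerant of ill-formed input, which is the trade-off the paragraph above describes.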
We previously described how the ontology, or more specifically its vocabulary, is used to support NER as the first step in IE. Template filling, the final step, is also driven by the ontology, or more specifically by its structure, i.e. the relationships between concepts. This involves accessing information about semantic types by traversing the is–a hierarchy in order to identify slot filler candidates. In addition, relationships between concepts are used to check compatibility between potential slot fillers. For example, if the extracted finding is a tear, then the affected anatomical entity must be a soft tissue such as a ligament or tendon. Similarly, if the affected anatomical entity is cartilage, then its qualifier must be related to a bone or joint. We originally considered using OntoCAT for this purpose, as it provides a programming interface to query ontologies shared on BioPortal or user–specified local OBO files. However, this would separate ontology querying from querying the data, which are stored in a relational database. To enable integrative querying of both data and knowledge, we imported the ontology into the database. This allowed us to implement ontology–driven IE as a series of SQL queries that simultaneously access the data and the ontology. The remainder of this section describes the template filling process, where all semantic interpretations mentioned imply the use of such queries.
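The idea of querying data and ontology together can be sketched with SQLite recursive common table expressions. The schema (tables is_a, compatible and mention), the concept names and the single compatibility rule are hypothetical simplifications rather than the KneeTex database; they only show how is–a traversal and a compatibility check can be expressed as SQL over the same database that holds the extracted mentions.

```python
import sqlite3
from typing import List


def build_db() -> sqlite3.Connection:
    """In-memory database holding a toy ontology fragment next to extracted data."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE is_a (child TEXT, parent TEXT);
        CREATE TABLE compatible (finding TEXT, anatomy_root TEXT);
        CREATE TABLE mention (report_id INTEGER, surface TEXT, concept TEXT);
        INSERT INTO is_a VALUES
            ('medial collateral ligament', 'ligament'),
            ('ligament', 'soft tissue'), ('tendon', 'soft tissue'),
            ('soft tissue', 'anatomical entity'),
            ('patella', 'bone'), ('bone', 'anatomical entity'),
            ('tear', 'finding');
        INSERT INTO compatible VALUES ('tear', 'soft tissue');
        INSERT INTO mention VALUES
            (1, 'MCL', 'medial collateral ligament'),
            (1, 'tear', 'tear'),
            (1, 'patella', 'patella');
    """)
    return db


def candidates(db: sqlite3.Connection, report_id: int, semantic_type: str) -> List[str]:
    """Mentions in a report whose concept is, or descends from, semantic_type."""
    sql = """
        WITH RECURSIVE lineage(surface, concept) AS (
            SELECT surface, concept FROM mention WHERE report_id = :report
            UNION
            SELECT lineage.surface, is_a.parent
            FROM lineage JOIN is_a ON is_a.child = lineage.concept
        )
        SELECT DISTINCT surface FROM lineage WHERE concept = :root ORDER BY surface
    """
    rows = db.execute(sql, {"report": report_id, "root": semantic_type})
    return [surface for (surface,) in rows]


def is_compatible(db: sqlite3.Connection, finding: str, anatomy: str) -> bool:
    """True if the anatomy concept descends from a root allowed for the finding."""
    sql = """
        WITH RECURSIVE anc(concept) AS (
            SELECT :anatomy
            UNION
            SELECT is_a.parent FROM anc JOIN is_a ON is_a.child = anc.concept
        )
        SELECT EXISTS (
            SELECT 1 FROM compatible c JOIN anc ON anc.concept = c.anatomy_root
            WHERE c.finding = :finding
        )
    """
    return bool(db.execute(sql, {"anatomy": anatomy, "finding": finding}).fetchone()[0])


db = build_db()
print(candidates(db, 1, "anatomical entity"))                   # ['MCL', 'patella']
print(is_compatible(db, "tear", "medial collateral ligament"))  # True
print(is_compatible(db, "tear", "patella"))                     # False
```

Keeping the ontology in the same database lets one query join extracted mentions with the hierarchy, instead of calling out to a separate ontology service for every check.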
Slot filler candidates
Mapping between template slots and semantic types
Direct fall onto anterior tibia.
There is some oedema superficial to the MCL.
The ACL returns abnormal signal.
There is slight thickening of the medial collateral ligament.
The articular cartilage is unremarkable.
There is a small Baker’s cyst.
Physiological condition descriptor
No evidence of articular cartilage damage.
Presumably this had been excised during the ACL reconstruction.
Incidental note is made of a simple popliteal cyst.
There is focal hyaline cartilage fissuring.
This could represent a longitudinal split.
There are also several loose bodies.
There is a small Baker’s cyst.
HISTORY Squash injury.
Stage of healing descriptor
There is a healing tear of the medial collateral ligament.
Focal area of severe chondromalacia in the medial compartment.
There is acute ACL tear.
This raises the possibility of a previous patella dislocation.
Normal appearance of the articular cartilage.
The menisci, collateral ligaments and the PCL are intact.
Anatomical location descriptor
There is some oedema superficial to the MCL.
General anatomical term
There is a lot of oedema in the ACL fibres.
Complex tear of posterior horn of the lateral meniscus.
There is early fissuring and irregularity of the <slot="anatomy">hyaline cartilage</slot> of the <slot="anatomy qualifier">lateral patellar facet</slot>.
This rule is based on an observation drawn from the training data: the finding will most likely apply to the cartilage as an object.
Additional text segmentation
The cruciate and collateral ligaments are <slot="finding">intact</slot>.
There is no <slot="finding">oedema</slot> in the lateral femoral condyle and the ACL is <slot="finding">intact</slot>.
There is no <slot="finding">oedema</slot> in the lateral femoral condyle
and the ACL is <slot="finding">intact</slot>.
There is marrow oedema at <NP>the <slot="anatomy qualifier">medial</slot> aspect of the <slot="anatomy">patella</slot></NP> and <NP>the <slot="anatomy qualifier">lateral</slot> aspect of the <slot="anatomy">lateral femoral condyle</slot></NP>.
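The decision of whether to split at and in these two examples can be approximated by a simple rule: split only if the text on both sides contains a finding slot filler. The sketch below implements that assumed rule over annotated strings (with straight quotes); it is an illustration, not the rule set used by KneeTex.

```python
from typing import List

FINDING_TAG = '<slot="finding">'


def split_coordination(sentence: str) -> List[str]:
    """Split at ' and ' only when the text on both sides contains a finding
    slot filler; otherwise the conjunction is taken to coordinate anatomy
    (e.g. 'cruciate and collateral ligaments') and is kept inside one segment."""
    parts = sentence.split(" and ")
    segments = [parts[0]]
    for i in range(1, len(parts)):
        remainder = " and ".join(parts[i:])
        if FINDING_TAG in segments[-1] and FINDING_TAG in remainder:
            segments.append("and " + parts[i])  # start a new segment
        else:
            segments[-1] += " and " + parts[i]  # keep coordinated text together
    return segments


for seg in split_coordination('There is no <slot="finding">oedema</slot> in the lateral '
                              'femoral condyle and the ACL is <slot="finding">intact</slot>.'):
    print(seg)
```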
In this example, the structure of noun phrases, denoted by the NP tag, is used to link anatomy and anatomy qualifier slot fillers.
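Linking within noun phrases can be sketched by pairing every anatomy qualifier with every anatomy filler inside the same NP. The sketch assumes that the NP and slot markup shown above has already been produced (with straight quotes) and that noun phrases are not nested; the function name is hypothetical.

```python
import re
from typing import List, Tuple

NP_RE = re.compile(r"<NP>(.*?)</NP>")                  # one noun phrase (assumed non-nested)
SLOT_RE = re.compile(r'<slot="([^"]+)">(.*?)</slot>')  # captures (slot name, filler)


def link_qualifiers(text: str) -> List[Tuple[str, str]]:
    """Pair each anatomy filler with the anatomy qualifiers found in the same NP."""
    pairs: List[Tuple[str, str]] = []
    for phrase in NP_RE.findall(text):
        slots = SLOT_RE.findall(phrase)
        anatomy = [filler for name, filler in slots if name == "anatomy"]
        qualifiers = [filler for name, filler in slots if name == "anatomy qualifier"]
        pairs.extend((a, q) for a in anatomy for q in qualifiers)
    return pairs


text = ('There is marrow oedema at <NP>the <slot="anatomy qualifier">medial</slot> aspect of '
        'the <slot="anatomy">patella</slot></NP> and <NP>the <slot="anatomy qualifier">lateral'
        '</slot> aspect of the <slot="anatomy">lateral femoral condyle</slot></NP>.')
print(link_qualifiers(text))
# [('patella', 'medial'), ('lateral femoral condyle', 'lateral')]
```

Restricting the pairing to a single NP is what prevents medial from being attached to the lateral femoral condyle in the second phrase.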
There is a vertical longitudinal tear of the peripheral aspect of the posterior third of the <slot="anatomy">medial meniscus</slot>. The <slot="finding">tear</slot> does not appear significant.
history <slot="anatomy">Lateral joint line</slot> <slot="finding">tenderness</slot> <slot="anatomy">meniscal</slot> <slot="finding">tear</slot>
There is <slot="finding">bone bruising</slot> or <slot="finding">subchondral marrow oedema</slot> at the <slot="anatomy">inferior patella</slot>.
A test dataset was created as a subset of 100 MRI reports selected randomly from the dataset described previously and removed from consideration prior to system development. Its sole purpose was to test the performance of the system on unseen data. In order to create a gold standard, the test dataset was annotated manually by two independent annotators (see Data).
Inter-annotator agreement was measured using Fleiss’ Kappa coefficient:

$$\kappa = \frac{A_o - A_e}{1 - A_e}$$

where $A_o$ is observed agreement (i.e. the proportion of items on which both annotators agree) and $A_e$ is expected chance agreement, calculated under the assumptions that (1) both annotators act independently and (2) random assignment of annotation categories to items, by either coder, is governed by the distribution of items across these categories. Fleiss’ Kappa coefficient is measured on a −1 to 1 scale, where 1 corresponds to perfect agreement, 0 corresponds to chance agreement and negative values indicate potential systematic disagreement between the annotators.
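As a worked check of the formula, the sketch below computes the coefficient for two annotators, estimating A_e from the pooled distribution of both annotators' labels, in line with assumption (2). The function name and the label lists are hypothetical; this is not the tool used to compute the agreement reported here.

```python
from collections import Counter
from typing import Sequence


def fleiss_kappa_two(a: Sequence[str], b: Sequence[str]) -> float:
    """Fleiss' Kappa for two annotators who labelled the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(a)
    # A_o: proportion of items on which both annotators agree
    a_o = sum(x == y for x, y in zip(a, b)) / n
    # A_e: chance agreement from the pooled category distribution of both annotators
    pooled = Counter(a) + Counter(b)
    a_e = sum((count / (2 * n)) ** 2 for count in pooled.values())
    if a_e == 1:
        raise ValueError("kappa is undefined when only one category is used")
    return (a_o - a_e) / (1 - a_e)


# A_o = 3/4; pooled labels: 3 x pos, 5 x neg, so A_e = (3/8)^2 + (5/8)^2 = 34/64
print(fleiss_kappa_two(["pos", "pos", "neg", "neg"], ["pos", "neg", "neg", "neg"]))  # 7/15
```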
Evaluation results. Performance of the system on the test set
In order to assess the generalisability of the system, we conducted a series of stage–wise experiments in which we removed the new concepts identified from the training dataset using strategies 1–3. We focused specifically on concepts outside of the finding descriptor class for two reasons. Firstly, this class corresponds to the descriptor branch of the RadLex hierarchy, and its dependency on the training data is minimal. Secondly, concepts from this class are used to fill three “leaf” slots (finding qualifier, anatomy qualifier and certainty; see Table 7) that have no further dependencies (see Fig. 1); unlike the finding and anatomy slots, they therefore have no ripple effect on template filling. For example, if a finding is not identified, this affects text segmentation as well as linking to other slot fillers. Therefore, concepts outside the finding descriptor branch would have the highest impact on the evaluation results.
Some of the errors stem from incorrectly recognised named entities. For example, in the segment “his patella tends to lie tilted laterally”, string similarity caused PathNER to incorrectly recognise patella tends as patellar tendon, so that patella itself was not extracted.
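The failure mode can be reproduced with a toy longest-match lookup. PathNER's own similarity measure and threshold are not reproduced here: the sketch substitutes difflib's SequenceMatcher ratio and an illustrative threshold of 0.85, and restricts the dictionary to two terms. Under these assumptions, the two-token window patella tends scores about 0.86 against patellar tendon, and because longer windows are tried first, the exact match for patella is never reached.

```python
from difflib import SequenceMatcher
from typing import List, Sequence, Tuple


def similarity(a: str, b: str) -> float:
    """Stand-in string similarity in [0, 1] (difflib's ratio, not PathNER's measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def longest_match_lookup(tokens: Sequence[str], terms: Sequence[str],
                         threshold: float = 0.85, max_len: int = 3) -> List[Tuple[str, str]]:
    """Greedy left-to-right lookup that prefers the longest token window
    whose best dictionary match scores at or above the threshold."""
    matches: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            window = " ".join(tokens[i:i + n])
            best = max(terms, key=lambda term: similarity(window, term))
            if similarity(window, best) >= threshold:
                matches.append((window, best))
                i += n
                break
        else:
            i += 1  # no window starting at this token matched
    return matches


tokens = "his patella tends to lie tilted laterally".split()
print(longest_match_lookup(tokens, ["patella", "patellar tendon"]))
# [('patella tends', 'patellar tendon')]
```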
There is a cleavage tear of the lateral meniscus at the junction of the body and posterior horn which extends through the body but there is currently no evidence of a significant meniscal cyst.
since body of meniscus (TRAK:0001346) most specifically represents its middle third.
However, the system failed to use the cue absent, found at the end of the sentence, to recognise that these findings are actually negative. These errors will be used to inform future improvements of the system.
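One way to capture post-positioned cues such as absent, sketched here under our own assumptions rather than as part of KneeTex, is to search for negation cues on both sides of the finding within its segment. The cue lists, the function names and the example sentences are hypothetical.

```python
import re
from typing import List

# Hypothetical cue lists; KneeTex's own negation rules are not reproduced here.
PRE_CUES: List[str] = ["no", "no evidence of", "without"]
POST_CUES: List[str] = ["absent", "not seen", "not identified"]


def _has_cue(text: str, cues: List[str]) -> bool:
    """True if any cue occurs in text as a whole word or phrase."""
    return any(re.search(r"\b" + re.escape(cue) + r"\b", text) for cue in cues)


def is_negated(segment: str, finding: str) -> bool:
    """Check for negation cues before the finding and, in addition,
    for post-positioned cues after it."""
    text = segment.lower()
    position = text.find(finding.lower())
    if position < 0:
        raise ValueError(f"{finding!r} not found in segment")
    before = text[:position]
    after = text[position + len(finding):]
    return _has_cue(before, PRE_CUES) or _has_cue(after, POST_CUES)


print(is_negated("Joint effusion and bone bruising are absent.", "bone bruising"))    # True
print(is_negated("There is bone bruising at the inferior patella.", "bone bruising"))  # False
```

Applied per segment, this keeps the cue scope local; without prior segmentation, a post-positioned cue would risk negating unrelated findings elsewhere in the sentence.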
In this paper, we described KneeTex, an ontology–driven system for information extraction from narrative reports that describe an MRI scan of the knee. The system exhibited human–level performance on the gold standard. Such performance can be attributed partly to the use of a domain–specific ontology, which serves as a very fine–grained lexico–semantic knowledge base and plays a pivotal role in guiding and constraining text analysis. In this context, the ontology proved to be highly attuned to the given sublanguage. The extent of knowledge engineering involved in the development of domain–specific ontologies with sufficient detail and coverage for text mining applications is known to present a major bottleneck in deep semantic NLP. Therefore, many NLP systems compensate for the lack of suitable semantic resources by resorting to extensive syntactic analysis and heuristic approaches that operate at the level of the textual surface.
We have adopted an alternative approach based on a set of strategies that can be used to systematically expand the coverage of existing ontologies or to develop them from scratch. Three of these strategies are data–driven and as such are more likely to ensure that the ontology effectively supports the intended NLP application. Each data–driven strategy utilises a different approach to extracting the relevant terminology from the data either manually or automatically. The fourth strategy is based on integration of concepts from other relevant knowledge sources. The two main aims of this strategy are: (1) to avoid overfitting the ontology to the limited data available, and (2) to provide an initial taxonomic structure into which new concepts can be incorporated.
In this study, we illustrated how these strategies were implemented in practice to expand the coverage of the TRAK ontology to make it suitable for a specific NLP application. The evaluation results confirm that KneeTex succeeded in making effective use of the ontology to support IE from knee MRI reports. Previously, we integrated TRAK into web and smartphone applications that provide remote support for knee rehabilitation and for collection of data that can support randomised controlled trials. Here we have demonstrated how the ontology was repurposed to support an NLP application within a clinical setting. In both cases, formally structured and coded datasets can be easily integrated to support large–scale multi–faceted epidemiologic studies of knee conditions.
Availability and requirements
Project name: KneeTex
Project home page: http://www.cs.cf.ac.uk/kneetex
Operating system(s): Platform independent
Programming language: Java
Other requirements: None
Any restrictions to use by non-academics: None
The values given in brackets refer to standard deviation.
Mixup is a pattern language for text spans, i.e. token sequences. The keyword defSpanType defines a span type whose structure is specified to the right of the equal sign, where square brackets [ and ] indicate the start and end of a span respectively, eqi('foo') matches the token foo and ... matches any sequence of tokens. The postfix operator ? specifies that the preceding token can be matched either once or not at all. Finally, the operator || is used to specify alternative patterns.
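The semantics of these operators can be emulated in a few lines of Python. This is not Mixup itself: patterns are encoded as hypothetical (kind, value) tuples, a list of alternative patterns stands in for ||, a matched token window stands in for the span delimited by [ and ], and eqi is assumed to compare tokens case-insensitively, as its name suggests.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical encoding of the Mixup constructs described above:
#   ("eqi", "foo")  ~ eqi('foo')   one token equal to foo (case-insensitive, assumed)
#   ("opt", "foo")  ~ eqi('foo')?  that token once or not at all
#   ("any", None)   ~ ...          any sequence of tokens, possibly empty
Element = Tuple[str, Optional[str]]
SpanPattern = Sequence[Element]


def matches(pattern: SpanPattern, tokens: Sequence[str]) -> bool:
    """True if the pattern covers the whole token sequence (with backtracking)."""
    if not pattern:
        return not tokens
    (kind, value), rest = pattern[0], pattern[1:]
    if kind == "eqi":
        return bool(tokens) and tokens[0].lower() == value and matches(rest, tokens[1:])
    if kind == "opt":
        if tokens and tokens[0].lower() == value and matches(rest, tokens[1:]):
            return True
        return matches(rest, tokens)
    if kind == "any":
        return any(matches(rest, tokens[i:]) for i in range(len(tokens) + 1))
    raise ValueError(f"unknown element: {kind}")


def find_spans(alternatives: List[SpanPattern], tokens: Sequence[str]) -> List[Tuple[int, int]]:
    """All (start, end) token spans matched by any alternative (the || operator)."""
    return [
        (i, j)
        for i in range(len(tokens))
        for j in range(i + 1, len(tokens) + 1)
        if any(matches(p, tokens[i:j]) for p in alternatives)
    ]


# Roughly: [eqi('the')? eqi('medial') eqi('and') eqi('lateral') eqi('meniscus')] || [... eqi('menisci')]
base = [("opt", "the"), ("eqi", "medial"), ("eqi", "and"), ("eqi", "lateral")]
alternatives = [base + [("eqi", "meniscus")], base + [("eqi", "menisci")]]
print(find_spans(alternatives, "The medial and lateral menisci are intact".split()))
# [(0, 5), (1, 5)]
```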
We would like to thank Thomas Edwards and David Rogers for their assistance in testing the software.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Wenham C, Grainger A, Conaghan P. The role of imaging modalities in the diagnosis, differential diagnosis and clinical assessment of peripheral joint osteoarthritis. Osteoarthr Cartil. 2014;22:1692–702.
- Pompan DC. Reassessing the role of MRI in the evaluation of knee pain. Am Fam Physician. 2012;85:221–4.
- Grover M. Evaluating acutely injured patients for internal derangement of the knee. Am Fam Physician. 2012;85:247–52.
- Yan R, Wang H, Yang Z, Ji ZH, Guo YM. Predicted probability of meniscus tears: comparing history and physical examination with MRI. Swiss Med Wkly. 2011;141:w13314.
- Konan S, Rayan F, Haddad FS. Do physical diagnostic tests accurately detect meniscal tears? Knee Surg Sports Traumatol Arthrosc. 2009;17:806–11.
- Clayton RAE, Court-Brown CM. The epidemiology of musculoskeletal tendinous and ligamentous injuries. Injury. 2008;39:1338–44.
- Luyten FP, Denti M, Filardo G, Kon E, Engebretsen L. Definition and classification of early osteoarthritis of the knee. Knee Surg Sports Traumatol Arthrosc. 2012;20:401–6.
- The Royal College of Radiologists. Standards for the reporting and interpretation of imaging investigations. 2006. http://www.rcr.ac.uk/.
- Roemer FW, Guermazi A, Felson DT, Niu J, Nevitt MC, Crema MD, et al. Presence of MRI-detected joint effusion and synovitis increases the risk of cartilage loss in knees without osteoarthritis at 30-month follow-up: the MOST study. Clin Epidemiol Res. 2011;70:1804–9.
- Guermazi A, Niu J, Hayashi D, Roemer F, Englund M, Neogi T, et al. Prevalence of abnormalities in knees detected by MRI in adults without knee osteoarthritis: population based observational study (Framingham Osteoarthritis Study). BMJ. 2012;345:e5339.
- Pessis E, Drapé J-L, Ravaud P, Chevrot A, Dougados M, Ayral X. Assessment of progression in knee osteoarthritis: Results of a 1 year study comparing arthroscopy and MRI. Osteoarthr Cartil. 2003;11:361–9.
- Javaid MK, Lynch JA, Tolstykh I, Guermazi A, Roemer F, Aliabadi P, et al. Pre-radiographic MRI findings are associated with onset of knee symptoms: The MOST study. Osteoarthr Cartil. 2010;18:323–8.
- Ioannidis JPA. Why most published research findings are false. PLoS Med. 2005;2:e124.
- Button KS, Ioannidis JPA, Mokrysz C, Nosek BA, Flint J, Robinson ESJ, et al. Power failure: why small sample size undermines the reliability of neuroscience. Nat Rev Neurosci. 2013;14:365–76.
- Spasić I, Livsey J, Keane J, Nenadić G. Text mining of cancer-related information: review of current status and future directions. Int J Med Inform. 2014;83:605–23.
- Mohanty SK, Piccoli AL, Devine LJ, Patel AA, William GC, Winters SB, et al. Synoptic tool for reporting of hematological and lymphoid neoplasms based on World Health Organization classification and College of American Pathologists checklist. BMC Cancer. 2007;7:144.
- Friedman C, Alderson P, Austin J, Cimino J, Johnson S. A general natural-language text processor for clinical radiology. J Am Med Inform Assoc. 1994;1:161–74.
- Hripcsak G, Austin JH, Alderson PO, Friedman C. Use of natural language processing to translate clinical information from a database of 889,921 chest radiographic reports. Radiology. 2002;224:157–63.
- Mamlin BW, Heinze DT, McDonald CJ. Automated extraction and normalization of findings from cancer-related free-text radiology reports. In: Proceedings of the AMIA Annual Symposium. 2003. p. 420–4.
- Dang PA, Kalra MK, Blake MA, Schultz TJ, Halpern EF, Dreyer KJ. Extraction of recommendation features in radiology with natural language processing: exploratory study. Am J Roentgenol. 2008;191:313–20.
- Burnside ES, Davis J, Chhatwal J, Alagoz O, Lindstrom MJ, Geller BM, et al. Probabilistic computer model developed from clinical data in national mammography database format to classify mammographic findings. Radiology. 2009;251:663–72.
- Christensen LM, Haug PJ, Fiszman M. MPLUS: a probabilistic medical language understanding system. In: ACL-02 Workshop on Natural Language Processing in the Biomedical Domain; Philadelphia, PA. 2002. p. 29–36.
- Savova GK, Masanz JJ, Ogren PV, Zheng J, Sohn S, Kipper-Schuler KC, et al. Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications. J Am Med Inform Assoc. 2010;17:507–13.
- Crowley RS, Castine M, Mitchell K, Chavan G, McSherry T, Feldman M. caTIES: a grid based system for coding and retrieval of surgical pathology reports and tissue specimens in support of translational research. J Am Med Inform Assoc. 2010;17:253–64.
- Coden A, Savova G, Sominsky I, Tanenblatt M, Masanz J, Cooper KSJ, et al. Automatically extracting cancer disease characteristics from pathology reports into a Disease Knowledge Representation Model. J Biomed Inform. 2009;42:937–49.
- Spasić I, Ananiadou S, McNaught J, Kumar A. Text mining and ontologies in biomedicine: making sense of raw text. Brief Bioinform. 2005;6:239–51.
- Button K, van Deursen RW, Soldatova L, Spasić I. TRAK ontology: Defining standard care for the rehabilitation of knee conditions. J Biomed Inform. 2013;46:615–25.
- Crockford D. Introducing JSON. 2009. http://json.org/.
- Cowie J, Lehnert W. Information extraction. Commun ACM. 1996;39:80–91.
- Jacobson I, Booch G, Rumbaugh J. The Unified Software Development Process. Boston, USA: Addison-Wesley Professional; 1999.
- Radiological Society of North America. MR Knee. 2012. http://www.radreport.org/template/0000057.
- Stenetorp P, Pyysalo S, Topić G, Ohta T, Ananiadou S, Tsujii J. BRAT: a web-based tool for NLP-assisted text annotation. In: The 13th Conference of the European Chapter of the Association for Computational Linguistics; Avignon, France. 2012. p. 102–7.
- Health & Social Care Information Centre. Read Codes. 2015. http://systems.hscic.gov.uk/data/uktc/readcodes.
- Herrett E, Thomas SL, Schoonen WM, Smeeth L, Hall AJ. Validation and validity of diagnoses in the General Practice Research Database: A systematic review. Br J Clin Pharmacol. 2010;69:4–14.
- Day-Richter J, Harris MA, Haendel M, The Gene Ontology OBO-Edit Working Group, Lewis S. OBO-Edit - an ontology editor for biologists. Bioinformatics. 2007;23:2198–200.
- Whetzel P, Noy N, Shah N, Alexander P, Nyulas C, Tudorache T, et al. BioPortal: enhanced functionality via new Web services from the National Center for Biomedical Ontology to access and use ontologies in software applications. Nucleic Acids Res. 2011;39:W541–5.
- Rosse C, Mejino JJ. A reference ontology for biomedical informatics: the Foundational Model of Anatomy. J Biomed Inform. 2003;36:478–500.
- Bodenreider O. The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Res. 2004;32:D267–70.
- Aronson AR. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. In: Proceedings of the American Medical Informatics Association. 2001. p. 17–21.
- Clauset A, Shalizi CR, Newman MEJ. Power-law distributions in empirical data. SIAM Rev. 2009;51:661–703.
- Chen Y-S, Chong PP, Tong MY. Mathematical and computer modelling of the Pareto principle. Math Comput Model. 1994;19:61–80.
- Hersh WR, Campbell EM, Malveau SE. Assessing the feasibility of large-scale natural language processing in a corpus of ordinary medical records: a lexical analysis. In: Proceedings of the AMIA Annual Fall Symposium. 1997. p. 580–4.
- Spasić I, Greenwood M, Preece A, Francis N, Elwyn G. FlexiTerm: A flexible term recognition method. J Biomed Semantics. 2013;4:27.
- UMLS. Terminology Services. 2015. https://uts.nlm.nih.gov/.
- UMLS. MEDCIN Source Information. 2014. http://www.nlm.nih.gov/research/umls/sourcereleasedocs/current/MEDCIN/.
- Brown SH, Rosenbloom ST, Bauer BA, Wahner-Roedler D, Froehling DA, Bailey KR, et al. Direct comparison of MEDCIN and SNOMED CT for representation of a general medical evaluation template. In: Proceedings of the AMIA Annual Symposium. 2007. p. 75–9.
- National Center for Biomedical Ontology. BioPortal. 2013. http://bioportal.bioontology.org/.
- Langlotz CP. RadLex: a new method for indexing online educational materials. Radiographics. 2006;26:1595–7.
- Yetisgen-Yildiz M, Gunn ML, Xia F, Payne TH. Automatic identification of critical follow-up recommendation sentences in radiology reports. In: Proceedings of the AMIA Annual Symposium; Washington, DC. 2011. p. 1593–602.
- MongoDB. 2015. https://www.mongodb.org/.
- Funk C, Baumgartner W, Garcia B, Roeder C, Bada M, Cohen KB, et al. Large-scale biomedical concept recognition: an evaluation of current automatic annotators and their parameters. BMC Bioinformatics. 2014;15:59.
- Ferrucci D, Lally A. Building an example application with the unstructured information management architecture. IBM Syst J. 2004;43:455–75.
- Wu C, Schwartz J-M, Nenadić G. PathNER: a tool for systematic identification of biological pathway mentions in the literature. BMC Syst Biol. 2013;7:S2.
- Cohen WW, Ravikumar P, Fienberg SE. A comparison of string distance metrics for name-matching tasks. In: Kambhampati S, Knoblock CA, editors. Proceedings of the IJCAI-2003 Workshop on Information Integration on the Web. 2003. p. 73–8.
- Salton G, Buckley C. Term-weighting approaches in automatic text retrieval. Info Process Manag. 1988;24:513–23.
- Winkler WE. String comparator metrics and enhanced decision rules in the Fellegi-Sunter model of record linkage. In: Proceedings of the Section on Survey Research Methods (American Statistical Association). 1990. p. 354–9.
- Tsuruoka Y, McNaught J, Tsujii J, Ananiadou S. Learning string similarity measures for gene/protein name dictionary look-up using logistic regression. Bioinformatics. 2007;23:2768–74.
- Rae K, Orchard J. The Orchard Sports Injury Classification System (OSICS) version 10. Clin J Sport Med. 2007;17:201–4.
- Finch C, Orchard J, Twomey D, Saad Saleem M, Ekegren C, Lloyd D, et al. Coding OSICS sports injury diagnoses in epidemiological studies: does the background of the coder matter? Br J Sports Med. 2012;48:552–6.
- Justeson JS, Katz SM. Technical terminology: some linguistic properties and an algorithm for identification in text. Nat Lang Eng. 1995;1:9–27.
- MinorThird. 2015. http://minorthird.sourceforge.net/.
- Fleiss JL. Measuring nominal scale agreement among many raters. Psychol Bull. 1971;76:378–82.
- Artstein R, Poesio M. Inter-coder agreement for computational linguistics. Computational Linguistics. 2008;34:555–96.
- Geertzen J. Inter-rater agreement with multiple raters and variable. 2015. https://mlnl.net/jg/software/ira/.
- Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33:159–74.