Developing a web-based SKOS editor

Background: The Simple Knowledge Organization System (SKOS) was introduced to the wider research community by a 2005 World Wide Web Consortium (W3C) working draft, and further developed and refined in a 2009 W3C recommendation. Since then, SKOS has become the de facto standard for representing and sharing thesauri, lexicons, vocabularies, taxonomies, and classification schemes. In this paper, we describe the development of a web-based, free, open-source SKOS editor built for the development, curation, and management of small to medium-sized lexicons for health-related Natural Language Processing (NLP).

Results: The web-based SKOS editor allows users to create, curate, version, manage, and visualise SKOS resources. We tested the system against five widely used, publicly available SKOS vocabularies of various sizes and found that the editor is suitable for the development and management of small to medium-sized lexicons. Qualitative testing has focussed on using the editor to develop lexical resources to drive NLP applications in two domains: first, developing a lexicon to support an Electronic Health Record-based NLP system for the automatic identification of pneumonia symptoms; second, creating a taxonomy of lexical cues associated with Diagnostic and Statistical Manual of Mental Disorders (DSM-5) diagnoses with the goal of facilitating the automatic identification of symptoms associated with depression from short, informal texts.

Conclusions: The SKOS editor we have developed is, to the best of our knowledge, the first free, open-source, web-based SKOS editor capable of creating, curating, versioning, managing, and visualising SKOS lexicons.

Electronic supplementary material: The online version of this article (doi:10.1186/s13326-015-0043-z) contains supplementary material, which is available to authorized users.


Background
The Simple Knowledge Organization System (SKOS) standard was introduced to the wider community by a 2005 World Wide Web Consortium (W3C) working draft [1], and further developed and refined in a 2009 W3C recommendation [2,3]. Since then, SKOS has become the de facto standard for representing thesauri, lexicons, vocabularies, taxonomies, and classification schemes, both as a useful data format in its own right, and as a means for sharing resources on the semantic web. In this paper, we describe the development of a web-based, free, open-source SKOS editor suitable for the creation and curation of knowledge organization systems in general, and health-related lexicons designed to support clinical Natural Language Processing (NLP) in particular.
SKOS is a flexible standard designed to represent and encode a wide range of knowledge organization systems. SKOS concepts can be linked together in various ways, via semantic relations, to create a hierarchical structure. The most important of these semantic relations are:
• skos:broader can be read as "has broader concept". For instance, the relation PHOTOPHOBIA skos:broader VISIONPROBLEM asserts that PHOTOPHOBIA has broader concept VISIONPROBLEM.
• skos:narrower can be read as "has narrower concept". For instance, the relation VISIONPROBLEM skos:narrower PHOTOPHOBIA asserts that VISIONPROBLEM has narrower concept PHOTOPHOBIA.
• skos:related can be read as "is related to". For instance, the relation PHOTOPHOBIA skos:related DIPLOPIA asserts that PHOTOPHOBIA is related to DIPLOPIA.
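The interplay between these three relations can be illustrated with a minimal Python sketch. It relies on two properties of the SKOS data model: skos:broader and skos:narrower are inverses of each other, and skos:related is symmetric. The concept names are those used in the examples above; the function names (`with_inverses`, `narrower_concepts`) are hypothetical and are not part of the editor or of the SKOS API.

```python
# Illustrative sketch only: relations are held as (subject, predicate, object)
# tuples. The helper names are hypothetical, not part of any SKOS library.
BROADER, NARROWER, RELATED = "skos:broader", "skos:narrower", "skos:related"


def with_inverses(triples):
    """Return the asserted triples plus those implied by SKOS semantics."""
    closed = set(triples)
    for subj, pred, obj in triples:
        if pred == BROADER:
            closed.add((obj, NARROWER, subj))   # broader/narrower are inverses
        elif pred == NARROWER:
            closed.add((obj, BROADER, subj))
        elif pred == RELATED:
            closed.add((obj, RELATED, subj))    # related is symmetric
    return closed


def narrower_concepts(graph, concept):
    """List the concepts that `concept` has as narrower concepts."""
    return sorted(o for s, p, o in graph if s == concept and p == NARROWER)


asserted = {
    ("PHOTOPHOBIA", BROADER, "VISIONPROBLEM"),
    ("PHOTOPHOBIA", RELATED, "DIPLOPIA"),
}
graph = with_inverses(asserted)
print(narrower_concepts(graph, "VISIONPROBLEM"))  # ['PHOTOPHOBIA']
```

Asserting only the skos:broader statement is therefore sufficient for a tool to display PHOTOPHOBIA beneath VISIONPROBLEM in a hierarchy view.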
Each SKOS concept can be associated with several types of lexical labels:
• skos:prefLabel (preferred label) provides a mechanism to link a preferred label to a concept. The prefLabel is the primary means of referring to a concept. Only one prefLabel per language should be assigned to each concept. For example, the SKOS concept FEVER could have the skos:prefLabel "fever"@en (note that "@en" refers to the English language).
• skos:altLabel (alternative label) provides a mechanism to specify synonyms or near-synonyms for a given concept. For example, the concept FEVER could have the skos:altLabel "febrile"@en. This relation is especially useful for specifying synonymous terms necessary for NLP.
• skos:hiddenLabel (hidden label) provides a mechanism to specify non-standard synonymous terms (e.g. misspellings, typographical errors). For example, the concept FEVER could have the skos:hiddenLabel "feber"@en. Hidden labels are particularly useful for encoding common misspellings necessary for NLP systems.
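To make the role of these labels in NLP concrete, the following Python sketch builds a lookup table from every label to its concept and uses it to tag tokens in free text. This is an assumption-laden illustration rather than a description of any particular NLP system: the dictionary layout, the function names (`build_label_index`, `find_concepts`), and the sample sentence are hypothetical, and only single-word labels are matched.

```python
import re

# Hypothetical in-memory lexicon; the FEVER labels are the examples from the text.
LEXICON = {
    "FEVER": {
        "prefLabel": {"en": "fever"},
        "altLabel": {"en": ["febrile"]},
        "hiddenLabel": {"en": ["feber"]},
    },
}


def build_label_index(lexicon, lang="en"):
    """Map every lower-cased label in `lang` to the concept it denotes."""
    index = {}
    for concept, labels in lexicon.items():
        pref = labels.get("prefLabel", {}).get(lang)
        if pref:
            index[pref.lower()] = concept
        for kind in ("altLabel", "hiddenLabel"):
            for label in labels.get(kind, {}).get(lang, []):
                index[label.lower()] = concept
    return index


def find_concepts(text, index):
    """Return the concept for each single-word token that matches a label."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [index[token] for token in tokens if token in index]


index = build_label_index(LEXICON)
print(find_concepts("Pt febrile overnight, feber noted", index))  # ['FEVER', 'FEVER']
```

Here the alternative label catches the synonym "febrile" and the hidden label catches the misspelling "feber", both of which a matcher restricted to preferred labels would miss.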
In addition to the semantic relations and lexical labels described above, SKOS also provides facilities to add additional metadata to concepts and map SKOS concepts to external vocabularies.
Given its lightweight semantics, SKOS is particularly suitable as a basis for the development and sharing of vocabularies to support NLP tasks. A key part of the workflow in developing some NLP systems, in particular NLP systems designed to process health-related text, is the development of custom lexicons, including common abbreviations, synonyms (including slang terms), and truncations [5][6][7][8].
Since the inception of SKOS in 2005, significant effort has been expended on the development of software tools for the standard, in particular for editing and viewing SKOS vocabularies. Whilst OWL editors, such as Protégé, can be used to create and edit SKOS, they require a user to understand SKOS in terms of OWL, an unnecessary overhead for a user simply interested in creating SKOS. Furthermore, a major requirement for a SKOS editing tool is the ability to visualise and navigate SKOS concept scheme broader/narrower hierarchies, a functionality that is unlikely to be supported by generic OWL and RDF (Resource Description Framework) tools. Notable examples of "SKOS aware" tools include a SKOS Application Programming Interface (API) and editing module [9] for Protégé 4 (the Protégé SKOS Editor), PoolParty, an online SKOS editing and manipulation tool [10], and SKOS functionality built into the TopBraid Composer RDF editing platform [11], all of which facilitate the creation, development, and utilisation of SKOS vocabularies. However, to the best of our knowledge, until now no free, open-source, web-based SKOS editor has been available to the research community (note that PoolParty, although web-based, is a commercial product). In this paper, we present a web-based SKOS editing tool that is suitable for developing and modifying the health-related lexicons necessary for large-scale information extraction from clinical notes and other health-related text, yet is also general purpose enough for any small to medium-sized SKOS vocabulary development or curation project.

Implementation
A key advantage of a web-based editor is that it can be used anywhere, on any machine, without complex user installation. Given that our target users are clinicians, public health workers, and domain experts (i.e. those with little or no experience of semantic web languages) rather than informatics professionals, ease of use is an important requirement. We took the decision to simplify the editor's user interface as much as possible, hiding some of the general OWL/RDF functionality available in tools like Protégé and TopBraid Composer.
Considerable effort was expended on designing the user interface. Fig. 1 shows a screenshot of the system displaying a SKOS thesaurus designed to drive an NLP system for the automatic identification of biosurveillance-relevant symptoms from Electronic Health Records (EHRs) [12]. After some experimentation, we adopted an interface that consists of three panes, from left to right: the Concept Pane, the Relation Pane, and the Linguistics Pane.
We identified six core functionalities necessary for the editor, partially based on the requirements identified by [9]:
• Create, edit, and delete SKOS entities
• Assert SKOS relationships between SKOS concepts (e.g. broader/narrower)
• Assert and edit skos:prefLabel, skos:altLabel, and skos:hiddenLabel data properties
• Visualise broader and narrower relationships in a browsable hierarchical tree
• Support SKOS documentation properties
• Provide alternative renderings (e.g. multilingual prefLabels) within the editor
Additionally, our editor provides versioning, and a Wizard tool to expedite the SKOS concept hierarchy creation process.
In building our web-based SKOS Editor, we relied heavily on existing OWL, SKOS and RDF tooling, in particular the SKOS API [9] (developed by author Jupp) and the OWL API [13]. The system is a Liferay portlet application that uses a standard Model-View-Controller architecture implemented using the following technologies:
• Business (Model) Layer: Java SKOS API and OWL API
• Presentation (View) Layer: JavaScript/JSP/jQuery libraries provide a rich web 2.0 user interface connected to the middle layer via AJAX calls
• Controller/Middle Layer: The Liferay Portlet application, using the JSR 286 Portlet framework, connects the presentation layer to the SKOS API, as well as providing user management, authorisation, and authentication.
A MySQL database is used to save files and file versions, as well as user-specific settings. The application is a Single Page Application, with all server/client communication based on AJAX calls between a jQuery library (client-side) and the Liferay portlet (server-side).
A screenshot of the system interface is shown in Fig. 1 and a diagram representing the system architecture is shown in Fig. 2.

Results and discussion
The web-based SKOS editor allows a user to upload a SKOS file from their local machine for editing, load a SKOS file from a URL, create a SKOS file ab initio, and download an edited SKOS file to a local machine. Furthermore, the editor supports versioning of SKOS files, and provides a GUI-based "Wizard" to expedite the creation of concept hierarchies. The Wizard allows a user to input a plain-text, tab-indented concept hierarchy, a functionality that has been shown in our qualitative user testing to expedite the hierarchy creation process (see Fig. 3 and Additional file 1). The tool takes its inspiration from the Protégé SKOS editor developed by author Jupp, and supports core SKOS functionalities. In the "Concept Pane", SKOS concept schemes and concepts can be created and manipulated with a hierarchical tree structure. The "Relation Pane" shows hierarchical relations defined in the Concept Pane, and allows these relations to be modified, including the addition of non-hierarchical relations between concepts. The "Linguistics Pane" allows lexical information (prefLabels, hiddenLabels, and altLabels) to be associated with each concept.
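The idea behind the Wizard can be sketched in a few lines of Python. The sketch below is not the editor's implementation (which is written in Java on top of the SKOS API); it assumes one concept per line and exactly one tab character per hierarchy level, and the function name `parse_hierarchy` and the sample input are hypothetical.

```python
def parse_hierarchy(text):
    """Convert tab-indented lines into (child, parent) pairs.

    Each pair can be read as "child skos:broader parent". Assumes one tab
    per level; top-level concepts produce no pair.
    """
    pairs = []
    stack = []  # stack[d] holds the most recent concept seen at depth d
    for raw in text.splitlines():
        if not raw.strip():
            continue  # ignore blank lines
        depth = len(raw) - len(raw.lstrip("\t"))
        name = raw.strip()
        if depth > len(stack):
            raise ValueError(f"line {name!r} skips an indentation level")
        del stack[depth:]  # drop concepts at this depth or deeper
        if stack:
            pairs.append((name, stack[-1]))
        stack.append(name)
    return pairs


sample = "VISIONPROBLEM\n\tPHOTOPHOBIA\n\tDIPLOPIA\nFEVER\n"
print(parse_hierarchy(sample))
# [('PHOTOPHOBIA', 'VISIONPROBLEM'), ('DIPLOPIA', 'VISIONPROBLEM')]
```

Typing a hierarchy in this form replaces one concept-creation step and one relation-assertion step per concept with a single line of text, which is consistent with the time savings reported by users of the Wizard.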
While there have been attempts at developing best practices for SKOS thesauri development (e.g. [3,14]), considerable heterogeneity exists between different SKOS resources [15]. We built an editor that is designed to handle even those SKOS resources that do not adhere to suggested best practice (e.g. a thesaurus that has more than one prefLabel for a specified language, or a SKOS concept that exists outside a Concept Scheme).
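The two deviations from best practice mentioned above can be detected with a simple check, sketched here in Python. This is an illustration under stated assumptions, not a feature of the editor: the data layout, the function name `best_practice_warnings`, the extra label "pyrexia", the COUGH concept, and the scheme name "symptoms" are all hypothetical. In keeping with the tolerant design described above, the check reports warnings rather than rejecting the resource.

```python
from collections import Counter


def best_practice_warnings(concepts):
    """Report concepts with duplicate prefLabels per language or no scheme."""
    warnings = []
    for name, data in concepts.items():
        # Count preferred labels per language tag
        counts = Counter(lang for _, lang in data.get("prefLabels", []))
        for lang, n in sorted(counts.items()):
            if n > 1:
                warnings.append(f"{name}: {n} prefLabels for language '{lang}'")
        if not data.get("inScheme"):
            warnings.append(f"{name}: not in any concept scheme")
    return warnings


# Hypothetical sample data for illustration only
concepts = {
    "FEVER": {
        "prefLabels": [("fever", "en"), ("pyrexia", "en")],
        "inScheme": ["symptoms"],
    },
    "COUGH": {"prefLabels": [("cough", "en")], "inScheme": []},
}
for warning in best_practice_warnings(concepts):
    print(warning)
```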

Loading and editing sample SKOS vocabularies
To demonstrate the capacities of the SKOS editor, we tested its performance in executing some key editing functions. For these tests, we used an Apple MacBook with 16 GB of memory and the Firefox web browser (version 32). We chose five widely used SKOS resources of various sizes. Table 1 shows the capabilities of the editor in editing large thesauri, where it can be seen that the 5.1 MB UNESCO Thesaurus took six seconds to load into the tool. However, larger thesauri (e.g. the UK Archive Thesaurus at 9.4 MB) do not load quickly due to limitations within the Liferay web framework. The tool is primarily designed for developing relatively small, linguistically oriented vocabularies. In addition to testing whether various existing SKOS vocabularies could be loaded into the tool and rendered correctly, for each of the SKOS thesauri evaluated, we tested basic editing functionality (e.g. whether a new concept could be created and inserted into the existing thesaurus, and whether concepts could be deleted).
The results of this evaluation are shown in Table 2. Note that even very large vocabularies (e.g. STW Thesaurus) could be edited successfully using the tool.

Qualitative evaluation
Our qualitative evaluation of the SKOS editor centred on two use cases. For the first use case, an experienced knowledge engineer (author Castine) used the SKOS editor to build a lexical resource to drive an EHR-oriented NLP algorithm based on the Centers for Disease Control pneumonia definition (see Fig. 4 for a screenshot of the resulting SKOS resource). The pneumonia resource took a total of 40 min to build using the Web SKOS Editor, compared with 45 min using the Protégé SKOS Editor Plug-in. Note that the knowledge engineer did not use the Wizard concept creation functionality, a tool which we believe is likely to expedite the concept hierarchy creation process substantially. For the second use case, an experienced NLP researcher (author Mowery) used the tool to develop a resource designed to map lexical cues to Diagnostic and Statistical Manual of Mental Disorders (DSM-5) diagnoses with the goal of facilitating the automatic identification of symptoms associated with depression from short, informal texts [21] (see Fig. 5 for a screenshot of the SKOS resource creation process). The depression resource took less than one hour to create, and it was reported that the Wizard greatly expedited the concept creation process. However, several enhancements were suggested, including the development of an autosave feature, and the ability to configure default values for language labels (for example, defaulting to English, i.e. @en, labels).

Limitations
While the SKOS editor is suitable for building and curating special purpose SKOS vocabularies to run bespoke clinical NLP systems, it does have several limitations:
• It is not suitable for editing very large SKOS vocabularies.
• As the tool is built around the SKOS API [9], some language features outside "core SKOS" [1] are not supported (e.g. skos:closeMatch, skos:relatedMatch).

Future directions
Our long-term goal is to integrate the SKOS editor as a lexicon development and management module within a comprehensive platform for developing clinical NLP algorithms. As part of this long-term goal, and informed by the comments and suggestions of our early users, we plan three major system enhancements:
• In the medium term, we plan to add multi-user functionality and collaborative editing to the system.
• We plan to include the ability to search other vocabularies, in particular the UMLS (Unified Medical Language System) [22], from within the editor interface in order to expedite the synonym identification process.
• We plan to extend the current documentation and tutorial material.

Conclusions
The SKOS editor we have developed is, to the best of our knowledge, the first free, open-source, online SKOS editor capable of creating, curating, versioning, and managing SKOS vocabularies. The editor is free to use and the source code is available under an Apache Version 2.0 License.