1. The Rise of Biomedical Semantics
New discoveries in biology and medicine over the last two decades have been facilitated by large-scale data generation in high-throughput experiments [1]. The observed data is maintained as structured data in ever-growing public scientific databases, while novel scientific findings are reported in semi-structured or unstructured form in the literature. Many of the past and present developments in biomedicine, in particular in computational biology and medical informatics, have focused on structured data management. At the same time, there has been a remarkable increase in the number and coverage of semantics resources and semantically annotated repositories that leverage biomedical knowledge from structured and unstructured sources alike [2].
As part of these developments, biomedical research no longer relies only on local data and findings generated within individual laboratories, but increasingly integrates shared, publicly available biomedical repositories. Furthermore, researchers exploit data resources from other domains: for example, molecular biologists use toxicological data, and medical scientists explore data resources from molecular biology or environmental sciences. Altogether, biomedical research is moving beyond approaches that rely on "hand-crafted" hypotheses and inference techniques towards approaches that are trained on large-scale, semantically integrated biomedical data. For example, research in systems biology and systems medicine requires semantic integration of data across the biomedical domain so that the data can be fed into comprehensive and formalised models of living beings. These models utilise techniques from computer science, mathematics and engineering to analyse and predict biological and medical outcomes based on experimental and formal semantics descriptions.
Indeed, in recent years we have witnessed an evolution towards formalising biomedical knowledge in explicit domain models such as curated and annotated datasets, ontologies, pathways and semantic networks [3]. The biomedical semantics resourceome now contains a large number of repositories. Biomedical ontologies [4], terminologies and controlled vocabularies (e.g. the Gene Ontology [5], MeSH [6], UMLS [7] and ICD-10 [8]), in particular, have been widely used for the integration of diverse scientific databases and for the standardisation of information access through common but stringent logical definitions and constraints on semantic annotations of database content. Such semantic descriptors have also been used for detailed analyses, both to improve our understanding of known biomedical processes and to predict as yet unexplained ones.
In addition to the development of the semantics resourceome for structured data, researchers have been targeting the automated integration and exploitation of unstructured data, i.e. the scientific literature, to capture novel results and findings [9]. Information retrieved from semantics-driven literature mining needs to be integrated into existing knowledge repositories, thus becoming an integral part of a fully specified and interconnected knowledge space for biomedical research.
The need for formalised semantics is, of course, not specific to biomedicine. However, the unprecedented scale of biomedical data has placed biomedical research at the forefront of fostering the development of leading-edge technologies such as linked data and the Semantic Web. We can already see a growing penetration of these technologies into the life science community, evident through many initiatives and conferences (e.g. Bio2RDF [10], BioHackathon [11], SWAT4LS [12]). In addition to institutionally supported actions, there are various community-driven initiatives to annotate, organise and provide access to the whole spectrum of biomedical resources, from primary experimental data to data analysis workflows (e.g. myExperiment [13]) and secondary, semantically curated resources (e.g. ORegAnno [14]).
Management and maintenance of the semantics resourceome have already been recognised as one of the most challenging tasks that the community needs to address in order to facilitate our understanding of biological processes and predictions of functional behaviour. An interoperable and comprehensive semantic infrastructure is likely to facilitate the re-use and integration of data, knowledge and methodology, and thus encourage further developments in the field. Coupled with human- and machine-readable formal models and automated reasoning, the infrastructure can provide an environment for the next generation of biomedical research, one that will rely on a distributed semantic grid of biomedical data and knowledge. Some authors already refer to these developments as Semantic Systems Biology (and Medicine), where reasoning and conclusions are based on the full integration of biomedical knowledge [15]. Combined with semantics-driven orchestration and distributed execution of data analysis workflows, the new framework for in silico biomedical experimentation has the potential to add a new dimension to the way biomedical research is conducted [16].
One question that biomedical semantics researchers often face is whether their research aims to create a virtual scientist that would conduct automated investigations and eventually replace real-world scientists. While such attempts have been explored [17], most current research in biomedical semantics focuses on facilitating large-scale, systematic and formalised representation and exploration of biomedical knowledge. The main onus is still on an expert to manage and experience the scientific discovery process, but within a research environment that can cope with the complexity, dynamics and volume of biomedical knowledge.
Biomedical semantics research thus aims to bridge the gap between the data and knowledge that have become increasingly available and their use in support of scientific exploration. Only full semantic integration of biomedical knowledge and experimental data can provide a means for scientists to model complex biological systems and thus improve our understanding of living organisms.