Project Rosetta: A Childhood Social, Emotional, and Behavioral Developmental Ontology

There is a wide array of existing instruments used to assess childhood behavior and development for the evaluation of social, emotional and behavioral disorders. Many of these instruments either focus on one diagnostic category or encompass a broad set of childhood behaviors. We built an extensive ontology of the questions associated with key features that have diagnostic relevance for child behavioral conditions, such as Autism Spectrum Disorder (ASD), attention-deficit/hyperactivity disorder (ADHD), and anxiety, by incorporating a subset of existing child behavioral instruments and categorizing each question into clinical domains. Each existing question and set of question responses were then mapped to a new unique Rosetta question and set of answer codes encompassing the semantic meaning and identified concept(s) of as many existing questions as possible. This resulted in 1274 existing instrument questions mapping to 209 Rosetta questions creating a minimal set of questions that are comprehensive of each topic and subtopic. This resulting ontology can be used to create more concise instruments across various ages and conditions, as well as create more robust overlapping datasets for both clinical and research use.


Introduction
Deficit/Hyperactivity Problems, Anxiety Problems, Oppositional Defiant Problems, Somatic Problems, and Conduct Problems 7 .
The Conners, 3rd Edition (Conners 3) is a thorough assessment of ADHD and its most commonly associated problems and disorders in school-aged youth. It is a multi-informant assessment with forms for parents, teachers, and youth. The assessment features multiple content scales that assess ADHD-related concerns as well as related problems in executive functioning, learning, aggression, and peer/family relations. In addition to these content scales, Conners 3 has five DSM-IV Symptom Scales that can be used as diagnostic criteria for ADHD and common comorbid disorders, including ADHD Inattentive, ADHD Hyperactive-Impulsive, ADHD Combined, Conduct Disorder, and Oppositional Defiant Disorder scales 8 .
The Social Responsiveness Scale-Second Edition (SRS-2) is a 65-item rating scale measuring deficits in social behavior associated with ASD, as outlined by the DSM-IV. The SRS-2 consists of four rating forms across three age ranges, including parent-, teacher-, and self-report forms. There are five treatment sub-scales (Social Awareness, Social Cognition, Social Communication, Social Motivation, and Restricted Interests and Repetitive Behavior) as well as an overall total score, all of which are used to assess ASD 9 .
The American Academy of Pediatrics (AAP) and the National Initiative for Children's Healthcare Quality (NICHQ) jointly published the Vanderbilt ADHD Diagnostic Rating Scale (VADRS) as a psychological assessment toolkit to be used in the assessment and treatment of ADHD in a primary care setting. It includes versions specific for parents and teachers. In addition to items corresponding to the ADHD diagnostic criteria of the DSM-IV, the VADRS includes symptom screens for four common comorbidities: oppositional defiant disorder, conduct disorder, anxiety, and depression 10 .

Methods
We set out to build the first generation of Project Rosetta to include a minimal set of Rosetta questions representing all of the concepts identified within the ontology. Eight child behavioral instruments, described in more detail above, were included in the analysis that led to the creation of the behavioral ontology underlying Rosetta. The process involved ingesting the existing child behavioral instruments into a consistent format, creating Rosetta questions and answer choices, and mapping existing instrument questions to Rosetta questions. Each step is detailed in this section.

Ingestion of Instruments
The first step of this process involved creating a document to ingest each of the eight child behavioral instruments (the ADI-R, ADOS-2, BASC-3, BRIEF2, CBCL, Conners 3, SRS-2, and VADRS) into a consistent format. Each instrument was placed in its own tab within the document, named by the version of the instrument, for reference. All versions of each instrument by age group were included, whereas only the parental version of an instrument was included if there were multiple forms for different raters. A uniform set of column names was used for all instruments, including question id, question body, and answer choices. Having all of the instruments together in a standard format made it easier to compare the concepts being asked in each instrument and to create a comprehensive child behavioral ontology.
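As an illustration of the uniform column set described above, the following is a minimal sketch in Python. The source describes a tabbed document rather than code, so the record type, the `ingest` helper, the column-alias table, and the semicolon-separated answer format are all assumptions made for illustration, and the example row is an invented placeholder rather than an item from any real instrument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstrumentQuestion:
    """One ingested question in the shared format (hypothetical record type)."""
    instrument: str        # instrument name
    version: str           # tab name, i.e. the instrument version
    question_id: str
    question_body: str
    answer_choices: tuple  # ordered answer labels


def ingest(instrument, version, raw_rows, column_aliases):
    """Normalize one instrument's rows to the uniform column names."""
    records = []
    for row in raw_rows:
        # Rename source-specific headers to the shared column names.
        norm = {column_aliases.get(key, key): value for key, value in row.items()}
        # Assumption: answer choices arrive as one semicolon-separated string.
        choices = tuple(c.strip() for c in norm["answer_choices"].split(";"))
        records.append(InstrumentQuestion(
            instrument=instrument,
            version=version,
            question_id=str(norm["question_id"]),
            question_body=norm["question_body"].strip(),
            answer_choices=choices,
        ))
    return records


# Invented placeholder row, not an item from any real instrument.
raw = [{"Item #": 1,
        "Item text": " Placeholder question A ",
        "Responses": "Never; Sometimes; Often"}]
aliases = {"Item #": "question_id",
           "Item text": "question_body",
           "Responses": "answer_choices"}
records = ingest("Instrument X", "Parent form", raw, aliases)
```

Keeping the alias table separate from the ingestion logic means that a new instrument only needs a new mapping of its headers onto the three shared columns.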

Clinical Domain Categorization
The individual questions from the instruments ingested in the previous step were added to an aggregate tab within the document. This made it easier to examine each question and determine which questions from these instruments had overlapping semantic meaning. Since there were a total of 1274 questions across all versions of the existing instruments, the questions needed to be grouped into categories before this semantic overlap could be assessed. A scheme was created for categorizing each of the individual questions into an ontology of clinical domains, shown in Table 2. A team of subject matter experts, which included representatives from clinical neuropsychology, developmental psychology, and pediatric neurology, was consulted in order to arrive at groupings and sub-groupings within the ontology. At the top level, there were three broad domains (Cognitive, Motor, and Somatic), which were then further broken down into a total of 60 leaf categories. Following the creation of the ontology, each question was assigned to a leaf category. The goal of this categorization was to have a workable number of existing questions within each leaf category, so that the specific concepts being asked in each domain could be better understood.
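The categorization step can be pictured as a small tree whose leaves collect question identifiers. The sketch below is only an assumption about how such a structure might be represented. Apart from the three top-level domain names and the Adaptability leaf, every group and leaf name is a hypothetical placeholder, and placing Adaptability under Cognitive is itself an assumption, since Table 2 is not reproduced here.

```python
from collections import defaultdict

# Top-level domains come from the text; all nested names except
# "Adaptability" are hypothetical placeholders.
ONTOLOGY = {
    "Cognitive": {"Hypothetical group A": ["Adaptability", "Hypothetical leaf A2"]},
    "Motor": {"Hypothetical group B": ["Hypothetical leaf B1"]},
    "Somatic": {"Hypothetical group C": ["Hypothetical leaf C1"]},
}


def leaf_paths(tree, prefix=()):
    """Yield (path, leaf) for every leaf category in the nested ontology."""
    for name, child in tree.items():
        if isinstance(child, dict):
            yield from leaf_paths(child, prefix + (name,))
        else:
            for leaf in child:
                yield prefix + (name,), leaf


def group_by_leaf(assignments, tree):
    """Group (question_id, leaf) pairs by leaf, rejecting unknown leaves."""
    valid = {leaf for _, leaf in leaf_paths(tree)}
    groups = defaultdict(list)
    for question_id, leaf in assignments:
        if leaf not in valid:
            raise ValueError(f"unknown leaf category: {leaf!r}")
        groups[leaf].append(question_id)
    return dict(groups)


# Invented question ids, used only to exercise the helper.
groups = group_by_leaf(
    [("X-1", "Adaptability"), ("Y-7", "Adaptability"), ("X-2", "Hypothetical leaf B1")],
    ONTOLOGY,
)
```

Validating each assignment against the set of leaves guards against a question being filed under a category that the ontology does not define.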

Rosetta Question Creation
Following the categorization of questions, the questions were grouped by leaf category to determine which questions from the existing instruments were conceptually similar and could therefore be covered by a single novel Rosetta question. As the questions were analyzed by leaf category, Rosetta questions were phrased to ensure a minimal loss of meaning in assessing each particular child behavior. Subject matter experts drafted de novo questions based on the features identified (i.e., the broad clinical domain and leaf categories). Question versions were then reviewed to arrive at a final wording that was both novel and conceptually similar to the original questions. As an example of this iterative process, we looked at the Adaptability leaf category and found 32 existing questions within this leaf from six different instruments. A subset of these questions is shown in Table 3 to illustrate how well these questions overlap between instruments. A single Rosetta question was created by a team of subject matter experts to assess a child's ability to adapt to changes to a routine, schedule, or the environment. Again, experts phrased a de novo question, followed by a team review and final phrasing. This particular Rosetta question was phrased as follows: "Does [NAME] become unusually upset with or have difficulty accepting small changes? For example, a change in [his/her] bedtime routine, weekly scheduled activities, or furniture arrangement in the house."

Rosetta Question Mapping
A mapping was then created in a separate tab in the document so that each of the original instrument questions was mapped to a corresponding Rosetta question. As shown in the example above, all of the sample questions in Table 3 could be mapped to the Rosetta question about a child's ability to adapt to changes. Within the Adaptability leaf category, twenty-one instrument questions were mapped to this Rosetta question. Three additional Rosetta questions were created within this leaf category to assess other specific child behaviors associated with adaptability, and the remaining eleven existing questions within the leaf category were mapped accordingly. This process produced a many-to-one mapping, in which many questions from different instruments mapped to a single Rosetta question. On average, three instruments and seven questions mapped to one Rosetta question. Further details of the overlap created by this process are discussed in the Results section.
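The many-to-one structure and the per-question averages reported above can be sketched as follows. This is a hedged illustration and not the authors' tooling: the dictionary layout, the `mapping_summary` helper, and the toy identifiers are all assumptions. Keying the dictionary by (instrument, question id) means each existing question can point to only one Rosetta question, which is what makes the mapping many-to-one.

```python
from collections import defaultdict


def mapping_summary(mapping):
    """Summarize a {(instrument, question_id): rosetta_id} mapping."""
    if not mapping:
        raise ValueError("mapping is empty")
    questions = defaultdict(set)    # rosetta_id -> existing questions mapped to it
    instruments = defaultdict(set)  # rosetta_id -> instruments contributing to it
    for (instrument, question_id), rosetta_id in mapping.items():
        questions[rosetta_id].add((instrument, question_id))
        instruments[rosetta_id].add(instrument)
    n = len(questions)
    return {
        "rosetta_questions": n,
        "mean_questions": sum(len(v) for v in questions.values()) / n,
        "mean_instruments": sum(len(v) for v in instruments.values()) / n,
    }


# Toy mapping with invented identifiers (not the real Rosetta mapping).
toy_mapping = {
    ("A", "1"): "R1",
    ("A", "2"): "R1",
    ("B", "1"): "R1",
    ("B", "2"): "R2",
}
summary = mapping_summary(toy_mapping)
```

In this toy case three existing questions from two instruments map to R1 and one question maps to R2, so the helper reports a mean of 2.0 questions and 1.5 instruments per Rosetta question.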

Rosetta Question Answer Creation and Mapping
Because multiple questions mapped to a single Rosetta question, the existing questions tended to have varying types of answer responses. The ADI-R and ADOS-2 generally have descriptive answer choices that relate to the quality of the behavior being assessed, whereas the remaining assessments have answer choices on a Likert scale referring to the frequency of that behavior. For the last step in this process, these differences had to be consolidated to create a new, consistent coding of answer choices to which each of the original question responses could be mapped, retaining as many of the response signals as possible. Each of the original questions and answer choices was independently examined to make sure the answer codes were mapped without a significant loss of meaning. In the example discussed in the section above, questions from five different instruments were mapped to the question about adaptability to change, and each was asked in a slightly different way with different answer choices. The corresponding ADI-R questions had four descriptive answer choices, whereas the BRIEF2 and CBCL had three answer choices on a frequency scale, and the BASC-3 and SRS-2 had four answer choices on a frequency scale, as shown in Table 4.
The subject matter experts crafted answer choices for Rosetta questions such that, where appropriate, the descriptive quality-based responses of the ADI-R and ADOS-2 were combined with the frequency responses typical of instruments like the BASC-3 and BRIEF2. When questions from the BRIEF2 or CBCL mapped to a Rosetta question, three answer codes were created in Rosetta because that was the smallest number of responses that could be mapped if the child had been assessed only with the BRIEF2 or CBCL. This was decided because it could not be inferred how a parent would have responded if given more answer choices. The new answer choices for this particular question, which combined frequency and quality, were as follows: 1=Rarely or never; 2=Sometimes, but with little interference in family life; and 3=Often, and with some interference with family life. We then mapped each of the existing question answer choices to the Rosetta answer codes based on how the questions and answer choices overlapped with the phrasing of the Rosetta question and answer codes, as shown in Figure 1.
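The consolidation of answer scales can be sketched as a lookup table per source scale. The actual correspondences are those given in Figure 1, which is not reproduced here, so the tables below are assumptions for illustration only. In particular, the zero-based raw codes and the collapsing of the two highest responses on a four-point frequency scale into Rosetta code 3 are hypothetical choices, as are the scale names.

```python
# Rosetta answer codes for the adaptability question, as quoted in the text.
ROSETTA_LABELS = {
    1: "Rarely or never",
    2: "Sometimes, but with little interference in family life",
    3: "Often, and with some interference with family life",
}

# Hypothetical lookup tables: raw response position -> Rosetta code.
# The real correspondences are defined in Figure 1 of the paper.
ANSWER_MAPS = {
    "three_point_frequency": {0: 1, 1: 2, 2: 3},
    "four_point_frequency": {0: 1, 1: 2, 2: 3, 3: 3},  # assumed collapse
}


def to_rosetta_code(scale, raw):
    """Map a raw response on a named source scale to a Rosetta answer code."""
    if raw is None:
        return None  # unanswered items stay missing
    try:
        return ANSWER_MAPS[scale][raw]
    except KeyError:
        raise ValueError(f"no mapping for scale={scale!r}, raw={raw!r}") from None


codes = [to_rosetta_code("three_point_frequency", r) for r in (0, 1, 2)]
collapsed = [to_rosetta_code("four_point_frequency", r) for r in (0, 1, 2, 3)]
```

Because a finer scale is always mapped down onto the three Rosetta codes and never the other way around, no response has to be invented for a parent who only saw three choices, which matches the reasoning given above for choosing three answer codes.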

Case study: machine-learning-based, concurrent assessment of children for autism and ADHD
To demonstrate the utility of Rosetta as a platform for developing simultaneous multi-condition assessment algorithms, we trained and validated a machine-learning algorithm to assess young children for the risk of autism or ADHD using a single questionnaire composed entirely of Rosetta questions. The input data for this case study consisted of 3,731 patient records of children aged 4-10, each of whom had been assessed with one or more clinical instruments that are currently incorporated into Rosetta. The diagnostic labels for the dataset were assigned by licensed medical professionals, and the breakdown was 2,941 positive for autism, 343 positive for ADHD, and 447 negative for both.
As is the case in most clinical data collection settings, no single assessment instrument was administered to all patients in the dataset for this case study. Rather, the data cover 6 different Rosetta-compatible clinical assessment instruments with little overlap. Because many instruments have multiple questionnaire versions, the total number of unique instrument questionnaire versions was 15. Ordinarily, such sparse and fragmented data would not support training a single machine-learning model. With Rosetta, however, the entire dataset could be used as training and cross-validation samples for a machine-learning predictive algorithm.
The assessment algorithm identifies autism and ADHD using the Rosetta dataset as follows. First, a data imputation technique is used to infer values for missing Rosetta questions for every sample as needed. Next, a gradient boosted decision tree algorithm is trained to identify whether either autism or ADHD is present for a child. A second gradient boosted decision tree algorithm is trained to identify which of the two conditions is present, and is used for predictions only if the first algorithm identifies a child as having autism or ADHD. To train each algorithm, an iterative procedure is used to identify the most predictive Rosetta questions to be used in model training. The training process identified a total of 30 Rosetta questions as the relevant features for the assessment of autism and ADHD in a single questionnaire.
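The control flow of the two-stage procedure can be sketched as below. This is a minimal sketch under stated assumptions and not the authors' implementation. The text does not name the imputation technique, so most-common-answer (mode) imputation is used here as a stand-in, and the two trained gradient boosted decision tree models are replaced by simple placeholder rules passed in as callables. The function names, question identifiers, and thresholds are all hypothetical.

```python
from collections import Counter


def fit_mode_imputer(rows, questions):
    """Learn the most common observed answer per Rosetta question (stand-in imputer)."""
    modes = {}
    for q in questions:
        observed = [r[q] for r in rows if r.get(q) is not None]
        modes[q] = Counter(observed).most_common(1)[0][0] if observed else None
    return modes


def impute(row, modes):
    """Fill missing answers in one record with the learned per-question mode."""
    return {q: (row[q] if row.get(q) is not None else mode)
            for q, mode in modes.items()}


def cascade_predict(row, modes, has_condition, which_condition):
    """Stage 1 decides whether any condition is present; stage 2 runs only if so."""
    x = impute(row, modes)
    if not has_condition(x):
        return "neither"
    return which_condition(x)


# Toy training records with invented Rosetta ids and answer codes.
train = [{"R1": 3, "R2": None}, {"R1": 1, "R2": 1}, {"R1": 3, "R2": 1}]
modes = fit_mode_imputer(train, ["R1", "R2"])


# Placeholder rules standing in for the two trained models.
def stage_one(x):
    return x["R1"] + x["R2"] >= 4


def stage_two(x):
    return "autism" if x["R1"] >= x["R2"] else "ADHD"


pred_missing = cascade_predict({"R1": None, "R2": 3}, modes, stage_one, stage_two)
pred_low = cascade_predict({"R1": 1, "R2": 1}, modes, stage_one, stage_two)
pred_adhd = cascade_predict({"R1": 1, "R2": 3}, modes, stage_one, stage_two)
```

Passing the two models in as callables keeps the cascade logic independent of the model family, so trained classifiers could be substituted for the placeholder rules without changing `cascade_predict`.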
The cross-validation AUC was 99% when identifying autism or ADHD and 99% when separating autism from ADHD. This encouraging preliminary result demonstrates the potential utility of the application and the benefit of applying Rosetta instrument mappings to enable machine learning in settings that might otherwise not be amenable to it. Further clinical trial testing should be performed to evaluate how effective such algorithms are when the Rosetta instrument is applied in real-world settings.
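For readers less familiar with the metric, AUC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that definition on invented scores. It illustrates the metric only and does not reproduce the cross-validation procedure or the reported results; the function name and data are assumptions.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Count positive-negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented labels and scores, used only to exercise the function.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

In this toy example three of the four positive-negative pairs are ordered correctly, giving an AUC of 0.75.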

Results
The final Rosetta document mapped 1274 existing instrument questions to 209 Rosetta questions. Table 5 shows the resulting fusion of questions, including the number of existing instruments with overlap, the total number of existing instrument questions, and the resulting number of novel Rosetta questions for each leaf category in the ontology. On average, three instruments and seven existing questions mapped to a single Rosetta question. The resulting overlap between existing instruments and Rosetta questions is illustrated in the heat map in Figure 2. For each existing instrument, the heat map shows how many Rosetta questions overlap with every other instrument in the ontology. The diagonal from top left to bottom right shows how many Rosetta questions map to only one instrument. It can be seen from this that the ADI-R, CBCL, and SRS-2 have a considerable number of questions for which overlap with other existing questions is difficult to find. The last column in this figure shows the resulting number of Rosetta questions that overlap with each instrument.
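The counts behind a heat map of this kind can be derived from the mapping alone. The following sketch is an assumption about how that might be done, not the code used to produce Figure 2. Off-diagonal cells count Rosetta questions shared by a pair of instruments, diagonal cells count Rosetta questions that map to that instrument only, and a separate totals list plays the role of the last column. The function name and toy identifiers are hypothetical.

```python
from itertools import combinations


def overlap_matrix(rosetta_to_instruments, instruments):
    """Return (matrix, totals) of Rosetta-question overlap between instruments."""
    idx = {name: i for i, name in enumerate(instruments)}
    n = len(instruments)
    matrix = [[0] * n for _ in range(n)]
    totals = [0] * n
    for mapped in rosetta_to_instruments.values():
        mapped = set(mapped)
        for name in mapped:
            totals[idx[name]] += 1  # Rosetta questions touching this instrument
        if len(mapped) == 1:
            (only,) = mapped
            matrix[idx[only]][idx[only]] += 1  # unique to a single instrument
        for a, b in combinations(sorted(mapped), 2):
            matrix[idx[a]][idx[b]] += 1  # shared by the pair, kept symmetric
            matrix[idx[b]][idx[a]] += 1
    return matrix, totals


# Toy data with invented Rosetta ids and instrument names.
toy = {"R1": {"A", "B"}, "R2": {"A"}, "R3": {"A", "B", "C"}}
matrix, totals = overlap_matrix(toy, ["A", "B", "C"])
```

In the toy data, instruments A and B share two Rosetta questions, only R2 is unique to A, and A contributes to all three Rosetta questions.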

Discussion
To our knowledge, the first generation of Project Rosetta is the first ontological mapping of its kind, creating a minimal set of questions with significant conceptual overlap across multiple childhood behavioral/developmental instruments. The resulting ontology incorporates many concepts that have diagnostic relevance for child behavioral conditions and can be used as a resource for child mental, behavioral, and developmental health diagnosis and treatment. By creating a reduced set of 209 questions, Rosetta can be used to create more concise instruments, thereby addressing some of the time constraints that lead to delays in diagnosis and treatment interventions.
The overlap between existing childhood behavioral/developmental instruments that Rosetta establishes can be used to create a virtual diagnostic instrument that covers more patients, across various ages and conditions, in a uniform way that was not previously possible. This ability to take in and combine assessment data from any of the mapped instruments allows for the creation of the large, dense datasets required for building machine-learning algorithms in the development of diagnostic tools, as presented in our case study in the Methods section above.
There are some potential limitations to this project due to the variation between the instruments included in the first generation of Rosetta. There could be a loss of response signals from over-simplification of the question phrasing when creating the Rosetta questions, as well as from mapping existing instrument questions to Rosetta questions that are not representative of a particular behavior. Another challenge leading to a loss of response signals comes from the combination of instruments with varying scales of answer choices, as well as the combination of descriptive quality-based answer choices with frequency-based answer choices. Both of these limitations could lead to a misrepresentation of parental responses for particular behaviors.
This work needs to be extended to cover more child behavioral health instruments. Additional instruments could expand the ontology to be more representative of diagnoses that are not well represented by the eight instruments included here. This work could also be extended into other diagnostic domains, such as adult behavioral conditions, by applying the same concepts to adult checklists and screening tools. Additionally, clinical trial testing should be performed to assess the application of the Rosetta instrument in real-world settings across a variety of child behavioral conditions.