The 2nd DBCLS BioHackathon: interoperable bioinformatics Web services for integrated applications

Katayama, Toshiaki; Wilkinson, Mark D; Vos, Rutger; Kawashima, Takeshi; Kawashima, Shuichi; Nakao, Mitsuteru; Yamamoto, Yasunori; Chun, Hong-Woo; Yamaguchi, Atsuko; Kawano, Shin; Aerts, Jan; Aoki-Kinoshita, Kiyoko F; Arakawa, Kazuharu; Aranda, Bruno; Bonnal, Raoul JP; Fernández, José M; Fujisawa, Takatomo; Gordon, Paul MK; Goto, Naohisa; Haider, Syed; Harris, Todd; Hatakeyama, Takashi; Ho, Isaac; Itoh, Masumi; Kasprzyk, Arek; Kido, Nobuhiro; Kim, Young-Joo; Kinjo, Akira R; Konishi, Fumikazu; Kovarskaya, Yulia; von Kuster, Greg; Labarga, Alberto; Limviphuvadh, Vachiranee; McCarthy, Luke; Nakamura, Yasukazu; Nam, Yunsun; Nishida, Kozo; Nishimura, Kunihiro; Nishizawa, Tatsuya; Ogishima, Soichi; Oinn, Tom; Okamoto, Shinobu; Okuda, Shujiro; Ono, Keiichiro; Oshita, Kazuki; Park, Keun-Joon; Putnam, Nicholas; Senger, Martin; Severin, Jessica; Shigemoto, Yasumasa; Sugawara, Hideaki; Taylor, James; Trelles, Oswaldo; Yamasaki, Chisato; Yamashita, Riu; Satoh, Noriyuki; Takagi, Toshihisa

doi:10.1186/2041-1480-2-4

Journal of Biomedical Semantics

Table 1 Summary of technical problems and solutions for each use case

From: The 2nd DBCLS BioHackathon: interoperable bioinformatics Web services for integrated applications

Use Case 1	Annotation of 100,000 invertebrate ESTs
Task	A researcher needs to annotate 100,000 sequences obtained from an invertebrate species and also needs to provide the result as a public database.
Strategy	Annotate sequences by similarity; for sequences showing no similarity, complement the annotations using integrated analysis tools. Then store the results in BioMart or TogoDB to make the database publicly available.
Problem	Needed to identify which tool was most suitable for each step. Some tools turned out to require a very long execution time. The resulting annotations needed to be archived in a database and made accessible on the Web.
Solution	First, use relatively fast tools like Blast2GO and KAAS, then use ANNOTATOR for a limited number of sequences. BioMart is suitable for integration of remote BioMart resources like Ensembl, while TogoDB can be used to host databases without installation. Both database systems are accessible through a Web service interface for workflow tools like jORCA and Taverna.
Tools	Blast2GO, KAAS, ANNOTATOR, BioMart, TogoDB, TogoWS, jORCA, Taverna
Databases	Ensembl, BioMart, KEGG
Use Case 2	TFBS enrichment within differential microarray gene expression data
Task	Identify SNPs in transcription factor binding sites and visualize the result in a genome browser.
Strategy	Retrieve SNP and TSS datasets through the DAS protocol, then compute enrichment and export results for a DAS viewer.
Problem	Needed to integrate information from multiple databases and needed to customize the visualization.
Solution	Developed a custom prediction system for the data obtained from DAS sources, then customized the Ajax DAS viewer to show the result in a genomic view.
Tools	BioDAS, Ajax DAS viewer
Databases	FESD II, DBTSS
Use Case 3	Protein interactions among enzymes in a KEGG metabolic pathway
Task	Predict interacting pairs of proteins in a given metabolic pathway.
Strategy	Retrieve enzymes from a specified pathway and search for pairs of homologous proteins forming complexes in a structure database.
Problem	Found version incompatibility between the server and client implementations of the SOAP protocol. The PDBj Web service returned a non-standard BLAST output format. There were no Web services to calculate phylogenetic profiles.
Solution	Switched programming languages according to the service in use. Wrote programs to parse BLAST results and to generate a phylogenetic profile.
Tools	Java, OCaml, Perl, Ruby, BLAST, DDBJ WABI, PDBj Mine, KEGG API
Databases	DDBJ, KEGG, PDBj, UniProt
Use Case 4	Analyzing glyco-gene-related diseases
Task	Find human diseases which are potentially related to SNPs and glycans.
Strategy	Retrieve disease genes and search for homologs in other organisms for which glyco-gene interactions are recorded, then search for epitopes to identify glycans and retrieve their structures.
Problem	No Web service existed to query GlycoEpitopeDB or to convert a glycan structure from IUPAC format into KCF format. The OMIM search output was XML that included entries without SNPs.
Solution	Implemented and registered BioMoby-compliant Web services. Wrote a custom BeanShell script for a Taverna workflow.
Tools	Taverna, BioMoby, KEGG API
Databases	OMIM, H-InvDB, GlycoEpitopeDB, RINGS, Consortium for Functional Glycomics, GlycomeDB, GlycoGene DataBase, KEGG
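
Use Case 3 mentions programs written to parse BLAST results and generate a phylogenetic profile. The table does not describe those programs, which were written in the languages listed under Tools. The Python sketch below only illustrates the general idea of a phylogenetic profile: a presence/absence vector of homologs across a fixed list of organisms. The hit tuple layout, the organism codes, the E-value cutoff of 1e-5, and the simple matching-fraction similarity are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical pre-parsed BLAST hit: (query protein ID, organism code, E-value).
Hit = Tuple[str, str, float]


def phylogenetic_profiles(
    hits: Iterable[Hit],
    organisms: List[str],
    evalue_cutoff: float = 1e-5,  # assumed threshold, not taken from the article
) -> Dict[str, List[int]]:
    """Build a 0/1 presence vector per query protein, ordered by `organisms`."""
    present: Dict[str, Set[str]] = {}
    for query, organism, evalue in hits:
        found = present.setdefault(query, set())
        # Count a homolog as present only if the hit passes the cutoff.
        if evalue <= evalue_cutoff:
            found.add(organism)
    return {
        query: [1 if org in found else 0 for org in organisms]
        for query, found in present.items()
    }


def profile_similarity(a: List[int], b: List[int]) -> float:
    """Fraction of organisms in which two profiles agree (assumed measure)."""
    if len(a) != len(b) or not a:
        raise ValueError("profiles must be non-empty and of equal length")
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


# Made-up example data for three proteins and three organisms.
organisms = ["eco", "hsa", "sce"]
hits = [
    ("P1", "eco", 1e-30),
    ("P1", "hsa", 1e-10),
    ("P2", "eco", 1e-20),
    ("P2", "hsa", 0.5),  # fails the cutoff, so treated as absent
    ("P3", "sce", 1e-8),
]
profiles = phylogenetic_profiles(hits, organisms)
```

Proteins with similar profiles tend to be gained and lost together across genomes, so a pairwise comparison such as `profile_similarity(profiles["P1"], profiles["P2"])` could serve as one signal when ranking candidate interacting pairs.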


ISSN: 2041-1480
