In our consortium, we strongly believe in the importance of multidisciplinary research. Alongside the teams seeking new insights into the molecular biology of brain tumors and those developing new diagnostic platforms for early diagnosis and treatment follow-up, we have a dedicated team of data scientists building a comprehensive, easy-to-use biomarker database. It is intended as a tool for researchers, pathologists, clinicians and others interested in improving the diagnosis and therapeutic outcomes of patients with brain tumors. The database project was proposed by Jeremy Georges-Filteau (The Hyve/Radboud University), who planned and led the software development component with the involvement of Xiaoyu Zhang (Imperial College London) and Birbal Prasad (University of Plymouth).
A large body of literature is available on biomarkers in particular. What is missing is an extensive, exhaustive collection of these data. A database is an organised collection of data that is stored and made accessible electronically. When looking for a specific biomarker or exploring the available biomarker options, a database makes it easier to find a suitable biomarker and the information associated with it. To our knowledge, no such database is available for glioblastoma multiforme (GBM). With this in mind, we have been building GlioBase, a GBM biomarker knowledge base.
This month, we have updates from the aforementioned early-stage researchers (ESRs) who are building the database.
“Currently, I am developing our glioblastoma multiforme (GBM) biomarker knowledge base, which is, to the best of our knowledge, the first biomarker database specifically designed for GBM. The first phase of development is now complete, and a prototype of the database is ready for usability testing. At the same time, I am working on my individual research project, which focuses on a deep learning-based multi-omics brain tumour prediction model. Omics data are high dimensional, with tens of thousands of features, whereas the number of available samples is relatively small. To deal with this ‘curse of dimensionality’, we used unsupervised deep learning approaches such as the stacked denoising autoencoder (SDAE) and the variational autoencoder (VAE) to extract lower-dimensional latent features from high-dimensional omics data for further analyses.”
Xiaoyu Zhang, ESR-11 at Imperial College London (UK)
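The latent-feature extraction Xiaoyu describes can be illustrated with a toy example. What follows is a minimal sketch and not the consortium's model: a single-layer denoising autoencoder written in NumPy and trained on simulated data, whereas an SDAE stacks several such layers. The function name, layer size, noise rate, learning rate and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def train_denoising_autoencoder(X, n_latent=3, noise_rate=0.2,
                                lr=0.02, epochs=1000, seed=0):
    """Train a one-hidden-layer denoising autoencoder (hypothetical helper).

    Returns an encode() function and the per-epoch reconstruction loss.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    W_enc = rng.normal(0.0, 0.1, size=(n_features, n_latent))
    b_enc = np.zeros(n_latent)
    W_dec = rng.normal(0.0, 0.1, size=(n_latent, n_features))
    b_dec = np.zeros(n_features)
    losses = []
    for _ in range(epochs):
        # Corrupt the input by zeroing a random fraction of entries.
        keep = rng.random(X.shape) >= noise_rate
        X_noisy = X * keep
        # Encode to the latent space, then decode back to feature space.
        H = np.tanh(X_noisy @ W_enc + b_enc)
        X_hat = H @ W_dec + b_dec
        # The target is the clean input, not the corrupted one.
        err = X_hat - X
        losses.append(float(np.sum(err ** 2) / n_samples))
        # Backpropagate the squared-error loss through both layers.
        grad_out = 2.0 * err / n_samples
        grad_pre = (grad_out @ W_dec.T) * (1.0 - H ** 2)
        W_dec -= lr * (H.T @ grad_out)
        b_dec -= lr * grad_out.sum(axis=0)
        W_enc -= lr * (X_noisy.T @ grad_pre)
        b_enc -= lr * grad_pre.sum(axis=0)

    def encode(X_new):
        return np.tanh(X_new @ W_enc + b_enc)

    return encode, losses

# Simulated stand-in for omics data: 60 samples, 120 features,
# generated from 3 hidden factors plus a little measurement noise.
rng = np.random.default_rng(42)
Z = rng.normal(size=(60, 3))
A = rng.normal(scale=1.0 / np.sqrt(120), size=(3, 120))
X = Z @ A + 0.01 * rng.normal(size=(60, 120))

encode, losses = train_denoising_autoencoder(X)
latent = encode(X)  # one 3-dimensional latent vector per sample
```

The `latent` matrix has far fewer columns than `X` and could be passed to a downstream classifier. Real omics data would need normalisation, a held-out validation set and most likely a deep learning framework rather than hand-written gradients.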
“My individual research project is aimed at the development and application of novel system analysis and statistical learning approaches to discover biomarkers for the diagnosis of glioma, in particular glioblastoma multiforme (GBM). Thus, we are leading the in silico GBM biomarker discovery study. Currently, as a first step towards biomarker discovery, I am preparing (quality-controlling) GBM datasets identified in public databases such as ArrayExpress, GEO and TCGA in order to carry out a meta-analysis. In addition, I am leading the overall development of the GBM biomarker knowledge base (in particular, the collection of biomarker information from the literature, data curation and overall design) in collaboration with other ESRs. Recently, we successfully completed the first phase of database development, as mentioned by Xiaoyu (above), and a prototype is now being tested for usability. Moreover, I completed my first secondment at The Hyve, Netherlands (1 month, March 2019), where I learned about and gained experience of the different steps of software development for biological data (in collaboration with Jeremy). This was of immense help for the first phase of GBM biomarker database development.”
Birbal Prasad, ESR-10, University of Plymouth (UK)
“I am currently working on the component of my project concerned with scalable, high-performance storage of clinical and omics data. There is currently no widely accepted benchmark for comparing the performance of such database systems. Instead, generic benchmarks that are not representative of the application domain, or single-use custom scripts, are often employed. I am thus investigating ways to generate privacy-preserving synthetic data sets from real-world case studies. These should be able to reproduce the referential correlations and statistical properties of the original data structure. The resulting benchmarking tool will enable me to meaningfully compare storage solutions, such as column-based databases for clinical data, and integrate them into the cloud-based diagnostic platform. In addition, I came up with the idea for GlioBase and led its software development, including the choice of modern web technologies, the planning of development tasks and the design of a flexible data model.”
Jeremy Georges-Filteau, ESR-12, The Hyve, Netherlands
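To give a feel for what reproducing the statistical properties of a data set can mean, here is a deliberately simple sketch in NumPy. It is not Jeremy's method: it fits a multivariate Gaussian to two numeric columns and samples new rows from it, which preserves means and pairwise correlations but offers no formal privacy guarantee and ignores categorical fields and references between tables. The function name, column meanings and values are invented for illustration.

```python
import numpy as np

def fit_gaussian_synthesizer(real, seed=0):
    """Return a sampler that draws synthetic rows (hypothetical helper).

    The sampler matches the column means and covariance of `real`.
    """
    real = np.asarray(real, dtype=float)
    mean = real.mean(axis=0)
    cov = np.cov(real, rowvar=False)
    rng = np.random.default_rng(seed)

    def sample(n_rows):
        return rng.multivariate_normal(mean, cov, size=n_rows)

    return sample

# Invented "real" table: patient age and a marker level that rises with age.
rng = np.random.default_rng(1)
age = rng.normal(60.0, 10.0, size=500)
marker = 0.05 * age + rng.normal(0.0, 0.3, size=500)
real = np.column_stack([age, marker])

# Draw a larger synthetic table, e.g. to load-test a storage backend.
synthetic = fit_gaussian_synthesizer(real)(5000)

# The age/marker correlation should be similar in both tables.
real_corr = np.corrcoef(real, rowvar=False)[0, 1]
synth_corr = np.corrcoef(synthetic, rowvar=False)[0, 1]
```

Because the synthetic table can be made arbitrarily large, the same fitted sampler can generate benchmark workloads of different sizes while keeping the column relationships of the source data.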