A Cross-Domain Ontology Semantic Representation Based on NCBI-BlueBERT Embedding
-
Graphical Abstract
-
Abstract
A common but critical task in biological ontologies data analysis is to compare the difference between ontologies. There have been numerous ontology-based semantic-similarity measures proposed in specific ontology domain, but it still remains a challenge for cross-domain ontologies comparison. An ontology contains the scientific natural language description for the corresponding biological aspect. Therefore, we develop a new method based on natural language processing (NLP) representation model bidirectional encoder representations from transformers (BERT) for cross-domain semantic representation of biological ontologies. This article uses the BERT model to represent the word-level of the ontologies as a set of vectors, facilitating the semantic analysis or comparing the biomedical entities named in an ontology or associated with ontology terms. We evaluated the ability of our method in two experiments: calculating similarities of pair-wise disease ontology and human phenotype ontology terms and predicting the pair-wise of proteins interaction. The experimental results demonstrated the comparative performance. This gives promise to the development of NLP methods in biological data analysis.
-
-