AttentionSplice: An Interpretable Multi-Head Self-Attention Based Hybrid Deep Learning Model in Splice Site Prediction
-
Graphical Abstract
-
Abstract
Pre-mRNA splicing is an essential step in gene expression. By removing introns and joining exons, splicing allows a single gene to give rise to different proteins with different biological functions. The boundaries between exons and introns are defined by the donor and acceptor splice sites. Characterizing the nucleotide patterns that mark splice sites is difficult and challenges conventional methods. Recently, deep learning frameworks have been introduced for splice site prediction and have shown high performance. They extract high-dimensional features from the DNA sequence automatically, rather than inferring splice sites from prior knowledge of the relationships, dependencies, and characteristics of nucleotides in the sequence. This paper proposes AttentionSplice, a hybrid model that combines multi-head self-attention, a convolutional neural network, and a bidirectional long short-term memory network. The performance of AttentionSplice is evaluated on Homo sapiens (human) and Caenorhabditis elegans (worm) datasets. Our model outperforms state-of-the-art models in the classification of splice sites. To make AttentionSplice interpretable, we use the attention weights learned by the model to extract important positions and key motifs that could be essential for splice site detection. Our results could offer novel insights into the underlying biological roles and molecular mechanisms of gene expression.
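To illustrate how attention weights can be read as position importance, the following is a minimal sketch and not the AttentionSplice implementation. It assumes a single attention head with identity query and key projections over a one-hot encoded sequence, whereas the actual model uses learned projections, multiple heads, and convolutional and recurrent layers. The example sequence and all function names are hypothetical.

```python
import math

NUCLEOTIDES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-dimensional one-hot vectors."""
    return [[1.0 if base == n else 0.0 for n in NUCLEOTIDES]
            for base in seq.upper()]

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention_weights(x):
    """Scaled dot-product attention weights.

    Assumption: queries and keys are the inputs themselves (identity
    projections); a trained model would apply learned weight matrices.
    """
    d = len(x[0])
    scores = [[sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
              for q in x]
    return [softmax(row) for row in scores]

def position_importance(weights):
    """Average attention that each position receives over all queries."""
    n = len(weights)
    return [sum(weights[i][j] for i in range(n)) / n for j in range(n)]

# Hypothetical sequence window, used only for illustration.
seq = "CAGGTAAGT"
weights = self_attention_weights(one_hot(seq))
importance = position_importance(weights)

# Rank positions from most to least attended.
ranking = sorted(range(len(seq)), key=lambda j: importance[j], reverse=True)
```

With identity projections, positions simply attend to other positions carrying the same nucleotide, so the ranking here reflects base frequency only. In a trained model the same averaging step would be applied to the learned attention weights.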
-
-