Missing Value Estimation for Gene Expression Profile Data
-
Abstract
We propose a new missing value (MV) estimation method for gene expression profile data that considers both the internal and external conditions of gene expression profiles. The internal condition reflects the time-series nature of the data: a gene expression curve is constructed by cubic spline fitting, and MVs are estimated from that curve. The external condition reconstructs MVs from the expression values of candidate genes. First, an initial subset of candidate genes is determined by defining a trace matrix. A final subset is then constructed by selecting genes from the initial subset according to an improved Pearson correlation coefficient. Finally, the K genes in the final subset that are most correlated with the target gene are selected, and the weighted sum of their K expression values is taken as the estimate of the target gene's MV under the external condition. Experimental results indicate that, compared with the commonly used MV estimation methods KNNimpute, SKNNimpute, and IKNNimpute, the proposed method achieves higher estimation accuracy and is robust to the choice of K.
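The last step of the external condition, the weighted sum over the K most correlated genes, can be illustrated with a minimal sketch. It is not the proposed method itself: the trace matrix and the improved Pearson correlation coefficient are not specified in this abstract, so the sketch substitutes the plain Pearson correlation, keeps only positively correlated candidates, and uses the correlations (normalized to sum to one) as weights. The function names `pearson` and `estimate_missing`, the use of `None` to mark MVs, and the assumption that profiles are on a comparable scale are all illustrative choices.

```python
from math import sqrt


def pearson(a, b):
    """Plain Pearson correlation over positions observed in both profiles.

    Assumption: this stands in for the 'improved' coefficient, which the
    abstract does not define. Missing values are marked with None.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def estimate_missing(target, candidates, col, k):
    """Estimate target[col] from the k candidates most correlated with target.

    Assumptions (not from the source): only positively correlated candidates
    are used, and weights are the correlations normalized to sum to one.
    Returns None when no usable candidate exists.
    """
    scored = []
    for gene in candidates:
        if gene[col] is None:
            continue  # a candidate must be observed at the missing position
        r = pearson(target, gene)
        if r > 0:
            scored.append((r, gene[col]))
    if not scored:
        return None
    # Keep the k candidates with the highest correlation.
    top = sorted(scored, key=lambda item: item[0], reverse=True)[:k]
    total = sum(r for r, _ in top)
    return sum(r * value for r, value in top) / total


# Hypothetical toy data: one target profile with an MV at time point 2.
target = [1.0, 2.0, None, 4.0]
candidates = [
    [1.0, 2.0, 3.0, 4.0],
    [1.5, 2.5, 3.5, 4.5],
    [4.0, 3.0, 2.0, 1.0],  # negatively correlated, excluded in this sketch
]
estimate = estimate_missing(target, candidates, col=2, k=2)
```

Because the Pearson correlation is invariant to scale and offset, a raw weighted sum of this kind is only meaningful when the candidate profiles are on a scale comparable to the target, which is why the sketch assumes normalized data.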