Missing Value Estimation for Gene Expression Profile Data[J]. Chinese Journal of Electronics, 2012, 21(4): 673-677.

Missing Value Estimation for Gene Expression Profile Data

  • Received Date: 2011-03-01
  • Revision Received Date: 2011-05-01
  • Publish Date: 2012-10-25
  • A new missing value (MV) estimation method for gene expression profile data is proposed that considers both the internal and external conditions of gene expression profiles. The internal condition reflects the time-series character of the data: a cubic spline is fitted to construct each gene's expression curve, and MVs are estimated from the fitted curve. The external condition reconstructs MVs from the expression values of candidate genes. First, an initial subset of candidate genes is determined by defining a trace matrix. Next, a final subset of candidate genes is constructed by selecting genes from the initial subset according to an improved Pearson correlation coefficient. Finally, the K genes in the final subset that are most correlated with the target gene are selected, and the weighted sum of their K expression values is taken as the estimate of the target gene's MV under the external condition. Experimental results indicate that, compared with the commonly used MV estimation methods KNNimpute, SKNNimpute and IKNNimpute, the proposed method achieves higher estimation accuracy and is robust to the choice of K.
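The two estimates described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not define the trace matrix or the improved Pearson coefficient, so the code below skips the trace-matrix step and uses the plain Pearson correlation; it keeps only positively correlated genes and normalizes their correlations into weights, which is also an assumption. The function names `spline_estimate`, `pearson` and `external_estimate` are hypothetical, missing entries are represented as `None`, and the abstract does not say how the internal and external estimates are combined, so they are left separate.

```python
import math

def spline_estimate(times, profile, j):
    """Internal estimate: natural cubic spline through the observed points,
    evaluated at times[j]. Needs at least two observed points."""
    t = [ti for ti, v in zip(times, profile) if v is not None]
    y = [v for v in profile if v is not None]
    n = len(t) - 1
    h = [t[i + 1] - t[i] for i in range(n)]
    # Forward sweep of the tridiagonal system (natural boundary conditions).
    mu, z = [0.0] * (n + 1), [0.0] * (n + 1)
    for i in range(1, n):
        alpha = 3 * (y[i + 1] - y[i]) / h[i] - 3 * (y[i] - y[i - 1]) / h[i - 1]
        l = 2 * (t[i + 1] - t[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l
        z[i] = (alpha - h[i - 1] * z[i - 1]) / l
    # Back substitution for the polynomial coefficients b, c, d of each segment.
    c = [0.0] * (n + 1)
    b, d = [0.0] * n, [0.0] * n
    for i in range(n - 1, -1, -1):
        c[i] = z[i] - mu[i] * c[i + 1]
        b[i] = (y[i + 1] - y[i]) / h[i] - h[i] * (c[i + 1] + 2 * c[i]) / 3
        d[i] = (c[i + 1] - c[i]) / (3 * h[i])
    # Locate the segment containing times[j] (clamped at the ends).
    x = times[j]
    s = max(0, min(n - 1, sum(1 for ti in t if ti <= x) - 1))
    dx = x - t[s]
    return y[s] + b[s] * dx + c[s] * dx ** 2 + d[s] * dx ** 3

def pearson(x, y):
    """Plain Pearson correlation over positions observed in both profiles."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def external_estimate(target, candidates, j, k):
    """External estimate: correlation-weighted sum of the values at position j
    of the k candidate genes most correlated with the target gene.
    Assumption: only positively correlated genes are used."""
    scored = []
    for gene in candidates:
        if gene[j] is None:
            continue  # a candidate cannot contribute a value it is missing
        r = pearson(target, gene)
        if r > 0:
            scored.append((r, gene[j]))
    top = sorted(scored, key=lambda p: p[0], reverse=True)[:k]
    if not top:
        return None  # no usable candidate gene
    total = sum(r for r, _ in top)
    return sum(r * v for r, v in top) / total

# Example usage on a toy profile with one missing value.
times = [0, 1, 2, 4, 5]
print(spline_estimate(times, [1.0, 3.0, None, 9.0, 11.0], 2))
genes = [[2.0, 4.0, 6.0, 8.0, 10.0], [1.0, 2.0, 3.0, 4.0, 5.0]]
print(external_estimate([1.0, 2.0, 3.0, None, 5.0], genes, 3, 2))
```

Because the spline uses natural boundary conditions, a profile that is exactly linear in time is reproduced exactly, which makes a convenient sanity check. Weighting raw expression values by correlation ignores differences in scale between genes; a fuller treatment would need the details given in the paper itself.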