LUO Changwei, LI Rui, YU Lingyun, YU Jun, WANG Zengfu. Automatic Tongue Tracking in X-Ray Images[J]. Chinese Journal of Electronics, 2015, 24(4): 767-771. doi: 10.1049/cje.2015.10.017

Automatic Tongue Tracking in X-Ray Images

doi: 10.1049/cje.2015.10.017
Funds:  This work is supported by the Open Project Program of the State Key Lab of CAD&CG, Zhejiang University (No.A1501), and the National Natural Science Foundation of China (No.61303150, No.61472393).
  • Corresponding author: YU Jun was born in 1983. He is a research associate professor at the University of Science and Technology of China. His research interests include human-computer interaction and intelligent robots. (Email:
  • Received Date: 2015-03-09
  • Rev Recd Date: 2015-06-14
  • Publish Date: 2015-10-10
  • X-ray imaging is an effective technique for capturing the continuous motion of the vocal tract during speech, and the Active appearance model (AAM) is a useful tool for analyzing X-ray images. However, for the task of tongue tracking in X-ray images, the accuracy of AAM fitting is insufficient. AAM aims to minimize the residual error between the model appearance and the input image, and it often fails to converge accurately to the true landmarks. To improve the tracking accuracy, we propose a fitting method that incorporates the Constrained local model (CLM) into AAM. In our method, we first combine the objective functions of AAM and CLM into a single objective function. We then project out the texture variation and derive a gradient-based method to optimize this objective function. Our method effectively incorporates not only the shape prior and the global texture, but also the local texture around each landmark. Experiments demonstrate that the proposed method significantly reduces the fitting error. We also show that realistic 3D tongue animation can be created from the tongue tracking results obtained on the X-ray images.
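The fitting idea summarized in the abstract (one objective that joins a global appearance term with a local per-landmark term, texture variation projected out, gradient-based optimization) can be illustrated with a minimal toy sketch. This is not the authors' implementation: the 1-D "image", the single translation shape mode, the uniform-brightness texture mode, the exact detector peaks, the weight, and every function name below are assumptions chosen only to keep the example small and runnable.

```python
import math

# Toy 1-D stand-in for the combined fitting idea; every name and number
# here is an illustrative assumption, not taken from the paper.
MEAN_SHAPE = [2.0, 4.0, 6.0]   # mean landmark positions
TRUE_P = 0.7                   # ground-truth shape parameter (a shift)
BRIGHTNESS = 0.3               # global texture change in the input image


def shape(p):
    """Linear shape model with a single translation mode."""
    return [s + p for s in MEAN_SHAPE]


def image(x):
    """Synthetic 1-D 'image' intensity, brighter than the template."""
    return math.sin(x) + BRIGHTNESS


# Mean appearance: template intensities at the true landmark positions.
MEAN_APPEARANCE = [math.sin(x) for x in shape(TRUE_P)]
# Peaks of hypothetical local landmark detectors (exact in this toy).
LOCAL_PEAKS = shape(TRUE_P)


def project_out(residual):
    """Remove the component along one texture mode (uniform brightness)."""
    mean = sum(residual) / len(residual)
    return [r - mean for r in residual]


def global_term(p):
    """AAM-like cost: squared residual after projecting out texture."""
    residual = [image(x) - a for x, a in zip(shape(p), MEAN_APPEARANCE)]
    return sum(r * r for r in project_out(residual))


def local_term(p):
    """CLM-like cost: squared distance of landmarks to detector peaks."""
    return sum((x - m) ** 2 for x, m in zip(shape(p), LOCAL_PEAKS))


def combined_objective(p, weight=1.0):
    """Single objective joining the global and local terms."""
    return global_term(p) + weight * local_term(p)


def fit(p=0.0, step=0.02, iterations=500, h=1e-5):
    """Plain gradient descent with a central-difference gradient."""
    for _ in range(iterations):
        grad = (combined_objective(p + h) - combined_objective(p - h)) / (2 * h)
        p -= step * grad
    return p


p_hat = fit()
print(round(p_hat, 4))
```

In this toy, subtracting the mean of the residual plays the role of projecting out texture variation, so the brightness offset does not bias the fit, while the quadratic local term pulls the landmarks toward the detector peaks. A real system would use 2-D shapes, learned shape and texture bases, and an analytic gradient instead of finite differences.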