Volume 29 Issue 6
Dec.  2020
ZHANG Xiaojiang and JIANG Ying, “Research and Application of Machine Learning in Automatic Program Generation,” Chinese Journal of Electronics, vol. 29, no. 6, pp. 1001-1015, 2020, doi: 10.1049/cje.2020.10.006

Research and Application of Machine Learning in Automatic Program Generation

doi: 10.1049/cje.2020.10.006
Funds: This work is supported by the National Key Research and Development Program of China (No. 2018YFB1003904), the National Natural Science Foundation of China (Nos. 61462049, 60703116, and 61063006), the Key Project of Yunnan Applied Basic Research (No. 2017FA033), and the Scientific Research Fund Project of the Yunnan Education (No. 2020Y0087).
  • Corresponding author: JIANG Ying was born in Yunnan, China, in 1974. She received her Ph.D. degree in computer software and theory from Peking University, China, in 2005. She is currently a professor and Ph.D. supervisor at Kunming University of Science and Technology. Her research interests include software quality assurance and testing, cloud computing, big data analysis, and intelligent software engineering. (Email: jy_910@163.com)
  • Received Date: 2019-10-31
  • Publish Date: 2020-12-25
  • With the development of artificial intelligence, machine learning has been applied in a growing number of domains. Automatic program generation, which aims to improve the quality and efficiency of software development, has become a research hotspot, and in recent years machine learning has gradually been applied to it. Decision trees, language models, and recurrent neural networks have been used for code generation, code completion, and code knowledge mining, improving the efficiency of software development to a certain extent. Focusing on automatic program generation, this paper analyzes and summarizes the machine learning models used, the modifications made to those models, and their application effects. Future research directions are discussed from two perspectives: programmer behavior and machine-learning-based automatic program generation.