Volume 32 Issue 3
May 2023
Citation: LIU Gongshen, DU Wei, ZHOU Jie, et al., “A Semi-shared Hierarchical Joint Model for Sequence Labeling,” Chinese Journal of Electronics, vol. 32, no. 3, pp. 519-530, 2023, doi: 10.23919/cje.2020.00.363

A Semi-shared Hierarchical Joint Model for Sequence Labeling

doi: 10.23919/cje.2020.00.363
Funds: This work was supported by the Joint Funds of the National Natural Science Foundation of China (U1636112)
  • Author Bio:

    Gongshen LIU received the Ph.D. degree from the Department of Computer Science, Shanghai Jiao Tong University (SJTU), China, in 2003. He is currently a Professor at SJTU. His research interests include natural language processing, machine learning, and artificial intelligence security. (Email: lgshen@sjtu.edu.cn)

    Wei DU received the B.E. degree from Xidian University, Xi’an, China, in 2020. He is currently working toward the Ph.D. degree at the School of Cyber Science and Engineering, Shanghai Jiao Tong University, Shanghai, China. His research interests include natural language processing and artificial intelligence security. (Email: dddddw@sjtu.edu.cn)

    Jie ZHOU received the M.E. degree from the School of Cyber Science and Engineering, Shanghai Jiao Tong University, Shanghai, China, in 2020. She has focused on natural language processing (NLP) since 2018. Her current research interests include machine learning, fundamental NLP tasks, graph neural networks, and recommender systems. (Email: sanny02@sjtu.edu.cn)

    Jing LI received the M.S. degree in computer science from Beijing University of Posts and Telecommunications, China, in 2003. She holds the Professor-Level Engineer certification at State Grid. Her research interests include computer networks and information security. (Email: 1713615427@qq.com)

    Jie CHENG received the M.S. degree in computer application technology from Beijing University of Posts and Telecommunications, China, in 2010. He joined the State Grid Information and Telecommunication Branch in the same year and obtained the CISSP certification in 2020. His main research interests include enterprise-class cybersecurity, threat hunting, and XDR. (Email: 108916685@qq.com)

  • Received Date: 2020-11-01
  • Accepted Date: 2022-02-20
  • Available Online: 2022-04-19
  • Publish Date: 2023-05-05
  • Abstract: Multi-task learning is an essential and practical mechanism for improving overall performance in various machine learning fields. Because linguistic tasks form a natural hierarchy, hierarchical joint models are a common architecture in natural language processing. However, in state-of-the-art hierarchical joint models, higher-level tasks share only bottom layers or latent representations with lower-level tasks, ignoring the correlations between tasks at different levels; that is, lower-level tasks cannot be guided by higher-level features. This paper investigates how to strengthen the correlations among tasks supervised at different layers in an end-to-end hierarchical joint learning model. We propose a semi-shared hierarchical model that contains cross-layer shared modules and layer-specific modules. To fully leverage the mutual information between tasks at different levels, we design four different dataflows of latent representations between the shared and layer-specific modules. Extensive experiments on CTB-7 and CoNLL-2009 show that our semi-shared approach outperforms basic hierarchical joint models on sequence tagging while using far fewer parameters. These results suggest that a proper combination of a cross-layer sharing mechanism and residual shortcuts can improve the performance of hierarchical joint natural language processing models while reducing model complexity. (For illustration, a hypothetical code sketch of such an architecture follows the resource links below.)
  • Resources:
    https://github.com/strubell/LISA
    https://dumps.wikimedia.org/zhwiki/
    https://cl.lingfil.uu.se/nivre/research/Penn2Malt.html
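  • Architecture sketch: As referenced in the abstract, the following is a minimal, hypothetical Python (PyTorch) sketch of one way to combine a cross-layer shared module, layer-specific modules, and residual shortcuts in a semi-shared hierarchical tagger. It is not the authors' released implementation; the class name, the use of BiLSTM encoders, and all dimensions are illustrative assumptions.

    # Minimal illustrative sketch (not the authors' code): one module is
    # shared across all task levels, each level adds its own layer-specific
    # module, and residual shortcuts carry lower-level features upward.
    import torch
    import torch.nn as nn

    class SemiSharedTagger(nn.Module):
        def __init__(self, vocab_size, d_model, tag_sizes):
            super().__init__()
            assert d_model % 2 == 0  # each BiLSTM direction outputs d_model // 2
            self.embed = nn.Embedding(vocab_size, d_model)
            # Cross-layer shared module: reused at every task level.
            self.shared = nn.LSTM(d_model, d_model // 2,
                                  batch_first=True, bidirectional=True)
            # Layer-specific modules: one per task level
            # (e.g., word segmentation, POS tagging, parsing).
            self.specific = nn.ModuleList(
                nn.LSTM(d_model, d_model // 2,
                        batch_first=True, bidirectional=True)
                for _ in tag_sizes)
            self.heads = nn.ModuleList(nn.Linear(d_model, n) for n in tag_sizes)

        def forward(self, tokens):
            h = self.embed(tokens)                  # (batch, seq, d_model)
            logits = []
            for specific, head in zip(self.specific, self.heads):
                shared_out, _ = self.shared(h)      # cross-layer shared dataflow
                private_out, _ = specific(h)        # layer-specific dataflow
                h = h + shared_out + private_out    # residual shortcut upward
                logits.append(head(h))              # supervision at this level
            return logits

    # Example: three levels with 4, 32, and 20 tags respectively.
    model = SemiSharedTagger(vocab_size=10000, d_model=256, tag_sizes=[4, 32, 20])
    per_level_logits = model(torch.randint(0, 10000, (2, 7)))

    Because the shared module appears in every level's forward pass, it receives gradients from every level's loss, so lower-level representations are also shaped by higher-level supervision, which is the correlation effect the abstract describes.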
