Citation: LIU Gongshen, DU Wei, ZHOU Jie, et al., “A Semi-shared Hierarchical Joint Model for Sequence Labeling,” Chinese Journal of Electronics, vol. 32, no. 3, pp. 519–530, 2023, doi: 10.23919/cje.2020.00.363
[1] I. Misra, A. Shrivastava, A. Gupta, et al., “Cross-stitch networks for multi-task learning,” in Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, pp.3994–4003, 2016.
[2] L. Deng, G. Hinton, and B. Kingsbury, “New types of deep neural network learning for speech recognition and related applications: An overview,” in Proceedings of 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, Canada, pp.8599–8603, 2013.
[3] R. Collobert and J. Weston, “A unified architecture for natural language processing: deep neural networks with multitask learning,” in Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland, pp.160–167, 2008.
[4] D. X. Dong, H. Wu, W. He, et al., “Multi-task learning for multiple language translation,” in Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, Beijing, China, pp.1723–1732, 2015.
[5] P. F. Liu, X. P. Qiu, and X. J. Huang, “Recurrent neural network for text classification with multi-task learning,” in Proceedings of the 25th International Joint Conference on Artificial Intelligence, New York, NY, USA, pp.2873–2879, 2016.
[6] E. Strubell, P. Verga, D. Andor, et al., “Linguistically-informed self-attention for semantic role labeling,” in Proceedings of 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, pp.5027–5038, 2018.
[7] J. Baxter, “A Bayesian/information theoretic model of learning to learn via multiple task sampling,” Machine Learning, vol.28, no.1, pp.7–39, 1997. doi: 10.1023/A:1007327622663
[8] X. D. Liu, P. C. He, W. Z. Chen, et al., “Multi-task deep neural networks for natural language understanding,” in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, pp.4487–4496, 2019.
[9] A. Søgaard and Y. Goldberg, “Deep multi-task learning with low level tasks supervised at lower layers,” in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany, pp.231–235, 2016.
[10] K. Hashimoto, C. M. Xiong, Y. Tsuruoka, et al., “A joint many-task model: Growing a neural network for multiple NLP tasks,” in Proceedings of 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, pp.1923–1933, 2017.
[11] V. Sanh, T. Wolf, and S. Ruder, “A hierarchical multi-task approach for learning embeddings from semantic tasks,” in Proceedings of the 33rd AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, pp.6949–6956, 2019.
[12] J. K. Chen, K. Y. Chen, X. C. Chen, et al., “Exploring shared structures and hierarchies for multiple NLP tasks,” arXiv preprint, arXiv: 1808.07658, 2018.
[13] C. S. Gao, J. F. Zhang, W. P. Li, et al., “A joint model of named entity recognition and coreference resolution based on hybrid neural network,” Acta Electronica Sinica, vol.48, no.3, pp.442–448, 2020. (in Chinese) doi: 10.3969/j.issn.0372-2112.2020.03.004
[14] A. Vaswani, N. Shazeer, N. Parmar, et al., “Attention is all you need,” in Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, pp.6000–6010, 2017.
[15] A. L. Maas, A. Y. Hannun, and A. Y. Ng, “Rectifier nonlinearities improve neural network acoustic models,” in Proceedings of the 30th International Conference on Machine Learning, Atlanta, GA, USA, vol.28, article no.3, 2013.
[16] D. Han, P. Martínez-Gómez, Y. Miyao, et al., “Effects of parsing errors on pre-reordering performance for Chinese-to-Japanese SMT,” in Proceedings of the 27th Pacific Asia Conference on Language, Information, and Computation, Taipei, China, pp.267–276, 2013.
[17] T. Mikolov, I. Sutskever, K. Chen, et al., “Distributed representations of words and phrases and their compositionality,” in Proceedings of the 26th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, pp.3111–3119, 2013.
[18] T. Dozat and C. D. Manning, “Deep biaffine attention for neural dependency parsing,” in Proceedings of the 5th International Conference on Learning Representations, Toulon, France, 2017.
[19] J. D. Lafferty, A. McCallum, and F. C. N. Pereira, “Conditional random fields: Probabilistic models for segmenting and labeling sequence data,” in Proceedings of the 18th International Conference on Machine Learning, Williamstown, MA, USA, pp.282–289, 2001.
[20] G. M. Ling, A. P. Xu, and W. Wang, “Research of address information automatic annotation based on deep learning,” Acta Electronica Sinica, vol.48, no.11, pp.2081–2091, 2020. (in Chinese) doi: 10.3969/j.issn.0372-2112.2020.11.001
[21] Y. O. Wang, J. I. Kazama, Y. Tsuruoka, et al., “Improving Chinese word segmentation and POS tagging with semi-supervised methods using large auto-analyzed data,” in Proceedings of the 5th International Joint Conference on Natural Language Processing, Chiang Mai, Thailand, pp.309–317, 2011.
[22] Y. Zhang and S. Clark, “A tale of two parsers: Investigating and combining graph-based and transition-based dependency parsing,” in Proceedings of 2008 Conference on Empirical Methods in Natural Language Processing, Honolulu, HI, USA, pp.562–571, 2008.
[23] W. X. Che, Z. H. Li, Y. Q. Li, et al., “Multilingual dependency-based syntactic and semantic parsing,” in Proceedings of the Thirteenth Conference on Computational Natural Language Learning: Shared Task, Boulder, CO, USA, pp.49–54, 2009.
[24] H. Zhao, W. L. Chen, J. Kazama, et al., “Multilingual dependency learning: Exploiting rich features for tagging syntactic and semantic dependencies,” in Proceedings of the Thirteenth Conference on Computational Natural Language Learning: Shared Task, Boulder, CO, USA, pp.61–66, 2009.
[25] A. Gesmundo, J. Henderson, P. Merlo, et al., “A latent variable model of synchronous syntactic-semantic parsing for multiple languages,” in Proceedings of the Thirteenth Conference on Computational Natural Language Learning: Shared Task, Boulder, CO, USA, pp.37–42, 2009.
[26] D. Andor, C. Alberti, D. Weiss, et al., “Globally normalized transition-based neural networks,” in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany, pp.2442–2452, 2016.
[27] H. Yan, X. P. Qiu, and X. J. Huang, “A unified model for joint Chinese word segmentation and dependency parsing,” arXiv preprint, arXiv: 1904.04697, 2019.
[28] M. Ballesteros, C. Dyer, and N. A. Smith, “Improved transition-based parsing by modeling characters instead of words with LSTMs,” in Proceedings of 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, pp.349–359, 2015.
[29] H. Zhang and R. McDonald, “Enforcing structural diversity in cube-pruned dependency parsing,” in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, Baltimore, MD, USA, pp.656–661, 2014.
[30] T. Lei, Y. Xin, Y. Zhang, et al., “Low-rank tensors for scoring dependency structures,” in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Baltimore, MD, USA, pp.1381–1391, 2014.
[31] B. Bohnet and J. Nivre, “A transition-based system for joint part-of-speech tagging and labeled non-projective dependency parsing,” in Proceedings of 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Jeju Island, Korea, pp.1455–1465, 2012.
[32] C. Alberti, D. Weiss, G. Coppola, et al., “Improved transition-based parsing and tagging with neural networks,” in Proceedings of 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, pp.1354–1359, 2015.
[33] X. Z. Ma and E. Hovy, “End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF,” in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany, pp.1064–1074, 2016.
[34] Z. H. Huang, W. Xu, and K. Yu, “Bidirectional LSTM-CRF models for sequence tagging,” arXiv preprint, arXiv: 1508.01991, 2015.
[35] S. Pentyala, M. W. Liu, and M. Dreyer, “Multi-task networks with universe, group, and task feature learning,” in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, pp.820–830, 2019.
[36] S. Ruder, J. Bingel, I. Augenstein, et al., “Latent multi-task architecture learning,” in Proceedings of the 33rd AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, pp.4822–4829, 2019.
[37] S. K. Liu, E. Johns, and A. J. Davison, “End-to-end multi-task learning with attention,” in Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, pp.1871–1880, 2019.
[38] T. Standley, A. Zamir, D. Chen, et al., “Which tasks should be learned together in multi-task learning?,” in Proceedings of the 37th International Conference on Machine Learning, Virtual Event, pp.9120–9132, 2020.
[39] J. M. Ye, D. X. Luo, and S. Chen, “A text error correction model based on hierarchical editing framework,” Acta Electronica Sinica, vol.49, no.2, pp.401–407, 2021. (in Chinese) doi: 10.12263/DZXB.20200448
[40] C. Feng, C. Liao, Z. R. Liu, et al., “Sentiment key sentence identification based on lexical semantics and syntactic dependency,” Acta Electronica Sinica, vol.44, no.10, pp.2471–2476, 2016. (in Chinese) doi: 10.3969/j.issn.0372-2112.2016.10.027
[41] Y. Gong, X. S. Luo, Y. Zhu, et al., “Deep cascade multi-task learning for slot filling in online shopping assistant,” in Proceedings of the 33rd AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, pp.6465–6472, 2019.
[42] Z. P. Wei, Y. T. Jia, Y. Tian, et al., “Joint extraction of entities and relations with a hierarchical multi-task tagging model,” arXiv preprint, arXiv: 1908.08672, 2019.