Citation: Chenchen ZHANG, Qiuchi LI, Zhan SU, et al., “Word2State: Modeling Word Representations as States with Density Matrices,” Chinese Journal of Electronics, vol. x, no. x, pp. 1–12, xxxx. doi: 10.23919/cje.2023.00.336

Word2State: Modeling Word Representations as States with Density Matrices

doi: 10.23919/cje.2023.00.336
More Information
  • Author Bios:

    Chenchen ZHANG is a PhD student at the School of Computer Science and Technology, Beijing Institute of Technology, China. She received the M.E. degree from Communication University of China. Her research interests include natural language processing, with a current focus on language modeling driven by quantum probability theory. (Email: zhangchenchen@bit.edu.cn)

    Qiuchi LI received the PhD degree in information engineering from the University of Padua in 2020. He is currently an assistant professor at the Department of Computer Science, University of Copenhagen, where he was previously a postdoc (2021-2022). He is broadly interested in natural language processing and information retrieval, with a particular interest in quantum theoretical and computing frameworks for textual and multi-modal data representation and analysis. (Email: qiuchi.li@di.ku.dk)

    Zhan SU is a third-year PhD student at the Department of Computer Science, University of Copenhagen. He received the M.E. degree from Tianjin University, China (2016-2019). He also worked as an algorithm researcher at Tencent (2019-2021) and as a research intern at Mila, Montreal (2023). His research interests involve language modeling and tensor networks. (Email: zhan.su@di.ku.dk)

    Dawei SONG received the PhD degree in information systems from the Chinese University of Hong Kong in 2000. He is currently a professor with the Beijing Institute of Technology. Prior to this appointment, he was a professor with Tianjin University (2012-2018) and a professor of computing with Robert Gordon University, U.K. (2008-2012), where he has remained an honorary professor since 2012. He also worked as a senior lecturer with the Knowledge Media Institute, Open University, U.K. (2005-2008), where he has remained a part-time professor since 2012, and as a research scientist (from 2000) and senior research scientist (from 2002) with the Cooperative Research Centre in Enterprise Distributed Systems Technology, Australia. His research interests include theory and formal models for natural language and multi-modal information processing, and user-centric information seeking. (Email: dwsong@bit.edu.cn)

  • Corresponding author: Email: dwsong@bit.edu.cn
  • Received Date: 2023-10-22
  • Accepted Date: 2024-05-09
  • Available Online: 2024-07-20
  • Abstract: Polysemy is a common phenomenon in linguistics. Quantum-inspired complex-valued word embeddings based on the Semantic Hilbert Space play an important role in natural language processing (NLP), as they can define a genuine probability distribution over the word space. However, existing quantum-inspired approaches compose complex-valued word embeddings by manipulating real-valued vectors, and direct complex-valued pre-trained word representations are still lacking. Motivated by quantum-inspired complex word embeddings, we propose Word2State, a complex-valued pre-trained word embedding based on density matrices. Unlike existing static word embeddings, the proposed model provides non-linear semantic composition in the form of amplitude and phase, and it defines an authentic probability distribution. We evaluate the model on twelve word-similarity datasets and six datasets from relevant downstream tasks. The experimental results demonstrate that the proposed pre-trained word embedding captures richer semantic information and exhibits greater flexibility in expressing uncertainty.
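  • To make the density-matrix formalism in the abstract concrete, below is a minimal NumPy sketch (not the authors' implementation; the toy dimension, sense weights, and function names are illustrative assumptions) of how a complex-valued word vector r_j * exp(i * phi_j) can be lifted to a density matrix whose diagonal forms a genuine probability distribution, with polysemy modeled as a mixture over sense states:

        import numpy as np

        rng = np.random.default_rng(0)
        dim = 4  # toy embedding dimension (illustrative)

        def complex_embedding(r, phi):
            """Compose a complex-valued vector r_j * exp(i * phi_j) and normalize it."""
            w = r * np.exp(1j * phi)
            return w / np.linalg.norm(w)

        def pure_density(w):
            """Rank-1 density matrix |w><w| of a unit-norm state."""
            return np.outer(w, w.conj())

        # Two hypothetical sense states of a polysemous word.
        w1 = complex_embedding(rng.random(dim), 2 * np.pi * rng.random(dim))
        w2 = complex_embedding(rng.random(dim), 2 * np.pi * rng.random(dim))

        # Mixed state over senses: rho = sum_i p_i |w_i><w_i|, with weights summing to 1.
        p = np.array([0.7, 0.3])
        rho = p[0] * pure_density(w1) + p[1] * pure_density(w2)

        # A valid density matrix is Hermitian with unit trace; its diagonal is
        # therefore a genuine probability distribution over the basis dimensions.
        assert np.allclose(rho, rho.conj().T)
        assert np.isclose(np.trace(rho).real, 1.0)
        print(np.diag(rho).real)  # non-negative, sums to 1

    The sketch only illustrates why a density-matrix representation yields an authentic probability distribution; how Word2State learns the amplitudes, phases, and mixture weights is described in the paper itself.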
