Discovering Topics from the Titles of the Indian LIS Theses

Mazumder, Sourav and Barui, Tapan Discovering Topics from the Titles of the Indian LIS Theses. Library Philosophy and Practice (e-journal), 2021, pp. 1-23. [Journal article (Paginated)]

Article.pdf (published version, 722 kB)

English abstract

A large volume of text data is being generated on the web in the form of scholarly articles, doctoral theses, social media posts, library databases, and data archives. These data are easy to use but complicated to process for research. This is why text mining is needed, and topic modeling is one of its most important techniques. In this paper, an attempt has been made to discover topics from the titles of theses in Library and Information Science (LIS) uploaded to Shodhganga. For this work, the text data (n = 2132) were obtained from Shodhganga, and topic modeling through Latent Dirichlet Allocation (LDA) was then applied. A preliminary investigation shows that state universities of India contributed the largest share of the theses (78.06%), that Karnatak University accounts for the most theses (106), and that 60.83% of the theses fall within the period 2011–2020. The main results of this paper are: (a) the keyword “library” (0.204) has the highest score across the 10 topics, and “Library use” can be inferred as the major topic; (b) the keywords “information”, “technology”, “communication”, “survey”, “comparative”, “plant”, “scientist”, “city”, “support”, and “small” were discussed in 266 titles; and (c) “study”, “university libraries”, and “information-seeking behaviour” are the most frequent n-grams appearing in the titles. This work can be extended in future research toward further improvements and new applications.
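The abstract relies on two techniques: counting frequent n-grams in thesis titles and fitting an LDA topic model (Blei, Ng, & Jordan, 2003). The authors' exact code, preprocessing and hyperparameters are not given here, so the following is only a minimal, self-contained Python sketch under stated assumptions. The stopword list, the invented sample titles (not taken from Shodhganga), the function names `tokenize`, `top_ngrams` and `lda_gibbs`, and the settings alpha = 0.1, beta = 0.01 and 200 iterations are all hypothetical. It uses a plain collapsed Gibbs sampler rather than a library such as Gensim or MALLET, which the references cite.

```python
import random
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (a real pipeline would use a fuller one).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "by", "to", "with"}

def tokenize(title):
    """Lowercase a title, keep alphabetic tokens (hyphenated words whole), drop stopwords."""
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", title.lower())
    return [t for t in tokens if t not in STOPWORDS]

def top_ngrams(docs, n, k):
    """Return the k most frequent n-grams across tokenized docs.
    N-grams are formed after stopword removal, so they may span a removed word."""
    counts = Counter()
    for tokens in docs:
        counts.update(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts.most_common(k)

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns (vocab, per-topic word distributions)."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * num_topics for _ in docs]      # topic counts per document
    nkw = [[0] * V for _ in range(num_topics)]  # word counts per topic
    nk = [0] * num_topics                       # total tokens per topic
    z = []                                      # current topic of every token
    for d, doc in enumerate(docs):              # random initial assignments
        zd = []
        for w in doc:
            k = rng.randrange(num_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = index[w], z[d][i]
                # Remove this token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][wi] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][wi] + beta) / (nk[t] + V * beta)
                           for t in range(num_topics)]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wi] += 1
                nk[k] += 1
    # Smoothed estimate of each topic's word distribution (phi); each row sums to 1.
    phi = [[(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)]
           for k in range(num_topics)]
    return vocab, phi

titles = [  # invented example titles for illustration only
    "Use of university libraries by research scholars: a study",
    "Information-seeking behaviour of agricultural scientists",
    "Use of e-resources in university libraries",
    "Information-seeking behaviour of engineering students: a survey",
    "Information and communication technology in college libraries",
    "A comparative study of public libraries in two cities",
]
docs = [tokenize(t) for t in titles]
print(top_ngrams(docs, 2, 3))
vocab, phi = lda_gibbs(docs, num_topics=3)
for k, dist in enumerate(phi):
    top = sorted(range(len(vocab)), key=lambda v: -dist[v])[:4]
    print(k, [(vocab[v], round(dist[v], 3)) for v in top])
```

The per-topic word weights in `phi` are one kind of keyword score an LDA analysis can report, comparable in spirit to the 0.204 given for “library”; how the authors computed their scores is not stated in the abstract. With only a handful of titles the sampled topics are unstable, so a real run would use the full title set and check several random seeds.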

Item type: Journal article (Paginated)
Keywords: Topic modeling, Natural Language Processing, Text Mining, Data Mining
Subjects: L. Information technology and library technology
Depositing user: Sourav Mazumder
Date deposited: 12 Aug 2021 15:27
Last modified: 12 Aug 2021 15:27
URI: http://hdl.handle.net/10760/42342

References

Barde, B. V., & Bainwad, A. M. (2017). An overview of topic modeling methods and tools. 2017 International Conference on Intelligent Computing and Control Systems (ICICCS), 745–750. https://doi.org/10.1109/ICCONS.2017.8250563

Bernard, H. R., & Ryan, G. W. (1998). Text Analysis: Qualitative and Quantitative Methods. https://www.rand.org/pubs/external_publications/EP19980030.html

Bird, S., Klein, E., & Loper, E. (2009). Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. O’Reilly Media. https://books.google.co.in/books?id=KGIbfiiP1i4C

Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3, 993–1022. https://ai.stanford.edu/~ang/papers/jair03-lda.pdf

Buenaño-Fernandez, D., González, M., Gil, D., & Luján-Mora, S. (2020). Text Mining of Open-Ended Questions in Self-Assessment of University Teachers: An LDA Topic Modeling Approach. IEEE Access, 8, 35318–35330. https://doi.org/10.1109/ACCESS.2020.2974983

Chuang, J., Manning, C. D., & Heer, J. (2012). Termite: Visualization Techniques for Assessing Textual Topic Models. Proceedings of the International Working Conference on Advanced Visual Interfaces, 74–77. https://doi.org/10.1145/2254556.2254572

Gan, G., Li, B., Li, X., & Wang, S. (2018). Advanced Data Mining and Applications: 14th International Conference, ADMA 2018, Nanjing, China, November 16–18, 2018, Proceedings. Springer International Publishing. https://books.google.co.in/books?id=pI2wvQEACAAJ

Gandomi, A., & Haider, M. (2015). Beyond the hype: Big data concepts, methods, and analytics. International Journal of Information Management, 35(2), 137–144. https://doi.org/10.1016/j.ijinfomgt.2014.10.007

Goswami, S., Mazumder, S., & Chakrabarty, S. (2021). Text mining of biomedical literature: Discovering new knowledge. Library Philosophy and Practice (e-Journal), 31. https://digitalcommons.unl.edu/libphilprac/4754

Han, X. (2020). Evolution of research topics in LIS between 1996 and 2019: An analysis based on latent Dirichlet allocation topic model. Scientometrics, 125(3), 2561–2595. https://doi.org/10.1007/s11192-020-03721-0

Hong, L., & Davison, B. D. (2010). Empirical Study of Topic Modeling in Twitter. Proceedings of the First Workshop on Social Media Analytics, 80–88. https://doi.org/10.1145/1964858.1964870

Hotho, A., Nürnberger, A., & Paaß, G. (2005). A brief survey of text mining. LDV Forum - GLDV Journal for Computational Linguistics and Language Technology.

Huth, E. J. (1989). The information explosion. Bulletin of the New York Academy of Medicine, 65(6), 647–672. PubMed. https://pubmed.ncbi.nlm.nih.gov/2590751

Ifijeh, G. (2010). Information Explosion and University Libraries: Current Trends and Strategies for Intervention. Chinese Librarianship: An International Electronic Journal. http://eprints.covenantuniversity.edu.ng/5824/#.X_rpaegzZEY

Jelodar, H., Wang, Y., Yuan, C., Feng, X., Jiang, X., Li, Y., & Zhao, L. (2019). Latent Dirichlet allocation (LDA) and topic modeling: Models, applications, a survey. Multimedia Tools and Applications, 78(11), 15169–15211. https://doi.org/10.1007/s11042-018-6894-4

Major, C. H., & Savin-Baden, M. (2012). An Introduction to Qualitative Research Synthesis: Managing the Information Explosion in Social Science Research. Taylor & Francis. https://books.google.co.in/books?id=hXO9ZdzuV30C

McCallum, A. K. (2002). MALLET: A Machine Learning for Language Toolkit. http://mallet.cs.umass.edu/about.php

Miller, A. (2018). Text Mining Digital Humanities Projects: Assessing Content Analysis Capabilities of Voyant Tools. Journal of Web Librarianship, 12(3), 169–197. https://doi.org/10.1080/19322909.2018.1479673

Nikolenko, S. I., Koltcov, S., & Koltsova, O. (2017). Topic modelling for qualitative studies. Journal of Information Science, 43(1), 88–102. https://doi.org/10.1177/0165551515617393

Perkins, J. (2011). Python Text Processing with Nltk 2.0 Cookbook: Lite. Packt Publishing. https://books.google.co.in/books?id=XjXXnWPkd-AC

Rehurek, R., & Sojka, P. (2010). Software Framework for Topic Modelling with Large Corpora. Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, 40–50.

Sievert, C., & Shirley, K. (2014). LDAvis: A method for visualizing and interpreting topics. Proceedings of the Workshop on Interactive Language Learning, Visualization, and Interfaces, 63–70. https://doi.org/10.3115/v1/W14-3110

Srinivasa-Desikan, B. (2018). Natural Language Processing and Computational Linguistics: A practical guide to text analysis with Python, Gensim, spaCy, and Keras. Packt Publishing. https://books.google.co.in/books?id=48RiDwAAQBAJ

Sun, L., & Yin, Y. (2017). Discovering themes and trends in transportation research using topic modeling. Transportation Research Part C: Emerging Technologies, 77, 49–66. https://doi.org/10.1016/j.trc.2017.01.013

Tong, Z., & Zhang, H. (2016). A Text Mining Research Based on LDA Topic Modelling. In Computer Science & Information Technology (Vol. 6, p. 210). https://doi.org/10.5121/csit.2016.60616

Villars, R. L., Olofson, C. W., & Eastwood, M. (2011). Big data: What it is and why you should care. White Paper, 14, 1–14.

Wang, C., & Blei, D. M. (2011). Collaborative Topic Modeling for Recommending Scientific Articles. Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 448–456. https://doi.org/10.1145/2020408.2020480

Yang, T.-I., Torget, A., & Mihalcea, R. (2011). Topic Modeling on Historical Newspapers. Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, 96–104. https://www.aclweb.org/anthology/W11-1513
