Discovering Topics from the Titles of the Indian LIS Theses

Mazumder, Sourav and Barui, Tapan. Discovering Topics from the Titles of the Indian LIS Theses. Library Philosophy and Practice (e-journal), 2021, pp. 1-23. [Journal article (Paginated)]


English abstract

A large volume of text data is being generated on the web in the form of scholarly articles, doctoral theses, social media, library databases, and data archives. Such data are easy to use but complicated to process for research purposes. This is why text mining is required, and topic modeling is one of its most important techniques. In this paper, an attempt has been made to discover topics from the titles of theses uploaded in the field of Library and Information Science (LIS). For this work, the text data (n=2132) were obtained from Shodhganga, and topic modeling through Latent Dirichlet Allocation (LDA) was then applied. A preliminary investigation shows that state universities of India contribute the largest share of the theses (78.06%); that Karnatak University accounts for the most theses (106); and that 60.83% of the theses fall within the period 2011-2020. The main results of this paper are: (a) the keyword “library” (0.204) has the highest score across the 10 topics, and “Library use” can be inferred as the major topic; (b) the keywords “information”, “technology”, “communication”, “survey”, “comparative”, “plant”, “scientist”, “city”, “support”, and “small” were discussed over 266 titles; and (c) “study”, “university libraries”, and “information-seeking behaviour” are the most frequent n-grams appearing in the titles. This work can serve as a basis for future research, further improvement, and new applications.
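
The n-gram counting mentioned in result (c) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the helper names `ngrams` and `count_ngrams`, the tokenization rule, and the three sample titles are all assumptions made up for illustration, and none of the titles are drawn from the Shodhganga data.

```python
from collections import Counter
import re


def ngrams(tokens, n):
    """Return the list of contiguous n-token tuples in `tokens`."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(titles, n):
    """Count n-grams across all titles.

    Assumed tokenization: lowercase, keep runs of letters, and treat
    hyphenated words such as "information-seeking" as one token.
    """
    counts = Counter()
    for title in titles:
        tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", title.lower())
        counts.update(ngrams(tokens, n))
    return counts


# Hypothetical titles, invented for illustration (not from the dataset).
sample_titles = [
    "Use of university libraries by research scholars",
    "Information-seeking behaviour of users in university libraries",
    "A study of information-seeking behaviour of scientists",
]

unigram_counts = count_ngrams(sample_titles, 1)
bigram_counts = count_ngrams(sample_titles, 2)

# Show the three most frequent bigrams in the sample.
for gram, freq in bigram_counts.most_common(3):
    print(" ".join(gram), freq)
```

A real analysis would normally remove stop words and apply lemmatization before counting; those steps are omitted here to keep the sketch short.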

Item type: Journal article (Paginated)
Keywords: Topic modeling, Natural Language Processing, Text Mining, Data Mining
Subjects: L. Information technology and library technology
Depositing user: Sourav Mazumder
Date deposited: 12 Aug 2021 15:27
Last modified: 12 Aug 2021 15:27



