N-grams Analysis of Digital Humanities Research During 2017-2021: A Study Based on Scopus Database

Mazumder, Sourav and Barui, Tapan. N-grams Analysis of Digital Humanities Research During 2017-2021: A Study Based on Scopus Database. 2022. In: Resilience, Reflection, and Innovation in Library Services and Practices (10th International Library Information Professional Summit (I-LIPS)). Bookwell, pp. 98-111. [Book chapter]

English abstract

Digital Humanities (DH) is gaining traction in the social sciences. In computational linguistics and probability, an n-gram is a contiguous sequence of n words or tokens in a text document. In this study, the authors applied n-gram analysis to the abstracts of 1,348 articles published between 2017 and 2021 in order to understand the context of DH research. The data were collected from the Scopus database. The authors used Orange to extract the n-grams and visualised them as word clouds. The study identified the top 10 unigrams, bigrams, and trigrams, and used their frequencies, together with human judgement, to construct the research contexts. The major research contexts observed include DH research, the use of digital technologies, ICT, social networks, cultural heritage, DH projects, and natural language processing. Bigrams were identified as more significant. The study can help scholars understand the current research context and the usage of terms.
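
The counting step described in the abstract can be illustrated with a minimal sketch. It is not the Orange workflow the authors used: the function names (tokenize, ngrams, top_ngrams) and the two sample abstracts are hypothetical, and the sketch assumes simple lowercase word tokenisation with no stop-word removal or stemming.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and keep only alphabetic word tokens (an assumed, simple scheme)."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(abstracts, n, k=10):
    """Count n-grams within each abstract separately and return the k most frequent."""
    counts = Counter()
    for abstract in abstracts:
        # N-grams are built per abstract so they never span two documents.
        counts.update(ngrams(tokenize(abstract), n))
    return counts.most_common(k)


# Hypothetical sample data, not taken from the Scopus corpus used in the study.
sample_abstracts = [
    "Digital humanities projects use natural language processing.",
    "Natural language processing supports digital humanities research.",
]

for n in (1, 2, 3):
    print(n, top_ngrams(sample_abstracts, n, k=3))
```

Here unigrams, bigrams, and trigrams correspond to n = 1, 2, and 3. On a real corpus, a preprocessing step such as stop-word filtering would normally come before counting; which steps the study applied is not stated in this record.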

Item type: Book chapter
Keywords: Digital humanities; Text mining; Social computing; N-grams model; Bibliographic data; Scopus
Subjects: B. Information use and sociology of information > BB. Bibliometric methods
L. Information technology and library technology
Depositing user: Sourav Mazumder
Date deposited: 28 May 2022 01:20
Last modified: 28 May 2022 01:20
URI: http://hdl.handle.net/10760/43231

