Técnicas big data: análisis de textos a gran escala para la investigación científica y periodística

Arcila-Calderón, Carlos; Barbosa-Caro, Eduar; Cabezuelo-Lorenzo, Francisco. Técnicas big data: análisis de textos a gran escala para la investigación científica y periodística. El profesional de la información, 2016, vol. 25, n. 4, pp. 623-631. [Journal article (Paginated)]

Text (Research article)
EPI_2016_25_4.pdf.pdf - Published version
Available under License Creative Commons Attribution.


English abstract

Big data techniques: Large-scale text analysis for scientific and journalistic research. This paper conceptualizes the term big data and describes its relevance in social research and journalistic practices. We explain large-scale text analysis techniques such as automated content analysis, data mining, machine learning, topic modeling, and sentiment analysis, which may help scientific discovery in the social sciences and news production in journalism. We describe the e-infrastructure required for big data analysis using cloud computing, and we assess the main packages and libraries for information retrieval and analysis in commercial software and in programming languages such as Python or R.

Spanish abstract (translated into English)

This paper conceptualizes the term big data and describes its importance for scientific research in the social sciences and for journalistic practice. It explains techniques for analyzing textual data at large scale, such as automated content analysis, data mining, machine learning, topic modeling, and sentiment analysis, which can support knowledge generation in the social sciences and news production in journalism. It sets out the infrastructure needed for big data analysis through the deployment of distributed computing centers, and it assesses the main tools for obtaining information through commercial software and programming packages such as Python or R.

Item type: Journal article (Paginated)
Keywords: Datos; Big data; Minería de datos; Aprendizaje automático; Modelamiento de temas; Análisis de sentimientos; Data; Big data; Data mining; Machine learning; Topic modeling; Sentiment analysis.
Subjects: H. Information sources, supports, channels. > HP. e-resources.
H. Information sources, supports, channels. > HQ. Web pages.
I. Information treatment for information services > IM. Open data
L. Information technology and library technology > LK. Software methodologies and engineering.
L. Information technology and library technology > LM. Automatic text retrieval.
Depositing user: Almudena Aguilera-Montenegro
Date deposited: 09 Mar 2019 14:17
Last modified: 09 Mar 2019 14:17
URI: http://hdl.handle.net/10760/34193



