Técnicas big data: análisis de textos a gran escala para la investigación científica y periodística

Arcila-Calderón, Carlos, Barbosa-Caro, Eduar and Cabezuelo-Lorenzo, Francisco. Técnicas big data: análisis de textos a gran escala para la investigación científica y periodística. El profesional de la información, 2016, vol. 25, n. 4, pp. 623-631. [Journal article (Paginated)]

Text (Research article)
EPI_2016_25_4.pdf.pdf - Published version
Available under License Creative Commons Attribution.

English abstract

Big data techniques: Large-scale text analysis for scientific and journalistic research. This paper conceptualizes the term big data and describes its relevance to social research and journalistic practice. We explain large-scale text analysis techniques such as automated content analysis, data mining, machine learning, topic modeling, and sentiment analysis, which may help scientific discovery in the social sciences and news production in journalism. We describe the e-infrastructure required for big data analysis using cloud computing, and we assess the main packages and libraries for information retrieval and analysis in commercial software and in programming languages such as Python or R.

Spanish abstract

Este trabajo conceptualiza el término big data y describe su importancia en el campo de la investigación científica en ciencias sociales y en las prácticas periodísticas. Se explican técnicas de análisis de datos textuales a gran escala como el análisis automatizado de contenidos, la minería de datos (data mining), el aprendizaje automatizado (machine learning), el modelamiento de temas (topic modeling) y el análisis de sentimientos (sentiment analysis), que pueden servir para la generación de conocimiento en ciencias sociales y de noticias en periodismo. Se expone cuál es la infraestructura necesaria para el análisis de big data a través del despliegue de centros de cómputo distribuido y se valora el uso de las principales herramientas para la obtención de información a través de software comerciales y de paquetes de programación como Python o R.

Item type: Journal article (Paginated)
Keywords: Datos; Big data; Minería de datos; Aprendizaje automático; Modelamiento de temas; Análisis de sentimientos; Data; Big data; Data mining; Machine learning; Topic modeling; Sentiment analysis.
Subjects: H. Information sources, supports, channels. > HP. e-resources.
H. Information sources, supports, channels. > HQ. Web pages.
I. Information treatment for information services > IM. Open data
L. Information technology and library technology > LK. Software methodologies and engineering.
L. Information technology and library technology > LM. Automatic text retrieval.
Depositing user: Almudena Aguilera-Montenegro
Date deposited: 09 Mar 2019 14:17
Last modified: 09 Mar 2019 14:17
URI: http://hdl.handle.net/10760/34193
