A Semantic Approach for Outlier Detection in Big Data Streams

Ahmad, Hussien and Dowaji, Salah. A Semantic Approach for Outlier Detection in Big Data Streams. Webology, 2019, vol. 16, n. 1. [Journal article (Unpaginated)]


English abstract

In recent years, the world has seen a revolution in data generation and collection technologies. The volume, velocity and veracity of data have changed drastically, creating new challenges in data analysis, modeling and prediction. One key challenge is the semantic analysis of textual data, especially in big data stream settings. Existing solutions focus on either topic analysis or sentiment analysis. Moreover, semantic outlier detection over data streams, one of the key problems in data mining and data analysis, has received less attention. In this paper, we introduce a new concept of semantic outlier in which the topic of the textual data is treated as the primary content of the data stream, while the sentiment is treated as the context in which the data was generated and affected. We also propose a framework for semantic outlier detection in big data streams that incorporates contextual detection concepts. The advantage of the proposed concept is that it combines topic and sentiment analysis in a single process, while the framework allows different algorithms and approaches for semantic analysis to be implemented.
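
The idea of treating topic as content and sentiment as context can be illustrated with a minimal sketch. It is not the authors' framework or algorithm: the class name, the sliding-window frequency rule, and the parameters window, min_support and warmup are all assumptions made for illustration, and the topic and sentiment labels are assumed to come from upstream classifiers that are not shown. A message is flagged when its topic is rare among recent messages that share its sentiment.

```python
from collections import Counter, defaultdict, deque


class SemanticOutlierSketch:
    """Illustrative only: flags a topic that is rare within its sentiment context."""

    def __init__(self, window=100, min_support=0.1, warmup=5):
        self.window = window              # messages remembered per sentiment context
        self.min_support = min_support    # minimum relative topic frequency
        self.warmup = max(1, warmup)      # messages needed before flagging starts
        self.history = defaultdict(deque)    # sentiment -> recent topics
        self.counts = defaultdict(Counter)   # sentiment -> topic counts in window

    def observe(self, topic, sentiment):
        """Score one (topic, sentiment) pair, then add it to the window."""
        recent = self.history[sentiment]
        counts = self.counts[sentiment]
        seen = len(recent)
        # Outlier: enough history in this context, and the topic is rare in it.
        is_outlier = seen >= self.warmup and counts[topic] / seen < self.min_support
        # Update the sliding window after scoring, evicting the oldest message.
        recent.append(topic)
        counts[topic] += 1
        if len(recent) > self.window:
            oldest = recent.popleft()
            counts[oldest] -= 1
        return is_outlier


detector = SemanticOutlierSketch(window=50, min_support=0.1, warmup=5)
# Hypothetical labelled stream: (topic, sentiment) pairs.
baseline = [detector.observe("flight delay", "negative") for _ in range(10)]
unusual = detector.observe("free upgrade", "negative")   # rare topic in this context
new_context = detector.observe("free upgrade", "positive")  # context with no history
print(baseline.count(True), unusual, new_context)
```

In this sketch the same topic can be normal under one sentiment and an outlier under another, which is the contextual behaviour the abstract describes. A real system would replace the frequency rule with whichever semantic analysis method the framework is configured to use.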

Item type: Journal article (Unpaginated)
Keywords: Outlier detection; Big data; Big data stream; Distributed data streams; Graph data streams; Content-based outlier; Context-aware outlier
Subjects: I. Information treatment for information services > ID. Knowledge representation.
I. Information treatment for information services > IF. Information transfer: protocols, formats, techniques.
Depositing user: Hussien Ahmad
Date deposited: 05 Oct 2019 13:19
Last modified: 05 Oct 2019 13:19
URI: http://hdl.handle.net/10760/39050


