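Veri Tabanlarında Bilgi Keşfine Formel Bir Yaklaşım: Kısım I: Eşleştirme Sorguları ve Algoritmalar [A Formal Approach to Knowledge Discovery in Databases: Part I: Association Queries and Algorithms]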

Sever, Hayri and Oğuz, Buket. Veri Tabanlarında Bilgi Keşfine Formel Bir Yaklaşım: Kısım I: Eşleştirme Sorguları ve Algoritmalar. Bilgi Dünyası, 2002, vol. 3, n. 2, pp. 173-204. [Journal article (Paginated)]

English abstract

In the last two decades, we have witnessed explosive growth in our capabilities both to collect and store data and to generate even more data through further computer processing. In fact, it is estimated that the amount of information in the world doubles every 20 months. Our inability to interpret and digest these data as readily as they are accumulated has created a need for a new generation of tools and techniques for automated and intelligent database analysis. Consequently, the discipline of knowledge discovery in databases (KDD), which studies such tools and techniques, has evolved into an important and active area of research because of the theoretical challenges and practical applications associated with the problem of discovering (or extracting) valuable, interesting, and previously unknown knowledge from very large real-world databases. Many aspects of KDD have been investigated in several related fields, such as database systems, machine learning, intelligent information systems, statistics, and expert systems. In the first part of our study (Part I), we discuss the fundamental issues of KDD as well as its process-oriented view, with a special emphasis on modelling association rules. In the second part (Part II), a follow-up to this article, association queries will be modelled by formal concept analysis.

Turkish abstract (English translation)

For the past twenty years we have witnessed very rapid growth in the capacity to collect and store data. Indeed, more data is being produced than a computer can process. This situation is consistent with the assumption that the amount of information in the world doubles every 20 months. The inability of humans to interpret and digest data at the pace at which it accumulates has created a need for a new generation of tools and techniques for automated and intelligent database analysis. Consequently, knowledge discovery in databases (KDD; VTBK in Turkish) has evolved into an important and active research area, because of the practical applications associated with the problem of discovering (or extracting) valuable, interesting, and previously unknown knowledge from large databases and the theoretical difficulties of possible solutions. Many aspects of KDD have been studied in closely related fields such as database systems, machine learning, intelligent information systems, statistics, and expert systems. In the first part of our study (Part I), we take a process-oriented view of KDD and address its fundamental issues. Specifically, we describe the characteristic features of the real-life data that underlie the KDD discipline and then treat data mining and, in particular, association queries. A typical solution to association queries is explained and its efficiency measures are evaluated. In the second part (Part II), to be published as a continuation of this article, we present our original approach to modelling association rules through formal concept analysis.

Item type: Journal article (Paginated)
Keywords: Biçimsel kavram analizi, eşleştirme sorguları, bağımlılık ilişkileri, kavram yapıları, formal concept analysis, association query, dependency relationships, concept structures
Subjects: H. Information sources, supports, channels. > HL. Databases and database Networking.
L. Information technology and library technology
L. Information technology and library technology > LN. Data base management systems.
Depositing user: Kamil Comlekci
Date deposited: 26 Mar 2006
Last modified: 02 Oct 2014 12:02
URI: http://hdl.handle.net/10760/7348

