Unified mathematical treatment of complex cascaded bipartite networks: The case of collections of journal papers

Morris, Steven A. Unified mathematical treatment of complex cascaded bipartite networks: The case of collections of journal papers. 2005. PhD dissertation, Oklahoma State University (US). [Thesis]

English abstract

This study proposes a mathematical treatment for analyzing entities, and the relations among them, in complex networks that consist of cascaded bipartite networks. The treatment is applied to collections of journal papers. In this case, entities are distinguishable objects and concepts, such as papers, references, paper authors, reference authors, paper journals, reference journals, institutions, terms, and term definitions. Relations are associations between entity-types, such as papers and the references they cite, or paper authors and the papers they write. An entity-relationship model is introduced that explicitly shows the direct links between entity-types and the indirect relations that may be useful. From this model, a matrix formulation and a generalized matrix arithmetic are introduced that allow easy expression of relations between entities and calculation of the weights of indirect links and co-occurrence links. Occurrence matrices, equivalence matrices, membership matrices, and co-occurrence matrices are described. A dynamic growth model describes recursive relations in occurrence and co-occurrence matrices as papers are added to the paper collection. Graph-theoretic matrices are introduced to allow information flow studies of networks of papers linked by their citations. Similarity calculations and similarity fusion are explained. Derivation of feature vectors for pattern recognition techniques is presented. The relation of the proposed mathematical treatment to seriation, clustering, multidimensional scaling, and visualization techniques is discussed.
It is shown that most existing bibliometric techniques for analyzing collections of journal papers are easily expressed in terms of the proposed mathematical treatment: co-citation analysis, bibliographic coupling analysis, author co-citation analysis, journal co-citation analysis, Braam-Moed-van Raan (BMV) co-citation/co-word analysis, latent semantic analysis, hubs and authorities, and multidimensional scaling. This report discusses an extensive software toolkit, developed for this research, for analyzing and visualizing entities and links in a collection of journal papers. Additionally, an extensive case study is presented that analyzes and visualizes 60 years of anthrax research through a collection of journal papers. For complex networks that consist of cascaded bipartite networks, the treatment presented here provides a general mathematical framework for all aspects of the analysis of static network structure and dynamic network growth. As such, it provides a basic paradigm for thinking about and modeling such networks: computing direct and indirect links, expressing and analyzing statistical distributions of network characteristics, describing network growth, deriving feature vectors, clustering, and visualizing network structure and growth.
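
As a rough illustration of the matrix formulation described in the abstract, the sketch below builds two small occurrence matrices and derives co-occurrence and indirect-link weights from them by ordinary matrix multiplication. The toy data, the variable names, and the helper functions `transpose` and `matmul` are assumptions made for this example. The generalized matrix arithmetic defined in the thesis may weight links differently, so this should be read as a minimal sketch rather than the author's method.

```python
# Toy data, invented for illustration only (not taken from the thesis).
# Occurrence matrix, papers x references: 1 means "paper cites reference".
paper_ref = [
    [1, 1, 0, 0],  # paper p0 cites r0, r1
    [1, 1, 1, 0],  # paper p1 cites r0, r1, r2
    [0, 0, 1, 1],  # paper p2 cites r2, r3
]
# Occurrence matrix, authors x papers: 1 means "author wrote paper".
author_paper = [
    [1, 1, 0],  # author a0 wrote p0, p1
    [0, 1, 1],  # author a1 wrote p1, p2
]


def transpose(m):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


# Bibliographic coupling (papers x papers): number of references two papers share.
coupling = matmul(paper_ref, transpose(paper_ref))

# Co-citation (references x references): number of papers citing both references.
cocitation = matmul(transpose(paper_ref), paper_ref)

# Indirect link across two cascaded bipartite networks (authors x references):
# how many times each author's papers cite each reference.
author_ref = matmul(author_paper, paper_ref)

print(coupling)
print(cocitation)
print(author_ref)
```

In this sketch the diagonal of `coupling` holds the number of references in each paper, and the diagonal of `cocitation` holds the number of citations each reference receives. Whether those diagonal entries are kept or zeroed before computing similarities is a design choice that depends on the analysis.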

Item type: Thesis (UNSPECIFIED)
Keywords: knowledge mapping, citation analysis, bibliometrics, informetrics, visualization
Subjects: A. Theoretical and general aspects of libraries and information.
Depositing user: Steven A. Morris
Date deposited: 19 Sep 2005
Last modified: 02 Oct 2014 12:01
URI: http://hdl.handle.net/10760/6714
