Full title | Algorithms and Models for the Web-Graph
Subtitle | 6th International Workshop, WAW 2009
Editors | Konstantin Avrachenkov, Debora Donato, Nelly Litvak
Video | http://file.papertrans.cn/154/153189/153189.mp4
Series | Lecture Notes in Computer Science
Description | This book constitutes the refereed proceedings of the 6th International Workshop on Algorithms and Models for the Web-Graph, WAW 2009, held in Barcelona, Spain, in February 2009, co-located with WSDM 2009, the Second ACM International Conference on Web Search and Data Mining. The 14 revised full papers presented were carefully reviewed and selected from numerous submissions for inclusion in the book. The papers address a wide variety of topics related to the study of the Web graph, such as theoretical and empirical analysis of the Web graph and Web 2.0 graphs, random walks on the Web and Web 2.0 graphs and their applications, and the design and performance evaluation of algorithms for social networks. The papers are clustered in three topical sections: graph models for complex networks, PageRank and the Web graph, and social networks and search.
Pindex | Conference proceedings 2009
1 |
Front Matter |
|
|
Abstract
|
2 |
Information Theoretic Comparison of Stochastic Graph Models: Some Experiments |
Kevin J. Lang |
|
Abstract
The Modularity-Q measure of community structure is known to falsely ascribe community structure to random graphs, at least when it is naively applied. Although Q is motivated by a simple kind of comparison of stochastic graph models, it has been suggested that a more careful comparison in an information-theoretic framework might avoid problems like this one. Most earlier papers exploring this idea have ignored the issue of skewed degree distributions and have only done experiments on a few small graphs. By means of a large-scale experiment on over 100 large complex networks, we have found that modeling the degree distribution is essential. Once this is done, the resulting information-theoretic clustering measure does indeed avoid Q’s bad property of seeing cluster structure in random graphs.
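Q's false-positive behaviour is easy to reproduce. A minimal sketch (ours, not code from the paper), assuming networkx is available: a standard modularity maximizer still reports a clearly positive Q on an Erdős–Rényi random graph with no planted communities.
```python
# Illustration only (not code from the paper); requires networkx.
import networkx as nx
from networkx.algorithms import community

G = nx.gnm_random_graph(n=1000, m=5000, seed=0)      # no planted communities
parts = community.greedy_modularity_communities(G)   # maximize Q greedily
print(f"{len(parts)} communities, Q = {community.modularity(G, parts):.3f}")
```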
|
3 |
Approximating the Number of Network Motifs |
Mira Gonen, Yuval Shavitt
|
Abstract
World Wide Web, the Internet, coupled biological and chemical systems, neural networks, and socially interacting species are only a few examples of systems composed of a large number of highly interconnected dynamical units. These networks contain characteristic patterns, termed network motifs, which occur far more often than in randomized networks with the same degree sequence. Several algorithms have been suggested for counting or detecting the number of induced or non-induced occurrences of network motifs in the form of trees and bounded-treewidth subgraphs of size O(log n), and of size at most 7 for some motifs. In addition, counting the number of motifs a node is part of was recently suggested as a method to classify nodes in the network. The promise is that the distribution of motifs a node participates in is an indication of its function in the network. Counting the number of network motifs each node is part of therefore poses a major challenge; however, no practical algorithm for it exists. We present several algorithms that, for the first time, approximate for every vertex the number of non-induced occurrences of the motif the vertex is part of, for k-length cycles, k-length cycles …
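As a baseline for the per-vertex counts discussed above, here is a minimal sketch (ours, not the paper's algorithm) that counts the simplest motif, the triangle, exactly for every vertex; the paper's contribution is approximating such counts for much larger motifs, where exact counting is infeasible.
```python
# Baseline illustration only: exact per-vertex triangle (3-cycle) counts.
import networkx as nx

G = nx.karate_club_graph()
per_vertex = nx.triangles(G)  # {vertex: number of triangles through it}
print(sorted(per_vertex.items(), key=lambda kv: -kv[1])[:5])  # top motif hubs
```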
|
4 |
Finding Dense Subgraphs with Size Bounds |
Reid Andersen, Kumar Chellapilla
|
Abstract
We consider the problem of finding dense subgraphs with specified upper or lower bounds on the number of vertices. We introduce two optimization problems: the densest at-least-k-subgraph problem (dalks), which is to find an induced subgraph of highest average degree among all subgraphs with at least k vertices, and the densest at-most-k-subgraph problem (damks), which is defined similarly. These problems are relaxed versions of the well-known densest k-subgraph problem (dks), which is to find the densest subgraph with exactly k vertices. Our main result is that dalks can be approximated efficiently, even for web-scale graphs: we give a 1/3-approximation algorithm for dalks that is based on the core decomposition of a graph and runs in time O(m + n), where n is the number of nodes and m is the number of edges. In contrast, we show that damks is nearly as hard to approximate as the densest k-subgraph problem, for which no good approximation algorithm is known. In particular, we show that if there exists a polynomial-time approximation algorithm for damks with approximation ratio γ, then there is a polynomial-time approximation algorithm for dks with approximation ratio γ/8. In the experimental section, …
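The peeling idea behind the core-decomposition approach is compact enough to sketch. The following is our reconstruction from the abstract's definitions (not the authors' code): remove minimum-degree vertices one at a time and keep the densest remaining subgraph with at least k vertices, using density = edges/vertices.
```python
# Sketch (ours): peeling heuristic for densest at-least-k subgraph.
import heapq

def densest_at_least_k(adj, k):
    """adj: {vertex: set(neighbours)}, undirected; returns (density, vertices)."""
    adj = {v: set(ns) for v, ns in adj.items()}
    n_left = len(adj)
    m_left = sum(len(ns) for ns in adj.values()) // 2
    heap = [(len(ns), v) for v, ns in adj.items()]
    heapq.heapify(heap)
    removed = set()
    best = (m_left / n_left, set(adj)) if n_left >= k else (-1.0, set())
    while heap:
        d, v = heapq.heappop(heap)
        if v in removed or d != len(adj[v]):
            continue                      # stale heap entry
        removed.add(v)
        n_left -= 1
        m_left -= len(adj[v])
        for u in adj[v]:
            adj[u].discard(v)
            heapq.heappush(heap, (len(adj[u]), u))
        adj[v] = set()
        if n_left >= k and m_left / n_left > best[0]:
            best = (m_left / n_left, {u for u in adj if u not in removed})
    return best

print(densest_at_least_k({0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}, k=3))
```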
|
5 |
The Giant Component in a Random Subgraph of a Given Graph |
Fan Chung, Paul Horn, Linyuan Lu
|
Abstract
We consider a random subgraph G_p of a host graph G formed by retaining each edge of G independently with probability p. We address the question of determining the critical value p (as a function of G) at which a giant component emerges. Suppose G satisfies some (mild) conditions depending on its spectral gap and higher moments of its degree sequence. We define the second-order average degree d̃ to be d̃ = Σ_v d_v² / Σ_v d_v, where d_v denotes the degree of v. We prove that for any ε > 0, if p > (1 + ε)/d̃, then asymptotically almost surely the percolated subgraph G_p has a giant component. In the other direction, if p < (1 − ε)/d̃, then almost surely the percolated subgraph G_p contains no giant component.
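A minimal sketch (ours) of the quantity driving the threshold: compute d̃ = Σ d_v² / Σ d_v for a host graph and percolate on either side of 1/d̃, comparing the largest surviving component.
```python
# Illustration only: second-order average degree and edge percolation.
import random
import networkx as nx

def second_order_average_degree(G):
    degs = [d for _, d in G.degree()]
    return sum(d * d for d in degs) / sum(degs)

def largest_component(G, p, seed=0):
    rng = random.Random(seed)
    H = nx.Graph()
    H.add_nodes_from(G)
    H.add_edges_from(e for e in G.edges() if rng.random() < p)
    return max(len(c) for c in nx.connected_components(H))

G = nx.barabasi_albert_graph(20000, 3, seed=1)   # skewed-degree host graph
dt = second_order_average_degree(G)
for p in (0.5 / dt, 2.0 / dt):                   # below vs above 1/d~
    print(f"p = {p:.4f}: largest component = {largest_component(G, p)}")
```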
|
6 |
Quantifying the Impact of Information Aggregation on Complex Networks: A Temporal Perspective |
Fernando Mourão, Leonardo Rocha, Lucas Miranda, Virgílio Almeida, Wagner Meira Jr.
|
Abstract
Complex networks are widely used to model a variety of entities and their relationships. Understanding these relationships, and selecting which data to use in their analysis, is key to a proper characterization. Most current approaches consider all available information for analysis, aggregating it over time. In this work, we study the impact of such aggregation when characterizing complex networks. We model four real complex networks using an extended graph model that enables us to quantify the impact of information aggregation over time. We conclude that data aggregation may distort the characteristics of the underlying real-world network and must be performed carefully.
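A toy illustration (ours, not the authors' model) of the distortion being quantified: the aggregate of a few timestamped edges exhibits clustering that no single time window actually had.
```python
# Illustration only: aggregated vs per-window analysis of timestamped edges.
import networkx as nx

edges = [(0, 1, 0), (1, 2, 0), (0, 2, 5), (2, 3, 5), (3, 0, 9), (1, 3, 9)]

aggregated = nx.Graph((u, v) for u, v, _ in edges)
print("aggregated clustering:", nx.average_clustering(aggregated))

for day in (0, 5, 9):
    H = nx.Graph((u, v) for u, v, t in edges if t == day)
    H.add_nodes_from(aggregated)  # same node set, for comparability
    print(f"day {day} clustering:", nx.average_clustering(H))
```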
|
7 |
A Local Graph Partitioning Algorithm Using Heat Kernel Pagerank |
Fan Chung |
|
Abstract
We give an improved local partitioning algorithm using heat kernel pagerank, a modified version of PageRank. For a subset S with Cheeger ratio (or conductance) h, we show that at least a quarter of the vertices in S can serve as seeds for heat kernel pagerank leading to local cuts with Cheeger ratio at most O(√h), improving the previously known bound by a factor of √(log s), where s is the volume of S.
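A sketch of the two standard ingredients, under our reconstruction of the usual definitions (not the paper's code): the truncated series for heat kernel pagerank and a conductance sweep cut.
```python
# Sketch (ours): rho = e^{-t} * sum_k (t^k / k!) * s P^k, then a sweep cut.
import numpy as np

def heat_kernel_pagerank(A, seed, t=5.0, kmax=50):
    d = A.sum(axis=1)
    P = A / d[:, None]                    # row-stochastic random-walk matrix
    s = np.zeros(len(A))
    s[seed] = 1.0
    rho, term = np.zeros_like(s), s.copy()
    for k in range(kmax):                 # term carries (t^k / k!) * s P^k
        rho += np.exp(-t) * term
        term = (t / (k + 1)) * (term @ P)
    return rho

def sweep_cut(A, rho):
    d = A.sum(axis=1)
    total = d.sum()
    order = np.argsort(-rho / d)          # sweep in degree-normalised order
    inside = np.zeros(len(A), dtype=bool)
    best_phi, vol, cut = 2.0, 0.0, 0.0
    for v in order[:-1]:
        inside[v] = True
        vol += d[v]
        cut += d[v] - 2 * A[v, inside].sum()
        best_phi = min(best_phi, cut / min(vol, total - vol))
    return best_phi

A = np.diag(np.ones(5), 1)                # 6-vertex path graph
A = A + A.T
print(sweep_cut(A, heat_kernel_pagerank(A, seed=0)))
```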
|
8 |
Choose the Damping, Choose the Ranking? |
Marco Bressan, Enoch Peserico
|
Abstract
To what extent can changes in PageRank’s damping factor affect node ranking? We prove that, at least on some graphs, the top k nodes can assume all k! possible orderings as the damping factor varies, even if it varies within an arbitrarily small interval (e.g. [0.84999, 0.85001]). Thus, the rank of a node for a given (finite set of discrete) damping factor(s) provides very little information about the rank of that node as the damping factor varies over a continuous interval. We bypass this problem by proving that there is a simple condition, with a “natural” interpretation independent of PageRank, that allows one to verify “in one shot” whether a node outperforms another for all damping factors and all damping variables (informally, time-variant damping factors). The novel notions we introduce provide a measure of the fuzziness of a node’s rank, of the objective orderability of a graph’s nodes, and of the quality of the results returned by different ranking algorithms based on the random surfer model. We deploy our analytical tools on a 41M-node snapshot of the .it Web domain and on a 0.7M-node snapshot of the CiteSeer citation graph. Among other findings, we show that rank is in …
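The instability is easy to probe empirically. A minimal sketch (ours), assuming networkx: sweep the damping factor across a narrow interval and compare the induced top-k orderings.
```python
# Illustration only: PageRank top-k sensitivity to the damping factor alpha.
import networkx as nx

G = nx.gnp_random_graph(200, 0.03, directed=True, seed=4)
for alpha in (0.8499, 0.8500, 0.8501):
    pr = nx.pagerank(G, alpha=alpha)
    print(alpha, sorted(pr, key=pr.get, reverse=True)[:5])
```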
|
9 |
Characterization of Tail Dependence for In-Degree and PageRank |
Nelly Litvak, Werner Scheinhardt, Yana Volkovich, Bert Zwart
|
Abstract
The dependencies between power-law parameters such as in-degree and PageRank can be characterized by the so-called angular measure, a notion used in extreme value theory to describe the dependency between very large values of coordinates of a random vector. Based on an analytical stochastic model, we argue that the angular measure for in-degree and personalized PageRank is concentrated in two points. This corresponds to the two main factors for high ranking: a large in-degree and a high rank of one of the ancestors. Furthermore, we can formally establish the relative importance of these two factors.
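A standard empirical check of such two-point concentration (our sketch; the synthetic data and thresholds are illustrative assumptions): rank-transform both coordinates to unit-Pareto scale, keep jointly large points, and inspect the angle W = X/(X+Y); two-point concentration shows up as mass near W = 0 and W = 1.
```python
# Illustration only: empirical angular measure of a heavy-tailed sample.
import numpy as np

def empirical_angles(x, y, top_fraction=0.01):
    n = len(x)
    px = n / (n - np.argsort(np.argsort(x)).astype(float))  # unit-Pareto scale
    py = n / (n - np.argsort(np.argsort(y)).astype(float))
    r = px + py
    keep = r >= np.quantile(r, 1 - top_fraction)
    return px[keep] / r[keep]

rng = np.random.default_rng(0)
x = rng.pareto(1.1, 100_000)        # stand-in for in-degree
y = x + rng.pareto(1.1, 100_000)    # stand-in for a PageRank-like mixture
print(np.histogram(empirical_angles(x, y), bins=5, range=(0, 1))[0])
```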
|
10 |
Web Page Rank Prediction with PCA and EM Clustering |
Polyxeni Zacharouli, Michalis Titsias, Michalis Vazirgiannis
|
Abstract
In this paper we describe learning algorithms for Web page rank prediction. We consider linear regression models and combinations of regression with probabilistic clustering and Principal Component Analysis (PCA). These models are learned from time-series data sets and can predict the ranking of a set of Web pages at some future time. The first algorithm uses separate linear regression models. This is further extended by applying probabilistic clustering based on the EM algorithm; clustering allows the Web pages to be grouped together by fitting a mixture of regression models. A different method combines linear regression with PCA so that dependencies between different Web pages can be exploited. All the methods are evaluated using real data sets obtained from Internet Archive, Wikipedia and Yahoo! ranking lists. We also study the temporal robustness of the prediction framework. Overall, the system constitutes a set of tools for high-accuracy rank prediction that search engines can use for efficient resource management.
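A compact sketch (ours, with scikit-learn as a stand-in and synthetic data) of the three ingredients being combined: per-page linear regression on rank time series, EM-based mixture clustering of trajectories, and PCA across pages.
```python
# Illustration only: regression, EM clustering, and PCA on rank time series.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
ranks = np.cumsum(rng.normal(size=(500, 12)), axis=1)  # 500 pages, 12 snapshots
t = np.arange(11).reshape(-1, 1)

# (a) separate linear regression per page: predict snapshot 12 from the trend
preds = [LinearRegression().fit(t, page[:11]).predict([[11]])[0] for page in ranks]

# (b) EM clustering groups pages with similar trajectories (mixture model)
labels = GaussianMixture(n_components=3, random_state=0).fit_predict(ranks[:, :11])

# (c) PCA captures dependencies between pages in the rank time series
components = PCA(n_components=3).fit_transform(ranks[:, :11].T)

print(preds[:3], np.bincount(labels), components.shape)
```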
|
11 |
Permuting Web Graphs |
Paolo Boldi, Massimo Santini, Sebastiano Vigna
|
Abstract
Since the first investigations on web graph compression, it has been clear that the ordering of the nodes of the graph has a fundamental influence on the compression rate (usually expressed as the number of bits per link). The authors of the LINK database [1], for instance, investigated three different approaches: an extrinsic ordering (URL ordering) and two intrinsic (or coordinate-free) orderings based on the rows of the adjacency matrix (lexicographic and Gray code); they concluded that URL ordering has many advantages in spite of a small penalty in compression. In this paper we approach this issue in a more systematic way, testing some old orderings and proposing some new ones. Our experiments are made in the WebGraph framework [2], and show that the compression technique and the structure of the graph can produce significantly different results. In particular, we show that for the transposed web graph URL ordering is significantly less effective, and that some new orderings combining host information and Gray/lexicographic orderings outperform all previous methods. In particular, in some large transposed graphs they yield the quite incredible compression rate of 1 bit per link.
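The dependence of bits per link on node ordering can be approximated cheaply. A crude sketch (ours, not the WebGraph codec): gap-encode adjacency lists under two orderings and compare an Elias-gamma cost estimate.
```python
# Illustration only: bits-per-link proxy under different node orderings.
import math
import networkx as nx

def bits_per_link(G, order):
    pos = {v: i for i, v in enumerate(order)}
    total = links = 0
    for v in G:
        prev = None
        for s in sorted(pos[u] for u in G[v]):
            gap = s + 1 if prev is None else s - prev
            total += 2 * int(math.log2(max(gap, 1))) + 1  # gamma-code length
            links += 1
            prev = s
    return total / max(links, 1)

G = nx.barabasi_albert_graph(5000, 5, seed=2)
print("identity order:", bits_per_link(G, list(G)))
print("BFS order     :", bits_per_link(G, list(nx.bfs_tree(G, 0))))
```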
|
12 |
A Dynamic Model for On-Line Social Networks |
Anthony Bonato, Noor Hadi, Paul Horn, Paweł Prałat, Changping Wang
|
Abstract
We present a deterministic model for on-line social networks based on transitivity and local knowledge in social interactions. In the Iterated Local Transitivity (ILT) model, at each time step and for every existing node v, a new node appears and joins to the closed neighbour set of v. The ILT model provably satisfies a number of both local and global properties observed in real-world on-line social and other complex networks, such as a densification power law, decreasing average distance, and higher clustering than in random graphs with the same average degree. Experimental studies of social networks demonstrate poor expansion properties as a consequence of the existence of communities with a low number of inter-community links. A spectral gap for both the adjacency and normalized Laplacian matrices is proved for graphs arising from the ILT model, thereby simulating such bad expansion properties.
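The generation rule is simple enough to state in a few lines. A sketch (ours) of the ILT step as described above: every existing node v spawns a new node joined to v's closed neighbourhood, simultaneously for all v.
```python
# Illustration only: Iterated Local Transitivity (ILT) generation rule.
import networkx as nx

def ilt(G0, steps):
    G = G0.copy()
    for _ in range(steps):
        closed = {v: list(G[v]) + [v] for v in G}  # neighbourhoods at time t
        for v, nbrs in closed.items():             # then add all clones at once
            clone = G.number_of_nodes()
            G.add_edges_from((clone, u) for u in nbrs)
    return G

G = ilt(nx.complete_graph(3), steps=4)             # 3 * 2^4 = 48 nodes
print(G.number_of_nodes(), G.number_of_edges(), nx.average_clustering(G))
```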
|
13 |
TC-SocialRank: Ranking the Social Web |
Antonio Gulli, Stefano Cataudella, Luca Foschini
|
Abstract
Web search is extensively adopted for accessing information on the web, and recent developments in personalized search, social bookmarking and folksonomy systems have tremendously eased the user’s path towards the desired information. In this paper, we discuss a novel link-based algorithm which extends state-of-the-art ranking algorithms for folksonomy systems. The algorithm leverages the importance of users in the social community, the importance of the bookmarks/resources they share, and additional temporal and click information. Temporal information has primary importance in social bookmarking search, since users continuously post new and fresh information; the importance of this information may decay after a while if it is no longer tagged or clicked. As a case study for testing the algorithm’s effectiveness, we discuss a novel folksonomy system that unifies web search and social bookmarking by transparently leveraging a Wiki-based collaborative editing system. When an interesting search result is found, a user can share it with the community by simply clicking a button. This information is implicitly tagged with the query submitted to any commodity search engine. Later on, …
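As a hedged illustration of temporal decay in social ranking (ours, not the algorithm from the paper; all weights and the half-life are assumptions): a resource's score combines the importance of the users who tagged it with tag/click recency.
```python
# Illustration only: time-decayed scoring of a bookmarked resource.
import math
import time

HALF_LIFE = 7 * 24 * 3600      # one week, in seconds (assumed)

def decayed_score(events, now):
    """events: (user_importance, timestamp, clicks) tuples for one resource."""
    lam = math.log(2) / HALF_LIFE
    return sum(ui * (1 + math.log1p(clicks)) * math.exp(-lam * (now - ts))
               for ui, ts, clicks in events)

now = time.time()
fresh = [(0.9, now - 3600, 12)]             # tagged an hour ago
stale = [(0.9, now - 30 * 24 * 3600, 12)]   # tagged a month ago
print(decayed_score(fresh, now), decayed_score(stale, now))
```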
|
14 |
Exploiting Positive and Negative Graded Relevance Assessments for Content Recommendation |
Maarten Clements, Arjen P. de Vries, Marcel J. T. Reinders
|
Abstract
Social media allow users to give their opinion about the available content by assigning a rating. Collaborative filtering approaches that predict recommendations from these graded relevance assessments are hampered by the sparseness of the data. This sparseness problem can be overcome with graph-based models, but current methods are not able to deal with negative relevance assessments. We propose a new graph-based model that exploits both positive and negative preference data. To this end, we combine into a single content ranking the results from two graphs, one based on positive and the other on negative preference information. The resulting ranking contains fewer false positives than a ranking based on positive information alone. Low ratings, however, also appear to have predictive value for relevant content: discounting the negative information does not only remove irrelevant content from the top of the ranking, but also reduces the recall of relevant documents.
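A minimal sketch (ours) of the two-graph idea: rank with a random walk on the positive graph, then discount using a walk on the negative graph; the discount weight is an assumption for illustration.
```python
# Illustration only: combining positive and negative preference graphs.
import networkx as nx

pos = nx.Graph([("u1", "i1"), ("u1", "i2"), ("u2", "i2"), ("u2", "i3")])
neg = nx.Graph([("u1", "i3"), ("u2", "i4")])
pos.add_nodes_from(neg)        # align node sets so scores can be subtracted
neg.add_nodes_from(pos)

p_pos = nx.pagerank(pos, personalization={"u1": 1.0})
p_neg = nx.pagerank(neg, personalization={"u1": 1.0})
score = {v: p_pos[v] - 0.5 * p_neg[v] for v in pos if v.startswith("i")}
print(sorted(score, key=score.get, reverse=True))
```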
|
15 |
Cluster Based Personalized Search |
Hyun Chul Lee, Allan Borodin
|
Abstract
We study personalized web ranking algorithms based on the existence of document clusterings. Motivated by the topic-sensitive PageRank of Haveliwala [20], we develop and implement an efficient “local-cluster” algorithm by extending the web search algorithm of Achlioptas et al. [10]. We propose some formal criteria for evaluating such personalized ranking algorithms and provide some preliminary experiments in support of our analysis. Both theoretically and experimentally, our algorithm differs significantly from Topic-Sensitive PageRank.
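A sketch (ours) of the general cluster-biased PageRank idea that motivates such algorithms, with hypothetical clusters and query weights: precompute one PageRank vector per document cluster, then blend them per query.
```python
# Illustration only: cluster-biased (topic-sensitive style) PageRank blending.
import networkx as nx

G = nx.DiGraph(nx.scale_free_graph(300, seed=7))       # toy Web-like digraph
clusters = {"sports": [v for v in G if v % 3 == 0],     # hypothetical clusters
            "tech":   [v for v in G if v % 3 == 1]}

per_cluster = {name: nx.pagerank(G, personalization={v: 1.0 for v in members})
               for name, members in clusters.items()}

query_weights = {"sports": 0.2, "tech": 0.8}            # assumed topic mixture
score = {v: sum(w * per_cluster[c][v] for c, w in query_weights.items())
         for v in G}
print(sorted(score, key=score.get, reverse=True)[:5])
```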
|
16 |
Back Matter |
|
|
Abstract
|
|
|