Towards a Unified Treatment of Dynamic Graphs (15w5162)

Arriving in Banff, Alberta Sunday, March 29 and departing Friday April 3, 2015

Organizers

(University of British Columbia)

(University of Victoria)

(University of Massachusetts Amherst)

Mikkel Thorup (University of Copenhagen)

Objectives

The main objective of the workshop is to bring together researchers working in these areas (graph sparsification, dynamic graph algorithms, graph streaming, and distributed graph algorithms) and to stimulate the exchange of ideas and techniques between these groups. Our proposed schedule would include several carefully chosen survey talks so that all participants can understand the foundational assumptions and viewpoints of all of the fields. We also plan to have many focused technical talks by leading experts presenting recent research results and techniques that could be used by participants in all of the fields.

Another of our aims is to allow Canadian researchers working in these areas to meet. Two of our organizers are Canadian, and we will encourage Canadian graduate students to get involved in these areas. We also aim to promote female participation: one of our organizers is female, and we have invited at least four other female colleagues.

Next we give a more in-depth view of each area and its open problems, and then conclude with a discussion of where we anticipate the points of interaction will be.

Dynamic graph algorithms

A data structure is a method for storing, accessing, and updating information; for example, items may be stored in a list sorted by key, with access by binary search on the key. The primary goal of these methods is to minimize the cost of a worst case online sequence of update and query operations, measured in execution time as a function of the number of nodes $n$ and the current number of edges $m$. For graphs, the sequence of operations consists of updates (edge insertions or deletions) and queries such as ``what is the size of the minimum cut?" (a minimal sketch of such an update/query interface is given below). Data structures for maintaining graph properties have been studied since the early 1960's, with substantial progress made in each decade. Any optimization problem on graphs can be considered in this context, and many have been: e.g., maintaining a minimum spanning tree, a minimum cut/maximum flow, $k$-edge connectivity, shortest paths, and a maximal matching.

Typically, the concern was to minimize the worst case running time of each operation. In the early 1970's, Bob Tarjan developed techniques for the amortized analysis of data structures, that is, for bounding the average cost per operation over a worst case sequence of operations. The mid to late 1990's saw the introduction of dynamic graph algorithms whose amortized running times are polylogarithmic in the size of the graph, for a variety of problems. These, however, had very bad worst case performance per operation, on the order of linear in the size of the graph. The key to these methods is that if an algorithm spends a long time on one operation, it leaves behind a more useful representation of the graph, enabling the speed-up of future operations.
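To make the update/query model concrete, the following is a minimal, purely illustrative sketch in Python (the class and method names are hypothetical; this is not one of the algorithms discussed in this section). It recomputes connectivity from scratch on each query, so updates are constant time but each query costs $O(n+m)$ time; the algorithms surveyed here aim to bring the cost of every operation down to polylogarithmic, either amortized or worst case.

    from collections import defaultdict, deque

    class NaiveDynamicConnectivity:
        """Illustrative update/query interface for a dynamic graph.

        Edge insertions and deletions take constant time; each
        connectivity query re-runs a breadth-first search and so costs
        O(n + m) time.  Dynamic graph algorithms aim to bring the cost
        of every operation down to polylogarithmic, either amortized
        or worst case.
        """

        def __init__(self):
            self.adj = defaultdict(set)

        def insert(self, u, v):
            self.adj[u].add(v)
            self.adj[v].add(u)

        def delete(self, u, v):
            self.adj[u].discard(v)
            self.adj[v].discard(u)

        def connected(self, u, v):
            seen, queue = {u}, deque([u])
            while queue:
                x = queue.popleft()
                if x == v:
                    return True
                for y in self.adj[x] - seen:
                    seen.add(y)
                    queue.append(y)
            return False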
In the last decade, especially in the last five years, there has been a great deal of progress in maintaining approximate shortest distances, for directed as well as undirected graphs. Typically these techniques involve sampling the edges. Some techniques have explicitly used sparse representations which preserve approximate distances, known as spanners. Work on the transitive closure of graphs has led to dynamic algorithms for matrices representing graphs, such as methods for updating determinants and inverses of matrices as matrix entries change. Very recently, a method for maintaining dynamic connectivity was developed which allows polylogarithmic worst case update time (SODA 2013). While its techniques were developed independently, they echoed the sparsification techniques of the graph sketching community developed the previous year. This coincidence is in part what inspired two of the organizers, who come from these different communities, to co-author this proposal. In the past year, we have seen the first algorithm which maintains approximate single source shortest paths in a graph undergoing edge deletions in sublinear time overall (to appear in SODA 2014). There have also been milestones in maintaining all-pairs approximate shortest paths in undirected and directed graphs, in STOC 2013 and FOCS 2013. The algorithms tend to be complicated and, to some extent, a bit ad hoc.

Another important development of the last decade is the work relating lower bounds for dynamic graph algorithms to communication lower bounds. In his STOC 2010 paper, Mihai Patrascu showed that we could prove polynomial lower bounds for dynamic graph problems if we could prove lower bounds on a certain three-party communication game. Obtaining such polynomial lower bounds would be an immense step forward. Patrascu gave a series of reductions to communication lower bounds, and posed some important open problems there. The first super-$\log n$ lower bounds were demonstrated recently for some problems. How far can this be pushed?

Open problem: For what graph problems can we find polylogarithmic worst case time bounds?

Open problem: Can we prove super-polylogarithmic lower bounds for dynamic graph problems?

Open problem: Can we develop or apply general sampling methods to simplify and improve dynamic graph algorithms, both directed and undirected?

Graph Streams and Sketching

There is a growing body of work over the last decade on processing graphs in the data stream model. In the basic version of this model, a sequence of edges arrives in an arbitrary order and the goal is to evaluate properties of the resulting graph subject to a memory constraint. The majority of existing research has focused on the ``semi-streaming" memory constraint, in which an algorithm may use memory proportional to the number of nodes but must return correct answers on any input graph, no matter how dense. With less memory, it can be shown that virtually no interesting properties of the graph can be computed, whereas there exist semi-streaming algorithms for a large range of problems, including estimating the size of cuts, finding a minimum spanning tree, and approximating maximum weight matchings.

In the last two years, the first data stream algorithms for processing fully dynamic graphs, where edges are both inserted and deleted, have been developed. These algorithms are based on applying a random linear projection to a matrix representation of the graph, a technique known as ``sketching" the data. The projection compresses the graph into a lower-dimensional space so that it can be stored in small space, and the linearity of the projection ensures that the sketch can be updated easily as edges are added and removed, without the need to store the entire graph. Low-dimensional sketches have been designed for estimating connectivity and other spectral properties of a graph. In addition to applications in data streams, linear sketches also yield surprisingly efficient communication protocols. For example, suppose the rows of an $n \times n$ adjacency matrix are partitioned between $n$ different players.
A surprising corollary of the sketching result is that it is possible for the players to determine whether the graph is connected while each player communicates only $O(\log^3 n)$ bits about his or her input. It would be natural to expect applications of this technique when processing graphs in distributed models, but these have yet to be explored.
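As a toy illustration of the linearity that makes such protocols possible (a sketch under assumed parameters only, not the actual connectivity sketch, which layers many $\ell_0$-samplers on top of this idea): each node, or each player holding a row of the adjacency matrix, stores a low-dimensional random projection of its signed edge-incidence vector. Edge insertions and deletions are then additive updates, and the sketch of a merged set of nodes is just the sum of their sketches.

    import numpy as np

    # Assumed toy parameters; not the actual connectivity sketch.
    n, d = 8, 5                                  # nodes, sketch dimension
    rng = np.random.default_rng(0)
    num_pairs = n * (n - 1) // 2                 # one coordinate per node pair
    S = rng.standard_normal((d, num_pairs))      # shared random projection

    def pair_index(u, v):
        """Index of the unordered pair {u, v} among all n-choose-2 pairs."""
        u, v = min(u, v), max(u, v)
        return u * n - u * (u + 1) // 2 + (v - u - 1)

    def signed_column(u, v, node):
        """Signed edge-incidence vector of `node` restricted to edge {u, v}."""
        x = np.zeros(num_pairs)
        x[pair_index(u, v)] = 1.0 if node == min(u, v) else -1.0
        return x

    # Each node stores only the d-dimensional sketch of its incidence vector.
    sketch = {node: np.zeros(d) for node in range(n)}

    def insert(u, v):
        sketch[u] += S @ signed_column(u, v, u)
        sketch[v] += S @ signed_column(u, v, v)

    def delete(u, v):
        sketch[u] -= S @ signed_column(u, v, u)
        sketch[v] -= S @ signed_column(u, v, v)

    # Linearity in action: the sketch of the merged supernode {1, 2} is the
    # sum of the two sketches; the edge inside the supernode cancels because
    # its two entries carry opposite signs.
    insert(0, 1); insert(1, 2); delete(0, 1)
    merged = sketch[1] + sketch[2]               # sketch of edges leaving {1, 2}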
Graph Spanners and Sparsifiers

Graph spanners and sparsifiers are sparse subgraphs that preserve interesting properties of the original graph. In an $\alpha$-spanner, the length of the shortest path between any two nodes is at most a factor $\alpha$ larger than the length in the original graph. As $\alpha$ increases, the number of edges required in the spanner decreases until, for $\alpha \geq \log n$, the average degree of the spanner can be made constant. In a cut sparsifier, the weight of any cut in the subgraph is within a $(1+\epsilon)$ factor of the weight of the corresponding cut in the original graph. A more powerful form of sparsification is spectral sparsification, in which other spectral properties (in addition to cut sizes) are also approximately preserved. These include the eigenvalues of the graph, and hence also properties of random walks and natural partitions of the graph. Given any graph, it is possible to construct a spectral sparsifier with only $O(\epsilon^{-2} n \log n)$ edges in near-linear time.

Open problem: Improve the spectral sparsification algorithm for dynamic graph streams. The best algorithm to date uses roughly $n^{5/3}$ space, whereas cut sparsification is possible in roughly $n$ space. Alternatively, can we prove a lower bound that would separate cut and spectral sparsification?

Open problem: What can be said about matching in dynamic graph streams? If there are only edge insertions, the trivial greedy algorithm gives a 2-approximation in roughly $n$ space (a sketch of this baseline follows these open problems). With edge insertions, it is known that beating $e/(e-1)$ requires more than $n\,\mathrm{polylog}(n)$ space, but a) we cannot match this in the general case (edges arriving in an arbitrary order) and b) we do not know what is possible in $n^{1.001}$ space (the lower bound is very fragile and does not extend). There are also numerous problems in the weighted case.

Open problem: Distance approximation in dynamic streams. When there are only edge insertions, we can greedily construct a $(2t-1)$-spanner using only the space required to store the spanner, i.e., $O(n^{1+1/t})$ (a sketch of this construction also follows these open problems). Can we match this when edges are both inserted and deleted, in $o(t)$ passes or even a single pass?

Open problem: What properties of directed graphs can be estimated in the data stream model? It can be shown that, even when there are only edge insertions, $\Omega(n^2)$ space is required to distinguish between the cases a) there exists a path of length 3 from node $s$ to node $t$ and b) there is no path of any length between $s$ and $t$. But this is trivially possible in $O(n)$ space given three passes over the data. It would be natural to explore pass/space trade-offs for this problem.
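The two insertion-only baselines referred to in the open problems above can be made concrete with short illustrative sketches (Python, with hypothetical function names; edges are assumed to arrive as pairs of node identifiers). First, the trivial greedy matching: an arriving edge is kept only if neither endpoint is already matched, which yields a maximal matching and hence a 2-approximation in roughly $n$ space.

    def greedy_matching(edge_stream):
        """Maintain a maximal matching of an insertion-only edge stream.

        An arriving edge is kept only if neither endpoint is already
        matched.  The result is maximal, hence at least half the size of
        a maximum matching, and only matched vertices and edges are
        stored, i.e., roughly n space.
        """
        matched = set()
        matching = []
        for u, v in edge_stream:
            if u not in matched and v not in matched:
                matching.append((u, v))
                matched.update((u, v))
        return matching

Second, the greedy $(2t-1)$-spanner: an arriving edge is kept only if its endpoints are currently at distance greater than $2t-1$ in the spanner, so the kept subgraph has girth greater than $2t$ and therefore $O(n^{1+1/t})$ edges. The open problem asks whether anything comparable is possible when edges are also deleted.

    from collections import defaultdict, deque

    def greedy_spanner(edge_stream, t):
        """Maintain a (2t-1)-spanner of an insertion-only edge stream.

        An arriving edge (u, v) is kept only if u and v are currently at
        distance greater than 2t-1 in the spanner; otherwise an existing
        path already stretches (u, v) by at most 2t-1.
        """
        spanner = defaultdict(set)      # adjacency lists of kept edges

        def within_distance(u, v, limit):
            # BFS from u, stopping once paths exceed `limit` hops.
            if u == v:
                return True
            dist = {u: 0}
            queue = deque([u])
            while queue:
                x = queue.popleft()
                if dist[x] == limit:
                    continue
                for y in spanner[x]:
                    if y not in dist:
                        if y == v:
                            return True
                        dist[y] = dist[x] + 1
                        queue.append(y)
            return False

        for u, v in edge_stream:
            if not within_distance(u, v, 2 * t - 1):
                spanner[u].add(v)
                spanner[v].add(u)
        return spanner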
Points of interaction

Graph sketching for the semi-streaming model and dynamic graph algorithms are incomparable models in terms of their strengths. The semi-streaming model allows space proportional to $n$ or $n \log^c n$, while the (at least amortized) time per operation for many dynamic graph algorithms is proportional to $\log^c n$. Thus, while the entire graph is accessible to a dynamic graph algorithm, only rarely (in the amortized setting) can it all be examined to process an update; a semi-streaming algorithm, on the other hand, could spend $n \log^c n$ time on every update. Graph sketching may suggest techniques for paring down the information that never needs to be examined, resulting in possible speed-ups of dynamic graph algorithms. Another possibility is that we will discover trade-offs between the space and time constraints.

A common basis is the use of graph sampling. From the point of view of dynamic graph algorithms, sampling has been done in a rather ad hoc fashion, designed separately for each problem, and also adaptively as the updates occur. In graph sketching, the sampling is more oblivious, for example a random projection that reduces dimension. The latter approach may help to unify and make sense of the dynamic graph techniques, while it may be hoped that the less regular techniques of the dynamic graph and distributed graph communities will suggest a broader range of possibilities to the graph sketching community.

Both the graph sketching and the dynamic graph communities recognize that communication bounds are directly related to their problems and are central to the proofs of lower bounds in their respective areas, and such bounds are implicitly or explicitly of concern to people working in parallel and distributed computing on graph problems and dynamic graph problems. For example, in the distributed dynamic minimum spanning tree problem, where each processor corresponds to a node in the graph and communication is along edges, the current state of the art requires either linear communication per update or large local memory: each processor maintains the whole spanning forest in local memory.

Open problem: Can we give algorithms with time/space trade-offs for dynamic graphs?

Open problem: How can we prove lower bounds for dynamic graphs, in terms of space and time, using communication bounds?

Open problem: Can we develop new methods for parallel and distributed computing for graph problems and dynamic graph problems?