# Communication Complexity and Applications, II (17w5147)

Arriving in Banff, Alberta Sunday, March 19 and departing Friday March 24, 2017

## Organizers

Amit Chakrabarti (Dartmouth College)

Funda Ergun (Simon Fraser University)

Andrew McGregor (University of Massachusetts Amherst)

Anup Rao (University of Washington)

## Objectives

In August 2014 we organized the first ever workshop on Communication Complexity and Applications at BIRS, with the objective of bringing together researchers working on foundational questions about communication with researchers using communication complexity techniques and results in other areas of theoretical computer science. We hereby propose a second edition of the workshop to capitalize on the success of the inaugural one and bring other new groups of researchers into the fold.

Communication complexity has proven to be a very useful abstraction for studying computational processes and the interactions of their internal components, as well as interactions between separate processes. It is hard to imagine a computational process that does not involve some form of communication. For example, one can take any computational device (such as a cellphone or a computer), divide it into two parts, and then view the device as executing a communication protocol between these parts. In this way, a lower bound on the communication required can be translated into a lower bound on the resources (for example, the number of wires) needed to build the device. Likewise, any distributed architecture (be it a multi-core processor or the entire Internet) naturally represents the execution of a communication protocol amongst the constituent processes.

Over the past three decades, the communication complexity model has been applied to the study of a wide array of subfields of computation. Recently, the model has been exploited to spectacular effect in the study of space-efficient data stream algorithms. Given a streaming problem, one can view it as being solved by communicating processes as follows. One can partition the stream into two (or more) parts, and view the contents of the memory during the execution of the algorithm as a communication between the two (or more) processes working on the respective parts. Indeed, lower bounds on the space requirements of data-stream algorithms are almost exclusively proved by appealing to communication bounds.

A more recent application for communication complexity techniques is in property testing, where the queries being made by a tester for a property of some large object can be modeled as messages between multiple parties communicating to compute a shared function, which is closely linked to the property being tested. Lower bounds for testing the property are then implied by the communication required.

Yet another application, witnessed in a flurry of recent work that has achieved major breakthroughs, is in mathematical programming methods for combinatorial optimization. A key question in this area is: given a particular optimization problem, how large a linear (or semidefinite) program must one solve to either exactly or approximately capture its optima? Techniques from communication complexity have enabled new bounds that deepen our understanding of how efficiently optimization problems can be modeled as linear (or semidefinite) programs.

Over the last quarter century, a wide range of techniques for proving communication lower bounds have been established. Examples include linear algebraic methods based on factorization norms, singular value estimation, and semidefinite programming, combinatorial methods based on monochromatic and near-monochromatic rectangles, and information-theoretic methods based on entropy and mutual information. Further, the subject has branched out into quantum communication complexity, which has provided its own set of unique insights and ideas. Significant progress has been made towards unifying these approaches via the information-theoretic approach. In addition to being an important step in establishing a unifying theory, these advances have brought us closer to solving longstanding open questions in communication complexity, such as the direct-sum and direct-product questions.

In the direct-sum problem, we consider the complexity of computing n copies of a function, in terms of the complexity of a single copy. For example, known fast matrix multiplication algorithms demonstrate an "economy of scale" phenomenon: there exists a linear function $L$ on $n$ bits for which the smallest circuit computing $L(x)$ has size at least $n^2$, whereas the smallest circuit computing $L(x_1), L(x2), \ldots, L(x_n)$ for $n$ different inputs is of size $\ll n^{2.4}$. A central open question is whether a similar phenomenon exists in communication complexity. Advances in recent years have used information-theoretic methods to give interesting answers to this fundamental question. In particular, a near-equivalent formulation is whether it is possible to compress interactive communication to a size proportional to the information content of the communication. If so, such a result would generalize classical results of Shannon and Huffman in the non-interactive setting.

Concurrent with research in communication complexity, the study of data stream algorithms has achieved significant maturity in the sense that we now have a comprehensive understanding of estimating "frequency-based" statistics such as quantiles and other functions of the distribution of elements in the stream. Space-efficient algorithms have been designed that have been shown to be optimal by appealing to communication complexity lower bounds. However, the focus in stream processing has now moved to significantly more structured problems such as linear-algebraic problems (e.g., low-rank approximation, regression) and graph problems (e.g., spectral sparsification, analyzing matchings and related polytopes, spanners and distances). Understanding the complexity of problems that do not cleanly decompose into "primitive problems" (as was the case in frequency-based statistics) is an important direction for future research. In addition, new applications have been discovered for communication complexity: for example, we have the aforementioned application to property testing, and also applications to the design of data structures and compressed sensing.

In September 2012, in proposing the inaugural edition of this workshop, we had noted that the research community focused on the theory of communication complexity had remained largely segregated from that working on applications to space-efficient algorithms, perhaps due to their being naturally sub-communities within "complexity theory" and "algorithms" respectively. That inaugural workshop at BIRS took the important step of getting these communities to mix and collaborate. Developments over the last three years have only increased the importance of such meetings across otherwise segregated communities. We note two specific examples.

1. On the application side, work done largely in the optimization and operations research communities has brought communication complexity techniques to bear in a sophisticated way to answer long-standing open questions about the complexity of extended formulations for the maximum clique and maximum matching problems.

2. On the pure theory side, long-standing gaps between communication lower bounds implied by matrix rank and actual communication complexity have been narrowed recently. This was partly achieved using combinatorial techniques that are normally the purview of functional analysts. Only a small community within computer science is fluent in them.

An important goal of this workshop is to foster closer ties between the communities identified above. This also involves training young researchers who will become skilled in more than one area.

More technical goals of this workshop include the following. 1) Identifying promising approaches to resolving core open problems in communication complexity such as the direct-sum question and the log-rank conjecture. 2) Extending the clean setting of direct-sum to cover general combiner functions including partial functions, as well as primitive problems that cannot be made fully independent. Collaborations with applications researchers would help identify the most interesting extensions to study. 3) Identifying new communication problems arising in established application areas such as data streams plus emerging ones such as combinatorial optimization. 4) Strengthening the connections of communication complexity to the field of property testing. 5) Furthering the study of communication complexity classes “up the polynomial hierarchy”, such as Arthur-Merlin, $\Sigma_2$, and $\Pi_2$, some of these having been the focus of recent theoretical and application-oriented research.

Communication complexity has proven to be a very useful abstraction for studying computational processes and the interactions of their internal components, as well as interactions between separate processes. It is hard to imagine a computational process that does not involve some form of communication. For example, one can take any computational device (such as a cellphone or a computer), divide it into two parts, and then view the device as executing a communication protocol between these parts. In this way, a lower bound on the communication required can be translated into a lower bound on the resources (for example, the number of wires) needed to build the device. Likewise, any distributed architecture (be it a multi-core processor or the entire Internet) naturally represents the execution of a communication protocol amongst the constituent processes.

Over the past three decades, the communication complexity model has been applied to the study of a wide array of subfields of computation. Recently, the model has been exploited to spectacular effect in the study of space-efficient data stream algorithms. Given a streaming problem, one can view it as being solved by communicating processes as follows. One can partition the stream into two (or more) parts, and view the contents of the memory during the execution of the algorithm as a communication between the two (or more) processes working on the respective parts. Indeed, lower bounds on the space requirements of data-stream algorithms are almost exclusively proved by appealing to communication bounds.

A more recent application for communication complexity techniques is in property testing, where the queries being made by a tester for a property of some large object can be modeled as messages between multiple parties communicating to compute a shared function, which is closely linked to the property being tested. Lower bounds for testing the property are then implied by the communication required.

Yet another application, witnessed in a flurry of recent work that has achieved major breakthroughs, is in mathematical programming methods for combinatorial optimization. A key question in this area is: given a particular optimization problem, how large a linear (or semidefinite) program must one solve to either exactly or approximately capture its optima? Techniques from communication complexity have enabled new bounds that deepen our understanding of how efficiently optimization problems can be modeled as linear (or semidefinite) programs.

Over the last quarter century, a wide range of techniques for proving communication lower bounds have been established. Examples include linear algebraic methods based on factorization norms, singular value estimation, and semidefinite programming, combinatorial methods based on monochromatic and near-monochromatic rectangles, and information-theoretic methods based on entropy and mutual information. Further, the subject has branched out into quantum communication complexity, which has provided its own set of unique insights and ideas. Significant progress has been made towards unifying these approaches via the information-theoretic approach. In addition to being an important step in establishing a unifying theory, these advances have brought us closer to solving longstanding open questions in communication complexity, such as the direct-sum and direct-product questions.

In the direct-sum problem, we consider the complexity of computing n copies of a function, in terms of the complexity of a single copy. For example, known fast matrix multiplication algorithms demonstrate an "economy of scale" phenomenon: there exists a linear function $L$ on $n$ bits for which the smallest circuit computing $L(x)$ has size at least $n^2$, whereas the smallest circuit computing $L(x_1), L(x2), \ldots, L(x_n)$ for $n$ different inputs is of size $\ll n^{2.4}$. A central open question is whether a similar phenomenon exists in communication complexity. Advances in recent years have used information-theoretic methods to give interesting answers to this fundamental question. In particular, a near-equivalent formulation is whether it is possible to compress interactive communication to a size proportional to the information content of the communication. If so, such a result would generalize classical results of Shannon and Huffman in the non-interactive setting.

Concurrent with research in communication complexity, the study of data stream algorithms has achieved significant maturity in the sense that we now have a comprehensive understanding of estimating "frequency-based" statistics such as quantiles and other functions of the distribution of elements in the stream. Space-efficient algorithms have been designed that have been shown to be optimal by appealing to communication complexity lower bounds. However, the focus in stream processing has now moved to significantly more structured problems such as linear-algebraic problems (e.g., low-rank approximation, regression) and graph problems (e.g., spectral sparsification, analyzing matchings and related polytopes, spanners and distances). Understanding the complexity of problems that do not cleanly decompose into "primitive problems" (as was the case in frequency-based statistics) is an important direction for future research. In addition, new applications have been discovered for communication complexity: for example, we have the aforementioned application to property testing, and also applications to the design of data structures and compressed sensing.

In September 2012, in proposing the inaugural edition of this workshop, we had noted that the research community focused on the theory of communication complexity had remained largely segregated from that working on applications to space-efficient algorithms, perhaps due to their being naturally sub-communities within "complexity theory" and "algorithms" respectively. That inaugural workshop at BIRS took the important step of getting these communities to mix and collaborate. Developments over the last three years have only increased the importance of such meetings across otherwise segregated communities. We note two specific examples.

1. On the application side, work done largely in the optimization and operations research communities has brought communication complexity techniques to bear in a sophisticated way to answer long-standing open questions about the complexity of extended formulations for the maximum clique and maximum matching problems.

2. On the pure theory side, long-standing gaps between communication lower bounds implied by matrix rank and actual communication complexity have been narrowed recently. This was partly achieved using combinatorial techniques that are normally the purview of functional analysts. Only a small community within computer science is fluent in them.

An important goal of this workshop is to foster closer ties between the communities identified above. This also involves training young researchers who will become skilled in more than one area.

More technical goals of this workshop include the following. 1) Identifying promising approaches to resolving core open problems in communication complexity such as the direct-sum question and the log-rank conjecture. 2) Extending the clean setting of direct-sum to cover general combiner functions including partial functions, as well as primitive problems that cannot be made fully independent. Collaborations with applications researchers would help identify the most interesting extensions to study. 3) Identifying new communication problems arising in established application areas such as data streams plus emerging ones such as combinatorial optimization. 4) Strengthening the connections of communication complexity to the field of property testing. 5) Furthering the study of communication complexity classes “up the polynomial hierarchy”, such as Arthur-Merlin, $\Sigma_2$, and $\Pi_2$, some of these having been the focus of recent theoretical and application-oriented research.