Interactive proofs with competing teams of no-signaling provers

This paper studies a generalization of multi-prover interactive proofs in which a verifier interacts with two competing teams of provers: one team attempts to convince the verifier to accept while the other attempts to convince the verifier to reject. Each team consists of two provers who jointly implement a no-signaling strategy. No-signaling strategies are a curious class of joint strategy that cannot in general be implemented without communication between the provers, yet cannot be used as a black box to establish communication between them. Attention is restricted in this paper to two-turn interactions in which the verifier asks questions of each of the four provers and decides whether to accept or reject based on their responses. We prove that the complexity class of decision problems that admit two-turn interactive proofs with competing teams of no-signaling provers is a subset of PSPACE. This upper bound matches existing PSPACE lower bounds on the following two disparate and weaker classes of interactive proof: 1. Two-turn multi-prover interactive proofs with only one team of no-signaling provers. 2. Two-turn competing-prover interactive proofs with only one prover per team. Our result implies that the complexity of these two models is unchanged by the addition of a second competing team of no-signaling provers in the first case and by the addition of a second no-signaling prover to each team in the second case. Moreover, our result unifies and subsumes prior PSPACE upper bounds on these classes.


Introduction
Interactive proofs were introduced in the mid-1980's as a generalization of the concept of efficient proof verification and the complexity class NP [Bab85, BM88,GMR89]. Informally speaking, an interactive proof is a conversation between a randomized polynomial-time verifier and a computationally unbounded prover regarding some common input string x. A decision problem L is said to admit an interactive proof if there exists a verifier such that (i) if x is a yes-instance of L then there is a prover who can convince the verifier to accept x with high probability, and (ii) if x is a no-instance of L then no prover can convince the verifier to accept x except with small probability. In a dramatic testament to the surprising power of randomization and interaction, it was soon discovered that every problem in PSPACE admits an interactive proof, yielding the well-known identity IP = PSPACE [LFKN92,Sha92].

Inspiration from quantum information
Though the present paper contains no formal discussion of quantum information, it is proper to acknowledge its role in motivating the study of no-signaling provers. Interest in this model was originally drawn from the study of multi-prover quantum interactive proofs, in which the provers (and possibly the verifier) are permitted to exchange and manipulate quantum information.
It is easy to see that interactive proofs with ordinary, "classical" provers are not affected by the ability of the provers to sample from a common source of randomness. Quantum provers, on the other hand, might use shared pieces of some entangled quantum state to implement a nonlocal strategy that correlates their messages in ways that cannot otherwise be achieved [Bel64]. (The phenomenon of nonlocality was famously branded by Einstein as "spooky action at a distance.") Indeed, some classical protocols which are sound against classical provers are known to become unsound when the provers share entanglement [CHTW04,CGJ09].
Whereas the set of strategies that admit shared entanglement is highly complex, the set of no-signaling strategies is relatively simple and it includes entanglement-sharing strategies as a proper subset. So, for example, any protocol that is sound against no-signaling provers is also sound against quantum provers who share entanglement. It is also interesting to find differences between no-signaling strategies and entanglement-sharing strategies, as these differences shed light on the extent to which no-signaling can be used as a proxy for shared entanglement. In some protocols the allowance of arbitrary no-signaling strategies leads to implausible consequences [vD05, BBL+06]. Such protocols can be viewed as mathematical evidence against physical theories that admit so-called "super-strong" nonlocality such as that found in no-signaling strategies but not entanglement-sharing strategies. The present paper establishes a scenario in which two no-signaling provers are equivalent to two signaling provers.

Interactive proofs with competing provers
Another generalization of the single-prover model is an interactive proof with competing provers, in which one prover tries to convince the verifier to accept the input string x while the other prover tries to convince the verifier to reject x. One may consider proofs in which all messages are known to all provers (complete information) or in which each prover sees only the messages he exchanges with the verifier (incomplete information). These two forms of competing-prover interactive proofs were studied by several authors in the 1990's [FST90,FS92,FKS95,FK97]. But for our purpose in this paper it only makes sense to consider protocols with incomplete information.
In the jargon of game theory, interactive proofs with competing provers are zero-sum games, about which there exists a vast body of literature in computer science, economics, and other disciplines. For instance, fast algorithms for zero-sum games of incomplete information in extensive form imply that the complexity class RG of problems that admit interactive proofs with competing provers is a subset of EXP [KM92,KMvS94]. Feige and Kilian proved the reverse containment [FK97], yielding the competing-prover analogy RG = EXP of the aforementioned identity IP = PSPACE for single-prover interactive proofs.
Feige and Kilian also studied two-turn interactive proofs with competing provers, providing a matching upper and lower bound of PSPACE on the complexity of this model [FK97]. The complexity of k-turn interactive proofs with competing provers for constants k ≥ 3 is an open question of interest to both complexity theorists and game theorists alike.

Interactive proofs with competing teams of provers, our result
Multi-prover interactive proofs and interactive proofs with competing provers are two distinct generalizations of the single-prover model. The next logical step is to unify these two generalizations in the obvious way via interactive proofs with competing teams of provers. Combining established naming conventions for complexity classes based on interactive proofs, we let MRG denote the class of decision problems that admit interactive proofs with competing teams of provers.
To the author's knowledge, this model was considered prior to the present work only by Feigenbaum, Koller, and Shor [FKS95]. Those authors studied this class under the game-theoretic guise of zero-sum games of imperfect recall and proved the containments where $\Sigma_2^{\mathrm{EXP}}$ and $\Pi_2^{\mathrm{EXP}}$ are classes in the second level of the exponential hierarchy, which is the exponential-time version of the familiar polynomial hierarchy.
In this paper, we consider interactive proofs with competing teams of no-signaling provers. Our main result is as follows.

Theorem 1. Every decision problem that admits a two-turn interactive proof with competing teams of two no-signaling provers per team is also in PSPACE.
This upper bound matches the aforementioned PSPACE lower bounds on the following two disparate and weaker classes of interactive proof:
1. Two-turn multi-prover interactive proofs with only one team of no-signaling provers [CCL94,IKM09].
2. Two-turn competing-prover interactive proofs with only one prover per team [FK97].
Our result implies that the complexity of these two models is unchanged by the addition of a second competing team of no-signaling provers in the first case and by the addition of a second no-signaling prover to each team in the second case. Moreover, our result unifies and subsumes prior PSPACE upper bounds on these classes [Ito10,FK97].

Limitations of the present approach
Attention is restricted in this paper to interactions with no more than two no-signaling provers per team and no more than two messages exchanged with each prover. The purpose for this restriction, quite simply, is that this class of interactions appears to be the largest to which our techniques apply.
For all we know, interactions with three messages for a prover or three provers on a team could be sufficiently powerful to capture all of EXP. Indeed, it is consistent with current knowledge that a three-message protocol for EXP might require only one prover per team, or that a three-prover no-signaling protocol for EXP might require only one team of provers. Given this paucity of upper bounds for similar, seemingly weaker models it is hoped that any reservation at the restrictions in our model is more than compensated by the fact that we are able to say anything at all about it.
Let us list some natural extensions of the two-prover, two-turn model and point out exactly where our method fails for these extensions.
More than two turns, only one prover per team. Perhaps the most important open problem related to our work is the complexity of k-turn interactive proofs with competing provers for constants k ≥ 3. This problem, which dates back at least to 1997 [FK97], is still open even in the special case of only one prover per team. With only one prover per team, the question is really a game-theoretic question with a much wider application than just interactive proofs.
Our method fails for this case because we do not have a bound on the verifier matrix of the form $V \le e_{\mathcal{A}_{01}\mathcal{B}_{01}} p^*$ such as that appearing in Proposition 3. Thus, we do not obtain a good enough bound on the loss vectors appearing in our variant of the multiplicative weights update method.
More than two turns, only one team of no-signaling provers. The complexity of k-turn multi-prover interactive proofs with two no-signaling provers is still open for k ≥ 3, even with only one team of provers [Ito10]. For ordinary multi-prover interactive proofs (in which the provers are not allowed to implement arbitrary no-signaling strategies) it is known that a multi-turn protocol with any number of provers can be simulated by another protocol with only two turns and two provers [FL92].
Our method fails here for the same reason as above: we cannot bound the loss vectors in the multiplicative weights update method for a multi-turn verifier.
More than two provers, only one team of no-signaling provers. Similarly, the complexity of two-turn multi-prover interactive proofs with more than two no-signaling provers is still open, even with only one team of provers [Ito10]. As mentioned above, ordinary multi-prover interactive proofs require only two provers [FL92].
Our method does not extend to this case either, as there is no known analogue of Lemma 6 for more than two provers.
Quantum verifier and/or provers. Even with two no-signaling provers, two turns of interaction, and only one team of provers, it is still not known that the PSPACE upper bound holds when either the verifier or provers can send quantum messages [Ito10]. Here the problem is that Lemma 6 does not hold for quantum states.

Techniques
Theorem 1 is proven by means of an efficient parallel algorithm that, given an explicit description of a verifier and an accuracy parameter δ, finds no-signaling strategies for the teams that are within δ of optimal. Containment in PSPACE then follows in the usual way by observing that the description of the verifier has size exponential in the length of the input string x and then employing the fact that a parallel algorithm with succinct input can be simulated in polynomial space [Bor77]. Our algorithm is an example of the multiplicative weights update method (MWUM) as discussed in the survey paper [AHK05] and in the PhD thesis of Kale [Kal07]. (See also Ref. [WK06].) In its simplest form, the MWUM solves a min-max optimization problem on probability distributions. In the present paper we use the MWUM to optimize not just a single distribution, but many distributions simultaneously in the form of a stochastic matrix that represents a strategy for one of the teams. This trick seems to work only for two-turn protocols, as otherwise it is not clear how to ensure sufficient accuracy.
Let us compare our algorithm to the two previous algorithms it subsumes:
• The polynomial-space algorithm of Feige and Kilian for two-turn interactive proofs with competing provers [FK97] is a complicated and highly specialized precursor to the MWUM that, like our algorithm, optimizes over stochastic matrices that represent strategies for the provers.
Their algorithm works by nondeterministically guessing the entries of the matrix and scanning them in a read-once fashion. This approach cannot be extended to optimize over no-signaling strategies, as the read-once model does not allow verification of the no-signaling condition.
• The parallel algorithm of Ito for two-turn, two-prover interactive proofs with no-signaling provers [Ito10] is essentially a reduction to the mixed packing and covering problem, which is a special type of linear program that is known to admit an efficient parallel algorithm [You01].
This approach, too, cannot be extended to competing teams of no-signaling provers, as any linear programming formulation of the protocol is unlikely to be a mixed packing and covering problem.
Our study has benefitted from the valuable experience of recent applications of the MWUM to parallel algorithms for quantum complexity classes [JW09, JUW09, JJUW10, Wu10, GW11]. Indeed, we follow the same high-level approach as the recent proof of DQIP = DIP = PSPACE [GW11]. Namely,
• The domain of admissible (no-signaling) strategies is a strict subset of the "natural" domain (stochastic matrices) for the MWUM.
• To get around this problem, the strategy domain is extended to all the stochastic matrices and a penalty term is introduced so as to remove any incentive for a team to use an inadmissible strategy. (See Section 3.)
• Finally, one must prove a "rounding" theorem (Corollary 4.1), which establishes that near-optimal, fully-admissible strategies can be obtained from near-optimal strategies in the extended domain with penalty term.
Figure 1: A two-turn interactive proof with competing teams of two no-signaling provers per team.

Definition of two-turn interactive proofs with competing teams of provers
In this paper we are concerned with decision problems that admit two-turn interactive proofs with competing teams of no-signaling provers. Let us clarify this concept. A two-turn verifier is a randomized polynomial-time algorithm that, given an input string x, produces questions $i, j$ for the two teams of provers. The teams select their answers $k, l$ (possibly using randomness to do so) and then the verifier accepts or rejects the input x according to some boolean function of $i, j, k, l$. For convenience, the teams shall be called Team Alice and Team Bob. It is the goal of Team Alice to convince the verifier to accept the input string x, while Team Bob's goal is to convince the verifier to reject x.
In the protocols we consider each team consists of two provers. The provers of Team Alice shall be called Alice$_0$ and Alice$_1$, while the provers of Team Bob shall be called Bob$_0$ and Bob$_1$. Each individual prover on each team receives his or her own private question and supplies his or her own separate answer to the verifier. In particular, the question $i$ asked of Team Alice is actually a pair $i = (i_0, i_1)$ with question $i_c$ going to prover Alice$_c$ for both values of the bit $c \in \{0, 1\}$. Similarly, the question $j$ asked of Team Bob is also a pair $j = (j_0, j_1)$ with question $j_c$ going to prover Bob$_c$. The answers $k, l$ received from the two teams are also pairs $k = (k_0, k_1)$ and $l = (l_0, l_1)$ with answers $k_c$ and $l_c$ coming from Alice$_c$ and Bob$_c$, respectively. The entire interaction is illustrated in Figure 1.
Each team may jointly implement any no-signaling strategy in order to produce its answers. Briefly, a strategy for, say, Team Alice is no-signaling if the marginal distribution on answers $k_0$ from Alice$_0$ does not depend upon the question $i_1$ asked of Alice$_1$ and vice versa. No-signaling strategies are discussed in greater detail in Section 2.5.
A decision problem L is said to admit a two-turn interactive proof with competing teams of no-signaling provers with completeness c and soundness s if there exists a fixed two-turn verifier with the following properties: Completeness. If the input string x is a yes-instance of L then there exists a no-signaling strategy for Team Alice that convinces the verifier to accept x with probability at least c, regardless of the no-signaling strategy employed by Team Bob.
Soundness. If the input string x is a no-instance of L then there exists a no-signaling strategy for Team Bob that convinces the verifier to reject x with probability at least 1 − s, regardless of the no-signaling strategy employed by Team Alice.
The completeness and soundness parameters need not be fixed constants. Rather, they may vary as a function of the input string x. The complexity class $\mathrm{MRG}_{\mathrm{ns}}(2, 2)$ consists of all decision problems that admit two-turn interactive proofs with competing teams of two no-signaling provers per team with completeness c and soundness s such that there exists a fixed polynomial-bounded function p on strings with c − s ≥ 1/p. (The first parameter of the class $\mathrm{MRG}_{\mathrm{ns}}(2, 2)$ denotes the number of provers per team, the second denotes the number of turns in the protocol. It is also common to parameterize interactive proof classes according to the number of rounds of communication, rather than the number of turns. Under this scheme, the class $\mathrm{MRG}_{\mathrm{ns}}(2, 2)$ might be called $\mathrm{MRG}_{\mathrm{ns}}(2, 1)$ by some authors.) In this paper we prove $\mathrm{MRG}_{\mathrm{ns}}(2, 2) \subseteq \mathrm{PSPACE}$. It then follows from existing lower bounds on weaker classes [IKM09,FK97] that $\mathrm{MRG}_{\mathrm{ns}}(2, 2) = \mathrm{PSPACE}$.

Notation, the Kronecker product
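To each interactive proof with input x we associate eight distinct finite-dimensional real Euclidean spaces: four question spaces and four answer spaces. These spaces are denoted as follows for both $c \in \{0, 1\}$:

Questions to Alice$_c$: $\mathcal{S}_c$
Questions to Bob$_c$: $\mathcal{T}_c$
Answers from Alice$_c$: $\mathcal{A}_c$
Answers from Bob$_c$: $\mathcal{B}_c$

Since the verifier acts in polynomial time, the bit length of the questions and answers is at most a polynomial in the bit length |x| of the input string x. Since n bits suffice to encode $2^n$ distinct questions or answers, the dimension of the spaces $\mathcal{S}_c, \mathcal{T}_c, \mathcal{A}_c, \mathcal{B}_c$ can be exponential in |x|. The Kronecker product (or tensor product) of two spaces $\mathcal{X}, \mathcal{Y}$ is another space with dimension $\dim(\mathcal{X}) \dim(\mathcal{Y})$. This product space is typically denoted by $\mathcal{X} \otimes \mathcal{Y}$, which we abbreviate to $\mathcal{X}\mathcal{Y}$. Kronecker products involving the eight spaces $\mathcal{S}_c, \mathcal{T}_c, \mathcal{A}_c, \mathcal{B}_c$ are further abbreviated so that
$$\mathcal{S}_{01} = \mathcal{S}_0 \otimes \mathcal{S}_1, \qquad \mathcal{A}_{01}\mathcal{B}_{01} = \mathcal{A}_0 \otimes \mathcal{A}_1 \otimes \mathcal{B}_0 \otimes \mathcal{B}_1,$$
and so on. The Kronecker product extends in a natural way to vectors and linear operators. In this paper each vector or linear operator is implicitly associated with its representation as a column or a matrix, for which the Kronecker product is given by a straightforward formula. For example, if A, B are 2 × 2 matrices given by
$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \qquad B = \begin{pmatrix} p & q \\ r & s \end{pmatrix}$$
then the Kronecker product A ⊗ B is given by
$$A \otimes B = \begin{pmatrix} ap & aq & bp & bq \\ ar & as & br & bs \\ cp & cq & dp & dq \\ cr & cs & dr & ds \end{pmatrix}.$$
This definition extends in the obvious way to arbitrary matrices of any dimension, including column vectors and other non-square matrices.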
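As a quick numerical companion to the 2 × 2 example, the following sketch uses NumPy's `np.kron`, which implements the same block formula. The concrete numbers standing in for a, b, c, d and p, q, r, s are arbitrary choices made here for illustration and do not come from the paper.

```python
import numpy as np

# Numbers standing in for the symbolic entries of the 2 x 2 example.
A = np.array([[1, 2],
              [3, 4]])       # plays the role of [[a, b], [c, d]]
B = np.array([[5, 6],
              [7, 8]])       # plays the role of [[p, q], [r, s]]

# Kronecker product: the block matrix [[a*B, b*B], [c*B, d*B]].
K = np.kron(A, B)

top_left = K[:2, :2]         # block a*B
bottom_right = K[2:, 2:]     # block d*B

# Dimensions multiply: a vector in X (dim 3) and a vector in Y (dim 2)
# give a vector in the product space XY (dim 6).
x = np.array([1.0, 0.0, 0.0])
y = np.array([0.5, 0.5])
xy = np.kron(x, y)
```

The same call works for column vectors and non-square matrices, which is how the product spaces such as $\mathcal{S}_{01}$ are handled in the later sketches.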
We also make use of the following symbols:

Min-max formalism for interactive proofs with competing provers
Given a fixed two-turn verifier and a fixed input string x, let $\pi_{i,j}$ denote the probability with which the verifier asks questions $i = (i_0, i_1)$ to Team Alice and $j = (j_0, j_1)$ to Team Bob. For each 4-tuple $(i, j)$ of questions to the provers let $v_{i,j} \in \mathcal{A}_{01}\mathcal{B}_{01}$ denote the 0-1 vector of payouts to Team Bob. That is, for each $k = (k_0, k_1)$ and each $l = (l_0, l_1)$ the $(k, l)$th entry of $v_{i,j}$ is either zero or one according to whether the verifier accepts or rejects x in the event that the verifier asks questions $(i, j)$ to the teams and they respond with answers $(k, l)$. Consider the entrywise nonnegative matrix $V : \mathcal{S}_{01}\mathcal{T}_{01} \to \mathcal{A}_{01}\mathcal{B}_{01}$ whose $(i, j)$th column is $\pi_{i,j} v_{i,j}$. This matrix uniquely specifies the actions of the verifier. Strategies for the teams are specified as follows. For each pair $i$ of questions let $a_i \in \mathcal{A}_{01}$ denote the probability vector of Team Alice's responses to $i$. That is, for each pair $k$ of answers the $k$th entry of $a_i$ denotes the probability with which Team Alice replies with answers $k$ given that questions $i$ were asked. Thus, the actions of Team Alice are uniquely specified by the stochastic matrix $A : \mathcal{S}_{01} \to \mathcal{A}_{01}$ whose $i$th column is $a_i$. Similarly, for each pair $j$ of questions let $b_j \in \mathcal{B}_{01}$ denote the probability vector of Team Bob's responses to $j$. The actions of Team Bob are uniquely specified by the stochastic matrix $B : \mathcal{T}_{01} \to \mathcal{B}_{01}$ whose $j$th column is $b_j$. Not every stochastic matrix denotes a valid no-signaling strategy for the teams. Criteria for no-signaling strategies are discussed in Section 2.5. For now, it suffices to note that the set of all strategies available to each team is a compact convex subset of stochastic matrices.
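Conditioned on the verifier asking questions $(i, j)$, it is clear that the probability of rejection is given by
$$\langle v_{i,j}, a_i \otimes b_j \rangle.$$
It follows that the probability of rejection, taken over all questions $(i, j)$, given strategies $A$ for Team Alice and $B$ for Team Bob is given by the matrix inner product
$$\langle V, A \otimes B \rangle = \sum_{i,j} \pi_{i,j} \langle v_{i,j}, a_i \otimes b_j \rangle.$$
Of course, Team Bob wishes to maximize this quantity while Team Alice wishes to minimize this quantity. Given that the above inner product is bilinear in $(A, B)$ and that the sets of admissible strategies for the two teams are compact and convex, it follows from standard min-max theorems [Vil38,Fan53] that every interactive proof with verifier $V$ has an equilibrium value, which we denote by $\lambda(V)$, given by
$$\lambda(V) = \min_{A} \max_{B} \langle V, A \otimes B \rangle = \max_{B} \min_{A} \langle V, A \otimes B \rangle$$
where the minimum is over all no-signaling matrices $A : \mathcal{S}_{01} \to \mathcal{A}_{01}$ and the maximum is over all no-signaling matrices $B : \mathcal{T}_{01} \to \mathcal{B}_{01}$. In particular, for every protocol there exists at least one equilibrium point $(A^\star, B^\star)$ with the property that
$$\langle V, A^\star \otimes B \rangle \le \langle V, A^\star \otimes B^\star \rangle = \lambda(V) \le \langle V, A \otimes B^\star \rangle$$
for all no-signaling matrices $A, B$. Thus, the strategy $B^\star$ always ensures maximum likelihood of rejection, while $A^\star$ always ensures minimum likelihood of rejection.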
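This min-max theorem applies to every min-max expression considered throughout this paper. Henceforth we do not bother to explicitly remark upon this fact. Here and throughout the paper we adopt the convention that for any min-max problem of the form
$$\lambda = \min_{a} \max_{b} f(a, b),$$
an element $a'$ is called δ-optimal if $f(a', b) \le \lambda + \delta$ for all $b$, and an element $b'$ is called δ-optimal if $f(a, b') \ge \lambda - \delta$ for all $a$. Elements that are 0-optimal, such as $A^\star, B^\star$ above, are simply called optimal.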
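The rejection probability of this section can be checked numerically on a toy instance. The sketch below is only an illustration under stated assumptions: the verifier matrix `V` is built so that its (i, j)th column is π_{i,j} v_{i,j} (a convention consistent with the bound $V \le e_{\mathcal{A}_{01}\mathcal{B}_{01}} p^*$ used later), each team's question and answer pairs are flattened into single indices, and the strategies are arbitrary random stochastic matrices rather than no-signaling ones. All sizes and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
nS, nT, nA, nB = 2, 3, 2, 2    # toy dimensions of S01, T01, A01, B01

def random_stochastic(rows, cols):
    """Random column-stochastic matrix: every column is a probability vector."""
    M = rng.random((rows, cols))
    return M / M.sum(axis=0, keepdims=True)

# Question distribution pi[i, j] and 0-1 payout vectors v[i, j] (1 = reject).
pi = rng.random((nS, nT))
pi /= pi.sum()
v = rng.integers(0, 2, size=(nS, nT, nA * nB))

# Verifier matrix: column (i, j), stored at index i * nT + j, is pi[i, j] * v[i, j].
V = np.zeros((nA * nB, nS * nT))
for i in range(nS):
    for j in range(nT):
        V[:, i * nT + j] = pi[i, j] * v[i, j]

A = random_stochastic(nA, nS)  # Team Alice: column i is a_i
B = random_stochastic(nB, nT)  # Team Bob:   column j is b_j

# Rejection probability as the matrix inner product <V, A (x) B>.
reject_inner = float(np.sum(V * np.kron(A, B)))

# The same quantity computed question by question from the definition.
reject_direct = sum(
    pi[i, j] * float(v[i, j] @ np.kron(A[:, i], B[:, j]))
    for i in range(nS)
    for j in range(nT)
)

# Entrywise bound V <= e p^*, where p is the flattened question distribution.
bound = np.outer(np.ones(nA * nB), pi.reshape(-1))
```

Because `np.kron(A, B)` places the column $a_i \otimes b_j$ at index `i * nT + j`, the entrywise product with `V` reproduces the sum over question pairs.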

Notation for marginal distributions
Before we discuss no-signaling strategies in detail it is beneficial to introduce notation for marginal probability distributions that will be used throughout the remainder of this paper. Suppose, for instance, that $a \in \mathcal{A}_{01}$ is a probability vector of answers from Team Alice to some question from the verifier. We let $\mathrm{mar}_{\mathcal{A}_1}(a) \in \mathcal{A}_0$ denote the probability vector for the marginal distribution on answers from the prover Alice$_0$. Basic probability theory dictates that the mapping $\mathrm{mar}_{\mathcal{A}_1}$ satisfy
$$\big[\mathrm{mar}_{\mathcal{A}_1}(a)\big]_{k_0} = \sum_{k_1} a_{(k_0, k_1)}.$$
Of course, this mapping may be extended to arbitrary real vectors. For arbitrary spaces $\mathcal{X}, \mathcal{Y}$ the linear mapping $\mathrm{mar}_{\mathcal{Y}}$ is defined by
$$\mathrm{mar}_{\mathcal{Y}} : \mathcal{X}\mathcal{Y} \to \mathcal{X} : x \otimes y \mapsto \langle e_{\mathcal{Y}}, y \rangle x.$$
(The matrix representation of $\mathrm{mar}_{\mathcal{Y}}$ is $e^*_{\mathcal{Y}} \otimes I_{\mathcal{X}}$.) While this mapping is primarily intended to denote marginal probability distributions, we will have occasion to use it on non-probability vectors in this paper.
The mapping mar Y is to vectors as the partial trace is to square matrices. Readers familiar with quantum information know that the state of a quantum register can be computed from a joint state of several registers via the partial trace. So too with probability distributions: the distribution on states of a classical register can be computed from a joint distribution on states of several registers via mar Y .
The mapping $\mathrm{mar}_{\mathcal{Y}}$ extends naturally from vectors to matrices by applying $\mathrm{mar}_{\mathcal{Y}}$ to each column:
$$\mathrm{mar}_{\mathcal{Y}}(A) = (e^*_{\mathcal{Y}} \otimes I_{\mathcal{X}}) A.$$
So, for example, if Team Alice acts according to the stochastic matrix $A$ then the stochastic matrix $\mathrm{mar}_{\mathcal{A}_1}(A) : \mathcal{S}_{01} \to \mathcal{A}_0$ describes the "marginal" strategy for prover Alice$_0$. That is, the $(i_0, i_1)$th column of $\mathrm{mar}_{\mathcal{A}_1}(A)$ is the distribution on answers $k_0$ from Alice$_0$ given questions $(i_0, i_1)$ from the verifier.
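The marginal mapping is easy to realize as an explicit matrix. The following sketch assumes NumPy's `np.kron` index ordering for product spaces, under which summing out the second tensor factor is the matrix `I ⊗ e^*`; the helper names `mar_second` and `mar_first` are hypothetical and chosen here only for illustration.

```python
import numpy as np

def mar_second(dim_x, dim_y):
    """Matrix that sums out the second factor Y of X (x) Y, leaving a vector in X."""
    return np.kron(np.eye(dim_x), np.ones((1, dim_y)))

def mar_first(dim_x, dim_y):
    """Matrix that sums out the first factor X of X (x) Y, leaving a vector in Y."""
    return np.kron(np.ones((1, dim_x)), np.eye(dim_y))

# A product distribution on answers (k0, k1): marginals recover the factors.
a0 = np.array([0.2, 0.8])          # distribution on Alice_0's answers
a1 = np.array([0.5, 0.3, 0.2])     # distribution on Alice_1's answers
a = np.kron(a0, a1)

from_alice0 = mar_second(2, 3) @ a   # marginal on k0
from_alice1 = mar_first(2, 3) @ a    # marginal on k1

# Extension to matrices: apply the marginal to each column at once.
A = np.column_stack([a, np.full(6, 1.0 / 6.0)])
A_marginal = mar_second(2, 3) @ A    # column c is the k0-marginal of column c of A
```

Applying the marginal matrix on the left acts on every column simultaneously, which is exactly the column-by-column extension described above.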

Characterization of no-signaling strategies
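Recall that a strategy for Team Alice is no-signaling if for both values of the bit $c \in \{0, 1\}$ the marginal distribution on answers $k_c$ from Alice$_c$ does not depend on the question $i_{\bar c}$ asked of the other prover Alice$_{\bar c}$, where $\bar c = 1 - c$. In terms of Team Alice's stochastic matrix $A$, this condition means that for each $i_c$ the $(i_0, i_1)$th column of $\mathrm{mar}_{\mathcal{A}_{\bar c}}(A)$ is identical for all subindices $i_{\bar c}$. Letting $a_{i_c}$ denote this fixed probability vector and letting $A_c : \mathcal{S}_c \to \mathcal{A}_c$ denote the stochastic matrix whose columns are $a_{i_c}$, the above condition can be written as
$$\mathrm{mar}_{\mathcal{A}_{\bar c}}(A) = A_c \otimes e^*_{\mathcal{S}_{\bar c}}.$$
We have just proven the following simple proposition.

Proposition 2. A stochastic matrix $A : \mathcal{S}_{01} \to \mathcal{A}_{01}$ is a no-signaling strategy for Team Alice if and only if there exist stochastic matrices $A_c : \mathcal{S}_c \to \mathcal{A}_c$ such that $\mathrm{mar}_{\mathcal{A}_{\bar c}}(A) = A_c \otimes e^*_{\mathcal{S}_{\bar c}}$ for both $c \in \{0, 1\}$.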
A similar characterization holds for Team Bob.
Stochastic matrices $A$ meeting this condition are called no-signaling matrices. The matrices $A_c$ are said to witness the fact that $A$ is a no-signaling matrix. It follows immediately from Proposition 2 that the set of all no-signaling strategies available to each team is compact and convex, a fact already used in Section 2.3 to assert the existence of optimal strategies for the teams.
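The no-signaling condition can be tested mechanically for small strategies. The sketch below is an illustration only: the function name `no_signaling_witnesses` is hypothetical, and the two test strategies are not taken from the paper. The first is the well-known Popescu-Rohrlich box (answers satisfy k0 XOR k1 = i0 AND i1, uniformly at random), which is no-signaling; the second has Alice_0 output Alice_1's question, which plainly signals.

```python
import numpy as np

def no_signaling_witnesses(A, dS0, dS1, dA0, dA1, tol=1e-9):
    """Return witnesses (A0, A1) if the stochastic matrix A is no-signaling, else None.

    A has shape (dA0 * dA1, dS0 * dS1), with rows indexed by (k0, k1) and
    columns indexed by (i0, i1) in row-major (np.kron) order.
    """
    T = A.reshape(dA0, dA1, dS0, dS1)   # T[k0, k1, i0, i1]
    M0 = T.sum(axis=1)                  # marginal on k0, indexed [k0, i0, i1]
    M1 = T.sum(axis=0)                  # marginal on k1, indexed [k1, i0, i1]
    A0 = M0[:, :, 0]                    # candidate witness for Alice_0
    A1 = M1[:, 0, :]                    # candidate witness for Alice_1
    ok0 = np.allclose(M0, A0[:, :, None], atol=tol)   # k0-marginal ignores i1
    ok1 = np.allclose(M1, A1[:, None, :], atol=tol)   # k1-marginal ignores i0
    return (A0, A1) if (ok0 and ok1) else None

# Popescu-Rohrlich box: k0 XOR k1 = i0 AND i1, with k0 uniform.
pr_box = np.zeros((4, 4))
for i0 in (0, 1):
    for i1 in (0, 1):
        for k0 in (0, 1):
            k1 = k0 ^ (i0 & i1)
            pr_box[k0 * 2 + k1, i0 * 2 + i1] = 0.5

# Signaling strategy: Alice_0 answers with Alice_1's question, Alice_1 answers 0.
signaling = np.zeros((4, 4))
for i0 in (0, 1):
    for i1 in (0, 1):
        signaling[i1 * 2 + 0, i0 * 2 + i1] = 1.0

pr_witnesses = no_signaling_witnesses(pr_box, 2, 2, 2, 2)
sig_witnesses = no_signaling_witnesses(signaling, 2, 2, 2, 2)
```

For the first strategy both witnesses are the uniform 2 × 2 matrix, while the second strategy yields no witnesses because the marginal on k0 changes with i1.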

A relaxed min-max problem with penalties
As mentioned in the introduction, the MWUM in its simplest form solves min-max optimization problems over probability vectors. We optimize over stochastic matrices for the teams by using the MWUM simultaneously on each column of these matrices, a trick that works only for two-turn protocols, as we shall soon see.
We noted in Section 2.5 that the no-signaling matrices available to the teams form a strict subset of the stochastic matrices. In order to optimize only over no-signaling matrices, in this section we specify a new min-max optimization problem µ(V ) in which the teams may use arbitrary strategies but pay a penalty for strategies that violate the no-signaling condition. By a careful choice of penalty, we remove the incentive of the teams to select inadmissible strategies without ruining the precarious convergence properties of the MWUM.
Some preliminary observations are given in Section 3.1 before the formal definition of the new min-max problem µ(V ) in Section 3.2. Equivalence of µ(V ) and λ(V ) is proven in Section 3.3 with proofs of some lemmas in Section 3.4.
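For orientation, the simplest form of the MWUM referred to above can be sketched for an ordinary zero-sum matrix game. This is a generic textbook variant and not the algorithm of this paper: it optimizes a single probability vector, uses a sequential best-response oracle, and ignores both the penalty terms and the parallel implementation. The function name, step size `eps`, round count, and example game are all assumptions made for illustration.

```python
import numpy as np

def mwum_min_player(M, eps=0.05, rounds=2000):
    """Approximate min-max strategy for the row (minimizing) player.

    M[i, j] in [0, 1] is the loss to the row player when row i meets column j.
    Returns the time-averaged row distribution.
    """
    n_rows = M.shape[0]
    weights = np.ones(n_rows)
    avg = np.zeros(n_rows)
    for _ in range(rounds):
        p = weights / weights.sum()        # current distribution over rows
        j = int(np.argmax(p @ M))          # column player's best response to p
        loss = M[:, j]                     # loss vector with entries in [0, 1]
        weights *= (1.0 - eps) ** loss     # multiplicative weights update
        avg += p
    return avg / rounds

# A rock-paper-scissors style game with losses in [0, 1]; its value is 1/2.
M = np.array([[0.5, 1.0, 0.0],
              [0.0, 0.5, 1.0],
              [1.0, 0.0, 0.5]])

p_bar = mwum_min_player(M)
worst_case = float(np.max(p_bar @ M))      # loss of p_bar against a best response
```

The standard analysis requires the loss vectors to lie in a bounded range; bounds such as the one in Proposition 3 play that role for the column-wise updates discussed in this section.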

Bounds on two-turn verifiers
First, for ease of notation we let $\Phi_V$ denote the unique linear transformation satisfying
$$\langle V, A \otimes B \rangle = \langle \Phi_V(A), B \rangle = \langle A, \Phi_V^*(B) \rangle$$
for all matrices $A, B$. Though a precise formula for $\Phi_V$ is of little use in this paper, for completeness we note that
$$\Phi_V(A) = \mathrm{Tr}_{\mathcal{S}_{01}}\big((A^* \otimes I_{\mathcal{B}_{01}}) V\big), \qquad \Phi_V^*(B) = \mathrm{Tr}_{\mathcal{T}_{01}}\big((I_{\mathcal{A}_{01}} \otimes B^*) V\big)$$
where $\mathrm{Tr}_{\mathcal{S}_{01}}$ and $\mathrm{Tr}_{\mathcal{T}_{01}}$ denote partial trace transformations. At the risk of hijacking terminology from functional analysis, the matrix $\Phi_V(A)$ can be viewed as a partial inner product between $V$ and $A$. This matrix can also be viewed as a new two-turn verifier for Team Bob obtained by "hard-wiring" Team Alice's strategy $A$ into the original verifier $V$. Next, let $p \in \mathcal{S}_{01}\mathcal{T}_{01}$ denote the probability vector for the distribution on questions asked by the verifier. In the notation of Section 2.3, the $(i, j)$th entry of $p$ is $\pi_{i,j}$, the probability with which the verifier asks questions $i$ to Team Alice and $j$ to Team Bob. Let $p_{\mathrm{Alice}} \in \mathcal{S}_{01}$ denote the marginal distribution $p_{\mathrm{Alice}} = \mathrm{mar}_{\mathcal{T}_{01}}(p)$ on questions to Team Alice, so that the $i$th entry of $p_{\mathrm{Alice}}$ is $\sum_j \pi_{i,j}$. It is not hard to see that $V \le e_{\mathcal{A}_{01}\mathcal{B}_{01}} p^*$ with equality achieved in the extreme case that each of the verifier's payout vectors $v_{i,j}$ is equal to the all-ones vector $e_{\mathcal{A}_{01}\mathcal{B}_{01}}$. (Recall that matrix inequalities are entrywise.) Similarly, it is easy to prove analogous inequalities for $\Phi_V(A)$, $\Phi_V^*(B)$. For example:

Proposition 3. For any stochastic matrix $B : \mathcal{T}_{01} \to \mathcal{B}_{01}$ it holds that $\Phi_V^*(B) \le e_{\mathcal{A}_{01}} p^*_{\mathrm{Alice}}$.
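Proof. Let $A : \mathcal{S}_{01} \to \mathcal{A}_{01}$ be any nonnegative matrix and let $a_i, b_j$ denote the columns of $A, B$, respectively. Then
$$\langle A, \Phi_V^*(B) \rangle = \langle V, A \otimes B \rangle = \sum_{i,j} \pi_{i,j} \langle v_{i,j}, a_i \otimes b_j \rangle \le \sum_{i,j} \pi_{i,j} \langle e_{\mathcal{A}_{01}}, a_i \rangle \langle e_{\mathcal{B}_{01}}, b_j \rangle.$$
As $B$ is stochastic it must be that $\langle e_{\mathcal{B}_{01}}, b_j \rangle = 1$ for each $j$. The above expression then simplifies to
$$\sum_{i} \Big( \sum_{j} \pi_{i,j} \Big) \langle e_{\mathcal{A}_{01}}, a_i \rangle = \langle A, e_{\mathcal{A}_{01}} p^*_{\mathrm{Alice}} \rangle.$$
As this inequality holds for all nonnegative matrices $A$ it must be that $\Phi_V^*(B) \le e_{\mathcal{A}_{01}} p^*_{\mathrm{Alice}}$ as claimed.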

Definition of the relaxed min-max problem
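The relaxation $\mu(V)$ of $\lambda(V)$ is defined by
$$\mu(V) = \min_{(A, A_0, A_1)} \max_{(B, \Pi_0, \Pi_1)} \big\langle f_V(A, A_0, A_1), (B, \Pi_0, \Pi_1) \big\rangle$$
where the triples $(A, A_0, A_1)$ and $(B, \Pi_0, \Pi_1)$ have the form
$A : \mathcal{S}_{01} \to \mathcal{A}_{01}$ is stochastic,
$A_c : \mathcal{S}_c \to \mathcal{A}_c$ is stochastic,
$B : \mathcal{T}_{01} \to \mathcal{B}_{01}$ is no-signaling,
$\Pi_c : \mathcal{S}_{01} \to \mathcal{A}_c$ satisfies $0 \le \Pi_c \le e_{\mathcal{A}_c} p^*_{\mathrm{Alice}}$
for both $c \in \{0, 1\}$. The linear mapping $f_V$ appearing in the inner product is defined by
$$f_V : (A, A_0, A_1) \mapsto \big( \Phi_V(A),\ \mathrm{mar}_{\mathcal{A}_1}(A) - A_0 \otimes e^*_{\mathcal{S}_1},\ \mathrm{mar}_{\mathcal{A}_0}(A) - A_1 \otimes e^*_{\mathcal{S}_0} \big),$$
and its adjoint $f_V^*$ is the unique linear mapping with $\langle f_V(A, A_0, A_1), (B, \Pi_0, \Pi_1) \rangle = \langle (A, A_0, A_1), f_V^*(B, \Pi_0, \Pi_1) \rangle$.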

Intuition
Some explanation is in order. As with the original min-max problem λ(V ), the matrices A and B represent the strategies employed by the teams. Note, however, that in the definition of µ(V ) Team Alice is now free to choose among arbitrary stochastic matrices for its strategy. The matrices A 0 , A 1 for Team Alice are purported witnesses to the claim that A is a valid no-signaling matrix.
For the moment, we are concerned with relaxing the domain only of Team Alice's strategies, so Bob's strategy B must still be no-signaling. Bob's strategies will be addressed in Section 4.2. The matrices Π 0 , Π 1 for Team Bob are penalty matrices-they are the means by which Team Bob penalizes Team Alice according to the extent that A 0 , A 1 are false witnesses to the claim that A is no-signaling.
The new objective function $\langle f_V(A, A_0, A_1), (B, \Pi_0, \Pi_1) \rangle$ equals the old objective function $\langle V, A \otimes B \rangle$ plus two penalty terms. If $A$ is not a no-signaling matrix then the difference matrix
$$\Delta_c = \mathrm{mar}_{\mathcal{A}_{\bar c}}(A) - A_c \otimes e^*_{\mathcal{S}_{\bar c}} \qquad (\bar c = 1 - c)$$
must be nonzero for at least one $c$. In this case, Bob selects $\Pi_c$ to pick out the positive entries of $\Delta_c$, which are then added to the verifier's probability of rejection.
Let us informally explain why the restriction $0 \le \Pi_c \le e_{\mathcal{A}_c} p^*_{\mathrm{Alice}}$ on penalty matrices is sufficient to remove Team Alice's incentive to cheat. Suppose the $k_c$th entry of the $i$th column of the difference matrix $\Delta_c$ is a positive real number $\delta > 0$ and suppose that $A'$ is a valid no-signaling matrix witnessed by $A_0, A_1$. Since the verifier asks questions $i$ of Team Alice with probability $\pi_i$ (the $i$th entry of $p_{\mathrm{Alice}}$), it must be that, when selecting the probability with which to answer $k_c$, the advantage gained by Team Alice from using the inadmissible strategy $A$ instead of the no-signaling strategy $A'$ is at most $\delta \pi_i$. By selecting a penalty matrix $\Pi_c$ so that the $k_c$th entry of the $i$th column of $\Pi_c$ is equal to $\pi_i$, Team Bob adds precisely the quantity $\delta \pi_i$ to the verifier's probability of rejection, thus eliminating the advantage obtained by Team Alice in acting according to $A$ instead of $A'$ for this particular choice of questions $i$ and answer $k_c$ from Alice$_c$.
Repeating this logic for all entries $(i, k_c)$ of $\Delta_c$, we find that Team Bob should select the penalty matrix $\Pi_c$ so that the $(i, k_c)$th entry is either zero or $\pi_i$ according to whether the corresponding entry of $\Delta_c$ is nonpositive or positive. A penalty matrix of this form is called optimal for $(A, A_0, A_1)$ and satisfies
$$\langle \Delta_c, \Pi_c \rangle = \langle \Delta_c^+, e_{\mathcal{A}_c} p^*_{\mathrm{Alice}} \rangle$$
where $\Delta_c^+$ is the positive part of $\Delta_c$. (Here the positive part of a real matrix $X$ is the matrix $X^+$ with the property that if $x$ is any entry of $X$ then the corresponding entry of $X^+$ is $\max\{0, x\}$.)
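The choice of optimal penalty matrix described above amounts to a single entrywise comparison. The sketch below illustrates it on random data; the function name `optimal_penalty`, the dimensions, and the random difference matrix are hypothetical, and rows are taken to index answers while columns index Team Alice's question pairs.

```python
import numpy as np

def optimal_penalty(Delta, p_alice):
    """Penalty matrix whose entry in row k, column i is p_alice[i] where
    Delta[k, i] > 0 and zero elsewhere."""
    return np.where(Delta > 0, p_alice[None, :], 0.0)

rng = np.random.default_rng(1)
n_answers, n_questions = 3, 4

Delta = rng.normal(size=(n_answers, n_questions))   # stand-in difference matrix
p_alice = rng.random(n_questions)
p_alice /= p_alice.sum()                            # distribution on questions i

Pi = optimal_penalty(Delta, p_alice)
penalty = float(np.sum(Delta * Pi))                 # <Delta, Pi>

# Upper bound e p_Alice^* on penalty matrices, and the positive-part identity.
upper = np.outer(np.ones(n_answers), p_alice)
positive_part_value = float(np.sum(np.maximum(Delta, 0.0) * upper))

# Any other feasible penalty 0 <= Pi' <= e p_Alice^* does no better.
Pi_other = rng.random(Delta.shape) * upper
other_penalty = float(np.sum(Delta * Pi_other))
```

The comparison with `Pi_other` reflects why this choice is called optimal: each positive entry of the difference matrix is weighted as heavily as the constraint allows, and each nonpositive entry is given weight zero.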

Equivalence of the two min-max problems
We are now ready to prove the desired "rounding theorem" mentioned in the introduction, a corollary of which is the equivalence of the min-max problems µ(V ) and λ(V ) (Corollary 4.1). The theorem employs two lemmas and their corollaries, the proofs of which appear below in Section 3.4.
Theorem 4 (Rounding theorem). Let $(A, A_0, A_1)$ be a feasible solution for $\mu(V)$ and let $\Pi^A_0, \Pi^A_1$ be optimal penalties for $(A, A_0, A_1)$. There exists a no-signaling matrix $A_{\mathrm{ns}}$ witnessed by $A_0, A_1$ such that for all stochastic matrices $B$ it holds that
$$\langle V, A_{\mathrm{ns}} \otimes B \rangle \le \big\langle f_V(A, A_0, A_1), (B, \Pi^A_0, \Pi^A_1) \big\rangle.$$
Moreover, $A_{\mathrm{ns}}$ can be computed efficiently in parallel given $(A, A_0, A_1)$.
Proof. For both c ∈ {0, 1} let ∆ + c be the positive part of mar A c (A) − A c ⊗ e * S c and observe that By Corollary 5.1 below there exists a preimage D + 0 ≥ 0 of ∆ + 0 with Let Γ + 1 be the positive part of mar A 0 A − D + 0 − A 1 ⊗ e * S 0 . As with ∆ c above, observe that (Moreover, it is easy to see that Γ + 1 ≤ ∆ + 1 -a fact we employ later in this proof.) Apply Corollary 5.1 again to obtain a preimage C + 1 ≥ 0 of Γ + 1 with

Thus, we have a matrix
Hence there exist nonnegative matrices T c : Applying mar Ac to both sides of this equation we see that mar A 0 (T 0 ) = mar A 1 (T 1 ). By Corollary 6.1 below there exists a nonnegative matrix T : S 01 → A 01 with mar A c (T ) = T c for both c ∈ {0, 1}. The desired no-signaling matrix A ns is given by As D + 0 , C + 1 , and T can be computed efficiently in parallel, so too can A ns . To see that A ns is a no-signaling matrix witnessed by A 0 , A 1 it suffices to observe that It remains only to verify the stated inequality. To this end, we have As A ns and A are both stochastic matrices, it must be that D + 0 + C + 1 and T have the same column sums. As T, e A 01 p * Alice equals the sum of the column sums of T weighted according to p Alice , the matrix T can be replaced by D + 0 + C + 1 without affecting this inner product. That is Expanding the right side of this equality we obtain As Γ + 1 ≤ ∆ + 1 this quantity is at most Putting everything together, we have as desired.
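Corollary 4.1 (Equivalence of min-max problems). The following hold for any verifier V and any δ ≥ 0:
1. µ(V ) = λ(V ).
2. If (B µ , Π µ 0 , Π µ 1 ) is δ-optimal for µ(V ) then B µ is δ-optimal for λ(V ).
3. If (A µ , A µ 0 , A µ 1 ) is δ-optimal for µ(V ) then there exists A ns such that A ns is δ-optimal for λ(V ) and A ns can be computed efficiently in parallel given (A µ , A µ 0 , A µ 1 ).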
Proof. We begin with item 1. It is easy to prove λ(V ) ≥ µ(V ): let A λ be optimal for λ(V ), let A 0 , A 1 witness the fact that A λ is no-signaling, and let (B µ , Π µ 0 , Π µ 1 ) be optimal for µ(V ). Then For the reverse inequality, let (A µ , A µ 0 , A µ 1 ) be optimal for µ(V ), let Π A µ 0 , Π A µ 1 be optimal penalties for (A µ , A µ 0 , A µ 1 ), and let B λ be optimal for λ(V ). By Theorem 4 there exists a no-signaling matrix A ns witnessed by A µ 0 , A µ 1 such that The desired inequality λ(V ) ≤ µ(V ) follows from the fact that the left side is at least λ(V ) and the right side is at most µ(V ). The proof of item 1 is complete.
Item 2 follows easily from item 1. Let A be a no-signaling matrix and let A 0 , A 1 witness this fact. Then As A was chosen arbitrarily, it follows that B µ is δ-optimal for λ(V ).
For item 3, let B be any no-signaling matrix and let Π A µ 0 , Π A µ 1 be optimal penalties for the given δ-optimal solution (A µ , A µ 0 , A µ 1 ). By Theorem 4 there exists a no-signaling matrix A ns witnessed by A µ 0 , A µ 1 such that As B was chosen arbitrarily, it follows that A ns is δ-optimal for λ(V ).

Lemmas used in the rounding theorem
The lemmas used in the proof of Theorem 4 are not difficult. It is quite likely that some form of these lemmas is part of computer science "folklore," though our notation may be nonstandard.
Lemma 5 (Small marginals have small preimages). Let a ∈ A 01 and δ ∈ A 0 be nonnegative vectors with δ ≤ mar A 1 (a). There exists a nonnegative vector d ∈ A 01 with d ≤ a and mar A 1 (d) = δ. Moreover, d can be computed efficiently in parallel given a, δ.
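Proof. Let $a_{(k_0,k_1)}$ and $\delta_{k_0}$ denote the nonnegative entries of $a$ and $\delta$, respectively. Let $s_{k_0}$ denote the $k_0$th entry of $\mathrm{mar}_{A_1}(a)$ so that
$$s_{k_0} = \sum_{k_1} a_{(k_0,k_1)}.$$
The desired vector $d$ has entries $d_{(k_0,k_1)}$ given by
$$d_{(k_0,k_1)} = \begin{cases} \dfrac{\delta_{k_0}}{s_{k_0}}\, a_{(k_0,k_1)} & \text{if } s_{k_0} \neq 0, \\ 0 & \text{otherwise.} \end{cases}$$
(In other words, $\delta_{k_0}$ is "spread out" over each $d_{(k_0,k_1)}$ proportionately according to $a_{(k_0,k_1)}$.) It is clear that this construction can be implemented efficiently in parallel.
Let us verify that $d \le a$. Observe that for the case $s_{k_0} \neq 0$ the ratio $\delta_{k_0}/s_{k_0}$ is at most one because $\delta \le \mathrm{mar}_{A_1}(a)$. Then
$$d_{(k_0,k_1)} = \frac{\delta_{k_0}}{s_{k_0}}\, a_{(k_0,k_1)} \le a_{(k_0,k_1)}$$
as desired. Of course, if $s_{k_0} = 0$ then $d_{(k_0,k_1)} = 0$ by definition and hence $d_{(k_0,k_1)} \le a_{(k_0,k_1)}$ because $a \ge 0$.
Let us verify that $\mathrm{mar}_{A_1}(d) = \delta$. For the case $s_{k_0} \neq 0$ the $k_0$th entry of $\mathrm{mar}_{A_1}(d)$ is given by
$$\sum_{k_1} d_{(k_0,k_1)} = \frac{\delta_{k_0}}{s_{k_0}} \sum_{k_1} a_{(k_0,k_1)} = \delta_{k_0}$$
as desired. As above, if $s_{k_0} = 0$ then by definition $d_{(k_0,k_1)} = 0$ for each $k_1$ and hence $\sum_{k_1} d_{(k_0,k_1)} = 0$. As $0 \le \delta_{k_0} \le s_{k_0}$ it must be that $\delta_{k_0} = 0$, too.
Proof. Apply Lemma 5 to each of the columns of A, ∆.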
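The proportional construction from the proof of Lemma 5 can be sketched in a few lines. The function name `small_preimage` and the row layout (`a[k0][k1]`, with the marginal taken by summing each row) are assumptions made for this illustration. Exact rationals are used because the inputs are taken to be rational numbers.

```python
from fractions import Fraction


def small_preimage(a, delta):
    """Return d with 0 <= d <= a whose row sums equal delta.

    a[k0][k1] are nonnegative rationals; delta[k0] must satisfy
    0 <= delta[k0] <= sum(a[k0]). Each row is handled independently,
    which is what makes the construction easy to parallelize.
    """
    d = []
    for row, dk in zip(a, delta):
        s = sum(row)
        if s == 0:
            # The marginal is zero here, so delta[k0] is forced to be zero too.
            d.append([Fraction(0) for _ in row])
        else:
            # Spread delta[k0] over the row in proportion to the entries of a.
            d.append([Fraction(dk) * x / s for x in row])
    return d


a = [
    [Fraction(1, 4), Fraction(1, 4)],
    [Fraction(0), Fraction(0)],
    [Fraction(1, 2), Fraction(0)],
]
delta = [Fraction(1, 4), Fraction(0), Fraction(1, 3)]
d = small_preimage(a, delta)
```

Each row of `d` is dominated entrywise by the matching row of `a` and sums to the matching entry of `delta`, which are the two properties verified in the proof.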
Lemma 6 (Disjoint marginals are always consistent). For both c ∈ {0, 1} let t c ∈ A c be nonnegative vectors whose entries sum to the same value. There exists a nonnegative vector t ∈ A 01 with mar A c (t) = t c for both c ∈ {0, 1}. Moreover, t can be computed efficiently in parallel given t 0 , t 1 .
Proof. Let $p_{k_0}$ and $q_{k_1}$ be the nonnegative entries of $t_0$ and $t_1$, respectively. Let $s$ denote the sum of the entries of $t_0$, $t_1$ so that
$$s = \sum_{k_0} p_{k_0} = \sum_{k_1} q_{k_1}.$$
If $s = 0$ then it is clear that the desired vector $t$ is the zero vector. For the remainder of the proof assume that $s \neq 0$. The desired vector $t$ has entries $t_{(k_0,k_1)}$ given by
$$t_{(k_0,k_1)} = \frac{p_{k_0}\, q_{k_1}}{s}.$$
It is clear that this construction can be implemented efficiently in parallel. Let us verify that $\mathrm{mar}_{A_c}(t) = t_c$ for both $c \in \{0, 1\}$. For the case $c = 0$ the $k_0$th entry of $\mathrm{mar}_{A_1}(t)$ is given by
$$\sum_{k_1} t_{(k_0,k_1)} = \frac{p_{k_0}}{s} \sum_{k_1} q_{k_1} = p_{k_0}$$
as desired. The case $c = 1$ is handled similarly.
Proof. Apply Lemma 6 to each of the columns of T 0 , T 1 .
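The product construction from the proof of Lemma 6 admits an equally short sketch. The function name `consistent_joint` and the list-of-rows layout (`t[k0][k1]`) are assumptions made for illustration only.

```python
from fractions import Fraction


def consistent_joint(t0, t1):
    """Return a nonnegative joint vector t with row sums t0 and column sums t1.

    t0 and t1 must be nonnegative with equal totals. The result is the
    rescaled outer product t[k0][k1] = t0[k0] * t1[k1] / s.
    """
    s = sum(t0)
    if s != sum(t1):
        raise ValueError("t0 and t1 must have the same total")
    if s == 0:
        return [[Fraction(0) for _ in t1] for _ in t0]
    return [[Fraction(p) * q / s for q in t1] for p in t0]


t0 = [Fraction(1, 2), Fraction(1, 4)]
t1 = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 4)]
t = consistent_joint(t0, t1)

row_sums = [sum(row) for row in t]
col_sums = [sum(row[k1] for row in t) for k1 in range(len(t1))]
```

Here `row_sums` recovers `t0` and `col_sums` recovers `t1`, as the lemma requires.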

A parallel multiplicative weights algorithm
In this section we complete the proof of our main result: every decision problem that admits a two-turn interactive proof with competing teams of no-signaling provers is also in PSPACE. Most of the detail appears in Section 4.1, wherein we present an efficient parallel oracle-algorithm based on the MWUM that produces δ-optimal no-signaling strategies for the teams, given an oracle for "best responses" for Team Bob to a given candidate strategy for Team Alice. We describe an efficient parallel implementation of the required oracle in Section 4.2, from which the unconditional efficiency of our algorithm immediately follows. The ensuing inclusion of MRG ns (2, 2) inside PSPACE is discussed in Section 4.3.

The parallel algorithm
Precise statements of the problem solved by our algorithm and the oracle it requires are given below. All input numbers are written as rational numbers in binary. For matrix inputs, each entry is written explicitly.
Output: A δ-optimal no-signaling strategy $\tilde{B}$ for Team Bob. (That is, a no-signaling matrix $\tilde{B}$ such that $\langle S, \tilde{B} \rangle \ge \langle S, B \rangle - \delta$ for all no-signaling matrices $B$.)
Given Corollary 4.1, it suffices to find δ-optimal solutions $(\tilde{A}, \tilde{A}_0, \tilde{A}_1)$ and $(\tilde{B}, \tilde{\Pi}_0, \tilde{\Pi}_1)$ for µ(V ) and then convert these solutions into δ-optimal strategies for λ(V ). This method is codified in the algorithm of Figure 2.
This algorithm is a straightforward modification of the standard multiplicative weights update method for equilibrium problems. The precise formulation of the MWUM used in this paper is stated as Theorem 7. Our statement of this theorem is somewhat nonstandard: the result is usually presented in the form of an algorithm, whereas our presentation is purely mathematical. However, a cursory examination of the literature (say, Kale's thesis [Kal07, Chapter 2]) reveals that our mathematical formulation is equivalent to the more conventional algorithmic form.
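For readers unfamiliar with the method, the following is a minimal sketch of one common variant of the multiplicative weights update, in which each weight is multiplied by 1 − ε·m/α for losses m in [−α, α]. This linear update rule, the function name `multiplicative_weights`, and the toy loss sequence are assumptions chosen for illustration; the exact recursion used in this paper may differ in form.

```python
import math


def multiplicative_weights(losses, eps, alpha):
    """Run a linear multiplicative weights update on a sequence of loss vectors.

    losses: list of T loss vectors with entries in [-alpha, alpha].
    eps:    step size, assumed to lie in (0, 1/2].
    Returns the probability vectors p^1, ..., p^T, where p^t is the
    normalized weight vector in effect before the t-th loss is applied.
    """
    dim = len(losses[0])
    weights = [1.0] * dim
    plays = []
    for m in losses:
        total = sum(weights)
        plays.append([w / total for w in weights])
        # Assumed update rule: shrink the weight of coordinates with large loss.
        weights = [w * (1.0 - eps * mi / alpha) for w, mi in zip(weights, m)]
    return plays


# Toy run: coordinate 0 never incurs loss, coordinate 1 always does.
losses = [[0.0, 1.0]] * 50
eps, alpha = 0.25, 1.0
plays = multiplicative_weights(losses, eps, alpha)

algorithm_loss = sum(sum(p * m for p, m in zip(pt, mt)) for pt, mt in zip(plays, losses))
best_fixed_loss = min(sum(mt[i] for mt in losses) for i in range(2))
# Standard regret bound for this variant, evaluated at the best coordinate
# (whose losses are all zero in this toy run, so the eps-term vanishes).
regret_bound = alpha * math.log(2) / eps
```

The adaptive setting described in this section corresponds to choosing each loss vector only after seeing the current weights; the loop above is unchanged in that case, since each play is recorded before its loss is applied.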
Theorem 7 (Multiplicative weights update method; see Ref. [Kal07, Theorem 2]). Fix an ε ∈ (0, 1/2). Let m 1 , . . . , m T be arbitrary D-dimensional "loss" vectors whose entries m t i lie in the interval [−α, α]. Let w 1 , . . . , w T be D-dimensional nonnegative "weight" vectors whose entries w t i are given recursively via Let p 1 , . . . , p T be probability vectors obtained by normalizing each w 1 , . . . , w T . For all probability vectors
Note that Theorem 7 holds for all choices of loss vectors m 1 , . . . , m T , including the case in which each m t is chosen adversarially based upon w t . This adaptive selection of loss vectors is typical in implementations of the MWUM.

Proposition 8. The oracle-algorithm presented in Figure 2 solves the weak no-signaling equilibrium problem (Problem 1). Assuming unit cost for the oracle, this algorithm can be implemented in parallel with run time bounded by a polynomial in 1/δ and log(dim(S 01 T 01 A 01 B 01 )).
Proof. For each pair i = (i 0 , i 1 ) of questions let π i denote the probability with which the verifier asks questions i to Team Alice. Let m t denote the ith column of M t for each t = 1, . . . , T . We argue that the entries of m t lie in the interval [0, 3π i ]. To this end, observe that the loss matrix M t is defined in Figure 2 via the adjoint mapping f * V as where the inequality follows immediately from the bound Φ * V (B) ≤ e A 01 p * Alice of Proposition 3 and the restriction Π c ≤ e Ac p * Alice on penalty matrices. The desired bound on the entries of m t follows from the observation that the ith column of 3e A 01 p * Alice is the vector whose entries are all equal to 3π i .
2. Repeat for each t = 1, . . . , T :
(a) Compute optimal penalties Π t 0 , Π t 1 for (A t , A t 0 , A t 1 ) as described in Section 3.2. Use the oracle for Problem 2 to obtain a δ/2-best response B t to the verifier-Alice matrix Φ V (A t ).
Exit the loop now if t = T .
(c) Update the weight matrices according to the standard multiplicative weights update rule: where ⊠ denotes the (entrywise) matrix Schur product. (See Theorem 7.)
(d) Compute the updated triple (A t+1 , A t+1 0 , A t+1 1 ) of stochastic matrices for Team Alice by normalizing the columns of (W t+1 , W t+1 0 , W t+1 1 ).
4. Return (Ã ns , B̃) as the δ-optimal strategies of Team Alice and Team Bob for λ(V ).
Figure 2: Algorithm that finds δ-optimal solutions to the equilibrium problem λ(V ) for two-turn interactive proofs with competing teams of no-signaling provers (Problem 1).
Let a t denote the ith column of A t for t = 1, . . . , T . It is clear that the construction of the probability vectors a t in terms of the loss vectors m t presented in Figure 2 obeys the condition of Theorem 7. It therefore follows that for any probability vector a ∈ A 01 we have Summing these inequalities over all columns i we find that for any stochastic matrix A it holds that A similar bound on the stochastic matrices A t 0 , A t 1 in terms of the loss matrices M t 0 , M t 1 can be derived in much the same way. For completeness, let us make this argument explicit.
For both c ∈ {0, 1} and for each question i c let π ic denote the probability with which the verifier asks question i c to Alice c . Let m t c denote the i c th column of M t c for each t = 1, . . . , T . We argue that the entries of m t c lie in the interval [−π ic , 0]. Recall the loss matrix M t c is defined in Figure 2 via the adjoint mapping f * V as where the inequality follows immediately from the restriction Π c ≤ e Ac p * Alice on penalty matrices. The desired bound on the entries of m t c follows from the observation that the i c th column of e Ac mar S c (p Alice ) * is the vector whose entries are all equal to π ic .
As above, let a t c denote the i c th column of A t c for t = 1, . . . , T . It is clear that the construction of the probability vectors a t c in terms of the loss vectors m t c presented in Figure 2 obeys the condition of Theorem 7. It therefore follows that for any probability vector a c ∈ A c we have εT .
Summing these inequalities over all columns i c we find that for any stochastic matrix A c it holds that At this point we have derived three inequalities for three arbitrary stochastic matrices A, A 0 , A 1 . Summing these inequalities and substituting ) and the choices of ε, T listed in Figure 2 we find that for any triple (A, A 0 , A 1 ) of stochastic matrices it holds that
The remainder of this proof is a straightforward adaptation of Kale's analysis for the much simpler class of two-player zero-sum games in normal form [Kal07, Section 2.3.1]. We argue that the triples (Ã, Ã 0 , Ã 1 ) and (B̃, Π̃ 0 , Π̃ 1 ) appearing in Figure 2 are δ-optimal for µ(V ). Let us begin with the triple (Ã, Ã 0 , Ã 1 ). Choose any (B, Π 0 , Π 1 ) and let (A ⋆ , A ⋆ 0 , A ⋆ 1 ) be optimal for µ(V ). We have as desired. (The first inequality is because each (B t , Π t 0 , Π t 1 ) is a δ/2-best response to (A t , A t 0 , A t 1 ); the second is Eq. (1).) To see that (B̃, Π̃ 0 , Π̃ 1 ) is δ-optimal for µ(V ), let (A, A 0 , A 1 ) be any triple of stochastic matrices. We have as desired. (The first inequality is Eq. (1); the second is because each (B t , Π t 0 , Π t 1 ) is a δ/2-best response to (A t , A t 0 , A t 1 ).) Finally, it follows from Corollary 4.1 that Ã ns and B̃ are δ-optimal strategies for λ(V ).
That the algorithm admits an efficient parallel implementation is straightforward. In each iteration, the computations of the optimal penalties, the loss matrices (via f * V ), the multiplicative weights update rule, and the normalization are all simple operations involving only addition and multiplication of individual rational entries of matrices, and these can easily be implemented in parallel. Efficiency follows from the fact that the total number of iterations is bounded by a polynomial in 1/δ and the logarithm of dim(S 01 T 01 A 01 B 01 ), the size of the verifier matrix.

Implementations of the best-response oracle for Team Bob
In order for the algorithm of Figure 2 to be unconditionally efficient, we require a parallel implementation of the oracle for weak no-signaling optimization (Problem 2). Fortunately, all the work is already done: Problem 2 is the optimization problem that arises naturally from two-turn, two-prover interactive proofs with no-signaling provers. Thus, the parallel algorithm of Ito [Ito10] can be re-used to implement the oracle in our algorithm without complication.
In Ito's terminology, the verifier-Alice matrix Φ V (A) specifies a game and the two no-signaling provers comprising Team Bob are the players. Ito does not claim that an explicit strategy for the players can be found efficiently in parallel. Rather, he claims only that the task of distinguishing high success probability from low success probability admits a parallel algorithm, as this simpler task is sufficient to put MIP ns (2, 2) inside PSPACE. However, a cursory glance at the details of Ito's proof reveals a parallel construction of near-optimal no-signaling strategies for the players as required by Problem 2.
Alternatively, the oracle for weak no-signaling optimization (Problem 2) can be implemented by re-using the algorithm for weak no-signaling equilibrium (Problem 1) listed in Figure 2 of the present paper. Indeed, Problem 2 is a special case of Problem 1 in which one team has a trivial strategy space. In this special case the required "oracle" demands only weak no-signaling optimization over a trivial strategy space, which of course admits a trivial parallel implementation. In other words, the algorithm of Figure 2 can be used in a two-level recursive fashion to give an unconditionally efficient parallel algorithm for Problem 1.

Containment in PSPACE
The desired containment of MRG ns (2, 2) inside PSPACE now follows in the usual way:
Theorem 1. Every decision problem that admits a two-turn interactive proof with competing teams of two no-signaling provers per team is also in PSPACE. Thus, we obtain the identity MRG ns (2, 2) = PSPACE.
Proof. Let L be a decision problem in MRG ns (2, 2) with completeness c and soundness s and let x be any input string. Each entry of the exponential-size verifier matrix V : S 01 T 01 → A 01 B 01 induced by the verifier on input x can be computed in space polynomial in |x| by simulating every choice of randomness for the verifier. In order to decide whether x is a yes-instance or no-instance of L it suffices to find δ-optimal strategies for the teams for δ = (c − s)/3, which permits us to distinguish λ(V ) ≥ c from λ(V ) ≤ s. It follows from Proposition 8 and the discussion in Section 4.2 that the algorithm of Figure 2 can be used to find δ-optimal strategies for the teams and can be implemented in parallel with run time bounded by a polynomial in 1/δ and the logarithm of the dimensions of V . As the dimensions of V scale exponentially with |x| and δ scales as an inverse polynomial in |x| the total run time of this parallel algorithm scales polynomially with |x| and can therefore be simulated in polynomial space in the usual way [Bor77].
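The choice δ = (c − s)/3 in the proof can be illustrated with a small sketch. It assumes (this step is not spelled out above) that the payoff of a pair of δ-optimal strategies yields an estimate within δ of λ(V ); comparing that estimate with the midpoint (c + s)/2 then separates the two cases. The function name `value_is_high` is hypothetical.

```python
from fractions import Fraction


def value_is_high(estimate, c, s):
    """Decide between lambda(V) >= c and lambda(V) <= s.

    Assumes |estimate - lambda(V)| <= (c - s) / 3. Under that assumption:
      lambda(V) >= c  implies  estimate >= c - delta > (c + s) / 2,
      lambda(V) <= s  implies  estimate <= s + delta < (c + s) / 2,
    so a single comparison against the midpoint suffices.
    """
    midpoint = (c + s) / 2
    return estimate >= midpoint


c, s = Fraction(2, 3), Fraction(1, 3)
delta = (c - s) / 3

# Worst-case estimates in each of the two promised cases.
high_case = value_is_high(c - delta, c, s)
low_case = value_is_high(s + delta, c, s)
```

Any tolerance strictly below (c − s)/2 would make the same comparison work; the value (c − s)/3 used in the proof simply leaves some slack.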