Quantum Algorithms for Matrix Products over Semirings

In this paper we construct quantum algorithms for matrix products over several algebraic structures called semirings, including the (max,min)-matrix product, the distance matrix product and the Boolean matrix product. In particular, we obtain the following results. We construct a quantum algorithm computing the product of two n × n matrices over the (max,min) semiring with time complexity O(n^{2.473}). In comparison, the best known classical algorithm for the same problem, by Duan and Pettie, has complexity O(n^{2.687}). As an application, we obtain an O(n^{2.473})-time quantum algorithm for computing the all-pairs bottleneck paths of a graph with n vertices, while classically the best upper bound for this task is O(n^{2.687}), again by Duan and Pettie. We construct a quantum algorithm computing the L most significant bits of each entry of the distance product of two n × n matrices in time O(2^{0.64L} n^{2.46}). In comparison, prior to the present work, the best known classical algorithms for the same problem, by Vassilevska and Williams, and by Yuster, had complexity O(2^L n^{2.69}). Our techniques lead to further improvements for classical algorithms as well, reducing the classical complexity to O(2^{0.96L} n^{2.69}), which gives a sublinear dependency on 2^L. The above two algorithms are the first quantum algorithms that perform better than the $\tilde O(n^{5/2})$-time straightforward quantum algorithm based on quantum search for matrix multiplication over these semirings. We also consider the Boolean semiring, and construct a quantum algorithm computing the product of two n × n Boolean matrices that outperforms the best known classical algorithms for sparse matrices. For instance, if the input matrices have O(n^{1.686...}) non-zero entries, then our algorithm has time complexity O(n^{2.277}), while the best classical algorithm has complexity O(n^{2.373}).


Introduction
Background. Matrix multiplication over semirings has a multitude of applications in computer science, and in particular in the area of graph algorithms (e.g., [5,17,19,20,21,23]). One example is Boolean matrix multiplication, related for instance to the computation of the transitive closure of a graph, where the product of two n × n Boolean matrices A and B is defined as the n × n Boolean matrix C = A · B such that C[i, j] = 1 if and only if there exists a k ∈ {1, . . . , n} such that A[i, k] = B[k, j] = 1.
More generally, given a set R ⊆ Z ∪ {−∞, ∞} and two binary operations ⊕ : R × R → R and ⊙ : R × R → R, the structure (R, ⊕, ⊙) is a semiring if it behaves like a ring except that there is no requirement on the existence of an inverse with respect to the operation ⊕. Given two n × n matrices A and B over R, the matrix product over (R, ⊕, ⊙) is the n × n matrix C defined as C[i, j] = ⊕_{k=1}^{n} (A[i, k] ⊙ B[k, j]) for any (i, j) ∈ {1, . . . , n} × {1, . . . , n}. The Boolean matrix product is simply the matrix product over the semiring ({0, 1}, ∨, ∧). The (max, min)-product and the distance product, which both have applications to a multitude of tasks in graph theory such as constructing fast algorithms for all-pairs paths problems (see, e.g., [19]), are the matrix products over the semiring (Z ∪ {−∞, ∞}, max, min) and the semiring (Z ∪ {∞}, min, +), respectively. Whenever the operation ⊕ is such that a term of the form ⊕_{k=1}^{n} x_k can be computed in Õ(√n) time using quantum techniques (e.g., for ⊕ = ∨ using Grover's algorithm [8], or for ⊕ = min and ⊕ = max using quantum algorithms for minimum finding [7]) and each operation ⊙ can be implemented in polylog(n) time, the product of two n × n matrices over the semiring (R, ⊕, ⊙) can be computed in time Õ(n^{5/2}) on a quantum computer. This is true for instance for the Boolean matrix product, and for both the (max, min) and distance matrix products.
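For concreteness, the generic definition above can be written down directly as a cubic-time classical reference implementation, instantiated for the three semirings just mentioned (the function names are ours, and this is only a baseline, not the algorithms of this paper):

```python
from functools import reduce

def semiring_matmul(A, B, add, mul):
    """Cubic-time product over a semiring (R, add, mul):
    C[i][j] is the add-combination over k of mul(A[i][k], B[k][j])."""
    n = len(A)
    return [[reduce(add, (mul(A[i][k], B[k][j]) for k in range(n)))
             for j in range(n)] for i in range(n)]

INF = float("inf")

# Boolean product over ({0, 1}, OR, AND)
boolean_product = lambda A, B: semiring_matmul(A, B, lambda x, y: x | y,
                                               lambda x, y: x & y)
# (max, min)-product over (Z ∪ {−∞, ∞}, max, min)
max_min_product = lambda A, B: semiring_matmul(A, B, max, min)
# distance product over (Z ∪ {∞}, min, +)
distance_product = lambda A, B: semiring_matmul(A, B, min, lambda x, y: x + y)
```

Replacing the inner combination over k by Grover search or quantum minimum finding is exactly what yields the straightforward Õ(n^{5/2}) quantum algorithm discussed next.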
A fundamental question is whether we can do better than those Õ(n^{5/2})-time straightforward quantum algorithms. For the Boolean matrix product, the answer is affirmative since it can be computed classically in time Õ(n^ω), where ω < 2.373 is the exponent of square matrix multiplication over a field. However, the Boolean matrix product appears to be an exception, and for most semirings it is not known if matrix multiplication can be done in Õ(n^ω) time. For instance, the best known classical algorithm for the (max, min)-product, by Duan and Pettie [5], has time complexity Õ(n^{(3+ω)/2}) = O(n^{2.687}) while, for the distance product, no truly subcubic classical algorithm is even known (without introducing assumptions on the matrices).
Our results. We construct in this paper the first quantum algorithms with exponent strictly smaller than 5/2 for matrix multiplication over several semirings.
We first obtain the following result for matrix multiplication over the (max, min) semiring.
Theorem 1.1. There exists a quantum algorithm that computes, with high probability, the (max, min)-product of two n × n matrices in time O(n^{2.473}).
In comparison, the best known classical algorithm for the (max, min)-product, by Duan and Pettie [5], has time complexity Õ(n^{(3+ω)/2}) = O(n^{2.687}), as mentioned above. The (max, min)-product has mainly been studied in the field of fuzzy logic [6] under the name composition of relations, and in the context of computing the all-pairs bottleneck paths of a graph (i.e., computing, for all pairs (s, t) of vertices in a graph, the maximum flow that can be routed between s and t). More precisely, it is well known (see, e.g., [5,17,21]) that if the (max, min)-product of two n × n matrices can be computed in time T(n), then the all-pairs bottleneck paths of a graph with n vertices can be computed in time Õ(T(n)). As an application of Theorem 1.1, we thus obtain an O(n^{2.473})-time quantum algorithm computing the all-pairs bottleneck paths of a graph with n vertices, while classically the best upper bound for this task is O(n^{2.687}), again from [5]. In order to prove Theorem 1.1, we construct a quantum algorithm that computes the product of two n × n matrices over the existence dominance semiring (defined in the next section) in time Õ(n^{(5+ω)/3}) ≤ O(n^{2.458}). The dominance product has applications in computational geometry [16] and graph algorithms [20] and, in comparison, the best known classical algorithm for this product [23] has complexity O(n^{2.684}). Computing the existence dominance product efficiently is, nevertheless, not enough for our purpose. We introduce (in Section 3) a new generalization of it that we call the generalized existence dominance product, and develop both quantum and classical algorithms that compute this product efficiently. This is the most technical part of this paper.
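The reduction from all-pairs bottleneck paths to the (max, min)-product mentioned above can be sketched as repeated (max, min)-squaring of the capacity matrix (a classical, unoptimized illustration of the folklore reduction, not the Õ(T(n))-time reduction of [5,17,21]; helper names are ours):

```python
def apbp(cap):
    """All-pairs bottleneck paths by repeated (max, min)-squaring of the
    capacity matrix cap (missing edges encoded as -inf); each vertex reaches
    itself with infinite bottleneck.  O(log n) squarings suffice, since after
    i squarings all paths of up to 2^i edges are accounted for."""
    n = len(cap)
    INF = float("inf")
    C = [[INF if i == j else cap[i][j] for j in range(n)] for i in range(n)]
    for _ in range(max(1, n.bit_length())):  # >= ceil(log2 n) squarings
        C = [[max(min(C[i][k], C[k][j]) for k in range(n)) for j in range(n)]
             for i in range(n)]
    return C
```

Each squaring is one (max, min)-product, so any T(n)-time product algorithm gives an Õ(T(n))-time bottleneck-paths algorithm, as stated in the text.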
We also show (in Subsection 4.2) how these results for the generalized existence dominance product can be used to construct classical and quantum algorithms computing the ℓ most significant bits of each entry of the distance product of two n × n matrices. In the quantum setting, we obtain time complexity Õ(2^{0.640ℓ} n^{(5+ω)/3}) ≤ O(2^{0.640ℓ} n^{2.458}). In comparison, prior to the present work, the best known classical algorithm for the same problem, by Vassilevska and Williams [20], had complexity Õ(2^ℓ n^{(3+ω)/2}) ≤ O(2^ℓ n^{2.687}), with a slight improvement on the exponent of n obtained later by Yuster [23]. We obtain an improvement for this classical time complexity as well, reducing it to Õ(2^{0.960ℓ} n^{(3+ω)/2}), which gives a sublinear dependency on 2^ℓ.
These results are, to the best of our knowledge, the first quantum algorithms for matrix multiplication over semirings other than the Boolean semiring improving over the straightforward Õ(n^{5/2})-time quantum algorithm, and the first nontrivial quantum algorithms offering a speedup with respect to the best classical algorithms for matrix multiplication when no assumptions are made on the sparsity of the matrices involved (sparse matrix multiplication is discussed below). This shows that, while quantum algorithms may not be able to outperform the classical Õ(n^ω)-time algorithm for matrix multiplication of (dense) matrices over a ring, they can offer a speedup for matrix multiplication over other algebraic structures. We finally investigate under which conditions quantum algorithms faster than the best known classical algorithms can be constructed for Boolean matrix multiplication. This question has recently been studied extensively in the output-sensitive scenario [3,10,12,13], for which quantum algorithms multiplying two n × n Boolean matrices with query complexity Õ(n√λ) and time complexity Õ(n√λ + λ√n) were constructed, where λ denotes the number of non-zero entries in the output matrix. In this work, we focus on the case where the input matrices are sparse (but not necessarily the output matrix), and evaluate the performance of quantum algorithms in this scenario. Our result (Theorem 5.1) shows how several standard combinatorial ideas for sparse Boolean matrix multiplication can be adapted to the quantum setting, and used to construct quantum algorithms faster than the best known classical algorithms. In particular, we obtain the following result.

Theorem 1.2 (simplified version). Let A and B be two n × n Boolean matrices, each containing at most m non-zero entries. There exists a quantum algorithm that computes, with high probability, the Boolean matrix product A · B.

The time complexity of the algorithm of Theorem 1.2 is the piecewise-linear function of log_n(m) represented in Figure 1. In comparison, the best known classical algorithm, by Yuster and Zwick [24], has complexity Õ(n^ω).

Figure 1: The upper bounds of Theorem 1.2 (in solid lines). The horizontal axis represents the logarithm of m with respect to basis n (i.e., the value log_n(m)). The vertical axis represents the logarithm of the complexity with respect to basis n. The dashed lines represent the upper bounds obtained by [24].
Our main quantum tool is rather standard: quantum enumeration, a variant of Grover's search algorithm. We use this technique in various ways to improve the combinatorial steps in several classical approaches [1,5,21,24] that are based on a combination of algebraic steps (computing some matrix products over a field) and combinatorial steps. Moreover, the speedup obtained by quantum enumeration enables us to depart from these original approaches and optimize the combinatorial and algebraic steps in different ways, for instance relying on rectangular matrix multiplication instead of square matrix multiplication. On the other hand, several subtle but crucial issues appear when trying to apply quantum enumeration, such as how to store and access information computed during the preprocessing steps, which induces complications and requires the introduction of new algorithmic ideas. We end up with algorithms fairly remote from these original approaches, where most steps are tailored for the use of quantum enumeration. More detailed technical overviews are given at the beginning of Sections 3, 4 and 5.

Fact 2. The following relations hold for any values k
Matrix products over semirings. We define below two matrix products over semirings considered in Sections 3 and 4, respectively, in addition to the Boolean product, the (max, min)-product and the distance product defined in the introduction. These products were also used in [5,20,21].
It is easy to check, as mentioned for instance in [5,21], that computing the (max, min)-product reduces to computing the product ⋆, where (A ⋆ B)[i, j] = max{A[i, k] : A[i, k] ≤ B[k, j]} (with the convention that the maximum of an empty set is −∞). Indeed, if C denotes the (max, min)-product of two matrices A and B, then C[i, j] = max((A ⋆ B)[i, j], (B^T ⋆ A^T)[j, i]) for any (i, j), where A^T and B^T denote the transposes of A and B, respectively. Matrix products over the semirings (min, max), (min, ≤) and (max, ≥) studied, for instance, in [19], similarly reduce to computing the product ⋆.
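This reduction can be checked by brute force. The sketch below takes the reduced-to product to be the (max, ≤)-product ⋆ with (A ⋆ B)[i, j] = max{A[i, k] : A[i, k] ≤ B[k, j]} (−∞ when no such k exists); the helper names are ours:

```python
NEG_INF = float("-inf")

def star(A, B):
    """(max, <=)-product: (A * B)[i][j] = max{A[i][k] : A[i][k] <= B[k][j]}."""
    n = len(A)
    return [[max((A[i][k] for k in range(n) if A[i][k] <= B[k][j]),
                 default=NEG_INF) for j in range(n)] for i in range(n)]

def max_min_via_star(A, B):
    """(max, min)-product via two star-products: for each witness k, the
    value min(A[i][k], B[k][j]) is A[i][k] when A[i][k] <= B[k][j] (first
    term) and B[k][j] otherwise (second term, on the transposes)."""
    n = len(A)
    AT = [[A[j][i] for j in range(n)] for i in range(n)]
    BT = [[B[j][i] for j in range(n)] for i in range(n)]
    S, T = star(A, B), star(BT, AT)
    return [[max(S[i][j], T[j][i]) for j in range(n)] for i in range(n)]
```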
Quantum algorithms for matrix multiplication. We assume that a quantum algorithm can access any entry of the input matrix in a random access way, similarly to the standard model used in [3,10,12,13] for Boolean matrix multiplication. More precisely, let A and B be two n × n matrices, for any positive integer n (the model presented below can be generalized to deal with rectangular matrices in a straightforward way). We suppose that these matrices can be accessed directly by a quantum algorithm: we have an oracle O_A that, for any i, j ∈ {1, . . . , n} and any z ∈ {0, 1}*, maps the state |i⟩|j⟩|0⟩|z⟩ to |i⟩|j⟩|A[i, j]⟩|z⟩. We have a similar oracle O_B for B. Since we are interested in time complexity, we will count all the computational steps of the algorithm and assign a cost of one for each call to O_A or O_B, which corresponds to the cases where quantum access to the inputs A and B can be done at unit cost, for example in a random access model working in quantum superposition. We say that a quantum algorithm for matrix multiplication computes the product of A and B with high probability if, when given access to oracles O_A and O_B corresponding to A and B, the algorithm outputs with probability at least 2/3 all the entries of the product of A and B. The complexity of several algorithms in this paper will be stated using an upper bound λ on the number of non-zero or non-infinite entries in the product of A and B. The same complexity, up to a logarithmic factor, can actually be obtained even if no nontrivial upper bound is known a priori, see [12,13].
We will use variants of Grover's search algorithm, as described for instance in [2], to find elements satisfying some conditions inside a search space of size N. Concretely, suppose that a Boolean function f : {1, . . . , N} → {0, 1} is given and that we want to find a solution, i.e., an element x ∈ {1, . . . , N} such that f(x) = 1. Consider the quantum search procedure (called safe Grover search in [15]) obtained by repeating Grover's standard search a logarithmic number of times, and checking if a solution has been found. This quantum procedure outputs one solution with probability at least 1 − 1/poly(N) if a solution exists, and always rejects if no solution exists. Its time complexity is Õ(√(N/max(1, t))), where t denotes the number of solutions, if the function f can be evaluated in Õ(1) time. By repeating this procedure and striking out solutions as soon as they are found, one can find all the solutions with probability at least 1 − 1/poly(N) using Õ(√(N/t) + √(N/(t − 1)) + · · · + √(N/1)) = Õ(√(N(t + 1))) computational steps. We call this procedure quantum enumeration.
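The final bound follows from the elementary estimate ∑_{i=1}^{t} √(N/i) ≤ 2√(Nt), which can be checked numerically:

```python
import math

def enumeration_cost(N, t):
    """Sum of the per-search costs sqrt(N/i) as the i-th remaining solution
    is struck out, i running from t down to 1."""
    return sum(math.sqrt(N / i) for i in range(1, t + 1))

# Comparing the sum with the integral of 1/sqrt(x) gives
# sum_{i=1}^{t} sqrt(N/i) <= sqrt(N) * (1 + int_1^t dx/sqrt(x)) <= 2*sqrt(N*t).
```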

Existence Dominance Matrix Multiplication
In this section we present a quantum algorithm that computes the existence dominance product of two matrices A and B. The underlying idea of our algorithm is similar to the idea in the best classical algorithm for the same problem by Duan and Pettie [5]: use a search step to find some of the entries of A * B, and rely on classical algebraic algorithms to find the other entries. We naturally use quantum search to implement the first part, and perform careful modifications of their approach to improve the complexity in the quantum setting, taking advantage of the features of quantum enumeration. There are two notable differences. The first one is that the algebraic part of our quantum algorithms uses rectangular matrix multiplication, while [5] uses square matrix multiplication. The second and crucial difference is that, for applications in later sections, we give a quantum algorithm that can handle a more general version of the existence dominance product, defined on sets of matrices, which we call the generalized existence dominance product and define below.
The generalized existence dominance product of these matrices is the n × n matrix C with entries in S ∪ {(0, 0)} defined as follows: for all (i, j) ∈ {1, . . . , n} × {1, . . . , n}, the entry C[i, j] is the maximum element in S_{ij}, where the maximum refers to the lexicographic order.
Note that the case u = v = 1 corresponds to the standard existence dominance product.

Proof. Let t ∈ {1, . . . , m_1} be a parameter to be chosen later. Let L be the list of all finite entries in A^{(1)}, . . . , A^{(u)} sorted in increasing order. Decompose L into t successive parts L_1, . . . , L_t, each containing at most ⌈m_1/t⌉ entries. For each x ∈ {1, . . . , u} and each r ∈ {1, . . . , t} we construct two n × n matrices from A^{(x)}; similarly, for each y ∈ {1, . . . , v} and each r ∈ {1, . . . , t} we construct two n × n matrices from B^{(y)}. It is easy to see that, for each x ∈ {1, . . . , u} and y ∈ {1, . . . , v}, the following equality holds (where the operators + and ∑ refer to the entry-wise OR): indeed, the second term compares entries that are in a same part L_r, while the first term takes into consideration entries in distinct parts. Define two n × n matrices C_1 and C_2 with entries in S ∪ {(0, 0)} as follows, for all (i, j) ∈ {1, . . . , n} × {1, . . . , n}. From Equation (1), the matrix C_1 can be obtained once the corresponding uv terms are known. We can obtain all these uv terms by computing the following Boolean product of an nu × nt matrix by an nt × nv matrix (both matrices can be constructed in time Õ(n^2 t(u + v))).
The cost of this matrix multiplication is Õ(n^{ω(1+log_n u, 1+log_n t, 1+log_n v)}). From item (iv) of Fact 2, we conclude that the matrix C_1 can be computed in time Õ(n^2 uv + n^2 t(u + v) + n^{ω(1+log_n u, 1+log_n t, 1+log_n v)}) = Õ(n^{ω(1+log_n u, 1+log_n t, 1+log_n v)}).
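The structure of the proof, bucketing the entries of A by value, resolving cross-bucket comparisons by Boolean products and same-bucket comparisons by a separate search, can be illustrated by a purely classical sketch of the plain existence dominance product (naive loops stand in for the rectangular-product and quantum-search steps; helper names are ours):

```python
def existence_dominance(A, B, t=4):
    """C[i][j] = 1 iff there is a k with A[i][k] <= B[k][j], computed by
    sorting the entries of A into t parts (buckets) by value."""
    n = len(A)
    entries = sorted(A[i][k] for i in range(n) for k in range(n))
    size = -(-len(entries) // t)                      # ceil(|entries| / t)
    parts = [entries[r * size:(r + 1) * size] for r in range(t)]
    hi = [p[-1] for p in parts if p]                  # largest value of each part
    bucket = lambda v: min(r for r, h in enumerate(hi) if v <= h)
    C = [[0] * n for _ in range(n)]
    for r in range(len(hi)):                          # cross-part comparisons:
        Ar = [[int(bucket(A[i][k]) == r) for k in range(n)] for i in range(n)]
        Br = [[int(B[k][j] >= hi[r]) for j in range(n)] for k in range(n)]
        for i in range(n):                            # one Boolean product per part
            for j in range(n):
                C[i][j] |= int(any(Ar[i][k] & Br[k][j] for k in range(n)))
    for i in range(n):                                # same-part comparisons,
        for j in range(n):                            # by direct search
            if not C[i][j]:
                C[i][j] = int(any(A[i][k] <= B[k][j] <= hi[bucket(A[i][k])]
                                  for k in range(n)))
    return C
```

Any witness k is caught either by the Boolean products (when B[k][j] dominates the whole part of A[i][k]) or by the same-part search, which is the split exploited by the quantum algorithm.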
We use the following lemma to help us compute the matrix C 2 . While this is the main technical part of the proof of this proposition, for readability its proof is placed in the appendix.
for all (i, j) ∈ {1, . . . , n} × {1, . . . , n}.

After applying the quantum algorithm of Lemma 3.1, we can obtain the matrix C_2, similarly to the computation of C_1, if we know all the terms ∑_r Â_r. We obtain all these uv terms by computing the following Boolean product of an nu × nt matrix by an nt × nv matrix.
The cost of this matrix multiplication is Õ(n^{ω(1+log_n u, 1+log_n t, 1+log_n v)}). The total cost of computing the matrix C_2 is thus the desired bound, since the term Õ(n^2 t(u + v)) is negligible here by item (iv) of Fact 2.
We can give a classical version of this result, whose proof can be found in the appendix, that will be used to prove Theorem 4.3 in Subsection 4.2.
We now consider the case u = v = 1, corresponding to the standard existence dominance product. By optimizing the choice of the parameter t in Proposition 3.1, we obtain the following theorem.

Proof. The complexity of the algorithm of Proposition 3.1 is minimized for t = n^µ, where µ is the solution of the equation µ + 2ω(1, 1 + µ, 1) = 1 + log_n(m_1 m_2). We can use items (ii) and (iii) of Fact 2 to obtain the upper bound ω(1, 1 + µ, 1) ≤ ω + µ, and optimize the complexity of the algorithm by taking t = (m_1 m_2)^{1/3} n^{(1−2ω)/3}, which gives the upper bound claimed in the second part of the theorem.
In the case of completely dense input matrices (i.e., m_1 ≈ n^2 and m_2 ≈ n^2), the second part of Theorem 3.1 shows that the complexity of the algorithm is Õ(n^{(5+ω)/3}) ≤ O(n^{2.458}).
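These dense-case exponents can be checked numerically (here with ω ≤ 2.3728639, one published bound consistent with ω < 2.373 as used in the text):

```python
# Exponents quoted in the text, as functions of omega.
omega = 2.3728639
quantum_dominance = (5 + omega) / 3   # Õ(n^{(5+omega)/3}) <= O(n^{2.458})
classical_maxmin = (3 + omega) / 2    # Õ(n^{(3+omega)/2}) <= O(n^{2.687})
print(quantum_dominance, classical_maxmin)
```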

Applications: the (max, min)-Product and the Distance Product
In this section we show how to apply the results of Section 3 to construct quantum algorithms for the (max, min)-product and the distance product.

Quantum Algorithm for the (max, min)-Product
In this subsection we present a quantum algorithm for the matrix product ⋆, which immediately gives a quantum algorithm with the same complexity for the (max, min)-product as explained in Section 2, and then gives Theorem 1.1. Our algorithm first exploits the methodology by Vassilevska et al. [21] to reduce the computation of the product ⋆ to the computation of several sparse dominance products. The main technical difficulty to overcome is that, unlike in the classical case, computing all the sparse dominance products successively becomes too costly (i.e., the cost exceeds the complexity of all the other parts of the quantum algorithm). Instead, we show that it is sufficient to obtain a small fraction of the entries in each dominance product, and that this task reduces to the computation of a generalized existence dominance product; we then use the quantum techniques of Proposition 3.1 to obtain precisely only those entries.

Proof. Let g ∈ {1, . . . , n} be a parameter to be chosen later. For each i ∈ {1, . . . , n}, we sort the entries in the i-th row of A in increasing order and divide the list into s = ⌈n/g⌉ successive parts R^i_1, . . . , R^i_s with at most g entries in each part. For each r ∈ {1, . . . , s}, we define an n × n matrix A_r that keeps, in each row i, only the entries of the part R^i_r. The cost of this (classical) preprocessing is O(n^2 s) time. We describe below the quantum algorithm that computes C = A ⋆ B.
Step 1. The complexity of this step can be bounded, for any parameter t ∈ {1, . . . , n^2}, by an expression that we now minimize. Let us write t = n^γ and g = n^δ. For a fixed δ, the first term is a decreasing function of γ, while the second term is an increasing function of γ. The expression is thus minimized for the value of γ that equalizes the two terms, in which case the expression becomes Õ(n^{(5−γ)/2}).
Step 2. Note that at Step 1 we also obtain all (i, j) ∈ {1, . . . , n} × {1, . . . , n} for which no suitable r exists. For all other (i, j), we denote by r_{ij} the value found at Step 1. We now know that C[i, j] can be computed in time Õ(√g) using the quantum algorithm for maximum finding [7], since |R^i_{r_{ij}}| ≤ g. The complexity of Step 2 is thus Õ(n^2 √g).

Quantum Algorithm for the Distance Product
In this subsection we present a quantum algorithm that computes the most significant bits of the distance product of two matrices, as defined below. Let A and B be two n × n matrices with entries in Z ∪ {∞}. Let W be a power of two such that the value of each finite entry of their distance product C is upper bounded by W. For instance, one can take the smallest power of two larger than max A[i, k] + max B[k, j], where the maxima are over the finite entries of the matrices. Each non-negative finite entry of C can then be expressed using log_2(W) bits: the entry C[i, j] can be expressed as C[i, j] = ∑_{d=1}^{log_2(W)} C[i, j]_d · W/2^d. For any ℓ ∈ {1, . . . , log_2(W)}, we say that an algorithm computes the ℓ most significant bits of each entry if, for all (i, j) ∈ {1, . . . , n} × {1, . . . , n} such that C[i, j] is finite and non-negative, the algorithm outputs all the bits C[i, j]_1, C[i, j]_2, . . . , C[i, j]_ℓ. Vassilevska and Williams [20] have studied this problem, and shown how to reduce the computation of the ℓ most significant bits to the computation of O(2^ℓ) existence dominance matrix products of n × n matrices. By combining this with the Õ(n^{(3+ω)/2})-time algorithm for the dominance product from [16], they obtained a classical algorithm that computes the ℓ most significant bits of each entry of the distance product of A and B in time Õ(2^ℓ n^{(3+ω)/2}) ≤ Õ(2^ℓ n^{2.687}).
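With this convention (bit d of C[i, j] carrying weight W/2^d, the natural reading of the definition above), extracting the ℓ most significant bits of an entry amounts to computing ⌊C[i, j] · 2^ℓ / W⌋; a small helper of ours making this explicit:

```python
def msb_bits(c, W, l):
    """The l most significant bits of c, where 0 <= c < W and W is a power
    of two: bit d has weight W / 2**d, so the bits C_1, ..., C_l are the
    binary digits of floor(c * 2**l / W)."""
    assert 0 <= c < W and W & (W - 1) == 0
    top = (c * (1 << l)) // W
    return [(top >> (l - d)) & 1 for d in range(1, l + 1)]
```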
Here is the main result of this subsection, obtained by reducing the computation of the ℓ most significant bits to computing a generalized existence dominance product.

Proof. Note that the trivial Õ(n^{5/2})-time quantum algorithm can be used to compute all the bits of each entry of the distance product C of A and B. Therefore, we will assume, without loss of generality, that ℓ satisfies the inequality 2^{0.640ℓ} n^{(5+ω)/3} ≤ n^{5/2}, which implies in particular that 2^ℓ ≤ n^2.

The complexity is thus Õ(2^{0.640ℓ} n^{2.458}). Finally, we discuss the general case where the entries of C can be negative or infinite. Observe that the above algorithm detects which entries of C are larger than (2^ℓ − 1)W/2^ℓ: these are the entries such that the algorithm finds no d such that D_d[i, j] = 1. We can find which of these entries are larger than W (and thus infinite) by computing the dominance product A′_0 * B′_{2^ℓ}. Note that the algorithm also finds which entries of C are negative.

Similarly, we can obtain a better classical algorithm, as shown in the following theorem.

Proof. The proof is similar to the proof of Theorem 4.2, but we use Proposition 3.2 instead of Proposition 3.1. The complexity becomes Õ(2^ℓ n^3 / t + n^{ω(1+log_n(2^{ℓ/2}), 1+log_n t, 1+log_n(2^{ℓ/2}))}) for any parameter t ∈ {1, . . . , 2^{ℓ/2} n^2}. Let us write µ = log_n(2^ℓ) and t = n^γ. This expression is then O(n^{3+µ−γ} + n^{ω(1+µ/2, 1+γ, 1+µ/2)}).
Note that the dependency on n of the Õ(2^ℓ n^{2.687})-time algorithm by Vassilevska and Williams [20] can be slightly improved using the recent O(n^{2.684})-time algorithm for the dominance product by Yuster [23], based on rectangular matrix multiplication. We can similarly obtain an improved bound O(2^{cℓ} n^{2.684}), for some c < 1, with the same approach as in the proof of Theorem 4.3. However, it is complicated to express the value of c in a closed form, so we omit the statement of this slight improvement.

Sparse Boolean Matrix Multiplication
In this section we describe quantum versions of several known combinatorial techniques for handling sparse Boolean matrix products. The main result is the following theorem, which shows how to compute the Boolean product of two matrices A and B by reducing it to four products, each easier to compute than the original one when A and B are sparse enough. Note that similar ideas have been used in [1,24] to analyze applications of those combinatorial techniques in the classical setting. Here we show how to implement these ideas using quantum enumeration and analyze the complexity of the resulting algorithm.
Theorem 5.1. Assume that there exists an algorithm that computes, in time M(n_1, n_2, n_3, L), the product of any n_1 × n_2 Boolean matrix and any n_2 × n_3 Boolean matrix such that their product contains at most L non-zero entries. Let A and B be two n × n Boolean matrices with at most m_1 and m_2 non-zero entries in A and B, respectively. For any values of the three parameters ℓ_1 ∈ {1, . . . , m_1} and ℓ_2, ℓ_3 ∈ {1, . . . , m_2}, there exists a quantum algorithm that computes, with high probability, the Boolean product A · B, with a time complexity expressed in terms of λ and the ℓ′_i, where λ denotes the number of non-zero entries in A · B, and ℓ′_i = min(ℓ_i, n) for each i ∈ {1, 2, 3}.

Proof. For any k ∈ {1, . . . , n}, let a^R_k (resp. b^R_k) be the number of non-zero entries in the k-th row of A (resp. B), and let a^C_k (resp. b^C_k) be the number of non-zero entries in the k-th column of A (resp. B). We define the following six sets of indices, and compute them classically in time O(n^2).
where + represents the entry-wise OR operation. We will individually compute the four terms of this sum.
The computation of the first term consists in computing the product of a |T| × |S| matrix by an |S| × |U| matrix. We implement this part using the algorithm whose existence is assumed in the statement of the theorem. The second term is computed by a quantum procedure that maintains a set Σ, initially empty: it performs successive quantum searches over {1, . . . , N}, each time searching for an element x such that f_Σ(x) = 1, adding g(x) to Σ as soon as such an x is found, and stopping when no new element x is found. From the discussion of Section 2, with high probability all searches succeed, in which case at the end of the procedure Σ = g({1, . . . , N}). Let λ′ denote the number of non-zero entries in A_{S′} · B_{S′}, and observe that λ′ ≤ min(λ, m_1 m_2/ℓ_2), since N < m′_1 m_2/ℓ_2 ≤ m_1 m_2/ℓ_2 by the sparsity of B. The overall complexity of this quantum procedure is Õ(√(N(λ′ + 1))). The computation of A_{T′} · B is done as follows. For each k ∈ {1, . . . , n}, let a′_k denote the number of non-zero entries in the k-th column of A_{T′}. We first perform an O(n^2)-time classical preprocessing step: for each k ∈ {1, . . . , n}, we construct the set E_k of the row indices of all non-zero entries in the k-th column of A_{T′}, and construct the set F_k of the column indices of all non-zero entries in the k-th row of B. Note that |E_k| = a′_k and |F_k| = b^R_k. The quantum procedure computing A_{T′} · B uses a set Σ ⊆ {1, . . . , n} × {1, . . . , n}, initially empty. For each k ∈ {1, . . . , n}, all the (i, j) ∈ E_k × F_k such that (i, j) ∉ Σ are computed by performing a quantum enumeration, as above, over the set E_k × F_k, adding (i, j) to Σ as soon as such an (i, j) is found, and stopping when no new element (i, j) is found. The overall time complexity is Õ(n^2 + ∑_{k=1}^{n} √(a′_k b^R_k (λ_k + 1))), where λ_k is the number of elements found when processing k.
Note that the inequality ∑_k a′_k b^R_k < λ min(m_1/ℓ_1, n) holds, since ∑_k a′_k b^R_k also represents the total number of witnesses of A_{T′} · B, i.e., the number of triples (i, j, k) such that A_{T′}[i, k] = B[k, j] = 1 (observe that there are at most λ pairs (i, j) satisfying this condition, all such that i ∈ T′). Since ∑_k λ_k ≤ λ ≤ n^2, the Cauchy–Schwarz inequality shows that this complexity is upper bounded by Õ(n^2 + √(λ (λ + n) min(m_1/ℓ_1, n))). Computing A · B_{U′} is done similarly to the computation of A_{T′} · B, with the symmetric cost bound Õ(n^2 + √(λ (λ + n) min(m_2/ℓ_3, n))).
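The overall structure of the proof can be illustrated classically. The sketch below keeps only the core idea: split the middle index k by weight, handle heavy indices by a dense product and light indices by witness enumeration (which the paper performs by quantum enumeration); the threshold and names are ours:

```python
def sparse_boolean_product(A, B, heavy_threshold):
    """Boolean product by a heavy/light split of the middle index k:
    heavy k (column k of A times row k of B has many potential witnesses)
    go to a dense product; light k contribute few witnesses, enumerated
    directly from their non-zero positions."""
    n = len(A)
    col_A = [sum(A[i][k] for i in range(n)) for k in range(n)]
    row_B = [sum(B[k][j] for j in range(n)) for k in range(n)]
    heavy = [k for k in range(n) if col_A[k] * row_B[k] >= heavy_threshold]
    light = [k for k in range(n) if col_A[k] * row_B[k] < heavy_threshold]
    C = [[0] * n for _ in range(n)]
    for i in range(n):                      # dense product over heavy indices
        for j in range(n):
            C[i][j] = int(any(A[i][k] & B[k][j] for k in heavy))
    for k in light:                         # witness enumeration, light indices
        rows = [i for i in range(n) if A[i][k]]
        cols = [j for j in range(n) if B[k][j]]
        for i in rows:
            for j in cols:
                C[i][j] = 1
    return C
```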
We see that the second and third terms in our complexity are never worse. In order to evaluate quantitatively the speedup obtained in the quantum setting, let us consider the case where only the input matrices are sparse (i.e., λ ≈ n^2). In this case, the algorithm by Amossen and Pagh has the same complexity as the algorithm by Yuster and Zwick [24] described in the introduction. In comparison, Theorem 5.1 gives the following result, which shows that our quantum algorithm is better than their classical algorithm, as discussed in the introduction.

Proof. First consider the case √(m_1 m_2) ≤ n. Assume for now that m_1 ≤ m_2. We use the following strategy: we first use quantum enumeration to find all the non-zero entries of A and then, for each such entry A[i, k], we output all the j's such that B[k, j] = 1. The complexity of this strategy is Õ(√((m_1 + 1) n^2) + m_1 n) = Õ(m_1 n). The same argument gives the upper bound Õ(m_2 n) when m_2 ≤ m_1.
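This very-sparse strategy is simple enough to state as code (a classical stand-in, with the quantum enumeration of the non-zeros of A replaced by a scan):

```python
def very_sparse_product(A, B):
    """When A is very sparse: list the non-zero entries of A (by quantum
    enumeration in the paper, by a scan here), then, for each entry A[i][k],
    stream out the non-zero positions of row k of B into row i of C."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    nonzeros_A = [(i, k) for i in range(n) for k in range(n) if A[i][k]]
    for i, k in nonzeros_A:
        for j in range(n):
            if B[k][j]:
                C[i][j] = 1
    return C
```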
Finally, if √(m_1 m_2) ≥ n^{ω−1/2}, then we simply use the best existing classical algorithm for dense matrix multiplication.
gives, for each k, a sorted list of at most nu numbers, with possible repetitions. We then divide this list need to be considered, where q