Computing in matrix groups without memory

Memoryless computation is a novel means of computing any function of a set of registers by updating one register at a time while using no memory. We aim to emulate how computations are performed on modern cores, since they typically involve updates of single registers. The computation model of memoryless computation can be fully expressed in terms of transformation semigroups, or in the case of bijective functions, permutation groups. In this paper, we view registers as elements of a finite field and we compute linear permutations without memory. We first determine the maximum complexity of a linear function when only linear instructions are allowed. We also determine which linear functions are hardest to compute when the field in question is the binary field and the number of registers is even. Secondly, we investigate some matrix groups, thus showing that the special linear group is internally computable but not fast. Thirdly, we determine the smallest set of instructions required to generate the special and general linear groups. These results are important for memoryless computation, for they show that linear functions can be computed very fast or that very few instructions are needed to compute any linear function. They thus indicate new advantages of using memoryless computation.


Memoryless computation
Typically, swapping the contents of two variables x and y requires a buffer t, and proceeds as follows (using pseudo-code): t ← x, x ← y, y ← t. However, the famous XOR swap (when x and y are sequences of bits), which we view in general as addition over a vector space: x ← x + y, y ← x − y, x ← x − y, performs the swap without any use of memory.
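The addition-based swap can be sketched in a few lines of Python. This is only an illustration: the function name `memoryless_swap` and the choice of integers modulo `q` as the alphabet are assumptions made here, not notation from the paper.

```python
def memoryless_swap(x, y, q):
    """Swap two registers holding residues mod q with three updates and no buffer."""
    x = (x + y) % q   # x <- x + y
    y = (x - y) % q   # y <- x - y  (y now holds the original x)
    x = (x - y) % q   # x <- x - y  (x now holds the original y)
    return x, y

print(memoryless_swap(3, 5, 7))  # (5, 3)
```

Each line updates a single register using only the current register contents, which is exactly the kind of instruction studied below.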
While the example described above is folklore in Computer Science, the idea to compute functions without memory was developed in [1,2,3,4,5,6] and then independently rediscovered and developed in [7]. Amongst the results derived in the literature is the non-trivial fact that any function can be computed using memoryless computation. Moreover, only a number of updates linear in the number of registers is needed: any function of n variables can be computed in at most 4n − 3 updates (a result proved for the Boolean alphabet in [3], then independently extended to any alphabet in [7] and [8]), which reduces to 2n − 1 if the function is bijective. Memoryless computation has the potential to speed up computations not only by avoiding time-consuming communication with the memory but also by effectively combining the values contained in registers. This indicates that memoryless computation can be viewed as an analogue in computing to network coding [9,10], an alternative to routing on networks. It is then shown in [7] that for certain manipulations of registers, memoryless computation uses arbitrarily fewer updates than traditional, "black-box" computing.

Model for computing in matrix groups without memory
In this paper, we are interested in computing linear bijective functions without memory. Some results already appear in [7] about these functions. For instance, any linear function can be computed in at most 2n − 1 updates; in this paper, we lower that upper bound to ⌊3n/2⌋, which is tight. The number of updates required to compute any manipulation of variables is also determined in [7,Theorem 4.7].
Foremost, let us recall some notations and results from [7]. Let A := GF(q) be a finite field (the alphabet) and n ≥ 2 be an integer representing the number of registers (also called variables) $x_1, \ldots, x_n$. We denote [n] = {1, 2, . . . , n}. The elements of $A^n$ are referred to as states, and any state $a \in A^n$ is expressed as $a = (a_1, \ldots, a_n)$. For any 1 ≤ k ≤ n, the k-th unit state is given by $e_k = (0, \ldots, 0, 1, 0, \ldots, 0)$ where the 1 appears in coordinate k. We also denote the all-zero state as $e_0$.
For any $f \in \mathrm{Sym}(A^n)$, we denote its n coordinate functions as $f_1, \ldots, f_n : A^n \to A$, i.e. $f(x) = (f_1(x), \ldots, f_n(x))$ for all $x = (x_1, \ldots, x_n) \in A^n$. We say that the i-th coordinate function is trivial if it coincides with that of the identity: $f_i(x) = x_i$; it is nontrivial otherwise.
A bijective instruction is a permutation g of $A^n$ with one nontrivial coordinate function: $g(x) = (x_1, \ldots, x_{j-1}, g_j(x), x_{j+1}, \ldots, x_n)$ for some 1 ≤ j ≤ n. We say the instruction g updates the j-th coordinate. We can represent this instruction as $y_j \leftarrow g_j(y)$ where $y = (y_1, \ldots, y_n) \in A^n$ represents the contents of the registers. A program computing f is simply a sequence of instructions whose composition is f; the instructions are typically denoted one after the other. With this notation, the swap of two variables can be viewed as computing the permutation f of $A^2$ defined as $f(x_1, x_2) = (x_2, x_1)$, and the program is given by $y_1 \leftarrow y_1 + y_2$, $y_2 \leftarrow y_1 - y_2$, $y_1 \leftarrow y_1 - y_2$.
In this paper, we want to compute a linear transformation $f : A^n \to A^n$, i.e. $f(x) = xM^\top$ for some matrix $M \in A^{n \times n}$. We denote the rows of M as $f_i$. We restrict ourselves to linear instructions only, i.e. instructions of the form $y_i \leftarrow v \cdot y = \sum_{j=1}^{n} v_j y_j$ for some $v = (v_1, \ldots, v_n) \in A^n$. In particular, the instruction above is a permutation if and only if $v_i \ne 0$. Note that computing f without memory is then equivalent to computing M by starting from the identity matrix and updating one row at a time.
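The row-update view can be made concrete with a minimal sketch. It assumes a prime field GF(p) represented by integers modulo p, 0-based row indices, and a hypothetical helper name `run_program`; none of these come from the paper.

```python
def run_program(n, program, p):
    """Apply linear instructions to the n x n identity matrix over GF(p), p prime.

    Each instruction is a pair (i, v): row i is replaced by sum_j v[j] * row j.
    """
    rows = [[int(r == c) for c in range(n)] for r in range(n)]
    for i, v in program:
        rows[i] = [sum(v[j] * rows[j][c] for j in range(n)) % p for c in range(n)]
    return rows

# The swap program: y1 <- y1 + y2, y2 <- y1 - y2, y1 <- y1 - y2 (rows are 0-based here).
swap = [(0, [1, 1]), (1, [1, -1]), (0, [1, -1])]
print(run_program(2, swap, 5))  # [[0, 1], [1, 0]]
```

The three instructions turn the identity into the permutation matrix of the transposition, matching the swap program given above.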
The set $M(\mathrm{GF}(q)^n)$ of bijective linear instructions then corresponds to the set of nonsingular matrices with at most one nontrivial row: $M(\mathrm{GF}(q)^n) = \{S(i, v) : 1 \le i \le n,\ v \in \mathrm{GF}(q)^n,\ v_i \ne 0\}$, where $S(i, v)$ denotes the identity matrix with its i-th row replaced by v.
Following [11], we say a group G is internally computable if it is generated by the instructions it contains, i.e. if any element of G can be computed by a program using only instructions from G. For instance, Gaussian elimination proves that GL(n, q) is internally computable. We prove in Proposition 2 that SL(n, q) is also internally computable. For any internally computable group G, two main problems arise. First, we want to know how fast we can compute any element of G: we will prove in Theorem 1 that the maximum complexity in the general linear group is ⌊3n/2⌋ instructions. More surprisingly, if q = 2 and n is even, then the matrices requiring 3n/2 instructions are fully characterised in Proposition 1. Note that the average complexity over all elements of a group is also interesting; for GL(n, q), this quantity tends to n instructions when q is large [7].
Secondly, due to the large number of possible instructions, it seems preferable to work with restricted sets of instructions which could be efficiently used by a processor. Therefore, we also want to know the minimum number of instructions required to generate the whole group. We shall determine this for the special and general linear groups in Theorems 2 and 3, respectively. The fact that it is equal to n in most cases (and n + 1 otherwise) shows how easy it is to compute linear functions without memory and how little space would be required to store those minimal sets of instructions.
For any internally computable group G and any g ∈ G, we denote the shortest length of a program computing g using only instructions from G as L(g, G); we refer to this quantity as the complexity of g in G. If H ≤ G and L(h, H) = L(h, G) for all h ∈ H, we say that H is fast in G. It is still unknown whether GL(n, q) is fast in Sym(GF(q) n ), i.e. if we cannot compute linear functions any faster by allowing non-linear instructions. However, we will prove in Proposition 2 that the special linear group is not fast in the general linear group (unless q = 2).
We would like to emphasize that we only consider bijective linear functions, i.e. computing in matrix groups. The case of any bijective function is studied in [11], where analogous results are derived for the symmetric and alternating groups of $A^n$ (A being any finite set of cardinality at least 2).
The rest of the paper is organised as follows. In Section 2, we determine the maximum complexity of any matrix in GL(n, q) and investigate which matrices have highest complexity. Then, in Section 3, we determine whether some matrix groups are internally computable, and we show that SL(n, q) is internally computable but not fast in GL(n, q). Finally, in Section 4, we determine the minimum size of a generating set of instructions for both the special and general linear groups.

Maximum complexity in the general linear group
Theorem 1. Any matrix in $\mathrm{GF}(q)^{n \times n}$ can be computed in at most ⌊3n/2⌋ linear instructions. This bound is tight and reached for some matrices in GL(n, q).
Proof. We consider the general case where the matrix M we want to compute is not necessarily invertible. We prove the statement by strong induction on n ≥ 1; it is clear for n = 1. Suppose it holds for up to n − 1.
For any S ⊂ [n], we refer to the matrix $M_S \in \mathrm{GF}(q)^{|S| \times |S|}$ with entries M(i, j) for all i, j ∈ S as the S-principal of M. Suppose that M has a nonsingular S-principal $M_S$, say
\[ M = \begin{pmatrix} M_S & N \\ P & Q \end{pmatrix}, \]
where |S| = k. We give a program for M in two main steps and no more than ⌊3n/2⌋ instructions.
The first step computes $(M_S | N)$. By hypothesis, $M_S$ can be computed in at most ⌊3k/2⌋ instructions. We can easily convert that program in order to compute the matrix $(M_S | N)$ as follows. Consider the final update of row j: $y_j \leftarrow f_j$ (i.e., the j-th row must be equal to that of M after its last update). The j-th row of N, say $n_j$, is a linear combination of the rows of $(0 | I_{n-k})$, hence simply replace $y_j \leftarrow f_j$ by $y_j \leftarrow f_j + n_j$ and in any subsequent instruction, replace every occurrence of $y_j$ by $y_j - n_j$.
The second step computes (P|Q). Note that the rows $p_1, \ldots, p_{n-k}$ of P can be expressed as linear combinations of those of $M_S$: $P = RM_S$ where the rows of $R = PM_S^{-1} \in \mathrm{GF}(q)^{(n-k) \times k}$ are denoted $r_1, \ldots, r_{n-k}$. By hypothesis, the matrix X := Q − RN (with rows $x_1, \ldots, x_{n-k}$) can be computed in at most ⌊3(n − k)/2⌋ instructions. Again this can be converted to compute (P|Q) as follows. Suppose i is the first row to have its last update in a program computing X, say it is $y_i \leftarrow \sum_{l=1}^{n-k} a_{i,l} y_l$. Then the new program for (P|Q) is
\[ y_{k+i} \leftarrow \sum_{l=1}^{n-k} a_{i,l}\, y_{k+l} + \sum_{l=1}^{k} r_{i,l}\, y_l. \]
Then replace every future occurrence of $y_i$ with $y_{k+i} - \sum_{l=1}^{k} r_{i,l} y_l$. Suppose that i′ is the next row to have its last update $y_{i'} \leftarrow \sum_{l=1}^{n-k} a_{i',l} y_l$; this is converted to
\[ y_{k+i'} \leftarrow \sum_{l \ne i} a_{i',l}\, y_{k+l} + a_{i',i} \Big( y_{k+i} - \sum_{l=1}^{k} r_{i,l}\, y_l \Big) + \sum_{l=1}^{k} r_{i',l}\, y_l. \]
Again, every future occurrence of $y_{i'}$ will be replaced with $y_{k+i'} - \sum_{l=1}^{k} r_{i',l} y_l$, and so on. By induction, we can then easily prove that this converted program computes (P|Q).
Now suppose M does not have any invertible principal. Let D be the directed graph whose adjacency matrix is the support of M, i.e. with an arc (i, j) whenever M(i, j) ≠ 0. If D is acyclic, then M can be computed in n instructions, for it is (up to renaming the vertices in topological order) an upper triangular matrix with zeros on the diagonal. Otherwise, D has girth n, for otherwise the adjacency matrix of the subgraph induced by a shortest cycle forms a nonsingular principal. Therefore D is a cycle, and M can be computed in n + 1 instructions by [7, Proposition 4.6].
The tightness of the bound follows from [7, Corollary 2].
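For the reader's convenience, the instruction count behind the two cases of the proof can be written out explicitly. The following is a short worked inequality, with k the size of the nonsingular principal used in the two-step program; it uses only that the sum of two floors is at most the floor of the sum.

```latex
\[
  \left\lfloor \tfrac{3k}{2} \right\rfloor + \left\lfloor \tfrac{3(n-k)}{2} \right\rfloor
  \;\le\; \left\lfloor \tfrac{3k}{2} + \tfrac{3(n-k)}{2} \right\rfloor
  \;=\; \left\lfloor \tfrac{3n}{2} \right\rfloor ,
  \qquad
  n \;\le\; n + 1 \;\le\; \left\lfloor \tfrac{3n}{2} \right\rfloor \quad (n \ge 2).
\]
```

The first chain covers the case of a nonsingular principal; the second covers the acyclic case (n instructions) and the cycle case (n + 1 instructions).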
By the proof of Theorem 1, we see that the only matrices in GL(2, q) which are a product of three instructions are exactly those whose support is the permutation matrix of a transposition. Proposition 1 below extends this result to any even order when the matrices are over GF(2).
Proposition 1. In GL(2m, 2), the only matrices which are the product of no fewer than 3m instructions are the permutation matrices of fixed-point-free involutions.
Proof. We prove it by strong induction on m; it is clear for m = 1 and checked by computer for m = 2, therefore we assume m ≥ 3 and that it holds for up to m − 1. For any k ≥ 1, we denote the permutation matrix of (1, 2) · · · (2k − 1, 2k) as $J_k$. We say that two matrices M and N are equivalent if $M = \Pi N \Pi^{-1}$ for some permutation matrix Π.
Let M ∈ GL(2m, 2) be a matrix at distance 3m from the identity which is not equivalent to J m . According to the proof of Theorem 1, the graph D with adjacency matrix M must contain a directed cycle of length < 2m. The graph D has girth 2, for otherwise there is a principal of size other than 2 and hence M can be computed in fewer than 3m instructions by using the two-step algorithm in the proof of Theorem 1. More generally, any invertible principal of M must have even size and be a conjugate of J k for some k.
Hence we can express M (up to equivalence) as M = . By the same argument, we can first compute J 1 and then the matrix Q + P J 1 N , hence these matrices must satisfy (up to equivalence) The conditions above mean that this principal is not invertible, neither is any of its T -principals for |T | = 3, and it can be expressed as where α = bf + de + 1 and β = ah + cg + 1. However, it can be verified that no such matrix exists.
We remark that the situation for GL(2m + 1, 2) is much more complicated. Indeed, the permutation matrix of (1, 2)(3, 4) · · · (2m − 1, 2m, 2m + 1) and its conjugates are still extremal, but many other matrices are also extremal. For example, by Theorem 1 we know that the diameter of the Cayley graph for GL(3, 2) is 4, and clearly there are only two extremal permutation matrices in GL(3, 2); however, there are 35 matrices equal to the product of 4 and no fewer linear instructions in this group (see Table 1).
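Small cases such as GL(3, 2) can be explored by brute force. The sketch below is not the computation used in the paper; it is a plain breadth-first search over GL(n, 2), with rows stored as bitmasks and the hypothetical function name `complexity_histogram`, returning how many matrices have each complexity.

```python
from collections import Counter, deque

def complexity_histogram(n):
    """Breadth-first search over GL(n, 2) using all bijective linear instructions."""
    identity = tuple(1 << k for k in range(n))      # row k stored as a bitmask
    # An instruction (i, v) needs bit i of v set to be invertible;
    # v == 1 << i would be the trivial instruction, so it is skipped.
    instructions = [(i, v) for i in range(n) for v in range(1 << n)
                    if (v >> i) & 1 and v != (1 << i)]
    dist = {identity: 0}
    queue = deque([identity])
    while queue:
        rows = queue.popleft()
        for i, v in instructions:
            new_row = 0
            for j in range(n):
                if (v >> j) & 1:
                    new_row ^= rows[j]              # addition over GF(2)
            nxt = rows[:i] + (new_row,) + rows[i + 1:]
            if nxt not in dist:
                dist[nxt] = dist[rows] + 1
                queue.append(nxt)
    return Counter(dist.values())

hist = complexity_histogram(3)
print(sorted(hist.items()))
```

For n = 3 the search visits all 168 elements of GL(3, 2) and the largest distance is 4; the entry of the histogram at distance 4 can be compared with the count of 35 quoted above.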

Some matrix groups
We first discuss the special linear groups. Recall that a transvection is any permutation for all $x \in \mathrm{GF}(q)^n$, where $v, \phi \in \mathrm{GF}(q)^n$ [12]. Then $t_{\phi,v}$ is an instruction if and only if φ (viewed as a column vector) has only one nonzero coordinate. In other words, any transvection which is an instruction is represented by a shear matrix $S(i, e_i + ae_j)$ for some i, j and a ∈ GF(q).
Proposition 2. (i) The group SL(n, q) is internally computable for any n and prime power q.
(ii) If q ≠ 2 then SL(n, q) is not fast in GL(n, q).
Proof. (i) This is simply the observation that any transvection is a product of instructions, and the transvections are well known to generate the special linear group; see for instance [13, p.45].
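(ii) We prove this in the case n = 2, the extension to the general case being clear. If q ≠ 2 then there exists an element α ∈ GF(q) such that α ≠ 0, 1. Inside GL(2, q) we thus have
\[ \begin{pmatrix} \alpha & 0 \\ 0 & \alpha^{-1} \end{pmatrix} = \begin{pmatrix} \alpha & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & \alpha^{-1} \end{pmatrix}, \]
which expresses the above element of SL(2, q) as a product of two instructions. Inside SL(2, q) however we have that
\[ \begin{pmatrix} 1 & x \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ y & 1 \end{pmatrix} = \begin{pmatrix} 1 + xy & x \\ y & 1 \end{pmatrix} \]
for any x, y ∈ GF(q). Since α ≠ 1 the original matrix cannot be of this form and thus cannot be expressed as a product of just two instructions inside SL(2, q).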
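The phenomenon in part (ii) can be checked by exhaustive search in a small case. The sketch below assumes GF(5) and takes the diagonal matrix diag(α, α⁻¹) with α = 2 as a hypothetical test element; the helper names are illustrative only.

```python
from itertools import product

P = 5  # GF(5); an assumption for this check, any prime other than 2 would do

def mat_mul(a, b):
    return tuple(tuple(sum(a[r][k] * b[k][c] for k in range(2)) % P
                       for c in range(2)) for r in range(2))

def instructions(det_one_only):
    """All bijective instructions of GL(2, P), or only those of determinant 1."""
    result = []
    for diag, off in product(range(1, P), range(P)):
        if det_one_only and diag != 1:
            continue
        result.append(((diag, off), (0, 1)))   # updates the first row
        result.append(((1, 0), (off, diag)))   # updates the second row
    return result

def products_of_two(gens):
    # The identity is itself an instruction, so single instructions are included.
    return {mat_mul(g, h) for g in gens for h in gens}

target = ((2, 0), (0, 3))                      # diag(alpha, alpha^-1) with alpha = 2
print(target in products_of_two(instructions(False)))  # True
print(target in products_of_two(instructions(True)))   # False
```

The element has determinant 1, is a product of two instructions of GL(2, 5), but is not a product of two instructions of SL(2, 5), since every such product has an entry equal to 1 on its diagonal.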
The argument in the proof of (ii) can be easily generalised to show that any subgroup of GL defined as the set of matrices with determinant in a proper subgroup of the multiplicative group of GF(q) is not fast.
We remark that if q = 2 then SL(n, q) = GL(n, q). Unfortunately most other groups that are naturally matrix groups are not internally computable in their natural GF(q) modules. For an instruction to be of the above form one of B or C must be the all zeros matrix and A = D = I. If C = 0, we see that B must be a matrix with only one nonzero entry, which lies on the diagonal; if B = 0, we obtain its transpose. Therefore, the symplectic instructions generate a group of matrices where A,B, C and D are all diagonal; this is clearly a proper subgroup of Sp(2n, q).
Furthermore analogous arguments apply to 3 D 4 (q) and G 2 (q) acting on their natural 26 and 8 dimensional GF(q) modules respectively. An instruction whose only non-zero off-diagonal entries are contained entirely on the bottom row must be contained in the subgroup of lower triangular matrices. The non-trivial elements of this subgroup, however, are of the form where α ∈ GF(2 2r+1 ) and β = α 2 r+1 −1 [13, p.115]. Clearly this subgroup contains no instructions and so the subgroup of 2 B 2 (2 2r+1 ) generated by any instructions is a proper subgroup.

Generating linear groups
The purpose of this section is to determine the minimum number of instructions sufficient to generate some matrix groups. The reader is reminded of the elements S(i, v) that we defined just before Theorem 1. We also define the vectors $v_i \in \mathrm{GF}(q)^n$ such that $v_i = e_i + e_{i+1}$ for i ≤ n − 1 and $v_n = e_1 + e_n$. We first consider the special linear group.
Theorem 2. The group SL(n, q) is generated by n instructions unless n = 2, $q = 2^m$ (m ≥ 2), where it is generated by 3 instructions.
Proof. The rest of the proof goes by induction on n, but we split the proof according to the parity of q. First, suppose q is odd. An immediate consequence of a classical theorem incorrectly attributed to Dickson [14] (it was actually proved by Wiman and Moore, see [15, Corollary 2.2]) tells us that the maximal subgroups of PSL(2, q), q odd (these can easily be seen to "lift" to maximal subgroups of SL(2, q)) are all isomorphic to one of
- Alt(4), Sym(4) or Alt(5);
- a dihedral group of order either q + 1 or q − 1;
- a subfield subgroup;
- a stabiliser of a one-dimensional subspace in the action on the q + 1 subspaces of $\mathrm{GF}(q)^2$ on which (P)SL(2, q) naturally acts.

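Consider the matrices/instructions $\begin{pmatrix} 1 & x \\ 0 & 1 \end{pmatrix}$ and $\begin{pmatrix} 1 & 0 \\ y & 1 \end{pmatrix}$, where x, y ∈ GF(q) are nonzero.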
We prove that the group they generate does not belong to any of the maximal subgroups. First, the copies of Alt(4), Sym(4) and Alt(5). In characteristic 3, the only way two elements of order 3 can be contained in a copy of Alt(4) or Sym(4) is if their product has order 1 or 3 (in which case they are contained in the same cyclic subgroup, which the above two matrices clearly are not) or 2 (and by direct calculation our two matrices do not have a product of order 2). Finally we can eliminate Alt(5) since this subgroup can only exist in characteristic 3 if q = 3 or 9, which are easily eliminated by computer. In characteristic 5 there are no elements of order 5 in Alt(4) and Sym(4), and for Alt(5) this maximal subgroup only exists when q satisfies certain congruences that a power of 5 never satisfies. For characteristic p > 5 there are clearly no elements of order p in any of Alt(4), Sym(4) or Alt(5).
Since p is coprime to both q + 1 and q − 1, neither of these matrices belongs to a maximal dihedral subgroup. The only one-dimensional subspace fixed by the first matrix is spanned by the (column) vector $(1, 0)^\top$ whilst the second only fixes the subspace spanned by the (column) vector $(0, 1)^\top$, so no one-dimensional subspace is fixed by the subgroup these generate. Recall that the product of these two matrices is the matrix $\begin{pmatrix} 1 + xy & x \\ y & 1 \end{pmatrix}$, which has trace 2 + xy. Choosing x and y so that 2 + xy is contained in no proper subfield of GF(q) now gives a pair of elements that cannot generate a subfield subgroup. It follows that this pair must generate the whole group.
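For a prime field the conclusion is easy to confirm numerically. The following sketch assumes q = p = 5 and the hypothetical choice x = y = 1 (a prime field has no proper subfield, so the trace condition is vacuous here); `generated_group` is an illustrative helper that closes a set of 2×2 matrices under multiplication.

```python
def generated_group(gens, p):
    """Closure of a set of invertible 2x2 matrices under multiplication mod a prime p."""
    def mul(a, b):
        return tuple(tuple(sum(a[r][k] * b[k][c] for k in range(2)) % p
                           for c in range(2)) for r in range(2))
    identity = ((1, 0), (0, 1))
    seen, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = mul(g, s)
            if h not in seen:
                seen.add(h)
                frontier.append(h)
    return seen

x, y, p = 1, 1, 5
group = generated_group([((1, x), (0, 1)), ((1, 0), (y, 1))], p)
print(len(group) == p * (p * p - 1))  # True: |SL(2, p)| = p(p^2 - 1)
```

Since both generators have determinant 1, reaching p(p² − 1) elements means the two instructions generate all of SL(2, 5).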
We now prove the inductive step. Let x, y ∈ GF(q) such that the instructions $\begin{pmatrix} 1 & x \\ 0 & 1 \end{pmatrix}$, $\begin{pmatrix} 1 & 0 \\ y & 1 \end{pmatrix}$ generate SL(2, q). Then we claim that the following set of n instructions generates SL(n, q): $\{S(i, v_i) : 1 \le i \le n - 2\} \cup \{S(n - 1, e_{n-1} + xe_n), S(n, e_n + ye_1)\}$. Let us remark that we can easily generate any instruction of the form $S(i, e_i + e_j)$ for 1 ≤ i < j ≤ n − 1 (and hence any of the form $S(i, e_i - e_j)$ as well). We can then easily generate $S(i, e_i + xe_n)$ for any 1 ≤ i ≤ n − 1. We also generate any transvection of the form $S(n, e_n + ye_i)$ for any 1 ≤ i ≤ n − 1 as such: $S(n, e_n + ye_i) = S(n, e_n - ye_1)\,S(1, e_1 - e_i)\,S(n, e_n + ye_1)\,S(1, e_1 + e_i)$.
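The commutator-type identity for $S(n, e_n + ye_i)$ can be sanity-checked numerically. The sketch below assumes a prime field GF(7), n = 4, y = 3 and i = 2 as arbitrary test values, with hypothetical helpers `shear` and `mat_mul` and the usual matrix product.

```python
def shear(n, i, j, a, p):
    """S(i, e_i + a e_j): the identity with entry (i, j) set to a mod p (1-based, i != j)."""
    m = [[int(r == c) for c in range(n)] for r in range(n)]
    m[i - 1][j - 1] = a % p
    return m

def mat_mul(a, b, p):
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) % p for c in range(n)]
            for r in range(n)]

n, p, y, i = 4, 7, 3, 2
lhs = shear(n, n, i, y, p)                       # S(n, e_n + y e_i)
rhs = shear(n, n, 1, -y, p)                      # S(n, e_n - y e_1)
for factor in (shear(n, 1, i, -1, p),            # S(1, e_1 - e_i)
               shear(n, n, 1, y, p),             # S(n, e_n + y e_1)
               shear(n, 1, i, 1, p)):            # S(1, e_1 + e_i)
    rhs = mat_mul(rhs, factor, p)
print(lhs == rhs)  # True
```

The check holds for any 1 < i < n, because the only product of off-diagonal matrix units that survives is the one landing in position (n, i).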
Displaying only the columns and rows indexed 1, i, n, the equation for $S(n, e_n + ye_i)$ above reads
\[ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & y & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -y & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & -1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ y & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \]
By combining the two types of transvections, we obtain all possible transvections of the type $S(i, e_i + ae_n)$ or $S(n, e_n + ae_i)$ for all a ∈ GF(q). We are done with the last coordinate, and we tackle the penultimate coordinate by considering Note that Q is indeed generated by $S(n - 1, e_{n-1} + xe_n)$ and $S(n, e_n + ye_{n-1})$. We then obtain the two required types of transvections: The proof goes on from n − 1 down to 2, thus generating any possible transvection.
Now suppose q is even. Any instruction in SL(2, $2^m$) is an element of order two, and hence any group generated by two instructions is dihedral. However, SL(2, $2^m$) is not a dihedral group for m ≥ 2 and hence cannot be generated by two instructions. We now prove it can be generated by three instructions. We recall from Dickson's theorem [14] that the maximal subgroups of SL(2, $2^m$) are each isomorphic to either
- a stabiliser of a one-dimensional subspace in the action on the $2^m + 1$ subspaces of $\mathrm{GF}(2^m)^2$ on which (P)SL(2, $2^m$) naturally acts;
- a subfield subgroup;
- a dihedral group of order $2(2^m \pm 1)$.

Consider the matrices
where x ∈ GF(2 m ) is contained in no proper subfield. Let H be the subgroup generated by the matrices A and B. By the same arguments as the case SL(2, q) with q odd we know that H is contained in neither a subspace stabilizer nor a subfield subgroup and so the only maximal subgroups containing H must be dihedral of order 2(q ± 1). Note that since these are dihedral groups of twice odd order these subgroups cannot contain pairs of involutions that commute.
Since BC = CB it follows that C cannot be contained in any of these dihedral subgroups and so no maximal subgroup contains all of A, B and C, hence they must generate the whole group.
The base case of the induction thus occurs for n = 3. Let x such that 1 0 we can proceed as above to obtain S(2, (0, 1, x 5 )). We may repeat this process until we derive S(2, (0, 1, x 2m+1 )) = S(2, (0, 1, x 2 )), which together with M 2 and generate SL(2, 2 m ) acting on the last two coordinates. It is then easy to show that any transvection of the form S(1, e 1 + ae i ) or S(i, e i + ae 1 ) for any i = 2, 3 and any a ∈ GF(2 m ) can be generated. Thus, the whole special linear group is generated.
We now prove the inductive step. More specifically, we show that SL(n, q) is generated by the following set of instructions: 1, e n−1 + xe n ), S(n, e n + xe 1 )}.
Again, we can easily generate $S(1, e_1 + xe_n)$ and hence SL(3, $2^m$) acting on the coordinates 1, n − 1, and n. In particular, $S(n - 1, e_{n-1} + xe_1)$ is generated and by induction hypothesis we obtain SL(n − 1, $2^m$) acting on the first n − 1 coordinates. Finally, any transvection of the form $S(n, e_n + ae_i)$ or $S(i, e_i + ae_n)$ for any i ≤ n − 1 and any $a \in \mathrm{GF}(2^m)$ can be easily generated. Thus, the whole special linear group is generated.
We now turn to the general linear group.
Theorem 3. The group GL(n, q) is generated by n instructions for any n and any prime power q.
Proof. The proof is split into two parts, depending on the parity of q; the even part goes by induction on n. If q is even, we prove that GL(n, 2 m ) is generated by the n instructions for any primitive element α. Since det(S(1, αe 1 + e n )) = α, we only need to generate the special linear group. For n = 2, denote M i = S(i, (α, 1)) for i = 1, 2. Then we can generate the transposition matrix as follows: (1, α))M 2 1 . Any transvection S(1, (1, α k )) can then be expressed as and any other transvection is obtained by conjugating by P .
We then have the complete set of generators for GL(n − 1, q) acting on coordinates 1 to n − 1. It is then easy to prove that any transvection of the form S(i, e i + ae n ) and S(n, e n + ae i ) for any 1 ≤ i ≤ n − 1 and any a ∈ GF(q) can be generated.
If q is odd and n = 2, consider the matrices $A := \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$, $B := \begin{pmatrix} 1 & 0 \\ 1 & x \end{pmatrix}$ where x ∈ GF(q) is not contained in any proper subfield. Arguments analogous to those used in the SL(2, q) case show that $\langle A, A^B \rangle = \mathrm{SL}(2, q)$.
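The smallest odd case can be verified by direct enumeration. This sketch assumes q = 3 and the hypothetical choice x = 2 (so that det B = 2 is not 1), and closes {A, B} under multiplication with an illustrative helper `span_of`.

```python
def span_of(gens, p):
    """All products of the given invertible 2x2 matrices over GF(p), p prime."""
    def mul(a, b):
        return tuple(tuple(sum(a[r][k] * b[k][c] for k in range(2)) % p
                           for c in range(2)) for r in range(2))
    seen = {((1, 0), (0, 1))}
    frontier = list(seen)
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = mul(g, s)
            if h not in seen:
                seen.add(h)
                frontier.append(h)
    return seen

p, x = 3, 2
A = ((1, 1), (0, 1))
B = ((1, 0), (1, x))
print(len(span_of([A, B], p)))  # 48, the order of GL(2, 3)
```

Here the two instructions A and B already generate the whole of GL(2, 3), in line with the statement of Theorem 3 for n = 2.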
If n > 2, we rely on the proof of Theorem 2 for q odd. We know that there exist x, y ∈ GF(q) such that SL(n, q) is generated by $\{S(i, v_i) : 1 \le i \le n - 2\} \cup \{S(n - 1, e_{n-1} + xe_n), S(n, e_n + ye_1)\}$.
Only displaying rows and columns indexed 1, n − 1, n the equation above reads
We conclude this section by noticing that Theorems 2 and 3 have implications for some classical semigroups of matrices. Denote the semigroup of singular matrices in $\mathrm{GF}(q)^{n \times n}$ as Sing(n, q) and consider the general linear semigroup (also called full linear monoid [16]) and special linear semigroup: GLS(n, q) = GL(n, q) ∪ Sing(n, q), SLS(n, q) = SL(n, q) ∪ Sing(n, q).
Note that Sing(n, q) is not an internally computable semigroup. Indeed, the kernel of any singular instruction matrix only contains vectors with Hamming weight equal to zero or one. Thus any matrix whose kernel forms a code with minimum distance at least two cannot be computed by a program only consisting of singular instructions. For instance, the square all-ones matrix of any order over any finite field cannot be computed in that fashion.
However, according to Theorems 6.3 and 6.4 in [17], any generating set of GL(n, q) (SL(n, q) respectively) appended with any matrix of rank n−1 in Sing(n, q) generates GLS(n, q) (SLS(n, q) respectively). Since any singular instruction has rank n − 1, we conclude that these semigroups are internally computable, and in particular GLS(n, q) is generated by n + 1 instructions, while SLS(n, q) is generated by n + 1 instructions unless q = 2 m and n = 2, where it is generated by four instructions.