Published by the Department of Computer Science, The University of Chicago.
Themis Gouleakis
Max-Planck Institute for Informatics
Saarbrücken, Germany
tgouleat AT mpi-inf DOT mpg DOT de
John Peebles
CSAIL
MIT
Cambridge, Massachusetts
jpeebles AT mit DOT edu
and
Eric Price
Department of Computer Science
University of Texas at Austin
Austin, Texas
ecprice AT cs DOT utexas DOT edu
We study the fundamental problems of (i) uniformity testing of a discrete distribution, and (ii) closeness testing between two discrete distributions with bounded $\ell_2$-norm. These problems have been extensively studied in distribution testing, and sample-optimal estimators are known for them [17, 7, 19, 11]. In this work, we show that the original collision-based testers proposed for these problems [14, 3] are sample-optimal, up to constant factors. Previous analyses showed sample complexity upper bounds for these testers that are optimal as a function of the domain size $n$, but suboptimal by polynomial factors in the error parameter $\epsilon$. Our main contribution is a new tight analysis establishing that these collision-based testers are information-theoretically optimal, up to constant factors, both in the dependence on $n$ and in the dependence on $\epsilon$.
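To illustrate the collision-based approach for uniformity testing, the sketch below counts pairwise collisions among the samples, which gives an unbiased estimate of $\|p\|_2^2$; for the uniform distribution this equals $1/n$, while any distribution that is $\epsilon$-far from uniform in $\ell_1$ has $\|p\|_2^2 \geq (1+\epsilon^2)/n$. This is a minimal illustration of the standard collision statistic, not the exact tester or constants analyzed in the paper; the threshold constant is illustrative.

```python
from collections import Counter
import random

def collision_statistic(samples):
    """Fraction of sample pairs that collide.

    This is an unbiased estimator of the collision probability
    ||p||_2^2 = sum_i p_i^2 of the underlying distribution p.
    """
    m = len(samples)
    colliding_pairs = sum(c * (c - 1) // 2 for c in Counter(samples).values())
    return colliding_pairs / (m * (m - 1) // 2)

def uniformity_tester(samples, n, eps):
    """Accept (return True) iff the collision statistic is close to 1/n.

    Uniform over [n] has collision probability exactly 1/n; a distribution
    eps-far from uniform in l1 has collision probability >= (1 + eps^2)/n.
    The midpoint threshold below is an illustrative choice, not the
    paper's exact constant.
    """
    return collision_statistic(samples) <= (1 + eps**2 / 2) / n

# Example usage (illustrative): with enough samples, uniform data is
# accepted while a point mass is rejected.
random.seed(0)
uniform_samples = [random.randrange(10) for _ in range(2000)]
print(uniformity_tester(uniform_samples, n=10, eps=0.5))  # expect True
print(uniformity_tester([0] * 200, n=10, eps=0.5))        # expect False
```

The statistic is computed in time linear in the number of samples by grouping equal values with a counter rather than enumerating all $\binom{m}{2}$ pairs.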
Submitted December 24, 2016; revised October 11, 2017, August 29, 2018, and April 24, 2019;
published May 6, 2019.