Chicago Journal of Theoretical Computer Science

CJTCS Special Issue: CATS 2009

Selected papers from

2009 Computing: The Australasian Theory Symposium

held in Wellington, New Zealand January 20-23 2009.

Guest Editors: Rod Downey, Victoria University of Wellington, NZ

Prabhu Manyem, Shanghai University, PRC.

Article 2010-12

Published by Dept. CS U. Chicago. Copyright 2010 CJTCS and the author.

Distributing Frequency-Dependent Data Stream Computations

Sumit Ganguly
Indian Institute of Technology, Kanpur,
June 22, 2010

The data streaming model of computation processes a sequence of continuously arriving data in a single-pass over the input using sub-linear space. For efficiency purposes, it is often desirable to perform the computations in a highly distributed fashion, as are currently done in internet applications and sensor networks. The distributed computation is typically performed using a tree-topology, where the nodes are processing elements and the data resides in the leaves of the tree. A large class of interesting and practical functions are symmetric functions (i.e., invariant of the permutation of data at the leaves) of the inputs or their approximations. Flexible distributed processing refers to the class of distributed tree computations whose output remains within a pre-specified tolerance limit regardless of the permutation of the data on the leaves or the topology of the computation tree.

We consider the question: Do efficient flexible distributed computations exist corresponding to data streaming algorithms for computing the same function/approximation? The work by Jan Feldman, S. Muthukrishnan, Anastasios Sidiropoulos, Cli Stein, and Zoya Svitkina. On distributing symmetric streaming computa- tions (SODA 2008) shows that corresponding to any deterministic algorithm that computes a total symmetric function of an input stream, there exists a flexible distributed algorithm for computing the same function of the concatenation of the input streams at the leaves of the tree from left to right. Further, they present resource bounds of the distributed algorithm in terms of the resource bounds of the streaming algorithm. In this paper, we show the existence of efficient flexible distributed algorithms corresponding to a given data stream algorithm that computes an approximation to the function of the frequency vector of the stream. As opposed to the work of Feldman et al., the data stream algorithm need not compute a symmetric function of the input stream.

DOI: 10.4086/cjtcs.2010.012

[] Special Issue, CATS 2009, Article 8 [next] 2010 Article 13
[back] Special Issue [next] Published Articles
[CJCTS home]