Need for Count-Min Sketch

Since Count-Min Sketch is used to find the frequency of an element, one might think if there is actually a need for such data structure! The answer is Yes. Let us see with the help of an example.

Let us try to solve this frequency estimation problem using a MultiSet Data Structure

Trying MultiSet as an alternative to Count-min sketch 

Let’s try to implement this data structure using MultiSet with the below source code and try to find out the issues with this approach. 

Java




// Java program to try MultiSet as an
// alternative to Count-min sketch
 
import com.google.common.collect.HashMultiset;
import com.google.common.collect.Multiset;
 
public class MultiSetDemo {
    public static void main(String[] args)
    {
        Multiset<String> blackListedIPs
            = HashMultiset.create();
        blackListedIPs.add("192.170.0.1");
        blackListedIPs.add("75.245.10.1");
        blackListedIPs.add("10.125.22.20");
        blackListedIPs.add("192.170.0.1");
 
        System.out.println(
            blackListedIPs.count("192.170.0.1"));
        System.out.println(
            blackListedIPs.count("10.125.22.20"));
    }
}


Output: 

Understanding the problem of using MultiSet instead of Count-Min Sketch

Now let’s look at the time and space consumed with this type of approach. 

Number of UUIDs

Insertion Time(ms)

10

<25

100

<25

1,000

30

10,000

257

100,000

1200

1,000,000

4244

Let’s have a look at the memory (space) consumed: 

Number of UUIDs

JVM heap used(MB)

10

<2

100

<2

1,000

3

10,000

9

100,000

39

1,000,000

234

We can easily understand that as data grows, the above approach is consuming a lot of memory and time to process the data.

This can be optimised if we use the Count-Min Sketch.

Count-Min Sketch Data Structure with Implementation

The Count-Min Sketch is a probabilistic data structure and is defined as a simple technique to summarize large amounts of frequency data. Count-min sketch algorithm talks about keeping track of the count of things. i.e, How many times an element is present in the set.

Similar Reads

What is Count-Min Sketch?

Count-min sketch approach was proposed by Graham Cormode and S. Muthukrishnan. in the paper approximating data with the count-min sketch published in 2011/12. Count-min sketch is used to count the frequency of the events on the streaming data. Like the Bloom filter, Count-min sketch algorithm also works with hash codes. It uses multiple hash functions to map these frequencies on to the matrix (Consider sketch here a two dimensional array or matrix)....

Need for Count-Min Sketch

Since Count-Min Sketch is used to find the frequency of an element, one might think if there is actually a need for such data structure! The answer is Yes. Let us see with the help of an example....

How does Count-Min Sketch work?

...

Implementation of Count-min sketch using Guava library in Java:

Let’s look at the below example step by step....

Time and Space Complexity of Count-Min Sketch Data Structure

We can implement the Count-min sketch using Java library provided by Guava. Below is the step by step implementation:...

Applications of Count-min sketch:

...

Issue with Count-min sketch and its solution:

...

Conclusion:

Now let’s look at the time and space consumed with this type of approach (wrt to above Java-Guava Implementation)...

Contact Us