Need for Count-Min Sketch

Since Count-Min Sketch is used to find the frequency of an element, one might think if there is actually a need for such data structure! The answer is Yes. Let us see with the help of an example.

Let us try to solve this frequency estimation problem using a MultiSet Data Structure

Trying MultiSet as an alternative to Count-min sketch

Let’s try to implement this data structure using MultiSet with the below source code and try to find out the issues with this approach.

Java

// Java program to try MultiSet as an
// alternative to Count-min sketch
 
import com.google.common.collect.HashMultiset;
import com.google.common.collect.Multiset;
 
public class MultiSetDemo {
    public static void main(String[] args)
    {
        Multiset<String> blackListedIPs
            = HashMultiset.create();
        blackListedIPs.add("192.170.0.1");
        blackListedIPs.add("75.245.10.1");
        blackListedIPs.add("10.125.22.20");
        blackListedIPs.add("192.170.0.1");
 
        System.out.println(
            blackListedIPs.count("192.170.0.1"));
        System.out.println(
            blackListedIPs.count("10.125.22.20"));
    }
}

Output:

Understanding the problem of using MultiSet instead of Count-Min Sketch

Now let’s look at the time and space consumed with this type of approach.

Number of UUIDs	Insertion Time(ms)
10	<25
100	<25
1,000	30
10,000	257
100,000	1200
1,000,000	4244

Let’s have a look at the memory (space) consumed:

Number of UUIDs	JVM heap used(MB)
10	<2
100	<2
1,000	3
10,000	9
100,000	39
1,000,000	234

We can easily understand that as data grows, the above approach is consuming a lot of memory and time to process the data.

This can be optimised if we use the Count-Min Sketch.

Count-Min Sketch Data Structure with Implementation

The Count-Min Sketch is a probabilistic data structure and is defined as a simple technique to summarize large amounts of frequency data. Count-min sketch algorithm talks about keeping track of the count of things. i.e, How many times an element is present in the set.

Need for Count-Min Sketch

Trying MultiSet as an alternative to Count-min sketch

Java

Understanding the problem of using MultiSet instead of Count-Min Sketch

Count-Min Sketch Data Structure with Implementation

Similar Reads

Contact Us