Issue with Count-min sketch and its solution

What if one or more elements got the same hash values and then they all incremented. So, in that case, the value would have been increased because of the hash collision. Thus sometimes (in very rare cases) Count-min sketch overcounts the frequencies because of the hash functions.

So the more hash function we take there will be less collision. The fewer hash functions we take there will be a high probability of collision. Hence it always recommended taking more number of hash functions. 

Count-Min Sketch Data Structure with Implementation

The Count-Min Sketch is a probabilistic data structure and is defined as a simple technique to summarize large amounts of frequency data. Count-min sketch algorithm talks about keeping track of the count of things. i.e, How many times an element is present in the set.

Similar Reads

What is Count-Min Sketch?

Count-min sketch approach was proposed by Graham Cormode and S. Muthukrishnan. in the paper approximating data with the count-min sketch published in 2011/12. Count-min sketch is used to count the frequency of the events on the streaming data. Like the Bloom filter, Count-min sketch algorithm also works with hash codes. It uses multiple hash functions to map these frequencies on to the matrix (Consider sketch here a two dimensional array or matrix)....

Need for Count-Min Sketch

Since Count-Min Sketch is used to find the frequency of an element, one might think if there is actually a need for such data structure! The answer is Yes. Let us see with the help of an example....

How does Count-Min Sketch work?

...

Implementation of Count-min sketch using Guava library in Java:

Let’s look at the below example step by step....

Time and Space Complexity of Count-Min Sketch Data Structure

We can implement the Count-min sketch using Java library provided by Guava. Below is the step by step implementation:...

Applications of Count-min sketch:

...

Issue with Count-min sketch and its solution:

...

Conclusion:

Now let’s look at the time and space consumed with this type of approach (wrt to above Java-Guava Implementation)...

Contact Us