NLP | Storing Conditional Frequency Distribution in Redis

The nltk.probability.ConditionalFreqDist class is a container for FreqDist instances, with one FreqDist per condition. It counts frequencies that depend on a condition, such as another word or a class label. Here it serves as the base for an API-compatible class built on top of Redis using RedisHashFreqDist.
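To make the counting model concrete, here is a minimal sketch that uses only the standard library. A defaultdict of Counter objects stands in for ConditionalFreqDist and FreqDist; it is not the NLTK API, and the (condition, sample) pairs are made up for illustration.

```python
from collections import Counter, defaultdict

# Stand-in for ConditionalFreqDist: one Counter (playing the role of a
# FreqDist) per condition, created lazily on first access.
cfd = defaultdict(Counter)

# Illustrative (condition, sample) pairs; the condition is a class label.
pairs = [("pos", "good"), ("pos", "great"), ("neg", "bad"), ("pos", "good")]
for condition, sample in pairs:
    cfd[condition][sample] += 1

conditions = sorted(cfd)                              # all conditions seen
total = sum(sum(c.values()) for c in cfd.values())    # samples across conditions
print(conditions, cfd["pos"]["good"], total)
```

The Redis-backed class keeps this same shape, but each per-condition counter lives in a Redis hash rather than in process memory.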
The code below defines a RedisConditionalHashFreqDist class that extends nltk.probability.ConditionalFreqDist and overrides the __getitem__() method so that each condition maps to an instance of RedisHashFreqDist instead of a FreqDist.

Code :




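from nltk.probability import ConditionalFreqDist
from rediscollections import encode_key

# RedisHashFreqDist is assumed to be defined earlier in this same
# module (redisprob.py).

class RedisConditionalHashFreqDist(ConditionalFreqDist):
    def __init__(self, r, name, cond_samples=None):
        self._r = r        # Redis connection
        self._name = name  # base name shared by all per-condition keys
        ConditionalFreqDist.__init__(self, cond_samples)

        # Register every condition that already has a hash in Redis.
        for key in self._r.keys(encode_key('%s:*' % name)):
            if isinstance(key, bytes):
                # redis-py returns bytes on Python 3 by default
                key = key.decode('utf-8')
            condition = key.split(':')[1]
            # calls self.__getitem__(condition)
            self[condition]

    def __getitem__(self, condition):
        # In NLTK 3, ConditionalFreqDist is itself a dict, so membership
        # is checked on self (older releases kept a separate _fdists dict).
        if condition not in self:
            key = '%s:%s' % (self._name, condition)
            val = RedisHashFreqDist(self._r, key)
            super(RedisConditionalHashFreqDist, self).__setitem__(
                    condition, val)
        return super(
                RedisConditionalHashFreqDist, self).__getitem__(condition)

    def clear(self):
        # Clear the Redis-backed data of every condition.
        for fdist in self.values():
            fdist.clear()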


An instance of this class can be created by passing in a Redis connection and a base name. After that, it works just like a ConditionalFreqDist, as shown in the code below:
Code :




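from redis import Redis
from redisprob import RedisConditionalHashFreqDist

# Redis() connects to localhost:6379 by default; a server must be running.
r = Redis()
rchfd = RedisConditionalHashFreqDist(r, 'condhash')

# Nothing has been counted yet.
print(rchfd.N())
print(rchfd.conditions())

# Accessing a new condition creates its RedisHashFreqDist on demand.
rchfd['cond1']['foo'] += 1

print(rchfd.N())
print(rchfd['cond1']['foo'])
print(rchfd.conditions())

# Remove the stored counts.
rchfd.clear()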



Output :

0
[]
1
1
['cond1']

How it works?

  • The RedisConditionalHashFreqDist uses name prefixes to reference RedisHashFreqDist instances.
  • The name passed into the RedisConditionalHashFreqDist is a base name that is combined with each condition to create a unique name for each RedisHashFreqDist.
  • For example, if the base name of the RedisConditionalHashFreqDist is ‘condhash’, and the condition is ‘cond1’, then the final name for the RedisHashFreqDist is ‘condhash:cond1’.
  • This naming pattern is used at initialization to find all the existing hash maps using the Redis KEYS command.
  • By searching for all keys matching ‘condhash:*’, the constructor can identify all the existing conditions and create an instance of RedisHashFreqDist for each.
  • Combining strings with colons is a common naming convention for Redis keys as a way to define namespaces.
  • Each RedisConditionalHashFreqDist instance defines a single namespace of hash maps.
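The naming scheme described above can be sketched without a Redis server. In this hedged example, make_key and condition_from_key are hypothetical helpers (they are not part of the class shown earlier), the list of stored keys is invented, and fnmatchcase from the standard library only approximates the glob matching that the Redis KEYS command performs.

```python
from fnmatch import fnmatchcase

def make_key(base_name, condition):
    """Build the per-condition key, e.g. 'condhash:cond1'."""
    return "%s:%s" % (base_name, condition)

def condition_from_key(base_name, key):
    """Strip the 'base_name:' prefix to recover the condition."""
    if isinstance(key, bytes):  # Redis clients often return keys as bytes
        key = key.decode("utf-8")
    return key[len(base_name) + 1:]

# Hypothetical key names standing in for what a Redis server might hold.
stored_keys = [b"condhash:cond1", b"condhash:cond2", b"other:cond9"]

pattern = make_key("condhash", "*")  # 'condhash:*'
matches = [k for k in stored_keys if fnmatchcase(k.decode("utf-8"), pattern)]
conditions = [condition_from_key("condhash", k) for k in matches]
print(conditions)
```

Slicing off the prefix, rather than splitting on every colon, keeps the condition intact even when the condition itself contains a colon.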

RedisConditionalHashFreqDist also defines a clear() method. This is a helper method that calls clear() on all the internal RedisHashFreqDist instances. ConditionalFreqDist does not define a clear() method of its own.
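The effect of such a clear() can be illustrated with an in-memory stand-in. DictConditionalCounts below is a hypothetical class that uses plain dicts instead of Redis hashes; it mirrors only the lazy creation in __getitem__() and the loop in clear(), and makes no claim about how RedisHashFreqDist clears its own data.

```python
class DictConditionalCounts(dict):
    """In-memory stand-in: maps each condition to a dict of sample counts."""

    def __getitem__(self, condition):
        # Create the per-condition store lazily on first access.
        if condition not in self:
            super().__setitem__(condition, {})
        return super().__getitem__(condition)

    def clear(self):
        # Empty every per-condition store instead of dropping the
        # conditions themselves.
        for counts in self.values():
            counts.clear()

counts = DictConditionalCounts()
counts["cond1"]["foo"] = counts["cond1"].get("foo", 0) + 1
before = dict(counts["cond1"])   # snapshot taken before clearing

counts.clear()
remaining = list(counts)         # conditions still known to the container
print(before, remaining, counts["cond1"])
```

In this sketch the condition stays registered after clear() while its counts are gone, which follows from clearing each value rather than the outer mapping.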


