Numerical:-

MovieLens Data

USER_ID    MOVIE_ID    RATING    TIMESTAMP
196        242         3         881250949
186        302         3         891717742
196        377         1         878887116
244        51          2         880606923
166        346         1         886397596
186        474         4         884182806
186        265         2         881171488

Solution:-

Step 1 – First we map the values. This happens in the first phase of the MapReduce model: each record becomes a key:value pair, here USER_ID:MOVIE_ID.

196:242   ;  186:302   ;  196:377   ;  244:51   ;  166:346   ;  186:474   ;  186:265

Step 2 – After mapping, we shuffle and sort the values, so that all values sharing a key are grouped together and the keys are in order.

166:346   ;  186:302,474,265   ;  196:242,377   ;  244:51

Step 3 – After Step 1 and Step 2 are complete, we reduce each key's values to a single result per key.

Now, put all values together.

Solution
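The three steps can be traced by hand in plain Python. The sketch below is only an illustration on the seven sample rows from the table; it does not use Hadoop or mrjob. The function names are made up for this example, and it assumes that the reduce step counts how many movies each user rated, which the text does not state explicitly.

```python
from collections import defaultdict

# Sample rows from the MovieLens table: (user_id, movie_id, rating, timestamp)
ROWS = [
    (196, 242, 3, 881250949),
    (186, 302, 3, 891717742),
    (196, 377, 1, 878887116),
    (244, 51, 2, 880606923),
    (166, 346, 1, 886397596),
    (186, 474, 4, 884182806),
    (186, 265, 2, 881171488),
]


def map_phase(rows):
    """Step 1: emit one (user_id, movie_id) pair per input row."""
    return [(user_id, movie_id) for user_id, movie_id, _rating, _ts in rows]


def shuffle_and_sort(pairs):
    """Step 2: group the values by key, then order the groups by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(grouped):
    """Step 3 (assumed operation): count the values under each key."""
    return {key: len(values) for key, values in grouped.items()}


mapped = map_phase(ROWS)
grouped = shuffle_and_sort(mapped)
movies_per_user = reduce_phase(grouped)

print(grouped)          # {166: [346], 186: [302, 474, 265], 196: [242, 377], 244: [51]}
print(movies_per_user)  # {166: 1, 186: 3, 196: 2, 244: 1}
```

Under that assumption, user 186 rated three movies, user 196 rated two, and users 166 and 244 rated one each.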

 CODE FOR MAPPER AND REDUCER TOGETHER:

Python3




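from mrjob.job import MRJob
from mrjob.step import MRStep


class RatingsBreak(MRJob):
    def steps(self):
        # One step: map each line to (rating, 1), then sum the counts per rating.
        return [
            MRStep(mapper=self.mapper_get_ratings,
                   reducer=self.reducer_count_ratings)
        ]

    # MAPPER CODE
    def mapper_get_ratings(self, _, line):
        # Each input line is tab-separated: user_id, movie_id, rating, timestamp.
        (user_id, movie_id, rating, timestamp) = line.split('\t')
        yield rating, 1

    # REDUCER CODE
    def reducer_count_ratings(self, key, values):
        # values holds one 1 for every occurrence of this rating.
        yield key, sum(values)


if __name__ == '__main__':
    RatingsBreak.run()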
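
Note that this job uses the rating as its key, so it counts how often each rating value occurs rather than grouping movies by user. To check the expected result without a cluster, the same mapper and reducer logic can be imitated in plain Python. This is a rough local stand-in, not how mrjob actually schedules work: the `run_local` helper is hypothetical, and the input is assumed to be the seven sample rows as tab-separated lines.

```python
from itertools import groupby
from operator import itemgetter

# The seven sample rows, assumed to be tab-separated as in the job's input.
LINES = [
    "196\t242\t3\t881250949",
    "186\t302\t3\t891717742",
    "196\t377\t1\t878887116",
    "244\t51\t2\t880606923",
    "166\t346\t1\t886397596",
    "186\t474\t4\t884182806",
    "186\t265\t2\t881171488",
]


def mapper(line):
    """Emit (rating, 1) for one input line."""
    _user_id, _movie_id, rating, _timestamp = line.split("\t")
    yield rating, 1


def reducer(key, values):
    """Sum the 1s emitted for one rating."""
    yield key, sum(values)


def run_local(lines):
    """Hypothetical helper: map, sort by key, group, then reduce."""
    pairs = sorted(pair for line in lines for pair in mapper(line))
    results = {}
    for key, group in groupby(pairs, key=itemgetter(0)):
        for out_key, out_value in reducer(key, (v for _, v in group)):
            results[out_key] = out_value
    return results


rating_counts = run_local(LINES)
print(rating_counts)  # {'1': 2, '2': 2, '3': 2, '4': 1}
```

The keys are strings because the mapper yields the rating exactly as it was read from the line.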



