Numerical:-
MovieLens Data
USER_ID MOVIE_ID RATING TIMESTAMP
196 242 3 881250949
186 302 3 891717742
196 377 1 878887116
244 51 2 880606923
166 346 1 886397596
186 474 4 884182806
186 265 2 881171488
Solution:-
Step 1 – First we have to map the values; this happens in the 1st phase of the MapReduce model. Each record is emitted as a USER_ID:MOVIE_ID pair.
196:242 ; 186:302 ; 196:377 ; 244:51 ; 166:346 ; 186:474 ; 186:265
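The map step can be sketched in plain Python, without Hadoop. This is only an illustration: the function name `map_user_movie` and the list `ROWS` are hypothetical, and the rows are the seven sample records from the table above.

```python
# Sample records from the table: USER_ID MOVIE_ID RATING TIMESTAMP
ROWS = [
    "196 242 3 881250949",
    "186 302 3 891717742",
    "196 377 1 878887116",
    "244 51 2 880606923",
    "166 346 1 886397596",
    "186 474 4 884182806",
    "186 265 2 881171488",
]


def map_user_movie(line):
    """Emit one (user_id, movie_id) pair for a whitespace-separated record."""
    user_id, movie_id, _rating, _timestamp = line.split()
    return int(user_id), int(movie_id)


pairs = [map_user_movie(line) for line in ROWS]
print(" ; ".join(f"{user}:{movie}" for user, movie in pairs))
```

Each input record yields exactly one key-value pair, with USER_ID as the key and MOVIE_ID as the value.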
Step 2 – After mapping, we have to shuffle and sort the values so that all values for the same key are grouped together.
166:346 ; 186:302,474,265 ; 196:242,377 ; 244:51
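A minimal plain-Python sketch of shuffle and sort, assuming the mapped pairs from Step 1 as input. The helper name `shuffle_and_sort` is hypothetical; a real MapReduce framework does this grouping across machines, and here the values are simply kept in arrival order.

```python
from collections import defaultdict

# (USER_ID, MOVIE_ID) pairs produced by the map step
pairs = [
    (196, 242), (186, 302), (196, 377), (244, 51),
    (166, 346), (186, 474), (186, 265),
]


def shuffle_and_sort(pairs):
    """Group values by key, then return the groups ordered by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


grouped = shuffle_and_sort(pairs)
for user, movies in grouped:
    print(user, movies)
```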
Step 3 – After completing Step 1 and Step 2, we have to reduce each key's values.
Now, put all values together
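The text does not say which reduction is applied to each key's values. As one possible reading, the sketch below assumes the reducer counts how many movies each user rated; `reduce_count` is a hypothetical helper, and the input is the grouped output of Step 2.

```python
# Grouped output of the shuffle-and-sort step: (USER_ID, [MOVIE_ID, ...])
grouped = [
    (166, [346]),
    (186, [302, 474, 265]),
    (196, [242, 377]),
    (244, [51]),
]


def reduce_count(key, values):
    """Assumed reduction: collapse a key's values into their count."""
    return key, len(values)


reduced = [reduce_count(user, movies) for user, movies in grouped]
for user, count in reduced:
    print(user, count)
```

Any other aggregate (a sum, a maximum, a joined list) would slot into the same place; only the body of the reducer changes.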
CODE FOR MAPPER AND REDUCER TOGETHER:
Python3
from mrjob.job import MRJob
from mrjob.step import MRStep


class RatingsBreak(MRJob):
    def steps(self):
        return [
            MRStep(mapper=self.mapper_get_ratings,
                   reducer=self.reducer_count_ratings)
        ]

    # MAPPER CODE: emit (rating, 1) for every tab-separated input line
    def mapper_get_ratings(self, _, line):
        (user_id, movie_id, rating, timestamp) = line.split('\t')
        yield rating, 1

    # REDUCER CODE: add up the 1s to count how often each rating occurs
    def reducer_count_ratings(self, key, values):
        yield key, sum(values)


if __name__ == '__main__':
    RatingsBreak.run()
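Note that this job keys on RATING rather than USER_ID, so it counts how many times each rating value occurs. As a rough check that needs neither Hadoop nor mrjob, the same map and reduce logic can be simulated on the seven sample rows. The names `SAMPLE_LINES` and `rating_counts` are hypothetical, and the rows are rebuilt here as tab-separated lines.

```python
from collections import Counter

# The seven sample records, joined with tabs as the mapper expects
SAMPLE_LINES = [
    "\t".join(fields) for fields in [
        ("196", "242", "3", "881250949"),
        ("186", "302", "3", "891717742"),
        ("196", "377", "1", "878887116"),
        ("244", "51", "2", "880606923"),
        ("166", "346", "1", "886397596"),
        ("186", "474", "4", "884182806"),
        ("186", "265", "2", "881171488"),
    ]
]

rating_counts = Counter()
for line in SAMPLE_LINES:
    # Map: split the record and emit (rating, 1)
    _user_id, _movie_id, rating, _timestamp = line.split("\t")
    # Reduce: sum the 1s per rating key
    rating_counts[rating] += 1

for rating, count in sorted(rating_counts.items()):
    print(rating, count)
```

For the sample data, ratings 1, 2 and 3 each occur twice and rating 4 occurs once.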