MapReduce

1. MapReduce fundamental concepts

1.1 Mapper

Mapper: extracts the data we care about from each input record and organizes it into key/value pairs.
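Below is a minimal sketch of a mapper in plain Python. It assumes two things that are not in these notes: the input is tab-separated lines of `userID, movieID, rating, timestamp`, and the question being asked is "how many movies did each user rate?". The mapper keeps the user ID as the key and the movie ID as the value, and discards the fields it does not need. The function name and sample lines are hypothetical.

```python
from typing import Iterator, Tuple


def mapper(line: str) -> Iterator[Tuple[str, str]]:
    """Hypothetical mapper: emit (userID, movieID) for one input line.

    Assumes each line is tab-separated: userID, movieID, rating, timestamp.
    The rating and timestamp are discarded because this question ignores them.
    """
    user_id, movie_id, _rating, _timestamp = line.split("\t")
    yield user_id, movie_id


# Made-up sample input, not real data.
lines = [
    "196\t242\t3\t881250949",
    "186\t302\t3\t891717742",
    "196\t377\t1\t878887116",
]
pairs = [pair for line in lines for pair in mapper(line)]
print(pairs)  # [('196', '242'), ('186', '302'), ('196', '377')]
```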

1.2 Shuffle and Sort
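The following local sketch shows what shuffle and sort produces. It assumes the mapper emitted `(userID, movieID)` string pairs, as in a hypothetical "movies per user" question. All values for the same key are collected together, and the groups come out in sorted key order. On a real cluster the framework does this across machines. The function name and data here are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def shuffle_and_sort(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, List[str]]]:
    """Group mapper output by key and return the groups in sorted key order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key, value in pairs:
        # Values within a group keep their arrival order here;
        # Hadoop does not guarantee any particular order for them.
        groups[key].append(value)
    # Keys are strings, so they sort as text ("186" < "196" < "22").
    return sorted(groups.items())


# Made-up mapper output of (userID, movieID) pairs.
pairs = [("196", "242"), ("186", "302"), ("196", "377"), ("22", "50")]
print(shuffle_and_sort(pairs))
# [('186', ['302']), ('196', ['242', '377']), ('22', ['50'])]
```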
1.3 Reducer
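Continuing the same hypothetical "movies per user" question, the sketch below shows a reducer. It receives one key together with all of that key's values after shuffle and sort, and it emits the user ID along with the number of movies that user rated. The names and input are illustrative assumptions.

```python
from typing import Iterable, Iterator, Tuple


def reducer(user_id: str, movie_ids: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Hypothetical reducer: count how many movies one user rated."""
    # Works for any iterable of values, not just lists.
    yield user_id, sum(1 for _ in movie_ids)


# Made-up shuffled-and-sorted input: each key paired with its list of values.
grouped = [("186", ["302"]), ("196", ["242", "377"]), ("22", ["50"])]
results = [out for key, values in grouped for out in reducer(key, values)]
print(results)  # [('186', 1), ('196', 2), ('22', 1)]
```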

2. How MapReduce distributes processing

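Here is a rough local sketch of how the work gets split up, under simplified assumptions. Each mapper "node" handles its own split of the input. Every (key, value) pair is then routed to a reducer chosen by hashing the key modulo the number of reducers, so all values for one key end up on the same reducer. Hadoop's default partitioner also hashes the key. This sketch uses `zlib.crc32` as a stand-in because Python's built-in `hash()` for strings can change between runs. The helper names are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple


def partition(key: str, num_reducers: int) -> int:
    """Pick a reducer for a key (stand-in for Hadoop's hash partitioning)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def distribute(
    mapper_outputs: List[List[Tuple[str, str]]], num_reducers: int
) -> List[Dict[str, List[str]]]:
    """Route every mapper's (key, value) pairs into per-reducer buckets."""
    buckets: List[Dict[str, List[str]]] = [defaultdict(list) for _ in range(num_reducers)]
    for output in mapper_outputs:  # one list per mapper node
        for key, value in output:
            buckets[partition(key, num_reducers)][key].append(value)
    return buckets


# Two hypothetical mapper nodes, each having processed a different input split.
node_a = [("196", "242"), ("186", "302")]
node_b = [("196", "377"), ("22", "50")]
buckets = distribute([node_a, node_b], num_reducers=2)
for i, bucket in enumerate(buckets):
    print(f"reducer {i}: {dict(bucket)}")
```

Which reducer a given key lands on depends on the hash. What matters is that the choice is deterministic, so key `"196"` from both nodes meets in one bucket.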

3. MapReduce: a real example

Sometimes it is hard to force a problem into this map-and-reduce way of thinking. That is a big reason why other frameworks such as Spark or Hive, and other ways of running SQL-style queries, have become more popular than writing raw MapReduce code.

Still, if a problem can be expressed easily in terms of mapping and reducing, MapReduce can sometimes be the most efficient way to solve it.

The mapper's output is then passed to the MapReduce framework, which performs the shuffle and sort for us. After that, we only have to write the reducer.
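To make this flow concrete, here is a small local simulation in plain Python. It assumes a hypothetical task: counting how many times each rating value appears in tab-separated `userID, movieID, rating, timestamp` lines. The function names are made up, and the explicit grouping step stands in for the shuffle and sort that the framework performs on a real cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper_get_ratings(line: str) -> Iterator[Tuple[str, int]]:
    # Assumed input format: userID \t movieID \t rating \t timestamp
    _user_id, _movie_id, rating, _timestamp = line.split("\t")
    yield rating, 1


def reducer_count_ratings(rating: str, ones: Iterable[int]) -> Iterator[Tuple[str, int]]:
    # All the 1s emitted for this rating value arrive together.
    yield rating, sum(ones)


def run_locally(lines: Iterable[str]) -> List[Tuple[str, int]]:
    # Map phase: apply the mapper to every line and collect its output by key.
    grouped: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for key, value in mapper_get_ratings(line):
            grouped[key].append(value)
    # Sort keys (the framework's job on a cluster), then reduce each group.
    results: List[Tuple[str, int]] = []
    for key in sorted(grouped):
        results.extend(reducer_count_ratings(key, grouped[key]))
    return results


sample = ["1\t50\t5\t0", "2\t50\t4\t0", "3\t172\t5\t0", "4\t133\t1\t0"]  # made-up data
print(run_locally(sample))  # [('1', 1), ('4', 1), ('5', 2)]
```

Only `mapper_get_ratings` and `reducer_count_ratings` hold problem-specific logic. Everything in `run_locally` is the part a framework would take care of.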

Here's a complete Python MapReduce script.

This is a complete mrjob script in Python. It uses Hadoop Streaming to run the job across a cluster.


4. Running MapReduce with mrjob

First, we need to install a few things.

Then we run our MapReduce job on our Hadoop installation.

https://www.udemy.com/course/the-ultimate-hands-on-hadoop-tame-your-big-data/learn/lecture/5963054#overview

5. Challenge Exercise

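Here is one possible approach. It assumes the exercise is to rank movies by how many ratings each received, which matches the result reported in section 6. The idea is to chain two MapReduce stages. Stage one counts ratings per movie. Stage two re-keys each movie by its count so that the sort phase orders the movies. Hadoop Streaming sorts keys as text by default, and as text `"10"` sorts before `"9"`. For that reason the count is zero-padded with `zfill`, and the padding width must cover the largest possible count. This is a local plain-Python sketch with hypothetical names and made-up data, not necessarily the course's solution.

```python
from collections import defaultdict


def run_stage(records, mapper, reducer):
    """Run one map -> shuffle/sort -> reduce stage locally."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            grouped[key].append(value)
    output = []
    for key in sorted(grouped):  # keys compared as text, like Hadoop Streaming
        output.extend(reducer(key, grouped[key]))
    return output


# Stage 1: count ratings per movie (assumed format: userID, movieID, rating, timestamp).
def mapper_get_movies(line):
    _user_id, movie_id, _rating, _timestamp = line.split("\t")
    yield movie_id, 1


def reducer_count_ratings(movie_id, ones):
    # Zero-pad so that sorting counts as text matches sorting them numerically.
    yield str(sum(ones)).zfill(5), movie_id


# Stage 2: pass records through; the sort phase orders them by padded count.
def mapper_identity(record):
    count, movie_id = record
    yield count, movie_id


def reducer_sorted_output(count, movie_ids):
    for movie_id in movie_ids:
        yield movie_id, int(count)


sample = [  # made-up ratings, not real data
    "1\t12\t5\t0", "2\t12\t4\t0", "3\t12\t3\t0",
    "1\t7\t5\t0", "2\t7\t4\t0",
    "4\t300\t1\t0",
]
counts = run_stage(sample, mapper_get_movies, reducer_count_ratings)
ranking = run_stage(counts, mapper_identity, reducer_sorted_output)
print(ranking)  # [('300', 1), ('7', 2), ('12', 3)]; the last entry is the most popular
```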

6. Check your results

Results:

movieId 50 is the most popular movie.