MapReduce
1. MapReduce fundamental concepts
1.1 Mapper
Mapper: extracts and organizes the data we care about, emitting it as key/value pairs.
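As a rough illustration of what "extract and organize" can look like, here is a minimal mapper sketch in plain Python. It assumes each input line is a tab-separated record of user ID, movie ID, rating, and timestamp; that layout, the function name `map_movie`, and the sample lines are all assumptions made for illustration, not something these notes specify.

```python
from typing import Iterator, Tuple


def map_movie(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (movie_id, 1) for one record.

    Assumed record layout (hypothetical): user_id \t movie_id \t rating \t timestamp
    """
    user_id, movie_id, rating, timestamp = line.rstrip("\n").split("\t")
    # Keep only the field we care about (the key) and attach a count of 1.
    yield movie_id, 1


# Hypothetical sample input, not real data.
sample_lines = [
    "1\t101\t5\t1000",
    "2\t101\t4\t1001",
    "3\t102\t5\t1002",
]

pairs = [pair for line in sample_lines for pair in map_movie(line)]
print(pairs)
```

The mapper throws away everything except the key it wants to group by, which is what keeps the later shuffle and reduce steps small.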
1.2 Shuffle and Sort
1.3 Reducer
2. How MapReduce distributes processing
3. MapReduce: a real example
It is not always easy to force a problem into this way of thinking, and that is a big reason why other frameworks such as Spark or Hive, and other ways of processing SQL-style queries, have become more popular than writing raw MapReduce code.
Still, if you can easily express a problem in terms of mapping and reducing, this can sometimes be the most efficient way to solve it.
The mapper's results are then passed to the MapReduce framework, which does the shuffle and sort for us, so all that remains is to write the Reducer.
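To make the hand-off between the three stages concrete, the sketch below simulates them in plain Python on a single machine. It is not how a real framework is implemented; `shuffle_and_sort` and `run` are hypothetical stand-ins for what the framework does, and the problem (counting how many times each rating value occurs in tab-separated user ID, movie ID, rating, timestamp records) is an assumed example.

```python
from itertools import groupby
from operator import itemgetter


def mapper(line):
    """Map step: emit (rating, 1) for each assumed tab-separated record."""
    user_id, movie_id, rating, timestamp = line.split("\t")
    yield rating, 1


def shuffle_and_sort(pairs):
    """Stand-in for the framework: sort by key, then group values per key."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _, value in group]


def reducer(key, values):
    """Reduce step: collapse all values for one key into a single count."""
    yield key, sum(values)


def run(lines):
    """Chain map -> shuffle/sort -> reduce over an in-memory list of lines."""
    mapped = [pair for line in lines for pair in mapper(line)]
    results = []
    for key, values in shuffle_and_sort(mapped):
        results.extend(reducer(key, values))
    return results


# Hypothetical sample input, not real data.
lines = [
    "1\t101\t5\t1000",
    "2\t101\t4\t1001",
    "3\t102\t5\t1002",
    "4\t103\t3\t1003",
]
print(run(lines))
```

Only `mapper` and `reducer` contain problem-specific logic; everything in between is generic, which is why the framework can supply it.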
Here's a complete Python MapReduce script.
This is an entire MRJob script in Python that uses Hadoop streaming to actually execute across a cluster.
4. Running MapReduce with MRJob
First, we need to install a few things.
Run our MapReduce job in our Hadoop installation.
https://www.udemy.com/course/the-ultimate-hands-on-hadoop-tame-your-big-data/learn/lecture/5963054#overview
5. Challenge Exercise
6. Check your results
Result:
movieId 50 is the most popular movie.
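One way a "most popular movie" answer can be reached is by chaining two MapReduce steps: the first counts ratings per movie, and the second re-keys on the count so that the shuffle-and-sort phase orders movies by popularity. The sketch below simulates that idea in plain Python. It assumes that popularity means the number of ratings and that records are tab-separated user ID, movie ID, rating, timestamp; `run_step` is a hypothetical stand-in for the framework, and the sample data is made up.

```python
from collections import defaultdict


def mapper_count(line):
    """Step 1 map: emit (movie_id, 1) per assumed tab-separated record."""
    user_id, movie_id, rating, timestamp = line.split("\t")
    yield movie_id, 1


def reducer_count(movie_id, ones):
    """Step 1 reduce: emit (zero-padded count, movie_id).

    Keys are compared as strings here, so the count is padded to 5 digits
    to make string order match numeric order.
    """
    yield str(sum(ones)).zfill(5), movie_id


def mapper_identity(pair):
    """Step 2 map: pass (count, movie_id) through unchanged."""
    yield pair


def reducer_sorted(count, movie_ids):
    """Step 2 reduce: emit (movie_id, count); keys arrive in sorted order."""
    for movie_id in movie_ids:
        yield movie_id, count


def run_step(records, mapper, reducer):
    """Stand-in for one framework step: map, group by key, sort keys, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


# Hypothetical sample input, not real data.
lines = [
    "1\t101\t5\t1000",
    "2\t101\t4\t1001",
    "3\t101\t3\t1002",
    "4\t102\t5\t1003",
    "5\t102\t2\t1004",
    "6\t103\t4\t1005",
]

counts = run_step(lines, mapper_count, reducer_count)
ranked = run_step(counts, mapper_identity, reducer_sorted)
print(ranked[-1])  # the last entry has the highest rating count
```

Because the output is sorted in ascending order of count, the most popular movie is the last line of the result.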