
MapReduce - Wikipedia, the free encyclopedia

Bookmark History

Saved by 20 people (-3 private), first by an anonymous user on 2006-08-17


Public Sticky notes

master node

Highlighted by pseudoking

The master node takes the input, chops it up into smaller sub-problems, and distributes those to worker nodes. (A wor

Highlighted by pseudoking
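
A minimal single-machine sketch of the master's role, assuming worker nodes are simulated by a standard-library thread pool and each sub-problem is a contiguous chunk of the input. `split_input`, `worker`, and `master` are hypothetical names chosen for illustration, not part of any real MapReduce API, and the worker's task (summing its chunk) is only a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split_input(data: List[int], num_chunks: int) -> List[List[int]]:
    """Hypothetical helper: chop the input into roughly equal contiguous sub-problems."""
    size = max(1, -(-len(data) // num_chunks))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def worker(chunk: List[int]) -> int:
    """Hypothetical worker task: here it simply sums the chunk it was given."""
    return sum(chunk)


def master(data: List[int], num_workers: int = 4) -> List[int]:
    """Split the input, hand each chunk to a (simulated) worker, collect results in order."""
    chunks = split_input(data, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(worker, chunks))


if __name__ == "__main__":
    print(master(list(range(10)), num_workers=3))
```

On a real cluster the chunks would be shipped to other machines and a failed worker's chunk re-assigned; the thread pool only mimics the fan-out and collection.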

The framework is inspired by map and reduce functions commonly used in functional programming,[2] although their purpose in the MapReduce framework is not the same as their original forms.

Highlighted by doxyer

The framework is inspired by map and reduce functions commonly used in functional programming,[2]

Highlighted by kenyth

MapReduce is a framework for computing certain kinds of distributable problems using a large number of computers (nodes), collectively referred to as a cluster.

Highlighted by doxyer

The Map and Reduce functions of MapReduce are both defined with respect to data structured in (key, value) pairs.

Highlighted by doxyer

Map(k1,v1) -> list(k2,v2)

Highlighted by doxyer

Reduce(k2, list(v2)) -> list(v2)

Highlighted by doxyer
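
The two signatures above can be written down as Python type hints. This is a minimal, single-process sketch under stated assumptions: `run_mapreduce`, `map_by_length`, and `reduce_distinct` are hypothetical names, and grouping is done with an in-memory dictionary rather than across a cluster. The example Reduce keeps its output in the v2 domain (words in, words out), as the signature requires.

```python
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

K1 = TypeVar("K1")
V1 = TypeVar("V1")
K2 = TypeVar("K2")
V2 = TypeVar("V2")


def run_mapreduce(
    inputs: Iterable[Tuple[K1, V1]],
    map_fn: Callable[[K1, V1], List[Tuple[K2, V2]]],   # Map(k1, v1) -> list(k2, v2)
    reduce_fn: Callable[[K2, List[V2]], List[V2]],     # Reduce(k2, list(v2)) -> list(v2)
) -> Dict[K2, List[V2]]:
    """Hypothetical sequential driver: map every input pair, group by k2, reduce each group."""
    groups: Dict[K2, List[V2]] = {}
    for k1, v1 in inputs:
        for k2, v2 in map_fn(k1, v1):
            groups.setdefault(k2, []).append(v2)
    return {k2: reduce_fn(k2, values) for k2, values in groups.items()}


def map_by_length(doc_name: str, text: str) -> List[Tuple[int, str]]:
    """Example Map: (document name, text) -> list of (word length, word)."""
    return [(len(word), word) for word in text.split()]


def reduce_distinct(length: int, words: List[str]) -> List[str]:
    """Example Reduce: output stays in the v2 domain (words), sorted and de-duplicated."""
    return sorted(set(words))


if __name__ == "__main__":
    docs = [("a.txt", "map reduce map"), ("b.txt", "sort key value")]
    print(run_mapreduce(docs, map_by_length, reduce_distinct))
```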

defined with respect to data structured in (key, value) pairs.

Highlighted by kenyth

Map takes one pair of data with a type in one data domain, and returns a list of pairs in a different domain:

Map(k1,v1) -> list(k2,v2)

Highlighted by kenyth
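
To make the change of domain concrete, here is a hypothetical Map function for web link-graph reversal, one of the applications listed further down: the input key is a source page and the value is its list of outgoing links, while the output keys are the link targets. The page names and the function name `reverse_links_map` are made up for illustration.

```python
from typing import List, Tuple


def reverse_links_map(source: str, outgoing: List[str]) -> List[Tuple[str, str]]:
    """Map(k1, v1) -> list(k2, v2) for link-graph reversal.

    k1 = a source page, v1 = the pages it links to;
    k2 = a target page, v2 = a page that links to it.
    """
    return [(target, source) for target in outgoing]


if __name__ == "__main__":
    pairs = reverse_links_map("example.org/a", ["example.org/b", "example.org/c"])
    for key, value in pairs:
        print(key, "<-", value)
```

A matching Reduce would then receive, for each target page, the list of all pages that link to it.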

Thus the MapReduce framework transforms a list of (key, value) pairs into a list of values. This behavior is different from the functional programming map and reduce combination, which accepts a list of arbitrary values and returns one single value that combines all the values returned by map.

Highlighted by doxyer
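
The contrast drawn above can be seen with Python's built-in `map` and `functools.reduce`: the functional pair folds everything into one value, whereas a MapReduce-style computation keeps one result per intermediate key. The first-letter grouping, the word list, and the names `map_fn` and `reduce_fn` are assumptions chosen only for illustration.

```python
from collections import defaultdict
from functools import reduce
from operator import add


def map_fn(word):
    """Hypothetical MapReduce-style Map: key = first letter, value = word length."""
    return [(word[0], len(word))]


def reduce_fn(key, lengths):
    """Hypothetical Reduce: one summed value per key, returned as a list of values."""
    return [sum(lengths)]


words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# Functional map + reduce: a list of arbitrary values in, one single value out.
total_letters = reduce(add, map(len, words))

# MapReduce style: pairs are grouped by key and Reduce runs once per key,
# so the result is a collection of values rather than a single value.
groups = defaultdict(list)
for word in words:
    for key, value in map_fn(word):
        groups[key].append(value)
per_key_letters = {key: reduce_fn(key, values) for key, values in groups.items()}

if __name__ == "__main__":
    print(total_letters)
    print(per_key_letters)
```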

The MapReduce framework collects all pairs with the same key from all lists and groups them together, thus creating one group for each one of the different generated keys.

Highlighted by kenyth
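
A minimal in-memory sketch of this grouping step (often called the shuffle), assuming the outputs of several Map calls are already available as Python lists. The function name `group_by_key` is hypothetical; a real framework performs this step across machines.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple, TypeVar

V = TypeVar("V")


def group_by_key(map_outputs: Iterable[List[Tuple[Hashable, V]]]) -> Dict[Hashable, List[V]]:
    """Collect every (key, value) pair from all map output lists into one group per distinct key."""
    groups: Dict[Hashable, List[V]] = defaultdict(list)
    for pairs in map_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return dict(groups)


if __name__ == "__main__":
    outputs = [
        [("the", 1), ("cat", 1)],   # output of one Map call
        [("the", 1), ("dog", 1)],   # output of another Map call
    ]
    print(group_by_key(outputs))
```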

The Reduce function is then applied in parallel to each group, which in turn produces a collection of values in the same domain:

Reduce(k2, list(v2)) -> list(v2)

Highlighted by kenyth
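
Because each group is independent of the others, the Reduce calls can run concurrently. The sketch below assumes the groups are already collected in a dict and uses a standard-library thread pool as a stand-in for separate reduce workers; `reduce_all` and `sum_reduce` are illustrative names, not a real framework API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, TypeVar

K = TypeVar("K")
V = TypeVar("V")


def reduce_all(groups: Dict[K, List[V]],
               reduce_fn: Callable[[K, List[V]], List[V]],
               max_workers: int = 4) -> Dict[K, List[V]]:
    """Apply reduce_fn to every (key, values) group in parallel; each call sees exactly one group."""
    keys = list(groups)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda k: reduce_fn(k, groups[k]), keys)
        return dict(zip(keys, results))


def sum_reduce(key: str, values: List[int]) -> List[int]:
    """Example Reduce whose output stays in the value domain: a one-element list with the sum."""
    return [sum(values)]


if __name__ == "__main__":
    print(reduce_all({"the": [1, 1], "cat": [1]}, sum_reduce))
```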

The canonical example application of MapReduce is a process to count the appearances of each different word in a set of documents.

Highlighted by doxyer

This behavior is different from the functional programming map and reduce combination, which accepts a list of arbitrary values and returns one single value that combines all the values returned by map.

Highlighted by kenyth

map(String name, String document):
    // key: document name
    // value: document contents
    for each word w in document:
        EmitIntermediate(w, 1);

reduce(String word, Iterator partialCounts):
    // key: a word
    // values: a list of aggregated partial counts
    int result = 0;
    for each v in partialCounts:
        result += ParseInt(v);
    Emit(result);

Highlighted by doxyer
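
The pseudocode above can be turned into runnable Python roughly as follows. This is a sketch, not Google's implementation: `EmitIntermediate`/`Emit` are modeled as generators yielding pairs, the partial counts are plain integers (so no `ParseInt` is needed), and `run_word_count` is a hypothetical single-machine driver standing in for the framework.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(name: str, document: str) -> Iterator[Tuple[str, int]]:
    # key: document name, value: document contents
    for word in document.split():
        yield word, 1            # EmitIntermediate(w, 1)


def reduce_fn(word: str, partial_counts: Iterable[int]) -> Iterator[Tuple[str, int]]:
    # key: a word, values: the partial counts collected for that word
    result = 0
    for v in partial_counts:
        result += v
    yield word, result           # Emit(result), paired here with its word


def run_word_count(documents: Dict[str, str]) -> Dict[str, int]:
    """Hypothetical driver: run Map over each document, group by word, then run Reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for name, text in documents.items():
        for word, count in map_fn(name, text):
            groups[word].append(count)
    counts: Dict[str, int] = {}
    for word, partial_counts in groups.items():
        for key, total in reduce_fn(word, partial_counts):
            counts[key] = total
    return counts


if __name__ == "__main__":
    docs = {"doc1": "the quick brown fox", "doc2": "the lazy dog the end"}
    print(run_word_count(docs))
```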

count the appearances of each different word in a set of documents

Highlighted by kenyth

The hot spots, which the application defines, are:

  • an input reader
  • a Map function
  • a partition function
  • a compare function
  • a Reduce function
  • an output writer

Highlighted by doxyer
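
One way to picture these six hot spots is as plug-in functions handed to a fixed driver. The sketch below is a single-process illustration built on assumptions: every function name is hypothetical, the partition function uses `zlib.crc32(key) % R` (a common hash-partitioning choice, picked here because Python's built-in `hash` of strings can vary between runs), and the compare function is expressed as a sort key.

```python
import io
import zlib
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def input_reader(text: str) -> Iterator[Tuple[int, str]]:
    """Input reader: turn raw input into (line number, line) pairs."""
    for number, line in enumerate(text.splitlines()):
        yield number, line


def map_fn(number: int, line: str) -> List[Tuple[str, int]]:
    """Map: emit (word, 1) for every lower-cased word on the line."""
    return [(word.lower(), 1) for word in line.split()]


def partition(key: str, num_reducers: int) -> int:
    """Partition: choose which of the R reducers receives this key."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def compare_key(key: str) -> str:
    """Compare: the order in which a reducer sees its keys (plain string order here)."""
    return key


def reduce_fn(key: str, values: List[int]) -> List[int]:
    """Reduce: sum the counts for one key."""
    return [sum(values)]


def output_writer(out: io.StringIO, key: str, values: List[int]) -> None:
    """Output writer: one tab-separated line per key."""
    out.write(f"{key}\t{values[0]}\n")


def run(text: str, num_reducers: int = 2) -> str:
    """The fixed part: read, map, partition, sort within each partition, reduce, write."""
    partitions: List[Dict[str, List[int]]] = [defaultdict(list) for _ in range(num_reducers)]
    for k1, v1 in input_reader(text):
        for k2, v2 in map_fn(k1, v1):
            partitions[partition(k2, num_reducers)][k2].append(v2)
    out = io.StringIO()
    for groups in partitions:
        for key in sorted(groups, key=compare_key):
            output_writer(out, key, reduce_fn(key, groups[key]))
    return out.getvalue()


if __name__ == "__main__":
    print(run("the cat\nThe dog"), end="")
```

Keys are sorted only within each partition, so with more than one reducer the overall output order depends on which partition each key hashes to.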

The frozen part of the MapReduce framework is a large distributed sort.

Highlighted by doxyer
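
The sort-based grouping behind that statement can be simulated on one machine: each mapper's output is sorted locally, the sorted runs are merged (here with the standard library's `heapq.merge`), and consecutive pairs with equal keys form one group (`itertools.groupby`). This is a single-process stand-in for the distributed sort, and `merge_and_group` is a hypothetical name.

```python
import heapq
from itertools import groupby
from operator import itemgetter
from typing import Iterator, List, Tuple


def merge_and_group(runs: List[List[Tuple[str, int]]]) -> Iterator[Tuple[str, List[int]]]:
    """Merge per-mapper runs that are each sorted by key, then yield (key, values) groups."""
    merged = heapq.merge(*runs, key=itemgetter(0))
    for key, pairs in groupby(merged, key=itemgetter(0)):
        yield key, [value for _, value in pairs]


if __name__ == "__main__":
    mapper_outputs = [
        [("the", 1), ("cat", 1), ("the", 1)],
        [("dog", 1), ("the", 1)],
    ]
    # Each mapper sorts its own output by key before handing it on.
    sorted_runs = [sorted(pairs, key=itemgetter(0)) for pairs in mapper_outputs]
    for key, values in merge_and_group(sorted_runs):
        print(key, sum(values))
```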

Each document is split into words, and the Map function initially counts each word with a value of "1", using the word as the result key. The framework puts together all the pairs with the same key and feeds them to the same call to Reduce, so this function only needs to sum all of its input values to find the total number of appearances of that word.

Highlighted by kenyth

MapReduce is useful in a wide range of applications, including: "distributed grep, distributed sort, web link-graph reversal, term-vector per host, web access log stats, inverted index construction, document clustering, machine learning, statistical machine translation..."

Highlighted by doxyer
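
As an illustration of one listed application, inverted index construction fits the same two-function shape: Map emits (word, document id) pairs and Reduce collects the sorted, de-duplicated ids for each word. The document ids and function names below are hypothetical, and grouping is done in memory rather than by a distributed framework.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def index_map(doc_id: str, text: str) -> List[Tuple[str, str]]:
    """Map: emit (word, doc_id) for each word occurrence."""
    return [(word.lower(), doc_id) for word in text.split()]


def index_reduce(word: str, doc_ids: List[str]) -> List[str]:
    """Reduce: the posting list for one word, without duplicates and in sorted order."""
    return sorted(set(doc_ids))


def build_index(documents: Dict[str, str]) -> Dict[str, List[str]]:
    """Hypothetical in-memory driver for the map, group-by-key, and reduce steps."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for doc_id, text in documents.items():
        for word, posting in index_map(doc_id, text):
            groups[word].append(posting)
    return {word: index_reduce(word, ids) for word, ids in groups.items()}


if __name__ == "__main__":
    docs = {"d1": "map and reduce", "d2": "Reduce the map map"}
    for word, ids in sorted(build_index(docs).items()):
        print(word, ids)
```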

The Google MapReduce framework is implemented in C++ with interfaces in Python and Java.

Highlighted by doxyer