MapReduce Design Patterns - External Source Output

2013-04-10 YoLaiYoQu
Category: HADOOP

Pattern Name

External Source Output

Category

Input and Output Patterns

Description

The external source output pattern writes data to a system outside of Hadoop and HDFS.

Intent

You want to write MapReduce output to a nonnative location.

Motivation

The pattern skips storing data in a file system entirely and sends output key/value pairs directly where they belong. MapReduce rarely hosts an application as-is, so using it to bulk load data into an external source in parallel has its uses.

In a MapReduce approach, the data is written out in parallel. As with using an external source for input, you need to be sure the destination system can handle the parallel ingest, because every map or reduce task that writes output holds its own open connection.

Applicability

 

Structure

>The OutputFormat verifies the output specification of the job configuration prior to job submission. It is also responsible for creating and initializing a RecordWriter implementation.

>The RecordWriter writes all key/value pairs to the external source. During construction of the object, establish any needed connections using the external source’s API. These connections are then used to write out all the data from each map or reduce task.
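The division of labor between the two components can be sketched in plain Java. This is only an illustration under stated assumptions: `FakeExternalStore`, `SketchOutputFormat`, and `SketchRecordWriter` are hypothetical stand-ins, not Hadoop classes. In Hadoop itself the corresponding hooks are `OutputFormat.checkOutputSpecs` and `OutputFormat.getRecordWriter`, plus `RecordWriter.write` and `RecordWriter.close`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for the external system.
// A real job would use that system's client library instead.
class FakeExternalStore {
    final Map<String, String> data = new ConcurrentHashMap<>();
    boolean reachable = true;
    int openConnections = 0;

    synchronized void connect() {
        if (!reachable) throw new IllegalStateException("external store unreachable");
        openConnections++;
    }
    synchronized void disconnect() { openConnections--; }
    void put(String key, String value) { data.put(key, value); }
}

// Plays the RecordWriter role: connects during construction, writes every
// key/value pair it receives, and releases the connection on close.
class SketchRecordWriter implements AutoCloseable {
    private final FakeExternalStore store;

    SketchRecordWriter(FakeExternalStore store) {
        this.store = store;
        store.connect();
    }
    void write(String key, String value) { store.put(key, value); }
    @Override public void close() { store.disconnect(); }
}

// Plays the OutputFormat role: validates the destination before the job
// starts and hands each task its own writer.
class SketchOutputFormat {
    private final FakeExternalStore store;

    SketchOutputFormat(FakeExternalStore store) { this.store = store; }

    // Fails fast if the destination cannot be reached.
    void checkOutputSpecs() {
        store.connect();
        store.disconnect();
    }
    SketchRecordWriter getRecordWriter() { return new SketchRecordWriter(store); }
}

class ExternalOutputSketch {
    // Simulates one map or reduce task emitting its output pairs.
    static int runTask(SketchOutputFormat format, Map<String, String> pairs) {
        try (SketchRecordWriter writer = format.getRecordWriter()) {
            for (Map.Entry<String, String> e : pairs.entrySet()) {
                writer.write(e.getKey(), e.getValue());
            }
        }
        return pairs.size();
    }

    public static void main(String[] args) {
        FakeExternalStore store = new FakeExternalStore();
        SketchOutputFormat format = new SketchOutputFormat(store);
        format.checkOutputSpecs();
        Map<String, String> pairs = new HashMap<>();
        pairs.put("user1", "42");
        int written = runTask(format, pairs);
        System.out.println(written + " pair(s) written, open connections: " + store.openConnections);
    }
}
```

Checking the destination before any task runs means an unreachable system fails the job at submission time rather than after the map and reduce work has already been done.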

Consequences

The output data has been sent to the external source and that external source has loaded it successfully.

Known uses

 

Resemblances

 

Performance analysis

From a MapReduce perspective, there isn’t much to worry about since the map and reduce phases are generic. However, you do have to be very careful that the receiver of the data can handle the parallel connections.

Examples

Writing to Redis instances
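
One way a record writer might spread its output over several Redis instances is to pick an instance from the key's hash. The sketch below assumes that scheme and is not taken from the original example: `FakeRedisInstance` and `ShardedRedisWriter` are hypothetical names, and plain in-memory maps stand in for real Redis connections.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for one Redis instance. A real writer would hold a
// Redis client connection here instead of a map.
class FakeRedisInstance {
    final String host;
    final Map<String, String> hash = new HashMap<>();

    FakeRedisInstance(String host) { this.host = host; }
}

// Hypothetical record writer that distributes pairs across instances.
class ShardedRedisWriter {
    private final List<FakeRedisInstance> instances = new ArrayList<>();

    // "Connects" to every configured instance once, when the task starts.
    ShardedRedisWriter(List<String> hosts) {
        if (hosts.isEmpty()) throw new IllegalArgumentException("no hosts configured");
        for (String host : hosts) instances.add(new FakeRedisInstance(host));
    }

    // floorMod keeps the index non-negative even for negative hash codes.
    int shardFor(String key) {
        return Math.floorMod(key.hashCode(), instances.size());
    }

    // The same key always lands on the same instance.
    void write(String key, String value) {
        instances.get(shardFor(key)).hash.put(key, value);
    }

    FakeRedisInstance instance(int index) { return instances.get(index); }
}

class RedisShardingSketch {
    public static void main(String[] args) {
        ShardedRedisWriter writer =
                new ShardedRedisWriter(Arrays.asList("redis-a", "redis-b", "redis-c"));
        writer.write("user1", "42");
        int shard = writer.shardFor("user1");
        System.out.println("user1 stored on " + writer.instance(shard).host);
    }
}
```

Because the instance is derived from the key alone, every task routes a given key to the same place without coordinating with other tasks, though each task still holds a connection to every instance.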
