| Class | Description |
|---|---|
| AvroAsTextInputFormat | An InputFormat for Avro data files, which converts each datum to string form in the input key. |
| AvroCollector<T> | A collector for map and reduce output. |
| AvroInputFormat<T> | An InputFormat for Avro data files. |
| AvroJob | Setters to configure jobs for Avro data. |
| AvroKey<T> | The wrapper of keys for jobs configured with AvroJob. |
| AvroKeyComparator<T> | The RawComparator used by jobs configured with AvroJob. |
| AvroMapper<IN,OUT> | A mapper for Avro data. |
| AvroOutputFormat<T> | An OutputFormat for Avro data files. |
| AvroRecordReader<T> | A RecordReader for Avro data files. |
| AvroReducer<K,V,OUT> | A reducer for Avro data. |
| AvroSerialization<T> | The Serialization used by jobs configured with AvroJob. |
| AvroUtf8InputFormat | An InputFormat for text files. |
| AvroValue<T> | The wrapper of values for jobs configured with AvroJob. |
| AvroWrapper<T> | The wrapper of data for jobs configured with AvroJob. |
| FsInput | Adapt an FSDataInputStream to SeekableInput. |
| Pair<K,V> | A key/value pair. |
| SequenceFileInputFormat<K,V> | An InputFormat for sequence files. |
| SequenceFileReader<K,V> | A FileReader for sequence files. |
| SequenceFileRecordReader<K,V> | A RecordReader for sequence files. |
Run Hadoop MapReduce jobs over Avro data, with map and reduce functions written in Java.
Avro data files do not contain key/value pairs as expected by Hadoop's MapReduce API, but rather just a sequence of values. Thus we provide here a layer on top of Hadoop's MapReduce API.
In all cases, input and output paths are set and jobs are submitted as with standard Hadoop jobs:
- Specify input files with FileInputFormat.setInputPaths(org.apache.hadoop.mapred.JobConf, java.lang.String).
- Specify an output directory with FileOutputFormat.setOutputPath(org.apache.hadoop.mapred.JobConf, org.apache.hadoop.fs.Path).
- Run your job with JobClient.runJob(org.apache.hadoop.mapred.JobConf).
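For example, a minimal driver skeleton might look like the following sketch; the MyAvroJobDriver class name, job name, and argument handling are illustrative, not part of this package:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class MyAvroJobDriver {                                // illustrative class name
  public static void main(String[] args) throws Exception {
    JobConf job = new JobConf(MyAvroJobDriver.class);
    job.setJobName("my-avro-job");                            // illustrative job name
    FileInputFormat.setInputPaths(job, args[0]);              // comma-separated input paths
    FileOutputFormat.setOutputPath(job, new Path(args[1]));   // output directory
    // ... schema/mapper/reducer configuration, per the recipes below ...
    JobClient.runJob(job);                                    // submit and block until complete
  }
}
```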
For jobs whose input and output are Avro data files:

- Call AvroJob.setInputSchema(org.apache.hadoop.mapred.JobConf, org.apache.avro.Schema) and AvroJob.setOutputSchema(org.apache.hadoop.mapred.JobConf, org.apache.avro.Schema) with your job's input and output schemas.
- Subclass AvroMapper and specify this as your job's mapper with AvroJob.setMapperClass(org.apache.hadoop.mapred.JobConf, java.lang.Class<? extends org.apache.avro.mapred.AvroMapper>).
- Subclass AvroReducer and specify this as your job's reducer and perhaps combiner, with AvroJob.setReducerClass(org.apache.hadoop.mapred.JobConf, java.lang.Class<? extends org.apache.avro.mapred.AvroReducer>) and AvroJob.setCombinerClass(org.apache.hadoop.mapred.JobConf, java.lang.Class<? extends org.apache.avro.mapred.AvroReducer>).
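As a concrete sketch of this all-Avro configuration, here is a minimal word count in the style of Avro's own examples. The WordCountMapper and WordCountReducer names are illustrative; the classes are static nested classes of a driver like the one above, and the wiring lines go in its main method:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroCollector;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroMapper;
import org.apache.avro.mapred.AvroReducer;
import org.apache.avro.mapred.Pair;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Reporter;

// Splits each input string into words, emitting (word, 1) pairs.
public static class WordCountMapper extends AvroMapper<Utf8, Pair<Utf8, Long>> {
  @Override
  public void map(Utf8 text, AvroCollector<Pair<Utf8, Long>> collector, Reporter reporter)
      throws IOException {
    StringTokenizer tokens = new StringTokenizer(text.toString());
    while (tokens.hasMoreTokens())
      collector.collect(new Pair<Utf8, Long>(new Utf8(tokens.nextToken()), 1L));
  }
}

// Sums the counts for each word; also usable as the combiner.
public static class WordCountReducer extends AvroReducer<Utf8, Long, Pair<Utf8, Long>> {
  @Override
  public void reduce(Utf8 word, Iterable<Long> counts,
                     AvroCollector<Pair<Utf8, Long>> collector, Reporter reporter)
      throws IOException {
    long sum = 0;
    for (long count : counts)
      sum += count;
    collector.collect(new Pair<Utf8, Long>(word, sum));
  }
}

// In the driver's main, before JobClient.runJob(job):
AvroJob.setInputSchema(job, Schema.create(Schema.Type.STRING));
AvroJob.setOutputSchema(job,
    Pair.getPairSchema(Schema.create(Schema.Type.STRING), Schema.create(Schema.Type.LONG)));
AvroJob.setMapperClass(job, WordCountMapper.class);
AvroJob.setCombinerClass(job, WordCountReducer.class);
AvroJob.setReducerClass(job, WordCountReducer.class);
```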
For jobs whose input is an Avro data file and which use an AvroMapper, but whose reducer is a non-Avro Reducer and whose output is a non-Avro format:
- Call AvroJob.setInputSchema(org.apache.hadoop.mapred.JobConf, org.apache.avro.Schema) with your job's input schema.
- Subclass AvroMapper and specify this as your job's mapper with AvroJob.setMapperClass(org.apache.hadoop.mapred.JobConf, java.lang.Class<? extends org.apache.avro.mapred.AvroMapper>).
- Implement Reducer and specify your job's reducer and combiner with JobConf.setReducerClass(java.lang.Class<? extends org.apache.hadoop.mapred.Reducer>) and JobConf.setCombinerClass(java.lang.Class<? extends org.apache.hadoop.mapred.Reducer>). The input key and value types should be AvroKey and AvroValue.
- Specify your job's output key and value types with JobConf.setOutputKeyClass(java.lang.Class<?>) and JobConf.setOutputValueClass(java.lang.Class<?>).
- Specify your job's output format with JobConf.setOutputFormat(java.lang.Class<? extends org.apache.hadoop.mapred.OutputFormat>).
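A hedged sketch of this Avro-in, non-Avro-out configuration, reusing the illustrative WordCountMapper above. The CountReducer name is made up, and the sketch assumes the intermediate (word, count) Pair schema is declared with AvroJob.setMapOutputSchema, since no Avro output schema is set here:

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroKey;
import org.apache.avro.mapred.AvroValue;
import org.apache.avro.mapred.Pair;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextOutputFormat;

// A plain Hadoop reducer (a static nested class of the driver): reads the
// AvroKey/AvroValue intermediate data, writes ordinary Text/LongWritable records.
public static class CountReducer extends MapReduceBase
    implements Reducer<AvroKey<Utf8>, AvroValue<Long>, Text, LongWritable> {
  public void reduce(AvroKey<Utf8> word, Iterator<AvroValue<Long>> counts,
                     OutputCollector<Text, LongWritable> out, Reporter reporter)
      throws IOException {
    long sum = 0;
    while (counts.hasNext())
      sum += counts.next().datum();
    out.collect(new Text(word.datum().toString()), new LongWritable(sum));
  }
}

// Driver wiring: Avro on the input and map side, plain Hadoop on the output side.
AvroJob.setInputSchema(job, Schema.create(Schema.Type.STRING));
AvroJob.setMapOutputSchema(job,            // intermediate (word, count) schema
    Pair.getPairSchema(Schema.create(Schema.Type.STRING), Schema.create(Schema.Type.LONG)));
AvroJob.setMapperClass(job, WordCountMapper.class);
job.setReducerClass(CountReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(LongWritable.class);
job.setOutputFormat(TextOutputFormat.class);
```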
For jobs whose input is a non-Avro data file and which use a non-Avro Mapper, but whose reducer is an AvroReducer and whose output is an Avro data file:
- Set your input file format with JobConf.setInputFormat(java.lang.Class<? extends org.apache.hadoop.mapred.InputFormat>).
- Implement Mapper and specify your job's mapper with JobConf.setMapperClass(java.lang.Class<? extends org.apache.hadoop.mapred.Mapper>). The output key and value type should be AvroKey and AvroValue.
- Subclass AvroReducer and specify this as your job's reducer and perhaps combiner, with AvroJob.setReducerClass(org.apache.hadoop.mapred.JobConf, java.lang.Class<? extends org.apache.avro.mapred.AvroReducer>) and AvroJob.setCombinerClass(org.apache.hadoop.mapred.JobConf, java.lang.Class<? extends org.apache.avro.mapred.AvroReducer>).
- Call AvroJob.setOutputSchema(org.apache.hadoop.mapred.JobConf, org.apache.avro.Schema) with your job's output schema.
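A sketch of the reverse configuration, under the same caveats: the LineMapper name is made up, and the AvroReducer is the illustrative WordCountReducer from the all-Avro sketch:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroKey;
import org.apache.avro.mapred.AvroValue;
import org.apache.avro.mapred.Pair;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;

// A plain Hadoop mapper (a static nested class of the driver): reads text lines,
// emits AvroKey/AvroValue pairs for consumption by an AvroReducer.
public static class LineMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, AvroKey<Utf8>, AvroValue<Long>> {
  public void map(LongWritable pos, Text line,
                  OutputCollector<AvroKey<Utf8>, AvroValue<Long>> out, Reporter reporter)
      throws IOException {
    StringTokenizer tokens = new StringTokenizer(line.toString());
    while (tokens.hasMoreTokens())
      out.collect(new AvroKey<Utf8>(new Utf8(tokens.nextToken())),
                  new AvroValue<Long>(1L));
  }
}

// Driver wiring: plain Hadoop input and map side, Avro output side.
job.setInputFormat(TextInputFormat.class);
job.setMapperClass(LineMapper.class);
AvroJob.setOutputSchema(job,
    Pair.getPairSchema(Schema.create(Schema.Type.STRING), Schema.create(Schema.Type.LONG)));
AvroJob.setReducerClass(job, WordCountReducer.class);
```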
For jobs whose input is a non-Avro data file and which use a non-Avro Mapper and no reducer, i.e., a map-only job:
- Set your input file format with JobConf.setInputFormat(java.lang.Class<? extends org.apache.hadoop.mapred.InputFormat>).
- Implement Mapper and specify your job's mapper with JobConf.setMapperClass(java.lang.Class<? extends org.apache.hadoop.mapred.Mapper>). The output key and value type should be AvroWrapper and NullWritable.
- Call JobConf.setNumReduceTasks(int) with zero.
- Call AvroJob.setOutputSchema(org.apache.hadoop.mapred.JobConf, org.apache.avro.Schema) with your job's output schema.
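A map-only sketch under the same caveats; the ToAvroMapper name is illustrative. It converts each text line into a string datum of an Avro data file:

```java
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroWrapper;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;

// A plain Hadoop mapper for a map-only job (a static nested class of the driver):
// wraps each line as an Avro datum, with NullWritable as the (ignored) value.
public static class ToAvroMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, AvroWrapper<Utf8>, NullWritable> {
  public void map(LongWritable pos, Text line,
                  OutputCollector<AvroWrapper<Utf8>, NullWritable> out, Reporter reporter)
      throws IOException {
    out.collect(new AvroWrapper<Utf8>(new Utf8(line.toString())), NullWritable.get());
  }
}

// Driver wiring:
job.setInputFormat(TextInputFormat.class);
job.setMapperClass(ToAvroMapper.class);
job.setNumReduceTasks(0);                                   // map-only: no reduce phase
AvroJob.setOutputSchema(job, Schema.create(Schema.Type.STRING));
```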