Class | Description |
---|---|
AvroAsTextInputFormat | An InputFormat for Avro data files, which converts each datum to string form in the input key. |
AvroCollector&lt;T&gt; | A collector for map and reduce output. |
AvroInputFormat&lt;T&gt; | An InputFormat for Avro data files. |
AvroJob | Setters to configure jobs for Avro data. |
AvroKey&lt;T&gt; | The wrapper of keys for jobs configured with AvroJob. |
AvroKeyComparator&lt;T&gt; | The RawComparator used by jobs configured with AvroJob. |
AvroMapper&lt;IN,OUT&gt; | A mapper for Avro data. |
AvroMultipleOutputs | Simplifies writing Avro output data to multiple outputs. |
AvroOutputFormat&lt;T&gt; | An OutputFormat for Avro data files. |
AvroRecordReader&lt;T&gt; | A RecordReader for Avro data files. |
AvroReducer&lt;K,V,OUT&gt; | A reducer for Avro data. |
AvroSerialization&lt;T&gt; | The Serialization used by jobs configured with AvroJob. |
AvroTextOutputFormat&lt;K,V&gt; | The equivalent of TextOutputFormat for writing to Avro data files with a "bytes" schema. |
AvroUtf8InputFormat | An InputFormat for text files. |
AvroValue&lt;T&gt; | The wrapper of values for jobs configured with AvroJob. |
AvroWrapper&lt;T&gt; | The wrapper of data for jobs configured with AvroJob. |
FsInput | Adapts an FSDataInputStream to SeekableInput. |
Pair&lt;K,V&gt; | A key/value pair. |
SequenceFileInputFormat&lt;K,V&gt; | An InputFormat for sequence files. |
SequenceFileReader&lt;K,V&gt; | A FileReader for sequence files. |
SequenceFileRecordReader&lt;K,V&gt; | A RecordReader for sequence files. |
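To illustrate the FsInput adapter listed above, the sketch below opens an Avro data file stored in HDFS for reading. FsInput adapts Hadoop's FSDataInputStream to Avro's SeekableInput, which DataFileReader requires; the file path is a hypothetical placeholder.

```java
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.mapred.FsInput;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class ReadAvroFromHdfs {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // FsInput adapts an FSDataInputStream to SeekableInput, so the
    // DataFileReader can seek within a file stored in HDFS.
    // The path below is an assumption for illustration.
    FsInput input = new FsInput(new Path("/data/events.avro"), conf);
    try (DataFileReader<GenericRecord> reader =
             new DataFileReader<>(input, new GenericDatumReader<GenericRecord>())) {
      for (GenericRecord record : reader) {
        System.out.println(record);
      }
    }
  }
}
```

This requires a reachable HDFS (or local) filesystem and an existing Avro data file, so it is a sketch rather than a self-verifying program.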
Avro data files do not contain key/value pairs as expected by Hadoop's MapReduce API, but rather just a sequence of values. Thus we provide here a layer on top of Hadoop's MapReduce API.
In all cases, input and output paths are set and jobs are submitted as with standard Hadoop jobs:

- Specify input files with FileInputFormat.setInputPaths(org.apache.hadoop.mapred.JobConf, java.lang.String).
- Specify an output directory with FileOutputFormat.setOutputPath(org.apache.hadoop.mapred.JobConf, org.apache.hadoop.fs.Path).
- Run your job with JobClient.runJob(org.apache.hadoop.mapred.JobConf).
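That common setup might look like the following sketch; the paths and job name are assumptions for illustration:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class SubmitJob {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(SubmitJob.class);
    conf.setJobName("avro-example"); // hypothetical job name

    // Input and output locations, as with any Hadoop job.
    FileInputFormat.setInputPaths(conf, "/data/input");     // hypothetical path
    FileOutputFormat.setOutputPath(conf, new Path("/data/output"));

    // Submit the job and block until it completes.
    JobClient.runJob(conf);
  }
}
```

Avro-specific configuration (schemas, mapper, reducer) is layered on top of this skeleton as described below.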
For jobs whose input and output are Avro data files:

- Call AvroJob.setInputSchema(org.apache.hadoop.mapred.JobConf, org.apache.avro.Schema) and AvroJob.setOutputSchema(org.apache.hadoop.mapred.JobConf, org.apache.avro.Schema) with your job's input and output schemas.
- Subclass AvroMapper and specify this as your job's mapper with AvroJob.setMapperClass(org.apache.hadoop.mapred.JobConf, java.lang.Class&lt;? extends org.apache.avro.mapred.AvroMapper&gt;).
- Subclass AvroReducer and specify this as your job's reducer and perhaps combiner, with AvroJob.setReducerClass(org.apache.hadoop.mapred.JobConf, java.lang.Class&lt;? extends org.apache.avro.mapred.AvroReducer&gt;) and AvroJob.setCombinerClass(org.apache.hadoop.mapred.JobConf, java.lang.Class&lt;? extends org.apache.avro.mapred.AvroReducer&gt;).
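Putting those steps together, an Avro-to-Avro word-count job might be configured as in this sketch. The schemas and the mapper/reducer subclasses are assumptions chosen for illustration, not part of the package itself:

```java
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroCollector;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroMapper;
import org.apache.avro.mapred.AvroReducer;
import org.apache.avro.mapred.Pair;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Reporter;

public class AvroWordCount {
  // Hypothetical mapper: emits a <word, 1> pair for each word in the line.
  public static class WordCountMapper
      extends AvroMapper<Utf8, Pair<Utf8, Long>> {
    @Override
    public void map(Utf8 line, AvroCollector<Pair<Utf8, Long>> collector,
                    Reporter reporter) throws IOException {
      for (String word : line.toString().split("\\s+")) {
        collector.collect(new Pair<Utf8, Long>(new Utf8(word), 1L));
      }
    }
  }

  // Hypothetical reducer (also usable as combiner): sums counts per word.
  public static class WordCountReducer
      extends AvroReducer<Utf8, Long, Pair<Utf8, Long>> {
    @Override
    public void reduce(Utf8 word, Iterable<Long> counts,
                       AvroCollector<Pair<Utf8, Long>> collector,
                       Reporter reporter) throws IOException {
      long sum = 0;
      for (long count : counts) sum += count;
      collector.collect(new Pair<Utf8, Long>(word, sum));
    }
  }

  public static void configure(JobConf conf) {
    // Input is a file of strings; output is word/count pairs.
    AvroJob.setInputSchema(conf, Schema.create(Schema.Type.STRING));
    AvroJob.setOutputSchema(conf,
        Pair.getPairSchema(Schema.create(Schema.Type.STRING),
                           Schema.create(Schema.Type.LONG)));
    AvroJob.setMapperClass(conf, WordCountMapper.class);
    AvroJob.setCombinerClass(conf, WordCountReducer.class);
    AvroJob.setReducerClass(conf, WordCountReducer.class);
  }
}
```

Note that AvroJob.setOutputSchema with a Pair schema is what makes the framework treat the output as key/value pairs; no Writable wrapper classes appear anywhere in the user code.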
For jobs whose input is an Avro data file and which use an AvroMapper, but whose reducer is a non-Avro Reducer and whose output is a non-Avro format:

- Call AvroJob.setInputSchema(org.apache.hadoop.mapred.JobConf, org.apache.avro.Schema) with your job's input schema.
- Subclass AvroMapper and specify this as your job's mapper with AvroJob.setMapperClass(org.apache.hadoop.mapred.JobConf, java.lang.Class&lt;? extends org.apache.avro.mapred.AvroMapper&gt;).
- Implement Reducer and specify your job's reducer and combiner with JobConf.setReducerClass(java.lang.Class&lt;? extends org.apache.hadoop.mapred.Reducer&gt;) and JobConf.setCombinerClass(java.lang.Class&lt;? extends org.apache.hadoop.mapred.Reducer&gt;). The input key and value types should be AvroKey and AvroValue.
- Specify your job's output key and value types with JobConf.setOutputKeyClass(java.lang.Class&lt;?&gt;) and JobConf.setOutputValueClass(java.lang.Class&lt;?&gt;).
- Specify your job's output format with JobConf.setOutputFormat(java.lang.Class&lt;? extends org.apache.hadoop.mapred.OutputFormat&gt;).
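One way the Avro-in, non-Avro-out combination might look is sketched below. The reducer class, schemas, and output types are assumptions for illustration; the AvroMapper subclass is omitted:

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroKey;
import org.apache.avro.mapred.AvroValue;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextOutputFormat;

public class AvroInTextOut {
  // Hypothetical non-Avro reducer: map output arrives wrapped in
  // AvroKey/AvroValue; the reduce output is plain Text/LongWritable.
  public static class CountReducer extends MapReduceBase
      implements Reducer<AvroKey<Utf8>, AvroValue<Long>, Text, LongWritable> {
    @Override
    public void reduce(AvroKey<Utf8> key, Iterator<AvroValue<Long>> values,
                       OutputCollector<Text, LongWritable> out,
                       Reporter reporter) throws IOException {
      long sum = 0;
      while (values.hasNext()) sum += values.next().datum();
      out.collect(new Text(key.datum().toString()), new LongWritable(sum));
    }
  }

  public static void configure(JobConf conf) {
    // Avro input schema; an AvroMapper subclass would also be set here.
    AvroJob.setInputSchema(conf, Schema.create(Schema.Type.STRING));
    conf.setReducerClass(CountReducer.class);
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(LongWritable.class);
    conf.setOutputFormat(TextOutputFormat.class);
  }
}
```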
For jobs whose input is a non-Avro data file and which use a non-Avro Mapper, but whose reducer is an AvroReducer and whose output is an Avro data file:
- Set your input file format with JobConf.setInputFormat(java.lang.Class&lt;? extends org.apache.hadoop.mapred.InputFormat&gt;).
- Implement Mapper and specify your job's mapper with JobConf.setMapperClass(java.lang.Class&lt;? extends org.apache.hadoop.mapred.Mapper&gt;). The output key and value types should be AvroKey and AvroValue.
- Subclass AvroReducer and specify this as your job's reducer and perhaps combiner, with AvroJob.setReducerClass(org.apache.hadoop.mapred.JobConf, java.lang.Class&lt;? extends org.apache.avro.mapred.AvroReducer&gt;) and AvroJob.setCombinerClass(org.apache.hadoop.mapred.JobConf, java.lang.Class&lt;? extends org.apache.avro.mapred.AvroReducer&gt;).
- Call AvroJob.setOutputSchema(org.apache.hadoop.mapred.JobConf, org.apache.avro.Schema) with your job's output schema.

For jobs whose input is a non-Avro data file and which use a non-Avro Mapper and no reducer, i.e., a map-only job:
- Set your input file format with JobConf.setInputFormat(java.lang.Class&lt;? extends org.apache.hadoop.mapred.InputFormat&gt;).
- Implement Mapper and specify your job's mapper with JobConf.setMapperClass(java.lang.Class&lt;? extends org.apache.hadoop.mapred.Mapper&gt;). The output key and value types should be AvroWrapper and NullWritable.
- Call JobConf.setNumReduceTasks(int) with zero.
- Call AvroJob.setOutputSchema(org.apache.hadoop.mapred.JobConf, org.apache.avro.Schema) with your job's output schema.

Copyright © 2009-2013 The Apache Software Foundation. All Rights Reserved.