| Class | Description |
|---|---|
| `AvroAsTextInputFormat` | An `InputFormat` for Avro data files, which converts each datum to string form in the input key. |
| `AvroCollector<T>` | A collector for map and reduce output. |
| `AvroInputFormat<T>` | An `InputFormat` for Avro data files. |
| `AvroJob` | Setters to configure jobs for Avro data. |
| `AvroKey<T>` | The wrapper of keys for jobs configured with `AvroJob`. |
| `AvroKeyComparator<T>` | The `RawComparator` used by jobs configured with `AvroJob`. |
| `AvroMapper<IN,OUT>` | A mapper for Avro data. |
| `AvroMultipleInputs` | Supports Avro-MapReduce jobs that have multiple input paths with a different `Schema` and `AvroMapper` for each path. |
| `AvroMultipleOutputs` | Simplifies writing Avro output data to multiple outputs. |
| `AvroOutputFormat<T>` | An `OutputFormat` for Avro data files. |
| `AvroRecordReader<T>` | A `RecordReader` for Avro data files. |
| `AvroReducer<K,V,OUT>` | A reducer for Avro data. |
| `AvroSerialization<T>` | The `Serialization` used by jobs configured with `AvroJob`. |
| `AvroTextOutputFormat<K,V>` | The equivalent of `TextOutputFormat` for writing to Avro data files with a `"bytes"` schema. |
| `AvroUtf8InputFormat` | An `InputFormat` for text files. |
| `AvroValue<T>` | The wrapper of values for jobs configured with `AvroJob`. |
| `AvroWrapper<T>` | The wrapper of data for jobs configured with `AvroJob`. |
| `FsInput` | Adapts an `FSDataInputStream` to `SeekableInput`. |
| `Pair<K,V>` | A key/value pair. |
| `SequenceFileInputFormat<K,V>` | An `InputFormat` for sequence files. |
| `SequenceFileReader<K,V>` | A `FileReader` for sequence files. |
| `SequenceFileRecordReader<K,V>` | A `RecordReader` for sequence files. |
Avro data files do not contain key/value pairs as expected by Hadoop's MapReduce API, but rather just a sequence of values. Thus we provide here a layer on top of Hadoop's MapReduce API.
In all cases, input and output paths are set and jobs are submitted as with standard Hadoop jobs:

1. Specify input files with `FileInputFormat.setInputPaths(JobConf, String)`.
2. Specify an output directory with `FileOutputFormat.setOutputPath(JobConf, Path)`.
3. Run the job with `JobClient.runJob(JobConf)`.
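The common driver skeleton therefore looks roughly like the sketch below; the class name `MyAvroJob`, the job name, and the `"in"`/`"out"` paths are placeholders:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class MyAvroJob {
  public static void main(String[] args) throws Exception {
    JobConf job = new JobConf(MyAvroJob.class);
    job.setJobName("my-avro-job");

    // Input and output locations; "in" and "out" are placeholders.
    FileInputFormat.setInputPaths(job, "in");
    FileOutputFormat.setOutputPath(job, new Path("out"));

    // ... Avro-specific configuration for one of the four cases below ...

    JobClient.runJob(job);
  }
}
```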
For jobs whose input and output are Avro data files:

1. Call `AvroJob.setInputSchema(JobConf, Schema)` and `AvroJob.setOutputSchema(JobConf, Schema)` with your job's input and output schemas.
2. Subclass `AvroMapper` and specify this as your job's mapper with `AvroJob.setMapperClass(JobConf, Class)`.
3. Subclass `AvroReducer` and specify this as your job's reducer, and perhaps combiner, with `AvroJob.setReducerClass(JobConf, Class)` and `AvroJob.setCombinerClass(JobConf, Class)`.
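As a concrete illustration, here is a minimal word-count sketch of this pure-Avro case, modeled on Avro's own word-count examples; the job name and argument handling are placeholders:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroCollector;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroMapper;
import org.apache.avro.mapred.AvroReducer;
import org.apache.avro.mapred.Pair;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Reporter;

public class AvroWordCount {

  /** Map: split each input string into words, emitting (word, 1) pairs. */
  public static class WordCountMapper
      extends AvroMapper<Utf8, Pair<Utf8, Long>> {
    @Override
    public void map(Utf8 line, AvroCollector<Pair<Utf8, Long>> collector,
                    Reporter reporter) throws IOException {
      StringTokenizer tokens = new StringTokenizer(line.toString());
      while (tokens.hasMoreTokens())
        collector.collect(new Pair<>(new Utf8(tokens.nextToken()), 1L));
    }
  }

  /** Reduce (and combine): sum the counts for each word. */
  public static class WordCountReducer
      extends AvroReducer<Utf8, Long, Pair<Utf8, Long>> {
    @Override
    public void reduce(Utf8 word, Iterable<Long> counts,
                       AvroCollector<Pair<Utf8, Long>> collector,
                       Reporter reporter) throws IOException {
      long sum = 0;
      for (long count : counts)
        sum += count;
      collector.collect(new Pair<>(word, sum));
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf job = new JobConf(AvroWordCount.class);
    job.setJobName("avro-wordcount");

    FileInputFormat.setInputPaths(job, args[0]);
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // Input: Avro file of strings; output: Avro file of (string, long) pairs.
    AvroJob.setInputSchema(job, Schema.create(Schema.Type.STRING));
    AvroJob.setOutputSchema(job,
        Pair.getPairSchema(Schema.create(Schema.Type.STRING),
                           Schema.create(Schema.Type.LONG)));

    AvroJob.setMapperClass(job, WordCountMapper.class);
    AvroJob.setCombinerClass(job, WordCountReducer.class);
    AvroJob.setReducerClass(job, WordCountReducer.class);

    JobClient.runJob(job);
  }
}
```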
For jobs whose input is an Avro data file and which use an `AvroMapper`, but whose reducer is a non-Avro `Reducer` and whose output is a non-Avro format:
1. Call `AvroJob.setInputSchema(JobConf, Schema)` with your job's input schema.
2. Subclass `AvroMapper` and specify this as your job's mapper with `AvroJob.setMapperClass(JobConf, Class)`.
3. Subclass `Reducer` and specify your job's reducer with `JobConf.setReducerClass(Class)`. The input key and value types should be `AvroKey` and `AvroValue`.
4. Optionally, subclass `Reducer` and specify your job's combiner with `JobConf.setCombinerClass(Class)`. You will be unable to re-use the same `Reducer` class as the combiner, as the combiner will need input and output key to be `AvroKey`, and input and output value to be `AvroValue`.
5. Specify your job's output key and value types with `JobConf.setOutputKeyClass(Class)` and `JobConf.setOutputValueClass(Class)`.
6. Specify your job's output format with `JobConf.setOutputFormat(Class)`.
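A minimal sketch of this hybrid case follows; the class names are placeholders. Note that it also declares the intermediate `Pair` schema with `AvroJob.setMapOutputSchema`, on the assumption that with a non-Avro output there is no Avro output schema for the map output schema to default to:

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroCollector;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroKey;
import org.apache.avro.mapred.AvroMapper;
import org.apache.avro.mapred.AvroValue;
import org.apache.avro.mapred.Pair;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextOutputFormat;

public class AvroInTextOut {

  /** Avro map side: reads Avro strings, emits (word, 1) pairs. */
  public static class TokenizeMapper
      extends AvroMapper<Utf8, Pair<Utf8, Long>> {
    @Override
    public void map(Utf8 line, AvroCollector<Pair<Utf8, Long>> collector,
                    Reporter reporter) throws IOException {
      for (String word : line.toString().split("\\s+"))
        collector.collect(new Pair<>(new Utf8(word), 1L));
    }
  }

  /** Non-Avro reduce side: keys and values arrive wrapped in AvroKey/AvroValue. */
  public static class SumReducer extends MapReduceBase
      implements Reducer<AvroKey<Utf8>, AvroValue<Long>, Text, LongWritable> {
    @Override
    public void reduce(AvroKey<Utf8> word, Iterator<AvroValue<Long>> counts,
                       OutputCollector<Text, LongWritable> out,
                       Reporter reporter) throws IOException {
      long sum = 0;
      while (counts.hasNext())
        sum += counts.next().datum();
      out.collect(new Text(word.datum().toString()), new LongWritable(sum));
    }
  }

  /** Avro-specific configuration; path setup and submission as above. */
  public static void configure(JobConf job) {
    AvroJob.setInputSchema(job, Schema.create(Schema.Type.STRING));
    // Declare the intermediate Pair schema explicitly: with a non-Avro
    // output there is no Avro output schema for it to default to.
    AvroJob.setMapOutputSchema(job,
        Pair.getPairSchema(Schema.create(Schema.Type.STRING),
                           Schema.create(Schema.Type.LONG)));
    AvroJob.setMapperClass(job, TokenizeMapper.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
    job.setOutputFormat(TextOutputFormat.class);
  }
}
```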
For jobs whose input is a non-Avro data file and which use a non-Avro `Mapper`, but whose reducer is an `AvroReducer` and whose output is an Avro data file:
1. Specify your job's input format with `JobConf.setInputFormat(Class)`.
2. Subclass `Mapper` and specify your job's mapper with `JobConf.setMapperClass(Class)`. The output key and value types should be `AvroKey` and `AvroValue`.
3. Subclass `AvroReducer` and specify this as your job's reducer, and perhaps combiner, with `AvroJob.setReducerClass(JobConf, Class)` and `AvroJob.setCombinerClass(JobConf, Class)`.
4. Call `AvroJob.setOutputSchema(JobConf, Schema)` with your job's output schema.
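A sketch of this mirror-image case, again with placeholder names. Here the output schema is itself a (string, long) `Pair` schema, and this sketch assumes the map output schema defaults to the output schema, so it is not set separately:

```java
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroCollector;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroKey;
import org.apache.avro.mapred.AvroReducer;
import org.apache.avro.mapred.AvroValue;
import org.apache.avro.mapred.Pair;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;

public class TextInAvroOut {

  /** Non-Avro map side: wraps each emitted key/value in AvroKey/AvroValue. */
  public static class TokenizeMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, AvroKey<Utf8>, AvroValue<Long>> {
    @Override
    public void map(LongWritable offset, Text line,
                    OutputCollector<AvroKey<Utf8>, AvroValue<Long>> out,
                    Reporter reporter) throws IOException {
      for (String word : line.toString().split("\\s+"))
        out.collect(new AvroKey<>(new Utf8(word)), new AvroValue<>(1L));
    }
  }

  /** Avro reduce side: sums counts and writes (word, total) pairs. */
  public static class SumReducer
      extends AvroReducer<Utf8, Long, Pair<Utf8, Long>> {
    @Override
    public void reduce(Utf8 word, Iterable<Long> counts,
                       AvroCollector<Pair<Utf8, Long>> collector,
                       Reporter reporter) throws IOException {
      long sum = 0;
      for (long count : counts)
        sum += count;
      collector.collect(new Pair<>(word, sum));
    }
  }

  /** Avro-specific configuration; path setup and submission as above. */
  public static void configure(JobConf job) {
    job.setInputFormat(TextInputFormat.class);
    job.setMapperClass(TokenizeMapper.class);
    AvroJob.setReducerClass(job, SumReducer.class);
    AvroJob.setOutputSchema(job,
        Pair.getPairSchema(Schema.create(Schema.Type.STRING),
                           Schema.create(Schema.Type.LONG)));
  }
}
```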
For jobs whose input is a non-Avro data file and which use a non-Avro `Mapper` and no reducer, i.e., a map-only job:
1. Specify your job's input format with `JobConf.setInputFormat(Class)`.
2. Subclass `Mapper` and specify your job's mapper with `JobConf.setMapperClass(Class)`. The output key and value types should be `AvroWrapper` and `NullWritable`.
3. Call `JobConf.setNumReduceTasks(int)` with zero.
4. Call `AvroJob.setOutputSchema(JobConf, Schema)` with your job's output schema.
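Finally, a minimal sketch of the map-only case, converting text lines into an Avro data file of strings; the class names are placeholders:

```java
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroWrapper;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;

public class TextToAvro {

  /** Map-only: wrap each line in an AvroWrapper; the NullWritable value is ignored. */
  public static class LineMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, AvroWrapper<Utf8>, NullWritable> {
    @Override
    public void map(LongWritable offset, Text line,
                    OutputCollector<AvroWrapper<Utf8>, NullWritable> out,
                    Reporter reporter) throws IOException {
      out.collect(new AvroWrapper<>(new Utf8(line.toString())),
                  NullWritable.get());
    }
  }

  /** Avro-specific configuration; path setup and submission as above. */
  public static void configure(JobConf job) {
    job.setInputFormat(TextInputFormat.class);
    job.setMapperClass(LineMapper.class);
    job.setNumReduceTasks(0); // map-only: map output is written directly
    AvroJob.setOutputSchema(job, Schema.create(Schema.Type.STRING));
  }
}
```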