| Class | Description |
|---|---|
| `AvroAsTextInputFormat` | An `InputFormat` for Avro data files, which converts each datum to string form in the input key. |
| `AvroCollector<T>` | A collector for map and reduce output. |
| `AvroInputFormat<T>` | An `InputFormat` for Avro data files. |
| `AvroJob` | Setters to configure jobs for Avro data. |
| `AvroKey<T>` | The wrapper of keys for jobs configured with `AvroJob`. |
| `AvroKeyComparator<T>` | The `RawComparator` used by jobs configured with `AvroJob`. |
| `AvroMapper<IN,OUT>` | A mapper for Avro data. |
| `AvroMultipleInputs` | Supports Avro-MapReduce jobs that have multiple input paths with a different `Schema` and `AvroMapper` for each path. |
| `AvroMultipleOutputs` | Simplifies writing Avro output data to multiple outputs. |
| `AvroOutputFormat<T>` | An `OutputFormat` for Avro data files. |
| `AvroRecordReader<T>` | A `RecordReader` for Avro data files. |
| `AvroReducer<K,V,OUT>` | A reducer for Avro data. |
| `AvroSerialization<T>` | The `Serialization` used by jobs configured with `AvroJob`. |
| `AvroTextOutputFormat<K,V>` | The equivalent of `TextOutputFormat` for writing to Avro data files with a `"bytes"` schema. |
| `AvroUtf8InputFormat` | An `InputFormat` for text files. |
| `AvroValue<T>` | The wrapper of values for jobs configured with `AvroJob`. |
| `AvroWrapper<T>` | The wrapper of data for jobs configured with `AvroJob`. |
| `FsInput` | Adapts an `FSDataInputStream` to `SeekableInput`. |
| `Pair<K,V>` | A key/value pair. |
| `SequenceFileInputFormat<K,V>` | An `InputFormat` for sequence files. |
| `SequenceFileReader<K,V>` | A `FileReader` for sequence files. |
| `SequenceFileRecordReader<K,V>` | A `RecordReader` for sequence files. |
Avro data files do not contain key/value pairs as expected by Hadoop's MapReduce API, but rather just a sequence of values. Thus we provide here a layer on top of Hadoop's MapReduce API.
In all cases, input and output paths are set and jobs are submitted as with standard Hadoop jobs:

1. Specify input files with `FileInputFormat.setInputPaths(JobConf, String)`.
2. Specify an output directory with `FileOutputFormat.setOutputPath(JobConf, Path)`.
3. Run the job with `JobClient.runJob(JobConf)`.
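The common driver skeleton therefore looks roughly like the sketch below; the class name `MyAvroJob`, the job name, and the `"in"`/`"out"` paths are placeholders:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class MyAvroJob {
  public static void main(String[] args) throws Exception {
    JobConf job = new JobConf(MyAvroJob.class);
    job.setJobName("my-avro-job");

    // Input and output locations; "in" and "out" are placeholders.
    FileInputFormat.setInputPaths(job, "in");
    FileOutputFormat.setOutputPath(job, new Path("out"));

    // ... Avro-specific configuration for one of the four cases below ...

    JobClient.runJob(job);
  }
}
```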
For jobs whose input and output are Avro data files:

1. Call `AvroJob.setInputSchema(JobConf, Schema)` and `AvroJob.setOutputSchema(JobConf, Schema)` with your job's input and output schemas.
2. Subclass `AvroMapper` and specify this as your job's mapper with `AvroJob.setMapperClass(JobConf, Class)`.
3. Subclass `AvroReducer` and specify this as your job's reducer, and perhaps combiner, with `AvroJob.setReducerClass(JobConf, Class)` and `AvroJob.setCombinerClass(JobConf, Class)`.
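As a concrete illustration, here is a minimal word-count sketch of this pure-Avro case, modeled on Avro's own word-count examples; the job name and argument handling are placeholders:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroCollector;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroMapper;
import org.apache.avro.mapred.AvroReducer;
import org.apache.avro.mapred.Pair;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Reporter;

public class AvroWordCount {

  /** Map: split each input string into words, emitting (word, 1) pairs. */
  public static class WordCountMapper
      extends AvroMapper<Utf8, Pair<Utf8, Long>> {
    @Override
    public void map(Utf8 line, AvroCollector<Pair<Utf8, Long>> collector,
                    Reporter reporter) throws IOException {
      StringTokenizer tokens = new StringTokenizer(line.toString());
      while (tokens.hasMoreTokens())
        collector.collect(new Pair<>(new Utf8(tokens.nextToken()), 1L));
    }
  }

  /** Reduce (and combine): sum the counts for each word. */
  public static class WordCountReducer
      extends AvroReducer<Utf8, Long, Pair<Utf8, Long>> {
    @Override
    public void reduce(Utf8 word, Iterable<Long> counts,
                       AvroCollector<Pair<Utf8, Long>> collector,
                       Reporter reporter) throws IOException {
      long sum = 0;
      for (long count : counts)
        sum += count;
      collector.collect(new Pair<>(word, sum));
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf job = new JobConf(AvroWordCount.class);
    job.setJobName("avro-wordcount");

    FileInputFormat.setInputPaths(job, args[0]);
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // Input: Avro file of strings; output: Avro file of (string, long) pairs.
    AvroJob.setInputSchema(job, Schema.create(Schema.Type.STRING));
    AvroJob.setOutputSchema(job,
        Pair.getPairSchema(Schema.create(Schema.Type.STRING),
                           Schema.create(Schema.Type.LONG)));

    AvroJob.setMapperClass(job, WordCountMapper.class);
    AvroJob.setCombinerClass(job, WordCountReducer.class);
    AvroJob.setReducerClass(job, WordCountReducer.class);

    JobClient.runJob(job);
  }
}
```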
For jobs whose input is an Avro data file and which use an `AvroMapper`, but whose reducer is a non-Avro `Reducer` and whose output is a non-Avro format:
1. Call `AvroJob.setInputSchema(JobConf, Schema)` with your job's input schema.
2. Subclass `AvroMapper` and specify this as your job's mapper with `AvroJob.setMapperClass(JobConf, Class)`.
3. Subclass `Reducer` and specify your job's reducer with `JobConf.setReducerClass(Class)`. The input key and value types should be `AvroKey` and `AvroValue`.
4. Optionally, subclass `Reducer` and specify your job's combiner with `JobConf.setCombinerClass(Class)`. You will be unable to re-use the same `Reducer` class as the combiner, as the combiner will need input and output key to be `AvroKey`, and input and output value to be `AvroValue`.
5. Specify your job's output key and value types with `JobConf.setOutputKeyClass(Class)` and `JobConf.setOutputValueClass(Class)`.
6. Specify your job's output format with `JobConf.setOutputFormat(Class)`.
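A minimal sketch of this hybrid case follows; the class names are placeholders. Note that it also declares the intermediate `Pair` schema with `AvroJob.setMapOutputSchema`, on the assumption that with a non-Avro output there is no Avro output schema for the map output schema to default to:

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroCollector;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroKey;
import org.apache.avro.mapred.AvroMapper;
import org.apache.avro.mapred.AvroValue;
import org.apache.avro.mapred.Pair;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextOutputFormat;

public class AvroInTextOut {

  /** Avro map side: reads Avro strings, emits (word, 1) pairs. */
  public static class TokenizeMapper
      extends AvroMapper<Utf8, Pair<Utf8, Long>> {
    @Override
    public void map(Utf8 line, AvroCollector<Pair<Utf8, Long>> collector,
                    Reporter reporter) throws IOException {
      for (String word : line.toString().split("\\s+"))
        collector.collect(new Pair<>(new Utf8(word), 1L));
    }
  }

  /** Non-Avro reduce side: keys and values arrive wrapped in AvroKey/AvroValue. */
  public static class SumReducer extends MapReduceBase
      implements Reducer<AvroKey<Utf8>, AvroValue<Long>, Text, LongWritable> {
    @Override
    public void reduce(AvroKey<Utf8> word, Iterator<AvroValue<Long>> counts,
                       OutputCollector<Text, LongWritable> out,
                       Reporter reporter) throws IOException {
      long sum = 0;
      while (counts.hasNext())
        sum += counts.next().datum();
      out.collect(new Text(word.datum().toString()), new LongWritable(sum));
    }
  }

  /** Avro-specific configuration; path setup and submission as above. */
  public static void configure(JobConf job) {
    AvroJob.setInputSchema(job, Schema.create(Schema.Type.STRING));
    // Declare the intermediate Pair schema explicitly: with a non-Avro
    // output there is no Avro output schema for it to default to.
    AvroJob.setMapOutputSchema(job,
        Pair.getPairSchema(Schema.create(Schema.Type.STRING),
                           Schema.create(Schema.Type.LONG)));
    AvroJob.setMapperClass(job, TokenizeMapper.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
    job.setOutputFormat(TextOutputFormat.class);
  }
}
```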
For jobs whose input is a non-Avro data file and which use a non-Avro `Mapper`, but whose reducer is an `AvroReducer` and whose output is an Avro data file:
1. Specify your job's input format with `JobConf.setInputFormat(Class)`.
2. Subclass `Mapper` and specify your job's mapper with `JobConf.setMapperClass(Class)`. The output key and value types should be `AvroKey` and `AvroValue`.
3. Subclass `AvroReducer` and specify this as your job's reducer, and perhaps combiner, with `AvroJob.setReducerClass(JobConf, Class)` and `AvroJob.setCombinerClass(JobConf, Class)`.
4. Call `AvroJob.setOutputSchema(JobConf, Schema)` with your job's output schema.
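A sketch of this mirror-image case, again with placeholder names. Here the output schema is itself a (string, long) `Pair` schema, and this sketch assumes the map output schema defaults to the output schema, so it is not set separately:

```java
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroCollector;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroKey;
import org.apache.avro.mapred.AvroReducer;
import org.apache.avro.mapred.AvroValue;
import org.apache.avro.mapred.Pair;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;

public class TextInAvroOut {

  /** Non-Avro map side: wraps each emitted key/value in AvroKey/AvroValue. */
  public static class TokenizeMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, AvroKey<Utf8>, AvroValue<Long>> {
    @Override
    public void map(LongWritable offset, Text line,
                    OutputCollector<AvroKey<Utf8>, AvroValue<Long>> out,
                    Reporter reporter) throws IOException {
      for (String word : line.toString().split("\\s+"))
        out.collect(new AvroKey<>(new Utf8(word)), new AvroValue<>(1L));
    }
  }

  /** Avro reduce side: sums counts and writes (word, total) pairs. */
  public static class SumReducer
      extends AvroReducer<Utf8, Long, Pair<Utf8, Long>> {
    @Override
    public void reduce(Utf8 word, Iterable<Long> counts,
                       AvroCollector<Pair<Utf8, Long>> collector,
                       Reporter reporter) throws IOException {
      long sum = 0;
      for (long count : counts)
        sum += count;
      collector.collect(new Pair<>(word, sum));
    }
  }

  /** Avro-specific configuration; path setup and submission as above. */
  public static void configure(JobConf job) {
    job.setInputFormat(TextInputFormat.class);
    job.setMapperClass(TokenizeMapper.class);
    AvroJob.setReducerClass(job, SumReducer.class);
    AvroJob.setOutputSchema(job,
        Pair.getPairSchema(Schema.create(Schema.Type.STRING),
                           Schema.create(Schema.Type.LONG)));
  }
}
```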
For jobs whose input is a non-Avro data file and which use a non-Avro `Mapper` and no reducer, i.e., a map-only job:
1. Specify your job's input format with `JobConf.setInputFormat(Class)`.
2. Subclass `Mapper` and specify your job's mapper with `JobConf.setMapperClass(Class)`. The output key and value types should be `AvroWrapper` and `NullWritable`.
3. Call `JobConf.setNumReduceTasks(int)` with zero.
4. Call `AvroJob.setOutputSchema(JobConf, Schema)` with your job's output schema.
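Finally, a minimal sketch of the map-only case, converting text lines into an Avro data file of strings; the class names are placeholders:

```java
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroWrapper;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;

public class TextToAvro {

  /** Map-only: wrap each line in an AvroWrapper; the NullWritable value is ignored. */
  public static class LineMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, AvroWrapper<Utf8>, NullWritable> {
    @Override
    public void map(LongWritable offset, Text line,
                    OutputCollector<AvroWrapper<Utf8>, NullWritable> out,
                    Reporter reporter) throws IOException {
      out.collect(new AvroWrapper<>(new Utf8(line.toString())),
                  NullWritable.get());
    }
  }

  /** Avro-specific configuration; path setup and submission as above. */
  public static void configure(JobConf job) {
    job.setInputFormat(TextInputFormat.class);
    job.setMapperClass(LineMapper.class);
    job.setNumReduceTasks(0); // map-only: map output is written directly
    AvroJob.setOutputSchema(job, Schema.create(Schema.Type.STRING));
  }
}
```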