Class | Description |
---|---|
AvroAsTextInputFormat | An InputFormat for Avro data files, which converts each datum to string form in the input key. |
AvroCollector<T> | A collector for map and reduce output. |
AvroInputFormat<T> | An InputFormat for Avro data files. |
AvroJob | Setters to configure jobs for Avro data. |
AvroKey<T> | The wrapper of keys for jobs configured with AvroJob. |
AvroKeyComparator<T> | The RawComparator used by jobs configured with AvroJob. |
AvroMapper<IN,OUT> | A mapper for Avro data. |
AvroMultipleInputs | This class supports Avro-MapReduce jobs that have multiple input paths with a different Schema and AvroMapper for each path. |
AvroMultipleOutputs | The AvroMultipleOutputs class simplifies writing Avro output data to multiple outputs. |
AvroOutputFormat<T> | An OutputFormat for Avro data files. |
AvroRecordReader<T> | A RecordReader for Avro data files. |
AvroReducer<K,V,OUT> | A reducer for Avro data. |
AvroSerialization<T> | The Serialization used by jobs configured with AvroJob. |
AvroTextOutputFormat<K,V> | The equivalent of TextOutputFormat for writing to Avro Data Files with a "bytes" schema. |
AvroUtf8InputFormat | An InputFormat for text files. |
AvroValue<T> | The wrapper of values for jobs configured with AvroJob. |
AvroWrapper<T> | The wrapper of data for jobs configured with AvroJob. |
FsInput | Adapt an FSDataInputStream to SeekableInput. |
Pair<K,V> | A key/value pair. |
SequenceFileInputFormat<K,V> | An InputFormat for sequence files. |
SequenceFileReader<K,V> | A FileReader for sequence files. |
SequenceFileRecordReader<K,V> | A RecordReader for sequence files. |

Avro data files do not contain key/value pairs as expected by Hadoop's MapReduce API, but rather just a sequence of values. Thus we provide here a layer on top of Hadoop's MapReduce API.

In all cases, input and output paths are set and jobs are submitted as with standard Hadoop jobs:

- Set the input paths with FileInputFormat.setInputPaths(org.apache.hadoop.mapred.JobConf, java.lang.String).
- Set the output path with FileOutputFormat.setOutputPath(org.apache.hadoop.mapred.JobConf, org.apache.hadoop.fs.Path).
- Submit the job with JobClient.runJob(org.apache.hadoop.mapred.JobConf).
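
The three calls above can be sketched as a small helper. The class name `SubmitJob` and the method `withPaths` are hypothetical names chosen for illustration; only the Hadoop calls themselves come from the text. The sketch assumes the input argument is a comma-separated list of paths, which is the form the `String` overload of `setInputPaths` accepts.

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

/** Hypothetical helper showing how paths are set and a job is submitted. */
public class SubmitJob {

  /** Sets the input paths (comma-separated) and the output directory on a job. */
  public static JobConf withPaths(JobConf job, String commaSeparatedInputs, Path outputDir) {
    FileInputFormat.setInputPaths(job, commaSeparatedInputs);
    FileOutputFormat.setOutputPath(job, outputDir);
    return job;
  }

  public static void main(String[] args) throws IOException {
    // Assumption: args[0] holds the input paths, args[1] the output directory.
    // Schemas, mapper and reducer would be configured here before submitting.
    JobConf job = withPaths(new JobConf(), args[0], new Path(args[1]));

    // runJob submits the job and blocks until it completes.
    RunningJob running = JobClient.runJob(job);
    System.out.println("successful: " + running.isSuccessful());
  }
}
```

The Avro-specific settings described in the following sections would be applied to the same `JobConf` before `runJob` is called.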

For jobs whose input and output are Avro data files:

- Call AvroJob.setInputSchema(org.apache.hadoop.mapred.JobConf, org.apache.avro.Schema) and AvroJob.setOutputSchema(org.apache.hadoop.mapred.JobConf, org.apache.avro.Schema) with your job's input and output schemas.
- Subclass AvroMapper and specify this as your job's mapper with AvroJob.setMapperClass(org.apache.hadoop.mapred.JobConf, java.lang.Class<? extends org.apache.avro.mapred.AvroMapper>).
- Subclass AvroReducer and specify this as your job's reducer and perhaps combiner, with AvroJob.setReducerClass(org.apache.hadoop.mapred.JobConf, java.lang.Class<? extends org.apache.avro.mapred.AvroReducer>) and AvroJob.setCombinerClass(org.apache.hadoop.mapred.JobConf, java.lang.Class<? extends org.apache.avro.mapred.AvroReducer>).
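
As an illustration of the all-Avro case, here is a minimal word-count sketch. The task, the class names (`AvroWordCount`, `TokenMapper`, `SumReducer`), the `configure` method, and the choice of a `"string"` input schema with a `Pair` of string and long as output are all assumptions made for the example, not part of this package's documentation.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroCollector;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroMapper;
import org.apache.avro.mapred.AvroReducer;
import org.apache.avro.mapred.Pair;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Reporter;

/** Hypothetical job: Avro "string" data in, Avro Pair(string, long) data out. */
public class AvroWordCount {

  /** Splits each input string into words and emits (word, 1). */
  public static class TokenMapper extends AvroMapper<Utf8, Pair<Utf8, Long>> {
    @Override
    public void map(Utf8 line, AvroCollector<Pair<Utf8, Long>> collector, Reporter reporter)
        throws IOException {
      StringTokenizer tokens = new StringTokenizer(line.toString());
      while (tokens.hasMoreTokens()) {
        collector.collect(new Pair<Utf8, Long>(new Utf8(tokens.nextToken()), 1L));
      }
    }
  }

  /** Sums the counts for one word; its input and output types match, so it also serves as combiner. */
  public static class SumReducer extends AvroReducer<Utf8, Long, Pair<Utf8, Long>> {
    @Override
    public void reduce(Utf8 word, Iterable<Long> counts,
        AvroCollector<Pair<Utf8, Long>> collector, Reporter reporter) throws IOException {
      long sum = 0;
      for (long count : counts) {
        sum += count;
      }
      collector.collect(new Pair<Utf8, Long>(word, sum));
    }
  }

  /** Builds the job configuration without submitting it. */
  public static JobConf configure(String inputPaths, Path outputDir) {
    JobConf job = new JobConf();
    job.setJobName("avro-wordcount");

    // Input and output schemas (assumed for this example).
    AvroJob.setInputSchema(job, Schema.create(Schema.Type.STRING));
    AvroJob.setOutputSchema(job, Pair.getPairSchema(
        Schema.create(Schema.Type.STRING), Schema.create(Schema.Type.LONG)));

    // Avro mapper, reducer and combiner.
    AvroJob.setMapperClass(job, TokenMapper.class);
    AvroJob.setCombinerClass(job, SumReducer.class);
    AvroJob.setReducerClass(job, SumReducer.class);

    FileInputFormat.setInputPaths(job, inputPaths);
    FileOutputFormat.setOutputPath(job, outputDir);
    return job;
  }

  public static void main(String[] args) throws IOException {
    JobClient.runJob(configure(args[0], new Path(args[1])));
  }
}
```

Keeping `configure` separate from `main` is only a convenience here: it lets the configuration be inspected without submitting a job.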

For jobs whose input is an Avro data file and which use an AvroMapper, but whose reducer is a non-Avro Reducer and whose output is a non-Avro format:

- Call AvroJob.setInputSchema(org.apache.hadoop.mapred.JobConf, org.apache.avro.Schema) with your job's input schema.
- Subclass AvroMapper and specify this as your job's mapper with AvroJob.setMapperClass(org.apache.hadoop.mapred.JobConf, java.lang.Class<? extends org.apache.avro.mapred.AvroMapper>).
- Implement Reducer and specify your job's reducer with JobConf.setReducerClass(java.lang.Class<? extends org.apache.hadoop.mapred.Reducer>). The input key and value types should be AvroKey and AvroValue.
- Optionally implement Reducer and specify your job's combiner with JobConf.setCombinerClass(java.lang.Class<? extends org.apache.hadoop.mapred.Reducer>). You cannot reuse the same Reducer class as the combiner, because the combiner's input and output keys must be AvroKey and its input and output values must be AvroValue.
- Specify your job's output key and value types with JobConf.setOutputKeyClass(java.lang.Class<?>) and JobConf.setOutputValueClass(java.lang.Class<?>).
- Specify your job's output format with JobConf.setOutputFormat(java.lang.Class<? extends org.apache.hadoop.mapred.OutputFormat>).
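
A sketch of this mixed case, again using word count as an assumed task: an AvroMapper feeds a plain Hadoop Reducer that writes text output. All class and method names (`AvroToTextWordCount`, `TokenMapper`, `TextSumReducer`, `configure`) are hypothetical. Note one step that is not in the list above: the sketch also calls `AvroJob.setMapOutputSchema`, on the assumption that the shuffle needs a pair schema when no Avro output schema is set.

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroCollector;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroKey;
import org.apache.avro.mapred.AvroMapper;
import org.apache.avro.mapred.AvroValue;
import org.apache.avro.mapred.Pair;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextOutputFormat;

/** Hypothetical job: Avro "string" data in, plain text (word, count) lines out. */
public class AvroToTextWordCount {

  /** Avro mapper: emits (word, 1) for every word in the input string. */
  public static class TokenMapper extends AvroMapper<Utf8, Pair<Utf8, Long>> {
    @Override
    public void map(Utf8 line, AvroCollector<Pair<Utf8, Long>> collector, Reporter reporter)
        throws IOException {
      StringTokenizer tokens = new StringTokenizer(line.toString());
      while (tokens.hasMoreTokens()) {
        collector.collect(new Pair<Utf8, Long>(new Utf8(tokens.nextToken()), 1L));
      }
    }
  }

  /** Non-Avro reducer: consumes AvroKey/AvroValue, produces Writable output. */
  public static class TextSumReducer extends MapReduceBase
      implements Reducer<AvroKey<Utf8>, AvroValue<Long>, Text, LongWritable> {
    @Override
    public void reduce(AvroKey<Utf8> word, Iterator<AvroValue<Long>> counts,
        OutputCollector<Text, LongWritable> output, Reporter reporter) throws IOException {
      long sum = 0;
      while (counts.hasNext()) {
        sum += counts.next().datum();
      }
      output.collect(new Text(word.datum().toString()), new LongWritable(sum));
    }
  }

  /** Builds the job configuration without submitting it. */
  public static JobConf configure(String inputPaths, Path outputDir) {
    JobConf job = new JobConf();

    AvroJob.setInputSchema(job, Schema.create(Schema.Type.STRING));
    // Assumption (not listed in the steps above): declare the mapper's pair schema for the shuffle.
    AvroJob.setMapOutputSchema(job, Pair.getPairSchema(
        Schema.create(Schema.Type.STRING), Schema.create(Schema.Type.LONG)));
    AvroJob.setMapperClass(job, TokenMapper.class);

    // Non-Avro reducer, output types and output format are set directly on the JobConf.
    job.setReducerClass(TextSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
    job.setOutputFormat(TextOutputFormat.class);

    FileInputFormat.setInputPaths(job, inputPaths);
    FileOutputFormat.setOutputPath(job, outputDir);
    return job;
  }

  public static void main(String[] args) throws IOException {
    JobClient.runJob(configure(args[0], new Path(args[1])));
  }
}
```

No combiner is configured in this sketch. Adding one would require a second Reducer class whose input and output types are both AvroKey and AvroValue, as noted in the list above.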

For jobs whose input is a non-Avro data file and which use a non-Avro Mapper, but whose reducer is an AvroReducer and whose output is an Avro data file:

- Set your input file format with JobConf.setInputFormat(java.lang.Class<? extends org.apache.hadoop.mapred.InputFormat>).
- Implement Mapper and specify your job's mapper with JobConf.setMapperClass(java.lang.Class<? extends org.apache.hadoop.mapred.Mapper>). The output key and value types should be AvroKey and AvroValue.
- Subclass AvroReducer and specify this as your job's reducer and perhaps combiner, with AvroJob.setReducerClass(org.apache.hadoop.mapred.JobConf, java.lang.Class<? extends org.apache.avro.mapred.AvroReducer>) and AvroJob.setCombinerClass(org.apache.hadoop.mapred.JobConf, java.lang.Class<? extends org.apache.avro.mapred.AvroReducer>).
- Call AvroJob.setOutputSchema(org.apache.hadoop.mapred.JobConf, org.apache.avro.Schema) with your job's output schema.

For jobs whose input is a non-Avro data file and which use a non-Avro Mapper and no reducer, i.e., a map-only job:

- Set your input file format with JobConf.setInputFormat(java.lang.Class<? extends org.apache.hadoop.mapred.InputFormat>).
- Implement Mapper and specify your job's mapper with JobConf.setMapperClass(java.lang.Class<? extends org.apache.hadoop.mapred.Mapper>). The output key and value types should be AvroWrapper and NullWritable.
- Call JobConf.setNumReduceTasks(int) with zero.
- Call AvroJob.setOutputSchema(org.apache.hadoop.mapred.JobConf, org.apache.avro.Schema) with your job's output schema.

Copyright © 2009–2020 The Apache Software Foundation. All rights reserved.