Package org.apache.avro.mapred
Class AvroMultipleInputs
java.lang.Object
org.apache.avro.mapred.AvroMultipleInputs
This class supports Avro-MapReduce jobs that have multiple input paths with a different Schema and AvroMapper for each path.
Usage:
Case 1: (ReflectData based inputs)
    // Enable ReflectData usage across the job.
    AvroJob.setReflect(job);
    Schema type1Schema = ReflectData.get().getSchema(Type1Record.class);
    AvroMultipleInputs.addInputPath(job, inputPath1, Type1AvroMapper.class, type1Schema);
Where Type1AvroMapper would be implemented as
    class Type1AvroMapper extends AvroMapper<Type1Record, Pair<ComparingKeyRecord, CommonValueRecord>>
    Schema type2Schema = ReflectData.get().getSchema(Type2Record.class);
    AvroMultipleInputs.addInputPath(job, inputPath2, Type2AvroMapper.class, type2Schema);
Where Type2AvroMapper would be implemented as
    class Type2AvroMapper extends AvroMapper<Type2Record, Pair<ComparingKeyRecord, CommonValueRecord>>
Case 2: (SpecificData based inputs)
    Schema type1Schema = Type1Record.SCHEMA$;
    AvroMultipleInputs.addInputPath(job, inputPath1, Type1AvroMapper.class, type1Schema);
Where Type1AvroMapper would be implemented as
    class Type1AvroMapper extends AvroMapper<Type1Record, Pair<ComparingKeyRecord, CommonValueRecord>>
    Schema type2Schema = Type2Record.SCHEMA$;
    AvroMultipleInputs.addInputPath(job, inputPath2, Type2AvroMapper.class, type2Schema);
Where Type2AvroMapper would be implemented as
    class Type2AvroMapper extends AvroMapper<Type2Record, Pair<ComparingKeyRecord, CommonValueRecord>>
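The idea behind both cases, one mapper per input path with every mapper emitting the same output type, can be illustrated with a toy model in plain Java. This is only a conceptual sketch: it uses no Avro or Hadoop classes, the names PathDispatch, type1Mapper and type2Mapper are hypothetical, the two text layouts stand in for two input schemas, and it makes no claim about how AvroMultipleInputs is implemented internally.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Toy model only: hypothetical names, no Avro or Hadoop classes involved.
class PathDispatch {
    // Input directory -> mapper that understands the records stored under it.
    private final Map<String, Function<String, Map.Entry<String, String>>> mappersByDir =
            new LinkedHashMap<>();

    // Loosely analogous to registering one input path together with its mapper.
    void addInputPath(String dir, Function<String, Map.Entry<String, String>> mapper) {
        mappersByDir.put(dir, mapper);
    }

    // Picks the mapper registered for the directory containing the file and applies it.
    Map.Entry<String, String> map(String file, String line) {
        for (Map.Entry<String, Function<String, Map.Entry<String, String>>> e
                : mappersByDir.entrySet()) {
            if (file.startsWith(e.getKey() + "/")) {
                return e.getValue().apply(line);
            }
        }
        throw new IllegalArgumentException("No mapper registered for " + file);
    }

    // "Type 1" layout: id,name. Emits (id, name).
    static Map.Entry<String, String> type1Mapper(String line) {
        String[] fields = line.split(",");
        return new SimpleImmutableEntry<>(fields[0], fields[1]);
    }

    // "Type 2" layout: name|id. Emits (id, name), the same output shape as type 1.
    static Map.Entry<String, String> type2Mapper(String line) {
        String[] fields = line.split("\\|");
        return new SimpleImmutableEntry<>(fields[1], fields[0]);
    }

    public static void main(String[] args) {
        PathDispatch job = new PathDispatch();
        job.addInputPath("/in/type1", PathDispatch::type1Mapper);
        job.addInputPath("/in/type2", PathDispatch::type2Mapper);
        System.out.println(job.map("/in/type1/part-0", "42,alice"));
        System.out.println(job.map("/in/type2/part-0", "bob|7"));
    }
}
```

Because both toy mappers return the same key and value types, whatever consumes their output does not need to know which input path a record came from.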
Note on InputFormat: The InputFormat used will always be AvroInputFormat when using this class.
Note on collector outputs: When using this class, all mapper implementations involved must emit the same key type and value record type, as set by AvroJob.setOutputSchema(JobConf, Schema) or AvroJob.setMapOutputSchema(JobConf, Schema).
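A driver could guard against violating this constraint with a small consistency check of its own before submitting the job. The sketch below is a plain-Java illustration under stated assumptions: schemas are represented as strings rather than Schema objects, the class OutputSchemaCheck and its method are hypothetical, and nothing here implies that Avro performs such a check itself.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: schemas are plain strings and all names are hypothetical.
class OutputSchemaCheck {
    // Returns the mappers whose declared output schema differs from the
    // single output schema configured for the job.
    static List<String> mismatchedMappers(String jobOutputSchema,
                                          Map<String, String> declaredOutputs) {
        List<String> mismatched = new ArrayList<>();
        for (Map.Entry<String, String> e : declaredOutputs.entrySet()) {
            if (!e.getValue().equals(jobOutputSchema)) {
                mismatched.add(e.getKey());
            }
        }
        return mismatched;
    }

    public static void main(String[] args) {
        String jobOutput = "Pair<ComparingKeyRecord, CommonValueRecord>";
        Map<String, String> declared = new LinkedHashMap<>();
        declared.put("Type1AvroMapper", jobOutput);
        declared.put("Type2AvroMapper", "Pair<ComparingKeyRecord, OtherValueRecord>");
        // Lists the mappers that would break the "same output type" rule.
        System.out.println(mismatchedMappers(jobOutput, declared));
    }
}
```

An empty result means every mapper agrees with the job's output schema; any name in the list points at a mapper that needs to be changed before the job runs.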
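Constructor Summary
AvroMultipleInputs()
Method Summary
Modifier and Type: static void
Method: addInputPath(JobConf conf, Path path, Class<? extends AvroMapper> mapperClass, Schema inputSchema)
Description: Adds a Path, with the Schema and AvroMapper to use for it, to the list of inputs for the job.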
Constructor Details
AvroMultipleInputs
public AvroMultipleInputs()
Method Details
addInputPath
public static void addInputPath(JobConf conf, Path path, Class<? extends AvroMapper> mapperClass, Schema inputSchema)
Parameters:
conf - The configuration of the job
path - Path to be added to the list of inputs for the job
mapperClass - AvroMapper class to use for this path
inputSchema - Schema to use for this path