Class AvroMultipleInputs

java.lang.Object
org.apache.avro.mapred.AvroMultipleInputs

public class AvroMultipleInputs extends Object
This class supports Avro-MapReduce jobs that have multiple input paths with a different Schema and AvroMapper for each path.

Usage:

Case 1: (ReflectData based inputs)

 // Enable ReflectData usage across job.
 AvroJob.setReflect(job);

 Schema type1Schema = ReflectData.get().getSchema(Type1Record.class)
 AvroMultipleInputs.addInputPath(job, inputPath1, type1Schema, Type1AvroMapper.class);
 
Where Type1AvroMapper would be implemented as
  class Type1AvroMapper extends AvroMapper<Type1Record, Pair<ComparingKeyRecord, CommonValueRecord>>
 
 Schema type2Schema = ReflectData.get().getSchema(Type2Record.class)
 AvroMultipleInputs.addInputPath(job, inputPath2, type2Schema, Type2AvroMapper.class);
 
Where Type2AvroMapper would be implemented as
  class Type2AvroMapper extends AvroMapper<Type2Record, Pair<ComparingKeyRecord, CommonValueRecord>>
 

Case 2: (SpecificData based inputs)

 Schema type1Schema = Type1Record.SCHEMA$;
 AvroMultipleInputs.addInputPath(job, inputPath1, type1Schema, Type1AvroMapper.class);
 
Where Type1AvroMapper would be implemented as
  class Type1AvroMapper extends AvroMapper<Type1Record, Pair<ComparingKeyRecord, CommonValueRecord>>
 
 Schema type2Schema = Type2Record.SCHEMA$;
 AvroMultipleInputs.addInputPath(job, inputPath2, type2Schema, Type2AvroMapper.class);
 
Where Type2AvroMapper would be implemented as
  class Type2AvroMapper extends AvroMapper<Type2Record, Pair<ComparingKeyRecord, CommonValueRecord>>
 

Note on InputFormat: The InputFormat used will always be AvroInputFormat when using this class.

Note on collector outputs: When using this class, you will need to ensure that the mapper implementations involved must all emit the same Key type and Value record types, as set by AvroJob.setOutputSchema(JobConf, Schema) or AvroJob.setMapOutputSchema(JobConf, Schema).

  • Constructor Details

    • AvroMultipleInputs

      public AvroMultipleInputs()
  • Method Details

    • addInputPath

      public static void addInputPath(JobConf conf, Path path, Class<? extends AvroMapper> mapperClass, Schema inputSchema)
      Add a Path with a custom Schema and AvroMapper to the list of inputs for the map-reduce job.
      Parameters:
      conf - The configuration of the job
      path - Path to be added to the list of inputs for the job
      mapperClass - AvroMapper class to use for this path
      inputSchema - Schema to use for this path