Class AvroJob

java.lang.Object
org.apache.avro.mapreduce.AvroJob

public final class AvroJob extends Object
Utility methods for configuring jobs that work with Avro.

When using Avro data as MapReduce keys and values, data must be wrapped in a suitable AvroWrapper implementation. MapReduce keys must be wrapped in an AvroKey object, and MapReduce values must be wrapped in an AvroValue object.

Suppose you would like to write a line count mapper that reads from a text file. If, instead of using Text and IntWritable for the output key and value, you would like to use Avro data with schemas of "string" and "int", respectively, you may parameterize your mapper with AvroKey<CharSequence> and AvroValue<Integer> types. Then use the setMapOutputKeySchema() and setMapOutputValueSchema() methods to set the writer schemas for the records you will generate.
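A minimal sketch of the mapper and job setup described above, assuming Hadoop's org.apache.hadoop.mapreduce API and Avro's avro-mapred artifact are on the classpath. The class names LineCountExample and LineCountMapper and the job name are hypothetical; the mapper simply emits each input line as its key with a count of 1.

```java
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroKey;
import org.apache.avro.mapred.AvroValue;
import org.apache.avro.mapreduce.AvroJob;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class LineCountExample {

  /** Emits each input line as an Avro "string" key paired with an Avro "int" count of 1. */
  public static class LineCountMapper
      extends Mapper<LongWritable, Text, AvroKey<CharSequence>, AvroValue<Integer>> {

    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      // Wrap each datum in AvroKey/AvroValue instead of emitting Text/IntWritable directly.
      context.write(new AvroKey<CharSequence>(line.toString()), new AvroValue<Integer>(1));
    }
  }

  /** Creates a job whose map output writer schemas are "string" (key) and "int" (value). */
  public static Job configureJob() throws IOException {
    Job job = Job.getInstance();
    job.setJobName("line-count"); // hypothetical job name
    job.setMapperClass(LineCountMapper.class);
    AvroJob.setMapOutputKeySchema(job, Schema.create(Schema.Type.STRING));
    AvroJob.setMapOutputValueSchema(job, Schema.create(Schema.Type.INT));
    return job;
  }
}
```

The key type is CharSequence rather than String because Avro may hand "string" data back as org.apache.avro.util.Utf8 when it is deserialized, so downstream code should not assume a String.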

  • Field Details

    • CONF_OUTPUT_CODEC

      public static final String CONF_OUTPUT_CODEC
      The configuration key for a job's output compression codec. Its value must be one of the codec names registered in CodecFactory.
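A sketch of setting CONF_OUTPUT_CODEC on a job. It validates the name first with CodecFactory.fromString(...), which throws for names that are not registered, so a typo fails at configuration time. The class and method names are hypothetical. The FileOutputFormat.setCompressOutput call reflects an assumption that Avro's output formats consult this key only when output compression is enabled; check that against the Avro version in use.

```java
import java.io.IOException;

import org.apache.avro.file.CodecFactory;
import org.apache.avro.mapreduce.AvroJob;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class OutputCodecExample {

  /** Returns a job configured to write Avro output with the named codec, e.g. "deflate". */
  public static Job withOutputCodec(String codecName) throws IOException {
    // Throws AvroRuntimeException if codecName is not registered in CodecFactory.
    CodecFactory.fromString(codecName);

    Job job = Job.getInstance();
    // Assumption: the codec key is only honored when output compression is turned on.
    FileOutputFormat.setCompressOutput(job, true);
    job.getConfiguration().set(AvroJob.CONF_OUTPUT_CODEC, codecName);
    return job;
  }
}
```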
  • Method Details

    • setInputKeySchema

      public static void setInputKeySchema(Job job, Schema schema)
      Sets the job input key schema.
      Parameters:
      job - The job to configure.
      schema - The input key schema.
    • setInputValueSchema

      public static void setInputValueSchema(Job job, Schema schema)
      Sets the job input value schema.
      Parameters:
      job - The job to configure.
      schema - The input value schema.
    • setMapOutputKeySchema

      public static void setMapOutputKeySchema(Job job, Schema schema)
      Sets the map output key schema.
      Parameters:
      job - The job to configure.
      schema - The map output key schema.
    • setMapOutputValueSchema

      public static void setMapOutputValueSchema(Job job, Schema schema)
      Sets the map output value schema.
      Parameters:
      job - The job to configure.
      schema - The map output value schema.
    • setOutputKeySchema

      public static void setOutputKeySchema(Job job, Schema schema)
      Sets the job output key schema.
      Parameters:
      job - The job to configure.
      schema - The job output key schema.
    • setOutputValueSchema

      public static void setOutputValueSchema(Job job, Schema schema)
      Sets the job output value schema.
      Parameters:
      job - The job to configure.
      schema - The job output value schema.
    • setDataModelClass

      public static void setDataModelClass(Job job, Class<? extends GenericData> modelClass)
      Sets the job data model class.
      Parameters:
      job - The job to configure.
      modelClass - The job data model class.
    • getInputKeySchema

      public static Schema getInputKeySchema(Configuration conf)
      Gets the job input key schema.
      Parameters:
      conf - The job configuration.
      Returns:
      The job input key schema, or null if not set.
    • getInputValueSchema

      public static Schema getInputValueSchema(Configuration conf)
      Gets the job input value schema.
      Parameters:
      conf - The job configuration.
      Returns:
      The job input value schema, or null if not set.
    • getMapOutputKeySchema

      public static Schema getMapOutputKeySchema(Configuration conf)
      Gets the map output key schema.
      Parameters:
      conf - The job configuration.
      Returns:
      The map output key schema, or null if not set.
    • getMapOutputValueSchema

      public static Schema getMapOutputValueSchema(Configuration conf)
      Gets the map output value schema.
      Parameters:
      conf - The job configuration.
      Returns:
      The map output value schema, or null if not set.
    • getOutputKeySchema

      public static Schema getOutputKeySchema(Configuration conf)
      Gets the job output key schema.
      Parameters:
      conf - The job configuration.
      Returns:
      The job output key schema, or null if not set.
    • getOutputValueSchema

      public static Schema getOutputValueSchema(Configuration conf)
      Gets the job output value schema.
      Parameters:
      conf - The job configuration.
      Returns:
      The job output value schema, or null if not set.
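The setters take a Job while the getters take a Configuration, so a schema stored on a job can be read back from job.getConfiguration(), and a schema that was never set comes back as null. A sketch of that round trip, where the User record schema and the class name are hypothetical and ReflectData is chosen only to illustrate setDataModelClass.

```java
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.mapreduce.AvroJob;
import org.apache.avro.reflect.ReflectData;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SchemaRoundTripExample {

  // Hypothetical record schema used only for illustration.
  public static final Schema USER = SchemaBuilder.record("User").namespace("example")
      .fields()
      .requiredString("name")
      .requiredInt("age")
      .endRecord();

  /** Configures input key, output key/value schemas and the data model; input value is left unset. */
  public static Job configure() throws IOException {
    Job job = Job.getInstance();
    AvroJob.setInputKeySchema(job, USER);
    AvroJob.setOutputKeySchema(job, Schema.create(Schema.Type.STRING));
    AvroJob.setOutputValueSchema(job, Schema.create(Schema.Type.LONG));
    AvroJob.setDataModelClass(job, ReflectData.class);
    return job;
  }

  public static void main(String[] args) throws IOException {
    Configuration conf = configure().getConfiguration();
    // Schemas set on the Job are read back from its Configuration.
    System.out.println(AvroJob.getInputKeySchema(conf).equals(USER)); // true
    System.out.println(AvroJob.getOutputValueSchema(conf).getType() == Schema.Type.LONG); // true
    // A schema that was never set comes back as null.
    System.out.println(AvroJob.getInputValueSchema(conf)); // null
  }
}
```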