public class AvroMultipleOutputs extends Object
 Case one: writing to additional outputs other than the job default output.
 Each additional output, or named output, may be configured with its own
 Schema and OutputFormat. A named output can be a
 single file or a multi file. The latter is referred to as a multi named output,
 which is an unbounded set of files all sharing the same Schema.
 
 Case two: writing data to different files whose names are provided by the user.
 
 AvroMultipleOutputs supports counters; by default they are disabled. The
 counters group is the AvroMultipleOutputs class name. The names of
 the counters are the same as the output names. They count the number of
 records written to each output name. For multi named outputs the name of the
 counter is the concatenation of the named output, an underscore '_' and the
 multiname.
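
 The counter naming rule described above can be sketched in plain Java. This is only an illustration of the rule: the class CounterNames and its two methods are hypothetical helpers, not part of the Avro or Hadoop API. Counters are enabled for a job with AvroMultipleOutputs.setCountersEnabled(conf, true).

```java
/**
 * Hypothetical helper (not part of Avro) that mirrors the documented
 * counter naming rule for named and multi named outputs.
 */
public final class CounterNames {
    private CounterNames() {}

    /** For a single named output the counter name is the output name itself. */
    public static String forNamedOutput(String namedOutput) {
        return namedOutput;
    }

    /** For a multi named output: named output, an underscore, then the multiname. */
    public static String forMultiNamedOutput(String namedOutput, String multiName) {
        return namedOutput + "_" + multiName;
    }

    public static void main(String[] args) {
        System.out.println(forNamedOutput("avro1"));           // prints "avro1"
        System.out.println(forMultiNamedOutput("avro2", "A")); // prints "avro2_A"
    }
}
```

 A job that writes to the multi named output "avro2" with the multiname "A" would therefore be expected to report its record count under the counter "avro2_A" in the AvroMultipleOutputs counters group.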
 
 Usage pattern for job submission:
 JobConf job = new JobConf();
 FileInputFormat.setInputPaths(job, inDir);
 FileOutputFormat.setOutputPath(job, outDir);
 job.setMapperClass(MyAvroMapper.class);
 job.setReducerClass(HadoopReducer.class);
 job.set("avro.reducer", MyAvroReducer.class.getName());
 ...
 Schema schema;
 ...
 // Defines additional single output 'avro1' for the job
 AvroMultipleOutputs.addNamedOutput(job, "avro1", AvroOutputFormat.class,
   schema);
 // Defines additional multi output 'avro2' for the job; if the Schema is
 // specified as null then the default output schema is used
 AvroMultipleOutputs.addMultiNamedOutput(job, "avro2",
   AvroOutputFormat.class,
   null);
 ...
 JobClient.runJob(job);
 ...
 
 Usage in Reducer:
 public class MyAvroReducer extends
   AvroReducer<K, V, OUT> {
   private AvroMultipleOutputs amos;
   public void configure(JobConf conf) {
     ...
     amos = new AvroMultipleOutputs(conf);
   }
   public void reduce(K key, Iterable<V> values,
       AvroCollector<OUT> collector, Reporter reporter)
       throws IOException {
     ...
     amos.collect("avro1", reporter, datum);
     amos.getCollector("avro2", "A", reporter).collect(datum);
     // Creates a file named testavrofile and writes datum to it using "schema";
     // other settings, such as the output class, come from named output "avro1".
     amos.collect("avro1", reporter, schema, datum, "testavrofile");
     amos.collect("avro1", reporter, schema, datum, "testavrofile1");
     ...
   }
   public void close() throws IOException {
     amos.close();
     ...
   }
 }

| Constructor and Description |
|---|
| AvroMultipleOutputs(JobConf job): Creates and initializes multiple named outputs support. It should be instantiated in the Mapper/Reducer configure method. |

| Modifier and Type | Method and Description |
|---|---|
| static void | addMultiNamedOutput(JobConf conf, String namedOutput, Class<? extends OutputFormat> outputFormatClass, Schema schema): Adds a multi named output for the job. |
| static void | addNamedOutput(JobConf conf, String namedOutput, Class<? extends OutputFormat> outputFormatClass, Schema schema): Adds a named output for the job. |
| void | close(): Closes all the opened named outputs. |
| void | collect(String namedOutput, Reporter reporter, Object datum): Output Collector for the default schema. |
| void | collect(String namedOutput, Reporter reporter, Schema schema, Object datum): OutputCollector with custom schema. |
| void | collect(String namedOutput, Reporter reporter, Schema schema, Object datum, String baseOutputPath): OutputCollector with custom schema and file name. |
| AvroCollector | getCollector(String namedOutput, Reporter reporter): Deprecated. Use the collect(java.lang.String, org.apache.hadoop.mapred.Reporter, java.lang.Object) method for collecting output. |
| AvroCollector | getCollector(String namedOutput, String multiName, Reporter reporter): Gets the output collector for a named output. |
| static boolean | getCountersEnabled(JobConf conf): Returns if the counters for the named outputs are enabled or not. |
| static Class<? extends OutputFormat> | getNamedOutputFormatClass(JobConf conf, String namedOutput): Returns the named output OutputFormat. |
| Iterator<String> | getNamedOutputs(): Returns iterator with the defined name outputs. |
| static List<String> | getNamedOutputsList(JobConf conf): Returns list of channel names. |
| static boolean | isMultiNamedOutput(JobConf conf, String namedOutput): Returns if a named output is multiple. |
| static void | setCountersEnabled(JobConf conf, boolean enabled): Enables or disables counters for the named outputs. |
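
The parameter documentation for addNamedOutput and addMultiNamedOutput requires a named output name to be a word made of letters and numbers only, and forbids the word 'part', which is reserved for the default output. A caller could pre-check a candidate name along these lines. This is a minimal sketch, not the library's own validation: the class NamedOutputNames and the method isValid are hypothetical, and treating "letters and numbers" as ASCII letters and digits is an assumption.

```java
/**
 * Hypothetical pre-check (not part of Avro) for the documented naming rule:
 * letters and numbers only, and never the reserved word "part".
 */
public final class NamedOutputNames {
    private NamedOutputNames() {}

    /** The name reserved for the job's default output. */
    private static final String RESERVED = "part";

    public static boolean isValid(String name) {
        if (name == null || name.isEmpty() || RESERVED.equals(name)) {
            return false;
        }
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            // Assumption: only ASCII letters and digits count as valid characters.
            boolean letter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
            boolean digit = c >= '0' && c <= '9';
            if (!letter && !digit) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValid("avro1"));    // prints "true"
        System.out.println(isValid("part"));     // prints "false"
        System.out.println(isValid("avro-out")); // prints "false"
    }
}
```

Running such a check before calling addNamedOutput lets a driver reject a bad name with its own error message instead of relying on whatever the library reports.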

public AvroMultipleOutputs(JobConf job)
Creates and initializes multiple named outputs support. It should be instantiated in the Mapper/Reducer configure method.
Parameters:
 job - the job configuration object

public static List<String> getNamedOutputsList(JobConf conf)
Returns list of channel names.
Parameters:
 conf - job conf

public static boolean isMultiNamedOutput(JobConf conf, String namedOutput)
Returns if a named output is multiple.
Parameters:
 conf - job conf
 namedOutput - named output
Returns:
 true if the named output is multi, false if it is single. If the named output is not defined it returns false

public static Class<? extends OutputFormat> getNamedOutputFormatClass(JobConf conf, String namedOutput)
Returns the named output OutputFormat.
Parameters:
 conf - job conf
 namedOutput - named output

public static void addNamedOutput(JobConf conf, String namedOutput, Class<? extends OutputFormat> outputFormatClass, Schema schema)
Adds a named output for the job.
Parameters:
 conf - job conf to add the named output
 namedOutput - named output name, it has to be a word, letters and numbers only, cannot be the word 'part' as that is reserved for the default output.
 outputFormatClass - OutputFormat class.
 schema - Schema to use for this namedOutput

public static void addMultiNamedOutput(JobConf conf, String namedOutput, Class<? extends OutputFormat> outputFormatClass, Schema schema)
Adds a multi named output for the job.
Parameters:
 conf - job conf to add the named output
 namedOutput - named output name, it has to be a word, letters and numbers only, cannot be the word 'part' as that is reserved for the default output.
 outputFormatClass - OutputFormat class.
 schema - Schema to use for this namedOutput

public static void setCountersEnabled(JobConf conf, boolean enabled)
Enables or disables counters for the named outputs. By default the counters are disabled. The counters group is the AvroMultipleOutputs class name.
 
 The names of the counters are the same as the named outputs. For multi named
 outputs the name of the counter is the concatenation of the named output, an
 underscore '_' and the multiname.
Parameters:
 conf - job conf on which to enable or disable the counters.
 enabled - indicates if the counters will be enabled or not.

public static boolean getCountersEnabled(JobConf conf)
Returns if the counters for the named outputs are enabled or not. By default the counters are disabled. The counters group is the AvroMultipleOutputs class name.
 
 The names of the counters are the same as the named outputs. For multi named
 outputs the name of the counter is the concatenation of the named output, an
 underscore '_' and the multiname.
Parameters:
 conf - job conf to check.

public Iterator<String> getNamedOutputs()
Returns iterator with the defined name outputs.

public void collect(String namedOutput, Reporter reporter, Object datum) throws IOException
Output Collector for the default schema.
Parameters:
 namedOutput - the named output name
 reporter - the reporter
 datum - output data
Throws:
 IOException - thrown if output collector could not be created

public void collect(String namedOutput, Reporter reporter, Schema schema, Object datum) throws IOException
OutputCollector with custom schema.
Parameters:
 namedOutput - the named output name (this will be the output file name)
 reporter - the reporter
 schema - schema to use for this output
 datum - output data
Throws:
 IOException - thrown if output collector could not be created

public void collect(String namedOutput, Reporter reporter, Schema schema, Object datum, String baseOutputPath) throws IOException
OutputCollector with custom schema and file name.
Parameters:
 namedOutput - the named output name
 reporter - the reporter
 schema - schema to use for this output
 datum - output data
 baseOutputPath - output file name to use.
Throws:
 IOException - thrown if output collector could not be created

public AvroCollector getCollector(String namedOutput, Reporter reporter) throws IOException
Deprecated. Use the collect(java.lang.String, org.apache.hadoop.mapred.Reporter, java.lang.Object) method for collecting output.
Parameters:
 namedOutput - the named output name
 reporter - the reporter
Throws:
 IOException - thrown if output collector could not be created

public AvroCollector getCollector(String namedOutput, String multiName, Reporter reporter) throws IOException
Gets the output collector for a named output.
Parameters:
 namedOutput - the named output name
 multiName - the multiname
 reporter - the reporter
Throws:
 IOException - thrown if output collector could not be created

public void close() throws IOException
Closes all the opened named outputs. If overridden, subclasses must invoke super.close() at the end of their close().
Throws:
 IOException - thrown if any of the MultipleOutput files could not be closed properly.

Copyright © 2009–2020 The Apache Software Foundation. All rights reserved.