| Class | Description |
|---|---|
| AvroAsTextInputFormat | An InputFormat for Avro data files, which converts each datum to string form in the input key. |
| AvroCollector<T> | A collector for map and reduce output. |
| AvroInputFormat<T> | An InputFormat for Avro data files. |
| AvroJob | Setters to configure jobs for Avro data. |
| AvroKey<T> | The wrapper of keys for jobs configured with AvroJob. |
| AvroKeyComparator<T> | The RawComparator used by jobs configured with AvroJob. |
| AvroMapper<IN,OUT> | A mapper for Avro data. |
| AvroOutputFormat<T> | An OutputFormat for Avro data files. |
| AvroRecordReader<T> | A RecordReader for Avro data files. |
| AvroReducer<K,V,OUT> | A reducer for Avro data. |
| AvroSerialization<T> | The Serialization used by jobs configured with AvroJob. |
| AvroUtf8InputFormat | An InputFormat for text files. |
| AvroValue<T> | The wrapper of values for jobs configured with AvroJob. |
| AvroWrapper<T> | The wrapper of data for jobs configured with AvroJob. |
| FsInput | Adapt an FSDataInputStream to SeekableInput. |
| Pair<K,V> | A key/value pair. |
| SequenceFileInputFormat<K,V> | An InputFormat for sequence files. |
| SequenceFileReader<K,V> | A FileReader for sequence files. |
| SequenceFileRecordReader<K,V> | A RecordReader for sequence files. |
Run Hadoop MapReduce jobs over Avro data, with map and reduce functions written in Java.
Avro data files do not contain key/value pairs as expected by Hadoop's MapReduce API, but rather just a sequence of values. Thus we provide here a layer on top of Hadoop's MapReduce API.
In all cases, input and output paths are set and jobs are submitted as with standard Hadoop jobs:
- Specify input files with FileInputFormat.setInputPaths(org.apache.hadoop.mapred.JobConf, java.lang.String).
- Specify an output directory with FileOutputFormat.setOutputPath(org.apache.hadoop.mapred.JobConf, org.apache.hadoop.fs.Path).
- Run your job with JobClient.runJob(org.apache.hadoop.mapred.JobConf).
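For example, a minimal driver skeleton might look like the following sketch; the MyAvroJobDriver class name, job name, and argument handling are illustrative, not part of this package:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class MyAvroJobDriver {                                // illustrative class name
  public static void main(String[] args) throws Exception {
    JobConf job = new JobConf(MyAvroJobDriver.class);
    job.setJobName("my-avro-job");                            // illustrative job name
    FileInputFormat.setInputPaths(job, args[0]);              // comma-separated input paths
    FileOutputFormat.setOutputPath(job, new Path(args[1]));   // output directory
    // ... schema/mapper/reducer configuration, per the recipes below ...
    JobClient.runJob(job);                                    // submit and block until complete
  }
}
```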
For jobs whose input and output are Avro data files:

- Call AvroJob.setInputSchema(org.apache.hadoop.mapred.JobConf, org.apache.avro.Schema) and AvroJob.setOutputSchema(org.apache.hadoop.mapred.JobConf, org.apache.avro.Schema) with your job's input and output schemas.
- Subclass AvroMapper and specify this as your job's mapper with AvroJob.setMapperClass(org.apache.hadoop.mapred.JobConf, java.lang.Class<? extends org.apache.avro.mapred.AvroMapper>).
- Subclass AvroReducer and specify this as your job's reducer and perhaps combiner, with AvroJob.setReducerClass(org.apache.hadoop.mapred.JobConf, java.lang.Class<? extends org.apache.avro.mapred.AvroReducer>) and AvroJob.setCombinerClass(org.apache.hadoop.mapred.JobConf, java.lang.Class<? extends org.apache.avro.mapred.AvroReducer>).
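As a concrete sketch of this all-Avro configuration, here is a minimal word count in the style of Avro's own examples. The WordCountMapper and WordCountReducer names are illustrative; the classes are static nested classes of a driver like the one above, and the wiring lines go in its main method:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroCollector;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroMapper;
import org.apache.avro.mapred.AvroReducer;
import org.apache.avro.mapred.Pair;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Reporter;

// Splits each input string into words, emitting (word, 1) pairs.
public static class WordCountMapper extends AvroMapper<Utf8, Pair<Utf8, Long>> {
  @Override
  public void map(Utf8 text, AvroCollector<Pair<Utf8, Long>> collector, Reporter reporter)
      throws IOException {
    StringTokenizer tokens = new StringTokenizer(text.toString());
    while (tokens.hasMoreTokens())
      collector.collect(new Pair<Utf8, Long>(new Utf8(tokens.nextToken()), 1L));
  }
}

// Sums the counts for each word; also usable as the combiner.
public static class WordCountReducer extends AvroReducer<Utf8, Long, Pair<Utf8, Long>> {
  @Override
  public void reduce(Utf8 word, Iterable<Long> counts,
                     AvroCollector<Pair<Utf8, Long>> collector, Reporter reporter)
      throws IOException {
    long sum = 0;
    for (long count : counts)
      sum += count;
    collector.collect(new Pair<Utf8, Long>(word, sum));
  }
}

// In the driver's main, before JobClient.runJob(job):
AvroJob.setInputSchema(job, Schema.create(Schema.Type.STRING));
AvroJob.setOutputSchema(job,
    Pair.getPairSchema(Schema.create(Schema.Type.STRING), Schema.create(Schema.Type.LONG)));
AvroJob.setMapperClass(job, WordCountMapper.class);
AvroJob.setCombinerClass(job, WordCountReducer.class);
AvroJob.setReducerClass(job, WordCountReducer.class);
```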
For jobs whose input is an Avro data file and which use an AvroMapper, but whose reducer is a non-Avro Reducer and whose output is a non-Avro format:
- Call AvroJob.setInputSchema(org.apache.hadoop.mapred.JobConf, org.apache.avro.Schema) with your job's input schema.
- Subclass AvroMapper and specify this as your job's mapper with AvroJob.setMapperClass(org.apache.hadoop.mapred.JobConf, java.lang.Class<? extends org.apache.avro.mapred.AvroMapper>).
- Implement Reducer and specify your job's reducer and combiner with JobConf.setReducerClass(java.lang.Class<? extends org.apache.hadoop.mapred.Reducer>) and JobConf.setCombinerClass(java.lang.Class<? extends org.apache.hadoop.mapred.Reducer>). The input key and value types should be AvroKey and AvroValue.
- Specify your job's output key and value types with JobConf.setOutputKeyClass(java.lang.Class<?>) and JobConf.setOutputValueClass(java.lang.Class<?>).
- Specify your job's output format with JobConf.setOutputFormat(java.lang.Class<? extends org.apache.hadoop.mapred.OutputFormat>).
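A hedged sketch of this Avro-in, non-Avro-out configuration, reusing the illustrative WordCountMapper above. The CountReducer name is made up, and the sketch assumes the intermediate (word, count) Pair schema is declared with AvroJob.setMapOutputSchema, since no Avro output schema is set here:

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroKey;
import org.apache.avro.mapred.AvroValue;
import org.apache.avro.mapred.Pair;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextOutputFormat;

// A plain Hadoop reducer (a static nested class of the driver): reads the
// AvroKey/AvroValue intermediate data, writes ordinary Text/LongWritable records.
public static class CountReducer extends MapReduceBase
    implements Reducer<AvroKey<Utf8>, AvroValue<Long>, Text, LongWritable> {
  public void reduce(AvroKey<Utf8> word, Iterator<AvroValue<Long>> counts,
                     OutputCollector<Text, LongWritable> out, Reporter reporter)
      throws IOException {
    long sum = 0;
    while (counts.hasNext())
      sum += counts.next().datum();
    out.collect(new Text(word.datum().toString()), new LongWritable(sum));
  }
}

// Driver wiring: Avro on the input and map side, plain Hadoop on the output side.
AvroJob.setInputSchema(job, Schema.create(Schema.Type.STRING));
AvroJob.setMapOutputSchema(job,            // intermediate (word, count) schema
    Pair.getPairSchema(Schema.create(Schema.Type.STRING), Schema.create(Schema.Type.LONG)));
AvroJob.setMapperClass(job, WordCountMapper.class);
job.setReducerClass(CountReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(LongWritable.class);
job.setOutputFormat(TextOutputFormat.class);
```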
For jobs whose input is a non-Avro data file and which use a non-Avro Mapper, but whose reducer is an AvroReducer and whose output is an Avro data file:
- Set your input file format with JobConf.setInputFormat(java.lang.Class<? extends org.apache.hadoop.mapred.InputFormat>).
- Implement Mapper and specify your job's mapper with JobConf.setMapperClass(java.lang.Class<? extends org.apache.hadoop.mapred.Mapper>). The output key and value type should be AvroKey and AvroValue.
- Subclass AvroReducer and specify this as your job's reducer and perhaps combiner, with AvroJob.setReducerClass(org.apache.hadoop.mapred.JobConf, java.lang.Class<? extends org.apache.avro.mapred.AvroReducer>) and AvroJob.setCombinerClass(org.apache.hadoop.mapred.JobConf, java.lang.Class<? extends org.apache.avro.mapred.AvroReducer>).
- Call AvroJob.setOutputSchema(org.apache.hadoop.mapred.JobConf, org.apache.avro.Schema) with your job's output schema.
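A sketch of the reverse configuration, under the same caveats: the LineMapper name is made up, and the AvroReducer is the illustrative WordCountReducer from the all-Avro sketch:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroKey;
import org.apache.avro.mapred.AvroValue;
import org.apache.avro.mapred.Pair;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;

// A plain Hadoop mapper (a static nested class of the driver): reads text lines,
// emits AvroKey/AvroValue pairs for consumption by an AvroReducer.
public static class LineMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, AvroKey<Utf8>, AvroValue<Long>> {
  public void map(LongWritable pos, Text line,
                  OutputCollector<AvroKey<Utf8>, AvroValue<Long>> out, Reporter reporter)
      throws IOException {
    StringTokenizer tokens = new StringTokenizer(line.toString());
    while (tokens.hasMoreTokens())
      out.collect(new AvroKey<Utf8>(new Utf8(tokens.nextToken())),
                  new AvroValue<Long>(1L));
  }
}

// Driver wiring: plain Hadoop input and map side, Avro output side.
job.setInputFormat(TextInputFormat.class);
job.setMapperClass(LineMapper.class);
AvroJob.setOutputSchema(job,
    Pair.getPairSchema(Schema.create(Schema.Type.STRING), Schema.create(Schema.Type.LONG)));
AvroJob.setReducerClass(job, WordCountReducer.class);
```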
For jobs whose input is a non-Avro data file and which use a non-Avro Mapper and no reducer, i.e., a map-only job:
- Set your input file format with JobConf.setInputFormat(java.lang.Class<? extends org.apache.hadoop.mapred.InputFormat>).
- Implement Mapper and specify your job's mapper with JobConf.setMapperClass(java.lang.Class<? extends org.apache.hadoop.mapred.Mapper>). The output key and value type should be AvroWrapper and NullWritable.
- Call JobConf.setNumReduceTasks(int) with zero.
- Call AvroJob.setOutputSchema(org.apache.hadoop.mapred.JobConf, org.apache.avro.Schema) with your job's output schema.
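A map-only sketch under the same caveats; the ToAvroMapper name is illustrative. It converts each text line into a string datum of an Avro data file:

```java
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroWrapper;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;

// A plain Hadoop mapper for a map-only job (a static nested class of the driver):
// wraps each line as an Avro datum, with NullWritable as the (ignored) value.
public static class ToAvroMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, AvroWrapper<Utf8>, NullWritable> {
  public void map(LongWritable pos, Text line,
                  OutputCollector<AvroWrapper<Utf8>, NullWritable> out, Reporter reporter)
      throws IOException {
    out.collect(new AvroWrapper<Utf8>(new Utf8(line.toString())), NullWritable.get());
  }
}

// Driver wiring:
job.setInputFormat(TextInputFormat.class);
job.setMapperClass(ToAvroMapper.class);
job.setNumReduceTasks(0);                                   // map-only: no reduce phase
AvroJob.setOutputSchema(job, Schema.create(Schema.Type.STRING));
```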