Class AvroSequenceFile

java.lang.Object
org.apache.avro.hadoop.io.AvroSequenceFile

public class AvroSequenceFile extends Object
A wrapper around a Hadoop SequenceFile that also supports reading and writing Avro data.

The vanilla Hadoop SequenceFile contains a header followed by a sequence of records. A record consists of a key and a value. The key and value must either:

  • implement the Writable interface, or
  • be accepted by a Serialization registered with the SerializationFactory.

Since Avro data are Plain Old Java Objects (e.g., Integer for data with schema "int"), they do not implement Writable. Furthermore, a Serialization implementation cannot determine whether an object that implements both CharSequence and Writable should be serialized using Avro or using WritableSerialization.
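Both problems can be shown with a small, self-contained Java model. As a simplification, it assumes that serialization lookup works like Hadoop's SerializationFactory. Each registered Serialization is reduced to its `accept(Class)` check, and the first one that accepts the class wins. `Writable` here is a hypothetical stand-in for `org.apache.hadoop.io.Writable`. `TextLike` and `NaiveAvroSerialization` are hypothetical names invented for illustration. No Hadoop or Avro classes are used.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical model of serialization lookup; no Hadoop or Avro classes are used.
final class AmbiguityModel {

  /** Hypothetical stand-in for org.apache.hadoop.io.Writable. */
  interface Writable {}

  /** Hypothetical type that is both a CharSequence and a Writable. */
  static final class TextLike implements CharSequence, Writable {
    private final String s;

    TextLike(String s) { this.s = s; }

    @Override public int length() { return s.length(); }
    @Override public char charAt(int index) { return s.charAt(index); }
    @Override public CharSequence subSequence(int start, int end) { return s.subSequence(start, end); }
    @Override public String toString() { return s; }
  }

  /** Registered serializations in lookup order, each reduced to its accept(Class) check. */
  static Map<String, Predicate<Class<?>>> registry() {
    Map<String, Predicate<Class<?>>> r = new LinkedHashMap<>();
    r.put("WritableSerialization", Writable.class::isAssignableFrom);
    // A naive Avro check that accepts any CharSequence (hypothetical).
    r.put("NaiveAvroSerialization", CharSequence.class::isAssignableFrom);
    return r;
  }

  /** Names of every registered serialization whose accept check passes for c, in order. */
  static List<String> accepting(Class<?> c) {
    List<String> names = new ArrayList<>();
    for (Map.Entry<String, Predicate<Class<?>>> e : registry().entrySet()) {
      if (e.getValue().test(c)) {
        names.add(e.getKey());
      }
    }
    return names;
  }

  /** The serialization a first-match lookup would pick for c, or null if none accepts it. */
  static String firstMatch(Class<?> c) {
    List<String> names = accepting(c);
    return names.isEmpty() ? null : names.get(0);
  }
}
```

In this model, `accepting(Integer.class)` is empty. By contrast, `accepting(TextLike.class)` lists both serializations, so only the registration order decides which one handles a `TextLike`.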

The solution implemented in AvroSequenceFile is to:

  • wrap Avro key data in an AvroKey object,
  • wrap Avro value data in an AvroValue object,
  • configure and register AvroSerialization with the SerializationFactory, which will accept only objects that are instances of either AvroKey or AvroValue, and
  • store the Avro key and value schemas in the SequenceFile header.
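The wrapper approach can be sketched in the same style. `KeyWrapper` and `ValueWrapper` are hypothetical stand-ins for `AvroKey` and `AvroValue`, and `IntWritableLike` is a hypothetical Writable type. The accept checks are simplified assumptions, not the actual `AvroSerialization` code. The Avro-side check looks only at the wrapper class, and the wrappers do not implement `Writable`. As a result, at most one serialization can accept any given class.

```java
// Hypothetical model of the wrapper-based approach; no Avro or Hadoop classes are used.
final class WrapperModel {

  /** Hypothetical stand-in for org.apache.hadoop.io.Writable. */
  interface Writable {}

  /** Hypothetical Writable type, used only to exercise the Writable side. */
  static final class IntWritableLike implements Writable {}

  /** Hypothetical stand-in for AvroKey: wraps an arbitrary datum. */
  static final class KeyWrapper<T> {
    private final T datum;
    KeyWrapper(T datum) { this.datum = datum; }
    T datum() { return datum; }
  }

  /** Hypothetical stand-in for AvroValue: wraps an arbitrary datum. */
  static final class ValueWrapper<T> {
    private final T datum;
    ValueWrapper(T datum) { this.datum = datum; }
    T datum() { return datum; }
  }

  /** The Avro side accepts only the wrapper classes, never the wrapped datum's class. */
  static boolean avroAccepts(Class<?> c) {
    return KeyWrapper.class.isAssignableFrom(c) || ValueWrapper.class.isAssignableFrom(c);
  }

  /** The Writable side accepts anything implementing the Writable stand-in. */
  static boolean writableAccepts(Class<?> c) {
    return Writable.class.isAssignableFrom(c);
  }

  /** Picks a serialization for c; the two checks are disjoint, so at most one matches. */
  static String choose(Class<?> c) {
    boolean avro = avroAccepts(c);
    boolean writable = writableAccepts(c);
    if (avro && writable) {
      throw new IllegalStateException("ambiguous: " + c.getName());
    }
    if (avro) return "AvroSerialization";
    if (writable) return "WritableSerialization";
    return "none";
  }
}
```

The datum's own type no longer affects the choice. A bare Writable goes to the Writable side, and any datum wrapped in `KeyWrapper` or `ValueWrapper` goes to the Avro side.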

Field Details

    • METADATA_FIELD_KEY_SCHEMA

      public static final Text METADATA_FIELD_KEY_SCHEMA
      The SequenceFile.Metadata field for the Avro key writer schema.
    • METADATA_FIELD_VALUE_SCHEMA

      public static final Text METADATA_FIELD_VALUE_SCHEMA
      The SequenceFile.Metadata field for the Avro value writer schema.
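The two metadata fields act as keys in the file header. The following is a minimal sketch under two assumptions. First, the header metadata is modeled as a sorted map from string to string; Hadoop's `SequenceFile.Metadata` actually maps `Text` to `Text`. Second, each schema is stored as its JSON text. The constant names mirror the fields above, but the string values `avro.key.schema` and `avro.value.schema` are assumptions made for this model.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical model of SequenceFile header metadata as a sorted String-to-String map.
final class HeaderMetadataModel {
  // Field names mirror the constants above; the string values are assumed for illustration.
  static final String METADATA_FIELD_KEY_SCHEMA = "avro.key.schema";
  static final String METADATA_FIELD_VALUE_SCHEMA = "avro.value.schema";

  /** Writer side: record both writer schemas (as JSON text) in the header metadata. */
  static Map<String, String> writeHeader(String keySchemaJson, String valueSchemaJson) {
    Map<String, String> metadata = new TreeMap<>();
    metadata.put(METADATA_FIELD_KEY_SCHEMA, keySchemaJson);
    metadata.put(METADATA_FIELD_VALUE_SCHEMA, valueSchemaJson);
    return metadata;
  }

  /** Reader side: the key writer schema, if the header recorded one. */
  static Optional<String> keyWriterSchema(Map<String, String> metadata) {
    return Optional.ofNullable(metadata.get(METADATA_FIELD_KEY_SCHEMA));
  }

  /** Reader side: the value writer schema, if the header recorded one. */
  static Optional<String> valueWriterSchema(Map<String, String> metadata) {
    return Optional.ofNullable(metadata.get(METADATA_FIELD_VALUE_SCHEMA));
  }
}
```

For example, `writeHeader("\"string\"", "\"int\"")` records the primitive schemas `"string"` and `"int"`. A reader can then recover them without any out-of-band schema configuration.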

Method Details

    • createWriter

      public static SequenceFile.Writer createWriter(AvroSequenceFile.Writer.Options options) throws IOException
      Creates a writer from a set of options.

      Because SequenceFile uses a different Writer implementation for each compression type, this method constructs the subclass that matches the compression type given in the options.

      Parameters:
      options - The options for the writer.
      Returns:
      A new writer instance.
      Throws:
      IOException - If the writer cannot be created.
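The dispatch that `createWriter` performs can be illustrated with a small model. The enum constants mirror Hadoop's `SequenceFile.CompressionType` (`NONE`, `RECORD`, `BLOCK`). However, the `Options` class, its `NONE` default, and the three writer classes are hypothetical stand-ins, not the actual Avro or Hadoop types.

```java
// Hypothetical model of choosing a writer implementation per compression type.
final class WriterDispatchModel {
  /** Mirrors the constants of SequenceFile.CompressionType. */
  enum CompressionType { NONE, RECORD, BLOCK }

  /** Hypothetical base writer type returned to callers. */
  abstract static class Writer {
    abstract String describe();
  }

  static final class UncompressedWriter extends Writer {
    @Override String describe() { return "uncompressed"; }
  }

  static final class RecordCompressWriter extends Writer {
    @Override String describe() { return "record-compressed"; }
  }

  static final class BlockCompressWriter extends Writer {
    @Override String describe() { return "block-compressed"; }
  }

  /** Hypothetical options holder; NONE is an assumed default for this model. */
  static final class Options {
    private CompressionType compressionType = CompressionType.NONE;

    Options withCompressionType(CompressionType type) {
      this.compressionType = type;
      return this;
    }

    CompressionType getCompressionType() { return compressionType; }
  }

  /** Constructs the writer subclass that matches the configured compression type. */
  static Writer createWriter(Options options) {
    switch (options.getCompressionType()) {
      case NONE:
        return new UncompressedWriter();
      case RECORD:
        return new RecordCompressWriter();
      case BLOCK:
        return new BlockCompressWriter();
      default:
        throw new IllegalArgumentException("Unknown compression type: " + options.getCompressionType());
    }
  }
}
```

Keeping the choice inside one factory method lets callers work against the base `Writer` type without naming a concrete subclass.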