Class DataFileWriter<D>

java.lang.Object
org.apache.avro.file.DataFileWriter<D>
All Implemented Interfaces:
Closeable, Flushable, AutoCloseable

public class DataFileWriter<D> extends Object implements Closeable, Flushable
Stores a sequence of data conforming to a schema in a file. The schema is stored in the file along with the data, and every datum in a file has that same schema. Data is written with a DatumWriter and grouped into blocks. A synchronization marker is written between blocks so that files may be split. Blocks may be compressed. Extensible metadata is stored in the file header. Files may be appended to.
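      The basic lifecycle described above (construct with a DatumWriter, open with create, append, then close) can be sketched as follows. This is a minimal sketch, assuming Avro's generic API (GenericDatumWriter, GenericRecord) is on the classpath; the "User" schema, the class name, and the field values are hypothetical. The file is written to memory and read back with DataFileStream, which takes the schema from the file header.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class WriteBasics {
  // Hypothetical "User" schema, used only for this illustration.
  static final Schema SCHEMA = new Schema.Parser().parse(
      "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
          + "{\"name\":\"name\",\"type\":\"string\"},"
          + "{\"name\":\"age\",\"type\":\"int\"}]}");

  static GenericRecord user(String name, int age) {
    GenericRecord r = new GenericData.Record(SCHEMA);
    r.put("name", name);
    r.put("age", age);
    return r;
  }

  /** Writes three records into an in-memory container file and returns its bytes. */
  static byte[] writeUsers() {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    // try-with-resources calls close(), which writes the final block.
    try (DataFileWriter<GenericRecord> writer =
        new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(SCHEMA))) {
      writer.create(SCHEMA, bytes); // writes the header, including the schema
      writer.append(user("ada", 36));
      writer.append(user("alan", 41));
      writer.append(user("grace", 45));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  /** Reads the file back; no schema is passed because the reader takes it from the header. */
  static int countRecords(byte[] file) {
    int n = 0;
    try (DataFileStream<GenericRecord> in = new DataFileStream<>(
        new ByteArrayInputStream(file), new GenericDatumReader<GenericRecord>())) {
      while (in.hasNext()) {
        in.next();
        n++;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return n;
  }

  public static void main(String[] args) {
    System.out.println(countRecords(writeUsers()));
  }
}
```

      Per the Avro specification, the resulting bytes begin with the container-file magic "Obj" followed by the byte 1.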
See Also:
  • Constructor Details

    • DataFileWriter

      public DataFileWriter(DatumWriter<D> dout)
      Construct a writer that is not yet open. Open it with create or appendTo before appending data.
  • Method Details

    • setCodec

      public DataFileWriter<D> setCodec(CodecFactory c)
      Configures this writer to use the given codec. The codec cannot be changed once writing has begun.
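      A minimal sketch of choosing a codec before opening the file, assuming the Avro library is on the classpath; the class name and the deflate level 6 are arbitrary choices for illustration. The codec name is read back from the reserved "avro.codec" header key.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.avro.Schema;
import org.apache.avro.file.CodecFactory;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;

public class CodecExample {
  static final Schema SCHEMA = Schema.create(Schema.Type.STRING);

  /** Writes a deflate-compressed file and returns the codec name stored in its header. */
  static String writtenCodecName() {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (DataFileWriter<CharSequence> writer =
          new DataFileWriter<>(new GenericDatumWriter<CharSequence>(SCHEMA))) {
        // Choose the codec first; create() then records it in the header.
        writer.setCodec(CodecFactory.deflateCodec(6)).create(SCHEMA, bytes);
        for (int i = 0; i < 100; i++) {
          writer.append("row-" + i);
        }
      }
      try (DataFileStream<Object> in = new DataFileStream<>(
          new ByteArrayInputStream(bytes.toByteArray()), new GenericDatumReader<Object>())) {
        return in.getMetaString("avro.codec");
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(writtenCodecName());
  }
}
```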
    • setSyncInterval

      public DataFileWriter<D> setSyncInterval(int syncInterval)
      Set the synchronization interval for this file, in bytes. Valid values range from 32 to 2^30; suggested values are between 2K and 2M. By default, the stream is flushed at the end of each synchronization interval. If setFlushOnEveryBlock(boolean) is called with false, the block may not be flushed to the stream after the sync marker is written; in that case, flush() must be called to flush the stream. Invalid values throw IllegalArgumentException.
      Parameters:
      syncInterval - the approximate number of uncompressed bytes to write in each block
      Returns:
      this DataFileWriter
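      The documented validation can be exercised directly. A minimal sketch, assuming the Avro library is on the classpath; the helper name rejects is hypothetical. It only configures an unopened writer, so nothing is written.

```java
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumWriter;

public class SyncIntervalExample {
  /** Returns true if setSyncInterval rejects the value with IllegalArgumentException. */
  static boolean rejects(int interval) {
    DataFileWriter<Object> writer = new DataFileWriter<>(new GenericDatumWriter<Object>());
    try {
      writer.setSyncInterval(interval);
      return false;
    } catch (IllegalArgumentException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(rejects(16));        // below the documented minimum of 32
    System.out.println(rejects(64 * 1024)); // 64 KiB, inside the suggested 2K to 2M range
  }
}
```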
    • setEncoder

      public DataFileWriter<D> setEncoder(Function<OutputStream,BinaryEncoder> initEncoderFunc)
      Sets the function used to create the BinaryEncoder, allowing an encoder other than the default DirectBinaryEncoder.
      Parameters:
      initEncoderFunc - Function to create a binary encoder
      Returns:
      this DataFileWriter
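      A minimal sketch of supplying the encoder function, assuming the Avro library is on the classpath and that setEncoder is called before create. The lambda here simply builds a direct encoder with EncoderFactory, the same kind the text names as the default; any Function&lt;OutputStream, BinaryEncoder&gt; could be substituted. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.io.EncoderFactory;

public class EncoderExample {
  static final Schema SCHEMA = Schema.create(Schema.Type.LONG);

  /** Writes 1..10 through a caller-supplied encoder function and returns the sum read back. */
  static long roundTripSum() {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (DataFileWriter<Long> writer =
          new DataFileWriter<>(new GenericDatumWriter<Long>(SCHEMA))) {
        // Function<OutputStream, BinaryEncoder>: builds a direct (unbuffered) encoder.
        writer.setEncoder(out -> EncoderFactory.get().directBinaryEncoder(out, null));
        writer.create(SCHEMA, bytes);
        for (long i = 1; i <= 10; i++) {
          writer.append(i);
        }
      }
      long sum = 0;
      try (DataFileStream<Long> in = new DataFileStream<>(
          new ByteArrayInputStream(bytes.toByteArray()), new GenericDatumReader<Long>())) {
        while (in.hasNext()) {
          sum += in.next();
        }
      }
      return sum;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(roundTripSum());
  }
}
```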
    • create

      public DataFileWriter<D> create(Schema schema, File file) throws IOException
      Open a new file for data matching a schema, using a randomly generated sync marker.
      Throws:
      IOException
    • create

      public DataFileWriter<D> create(Schema schema, OutputStream outs) throws IOException
      Open a new file, written to the given stream, for data matching a schema, using a randomly generated sync marker.
      Throws:
      IOException
    • create

      public DataFileWriter<D> create(Schema schema, OutputStream outs, byte[] sync) throws IOException
      Open a new file, written to the given stream, for data matching a schema, using the given sync marker.
      Throws:
      IOException
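      Supplying the sync marker makes it predictable, which can help when comparing output in tests. A minimal sketch, assuming the Avro library is on the classpath; the 16-byte length follows the Avro specification, while the marker bytes themselves are arbitrary. Because the specification places the sync marker after every data block, a non-empty file ends with it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumWriter;

public class ExplicitSyncExample {
  static final Schema SCHEMA = Schema.create(Schema.Type.INT);

  /** A fixed 16-byte marker; the byte values are chosen arbitrarily for illustration. */
  static byte[] marker() {
    byte[] sync = new byte[16];
    for (int i = 0; i < sync.length; i++) {
      sync[i] = (byte) (0xA0 + i);
    }
    return sync;
  }

  static byte[] write() {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataFileWriter<Integer> writer =
        new DataFileWriter<>(new GenericDatumWriter<Integer>(SCHEMA))) {
      writer.create(SCHEMA, bytes, marker());
      writer.append(1);
      writer.append(2);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  /** Each block is followed by the sync marker, so the file's last 16 bytes are the marker. */
  static boolean endsWithMarker(byte[] file) {
    byte[] tail = Arrays.copyOfRange(file, file.length - 16, file.length);
    return Arrays.equals(tail, marker());
  }

  public static void main(String[] args) {
    System.out.println(endsWithMarker(write()));
  }
}
```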
    • setFlushOnEveryBlock

      public void setFlushOnEveryBlock(boolean flushOnEveryBlock)
      Set whether this writer should flush the block to the stream every time a sync marker is written. By default, the writer flushes the buffer each time a sync marker is written, that is, when the block size limit is reached or sync() is called.
      Parameters:
      flushOnEveryBlock - if false, this writer will not flush the block to the stream until flush() is explicitly called.
    • isFlushOnEveryBlock

      public boolean isFlushOnEveryBlock()
      Returns:
      true if this writer flushes the block to the stream every time a sync marker is written; false otherwise.
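      A minimal sketch of turning off per-block flushing and flushing explicitly, assuming the Avro library is on the classpath; the batch size of 100 and the class name are hypothetical. Blocks ended by sync() may stay buffered inside the writer until flush() (or close()) pushes them to the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;

public class ManualFlushExample {
  static final Schema SCHEMA = Schema.create(Schema.Type.INT);

  /** The documented default: flush on every block. */
  static boolean defaultSetting() {
    return new DataFileWriter<Integer>(new GenericDatumWriter<Integer>(SCHEMA)).isFlushOnEveryBlock();
  }

  static int writeWithManualFlush() {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (DataFileWriter<Integer> writer =
          new DataFileWriter<>(new GenericDatumWriter<Integer>(SCHEMA))) {
        writer.setFlushOnEveryBlock(false); // ending a block no longer forces a flush
        writer.create(SCHEMA, bytes);
        for (int i = 0; i < 1000; i++) {
          writer.append(i);
          if (i % 100 == 99) {
            writer.sync(); // ends a block; its bytes may remain in the writer's buffer
          }
        }
        writer.flush(); // explicitly push everything written so far to the stream
      }
      int n = 0;
      try (DataFileStream<Integer> in = new DataFileStream<>(
          new ByteArrayInputStream(bytes.toByteArray()), new GenericDatumReader<Integer>())) {
        while (in.hasNext()) {
          in.next();
          n++;
        }
      }
      return n;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(defaultSetting() + " " + writeWithManualFlush());
  }
}
```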
    • appendTo

      public DataFileWriter<D> appendTo(File file) throws IOException
      Open a writer appending to an existing file.
      Throws:
      IOException
    • appendTo

      public DataFileWriter<D> appendTo(SeekableInput in, OutputStream out) throws IOException
      Open a writer appending to an existing file. Since 1.9.0, this method does not close in; the caller remains responsible for closing it.
      Parameters:
      in - the input used to read the existing file.
      out - the output stream, positioned at the end of the existing file.
      Throws:
      IOException
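      A minimal sketch of appending to a file on disk with appendTo(File), assuming the Avro library is on the classpath; the temporary file name and record values are hypothetical. Note that appendTo(File) takes no schema: it reads the existing file's header instead.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;

public class AppendToExample {
  static final Schema SCHEMA = Schema.create(Schema.Type.STRING);

  /** Creates a file with two records, reopens it to add a third, and counts the result. */
  static int appendAndCount() {
    try {
      File file = File.createTempFile("events", ".avro"); // hypothetical temp file
      file.deleteOnExit();
      try (DataFileWriter<CharSequence> writer =
          new DataFileWriter<>(new GenericDatumWriter<CharSequence>(SCHEMA))) {
        writer.create(SCHEMA, file);
        writer.append("first");
        writer.append("second");
      }
      // A fresh writer reopens the file; the schema comes from the existing header.
      try (DataFileWriter<CharSequence> writer =
          new DataFileWriter<>(new GenericDatumWriter<CharSequence>(SCHEMA))) {
        writer.appendTo(file);
        writer.append("third");
      }
      int n = 0;
      try (DataFileReader<CharSequence> reader =
          new DataFileReader<>(file, new GenericDatumReader<CharSequence>())) {
        while (reader.hasNext()) {
          reader.next();
          n++;
        }
      }
      return n;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(appendAndCount());
  }
}
```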
    • setMeta

      public DataFileWriter<D> setMeta(String key, byte[] value)
      Set a metadata property.
    • isReservedMeta

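      public static boolean isReservedMeta(String key)
      Returns true if key is reserved for Avro's own use. The Avro specification reserves every metadata key with the prefix "avro." (for example, avro.schema and avro.codec).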
    • setMeta

      public DataFileWriter<D> setMeta(String key, String value)
      Set a metadata property.
    • setMeta

      public DataFileWriter<D> setMeta(String key, long value)
      Set a metadata property.
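      A minimal sketch of storing application metadata and reading it back, assuming the Avro library is on the classpath. The keys "app.source" and "app.version" are hypothetical; they avoid the reserved "avro." prefix. Because metadata lives in the file header, the sketch sets it before create.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;

public class MetadataExample {
  static final Schema SCHEMA = Schema.create(Schema.Type.INT);

  /** Writes string and long metadata, then returns them as "source/version". */
  static String roundTrip() {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (DataFileWriter<Integer> writer =
          new DataFileWriter<>(new GenericDatumWriter<Integer>(SCHEMA))) {
        // Header metadata: set it before create() writes the header.
        writer.setMeta("app.source", "sensor-7").setMeta("app.version", 3L);
        writer.create(SCHEMA, bytes);
        writer.append(42);
      }
      try (DataFileStream<Integer> in = new DataFileStream<>(
          new ByteArrayInputStream(bytes.toByteArray()), new GenericDatumReader<Integer>())) {
        return in.getMetaString("app.source") + "/" + in.getMetaLong("app.version");
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(roundTrip());
    System.out.println(DataFileWriter.isReservedMeta("avro.codec")); // reserved prefix
    System.out.println(DataFileWriter.isReservedMeta("app.source"));
  }
}
```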
    • append

      public void append(D datum) throws IOException
      Append a datum to the file.
      Throws:
      IOException
      See Also:
    • appendEncoded

      public void appendEncoded(ByteBuffer datum) throws IOException
      Expert: Append a pre-encoded datum to the file. No validation is performed to check that the encoding conforms to the file's schema. Appending non-conforming data may result in an unreadable file.
      Throws:
      IOException
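      A minimal sketch of appendEncoded, assuming the Avro library is on the classpath. The datum is encoded separately with a GenericDatumWriter and a BinaryEncoder from EncoderFactory, using the same schema as the file; as the text warns, nothing checks that the bytes actually match that schema. The helper names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;

public class AppendEncodedExample {
  static final Schema SCHEMA = Schema.create(Schema.Type.STRING);

  /** Encodes one datum with the file's schema into a standalone buffer. */
  static ByteBuffer encode(CharSequence value) throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    BinaryEncoder enc = EncoderFactory.get().binaryEncoder(buf, null);
    new GenericDatumWriter<CharSequence>(SCHEMA).write(value, enc);
    enc.flush(); // this encoder is buffered, so flush before taking the bytes
    return ByteBuffer.wrap(buf.toByteArray());
  }

  static String roundTrip() {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (DataFileWriter<CharSequence> writer =
          new DataFileWriter<>(new GenericDatumWriter<CharSequence>(SCHEMA))) {
        writer.create(SCHEMA, bytes);
        writer.appendEncoded(encode("pre-encoded")); // copied as-is, no validation
      }
      try (DataFileStream<CharSequence> in = new DataFileStream<>(
          new ByteArrayInputStream(bytes.toByteArray()), new GenericDatumReader<CharSequence>())) {
        return in.next().toString();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(roundTrip());
  }
}
```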
    • appendAllFrom

      public void appendAllFrom(DataFileStream<D> otherFile, boolean recompress) throws IOException
      Appends data from another file. otherFile must have the same schema. Data blocks are copied without deserializing the data. If the codecs of the two files are compatible, data blocks are copied directly without decompression. If the codecs are not compatible, blocks from otherFile are decompressed and then compressed using this file's codec.

      If recompress is true, all blocks are decompressed and then compressed using this file's codec. This is useful when the two files have compatible compression codecs but different codec options. For example, one might append a file compressed with deflate at compression level 1 to a file compressed with deflate at compression level 7. If recompress is false, blocks are copied without changing the compression level; if true, they are converted to the new compression level.

      Parameters:
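      otherFile - the file whose data is appended; it must have the same schema as this file.
      recompress - whether to decompress every block and recompress it with this file's codec, even when the codecs are compatible.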
      Throws:
      IOException
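      The deflate example in the text can be sketched as follows, assuming the Avro library is on the classpath. The levels 1 and 9, the record counts, and the class name are illustrative choices. With recompress set to true, every copied block is re-encoded with the target file's deflate settings.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.avro.Schema;
import org.apache.avro.file.CodecFactory;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;

public class AppendAllFromExample {
  static final Schema SCHEMA = Schema.create(Schema.Type.INT);

  /** Writes ints in [from, to) into an in-memory file using the given codec. */
  static byte[] write(CodecFactory codec, int from, int to) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataFileWriter<Integer> writer =
        new DataFileWriter<>(new GenericDatumWriter<Integer>(SCHEMA))) {
      writer.setCodec(codec).create(SCHEMA, bytes);
      for (int i = from; i < to; i++) {
        writer.append(i);
      }
    }
    return bytes.toByteArray();
  }

  static int count(byte[] file) throws IOException {
    int n = 0;
    try (DataFileStream<Integer> in = new DataFileStream<>(
        new ByteArrayInputStream(file), new GenericDatumReader<Integer>())) {
      while (in.hasNext()) {
        in.next();
        n++;
      }
    }
    return n;
  }

  /** Merges a level-1 deflate file into a level-9 deflate file and counts the result. */
  static int mergedCount() {
    try {
      byte[] source = write(CodecFactory.deflateCodec(1), 0, 50);
      ByteArrayOutputStream merged = new ByteArrayOutputStream();
      try (DataFileWriter<Integer> writer =
              new DataFileWriter<>(new GenericDatumWriter<Integer>(SCHEMA));
          DataFileStream<Integer> other = new DataFileStream<>(
              new ByteArrayInputStream(source), new GenericDatumReader<Integer>())) {
        writer.setCodec(CodecFactory.deflateCodec(9)).create(SCHEMA, merged);
        writer.append(-1);
        writer.appendAllFrom(other, true); // recompress at this file's level
      }
      return count(merged.toByteArray());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(mergedCount());
  }
}
```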
    • sync

      public long sync() throws IOException
      Return the current position as a value that may be passed to DataFileReader.seek(long). Forces the end of the current block, emitting a synchronization marker. By default, this also flushes the block to the stream. If setFlushOnEveryBlock(boolean) is called with false, this method may not flush the block; in that case, flush() must be called to flush the stream.
      Throws:
      IOException
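      A minimal sketch of recording a position with sync() and jumping back to it with DataFileReader.seek(long), assuming the Avro library is on the classpath. The in-memory file is read through SeekableByteArrayInput; the record values and class name are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.file.SeekableByteArrayInput;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;

public class SyncSeekExample {
  static final Schema SCHEMA = Schema.create(Schema.Type.STRING);

  /** Returns the first record found after seeking to a position returned by sync(). */
  static String firstAfterSyncPoint() {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      long position;
      try (DataFileWriter<CharSequence> writer =
          new DataFileWriter<>(new GenericDatumWriter<CharSequence>(SCHEMA))) {
        writer.create(SCHEMA, bytes);
        writer.append("a");
        writer.append("b");
        position = writer.sync(); // ends the block and returns a seekable position
        writer.append("c");
        writer.append("d");
      }
      try (DataFileReader<CharSequence> reader = new DataFileReader<>(
          new SeekableByteArrayInput(bytes.toByteArray()), new GenericDatumReader<CharSequence>())) {
        reader.seek(position); // jump straight to the block that starts with "c"
        return reader.next().toString();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(firstAfterSyncPoint());
  }
}
```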
    • flush

      public void flush() throws IOException
      Calls sync() and then flushes the current state of the file.
      Specified by:
      flush in interface Flushable
      Throws:
      IOException
    • fSync

      public void fSync() throws IOException
      If this writer was opened on a File, FileOutputStream, or Syncable instance, this method flushes all of this writer's buffers to disk. In other cases, it behaves exactly like flush().
      Throws:
      IOException
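      A minimal sketch of fSync() on a file-backed writer, assuming the Avro library is on the classpath; the temporary file and the record values are hypothetical. Opening with create(Schema, File) gives the writer a file to sync, and each fSync() also ends the current block, since it includes the work of flush().

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;

public class FSyncExample {
  static final Schema SCHEMA = Schema.create(Schema.Type.LONG);

  /** Writes five records, forcing each one to disk, then counts them. */
  static int writeDurably() {
    try {
      File file = File.createTempFile("log", ".avro"); // hypothetical temp file
      file.deleteOnExit();
      try (DataFileWriter<Long> writer =
          new DataFileWriter<>(new GenericDatumWriter<Long>(SCHEMA))) {
        writer.create(SCHEMA, file); // File-backed, so fSync() can reach the disk
        for (long i = 0; i < 5; i++) {
          writer.append(i);
          writer.fSync(); // end the block, flush, and sync buffers to disk
        }
      }
      int n = 0;
      try (DataFileReader<Long> reader =
          new DataFileReader<>(file, new GenericDatumReader<Long>())) {
        while (reader.hasNext()) {
          reader.next();
          n++;
        }
      }
      return n;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(writeDurably());
  }
}
```

      Calling fSync() after every record, as here, produces one block per record and forces a disk sync each time; in practice it is usually called at coarser checkpoints.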
    • close

      public void close() throws IOException
      Flush and close the file.
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Throws:
      IOException