Package org.apache.avro.file
Class DataFileWriter<D>
java.lang.Object
org.apache.avro.file.DataFileWriter<D>
- All Implemented Interfaces:
Closeable, Flushable, AutoCloseable
Stores in a file a sequence of data conforming to a schema. The schema is stored in the file with the data. Each datum in a file is of the same schema. Data is written with a DatumWriter. Data is grouped into blocks. A synchronization marker is written between blocks, so that files may be split. Blocks may be compressed. Extensible metadata is stored in the file header. Files may be appended to.
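The write path described above can be sketched as follows. This is a minimal illustration, not a canonical usage pattern: the `User` schema, the class name `BasicWriteSketch`, and both helper methods are hypothetical, and the data is kept in memory so the result can be read back with `DataFileStream`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class BasicWriteSketch {
    // Hypothetical schema used only for this illustration.
    static final Schema SCHEMA = SchemaBuilder.record("User").fields()
        .requiredString("name").requiredInt("age").endRecord();

    // Writes `count` records into an in-memory Avro data file.
    static byte[] writeUsers(int count) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(SCHEMA))) {
            writer.create(SCHEMA, out);          // writes the header, including the schema
            for (int i = 0; i < count; i++) {
                GenericRecord user = new GenericData.Record(SCHEMA);
                user.put("name", "user-" + i);
                user.put("age", 20 + i);
                writer.append(user);             // buffered into the current block
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();                // complete once the writer is closed
    }

    // Reads the file back and counts its records.
    static int countRecords(byte[] avroFile) {
        int n = 0;
        try (DataFileStream<GenericRecord> in = new DataFileStream<>(
                 new ByteArrayInputStream(avroFile), new GenericDatumReader<GenericRecord>())) {
            for (GenericRecord ignored : in) {
                n++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(countRecords(writeUsers(3)));
    }
}
```

The try-with-resources block relies on the writer being `AutoCloseable`, so the final block is flushed when the writer goes out of scope.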
-
Nested Class Summary
Modifier and Type | Class | Description
static class | DataFileWriter.AppendWriteException | Thrown by append(Object) when an exception occurs while writing a datum to the buffer.
-
Constructor Summary
-
Method Summary
Modifier and Type | Method | Description
void | append(D datum) | Append a datum to the file.
void | appendAllFrom(DataFileStream<D> otherFile, boolean recompress) | Appends data from another file. otherFile must have the same schema.
void | appendEncoded(ByteBuffer datum) | Expert: Append a pre-encoded datum to the file.
DataFileWriter<D> | appendTo(File file) | Open a writer appending to an existing file.
DataFileWriter<D> | appendTo(SeekableInput in, OutputStream out) | Open a writer appending to an existing file.
void | close() | Flush and close the file.
DataFileWriter<D> | create(Schema schema, File file) | Open a new file for data matching a schema with a random sync.
DataFileWriter<D> | create(Schema schema, OutputStream outs) | Open a new file for data matching a schema with a random sync.
DataFileWriter<D> | create(Schema schema, OutputStream outs, byte[] sync) | Open a new file for data matching a schema with an explicit sync.
void | flush() | Calls sync() and then flushes the current state of the file.
void | fSync() | If this writer was instantiated using a File, FileOutputStream or Syncable instance, this method flushes all buffers for this writer to disk.
boolean | isFlushOnEveryBlock() |
static boolean | isReservedMeta(String key) |
DataFileWriter<D> | setCodec(CodecFactory c) | Configures this writer to use the given codec.
DataFileWriter<D> | setEncoder(Function<OutputStream, BinaryEncoder> initEncoderFunc) | Allows setting a different encoder than the default DirectBinaryEncoder.
void | setFlushOnEveryBlock(boolean flushOnEveryBlock) | Set whether this writer should flush the block to the stream every time a sync marker is written.
DataFileWriter<D> | setMeta(String key, byte[] value) | Set a metadata property.
DataFileWriter<D> | setMeta(String key, long value) | Set a metadata property.
DataFileWriter<D> | setMeta(String key, String value) | Set a metadata property.
DataFileWriter<D> | setSyncInterval(int syncInterval) | Set the synchronization interval for this file, in bytes.
long | sync() | Return the current position as a value that may be passed to DataFileReader.seek(long).
-
Constructor Details
-
DataFileWriter
Construct a writer, not yet open.
-
-
Method Details
-
setCodec
Configures this writer to use the given codec. May not be reset after writes have begun. -
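As a hedged sketch of how a codec might be configured, the example below calls `setCodec` before `create` and then reads the codec name back from the file metadata. The `Event` schema and the class name `CodecSketch` are hypothetical; `CodecFactory.deflateCodec(int)` and the reserved `avro.codec` metadata key come from the Avro library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.file.CodecFactory;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class CodecSketch {
    // Hypothetical schema used only for this illustration.
    static final Schema SCHEMA = SchemaBuilder.record("Event").fields()
        .requiredLong("id").endRecord();

    // Writes one record with deflate compression and returns the codec name
    // recorded in the file header.
    static String writtenCodecName() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(SCHEMA))) {
            writer.setCodec(CodecFactory.deflateCodec(6)); // must precede create()
            writer.create(SCHEMA, out);
            GenericRecord event = new GenericData.Record(SCHEMA);
            event.put("id", 1L);
            writer.append(event);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try (DataFileStream<GenericRecord> in = new DataFileStream<>(
                 new ByteArrayInputStream(out.toByteArray()),
                 new GenericDatumReader<GenericRecord>())) {
            return in.getMetaString("avro.codec");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writtenCodecName());
    }
}
```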
setSyncInterval
Set the synchronization interval for this file, in bytes. Valid values range from 32 to 2^30. Suggested values are between 2K and 2M. The stream is flushed by default at the end of each synchronization interval. If setFlushOnEveryBlock(boolean) is called with the parameter set to false, the block may not be flushed to the stream after the sync marker is written; in that case, flush() must be called to flush the stream. Invalid values throw IllegalArgumentException.
- Parameters:
syncInterval - the approximate number of uncompressed bytes to write in each block
- Returns:
this DataFileWriter
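The documented range check can be illustrated with a small probe. This is a sketch only: the class `SyncIntervalSketch` and its `rejects` helper are hypothetical, and the writer is never opened because only the argument validation is of interest here.

```java
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class SyncIntervalSketch {
    // Returns true when setSyncInterval refuses the given value.
    static boolean rejects(int syncInterval) {
        // The writer is constructed but not opened; no file is created.
        DataFileWriter<GenericRecord> writer =
            new DataFileWriter<>(new GenericDatumWriter<GenericRecord>());
        try {
            writer.setSyncInterval(syncInterval);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejects(16));        // below the documented minimum of 32
        System.out.println(rejects(64 * 1024)); // inside the suggested 2K to 2M range
    }
}
```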
-
setEncoder
Allows setting a different encoder than the default DirectBinaryEncoder.
- Parameters:
initEncoderFunc - Function to create a binary encoder
- Returns:
this DataFileWriter
-
create
Open a new file for data matching a schema with a random sync.
- Throws:
IOException
-
create
Open a new file for data matching a schema with a random sync.
- Throws:
IOException
-
create
Open a new file for data matching a schema with an explicit sync.
- Throws:
IOException
-
setFlushOnEveryBlock
public void setFlushOnEveryBlock(boolean flushOnEveryBlock)
Set whether this writer should flush the block to the stream every time a sync marker is written. By default, the writer flushes the buffer each time a sync marker is written (that is, when the block size limit is reached or sync() is called).
- Parameters:
flushOnEveryBlock - if set to false, this writer will not flush the block to the stream until flush() is explicitly called.
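A minimal sketch of deferred flushing, assuming an in-memory target stream: with flushing on every block disabled, the example calls `flush()` explicitly and then checks that the bytes written so far already form a readable file. The `Reading` schema and the class name `FlushSketch` are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class FlushSketch {
    // Hypothetical schema used only for this illustration.
    static final Schema SCHEMA = SchemaBuilder.record("Reading").fields()
        .requiredInt("value").endRecord();

    // Appends `count` records, flushes explicitly, and counts the records
    // readable from the stream while the writer is still open.
    static int recordsVisibleAfterFlush(int count) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(SCHEMA))) {
            writer.setFlushOnEveryBlock(false);  // blocks are no longer flushed automatically
            writer.create(SCHEMA, out);
            for (int i = 0; i < count; i++) {
                GenericRecord reading = new GenericData.Record(SCHEMA);
                reading.put("value", i);
                writer.append(reading);
            }
            writer.flush();                      // calls sync(), then flushes the stream
            return count(out.toByteArray());     // snapshot taken before close()
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static int count(byte[] avroFile) throws IOException {
        int n = 0;
        try (DataFileStream<GenericRecord> in = new DataFileStream<>(
                 new ByteArrayInputStream(avroFile), new GenericDatumReader<GenericRecord>())) {
            for (GenericRecord ignored : in) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(recordsVisibleAfterFlush(4));
    }
}
```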
-
isFlushOnEveryBlock
public boolean isFlushOnEveryBlock()
- Returns:
true if this writer flushes the block to the stream every time a sync marker is written; false otherwise.
-
appendTo
Open a writer appending to an existing file.
- Throws:
IOException
-
appendTo
Open a writer appending to an existing file. Since 1.9.0 this method does not close in.
- Parameters:
in - reading the existing file.
out - positioned at the end of the existing file.
- Throws:
IOException
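The `in`/`out` contract above can be sketched in memory: `in` reads the existing bytes, and `out` is a stream that already holds those bytes, so it is positioned at the end of the existing file. The `User` schema, the class name `AppendToSketch`, and its helpers are hypothetical; `SeekableByteArrayInput` is the Avro class used here as the `SeekableInput`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.file.SeekableByteArrayInput;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class AppendToSketch {
    // Hypothetical schema used only for this illustration.
    static final Schema SCHEMA = SchemaBuilder.record("User").fields()
        .requiredString("name").endRecord();

    static GenericRecord user(String name) {
        GenericRecord r = new GenericData.Record(SCHEMA);
        r.put("name", name);
        return r;
    }

    // Creates a new in-memory file holding `count` records.
    static byte[] writeNew(int count) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(SCHEMA))) {
            writer.create(SCHEMA, out);
            for (int i = 0; i < count; i++) {
                writer.append(user("user-" + i));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Appends one record to an existing file image and returns the new image.
    static byte[] appendOne(byte[] existing) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(existing, 0, existing.length);   // out now sits at the end of the file
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>())) {
            // The schema and sync marker are taken from the existing header.
            writer.appendTo(new SeekableByteArrayInput(existing), out);
            writer.append(user("appended"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    static int count(byte[] avroFile) {
        int n = 0;
        try (DataFileStream<GenericRecord> in = new DataFileStream<>(
                 new ByteArrayInputStream(avroFile), new GenericDatumReader<GenericRecord>())) {
            for (GenericRecord ignored : in) {
                n++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(count(appendOne(writeNew(2))));
    }
}
```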
-
setMeta
Set a metadata property. -
isReservedMeta
-
setMeta
Set a metadata property. -
setMeta
Set a metadata property. -
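A hedged sketch of user metadata: the example sets a string property before the file is created and reads it back with `DataFileStream.getMetaString`. The key `origin`, the schema, and the class name `MetaSketch` are hypothetical. The example also assumes that keys beginning with `avro.` are the ones `isReservedMeta` reports as reserved.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class MetaSketch {
    // Hypothetical schema used only for this illustration.
    static final Schema SCHEMA = SchemaBuilder.record("Event").fields()
        .requiredLong("id").endRecord();

    // Writes an empty file carrying one metadata property and reads it back.
    static String roundTrip(String key, String value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(SCHEMA))) {
            writer.setMeta(key, value);   // set before create(), which writes the header
            writer.create(SCHEMA, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try (DataFileStream<GenericRecord> in = new DataFileStream<>(
                 new ByteArrayInputStream(out.toByteArray()),
                 new GenericDatumReader<GenericRecord>())) {
            return in.getMetaString(key);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("origin", "sketch"));
        System.out.println(DataFileWriter.isReservedMeta("avro.codec"));
    }
}
```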
append
Append a datum to the file.
- Throws:
IOException
-
appendEncoded
Expert: Append a pre-encoded datum to the file. No validation is performed to check that the encoding conforms to the file's schema. Appending non-conforming data may result in an unreadable file.
- Throws:
IOException
-
appendAllFrom
Appends data from another file. otherFile must have the same schema. Data blocks will be copied without de-serializing data. If the codecs of the two files are compatible, data blocks are copied directly without decompression. If the codecs are not compatible, blocks from otherFile are uncompressed and then compressed using this file's codec.
If the recompress flag is set, all blocks are decompressed and then compressed using this file's codec. This is useful when the two files have compatible compression codecs but different codec options. For example, one might append a file compressed with deflate at compression level 1 to a file with deflate at compression level 7. If recompress is false, blocks will be copied without changing the compression level. If true, they will be converted to the new compression level.
- Parameters:
otherFile
recompress
- Throws:
IOException
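The deflate level 1 to level 7 scenario described above might look like the following sketch. The `Event` schema, the class name `AppendAllFromSketch`, and its helpers are hypothetical; both source files are built in memory and share one schema, as the method requires.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.file.CodecFactory;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class AppendAllFromSketch {
    // Hypothetical schema used only for this illustration.
    static final Schema SCHEMA = SchemaBuilder.record("Event").fields()
        .requiredLong("id").endRecord();

    static DataFileStream<GenericRecord> open(byte[] avroFile) throws IOException {
        return new DataFileStream<>(new ByteArrayInputStream(avroFile),
                                    new GenericDatumReader<GenericRecord>());
    }

    // Writes `count` records compressed with deflate at the given level.
    static byte[] write(int count, int deflateLevel) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(SCHEMA))) {
            writer.setCodec(CodecFactory.deflateCodec(deflateLevel));
            writer.create(SCHEMA, out);
            for (long i = 0; i < count; i++) {
                GenericRecord event = new GenericData.Record(SCHEMA);
                event.put("id", i);
                writer.append(event);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Copies the blocks of both inputs into a new file written at deflate level 7.
    static byte[] concat(byte[] first, byte[] second, boolean recompress) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(SCHEMA));
             DataFileStream<GenericRecord> a = open(first);
             DataFileStream<GenericRecord> b = open(second)) {
            writer.setCodec(CodecFactory.deflateCodec(7));
            writer.create(SCHEMA, out);
            writer.appendAllFrom(a, recompress);  // true: re-encode blocks at level 7
            writer.appendAllFrom(b, recompress);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    static int count(byte[] avroFile) {
        int n = 0;
        try (DataFileStream<GenericRecord> in = open(avroFile)) {
            for (GenericRecord ignored : in) {
                n++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(count(concat(write(2, 1), write(3, 1), true)));
    }
}
```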
-
sync
Return the current position as a value that may be passed to DataFileReader.seek(long). Forces the end of the current block, emitting a synchronization marker. By default, this will also flush the block to the stream. If setFlushOnEveryBlock(boolean) is called with the parameter set to false, this method may not flush the block; in that case, flush() must be called to flush the stream.
- Throws:
IOException
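A sketch of the sync and seek round trip, under the assumption that the file is held in memory: the position returned by `sync()` is saved while writing and later passed to `DataFileReader.seek(long)`, so reading resumes at the first record written after that point. The `User` schema and the class name `SyncSeekSketch` are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.file.SeekableByteArrayInput;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class SyncSeekSketch {
    // Hypothetical schema used only for this illustration.
    static final Schema SCHEMA = SchemaBuilder.record("User").fields()
        .requiredString("name").endRecord();

    static GenericRecord user(String name) {
        GenericRecord r = new GenericData.Record(SCHEMA);
        r.put("name", name);
        return r;
    }

    // Writes three records with a sync point after the second, then seeks to
    // that point and returns the name of the next record read.
    static String firstNameAfterSync() {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            long position = -1L;
            try (DataFileWriter<GenericRecord> writer =
                     new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(SCHEMA))) {
                writer.create(SCHEMA, out);
                writer.append(user("user-0"));
                writer.append(user("user-1"));
                position = writer.sync();        // ends the block, returns a seekable position
                writer.append(user("user-2"));
            }
            try (DataFileReader<GenericRecord> reader = new DataFileReader<>(
                     new SeekableByteArrayInput(out.toByteArray()),
                     new GenericDatumReader<GenericRecord>())) {
                reader.seek(position);
                return reader.next().get("name").toString();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstNameAfterSync());
    }
}
```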
-
flush
Calls sync() and then flushes the current state of the file.
- Specified by:
flush in interface Flushable
- Throws:
IOException
-
fSync
If this writer was instantiated using a File, FileOutputStream or Syncable instance, this method flushes all buffers for this writer to disk. In other cases, this method behaves exactly like flush().
- Throws:
IOException
-
close
Flush and close the file.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Throws:
IOException
-