// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

//! # A primer on Apache Avro
//!
//! Avro is a schema-based encoding system, like Protobuf. This means that raw Avro data cannot be
//! decoded without a schema. It also means that the format is very space efficient, because field
//! names and types are not repeated for every encoded value.
//!
//! ## Schemas
//!
//! Schemas are defined in JSON and look like this:
//! ```json
//! {
//!     "type": "record",
//!     "name": "example",
//!     "fields": [
//!         {"name": "a", "type": "long", "default": 42},
//!         {"name": "b", "type": "string"}
//!     ]
//! }
//! ```
//! For all possible types and extra attributes, see [the schema section of the specification].
//!
//! Schemas can depend on each other. For example, a schema can refer to the schema defined above by
//! name, or it can refer to itself:
//! ```json
//! {
//!     "type": "record",
//!     "name": "references",
//!     "fields": [
//!         {"name": "a", "type": "example"},
//!         {"name": "b", "type": "bytes"},
//!         {"name": "recursive", "type": ["null", "references"]}
//!     ]
//! }
//! ```
//!
//! Schemas are represented using the [`Schema`](crate::Schema) type.
//!
//! [the schema section of the specification]: https://avro.apache.org/docs/++version++/specification/#schema-declaration
//!
//! ## Data serialization and deserialization
//! There are various formats to encode and decode Avro data. Most formats use the Avro binary encoding.
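//!
//! To give a feel for the binary encoding, here is a minimal standalone sketch that encodes one
//! `example` record by hand. It follows the specification's rules for `long` (zigzag mapping, then
//! a variable-length integer) and `string` (a `long` byte length followed by the UTF-8 bytes). The
//! functions `encode_long`, `encode_string` and `encode_example` are illustrative names only and
//! are not part of this crate's API.
//! ```rust
//! // Zigzag-map the signed value, then emit 7 bits per byte, least significant group first.
//! fn encode_long(value: i64, out: &mut Vec<u8>) {
//!     let mut n = ((value << 1) ^ (value >> 63)) as u64;
//!     while n >= 0x80 {
//!         out.push((n as u8 & 0x7F) | 0x80); // high bit set: more bytes follow
//!         n >>= 7;
//!     }
//!     out.push(n as u8);
//! }
//!
//! // A string is its byte length (encoded as a long) followed by its UTF-8 bytes.
//! fn encode_string(value: &str, out: &mut Vec<u8>) {
//!     encode_long(value.len() as i64, out);
//!     out.extend_from_slice(value.as_bytes());
//! }
//!
//! // A record is the concatenation of its fields in schema order, with no names or tags.
//! fn encode_example(a: i64, b: &str) -> Vec<u8> {
//!     let mut out = Vec::new();
//!     encode_long(a, &mut out);
//!     encode_string(b, &mut out);
//!     out
//! }
//!
//! fn main() {
//!     let bytes = encode_example(42, "foo");
//!     // 42 zigzags to 84 (0x54); the length 3 zigzags to 6 (0x06).
//!     assert_eq!(bytes, [0x54, 0x06, b'f', b'o', b'o']);
//! }
//! ```
//! The five resulting bytes contain no field names or type tags, which is why the data cannot be
//! interpreted without the schema.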
//!
//! #### [Object Container File](https://avro.apache.org/docs/++version++/specification/#object-container-files)
//! This is the most common file format used for Avro. It uses the binary encoding and includes the
//! schema in the file, so it can be decoded by a reader who doesn't have the schema. A single file
//! can contain many records.
//!
//! This file format can be used via the [`Reader`](crate::Reader) and [`Writer`](crate::Writer) types.
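//!
//! According to the specification, every Object Container File starts with the four magic bytes
//! `Obj` followed by the byte `1`. As a small standalone sketch, a buffer can be checked for that
//! prefix before it is handed to a reader. The helper `looks_like_container_file` is a hypothetical
//! name and is not provided by this crate.
//! ```rust
//! // Magic bytes that open every Object Container File: ASCII "Obj" followed by the byte 1.
//! const MAGIC: [u8; 4] = [b'O', b'b', b'j', 1];
//!
//! // Returns true if the buffer starts with the container file magic.
//! // This only inspects the prefix; it does not validate the rest of the header.
//! fn looks_like_container_file(data: &[u8]) -> bool {
//!     data.starts_with(&MAGIC)
//! }
//!
//! fn main() {
//!     assert!(looks_like_container_file(b"Obj\x01rest of the header"));
//!     assert!(!looks_like_container_file(b"{\"type\": \"record\"}"));
//! }
//! ```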
//!
//! #### [Single Object Encoding](https://avro.apache.org/docs/++version++/specification/#single-object-encoding)
//! This file format also uses the binary encoding, but the schema is not included directly. It instead
//! includes a fingerprint of the schema, which a reader can look up in a schema database or compare
//! with the fingerprint that the reader is expecting. This file format always contains one record.
//!
//! This file format can be used via the [`GenericSingleObjectReader`](crate::GenericSingleObjectReader),
//! [`GenericSingleObjectWriter`](crate::GenericSingleObjectWriter), [`SpecificSingleObjectReader`](crate::SpecificSingleObjectReader),
//! and [`SpecificSingleObjectWriter`](crate::SpecificSingleObjectWriter) types.
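//!
//! The layout described by the specification is a two-byte marker (`C3 01`), the 8-byte
//! little-endian fingerprint of the schema, and then the binary-encoded record. The standalone
//! sketch below frames and splits such a message by hand. It assumes the fingerprint has already
//! been computed elsewhere, the value used in `main` is made up, and the functions
//! `frame_single_object` and `split_single_object` are illustrative names that are not part of
//! this crate.
//! ```rust
//! // Marker that identifies a single-object encoded message (format version 1).
//! const MARKER: [u8; 2] = [0xC3, 0x01];
//!
//! // Prefix an already binary-encoded datum with the marker and the schema fingerprint.
//! fn frame_single_object(fingerprint: u64, datum: &[u8]) -> Vec<u8> {
//!     let mut out = Vec::with_capacity(10 + datum.len());
//!     out.extend_from_slice(&MARKER);
//!     out.extend_from_slice(&fingerprint.to_le_bytes());
//!     out.extend_from_slice(datum);
//!     out
//! }
//!
//! // Split a message into (fingerprint, datum), or return None if the header is invalid.
//! fn split_single_object(message: &[u8]) -> Option<(u64, &[u8])> {
//!     if message.len() < 10 || message[..2] != MARKER {
//!         return None;
//!     }
//!     let mut fingerprint = [0u8; 8];
//!     fingerprint.copy_from_slice(&message[2..10]);
//!     Some((u64::from_le_bytes(fingerprint), &message[10..]))
//! }
//!
//! fn main() {
//!     // Hypothetical fingerprint; 0x54 is the long 42 in the binary encoding.
//!     let fingerprint = 0x0123_4567_89AB_CDEF_u64;
//!     let datum: &[u8] = &[0x54];
//!
//!     let message = frame_single_object(fingerprint, datum);
//!     assert_eq!(&message[..2], &MARKER[..]);
//!     assert_eq!(message[2], 0xEF); // little-endian: least significant byte first
//!     assert_eq!(split_single_object(&message), Some((fingerprint, datum)));
//! }
//! ```
//! A reader would use the extracted fingerprint to look up the writer schema before decoding the
//! remaining bytes.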
//!
//! #### Avro datums
//! This is not really a file format, as it's just the raw Avro binary data. It does not include a
//! schema and can therefore not be decoded without the reader knowing **exactly** which schema was
//! used to write it.
//!
//! This encoding can be used via the [`to_avro_datum`](crate::to_avro_datum), [`from_avro_datum`](crate::from_avro_datum),
//! [`to_avro_datum_schemata`](crate::to_avro_datum_schemata), [`from_avro_datum_schemata`](crate::from_avro_datum_schemata),
//! [`from_avro_datum_reader_schemata`](crate::from_avro_datum_reader_schemata), and
//! [`write_avro_datum_ref`](crate::write_avro_datum_ref) functions.
//!
//! #### [Avro JSON](https://avro.apache.org/docs/++version++/specification/#json-encoding)
//! Not to be confused with the schema definition, which is also written in JSON. This is Avro data
//! encoded as JSON.
//!
//! It can be used via the [`From<serde_json::Value> for Value`](crate::types::Value) and
//! [`TryFrom<Value> for serde_json::Value`](crate::types::Value) implementations.
//!
//! ## Compression
//! For records with low entropy it can be useful to compress the encoded data. The
//! [Object Container File format](#object-container-file) supports this directly. Avro supports
//! various compression codecs:
//!
//! - deflate
//! - bzip2
//! - Snappy
//! - XZ
//! - Zstandard
//!
//! All readers are required to implement the `deflate` codec, and most implementations also support
//! most of the other codecs.
//!
105//!