Data Serialization — An Overview
Saying no to Java serialization, CSV, and JSON is easy, but how tough is it to choose among Apache Thrift, Protocol Buffers, and Avro? A look at some awesome data serialization formats.
Ram is a Java developer. He has some data that he wants to send over the network (and store somewhere so it can be reconstructed when required).
His first thought is to rely on Java serialization/deserialization. But he immediately realizes that it is slow, inefficient, and vulnerable to attacks when deserializing untrusted data (admittedly debatable). Java serialization also creates a lot of garbage. On top of all this, Java serialization would tie the data to Java, so he is looking for a language-agnostic approach.
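To see why the built-in format is Java-specific, here is a minimal sketch that serializes a small object with the JDK's `ObjectOutputStream`. The `Employee` class and its fields are hypothetical. The output begins with Java's stream magic number `0xACED` and embeds Java class metadata, such as the class name and field descriptors, which a non-JVM consumer would have to understand before it could read the data.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class JavaSerializationDemo {
    // Hypothetical payload class used only for illustration.
    static class Employee implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int age;

        Employee(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Serializes an object with the JDK's built-in mechanism.
    static byte[] serialize(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // True if the Java class name appears verbatim in the serialized bytes.
    static boolean containsClassName(byte[] data, String className) {
        return new String(data, StandardCharsets.ISO_8859_1).contains(className);
    }

    public static void main(String[] args) {
        byte[] data = serialize(new Employee("Ram", 30));
        // Java-specific stream header: magic number 0xACED.
        System.out.printf("first bytes: %02X %02X%n", data[0] & 0xFF, data[1] & 0xFF); // AC ED
        System.out.println("total bytes: " + data.length);
        System.out.println("embeds class name: " + containsClassName(data, "Employee")); // true
    }
}
```

The exact byte count depends on the class and field names, so it is not shown. The point is that the class metadata travels with every stream.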

Approach 1:
CSV: Super easy to read and write, but it doesn't guarantee the structure of the data. A header row with column names may or may not exist, and there is no type information, so parsing gets tricky.
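A small illustration of why CSV parsing is trickier than it looks: splitting on every comma breaks as soon as a quoted field contains a comma. The record below is hypothetical. The quote-aware splitter is a simplified sketch of RFC 4180-style quoting (`""` as an escaped quote), not a production parser. For example, it ignores fields that span multiple lines.

```java
import java.util.ArrayList;
import java.util.List;

class CsvPitfallDemo {
    // Naive approach: split on every comma, keeping empty trailing fields.
    static String[] naiveSplit(String line) {
        return line.split(",", -1);
    }

    // Simplified quote-aware splitter: commas inside double quotes are data,
    // and "" inside a quoted field is an escaped quote character.
    static List<String> quoteAwareSplit(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // escaped quote
                        i++;
                    } else {
                        inQuotes = false;     // closing quote
                    }
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // No header row: nothing says which column is which or what type it holds.
        String line = "Ram,\"Bengaluru, India\",30";
        System.out.println(naiveSplit(line).length);      // 4 (the quoted comma split the city)
        System.out.println(quoteAwareSplit(line).size()); // 3
    }
}
```

Even the quote-aware version still cannot tell you that the third column is an integer age. That knowledge lives outside the file.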
Approach 2:
JSON: It is lightweight, flexible, can take any shape, and is widely accepted, but its major disadvantages are the lack of a strict schema and repeated keys. It's easy to get data types wrong (for example, an age sent as "30" in one record and 30 in another), and repeating every key name in every record inflates the payload size.
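To put a rough number on the repeated-keys cost, the sketch below builds a JSON array of identical records by hand, so no JSON library is needed, and counts how many bytes go to key names alone. The key names `employeeName` and `employeeAge` are hypothetical, and the ratio depends heavily on how long your keys are compared with your values.

```java
import java.nio.charset.StandardCharsets;
import java.util.StringJoiner;

class JsonOverheadDemo {
    // Hypothetical key names, quoted as they appear in the JSON text.
    static final String NAME_KEY = "\"employeeName\"";
    static final String AGE_KEY = "\"employeeAge\"";

    // One record: {"employeeName":"Ram","employeeAge":30}
    static String toJson(String name, int age) {
        return "{" + NAME_KEY + ":\"" + name + "\"," + AGE_KEY + ":" + age + "}";
    }

    // A JSON array of n identical records.
    static String toJsonArray(int n) {
        StringJoiner joiner = new StringJoiner(",", "[", "]");
        for (int i = 0; i < n; i++) {
            joiner.add(toJson("Ram", 30));
        }
        return joiner.toString();
    }

    // Bytes spent only on key names: every record repeats both keys.
    static int keyBytes(int n) {
        return n * (NAME_KEY.length() + AGE_KEY.length());
    }

    public static void main(String[] args) {
        int n = 1000;
        int total = toJsonArray(n).getBytes(StandardCharsets.UTF_8).length;
        int keys = keyBytes(n);
        System.out.println("total bytes: " + total); // 40001
        System.out.println("key bytes:   " + keys);  // 27000
        System.out.printf("share spent on keys: %.1f%%%n", 100.0 * keys / total);
    }
}
```

In this contrived case roughly two thirds of the payload is key names. Schema-based formats avoid this by agreeing on field names (or numbers) up front.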
Approach 3:
Let's get serious: what are the best options available? That's when he was introduced to these awesome language-agnostic data serialization formats: Apache Thrift, Protocol Buffers, and Avro.
Protocol Buffers: Very fast, language- and platform-neutral, and strongly typed. Supports C++, Java, Objective-C, Python, Kotlin, Dart, Go, Ruby, and C#.
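Part of what makes Protocol Buffers compact is its wire format. Each field is written as a key, `(field_number << 3) | wire_type`, followed by the value, and integers use base-128 varints. Field names never go on the wire. The hand-rolled sketch below encodes one varint field this way for illustration only. In a real project the classes generated by `protoc` do this for you. The field number and value mirror the classic `int32 a = 1` with `a = 150` example from the Protocol Buffers encoding docs.

```java
import java.io.ByteArrayOutputStream;

class ProtobufWireDemo {
    static final int WIRETYPE_VARINT = 0;

    // Base-128 varint: 7 bits per byte, least-significant group first,
    // high bit set on every byte except the last.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // One varint field: key = (field_number << 3) | wire_type, then the value.
    static byte[] encodeVarintField(int fieldNumber, long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint(out, ((long) fieldNumber << 3) | WIRETYPE_VARINT);
        writeVarint(out, value);
        return out.toByteArray();
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Equivalent of `message Test1 { int32 a = 1; }` with a = 150.
        System.out.println(hex(encodeVarintField(1, 150))); // 08 96 01
    }
}
```

Three bytes carry both the field identity and its value, which is why the `.proto` schema, shared by both sides, is essential for decoding.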
Apache Thrift: Statically typed and language-agnostic; a complete RPC and serialization framework. Supports many languages, including C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, and Node.js.
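Thrift's static typing shows up on the wire too. With its binary protocol, every struct field starts with a one-byte type id and a two-byte field id, and a STOP byte ends the struct. The sketch below hand-encodes a struct with a single `i32` field, assuming the binary protocol layout and the `TType` ids (`STOP = 0`, `I32 = 8`). In a real project the Thrift compiler generates the classes and protocol implementations such as `TBinaryProtocol` or the denser `TCompactProtocol` do the writing. The field id and value here are hypothetical.

```java
import java.nio.ByteBuffer;

class ThriftBinaryDemo {
    // Type ids as defined by Thrift's TType constants.
    static final byte T_STOP = 0;
    static final byte T_I32 = 8;

    // Struct with one i32 field, binary-protocol layout:
    // type (1 byte) + field id (i16, big-endian) + value (i32, big-endian) + STOP.
    static byte[] encodeI32Struct(short fieldId, int value) {
        ByteBuffer buf = ByteBuffer.allocate(1 + 2 + 4 + 1); // big-endian by default
        buf.put(T_I32);
        buf.putShort(fieldId);
        buf.putInt(value);
        buf.put(T_STOP);
        return buf.array();
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(hex(encodeI32Struct((short) 1, 150))); // 08 00 01 00 00 00 96 00
    }
}
```

The binary protocol trades some size for simplicity: fixed-width integers cost 8 bytes here where the Protocol Buffers varint needed 3.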
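Apache Avro: It relies on a schema-based system. Schemas are written in JSON, so Avro can loosely be viewed as "JSON with a schema". Data is strictly typed and encoded in a compact binary form: field names are not repeated in every record, because the reader already has the schema (Avro container files embed it in the header), and container files can additionally be compressed with codecs such as deflate or Snappy. Schema evolution is a first-class feature, so a reader can resolve data written with an older or newer version of the schema. Avro supports fewer languages than the other two, but the major ones are covered (Java, C, C++, C#, Python, and Ruby).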
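Here is what "no field names on the wire" looks like in Avro's binary encoding. Ints and longs are zigzag-encoded varints, strings are a length followed by UTF-8 bytes, and a record is simply its fields in schema order, with no tags at all. This is a hand-written sketch of that encoding for a hypothetical `Employee` schema, not the Avro library itself. In practice you would use Avro's Java API (for example `GenericDatumWriter` or generated specific classes).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class AvroBinaryDemo {
    // Avro int/long: zigzag-encode (so small negatives stay small), then write as a varint.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Avro string: byte length as a zigzag long, then the UTF-8 bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // Hypothetical schema:
    // {"type":"record","name":"Employee","fields":[
    //   {"name":"name","type":"string"},{"name":"age","type":"int"}]}
    // The record is just its field values in schema order: no names, no tags.
    static byte[] encodeEmployee(String name, int age) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, name);
        writeLong(out, age);
        return out.toByteArray();
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // "Ram" -> length 3 (zigzag 6), then 'R' 'a' 'm'; age 30 -> zigzag 60.
        System.out.println(hex(encodeEmployee("Ram", 30))); // 06 52 61 6D 3C
    }
}
```

Five bytes for the whole record, but they are meaningless without the schema. That is exactly why Avro pairs the data with its schema and defines rules for resolving a writer's schema against a reader's schema.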
A detailed performance comparison of JVM serializers can be found at https://lnkd.in/e8UNpYTY
I think Apache Thrift, Protocol Buffers, and Avro are equally impressive. As a Java developer, I have just started exploring them and found Avro quite handy, but I would like to know your thoughts.
Are you a fan of Apache Thrift, Protocol Buffers, or Avro? I would love to know how they have made an impact on your projects.