Data Serialization — An Overview

Arun Prakash
2 min read · Jul 30, 2022

Saying no to Java serialization, CSV, and JSON is easy, but how tough is it to choose among Apache Thrift, Protocol Buffers, and Avro, three awesome data serialization formats?

Ram is a Java developer. He has some data that he wants to send over the network (and store somewhere so that it can be reconstructed when required).

His first thought is to rely on Java serialization/deserialization. But he immediately realizes that it is slow, inefficient, and vulnerable (absolutely debatable). Java serialization also creates a lot of garbage. On top of all this, Java serialization ties the data to a single language, so he starts looking for a language-agnostic approach.
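To make the overhead concrete, here is a minimal sketch of a round trip through Java's built-in serialization. The `User` class and its fields are hypothetical and exist only for this illustration; the point is that the byte array carries class metadata in addition to the few bytes of actual data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class JavaSerializationDemo {

    // Hypothetical record type, used only for illustration.
    static class User implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int age;

        User(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Serializes the object with Java's built-in mechanism.
    static byte[] serialize(User user) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(user);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reconstructs the object; this only works where the same Java class is available.
    static User deserialize(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (User) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = serialize(new User("Ram", 30));
        User copy = deserialize(bytes);
        System.out.println(copy.name + ", " + copy.age);
        // The real payload is 3 characters plus one int; the rest is stream and class metadata.
        System.out.println("serialized size in bytes: " + bytes.length);
    }
}
```

Because the stream embeds the Java class description, a consumer written in another language cannot read it without reimplementing Java's stream format, which is the language-dependence problem mentioned above.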


Approach 1:
CSV: Super easy to read and parse, but it guarantees nothing about the data (column names may or may not be present, and every value is plain text), so parsing it reliably is tricky.
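A small sketch of why that is tricky, assuming a hypothetical file with no header row and a naive split-based parser (real CSV parsing also has to deal with quoting and escaping):

```java
import java.util.Arrays;
import java.util.List;

public class CsvAmbiguityDemo {

    // Naive parser: splits on every comma and keeps empty trailing fields.
    static List<String> parseLine(String line) {
        return Arrays.asList(line.split(",", -1));
    }

    public static void main(String[] args) {
        // No header row: nothing says whether column 2 is an age, an ID, or a count.
        List<String> fields = parseLine("Ram,30,Chennai");
        System.out.println(fields);

        // Every value arrives as a String; the reader has to guess the type.
        int age = Integer.parseInt(fields.get(1));
        System.out.println(age + 1);

        // An unquoted comma inside a value silently shifts the columns.
        System.out.println(parseLine("Ram,30,Chennai, India").size());
    }
}
```

Nothing in the file itself tells the reader that the last row has one column too many; that knowledge has to live outside the data.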

Approach 2:
JSON: Lightweight, flexible, widely accepted, and able to take any shape. The major disadvantages are the lack of a strict schema and the repeated keys: it is quite easy to get data types wrong, and repeating the keys in every record inflates the file size.
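The cost of repeated keys can be sketched by encoding the same records twice: once as hand-built JSON, and once as values only, written in an agreed field order. The second encoding is a simplified illustration of the idea behind schema-based formats, not the actual wire format of Thrift, Protocol Buffers, or Avro; the record fields are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class RepeatedKeysDemo {

    // Builds a JSON array by hand: every record repeats the "name" and "age" keys.
    // (No escaping is done, so this assumes names without quotes or backslashes.)
    static byte[] asJson(String[] names, int[] ages) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < names.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"name\":\"").append(names[i])
              .append("\",\"age\":").append(ages[i]).append('}');
        }
        return sb.append(']').toString().getBytes(StandardCharsets.UTF_8);
    }

    // Writes only the values, in a field order both sides agreed on in advance.
    static byte[] asSchemaOrderedBinary(String[] names, int[] ages) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(names.length);      // record count
            for (int i = 0; i < names.length; i++) {
                out.writeUTF(names[i]);      // field 1: name
                out.writeInt(ages[i]);       // field 2: age
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        String[] names = {"Ram", "Sita", "Arun"};
        int[] ages = {30, 28, 35};
        System.out.println("JSON bytes:   " + asJson(names, ages).length);
        System.out.println("binary bytes: " + asSchemaOrderedBinary(names, ages).length);
    }
}
```

The gap grows with the number of records, because the JSON version pays for the key names every time while the binary version never writes them at all.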

Approach 3:
Let’s get serious. What are the best options on the market? That’s when he is introduced to three awesome language-agnostic data serialization formats: Apache Thrift, Protocol Buffers, and Avro.

Protocol Buffers: Very fast, language- and platform-neutral, and strongly typed. Supports C++, Java, Objective-C, Python, Kotlin, Dart, Go, Ruby, and C#.

Apache Thrift: Statically typed and language-agnostic; a complete RPC and serialization framework. Supports many languages, including C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, and Node.js.

Apache Avro: It relies on a schema-based system, and the schemas themselves are written in JSON, so Avro can be viewed as JSON with a schema. In Avro, data is strictly typed and encoded in a compact binary form (object container files can additionally be compressed with a codec). Schema evolution is also supported by design. Avro does not yet support as many languages as the others, but the major ones are covered (Java, C, C++, C#, Python, and Ruby).
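The idea behind schema evolution can be sketched without any library: the reader has its own schema, and a field that is missing from older data is filled in from the reader's default. This is a deliberately simplified illustration, not Avro's API; the `Field` class, the `resolve` method, and the `city` field are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SchemaEvolutionSketch {

    // Hypothetical field descriptor: name, expected Java type, optional default.
    static final class Field {
        final String name;
        final Class<?> type;
        final Object defaultValue; // null means "no default, the field is required"

        Field(String name, Class<?> type, Object defaultValue) {
            this.name = name;
            this.type = type;
            this.defaultValue = defaultValue;
        }
    }

    // Reads a record against the reader's schema: checks types, fills defaults.
    static Map<String, Object> resolve(Map<String, Object> written, List<Field> readerSchema) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Field field : readerSchema) {
            Object value = written.containsKey(field.name)
                    ? written.get(field.name)
                    : field.defaultValue;
            if (value == null) {
                throw new IllegalArgumentException("missing required field: " + field.name);
            }
            if (!field.type.isInstance(value)) {
                throw new IllegalArgumentException("wrong type for field: " + field.name);
            }
            result.put(field.name, value);
        }
        return result;
    }

    public static void main(String[] args) {
        // The reader's schema has gained a "city" field with a default value.
        List<Field> readerSchema = new ArrayList<>();
        readerSchema.add(new Field("name", String.class, null));
        readerSchema.add(new Field("age", Integer.class, null));
        readerSchema.add(new Field("city", String.class, "unknown"));

        // A record written before "city" existed can still be read.
        Map<String, Object> oldRecord = new LinkedHashMap<>();
        oldRecord.put("name", "Ram");
        oldRecord.put("age", 30);
        System.out.println(resolve(oldRecord, readerSchema));
    }
}
```

The same check also catches the data type mistakes that slip through with plain JSON: a record whose `age` is the string "30" is rejected instead of being silently accepted.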

A detailed performance comparison of JVM Serializers can be found at https://lnkd.in/e8UNpYTY

I think Apache Thrift, Protocol Buffers, and Avro are equally impressive. As a Java developer, I have just started exploring them and have found Avro quite handy, but I would like to know your thoughts.

Are you a fan of Apache Thrift, Protocol Buffers, or Avro? I would love to know what impact they have had on your projects.
