Serializing Data Using Google Protocol Buffers

1. Overview

In this post, we will see how to use Google Protocol Buffers, a popular language-neutral, platform-neutral and extensible format for serializing structured data. We define a message in a .proto file and then use the compiler to generate code for languages such as Java, Python, Go, C++ and C#.

Protocol Buffers have several advantages. Because the format is platform-neutral, programs written in different languages can share data. Backward compatibility is easy to handle: you can add new fields to a message, and code generated from the old definition keeps working. The binary encoding is usually much smaller than the equivalent JSON and is fast to parse. The generated code also provides methods to create message instances and to read and write them to any byte stream, such as a file or a network connection, saving us from writing the complex encoding and decoding logic ourselves 😴.

2. Defining Maven Dependency

Once we define the message and generate the code, we need to use that code in our Java project. The generated classes depend on the protobuf runtime, so we add the protobuf-java dependency to our Maven project.

<dependency>
    <groupId>com.google.protobuf</groupId>
    <artifactId>protobuf-java</artifactId>
    <version>3.5.0</version>
</dependency>

3. Defining a Protobuf Message

To define a new message we create a .proto file, which follows a standard format. Let's start with a very simple message:

// Without this line protoc assumes proto2, where every field needs a label.
syntax = "proto3";

message Post {
    string id = 1;
    string title = 2;
}

Defining a message is simple. The above example, saved as Post.proto, defines a Post message with two fields, id and title, both of type string. Fields can also use other scalar types such as int32, double or bool, or another message type. Each field is given a unique number such as = 1. This number, not the field name, identifies the field in the binary format, so it should not change once the message is in use. Serializers normally write fields in field-number order, so field 1 is written before field 2, but parsers do not depend on that order. You can read more about the structure of .proto file here.
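To make the role of the field number concrete, here is a small hand-rolled sketch of what the encoded bytes of a Post look like. It is not the generated code and does not use the protobuf library. The class and method names are made up for this illustration, and it assumes short strings and small field numbers so that every key and length fits in one byte.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class PostWireSketch {

    // Wire type 2 means "length-delimited" (strings, bytes, nested messages).
    static final int WIRE_TYPE_LENGTH_DELIMITED = 2;

    // Writes one string field: key byte, length byte, then the UTF-8 bytes.
    // Simplification: assumes fieldNumber < 16 and length < 128 (one byte each).
    static void writeStringField(ByteArrayOutputStream out, int fieldNumber, String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        out.write((fieldNumber << 3) | WIRE_TYPE_LENGTH_DELIMITED); // key = number + wire type
        out.write(utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // Encodes Post { string id = 1; string title = 2; } by hand.
    static byte[] encodePost(String id, String title) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeStringField(out, 1, id);
        writeStringField(out, 2, title);
        return out.toByteArray();
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        byte[] encoded = encodePost("42", "Hi");
        String json = "{\"id\":\"42\",\"title\":\"Hi\"}";
        System.out.println(toHex(encoded)); // 0a 02 34 32 12 02 48 69
        System.out.println(encoded.length + " bytes of protobuf vs "
                + json.getBytes(StandardCharsets.UTF_8).length + " bytes of JSON");
    }
}
```

Notice that the field names id and title never appear in the output. Only the numbers 1 and 2 do, which is why renaming a field is harmless but renumbering it is not.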

Let's take a more complicated message format.

syntax="proto3";

package com.infinityworks.google.model;

option java_outer_classname="LibraryModel";

message Book{
    string isbn = 1;
    string title = 2;
    string publish_date = 3;
    repeated Author author = 4;
    Genre genre = 5;

    message Author{
        string name = 1;
    }

    enum Genre{
        UNKNOWN = 0;
        SELF_DEVELOPMENT = 1;
        BUSINESS = 2;
        TECHNOLOGY = 3;
        ROMANCE = 4;
        THRILLER = 5;
    }
}

message Library{
    repeated Book books = 1;
}

The first line, syntax="proto3", tells the compiler that the file uses proto3 syntax. If we omit this line, the compiler assumes proto2. The package statement and the java_outer_classname option tell the compiler where the generated code goes. Since no java_package option is set, package is also used as the Java package of the generated classes, and java_outer_classname names the outer class that wraps all the message types in our single .proto file. If java_outer_classname is not present, the compiler derives the outer class name from the .proto file name and appends OuterClass when that name clashes with a message in the file, so Library.proto would produce LibraryOuterClass.java.

The Book message has five fields: isbn, title and publish_date are scalar strings, while author and genre use the nested Author message and the Genre enum. Multi-word field names such as publish_date should be lowercase, with the words separated by _. repeated is a keyword in both proto2 and proto3 that marks a field holding zero or more values of the same type. In our case a Book can have several authors, so the author field is marked repeated. For such a field the compiler generates List-based accessors in Java, and it also generates a builder for Author.

The second message type, Library, contains a collection of our Book objects.
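In the binary format, a repeated message field is simply written once per element, each time with the same field number, and an enum is written as a varint holding its numeric value. The sketch below encodes only the author (field 4) and genre (field 5) parts of a Book by hand. It is an illustration under those assumptions and not the generated code; the class name, the method names and the author names "A" and "B" are made up.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class BookWireSketch {

    static final int WIRE_TYPE_VARINT = 0;
    static final int WIRE_TYPE_LENGTH_DELIMITED = 2;

    // Base-128 varint: 7 payload bits per byte, high bit set on all bytes but the last.
    static void writeVarint(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    static void writeKey(ByteArrayOutputStream out, int fieldNumber, int wireType) {
        writeVarint(out, (fieldNumber << 3) | wireType);
    }

    static void writeBytesField(ByteArrayOutputStream out, int fieldNumber, byte[] payload) {
        writeKey(out, fieldNumber, WIRE_TYPE_LENGTH_DELIMITED);
        writeVarint(out, payload.length);
        out.write(payload, 0, payload.length);
    }

    // Author { string name = 1; }
    static byte[] encodeAuthor(String name) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeBytesField(out, 1, name.getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    // Only "repeated Author author = 4" and "Genre genre = 5" of Book.
    static byte[] encodeAuthorsAndGenre(String[] authorNames, int genreNumber) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String name : authorNames) {
            writeBytesField(out, 4, encodeAuthor(name)); // one record per element
        }
        writeKey(out, 5, WIRE_TYPE_VARINT);
        writeVarint(out, genreNumber);
        return out.toByteArray();
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // THRILLER = 5 in the Genre enum above.
        byte[] encoded = encodeAuthorsAndGenre(new String[] {"A", "B"}, 5);
        System.out.println(toHex(encoded)); // 22 03 0a 01 41 22 03 0a 01 42 28 05
    }
}
```

The key byte 22 appears twice, once for each author, while the genre takes just two bytes: the key 28 and the value 05.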

4. Generating Code From a .proto File

Once we have defined our .proto file, we can generate code from it. Install the protobuf compiler (protoc) on your local machine, add it to your environment path, and then run the command below.

$> protoc -I=. --java_out=./src/main/java Library.proto

The protoc command generates the Java source files for Library.proto and reports any errors it finds in the .proto file. The -I option specifies the directory in which the compiler looks for .proto files, including imported ones. The --java_out option specifies the directory under which the generated Java files are written.

The generated code contains a class for each message with getter methods, plus a fluent builder with the matching setters, which is how we create instances of a message. It also provides methods to serialize an instance to the binary format and to parse it back.

5. Building an Instance of a Protobuf Message

Once the compiler has generated the classes, we can create an instance of the Book message.

LibraryModel.Book book = LibraryModel.Book.newBuilder()
                .setIsbn("1213-2322-3212-1231")
                .setTitle("The Next Imaginary Book of My Mind")
                .addAuthor(LibraryModel.Book.Author.newBuilder().setName("Ninad").build())
                .setGenre(LibraryModel.Book.Genre.THRILLER)
                .build();

assertEquals("1213-2322-3212-1231", book.getIsbn());
assertEquals("The Next Imaginary Book of My Mind", book.getTitle());
assertEquals(LibraryModel.Book.Genre.THRILLER, book.getGenre());
assertEquals("Ninad", book.getAuthor(0).getName());

We get a fluent builder from the newBuilder() method of the LibraryModel.Book class. The builder provides methods to set the field values, and the build() method creates the Book object. The good thing about the builder API is that the instances returned by build() are immutable.
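To see what this immutability means in practice, here is a tiny hand-written stand-in for the builder pattern. The Post and Builder classes below are hypothetical and are not what protoc generates (the real classes are much larger), but they follow the same idea: build() takes a snapshot, and changing a value means going through a new builder, which the generated messages expose as toBuilder().

```java
public class ImmutableBuilderSketch {

    // Hypothetical, hand-written stand-in for a generated message class.
    static final class Post {
        private final String id;
        private final String title;

        private Post(String id, String title) {
            this.id = id;
            this.title = title;
        }

        String getId() { return id; }
        String getTitle() { return title; }

        static Builder newBuilder() { return new Builder(); }

        // Copies this message's values into a fresh builder; the message is untouched.
        Builder toBuilder() { return new Builder().setId(id).setTitle(title); }
    }

    // Mutable companion that collects values until build() is called.
    static final class Builder {
        private String id = "";
        private String title = "";

        Builder setId(String id) { this.id = id; return this; }
        Builder setTitle(String title) { this.title = title; return this; }

        // Each call creates a new, independent Post.
        Post build() { return new Post(id, title); }
    }

    public static void main(String[] args) {
        Builder builder = Post.newBuilder().setId("1").setTitle("Draft");
        Post first = builder.build();

        builder.setTitle("Changed on the builder");                // does not affect 'first'
        Post second = first.toBuilder().setTitle("Final").build(); // modified copy

        System.out.println(first.getTitle());  // Draft
        System.out.println(second.getTitle()); // Final
    }
}
```

Because a built message never changes, it can be shared between threads or cached without defensive copies.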

6. Serializing and Deserializing a Protobuf Message

Once we have a Book object, we usually want to persist it to disk or send it across the network to some other system. For that we write it in a binary format that can easily be read back. The generated classes provide methods to write the object to any byte stream.

LibraryModel.Library library = LibraryModel.Library.newBuilder().addBooks(book).build();
// try-with-resources closes the stream even if writeTo() throws
try (FileOutputStream outputStream = new FileOutputStream("d:/data.dat")) {
    library.writeTo(outputStream);
}

We have added the book object that we have created to our library builder, and using the .writeTo() method we save the library object to a file.

We can read the binary encoded format to load the saved data. The utility method .parseFrom() takes the stream from which data can be loaded.

LibraryModel.Library parsedLibrary;
try (FileInputStream inputStream = new FileInputStream("d:/data.dat")) {
    parsedLibrary = LibraryModel.Library.parseFrom(inputStream);
}
LibraryModel.Book book1 = parsedLibrary.getBooks(0);

assertEquals("1213-2322-3212-1231", book1.getIsbn());
assertEquals("The Next Imaginary Book of My Mind", book1.getTitle());
assertEquals(LibraryModel.Book.Genre.THRILLER, book1.getGenre());
assertEquals("Ninad", book1.getAuthor(0).getName());
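Parsing is also where the backward compatibility mentioned in the overview comes from: a reader walks the bytes one field at a time and can step over any field number it does not know. The following hand-rolled sketch shows the idea for the simple Post message. It is not how the generated parseFrom() is implemented. The class name is made up, the extra field int32 likes = 3 is a hypothetical addition by a newer writer, and the sketch handles only 32-bit varints and length-delimited fields.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class SkipUnknownFieldsSketch {

    // Reads a base-128 varint (simplified: values that fit in 32 bits only).
    static int readVarint(ByteBuffer in) {
        int result = 0;
        int shift = 0;
        while (true) {
            int b = in.get() & 0xFF;
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    // Reads Post { string id = 1; string title = 2; } and skips every other field.
    // Returns a two-element array: {id, title}.
    static String[] parsePost(byte[] data) {
        ByteBuffer in = ByteBuffer.wrap(data);
        String id = "";
        String title = "";
        while (in.hasRemaining()) {
            int key = readVarint(in);
            int fieldNumber = key >>> 3;
            int wireType = key & 0x7;
            if (wireType == 0) {
                readVarint(in); // this reader knows no varint fields, so the value is skipped
            } else if (wireType == 2) {
                byte[] payload = new byte[readVarint(in)];
                in.get(payload);
                if (fieldNumber == 1) {
                    id = new String(payload, StandardCharsets.UTF_8);
                } else if (fieldNumber == 2) {
                    title = new String(payload, StandardCharsets.UTF_8);
                } // any other field number: payload is consumed and ignored
            } else {
                throw new IllegalArgumentException("wire type not handled in this sketch: " + wireType);
            }
        }
        return new String[] {id, title};
    }

    public static void main(String[] args) {
        // id = "42", title = "Hi", plus a hypothetical newer field 3 (varint) with value 7.
        byte[] fromNewerWriter = {0x0a, 0x02, 0x34, 0x32, 0x12, 0x02, 0x48, 0x69, 0x18, 0x07};
        String[] post = parsePost(fromNewerWriter);
        System.out.println(post[0] + " / " + post[1]); // 42 / Hi
    }
}
```

The old reader still gets id and title, even though the data was written with a newer version of the message.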

7. Conclusion

Thus, we have seen how to serialize and deserialize data to Google Protocol Buffer binary format.

We used a basic example to understand how to define a Protocol Buffers message and how to use the code generation tool to create the Java classes. Next, we saw how to serialize and deserialize data using the generated classes.

The Java implementation of the above examples can be found in the GitHub project. Read more about Google Protocol Buffers here.