Java Serialization in Depth

Preface: Java serialization can introduce both performance problems and security vulnerabilities when compared to alternatives.

While its built-in serialization algorithm gives you the ability to serialize any object whose class implements Serializable, it should be carefully weighed against alternatives like Protocol Buffers and Jackson JSON/XML serialization.

For information on how JSON serialization works, check out Java Object Mapper, What it is, how it works.


Java is an object-oriented language. You define classes like this:

public class User {
    private String name;
    private String email;

    public void setName(String name){
        this.name = name;
    }

    public String getName(){
        return this.name;
    }
}

and create objects like this:

User user = new User();
user.setName("Sam");
user.getName(); //Sam

While the Java runtime (JVM) understands this structure, other environments may not. What if you want to save a user to a database? What if you want to write that user to a file or transfer it over a network?

Serialization makes this possible. Serialization saves a Java object as a series of bytes that can be reconstructed back into the original object by another system. The process of writing an object to a series of bytes is serialization. The process of reading those bytes back into object form is deserialization.

This article takes an in-depth look at serialization, including what it is, how it works, and examples.

What is serialization?

Serialization is the process of converting Java objects to data formats other systems can interpret. When you serialize an object, you write it as a byte sequence that other systems can understand. For example, a database may not know what this means...

new User()

but it can certainly store a sequence of bytes like this...

0000000    edac    0500    7273    1500    6f63    2e6d    7865    6d61
0000010    6c70    2e65    6564    6f6d    552e    6573    6372    ca34
0000020    73ee    0345    029d    0200    004c    7008    7361    7773
0000030    726f    7464    1200    6a4c    7661    2f61    616c    676e
0000040    532f    7274    6e69    3b67    004c    7508    6573    6e72
0000050    6d61    7165    7e00    0100    7078    7070
000005c

Serialization makes this conversion between objects and byte streams possible.
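As a rough illustration of where bytes like these come from, the sketch below serializes an object into memory and prints the first few bytes in hex. The names HexDumpDemo and Person are made up for this example. Every stream written by ObjectOutputStream begins with the magic number AC ED followed by the version 00 05. The dump above shows the same bytes with each pair swapped (edac 0500), which is most likely an artifact of the tool used to print it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class HexDumpDemo {

    // Hypothetical class used only for this illustration.
    static class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String name;

        Person(String name) {
            this.name = name;
        }
    }

    // Serializes an object into an in-memory byte array.
    static byte[] toBytes(Object obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Formats the first `count` bytes as space-separated hex pairs.
    static String hexPrefix(byte[] bytes, int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < Math.min(count, bytes.length); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", bytes[i] & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = toBytes(new Person("Sam"));
        System.out.println(hexPrefix(bytes, 4)); // ac ed 00 05 (stream magic and version)
    }
}
```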

Serialization isn't specific to Java. Other languages like JavaScript, PHP, etc. have their own serialization mechanisms...

For example, this is how JavaScript deserializes JSON:

var obj = JSON.parse('{ "name":"Sam", "email":"sam@mail.com"}');

Why do we need serialization in Java?

Without serialization, you couldn't transfer data to other systems. For example, a web server couldn't send JSON to a web browser client. A Java entity couldn't be saved to a database without serialization.

Serialization makes the transfer of data between Java and the outside world possible. It is the link between the POJO classes you write and their representations in a file, database, or server response.

Serialization also makes it possible to save (persist) the state of an object. A serialized object can outlive the JVM that created it, for example in a file or a database, and be restored later.

Also remember that reading files or other input from outside sources into Java objects (Strings, class instances, etc.) is possible because of serialization. Specifically, deserialization makes it possible to consume data from other systems and turn it into objects Java can work with.

Serialization in Java: How it works

The java.io package includes classes for serializing objects. You can serialize objects using the ObjectOutputStream...

User user = new User();
// try-with-resources flushes and closes both streams, even if writeObject() throws
try (FileOutputStream fos = new FileOutputStream("temp.out");
     ObjectOutputStream oos = new ObjectOutputStream(fos)) {
    oos.writeObject(user);
}

The writeObject() method starts the serialization process:

1) Metadata associated with the object's class is written to the byte stream. This metadata includes a description of the object's class and any of its superclasses.

2) Data associated with the instance is written to the byte stream. This includes all non-static, non-transient fields.

3) Metadata associated with the object's members is written to the stream. This is the same as step 1 except applied to the object's members rather than the object itself.

4) Data associated with the object's members is written to the stream. This is the same as step 2 except applied to the object's members rather than the object itself.
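The class metadata mentioned in step 1 can be inspected without writing anything, using java.io.ObjectStreamClass. The sketch below is only an illustration, and Account and its fields are hypothetical. It lists the fields that default serialization would write, which leaves out the static and transient ones described in step 2.

```java
import java.io.ObjectStreamClass;
import java.io.ObjectStreamField;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class DescriptorDemo {

    // Hypothetical class used only for this illustration.
    static class Account implements Serializable {
        private static final long serialVersionUID = 1L;
        private static int instances;           // static: not part of instance state
        private String owner;                   // written to the stream
        private transient String sessionToken;  // transient: skipped
    }

    // Returns the names of the fields default serialization would write.
    static List<String> serializableFieldNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        ObjectStreamClass descriptor = ObjectStreamClass.lookup(type);
        if (descriptor == null) {
            return names; // lookup() returns null for non-serializable classes
        }
        for (ObjectStreamField field : descriptor.getFields()) {
            names.add(field.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(serializableFieldNames(Account.class)); // [owner]
    }
}
```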

Notice how the ObjectOutputStream wraps a FileOutputStream. If you aren't familiar with OutputStreams or basic Java I/O, be sure to check out FileReader vs BufferedReader vs Scanner.

You can deserialize objects using the ObjectInputStream...

// try-with-resources closes both streams when the block exits
try (FileInputStream fis = new FileInputStream("temp.out");
     ObjectInputStream oin = new ObjectInputStream(fis)) {
    User user = (User) oin.readObject();
} catch (IOException | ClassNotFoundException e) {
    //handle exception
}

Notice how the same file is deserialized back into an instance of User using the readObject() method. When Java deserializes data from an input source (such as a file), it reads the metadata written during serialization to understand what type of class to reconstruct.
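One detail worth seeing in code is that deserialization builds a new object with the same field values and does not hand back the original instance. Here is a minimal in-memory sketch of that round trip. It uses byte array streams instead of a file, and the Point class and roundTrip helper are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class RoundTripDemo {

    // Hypothetical class used only for this illustration.
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Serializes the object to memory, then deserializes it again.
    static Object roundTrip(Object original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        Point original = new Point(3, 4);
        Point copy = (Point) roundTrip(original);
        System.out.println(copy.x + "," + copy.y); // 3,4
        System.out.println(copy == original);      // false: a distinct object
    }
}
```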

The Java Serializable Interface

The examples above only work if the User class implements the java.io.Serializable interface...

public class User implements Serializable {

Otherwise you will get an exception like:

Exception in thread "main" java.io.NotSerializableException: com.example.demo.User

This is because the Serializable interface marks a class for serialization. The Serializable interface is just a marker. It doesn't actually specify any methods. It simply tells Java "hey this can be serialized".

Only objects that implement Serializable can be written to streams. This includes class members and their implementations. For example:

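public class User implements Serializable {
    private Address address;
}

If the Address class doesn't implement Serializable, then serializing a User whose address field is set fails with a NotSerializableException, unless the field is marked transient.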
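To make this concrete, here is a small sketch under assumed names (Address, Customer, SafeCustomer, and canSerialize are all hypothetical). It tries to serialize an object that holds a non-serializable member, then repeats the attempt with the member marked transient.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class MemberDemo {

    // Does NOT implement Serializable.
    static class Address {
        String city = "Paris";
    }

    // Serializable, but holds a non-serializable member.
    static class Customer implements Serializable {
        private static final long serialVersionUID = 1L;
        private Address address = new Address();
    }

    // Same shape, but the member is excluded from serialization.
    static class SafeCustomer implements Serializable {
        private static final long serialVersionUID = 1L;
        private transient Address address = new Address();
    }

    // Returns false if writing the object hits a non-serializable class.
    static boolean canSerialize(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new Customer()));     // false
        System.out.println(canSerialize(new SafeCustomer())); // true
    }
}
```

Marking the field transient avoids the exception, at the cost of the address being null after deserialization.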

SerialVersionUID

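Java associates a version ID with each Serializable class at runtime. During deserialization, this ID is used to verify that the sender and receiver of a serialized object have loaded compatible versions of the class. If the IDs don't match, deserialization fails with an InvalidClassException.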

While Java will automatically associate these IDs if they aren't defined, it is strongly recommended that these values are declared explicitly:

public class User implements Serializable {
    private static final long serialVersionUID = 7148561634028749725L;
    ...

Most IDEs make it easy enough to generate these values. You can also use the serialver utility to create these IDs from the command line.

The reason for declaring them explicitly is that the auto-generated values are highly sensitive to class details, and those details can vary between compiler implementations. Two builds of what looks like the same class can end up with different IDs and fail to deserialize each other's data.

Long story short, declare serialVersionUID on all classes you want to serialize.
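If you want to check which ID Java will actually use for a class, java.io.ObjectStreamClass can report it. The following is a small sketch with two made-up classes, one that declares the field and one that relies on the computed default.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class VersionIdDemo {

    // Hypothetical class with an explicit ID.
    static class Explicit implements Serializable {
        private static final long serialVersionUID = 42L;
        int value;
    }

    // Hypothetical class that leaves the ID to be computed by Java.
    static class Implicit implements Serializable {
        int value;
    }

    // Returns the serialVersionUID Java uses for the given serializable class.
    static long versionIdOf(Class<? extends Serializable> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(versionIdOf(Explicit.class)); // 42
        // Computed from the class's structure; the value depends on class details.
        System.out.println(versionIdOf(Implicit.class));
    }
}
```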

Problems with Serialization?

Serialization can have unintended consequences. For example, serialization allows unintended access to non-transient private members. This means if you declare a member private the serialization algorithm still writes the data out to whoever is reading...
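A quick way to see this is to look for a private field's value in the serialized bytes. The sketch below uses a hypothetical Credentials class and decodes the raw stream as text purely for inspection. The private value shows up in the output, while the transient one does not.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class PrivateLeakDemo {

    // Hypothetical class used only for this illustration.
    static class Credentials implements Serializable {
        private static final long serialVersionUID = 1L;
        private String secret = "hunter2";             // private, but still serialized
        private transient String hidden = "swordfish"; // transient: not serialized
    }

    // Serializes the object and decodes the raw bytes as text for inspection.
    static String serializedText(Object obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buffer.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String text = serializedText(new Credentials());
        System.out.println(text.contains("hunter2"));   // true
        System.out.println(text.contains("swordfish")); // false
    }
}
```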

Serialization also presents inconsistencies when serialVersionUID isn't explicitly defined. This is because serialized objects are tied to their classes, and classes tend to change over time. Depending on what has changed, a previously serialized object may fail to deserialize with an InvalidClassException.

While Java's serialization algorithm makes it possible to serialize any object whose class implements Serializable, alternatives like Protocol Buffers and Jackson's JSON serialization can be a better choice.

Java Serialization Example

package com.example.demo;

import java.io.Serializable;

public class User implements Serializable {
    private static final long serialVersionUID = 7148561634028749725L;
    private String username;
    private transient String password;


    public void setUserName(String username){
        this.username = username;
    }

    public String getUserName(){
        return this.username;
    }

    public void setPassword(String password){
        this.password = password;
    }

    public String getPassword(){
        return this.password;
    }

}

package com.example.demo;

import java.io.*;

public class DemoApplication {

	public static void main(String[] args) throws IOException {

		User user = new User();
		user.setUserName("Sam");
		user.setPassword("password123");

		// Serialize the user to a file; try-with-resources flushes and closes the streams.
		try (FileOutputStream fos = new FileOutputStream("temp.out");
		     ObjectOutputStream oos = new ObjectOutputStream(fos)) {
			oos.writeObject(user);
		}

		// Deserialize the same file back into a User instance.
		try (FileInputStream fis = new FileInputStream("temp.out");
		     ObjectInputStream oin = new ObjectInputStream(fis)) {
			User userFromFile = (User) oin.readObject();
			System.out.println(userFromFile.getUserName()); //Sam
			System.out.println(userFromFile.getPassword()); //null because transient field isn't serialized.
		} catch (ClassNotFoundException e) {
			//handle exception
		}
	}

}

Notice how the User class implements the Serializable interface. Also notice how a serialVersionUID is explicitly defined for a serializable class.

Notice the use of transient. This keyword excludes the password from serialization. Notice how userFromFile.getPassword() returns null because of this.

See how ObjectOutputStream and ObjectInputStream are used to perform serialization/deserialization on streams of data (in this case a file).

Conclusion

Serialization is the process of writing objects to byte streams that other systems can understand. It makes it possible to transfer data to other systems such as web clients and datastores, and to save an object's state so it can be restored later.

Serialization isn't specific to Java. It's a more universal process used to transfer and reconstruct data structures in a platform-agnostic way. Serialization is important to Java because it translates Java POJOs to entities other systems can understand.
