Python: flattening a JSON

In this tutorial we will learn how to flatten a JSON object with Python and the flatten_json module.

Introduction

You can install the flatten_json module with pip (the Python package installer) by running the following command:

pip install flatten_json

Note: if importing the flatten_json module fails because the six package is not installed, install it with the command shown below:

pip install six

The flattening procedure is useful when we have a complex JSON object and we want to obtain a new object that is only one level deep, regardless of how deeply nested the original object was.

In the flattened object, the property names will correspond to the full path of each property in the original object. This means that the information about the original object is still preserved and the process can be reversed.

The reverse of the flatten procedure is called unflatten, and it is also supported by the module we are using. Nonetheless, in this tutorial we are only going to cover the flatten part.

Looking into a more concrete example of what to expect, let’s assume that we have the following JSON object:

{
   "a": {
       "b": "nestedVal1",
       "c": "nestedVal2"
   }
}

If we apply the flattening procedure to this JSON, we obtain the following:

{
  "a.b": "nestedVal1",
  "a.c": "nestedVal2"
}

So, we now have an object that is one level deep, but we can still infer the original structure of the JSON. Note that “.” was used here as the separator character in the path of each property. The module we are using allows us to set a custom separator, as described in its documentation.
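To illustrate why the path-style property names are enough to recover the original structure, here is a minimal hand-written sketch of the reverse operation. The function name unflatten_sketch is hypothetical, it only handles nested objects (not arrays), and it is not the module's own unflatten implementation.

```python
def unflatten_sketch(flat, separator="."):
    """Rebuild a nested dict from path-style keys (objects only, no arrays)."""
    nested = {}
    for path, value in flat.items():
        # Split "a.b" into the parent keys ["a"] and the final key "b".
        *parents, leaf = path.split(separator)
        node = nested
        for part in parents:
            # Walk down the tree, creating intermediate dicts as needed.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


flat = {"a.b": "nestedVal1", "a.c": "nestedVal2"}
restored = unflatten_sketch(flat)
print(restored)
```

Running it on the flattened example above gives back a dictionary with a single property a that holds b and c.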

Another important thing to mention is the format of arrays when they are flattened. Let’s assume that we have the following original object:

{
    "arr": [10, 20, 30]
}

If we flatten it, we end up with the following:

{
  "arr.0": 10,
  "arr.1": 20,
  "arr.2": 30
}

In this case, each element of the array is mapped to a different property in the flattened JSON, and the index of each element is included in the path.
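To make these mapping rules concrete, the following is a minimal hand-written sketch that produces the same dotted-path format for nested objects and arrays. The name flatten_sketch is hypothetical and this is not how flatten_json is implemented; for instance, this sketch simply drops empty objects and arrays.

```python
def flatten_sketch(value, separator=".", prefix=""):
    """Flatten nested dicts and lists into a one-level dict keyed by path."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        # Array elements are keyed by their index, as in "arr.0".
        items = enumerate(value)
    else:
        # A leaf value is stored under the path accumulated so far.
        return {prefix: value}

    flat = {}
    for key, child in items:
        path = f"{prefix}{separator}{key}" if prefix else str(key)
        flat.update(flatten_sketch(child, separator, path))
    return flat


nested = {"a": {"b": "nestedVal1", "c": "nestedVal2"}, "arr": [10, 20, 30]}
result = flatten_sketch(nested)
print(result)
```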

This tutorial was tested with version 3.7.2 of Python. A similar tutorial is also available for JavaScript.

The code

We will start by importing the flatten_json module we have just installed, so later we can use the flatten function on our JSON.

import flatten_json

We are also going to import the json module, which allows us to serialize a Python dictionary into a JSON string.

In our program, we are going to represent our original JSON structure as a Python dictionary, and the result of flattening it will also be a dictionary. So, at the end, we will convert it to a pretty printed JSON string, which will allow us to better analyze the result.

import json
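In practice, JSON often arrives as text rather than as a Python literal. As a small sketch using only the standard library, json.loads parses such text into the kind of dictionary we work with in this tutorial. The raw string below is a made-up sample.

```python
import json

# Hypothetical JSON text, e.g. read from a file or received from an API.
raw = '{"a": {"b": "nestedVal1", "c": "nestedVal2"}}'

# json.loads turns JSON text into Python objects (here, nested dicts).
data = json.loads(raw)

print(type(data).__name__)
print(data["a"]["b"])
```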

After this we are going to define a dictionary that could hypothetically represent a person entity. This will represent the JSON we want to flatten.

Our person will contain some use cases for us to analyze how the flatten_json module behaves:

  • name and age will be two properties without any nesting. name is a string and age a number;
  • addr will be a property that corresponds to an object with two nested properties: city and postCode. The values of these two are strings, which means addr nesting is only 1 level deep;
  • family is an array of strings, so we can analyze how arrays are flattened;
  • nestedObject is another property that has nesting 4 levels deep, so we can check a more complex use case.

person = {
   "name": "Terry",
   "age": 22,
   "addr": {
       "city": "Lisbon",
       "postCode": "7557-40"
   },
   "family": ["John", "Steve", "May"],
   "nestedObject": { "a": { "b": { "c": {"d": "nested property" } } } }
}

Now, to perform the flattening of our dictionary, we simply call the flatten function of the flatten_json module we have imported before.

As first input we pass our dictionary and, as an optional second input, we specify a string that will be used as the separator in the path of each property.

We will be using “.” as our separator. By default, if nothing is specified, the library will use the “_” character. Note that we can pass an arbitrary string; it doesn’t need to be a single character.

As output, this function call will return a new dictionary which will correspond to the flattened JSON.

flat = flatten_json.flatten(person, ".")
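The choice of separator matters when property names can themselves contain it. The sketch below uses a hypothetical join_path helper (plain string joining, not the module's internals) to show how two different nestings end up with the same flattened name when “_” is both part of a key and the separator.

```python
def join_path(parts, separator):
    """Join the keys along a path into one flattened property name."""
    return separator.join(parts)


# "post_code" nested under "addr" versus "code" nested under "addr_post":
nested_key = join_path(["addr", "post_code"], "_")
other_key = join_path(["addr_post", "code"], "_")
print(nested_key == other_key)

# With "." as separator, the original structure stays readable.
dotted = join_path(["addr", "post_code"], ".")
print(dotted)
```

Picking a separator that does not appear in any property name avoids this ambiguity.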

As mentioned before, so far we have been working with dictionaries to represent our original JSON and its flattened version. In order to convert it to a string, we simply need to call the dumps function from the json module.

As first input of the dumps function we will pass our flat dictionary. Additionally, we will set the optional argument indent to the value 2, so we obtain a pretty printed version of our JSON, with each nesting level indented by that number of spaces.

We will print the result directly to the console.

print(json.dumps(flat, indent=2))
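To see the effect of the indent argument in isolation, here is a short sketch that serializes a small, made-up flat dictionary with and without it, using only the standard json module.

```python
import json

# Hypothetical flat dictionary, standing in for the result of flattening.
sample = {"addr.city": "Lisbon", "family.0": "John"}

compact = json.dumps(sample)           # everything on a single line
pretty = json.dumps(sample, indent=2)  # one property per line, 2-space indent

print(compact)
print(pretty)
```

Both strings describe the same data; only the whitespace differs.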

The final code can be seen below.

import flatten_json
import json

person = {
   "name": "Terry",
   "age": 22,
   "addr": {
       "city": "Lisbon",
       "postCode": "7557-40"
   },
   "family": ["John", "Steve", "May"],
   "nestedObject": { "a": { "b": { "c": {"d": "nested property" } } } }
}

flat = flatten_json.flatten(person, ".")

print(json.dumps(flat, indent=2))

Testing the code

To test the code, simply run it using a tool of your choice. In my case I’m using IDLE.

You should get an output similar to Figure 1. As can be seen, we have obtained a flattened version of our original JSON, following the rules described in the introductory section.

Figure 1 – Output of the program, showing the flattened JSON.
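For readers who cannot see the figure, the sketch below writes out by hand the dictionary that the rules from the introduction give for our person object, and checks that it is only one level deep. It is a hand-written expectation, not captured program output, so compare it against what your own run prints.

```python
import json

# Hand-written expectation, assuming "." as separator and index-based
# paths for array elements, as described in the introduction.
expected = {
    "name": "Terry",
    "age": 22,
    "addr.city": "Lisbon",
    "addr.postCode": "7557-40",
    "family.0": "John",
    "family.1": "Steve",
    "family.2": "May",
    "nestedObject.a.b.c.d": "nested property",
}

# A flattened object has no nested dicts or lists among its values.
is_flat = all(not isinstance(v, (dict, list)) for v in expected.values())
print(is_flat)
print(json.dumps(expected, indent=2))
```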
