Just like importing a CSV file to a variable in python, there are several packages available for Python to import JSON files. As usual, I prefer using the pandas package. Here is an illustration of how to import a JSON file to a variable in Python using pandas.
Note: This illustration is based on a Python 3.10.x virtual environment in Visual Studio Code on macOS.
To install pandas package follow my earlier article install a python package in a virtual environment in visual studio code in detail. Here is the pip installation code for pandas.
pip install pandas
Here is the syntax of the code to be used to import a JSON file to a variable.
# Import the pandas module. import pandas as ps # Use read_json in pandas to read the JSON file and assign it to a variable. df = ps.read_json('Path of the source json file')
Here read_json is a function in pandas, which reads the JSON files and returns it as a 2 dimensional array called DataFrame.
Import a JSON file to a variable
Here is a simple example on how to import a JSON file called costs-price.json from a folder called samples using the pandas.read_json function. Then assigning the retrieved DataFrame or the 2 dimensional array to a variable. Finally print the DataFrame to the terminal. While printing to the terminal. I’m using to_string() to avoid truncation of data.
import pandas as ps jsonData = ps.read_json('samples/costs-price.json') print(jsonData.to_string())
Segregating few columns from the imported data
In the previous example, after importing the data from a JSON file, I have just printed the data. If you want to work with just a few selected columns, then you can use the pandas.DataFrame() class. Let’s modify the above example to list down the three columns hlpi_name, year, income and expenditure.
import pandas as ps jsonData = ps.read_json('samples/costs-price.json') data_partial = ps.DataFrame( jsonData, columns=['hlpi_name', 'year', 'income', 'expenditure']) print(data_partial.to_string())
Filtering rows based on column value
Furthermore, if you want to filter the number of records based on values in a column, then we can use the pandas.DataFrame.loc property. In our example, let us display only the records which are for the year 2011.
import pandas as ps jsonData = ps.read_json('samples/costs-price.json') data_partial = ps.DataFrame( jsonData, columns=['hlpi_name', 'year', 'income', 'expenditure']) data_2011 = data_partial.loc[data_partial['year'] == 2011] print(data_2011.to_string())
- More about the pandas package at GitHub.
- More about pandas.DataFrame at Pandas Documentation.