Data Visualization in Python

Here is a topic that any respectable researcher, data scientist, or really anyone who works with numbers should be proficient in: data visualization. I am none of the above, and I don’t claim expert knowledge of data visualization, but I have had enough experience with charts and graphs in Python to speak about the subject.

I have learnt and experimented with data science and machine learning techniques in Python before, and data visualization has come up almost every single time. Being able to see your data matters: it shows you the structure and spread of the data at a glance and helps you spot patterns. Unfortunately, every time I need to fire up a bar plot, I hit the same wall: I’ve forgotten how. Plotting is one of the easiest things to learn, yet it is surprisingly difficult to master.

Python has no built-in data visualization tools, so you’ll have to rely on libraries to create those graphs. The right library changes depending on the job, and each one has its own philosophy and syntax to remember. That underlying philosophy can be difficult to grasp, but it is essential to using the library well: memorizing commands without understanding what objects you’re creating and how to work with their attributes will only get you so far.

This article is therefore about the foundations of two major data visualization tools in Python: pandas and matplotlib, perhaps the two extremes of Python data visualization. It is still important to get to know the other tools available and when to use them, and after learning a bit more about pandas and matplotlib, other data visualization libraries should become easier to understand and use.

Pandas

If you have ever worked with Python for any kind of data before, you’ve probably already used pandas. Along with numpy, pandas is usually the first library imported for any data-related project. Since you’ll probably be using it anyway, it is pleasant to find out that pandas has a built-in data visualization tool as well.

The plotting interface is deliberately simplified and doesn’t offer the fine-grained control of other libraries. That is exactly what makes it useful when all you want is a quick plot, without customizing every aspect of the graph for presentation.

We start by importing pandas, and of course, we’ll need numpy as well:

import numpy as np

import pandas as pd

We’ll use a sample dataset for demonstration purposes. The Indian Cities AQI resource (www.kaggle.com/datasets/rajanbhateja/indian-cities-aqi-2020-2024) will be used to create some sample datasets:

An indiaAPI dataset, which records the mean concentration of different pollutants in various Indian cities.

A dataset for each city recording the pollutant data each day over a 5-year period (bengaluru shown).

These datasets will be used to evaluate different plots with different libraries. To be honest, the scale of each variable makes it hard to graph this data and preserve much meaning (this is especially important because carbon monoxide must be scaled differently than all the other variables). The graphs produced in this article will probably not fly in any professional setting: the data must be adequately dealt with first. However, since I won’t be talking about data preprocessing in this post, the data will serve adequately for demonstration purposes.

Now, the purpose of pandas visualization is to provide a quick and easy data visualization solution. The syntax is, therefore, simple to understand.

Once you have a dataframe ready for plotting, you just need to call dataframe.plot(). Yes, it’s that simple. By default, it gives you a line graph, so to specify another type of graph, for example a bar graph, you can write dataframe.plot(kind='bar'), or just dataframe.plot.bar(). You can also specify attributes like the x and y labels and the plot title. There are other attributes too, but the truth is, if you’re going to take the time to mess around with those, you probably don’t want to be relying on pandas for visualization in the first place.
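To make those calls concrete, here’s a rough sketch on a tiny made-up table. The sample dataframe, its city names, pollutant columns and values are all placeholders I’ve invented for illustration, not the real indiaAPI data; the title, xlabel and ylabel keywords assume a reasonably recent pandas version.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this also runs without a display; not needed in Colab
import pandas as pd

# Hypothetical stand-in for the indiaAPI table: one row per city, one column per pollutant.
sample = pd.DataFrame(
    {"PM2.5": [48.0, 95.0, 41.0], "NO2": [22.0, 39.0, 18.0], "O3": [30.0, 27.0, 33.0]},
    index=["Bengaluru", "Delhi", "Chennai"],
)

# Default call: a line graph with one line per column.
line_ax = sample.plot()

# Two equivalent ways of asking for a bar graph instead.
bar_ax = sample.plot(
    kind="bar",
    title="Mean pollutant concentration",
    xlabel="City",
    ylabel="Concentration",
)
same_bar_ax = sample.plot.bar(title="Mean pollutant concentration")
```

Each call returns a matplotlib Axes object, which is handy if you later decide you do want to tweak the plot after all.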

Here’s the bar plot for the indiaAPI dataset, without any other attributes specified. Notice how pandas worked out on its own how to structure the data into 7 bars per city.

indiaAPI.plot.bar()

[Figure: bar plot of the indiaAPI dataset produced by pandas]

Here’s a line graph for the Bengaluru dataset, again without any attributes specified. Notice how pandas handles the data automatically and doesn’t throw an error even though the first two columns are not numeric: it simply plots the numerical columns.

bengaluru.plot()

[Figure: line graph of the Bengaluru dataset produced by pandas]
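If you’d like to check this behaviour yourself, here’s a small sketch. The city dataframe below is a made-up stand-in with two text columns followed by two pollutant columns; I’m assuming the real per-city files look roughly like this when read in without any date parsing.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this also runs without a display; not needed in Colab
import pandas as pd

# Hypothetical stand-in for a per-city table: two text columns, then pollutant readings.
city = pd.DataFrame(
    {
        "City": ["Bengaluru"] * 4,
        "Date": ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"],
        "PM2.5": [45.0, 52.0, 49.0, 61.0],
        "NO2": [20.0, 24.0, 19.0, 27.0],
    }
)

# No arguments: pandas draws one line per numeric column and skips the text ones.
ax = city.plot()

# Collect the labels of the lines that actually got drawn.
plotted = [line.get_label() for line in ax.get_lines()]
print(plotted)
```

Only the pollutant columns should show up in the printed list; the text columns never reach the plot.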

I find it very useful to be able to quickly visualize my data with easy-to-remember lines of code. However, if we’re creating a graph for presentation purposes, we’re going to have to be able to customize the plot further. Hence, we’ll have to resort to other libraries.

Matplotlib

Now, there’s nothing wrong with using matplotlib all the time; it’s just a little unintuitive to learn. Whenever I looked at matplotlib code, the lines all went over my head: initializing figures, subplots, all of it confusing to look at. Once you understand it, though, it is powerful, being arguably the most customizable and versatile plotting library in Python. Why, you can even animate plots using it!

The first thing to note is that we won’t be using the whole matplotlib library. For most common plots and use cases, you’ll only be working with matplotlib’s pyplot submodule. You’ll be importing it as shown:

import matplotlib.pyplot as plt

Then, you can plot your data:

plt.plot(indiaAPI)

You’ll also have to add a plt.show() if you’re running this as a regular script from an IDE, but since I’m using Google Colab for this article, which displays plots automatically, we’ll omit that. Although not the best practice, this is all we need to plot in matplotlib.

To be able to use matplotlib to its full potential, however, you’ll need to learn how to work with figs and axes.

A figure is the entire canvas on which everything is drawn. An Axes object (usually named ax) represents a single plot (or subplot) within a figure. You can have multiple figures, each holding multiple axes.

When you call plt.plot(), matplotlib automatically creates a new figure if there isn’t one already. However, to have control over what gets plotted where, you must specify the creation of new figures. For example,

plt.plot([1, 2, 3], [4, 5, 6])

plt.plot([1, 2, 3], [6, 5, 4])

will plot two lines in the same plot, intersecting in an x-shape. However, 

plt.plot([1, 2, 3], [4, 5, 6])

plt.figure()

plt.plot([1, 2, 3], [6, 5, 4])

creates two separate graphs. Our control here, however, is still quite limited. That’s why we use the plt.subplots() function, even if we only want a single plot.

fig, ax = plt.subplots()

or even

fig, axs = plt.subplots(2, 3)

to create multiple axes on which to plot. Each ax is its own separate graph, so you can plot different data on each one side by side. Each fig holds its own set of axes and can be opened and closed. You can title each ax and even the whole fig, and this system of figures and axes, once understood, is quite useful when customizing your plots.
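Here’s a minimal sketch of that system in action. The data and titles are placeholders of my own; the point is only to show where the fig and each ax come into play.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this also runs without a display; not needed in Colab
import matplotlib.pyplot as plt

# One figure holding a 2 x 3 grid of axes; axs is a NumPy array of Axes objects.
fig, axs = plt.subplots(2, 3, figsize=(9, 5))

xs = [1, 2, 3]

# Plot on the top-left and bottom-right axes only; the other four stay empty.
axs[0, 0].plot(xs, [4, 5, 6])
axs[0, 0].set_title("Rising")
axs[1, 2].plot(xs, [6, 5, 4])
axs[1, 2].set_title("Falling")

# A title for the whole figure, as opposed to a single ax.
fig.suptitle("Two of six axes in one figure")
fig.tight_layout()

# Close the figure once you are done with it to free its resources.
plt.close(fig)
```

Indexing axs by row and column is what gives you control over what gets plotted where, which plain plt.plot() calls can’t offer.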

This is as much detail as you need to start using the matplotlib library on your own – from where you can learn new commands and methods as well as attributes for a highly customizable and powerful interface.

Now, on actually going to plot my data, I hit a brick wall. Pandas automatically structures and manages the data based on what type of plot we want. For example, pandas made a 7-bar plot for the indiaAPI dataset, and for the city datasets it used only the numerical data for plotting. In matplotlib, however, these burdens are placed on us instead. This can be useful for more complex data (if the data is left for us to manage, ambiguous situations won’t cause any issues), but it sure is tedious to format the data the way it needs to be.
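As a rough illustration of that extra burden, here’s one way the line graph could be rebuilt by hand. The city dataframe is again a made-up stand-in with invented values, and selecting columns with select_dtypes is just one possible approach, not the only one.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this also runs without a display; not needed in Colab
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for a per-city table: two text columns, then pollutant readings.
city = pd.DataFrame(
    {
        "City": ["Bengaluru"] * 4,
        "Date": ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"],
        "PM2.5": [45.0, 52.0, 49.0, 61.0],
        "NO2": [20.0, 24.0, 19.0, 27.0],
    }
)

# pandas skipped the text columns for us; here we have to pick the numeric ones ourselves.
numeric = city.select_dtypes(include="number")

fig, ax = plt.subplots()
for column in numeric.columns:
    # One plot call per column, each with its own legend label.
    ax.plot(numeric.index, numeric[column], label=column)

ax.set_xlabel("Row number")
ax.set_ylabel("Concentration")
ax.legend()
```

It’s more typing than a bare plot() call on the dataframe, but every decision about what gets drawn is now explicit.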

Making a grouped barplot, for instance, is a much more manual process in matplotlib, as you must specify the location of each bar, essentially grouping the bars by hand. Using some code I’ve taken inspiration from, below is the indiaAPI dataset in matplotlib, as well as the code required to create it:

[Screenshot: the matplotlib code for the grouped bar plot]

No, this is not a joke. This is what needs to be done to plot the data using matplotlib, as compared to the mere two lines for pandas. However, while it is true that pandas was way easier to use, I wouldn’t dare rely on it for any professional setting. Is it worth the complexity to make it look better in most scenarios though? That’s up to you to decide.
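For anyone who wants something they can copy and run, here’s a minimal sketch of the manual grouping described above. To be clear, this is not the code from the screenshot: the city names, pollutant names and values are invented placeholders, and the offset arithmetic is just one common way of positioning the bars.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this also runs without a display; not needed in Colab
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: one list of values per pollutant, in the same order as `cities`.
cities = ["Bengaluru", "Delhi", "Chennai"]
pollutants = {
    "PM2.5": [48.0, 95.0, 41.0],
    "NO2": [22.0, 39.0, 18.0],
    "O3": [30.0, 27.0, 33.0],
}

x = np.arange(len(cities))      # one group centre per city
width = 0.8 / len(pollutants)   # the bars of a group share 80% of each slot

fig, ax = plt.subplots()
for i, (name, values) in enumerate(pollutants.items()):
    # Shift each pollutant's bars so that the whole group stays centred on x.
    offset = (i - (len(pollutants) - 1) / 2) * width
    ax.bar(x + offset, values, width, label=name)

# Put one tick per group and label it with the city name.
ax.set_xticks(x)
ax.set_xticklabels(cities)
ax.set_ylabel("Concentration")
ax.set_title("Mean pollutant concentration by city")
ax.legend()
```

The grouping that pandas did silently is the offset calculation inside the loop: every bar’s position has to be worked out by hand.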

[Figures: the resulting matplotlib plots]

Looks like this article has come to an end. While I don’t intend to teach anyone anything with this post, I really hope that it will be the spark to encourage you to learn about data visualization in depth. Knowing when and where to use different libraries (and being strong enough to not shy away from the more complex ones) is an invaluable skill when it comes to working with any kind of data. I really hope you’ll take something away from this introduction.

