Intro to Voilà#

In this section, we will learn how to use the Voilà Python library to create a dashboard from our SQL queries and visualizations. Additionally, we will explore how the ETL (Extract, Transform, Load) - EDA (Exploratory Data Analysis) pipeline, introduced in the previous module, are integrated with the dashboard, examine the dashboard’s structure and deployment, and discuss interesting insights gathered from it.

What is Voilà?#

Voilà is a Python library that allows users to effortlessly create standalone web applications from Jupyter notebooks. Voilà takes the output of your notebook, while hiding code cells by default, and renders it in a web browser, so that you can share your work or use it in a production setting. With the help of ipywidgets, we can transform the rendered web application into an interactive dashboard and this module does covers it in detail!

Moreover, Voilà offers several ways to customize your dashboard, including changing themes, creating templates, and controlling cell output. This allows you to create a visually aesthetic dashboard from a Jupyter notebook with minimal additional code! Markdown cells are also displayed in the dashboard, allowing you to add text and images to your dashboard.

Install Voilà with the following command:

!pip install voila

Important

This section assumes comfort working with ipywidgets. To get an introduction to working with ipywidgets, please review the introductory section to ipywidgets, the section on query parameterization as well as the section on interactive queries with JupySQL

Questions to Answer in the Dashboard#

The interactive dashboard contains 4 tables and 5 plots, created using JupySQL’s ggplot API, seaborn, and ipywidgets. It answers the following questions:

  1. How do yearly manufacturing trends of fuel-only, electric, and hybrid cars compare?

  2. How are fuel consumption and \(CO2\) emissions distributed for all types of cars?

  3. What is the relationship between charging time and travel range for electric vehicles by car size and model year?

  4. How are \(CO_2\) emissions distributed by vehicle type (fuel-only, electric, and hybrid) and fuel type (gasoline, diesel, ethanol, natural gas, and electricty)?

  5. Which US fuel-only and hybrid car manufacturers emit the least \(CO_2\) and how does this differ by transmission type?

Directory Structure#

For this blog, we will assume the following directory structure:

├── environment.yml
├── pipeline
│   ├── pipeline.yaml
│   ├── data
│      ├── database
│         ├── car_data.duckdb
│   ├── products
│   ├── src
│      ├── menu.py
│      ├── dashboard.py
│      ├── voila-app.ipynb
│      ├── datadownload.py
└── README.md

The voila-app.ipynb file in the sql/pipeline/src directory serves as our dashboard notebook. Additionally, all necessary files and modules, including the menu.py and dashboard.py files, are in the same directory as the notebook.

Dependencies#

The environment.yml file in the root directory of this course contains the dependencies for the project. You can create a conda environment using that file and then activate the environment to run voila-app.ipynb, as shown below:

conda env create -f environment.yml
conda activate sql-course

Running the Pipeline#

If you have not already done so, you need to build, package and run the pipeline by following the steps outlined in the previous module.

After running the pipeline, you will have the car_data.duckdb database, under pipeline/data/database, ready to be queried by the dashboard.

To execute the pipeline, we run the following command in the terminal from:

cd pipeline && ploomber build

This yields:

Loading pipeline...
Executing: 100%|███████████████████████████████████████████████████████████████████████████████| 18/18 [00:10<00:00,  1.69cell/s]
Executing:  29%|██████████████████████▊                                                         | 8/28 [00:10<00:25,  1.28s/cell]
Building task 'eda-pipeline': 100%|████████████████████████████████████████████████████████████████| 2/2 [00:20<00:00, 10.48s/it]
name          Ran?      Elapsed (s)    Percentage
------------  ------  -------------  ------------
datadownload  True          10.7184        51.168
eda-pipeline  True          10.2291        48.832

ETL and Voilà#

The Ploomber pipeline will store the tables into a database file under /pipeline/data/database called car_data.duckdb. We can connect our dashboard to the DuckDB instance and generate queries for our visualizations.

Note: The ../ prefix is not required if the database is in the same directory as the notebook.

The pipeline process entailed above can be better understood with the following diagram:

ETL Pipeline

Dashboard Structure#

You can find the Jupyter notebook with the Voilà app here. The dashboard also uses custom helper scripts:

Introduction and Tables#

The dashboard, firstly, needs to have a relevant title and description for the user to understand what the dashboard is about. The date the fuel emissions data was last updated is also displayed because the data is updated monthly and, accordingly, the tables and visualizations may have novel insights since the last update. Next, we display the interactive table, integrated with ipywidgets and outputted from our ETL Pipeline, to allow the user interact with numerical and categorical columns in each table (fuel-only, hybrid, and electric). The user can, hence, begin the EDA process by filtering columns to find interesting patterns and relationships.

Specifically, SelectMultiple, Dropdown, and Combobox widgets are employed to filter categorical columns, including the car’s fuel type, size, model, and model year. Note that Combobox, which is a String widget, was not introduced earlier in the course and we recommend taking a look at its documentation. The \(CO_2\) ratings column (a higher rating suggesting lower \(CO_2\) emissions) can also be filtered with the IntSlider widget. The aforementioned widgets and table are shown below:

The code below is used to create the widgets and table. Press ‘Show code source’ to view the code.

Hide code cell source
import os
import sys

os.chdir("../../")

sys.path.append(os.path.join(os.getcwd(), "pipeline", "src"))

import ipywidgets as widgets
from ipywidgets import interactive_output, interact
from menu import init_widgets, style, setup_menu, select_table, clean_electric_range
from dashboard import (
    Seaborn_Barplot,
    Boxplot_ggplot,
    Seaborn_Scatter,
    Histogram_ggplot,
    Seaborn_Boxplot,
)
from IPython.display import display, clear_output
import pandas as pd
import numpy as np
from itables import init_notebook_mode, show

# flake8: noqa
init_notebook_mode(all_interactive=True)


def select_table_electric(vehicle_type, year, vehicle_class, make, co2):
    """
    Select table based on vehicle type

    Parameters
    ----------
        vehicle_type : str
            Vehicle type (fuel-only, hybrid, or electric)
        year : int
            Model year
        vehicle_class : list
            Vehicle class (compact, midsize, etc.)
        make : str
            Car manufacturer
        co2 : int
            CO2 rating (higher rating suggests lower CO2 emissions)
    """
    print("Performing query")
    query = select_table(vehicle_type, year, vehicle_class, make, co2)

    # Use JupySQL magic %sql to execute the query
    result = %sql {{query}};

    # Convert the result to a Pandas DataFrame
    df = result.DataFrame()

    clear_output(wait=True)

    show(df, classes="display nowrap compact")