Make your queries interactive#

In this section, we will combine the learnings from the two previous sections, Intro to ipywidgets and Parameterizing SQL Queries, to create interactive SQL queries using JupySQL. This technique is useful for conducting exploratory data analysis with SQL, as it allows us to filter our data and visualize the tabular results interactively.

The installation of ipywidgets was covered previously here.

Set up and data access#

Important

Note: The –save and %sqlcmd features used require the latest JupySQL version. Ensure you run the code below.

This code installs JupySQL, and DuckDB in your environment. We will be using these moving forward.

%pip install jupysql jupysql-plugin --quiet
Note: you may need to restart the kernel to use updated packages.

We continue to work with the Bank and Marketing data set.

Important

Source: UCI Machine Learning Repository

URL: https://archive-beta.ics.uci.edu/dataset/222/bank+marketing

Data Citation

Moro,S., Rita,P., and Cortez,P.. (2012). Bank Marketing. UCI Machine Learning Repository. https://doi.org/10.24432/C5K306.

We can use the following function to extract the downloaded data from the UCI repository.

import sys

sys.path.insert(0, "../../")
import banking  # noqa: E402

_ = banking.BankingData("https://tinyurl.com/jb-bank", "bank")
_.extract_to_csv()

Initialize a DuckDB Instance

# Loading in SQL extension
%reload_ext sql
# Initiating a DuckDB database named 'bank.duck.db' to run our SQL queries on
%sql duckdb:///bank.duck.db
Found pyproject.toml from '/home/docs/checkouts/readthedocs.org/user_builds/ploomber-sql/checkouts/latest'
Settings changed:
Config value
displaycon False
feedback True
autopandas False
named_parameters True

Load the data

%%sql
CREATE OR REPLACE TABLE bank AS
FROM read_csv_auto('bank_cleaned.csv', header=True, sep=',')
Count

We confirm the table was loaded

%sqlcmd tables
Name
bank

We can use JupySQL’s Table Explorer to take a quick look at the table.

%sqlcmd explore --table bank
%sqlcmd columns -t bank
name type nullable default autoincrement comment
age BIGINT True None False None
job VARCHAR True None False None
marital VARCHAR True None False None
education VARCHAR True None False None
default VARCHAR True None False None
balance BIGINT True None False None
housing VARCHAR True None False None
loan VARCHAR True None False None
contact VARCHAR True None False None
day BIGINT True None False None
month VARCHAR True None False None
duration BIGINT True None False None
campaign BIGINT True None False None
pdays BIGINT True None False None
previous BIGINT True None False None
poutcome VARCHAR True None False None
y VARCHAR True None False None

%sql --interact {{widget_variable}}#

In the Intro to ipywidgets section of this module, we learned how to apply python functions and decorators to create ipywidgets. In this section, we apply JupySQL’s --interact argument, which allows us to create interactive SQL queries using ipywidgets. We also learned in the previous section about variable expansion using placeholders for {{variable}}. We will combine these two concepts to create interactive SQL queries!

First, you need to define the {{variable}} as the form of a basic data type or ipywidgets Widget. Then, pass the variable name into the --interact argument and use a WHERE clause to filter using the specified widgets variables. Below, we will delve into the different types of widgets and how to use them with JupySQL.

Basic Data Types#

The simplest way is to declare a variable with basic data types (Numeric, Text, Boolean…). ipywidgets autogenerates UI controls for that variable.

An example with a numeric variable duration, which creates a slider UI for its values by default, is as follows:

duration_min = 1500
%sql --interact duration_min SELECT age, loan, duration FROM bank WHERE duration > {{duration_min}} LIMIT 5
Interactive mode, please interact with below widget(s) to control the variable

Note above that we are filtering records by the variable duration if it is greater than the value of the slider widget duration_min.

An example with a categorical variable loan, which creates a text box for the user to type in values as a UI, is as follows:

loan = "yes"  # Try inputting "no" in the output's text box
%sql --interact loan SELECT age, loan, duration FROM bank WHERE loan == '{{loan}}' LIMIT 5
Interactive mode, please interact with below widget(s) to control the variable

Numeric Widgets#

IntSlider and FloatSlider#

These widgets were introduced in this section of the module. Here, we will see how to use them with JupySQL.

An example for the IntSlider is as follows:

duration_lower_bound = widgets.IntSlider(min=5, max=3000, step=500, value=1500)

%sql --interact duration_lower_bound SELECT age, loan, duration FROM bank WHERE duration <= {{duration_lower_bound}} LIMIT 5
Interactive mode, please interact with below widget(s) to control the variable

Note: Other Numeric Widgets can be found here.

Selection Widgets#

RadioButtons#

These widgets were also introduced in the previous section of this module.

An example of its usage with --interact is as follows:

outcome_selection = widgets.RadioButtons(
    options=["failure", "other", "success", "unknown"],
    description="Outcome",
    disabled=False,
)
%%sql --interact outcome_selection
SELECT age, loan, poutcome FROM bank
WHERE poutcome == '{{outcome_selection}}'
LIMIT 5;
Interactive mode, please interact with below widget(s) to control the variable

Important

When using selection widgets that only allow selecting one value, use == '{{ widget_variable }}' for the query to run without errors. On the other hand, if using multiple selection widgets, use IN {{ widget_variable }}.

Note: Other Selection Widgets can be found here.

Complete Example#

Now, we can demonstrate a way to combine multiple ipywidgets to make a more complex interactive SQL query, inclusive of different data types and widgets.

The code chunk below initializes multiple ipywidgets: radio buttons and multiple selection for the categorical variables poutcome and loan respectively and a basic unbounded slider for the numeric variable duration. The show_limit variable is a basic data type that creates a slider by default and is used to limit the number of rows, in this case between 0 to 10 with a step size of 1.

Note: For poutcome, multiple values can be selected with shift and/or ctrl (or command) pressed and mouse clicks or arrow keys.

duration_lower_bound = 500
show_limit = (0, 10, 1)
loan_selection = widgets.RadioButtons(
    options=["yes", "no"], description="Personal Loan", disabled=False
)
outcome_selection = widgets.SelectMultiple(
    options=["failure", "other", "success", "unknown"],
    value=["success", "failure"],
    description="Previous Campaign Outcome",
    disabled=False,
)

Next, we can use the --interact argument to create UI’s for the above widgets, which will help us interactively filter our output, and pass them into the SQL query:

%%sql --interact show_limit --interact duration_lower_bound --interact loan_selection --interact outcome_selection
SELECT duration, loan, poutcome FROM bank
WHERE poutcome IN {{outcome_selection}} AND
duration > {{duration_lower_bound}} AND
loan == '{{loan_selection}}'
LIMIT {{show_limit}}
Interactive mode, please interact with below widget(s) to control the variable

Try out other widgets, such as Boolean, String, and Datetime, detailed here and see what you can come up with!

Macros + %sql --interact {{widget_variable}}#

The --interact argument can also be used with Macros, using the {% macro %} tag. Recall that a macro is saved in a variable with the --save flag. Therefore, JupySQL supports the use of multiple flags in a %%sql command. Applying macros to ipywidgets is useful when we want to use the same widget variable in multiple SQL queries. For a refresher on Macros, see this section

To show an example, consider the contact duration variable, measured in seconds, from the bank marketing dataset. Suppose the marketing manager wants to not only measure this variable in minutes but also filter it interactively across all other queries. We can, hence, use a macro to transform the variable and create a slider widget for it!

We first create our bounded FloatSlider numeric widget variable, just like previous instances. Note when using a macro, we need to initialize the widget variable in a python code-cell outside of the macro’s code-cell, which will be executed as an SQL cell wholly.

duration_slider = widgets.FloatSlider(
    min=0.05, max=51, step=1, value=5, description="Contact Duration (mins)"
)

Next, we initialize our macro function and specify the SQL query with the widget variable that produces the UI for the slider and the tabular output for the saved query:

%%sql --save convert --interact duration_slider
{% macro sec_to_min(column_name, precision=2) %}
    ({{ column_name }} / 60)::numeric(16, {{ precision }})
{% endmacro %}

SELECT
  job, marital,
  {{ sec_to_min('duration') }} as duration_in_mins
FROM bank
WHERE duration_in_mins <= {{duration_slider}};
Interactive mode, please interact with below widget(s) to control the variable

Finally, we have the option to display our query using the %sqlrender command as seen in the parameterizing your queries section section. Because we applied a widget to it, the WHERE clause in the rendered query will reflect the latest value of the slider UI, so you do not have to go back and forth!

final = %sqlrender convert
print(final)
SELECT
  job, marital,
  
    (duration / 60)::numeric(16, 2)
 as duration_in_mins
FROM bank
WHERE duration_in_mins <= 5.0;
/home/docs/checkouts/readthedocs.org/user_builds/ploomber-sql/conda/latest/lib/python3.11/site-packages/sql/magic.py:84: FutureWarning: 
'%sqlrender' will be deprecated soon, please use '%sqlcmd snippets convert' instead. 

For documentation, follow this link : https://jupysql.ploomber.io/en/latest/api/magic-snippets.html#id1
  warnings.warn(

You try: Use JupySQL to perform the queries and answer the questions#

Question 1 (Easy):#

Using the bank dataset, create an IntSlider widget called balance_lower for the balance column. Specifically, include a bounded slider with values ranging between -1000 and 20000, a step size of 1000, and initial value set to 10000. Show only the first 5 rows of the output and the columns age, loan, and balance.

Answers

We start off by initializing a variable called balance_lower and assigning it to the IntSlider widget with the required integer arguments. To limit the number of rows to only 5, we do not need a basic slider, as shown with the show_limit variable in the Complete Example above.

balance_lower = widgets.IntSlider(min=-1000, max=20000, step=1000, value=10000)
%%sql --interact balance_lower
SELECT age, loan, balance FROM bank
WHERE balance <= {{balance_lower}}
LIMIT 5
Interactive mode, please interact with below widget(s) to control the variable

Question 2 (Medium):#

Using the bank dataset, create a ToggleButtons Selection Widget for the month column. Show a range of records from 1 to 10 with a step size of 5 and the columns age, loan, and month.

Answers

Important

When in doubt about the syntax of a particular widget, refer to the ipywidgets documentation.

We start off by initializing a variable called month_toggle and assigning it to the ToggleButtons widget. The options argument is set to a list of the unique values in the month column. We do not need to specify the value argument here as it will, by default, select the first value in the options list, which is “jan” in this case.

To show a range of records, we can modify the show_limit variable from the Complete Example above.

month_toggle = widgets.ToggleButtons(
    options=[
        "jan",
        "feb",
        "mar",
        "apr",
        "may",
        "jun",
        "jul",
        "aug",
        "sep",
        "oct",
        "nov",
        "dec",
    ],
    description="Month:",
    disabled=False,
)
show_limit = (1, 10, 5)

Finally, we use the --interact argument to create a UI for the contact_dropdown widget.

%sql --interact show_limit --interact month_toggle SELECT age, loan, month FROM bank WHERE month == '{{month_toggle}}' LIMIT {{show_limit}}
Interactive mode, please interact with below widget(s) to control the variable

Question 3 (BONUS):#

Create an unbounded numeric widget for the integer variable duration with a range of values from 0 to 2000, a step size of 500, and an initial value of 1000. However, make sure that the table changes output upon clicking a play button! Also add a ToggleButton, a Boolean Widget, for the variable housing that has value = “yes”, button_style = “success”, and a check icon. Lastly, limit the output to only show 10 records and the columns age, loan, and housing.

Hint Did you know that we can also create animated sliders for integer data types? This question requires exactly that! See the documentation here for more details.

Answers

We start off by initializing a variable called play and assigning it to the Play Animation widget with the required integer arguments.

play = widgets.Play(
    value=0,
    min=0,
    max=2000,
    step=500,
    interval=1000,  # time interval in milliseconds
    description="Press play",
    disabled=False,
)

For the ToggleButton widget, we initialize a variable called housing_toggle and assign it to the ToggleButton widget with the required arguments.

housing_toggle = widgets.ToggleButtons(
    options=["yes", "no"],
    description="Housing:",
    disabled=False,
)

Before calling --interact, we need to add UI’s for the Play and IntSlider widgets. This is attained with both jslink() and HBox ipywidgets methods. We then use the --interact argument to create the UI’s. In the WHERE clause, because we want an unbounded slider, we use the >= operator for duration. To limit the number of rows to only 10, we do not need a basic slider.

%%sql --interact play --interact housing_toggle
SELECT age, loan, housing FROM bank
WHERE duration >= {{play}} AND
housing == '{{housing_toggle}}'
LIMIT 10
Interactive mode, please interact with below widget(s) to control the variable

Important

Because we set the minimum value and initial value upon rendering to 0, the maximum value to 2000, and a step size of 500, the table will change or “blink” four times upon pressing the “play” button. Normally, an IntSlider is recommended to be added next to the Play widget; however, JupySQL does not support this at the moment.

Question 4 (Hard):#

Consider the pdays variable from the bank marketing dataset. The value in this column is -1 when the client was not contacted since the previous campaign and an integer > 0 otherwise. The marketing manager wants you to create a macro named dummify that transforms this numeric variable into a binary categorical variable, named pdays_dummy, taking on either “no” if pdays = -1 or “yes” otherwise. You are also expected to create both a RadioButtons selection widget for the transformed pdays_dummy variable and a SelectMultiple selection widget for the poutcome variable to help the manager filter for campaign performance on the fly. Finally, output the rendered query after displaying the tabular results, with the columns job, marital, poutcome, and pdays_dummy.

Hint Create the selection widgets first, making sure that no SQL statements are present in their code cells. Then, use --save for creating the macro and --interact for using the widgets. Make sure to account for both widgets in the WHERE clause!

Answers

We start off by initializing our selection widget variables:

contact_selection = widgets.RadioButtons(
    options=["yes", "no"], description="Previously Contacted?", disabled=False
)
outcome_selection = widgets.SelectMultiple(
    options=["failure", "other", "success", "unknown"],
    value=["success", "failure"],
    description="Campaign Outcome:",
    style={"description_width": "initial"},
    disabled=False,
)

Because we are creating a macro, we need to use the --save argument. The --interact argument initializes the UI for our widget variables. We then proceed to create the discretization macro function. Here, because we want to output “yes” or “no”, using ::numeric(16, {{ precision }} after the CASE WHEN statement will be incorrect; hence, we use ::varchar.

The SQL query is then written in the same code-cell the macro is present in and the macro is called in the SELECT clause on the pdays variable, which is aliased as pdays_dummy for readability. The poutcome variable is present in both the SELECT and WHERE clauses to show it in the tabular output and to create the multiple selection widget for it respectively. Lastly, pdays_dummy is used in the WHERE clause to create its radio button widget:

%%sql --save dummify --interact contact_selection --interact outcome_selection
{% macro days_to_dummy(column_name) %}
    (case when {{ column_name }} = -1 then 'no' else 'yes' end)::varchar
{% endmacro %}

SELECT
    job, marital, poutcome,
    {{ days_to_dummy('pdays') }} as pdays_dummy
FROM bank
WHERE poutcome IN {{outcome_selection}} AND
pdays_dummy == '{{contact_selection}}';
Interactive mode, please interact with below widget(s) to control the variable

Finally, %sqlrender helps us display the query, accounting for the last chosen values in the widgets:

final = %sqlrender dummify
print(final)
SELECT
    job, marital, poutcome,
    
    (case when pdays = -1 then 'no' else 'yes' end)::varchar
 as pdays_dummy
FROM bank
WHERE poutcome IN ('success', 'failure') AND
pdays_dummy == 'yes';
/home/docs/checkouts/readthedocs.org/user_builds/ploomber-sql/conda/latest/lib/python3.11/site-packages/sql/magic.py:84: FutureWarning: 
'%sqlrender' will be deprecated soon, please use '%sqlcmd snippets dummify' instead. 

For documentation, follow this link : https://jupysql.ploomber.io/en/latest/api/magic-snippets.html#id1
  warnings.warn(

Wrapping Up#

In this section, we learned about how to use JupySQL to create widgets with variable expansion and macros. To summarize:

  • Numeric widgets, although commonly used as sliders, can also be used in text boxes or dropdowns, for example. This is useful when you want to specify a value.

  • For categorical or text data, we can use either Selection or Boolean widgets. Some examples include radio buttons, toggle buttons, and dropdowns.

  • %sql --interact {{widget_variable}} is a powerful tool in your arsenal to quickly create ipywidgets

  • The main difference between the %sql and %%sql magic commands in Jupyter notebooks is that %sql only allows for a single statement to be executed, while %%sql allows a block of SQL to be executed. Make sure to use %%sql when creating macros!

  • JupySQL provides incredible flexibility because it allows chaining -- options, including --save and --interact, as we saw in this section. This helps us combine SQL queries with macros and widgets, making the EDA process less repetitive and more interactive!

This ends the Interactive Queries and Parameterization module. We hope you use these skills to boost your productivity in creating interactive queries! In the next module, we will introduce Advanced Querying Techniques.

Delete table

%%sql
DROP TABLE IF EXISTS bank;
Success

References#

“Simple Widget Introduction#.” Simple Widget Introduction - Jupyter Widgets 8.0.5 documentation, n.d. https://ipywidgets.readthedocs.io/en/stable/examples/Widget Basics.html.

“Widget List#.” Widget List - Jupyter Widgets 8.0.5 documentation, n.d. https://ipywidgets.readthedocs.io/en/stable/examples/Widget List.html.