Parameterizing Queries#

In the last module, you learned how to use SQL within Jupyter notebooks with JupySQL, and you learned how you can combine it with usage of widgets to create interactive graphical user interfaces (GUIs) for your SQL queries. This module will take you a step further. Here, we will discuss how to parameterize your SQL queries and effectively utilize ipywidgets to create more interactive data workflows. Let’s dive in!

Set up and data access#

Important

Note: The –save and %sqlcmd features used require the latest JupySQL version. Ensure you run the code below.

This code installs JupySQL, and DuckDB in your environment. We will be using these moving forward.

%pip install jupysql jupysql-plugin --quiet
Note: you may need to restart the kernel to use updated packages.

We continue to work with the Bank and Marketing data set.

Important

Source: UCI Machine Learning Repository

URL: https://archive-beta.ics.uci.edu/dataset/222/bank+marketing

Data Citation

Moro,S., Rita,P., and Cortez,P.. (2012). Bank Marketing. UCI Machine Learning Repository. https://doi.org/10.24432/C5K306.

We can use the following function to extract the downloaded data from the UCI repository.

import sys

sys.path.insert(0, "../../")
import banking  # noqa: E402

_ = banking.BankingData("https://tinyurl.com/jb-bank", "bank")
_.extract_to_csv()

Initialize a DuckDB Instance:

# Loading in SQL extension
%reload_ext sql
# Initiating a DuckDB database named 'bank.duck.db' to run our SQL queries on
%sql duckdb:///bank.duck.db
Found pyproject.toml from '/home/docs/checkouts/readthedocs.org/user_builds/ploomber-sql/checkouts/latest'
Settings changed:
Config value
displaycon False
feedback True
autopandas False
named_parameters True

Load the data

%%sql
CREATE OR REPLACE TABLE bank AS
FROM read_csv_auto('bank_cleaned.csv', header=True, sep=',')
Count

We confirm the table was loaded

%sqlcmd tables
Name
bank

We can use JupySQL’s Table Explorer to take a quick look at the table.

%sqlcmd explore --table bank
%sqlcmd columns -t bank
name type nullable default autoincrement comment
age BIGINT True None False None
job VARCHAR True None False None
marital VARCHAR True None False None
education VARCHAR True None False None
default VARCHAR True None False None
balance BIGINT True None False None
housing VARCHAR True None False None
loan VARCHAR True None False None
contact VARCHAR True None False None
day BIGINT True None False None
month VARCHAR True None False None
duration BIGINT True None False None
campaign BIGINT True None False None
pdays BIGINT True None False None
previous BIGINT True None False None
poutcome VARCHAR True None False None
y VARCHAR True None False None

Variable Expansion#

JupySQL supports variable expansion in the form of {{variable}}. This allows you to write a query with placeholders that can be replaced by variables dynamically. The benefits of using parameterized SQL queries are that they can be reused with different values, prepared ahead of time, and used with dynamic data.

Let’s start with a simple query template with placeholders, and substitute the placeholders with a couple of variables using variable expansion.

We are going to select two columns (age and job) from the table bank with the variable dynamic_column and show only 5 rows. We can control this via another variable dynamic_limit.

dynamic_limit = 5
dynamic_column = "age, job"
%sql SELECT {{dynamic_column}} FROM bank LIMIT {{dynamic_limit}}
age job
30 unemployed
33 services
35 management
30 management
59 blue-collar

Sample case#

The HR department would like to determine top five oldest and youngest employees in two different jobs: services, management.

We can combine Python with SQL code using the %sql magic. With Python we can iterate over the jobs in the list jobs. With parameterization, we can then insert the job into the SQL query via the {{variable}} notation.

jobs = ["services", "management"]

for job in jobs:
    print(f"Top 5 oldest in {job}")
    order = "DESC"
    o_ = %sql SELECT * FROM bank WHERE job='{{job}}' ORDER BY age {{order}} LIMIT 5
    print(o_)

    print(f"Top 5 youngest in {job}")
    order = "ASC"
    y_ = %sql SELECT * FROM bank WHERE job='{{job}}' ORDER BY age {{order}} LIMIT 5
    print(y_)
Top 5 oldest in services
+-----+----------+----------+-----------+---------+---------+---------+------+----------+-----+-------+----------+----------+-------+----------+----------+-----+
| age |   job    | marital  | education | default | balance | housing | loan | contact  | day | month | duration | campaign | pdays | previous | poutcome |  y  |
+-----+----------+----------+-----------+---------+---------+---------+------+----------+-----+-------+----------+----------+-------+----------+----------+-----+
|  60 | services | married  |  tertiary |    no   |   4256  |   yes   |  no  | cellular |  16 |  aug  |   239    |    3     |  761  |    1     |  other   | yes |
|  59 | services | married  | secondary |    no   |   204   |    no   |  no  | cellular |  23 |  jul  |   286    |    2     |   -1  |    0     | unknown  |  no |
|  59 | services | married  | secondary |    no   |    0    |   yes   |  no  | cellular |  12 |  may  |   214    |    3     |   -1  |    0     | unknown  |  no |
|  58 | services | divorced | secondary |    no   |   462   |    no   |  no  | cellular |  13 |  nov  |   495    |    1     |   93  |    1     | success  | yes |
|  58 | services | married  |  primary  |    no   |    65   |    no   |  no  | unknown  |  27 |  jun  |   162    |    1     |   -1  |    0     | unknown  |  no |
+-----+----------+----------+-----------+---------+---------+---------+------+----------+-----+-------+----------+----------+-------+----------+----------+-----+
Top 5 youngest in services
+-----+----------+---------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+----+
| age |   job    | marital | education | default | balance | housing | loan |  contact  | day | month | duration | campaign | pdays | previous | poutcome | y  |
+-----+----------+---------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+----+
|  21 | services |  single | secondary |    no   |   361   |    no   |  no  | telephone |  5  |  jun  |   329    |    1     |   95  |    1     |  other   | no |
|  21 | services |  single | secondary |    no   |   1903  |   yes   |  no  |  unknown  |  29 |  may  |   107    |    2     |   -1  |    0     | unknown  | no |
|  23 | services |  single |  tertiary |    no   |   363   |   yes   |  no  |  unknown  |  30 |  may  |    16    |    18    |   -1  |    0     | unknown  | no |
|  23 | services | married | secondary |    no   |   -19   |   yes   |  no  |  unknown  |  20 |  jun  |    13    |    10    |   -1  |    0     | unknown  | no |
|  23 | services |  single | secondary |    no   |   500   |   yes   |  no  |  unknown  |  4  |  jun  |   111    |    3     |   -1  |    0     | unknown  | no |
+-----+----------+---------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+----+
Top 5 oldest in management
+-----+------------+---------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+----+
| age |    job     | marital | education | default | balance | housing | loan |  contact  | day | month | duration | campaign | pdays | previous | poutcome | y  |
+-----+------------+---------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+----+
|  80 | management | married |  primary  |    no   |   6483  |    no   |  no  | telephone |  19 |  oct  |   123    |    1     |   -1  |    0     | unknown  | no |
|  66 | management | married |  tertiary |    no   |   1048  |   yes   |  no  |  cellular |  23 |  jun  |   971    |    2     |   -1  |    0     | unknown  | no |
|  66 | management | married |  tertiary |    no   |   2169  |    no   |  no  |  cellular |  7  |  jul  |   282    |    1     |   -1  |    0     | unknown  | no |
|  66 | management |  single |  tertiary |    no   |   6483  |    no   |  no  |  cellular |  13 |  aug  |   275    |    1     |   -1  |    0     | unknown  | no |
|  64 | management | married |  tertiary |    no   |   679   |    no   |  no  | telephone |  22 |  apr  |   135    |    3     |   -1  |    0     | unknown  | no |
+-----+------------+---------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+----+
Top 5 youngest in management
+-----+------------+---------+-----------+---------+---------+---------+------+----------+-----+-------+----------+----------+-------+----------+----------+-----+
| age |    job     | marital | education | default | balance | housing | loan | contact  | day | month | duration | campaign | pdays | previous | poutcome |  y  |
+-----+------------+---------+-----------+---------+---------+---------+------+----------+-----+-------+----------+----------+-------+----------+----------+-----+
|  23 | management |  single |  tertiary |    no   |   736   |   yes   |  no  | cellular |  30 |  apr  |   119    |    1     |   -1  |    0     | unknown  |  no |
|  24 | management |  single |  tertiary |    no   |   172   |    no   |  no  | cellular |  3  |  jul  |    67    |    2     |   -1  |    0     | unknown  |  no |
|  25 | management |  single |  tertiary |    no   |   553   |   yes   |  no  | cellular |  17 |  sep  |   113    |    4     |  479  |    1     | failure  |  no |
|  25 | management |  single |  tertiary |    no   |   2506  |    no   |  no  | cellular |  17 |  feb  |   485    |    1     |   -1  |    0     | unknown  | yes |
|  25 | management |  single |  tertiary |    no   |   808   |    no   |  no  | cellular |  18 |  sep  |   267    |    2     |  114  |    2     | failure  | yes |
+-----+------------+---------+-----------+---------+---------+---------+------+----------+-----+-------+----------+----------+-------+----------+----------+-----+

Mini Exercise:#

The marketing team is interested in understanding the distribution of age for their marketing campaigns. Select the top 5 oldest and youngest people who have been part of the marketing campaign (campaign column).

Your task: use a loop with variable expansion to accomplish this.

Answers
campaign = %sql SELECT DISTINCT campaign FROM bank
campaign = campaign.DataFrame()

campaigns = campaign["campaign"].tolist()[:2]

for campaign in campaigns:
    print(f"Top 5 oldest in campaign {campaign}")
    order = "DESC"
    oldest_camppaing = %sql SELECT * FROM bank WHERE campaign={{campaign}} ORDER BY age {{order}} LIMIT 5
    print(oldest_camppaing)

    print(f"Top 5 youngest in campaign {campaign}")
    order = "ASC"
    youngest_campaign = %sql SELECT * FROM bank WHERE campaign={{campaign}} ORDER BY age {{order}} LIMIT 5
    print(youngest_campaign)
Top 5 oldest in campaign 1
+-----+------------+----------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+-----+
| age |    job     | marital  | education | default | balance | housing | loan |  contact  | day | month | duration | campaign | pdays | previous | poutcome |  y  |
+-----+------------+----------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+-----+
|  87 |  retired   | married  |  primary  |    no   |   230   |    no   |  no  |  cellular |  30 |  oct  |   144    |    1     |   -1  |    0     | unknown  | yes |
|  83 |  retired   | married  |  primary  |    no   |   425   |    no   |  no  | telephone |  22 |  jun  |   883    |    1     |  140  |    1     | failure  | yes |
|  83 |  retired   | divorced |  primary  |    no   |    0    |    no   |  no  | telephone |  31 |  may  |   664    |    1     |   77  |    3     | success  |  no |
|  83 |  retired   | divorced |  primary  |    no   |   1097  |    no   |  no  | telephone |  5  |  mar  |   181    |    1     |   -1  |    0     | unknown  | yes |
|  80 | management | married  |  primary  |    no   |   6483  |    no   |  no  | telephone |  19 |  oct  |   123    |    1     |   -1  |    0     | unknown  |  no |
+-----+------------+----------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+-----+
Top 5 youngest in campaign 1
+-----+----------+---------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+-----+
| age |   job    | marital | education | default | balance | housing | loan |  contact  | day | month | duration | campaign | pdays | previous | poutcome |  y  |
+-----+----------+---------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+-----+
|  19 | student  |  single | secondary |    no   |   302   |    no   |  no  |  cellular |  16 |  jul  |   205    |    1     |   -1  |    0     | unknown  | yes |
|  20 | student  |  single | secondary |    no   |   502   |    no   |  no  |  cellular |  30 |  apr  |   261    |    1     |   -1  |    0     | unknown  | yes |
|  20 | student  |  single | secondary |    no   |   1191  |    no   |  no  |  cellular |  12 |  feb  |   274    |    1     |   -1  |    0     | unknown  |  no |
|  21 | services |  single | secondary |    no   |   361   |    no   |  no  | telephone |  5  |  jun  |   329    |    1     |   95  |    1     |  other   |  no |
|  21 | student  |  single | secondary |    no   |   681   |    no   |  no  |  unknown  |  20 |  aug  |    6     |    1     |   -1  |    0     | unknown  |  no |
+-----+----------+---------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+-----+
Top 5 oldest in campaign 4
+-----+---------------+---------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+-----+
| age |      job      | marital | education | default | balance | housing | loan |  contact  | day | month | duration | campaign | pdays | previous | poutcome |  y  |
+-----+---------------+---------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+-----+
|  78 |   housemaid   | married | secondary |    no   |   499   |    no   |  no  | telephone |  16 |  mar  |    80    |    4     |   -1  |    0     | unknown  |  no |
|  77 |    retired    | married |  primary  |    no   |   680   |    no   |  no  | telephone |  27 |  nov  |   341    |    4     |   94  |    3     | failure  |  no |
|  66 |     admin.    |  single | secondary |    no   |  10177  |    no   |  no  |  cellular |  5  |  may  |   272    |    4     |   -1  |    0     | unknown  |  no |
|  61 |   technician  | married | secondary |    no   |   1191  |    no   |  no  |  cellular |  21 |  may  |   214    |    4     |   -1  |    0     | unknown  | yes |
|  60 | self-employed |  single |  tertiary |    no   |   179   |    no   | yes  |  cellular |  29 |  jul  |    79    |    4     |   -1  |    0     | unknown  |  no |
+-----+---------------+---------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+-----+
Top 5 youngest in campaign 4
+-----+-------------+---------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+-----+
| age |     job     | marital | education | default | balance | housing | loan |  contact  | day | month | duration | campaign | pdays | previous | poutcome |  y  |
+-----+-------------+---------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+-----+
|  24 | blue-collar |  single |  primary  |    no   |   111   |   yes   | yes  |  unknown  |  27 |  may  |    42    |    4     |   -1  |    0     | unknown  |  no |
|  25 |   student   |  single | secondary |    no   |   331   |    no   |  no  | telephone |  26 |  aug  |   170    |    4     |   -1  |    0     | unknown  | yes |
|  25 |  management |  single |  tertiary |    no   |   553   |   yes   |  no  |  cellular |  17 |  sep  |   113    |    4     |  479  |    1     | failure  |  no |
|  25 | blue-collar |  single | secondary |    no   |   913   |   yes   |  no  |  cellular |  7  |  jul  |   113    |    4     |   -1  |    0     | unknown  |  no |
|  25 |  management | married |  tertiary |    no   |   -242  |   yes   | yes  |  cellular |  18 |  may  |   600    |    4     |  332  |    4     |  other   | yes |
+-----+-------------+---------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+-----+

Variable Expansion Using Loops#

We can also use loops inside SQL to parameterize our queries. For example, let’s calculate the average age for each job role in the bank table.

The query below will iterate over four kinds of jobs: blue collar, technician, services and management. For each job, it will compute the average age for people with that job and create a new column for the average of each job.

The resulting query can be reused as it is being saved as avg_age with the --save option.

%%sql --save avg_age
{% set jobs = [ "blue_collar", "technician", "services", "management"] %}
select
    {% for job in jobs %}
    avg(case when job = '{{job.replace('_', '-')}}' then age end) as avg_age_{{job}},
    {% endfor %}
from bank
avg_age_blue_collar avg_age_technician avg_age_services avg_age_management
40.15644820295983 39.470052083333336 38.57074340527578 40.54076367389061

We can see the final compiled query using:

final = %sqlrender avg_age
print(final)
select
    
    avg(case when job = 'blue-collar' then age end) as avg_age_blue_collar,
    
    avg(case when job = 'technician' then age end) as avg_age_technician,
    
    avg(case when job = 'services' then age end) as avg_age_services,
    
    avg(case when job = 'management' then age end) as avg_age_management,
    
from bank
/home/docs/checkouts/readthedocs.org/user_builds/ploomber-sql/conda/latest/lib/python3.11/site-packages/sql/magic.py:84: FutureWarning: 
'%sqlrender' will be deprecated soon, please use '%sqlcmd snippets avg_age' instead. 

For documentation, follow this link : https://jupysql.ploomber.io/en/latest/api/magic-snippets.html#id1
  warnings.warn(

Sample case#

The HR department would like to determine the average balance for employees by job (for jobs technician, services and management) and education level. We can achieve this by using the job and education columns from the bank table.

%%sql --save avg_balance
{% set jobs = [ "technician", "services", "management"] %}
{% set education_levels = [ "primary", "secondary", "tertiary", "unknown"] %}
select
    {% for job in jobs %}
    {% for education in education_levels %}
    avg(case when job = '{{job}}' and education = '{{education}}' then balance end) as avg_balance_{{job}}_{{education}},
    {% endfor %}
    {% endfor %}
from bank
avg_balance_technician_primary avg_balance_technician_secondary avg_balance_technician_tertiary avg_balance_technician_unknown avg_balance_services_primary avg_balance_services_secondary avg_balance_services_tertiary avg_balance_services_unknown avg_balance_management_primary avg_balance_management_secondary avg_balance_management_tertiary avg_balance_management_unknown
2593.0 1153.6076923076923 1631.6303317535544 1780.0 1107.32 998.8842975206611 1894.875 3058.0 2685.4102564102564 1250.103448275862 1776.3430749682339 2386.259259259259

Mini Exercise:#

Calculate the count of people for each job role for jobs “blue_collar”, “technician”, “services”, “management”.

Your task: use Variable Expansion Using Loops and the %%sql magic

Answers
%%sql --save count_jobs
{% set jobs = [ "blue_collar", "technician", "services", "management"] %}
select
    {% for job in jobs %}
    count(case when job = '{{job.replace('_', '-')}}' then 1 end) as count_{{job}},
    {% endfor %}
from bank
count_blue_collar count_technician count_services count_management
946 768 417 969

Macros + Variable Expansion#

Macros is a construct analogous to functions that promote re-usability. We define a macro using the {% macro %} tag and use this macro in the query using variable expansion.

For example, let’s define a macro to convert a value from days to years, and use this macro in our query. This macro will be called days_to_years and it takes as input the column_name and precision. With the default precision set to 2.

We can obtain the number of years by dividing the number of days by 365.

We can then use the days_to_years macro in the query, where we are passing age as the input column.

%%sql --save convert
{% macro days_to_years(column_name, precision=2) %}
    ({{ column_name }} / 365)::numeric(16, {{ precision }})
{% endmacro %}

select
  job, marital,
  {{ days_to_years('age') }} as age_in_years
from bank
job marital age_in_years
unemployed married 0.08
services married 0.09
management single 0.10
management married 0.08
blue-collar married 0.16
management single 0.10
self-employed married 0.10
technician married 0.11
entrepreneur married 0.11
services married 0.12
Truncated to displaylimit of 10.

In this query, we’re converting age from days to years using our macro. The results are saved in a variable convert.

The final rendered query can be viewed using:

final = %sqlrender convert
print(final)
select
  job, marital,
  
    (age / 365)::numeric(16, 2)
 as age_in_years
from bank
/home/docs/checkouts/readthedocs.org/user_builds/ploomber-sql/conda/latest/lib/python3.11/site-packages/sql/magic.py:84: FutureWarning: 
'%sqlrender' will be deprecated soon, please use '%sqlcmd snippets convert' instead. 

For documentation, follow this link : https://jupysql.ploomber.io/en/latest/api/magic-snippets.html#id1
  warnings.warn(

Sample usage#

Let’s use macros to handle NULL values. The macro below checks if the chosen column has NULL values.

The code block defines a macro that will be used later in the SQL query. The macro is named months_to_years_handle_null, and its purpose is to convert the values in a specified column from months to years. Additionally, this macro is designed to handle null values in a column, hence its name.

If the value in the column is null, the macro will return NULL; otherwise, it will convert the value from months to years.

%%sql --save convert_duration_handle_null
{% macro months_to_years_handle_null(column_name, precision=2) %}
    (case when {{ column_name }} is not null then {{ column_name }} / 12 else null end)::numeric(16, {{ precision }})
{% endmacro %}

select
  job, marital,
  {{ months_to_years_handle_null('duration') }} as duration_in_years
from bank
job marital duration_in_years
unemployed married 6.58
services married 18.33
management single 15.42
management married 16.58
blue-collar married 18.83
management single 11.75
self-employed married 28.42
technician married 12.58
entrepreneur married 4.75
services married 26.08
Truncated to displaylimit of 10.
final = %sqlrender convert_duration_handle_null
print(final)
select
  job, marital,
  
    (case when duration is not null then duration / 12 else null end)::numeric(16, 2)
 as duration_in_years
from bank
/home/docs/checkouts/readthedocs.org/user_builds/ploomber-sql/conda/latest/lib/python3.11/site-packages/sql/magic.py:84: FutureWarning: 
'%sqlrender' will be deprecated soon, please use '%sqlcmd snippets convert_duration_handle_null' instead. 

For documentation, follow this link : https://jupysql.ploomber.io/en/latest/api/magic-snippets.html#id1
  warnings.warn(

Mini exercise#

Use the months_to_years_handle_null macro to iterate over the jobs: “blue_collar”, “technician”, “services”, “management”. For each job in the loop, calculate the average duration for the specified job. Save this query as avg_duration. Display the query.

Answers

This modification is using the months_to_years_handle_null macro that you had defined earlier. This macro will convert duration from months to years for each job category, while also handling null values. The macro will apply to the average duration of each job category (where the job is equal to the current job in the loop). The output for each job category is then given an alias that indicates the job category and that it represents average duration in years.

Just like before, the code below will dynamically generate the SQL query, and the –save directive will save the query result into the avg_duration variable.

%%sql --save avg_duration
{% macro months_to_years_handle_null(column_name, precision=2) %}
    (case when {{ column_name }} is not null then {{ column_name }} / 12 else null end)::numeric(16, {{ precision }})
{% endmacro %}
{% set jobs = [ "blue_collar", "technician", "services", "management"] %}
select
    {% for job in jobs %}
    {{ months_to_years_handle_null('avg(case when job = \'' + job.replace('_', '-') + '\' then duration end)') }} as avg_duration_{{job}},
    {% endfor %}
from bank
avg_duration_blue_collar avg_duration_technician avg_duration_services avg_duration_management
23.18 21.01 21.87 21.71

Display the query:

final = %sqlrender avg_duration
print(final)
select
    
    
    (case when avg(case when job = 'blue-collar' then duration end) is not null then avg(case when job = 'blue-collar' then duration end) / 12 else null end)::numeric(16, 2)
 as avg_duration_blue_collar,
    
    
    (case when avg(case when job = 'technician' then duration end) is not null then avg(case when job = 'technician' then duration end) / 12 else null end)::numeric(16, 2)
 as avg_duration_technician,
    
    
    (case when avg(case when job = 'services' then duration end) is not null then avg(case when job = 'services' then duration end) / 12 else null end)::numeric(16, 2)
 as avg_duration_services,
    
    
    (case when avg(case when job = 'management' then duration end) is not null then avg(case when job = 'management' then duration end) / 12 else null end)::numeric(16, 2)
 as avg_duration_management,
    
from bank
/home/docs/checkouts/readthedocs.org/user_builds/ploomber-sql/conda/latest/lib/python3.11/site-packages/sql/magic.py:84: FutureWarning: 
'%sqlrender' will be deprecated soon, please use '%sqlcmd snippets avg_duration' instead. 

For documentation, follow this link : https://jupysql.ploomber.io/en/latest/api/magic-snippets.html#id1
  warnings.warn(

Creating multiple tables dynamically#

We’ll finish off this section by showing you how you can generate multiple tables dynamically.

Your task now is to create separate tables for each job. This can easily be accomplished using a loop and variable expansion:

%%sql
DROP TABLE IF EXISTS services;
DROP TABLE IF EXISTS management;
Success
jobs = ["services", "management"]
for job in jobs:
    %sql CREATE TABLE {{job}} AS (SELECT * from bank WHERE job = '{{job}}')

Let’s check the tables in our schema:

%sqlcmd tables
Name
management
services
bank

Delete table

%%sql
DROP TABLE bank;
DROP TABLE services;
DROP TABLE management;
Success

Conclusion#

In this module, we have explored how to parameterize SQL queries and effectively integrate with Python for a more interactive data analysis workflow within Jupyter notebooks. This method is facilitated using JupySQL and DuckDB. We’ve also demonstrated how to dynamically create and use variables within SQL queries, further enhancing the flexibility and interactivity of the notebooks.

We delved into the basics of variable expansion, using loops within SQL to iterate over arrays, and how to define and use macros to promote reusability and reduce redundancy. We further illustrated the potential of these techniques through several practical examples, including determining the top five oldest and youngest employees in various jobs, calculating average ages and balances by job and education level, handling NULL values with macros, and dynamically creating multiple tables.

Through these exercises, it has been made evident that integrating SQL into Jupyter notebooks provides an intuitive and powerful approach to data exploration and analysis. The techniques demonstrated in this module can be widely applied across various datasets, making it a useful skillset for data scientists and analysts. By merging the strengths of Python’s flexibility and SQL’s data management capabilities, we can create more powerful and interactive data workflows.

In the next section, you’ll learn how to combine paramaterization with ipywidgets.