Parameterizing Queries

Parameterizing Queries#

In the last module, you learned how to use SQL within Jupyter notebooks with JupySQL, and you learned how you can combine it with usage of widgets to create interactive graphical user interfaces (GUIs) for your SQL queries. This module will take you a step further. Here, we will discuss how to parameterize your SQL queries and effectively utilize ipywidgets to create more interactive data workflows. Let’s dive in!

Set up and data access#

Important

Note: The –save and %sqlcmd features used require the latest JupySQL version. Ensure you run the code below.

This code installs JupySQL, and DuckDB in your environment. We will be using these moving forward.

%pip install jupysql jupysql-plugin --quiet

Note: you may need to restart the kernel to use updated packages.

We continue to work with the Bank and Marketing data set.

Important

Source: UCI Machine Learning Repository

URL: https://archive-beta.ics.uci.edu/dataset/222/bank+marketing

Data Citation

Moro,S., Rita,P., and Cortez,P.. (2012). Bank Marketing. UCI Machine Learning Repository. https://doi.org/10.24432/C5K306.

We can use the following function to extract the downloaded data from the UCI repository.

import sys

sys.path.insert(0, "../../")
import banking  # noqa: E402

_ = banking.BankingData("https://tinyurl.com/jb-bank", "bank")
_.extract_to_csv()

Initialize a DuckDB Instance:

# Loading in SQL extension
%reload_ext sql
# Initiating a DuckDB database named 'bank.duck.db' to run our SQL queries on
%sql duckdb:///bank.duck.db

Found pyproject.toml from '/home/docs/checkouts/readthedocs.org/user_builds/ploomber-sql/checkouts/latest'

Settings changed:

Config	value
displaycon	False
feedback	True
autopandas	False
named_parameters	True

Load the data

%%sql
CREATE OR REPLACE TABLE bank AS
FROM read_csv_auto('bank_cleaned.csv', header=True, sep=',')

Count

We confirm the table was loaded

%sqlcmd tables

Name
bank

We can use JupySQL’s Table Explorer to take a quick look at the table.

%sqlcmd explore --table bank

%sqlcmd columns -t bank

name	type	nullable	default	autoincrement	comment
age	BIGINT	True	None	False	None
job	VARCHAR	True	None	False	None
marital	VARCHAR	True	None	False	None
education	VARCHAR	True	None	False	None
default	VARCHAR	True	None	False	None
balance	BIGINT	True	None	False	None
housing	VARCHAR	True	None	False	None
loan	VARCHAR	True	None	False	None
contact	VARCHAR	True	None	False	None
day	BIGINT	True	None	False	None
month	VARCHAR	True	None	False	None
duration	BIGINT	True	None	False	None
campaign	BIGINT	True	None	False	None
pdays	BIGINT	True	None	False	None
previous	BIGINT	True	None	False	None
poutcome	VARCHAR	True	None	False	None
y	VARCHAR	True	None	False	None

Variable Expansion#

JupySQL supports variable expansion in the form of {{variable}}. This allows you to write a query with placeholders that can be replaced by variables dynamically. The benefits of using parameterized SQL queries are that they can be reused with different values, prepared ahead of time, and used with dynamic data.

Let’s start with a simple query template with placeholders, and substitute the placeholders with a couple of variables using variable expansion.

We are going to select two columns (age and job) from the table bank with the variable dynamic_column and show only 5 rows. We can control this via another variable dynamic_limit.

dynamic_limit = 5
dynamic_column = "age, job"
%sql SELECT {{dynamic_column}} FROM bank LIMIT {{dynamic_limit}}

age	job
30	unemployed
33	services
35	management
30	management
59	blue-collar

Sample case#

The HR department would like to determine top five oldest and youngest employees in two different jobs: services, management.

We can combine Python with SQL code using the %sql magic. With Python we can iterate over the jobs in the list jobs. With parameterization, we can then insert the job into the SQL query via the {{variable}} notation.

jobs = ["services", "management"]

for job in jobs:
    print(f"Top 5 oldest in {job}")
    order = "DESC"
    o_ = %sql SELECT * FROM bank WHERE job='{{job}}' ORDER BY age {{order}} LIMIT 5
    print(o_)

    print(f"Top 5 youngest in {job}")
    order = "ASC"
    y_ = %sql SELECT * FROM bank WHERE job='{{job}}' ORDER BY age {{order}} LIMIT 5
    print(y_)

Top 5 oldest in services
+-----+----------+----------+-----------+---------+---------+---------+------+----------+-----+-------+----------+----------+-------+----------+----------+-----+
| age |   job    | marital  | education | default | balance | housing | loan | contact  | day | month | duration | campaign | pdays | previous | poutcome |  y  |
+-----+----------+----------+-----------+---------+---------+---------+------+----------+-----+-------+----------+----------+-------+----------+----------+-----+
|  60 | services | married  |  tertiary |    no   |   4256  |   yes   |  no  | cellular |  16 |  aug  |   239    |    3     |  761  |    1     |  other   | yes |
|  59 | services | married  | secondary |    no   |   204   |    no   |  no  | cellular |  23 |  jul  |   286    |    2     |   -1  |    0     | unknown  |  no |
|  59 | services | married  | secondary |    no   |    0    |   yes   |  no  | cellular |  12 |  may  |   214    |    3     |   -1  |    0     | unknown  |  no |
|  58 | services | divorced | secondary |    no   |   462   |    no   |  no  | cellular |  13 |  nov  |   495    |    1     |   93  |    1     | success  | yes |
|  58 | services | married  |  primary  |    no   |    65   |    no   |  no  | unknown  |  27 |  jun  |   162    |    1     |   -1  |    0     | unknown  |  no |
+-----+----------+----------+-----------+---------+---------+---------+------+----------+-----+-------+----------+----------+-------+----------+----------+-----+
Top 5 youngest in services
+-----+----------+---------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+----+
| age |   job    | marital | education | default | balance | housing | loan |  contact  | day | month | duration | campaign | pdays | previous | poutcome | y  |
+-----+----------+---------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+----+
|  21 | services |  single | secondary |    no   |   361   |    no   |  no  | telephone |  5  |  jun  |   329    |    1     |   95  |    1     |  other   | no |
|  21 | services |  single | secondary |    no   |   1903  |   yes   |  no  |  unknown  |  29 |  may  |   107    |    2     |   -1  |    0     | unknown  | no |
|  23 | services |  single |  tertiary |    no   |   363   |   yes   |  no  |  unknown  |  30 |  may  |    16    |    18    |   -1  |    0     | unknown  | no |
|  23 | services | married | secondary |    no   |   -19   |   yes   |  no  |  unknown  |  20 |  jun  |    13    |    10    |   -1  |    0     | unknown  | no |
|  23 | services |  single | secondary |    no   |   500   |   yes   |  no  |  unknown  |  4  |  jun  |   111    |    3     |   -1  |    0     | unknown  | no |
+-----+----------+---------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+----+
Top 5 oldest in management
+-----+------------+---------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+----+
| age |    job     | marital | education | default | balance | housing | loan |  contact  | day | month | duration | campaign | pdays | previous | poutcome | y  |
+-----+------------+---------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+----+
|  80 | management | married |  primary  |    no   |   6483  |    no   |  no  | telephone |  19 |  oct  |   123    |    1     |   -1  |    0     | unknown  | no |
|  66 | management | married |  tertiary |    no   |   1048  |   yes   |  no  |  cellular |  23 |  jun  |   971    |    2     |   -1  |    0     | unknown  | no |
|  66 | management | married |  tertiary |    no   |   2169  |    no   |  no  |  cellular |  7  |  jul  |   282    |    1     |   -1  |    0     | unknown  | no |
|  66 | management |  single |  tertiary |    no   |   6483  |    no   |  no  |  cellular |  13 |  aug  |   275    |    1     |   -1  |    0     | unknown  | no |
|  64 | management | married |  tertiary |    no   |   679   |    no   |  no  | telephone |  22 |  apr  |   135    |    3     |   -1  |    0     | unknown  | no |
+-----+------------+---------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+----+
Top 5 youngest in management
+-----+------------+---------+-----------+---------+---------+---------+------+----------+-----+-------+----------+----------+-------+----------+----------+-----+
| age |    job     | marital | education | default | balance | housing | loan | contact  | day | month | duration | campaign | pdays | previous | poutcome |  y  |
+-----+------------+---------+-----------+---------+---------+---------+------+----------+-----+-------+----------+----------+-------+----------+----------+-----+
|  23 | management |  single |  tertiary |    no   |   736   |   yes   |  no  | cellular |  30 |  apr  |   119    |    1     |   -1  |    0     | unknown  |  no |
|  24 | management |  single |  tertiary |    no   |   172   |    no   |  no  | cellular |  3  |  jul  |    67    |    2     |   -1  |    0     | unknown  |  no |
|  25 | management |  single |  tertiary |    no   |   553   |   yes   |  no  | cellular |  17 |  sep  |   113    |    4     |  479  |    1     | failure  |  no |
|  25 | management |  single |  tertiary |    no   |   2506  |    no   |  no  | cellular |  17 |  feb  |   485    |    1     |   -1  |    0     | unknown  | yes |
|  25 | management |  single |  tertiary |    no   |   808   |    no   |  no  | cellular |  18 |  sep  |   267    |    2     |  114  |    2     | failure  | yes |
+-----+------------+---------+-----------+---------+---------+---------+------+----------+-----+-------+----------+----------+-------+----------+----------+-----+

Mini Exercise:#

The marketing team is interested in understanding the distribution of age for their marketing campaigns. Select the top 5 oldest and youngest people who have been part of the marketing campaign (campaign column).

Your task: use a loop with variable expansion to accomplish this.

Answers

campaign = %sql SELECT DISTINCT campaign FROM bank
campaign = campaign.DataFrame()

campaigns = campaign["campaign"].tolist()[:2]

for campaign in campaigns:
    print(f"Top 5 oldest in campaign {campaign}")
    order = "DESC"
    oldest_camppaing = %sql SELECT * FROM bank WHERE campaign={{campaign}} ORDER BY age {{order}} LIMIT 5
    print(oldest_camppaing)

    print(f"Top 5 youngest in campaign {campaign}")
    order = "ASC"
    youngest_campaign = %sql SELECT * FROM bank WHERE campaign={{campaign}} ORDER BY age {{order}} LIMIT 5
    print(youngest_campaign)

Top 5 oldest in campaign 1
+-----+------------+----------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+-----+
| age |    job     | marital  | education | default | balance | housing | loan |  contact  | day | month | duration | campaign | pdays | previous | poutcome |  y  |
+-----+------------+----------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+-----+
|  87 |  retired   | married  |  primary  |    no   |   230   |    no   |  no  |  cellular |  30 |  oct  |   144    |    1     |   -1  |    0     | unknown  | yes |
|  83 |  retired   | married  |  primary  |    no   |   425   |    no   |  no  | telephone |  22 |  jun  |   883    |    1     |  140  |    1     | failure  | yes |
|  83 |  retired   | divorced |  primary  |    no   |    0    |    no   |  no  | telephone |  31 |  may  |   664    |    1     |   77  |    3     | success  |  no |
|  83 |  retired   | divorced |  primary  |    no   |   1097  |    no   |  no  | telephone |  5  |  mar  |   181    |    1     |   -1  |    0     | unknown  | yes |
|  80 | management | married  |  primary  |    no   |   6483  |    no   |  no  | telephone |  19 |  oct  |   123    |    1     |   -1  |    0     | unknown  |  no |
+-----+------------+----------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+-----+
Top 5 youngest in campaign 1
+-----+----------+---------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+-----+
| age |   job    | marital | education | default | balance | housing | loan |  contact  | day | month | duration | campaign | pdays | previous | poutcome |  y  |
+-----+----------+---------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+-----+
|  19 | student  |  single | secondary |    no   |   302   |    no   |  no  |  cellular |  16 |  jul  |   205    |    1     |   -1  |    0     | unknown  | yes |
|  20 | student  |  single | secondary |    no   |   502   |    no   |  no  |  cellular |  30 |  apr  |   261    |    1     |   -1  |    0     | unknown  | yes |
|  20 | student  |  single | secondary |    no   |   1191  |    no   |  no  |  cellular |  12 |  feb  |   274    |    1     |   -1  |    0     | unknown  |  no |
|  21 | services |  single | secondary |    no   |   361   |    no   |  no  | telephone |  5  |  jun  |   329    |    1     |   95  |    1     |  other   |  no |
|  21 | student  |  single | secondary |    no   |   681   |    no   |  no  |  unknown  |  20 |  aug  |    6     |    1     |   -1  |    0     | unknown  |  no |
+-----+----------+---------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+-----+
Top 5 oldest in campaign 4
+-----+---------------+---------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+-----+
| age |      job      | marital | education | default | balance | housing | loan |  contact  | day | month | duration | campaign | pdays | previous | poutcome |  y  |
+-----+---------------+---------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+-----+
|  78 |   housemaid   | married | secondary |    no   |   499   |    no   |  no  | telephone |  16 |  mar  |    80    |    4     |   -1  |    0     | unknown  |  no |
|  77 |    retired    | married |  primary  |    no   |   680   |    no   |  no  | telephone |  27 |  nov  |   341    |    4     |   94  |    3     | failure  |  no |
|  66 |     admin.    |  single | secondary |    no   |  10177  |    no   |  no  |  cellular |  5  |  may  |   272    |    4     |   -1  |    0     | unknown  |  no |
|  61 |   technician  | married | secondary |    no   |   1191  |    no   |  no  |  cellular |  21 |  may  |   214    |    4     |   -1  |    0     | unknown  | yes |
|  60 | self-employed |  single |  tertiary |    no   |   179   |    no   | yes  |  cellular |  29 |  jul  |    79    |    4     |   -1  |    0     | unknown  |  no |
+-----+---------------+---------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+-----+
Top 5 youngest in campaign 4
+-----+-------------+---------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+-----+
| age |     job     | marital | education | default | balance | housing | loan |  contact  | day | month | duration | campaign | pdays | previous | poutcome |  y  |
+-----+-------------+---------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+-----+
|  24 | blue-collar |  single |  primary  |    no   |   111   |   yes   | yes  |  unknown  |  27 |  may  |    42    |    4     |   -1  |    0     | unknown  |  no |
|  25 |   student   |  single | secondary |    no   |   331   |    no   |  no  | telephone |  26 |  aug  |   170    |    4     |   -1  |    0     | unknown  | yes |
|  25 |  management |  single |  tertiary |    no   |   553   |   yes   |  no  |  cellular |  17 |  sep  |   113    |    4     |  479  |    1     | failure  |  no |
|  25 | blue-collar |  single | secondary |    no   |   913   |   yes   |  no  |  cellular |  7  |  jul  |   113    |    4     |   -1  |    0     | unknown  |  no |
|  25 |  management | married |  tertiary |    no   |   -242  |   yes   | yes  |  cellular |  18 |  may  |   600    |    4     |  332  |    4     |  other   | yes |
+-----+-------------+---------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+-----+

Variable Expansion Using Loops#

We can also use loops inside SQL to parameterize our queries. For example, let’s calculate the average age for each job role in the bank table.

The query below will iterate over four kinds of jobs: blue collar, technician, services and management. For each job, it will compute the average age for people with that job and create a new column for the average of each job.

The resulting query can be reused as it is being saved as avg_age with the --save option.

%%sql --save avg_age
{% set jobs = [ "blue_collar", "technician", "services", "management"] %}
select
    {% for job in jobs %}
    avg(case when job = '{{job.replace('_', '-')}}' then age end) as avg_age_{{job}},
    {% endfor %}
from bank

avg_age_blue_collar	avg_age_technician	avg_age_services	avg_age_management
40.15644820295983	39.470052083333336	38.57074340527578	40.54076367389061

We can see the final compiled query using:

final = %sqlrender avg_age
print(final)

select
    
    avg(case when job = 'blue-collar' then age end) as avg_age_blue_collar,
    
    avg(case when job = 'technician' then age end) as avg_age_technician,
    
    avg(case when job = 'services' then age end) as avg_age_services,
    
    avg(case when job = 'management' then age end) as avg_age_management,
    
from bank

/home/docs/checkouts/readthedocs.org/user_builds/ploomber-sql/conda/latest/lib/python3.11/site-packages/sql/magic.py:84: FutureWarning: 
'%sqlrender' will be deprecated soon, please use '%sqlcmd snippets avg_age' instead. 

For documentation, follow this link : https://jupysql.ploomber.io/en/latest/api/magic-snippets.html#id1
  warnings.warn(

Sample case#

The HR department would like to determine the average balance for employees by job (for jobs technician, services and management) and education level. We can achieve this by using the job and education columns from the bank table.

%%sql --save avg_balance
{% set jobs = [ "technician", "services", "management"] %}
{% set education_levels = [ "primary", "secondary", "tertiary", "unknown"] %}
select
    {% for job in jobs %}
    {% for education in education_levels %}
    avg(case when job = '{{job}}' and education = '{{education}}' then balance end) as avg_balance_{{job}}_{{education}},
    {% endfor %}
    {% endfor %}
from bank

avg_balance_technician_primary	avg_balance_technician_secondary	avg_balance_technician_tertiary	avg_balance_technician_unknown	avg_balance_services_primary	avg_balance_services_secondary	avg_balance_services_tertiary	avg_balance_services_unknown	avg_balance_management_primary	avg_balance_management_secondary	avg_balance_management_tertiary	avg_balance_management_unknown
2593.0	1153.6076923076923	1631.6303317535544	1780.0	1107.32	998.8842975206611	1894.875	3058.0	2685.4102564102564	1250.103448275862	1776.3430749682339	2386.259259259259

Mini Exercise:#

Calculate the count of people for each job role for jobs “blue_collar”, “technician”, “services”, “management”.

Your task: use Variable Expansion Using Loops and the %%sql magic

Answers

%%sql --save count_jobs
{% set jobs = [ "blue_collar", "technician", "services", "management"] %}
select
    {% for job in jobs %}
    count(case when job = '{{job.replace('_', '-')}}' then 1 end) as count_{{job}},
    {% endfor %}
from bank

count_blue_collar	count_technician	count_services	count_management
946	768	417	969

Macros + Variable Expansion#

Macros is a construct analogous to functions that promote re-usability. We define a macro using the {% macro %} tag and use this macro in the query using variable expansion.

For example, let’s define a macro to convert a value from days to years, and use this macro in our query. This macro will be called days_to_years and it takes as input the column_name and precision. With the default precision set to 2.

We can obtain the number of years by dividing the number of days by 365.

We can then use the days_to_years macro in the query, where we are passing age as the input column.

%%sql --save convert
{% macro days_to_years(column_name, precision=2) %}
    ({{ column_name }} / 365)::numeric(16, {{ precision }})
{% endmacro %}

select
  job, marital,
  {{ days_to_years('age') }} as age_in_years
from bank

job	marital	age_in_years
unemployed	married	0.08
services	married	0.09
management	single	0.10
management	married	0.08
blue-collar	married	0.16
management	single	0.10
self-employed	married	0.10
technician	married	0.11
entrepreneur	married	0.11
services	married	0.12

Truncated to displaylimit of 10.

In this query, we’re converting age from days to years using our macro. The results are saved in a variable convert.

The final rendered query can be viewed using:

final = %sqlrender convert
print(final)

select
  job, marital,
  
    (age / 365)::numeric(16, 2)
 as age_in_years
from bank

/home/docs/checkouts/readthedocs.org/user_builds/ploomber-sql/conda/latest/lib/python3.11/site-packages/sql/magic.py:84: FutureWarning: 
'%sqlrender' will be deprecated soon, please use '%sqlcmd snippets convert' instead. 

For documentation, follow this link : https://jupysql.ploomber.io/en/latest/api/magic-snippets.html#id1
  warnings.warn(

Sample usage#

Let’s use macros to handle NULL values. The macro below checks if the chosen column has NULL values.

The code block defines a macro that will be used later in the SQL query. The macro is named months_to_years_handle_null, and its purpose is to convert the values in a specified column from months to years. Additionally, this macro is designed to handle null values in a column, hence its name.

If the value in the column is null, the macro will return NULL; otherwise, it will convert the value from months to years.

%%sql --save convert_duration_handle_null
{% macro months_to_years_handle_null(column_name, precision=2) %}
    (case when {{ column_name }} is not null then {{ column_name }} / 12 else null end)::numeric(16, {{ precision }})
{% endmacro %}

select
  job, marital,
  {{ months_to_years_handle_null('duration') }} as duration_in_years
from bank

job	marital	duration_in_years
unemployed	married	6.58
services	married	18.33
management	single	15.42
management	married	16.58
blue-collar	married	18.83
management	single	11.75
self-employed	married	28.42
technician	married	12.58
entrepreneur	married	4.75
services	married	26.08

Truncated to displaylimit of 10.

final = %sqlrender convert_duration_handle_null
print(final)

select
  job, marital,
  
    (case when duration is not null then duration / 12 else null end)::numeric(16, 2)
 as duration_in_years
from bank

/home/docs/checkouts/readthedocs.org/user_builds/ploomber-sql/conda/latest/lib/python3.11/site-packages/sql/magic.py:84: FutureWarning: 
'%sqlrender' will be deprecated soon, please use '%sqlcmd snippets convert_duration_handle_null' instead. 

For documentation, follow this link : https://jupysql.ploomber.io/en/latest/api/magic-snippets.html#id1
  warnings.warn(

Mini exercise#

Use the months_to_years_handle_null macro to iterate over the jobs: “blue_collar”, “technician”, “services”, “management”. For each job in the loop, calculate the average duration for the specified job. Save this query as avg_duration. Display the query.

Answers

This modification is using the months_to_years_handle_null macro that you had defined earlier. This macro will convert duration from months to years for each job category, while also handling null values. The macro will apply to the average duration of each job category (where the job is equal to the current job in the loop). The output for each job category is then given an alias that indicates the job category and that it represents average duration in years.

Just like before, the code below will dynamically generate the SQL query, and the –save directive will save the query result into the avg_duration variable.

%%sql --save avg_duration
{% macro months_to_years_handle_null(column_name, precision=2) %}
    (case when {{ column_name }} is not null then {{ column_name }} / 12 else null end)::numeric(16, {{ precision }})
{% endmacro %}
{% set jobs = [ "blue_collar", "technician", "services", "management"] %}
select
    {% for job in jobs %}
    {{ months_to_years_handle_null('avg(case when job = \'' + job.replace('_', '-') + '\' then duration end)') }} as avg_duration_{{job}},
    {% endfor %}
from bank

avg_duration_blue_collar	avg_duration_technician	avg_duration_services	avg_duration_management
23.18	21.01	21.87	21.71

Display the query:

final = %sqlrender avg_duration
print(final)

select
    
    (case when avg(case when job = 'blue-collar' then duration end) is not null then avg(case when job = 'blue-collar' then duration end) / 12 else null end)::numeric(16, 2)
 as avg_duration_blue_collar,
    
    (case when avg(case when job = 'technician' then duration end) is not null then avg(case when job = 'technician' then duration end) / 12 else null end)::numeric(16, 2)
 as avg_duration_technician,
    
    (case when avg(case when job = 'services' then duration end) is not null then avg(case when job = 'services' then duration end) / 12 else null end)::numeric(16, 2)
 as avg_duration_services,
    
    (case when avg(case when job = 'management' then duration end) is not null then avg(case when job = 'management' then duration end) / 12 else null end)::numeric(16, 2)
 as avg_duration_management,
    
from bank

/home/docs/checkouts/readthedocs.org/user_builds/ploomber-sql/conda/latest/lib/python3.11/site-packages/sql/magic.py:84: FutureWarning: 
'%sqlrender' will be deprecated soon, please use '%sqlcmd snippets avg_duration' instead. 

For documentation, follow this link : https://jupysql.ploomber.io/en/latest/api/magic-snippets.html#id1
  warnings.warn(

Creating multiple tables dynamically#

We’ll finish off this section by showing you how you can generate multiple tables dynamically.

Your task now is to create separate tables for each job. This can easily be accomplished using a loop and variable expansion:

%%sql
DROP TABLE IF EXISTS services;
DROP TABLE IF EXISTS management;

Success

jobs = ["services", "management"]
for job in jobs:
    %sql CREATE TABLE {{job}} AS (SELECT * from bank WHERE job = '{{job}}')

Let’s check the tables in our schema:

%sqlcmd tables

Name
management
services
bank

Delete table

%%sql
DROP TABLE bank;
DROP TABLE services;
DROP TABLE management;

Success

Conclusion#

In this module, we have explored how to parameterize SQL queries and effectively integrate with Python for a more interactive data analysis workflow within Jupyter notebooks. This method is facilitated using JupySQL and DuckDB. We’ve also demonstrated how to dynamically create and use variables within SQL queries, further enhancing the flexibility and interactivity of the notebooks.

We delved into the basics of variable expansion, using loops within SQL to iterate over arrays, and how to define and use macros to promote reusability and reduce redundancy. We further illustrated the potential of these techniques through several practical examples, including determining the top five oldest and youngest employees in various jobs, calculating average ages and balances by job and education level, handling NULL values with macros, and dynamically creating multiple tables.

Through these exercises, it has been made evident that integrating SQL into Jupyter notebooks provides an intuitive and powerful approach to data exploration and analysis. The techniques demonstrated in this module can be widely applied across various datasets, making it a useful skillset for data scientists and analysts. By merging the strengths of Python’s flexibility and SQL’s data management capabilities, we can create more powerful and interactive data workflows.

In the next section, you’ll learn how to combine paramaterization with ipywidgets.

Parameterizing Queries

Contents

Parameterizing Queries#

Set up and data access#

Variable Expansion#

Sample case#

Mini Exercise:#

Variable Expansion Using Loops#

Sample case#

Mini Exercise:#

Macros + Variable Expansion#

Sample usage#

Mini exercise#

Creating multiple tables dynamically#

Conclusion#