Parameterizing Queries#
In the last module, you learned how to use SQL within Jupyter notebooks with JupySQL, and you learned how you can combine it with usage of widgets to create interactive graphical user interfaces (GUIs) for your SQL queries. This module will take you a step further. Here, we will discuss how to parameterize your SQL queries and effectively utilize ipywidgets to create more interactive data workflows. Let’s dive in!
Set up and data access#
Important
Note: The –save and %sqlcmd features used require the latest JupySQL version. Ensure you run the code below.
This code installs JupySQL, and DuckDB in your environment. We will be using these moving forward.
%pip install jupysql jupysql-plugin --quiet
Note: you may need to restart the kernel to use updated packages.
We continue to work with the Bank and Marketing data set.
Important
Source: UCI Machine Learning Repository
URL: https://archive-beta.ics.uci.edu/dataset/222/bank+marketing
Data Citation
Moro,S., Rita,P., and Cortez,P.. (2012). Bank Marketing. UCI Machine Learning Repository. https://doi.org/10.24432/C5K306.
We can use the following function to extract the downloaded data from the UCI repository.
import sys
sys.path.insert(0, "../../")
import banking # noqa: E402
_ = banking.BankingData("https://tinyurl.com/jb-bank", "bank")
_.extract_to_csv()
Initialize a DuckDB Instance:
# Loading in SQL extension
%reload_ext sql
# Initiating a DuckDB database named 'bank.duck.db' to run our SQL queries on
%sql duckdb:///bank.duck.db
Config | value |
---|---|
displaycon | False |
feedback | True |
autopandas | False |
named_parameters | True |
Load the data
%%sql
CREATE OR REPLACE TABLE bank AS
FROM read_csv_auto('bank_cleaned.csv', header=True, sep=',')
Count |
---|
We confirm the table was loaded
%sqlcmd tables
Name |
---|
bank |
We can use JupySQL’s Table Explorer to take a quick look at the table.
%sqlcmd explore --table bank
%sqlcmd columns -t bank
name | type | nullable | default | autoincrement | comment |
---|---|---|---|---|---|
age | BIGINT | True | None | False | None |
job | VARCHAR | True | None | False | None |
marital | VARCHAR | True | None | False | None |
education | VARCHAR | True | None | False | None |
default | VARCHAR | True | None | False | None |
balance | BIGINT | True | None | False | None |
housing | VARCHAR | True | None | False | None |
loan | VARCHAR | True | None | False | None |
contact | VARCHAR | True | None | False | None |
day | BIGINT | True | None | False | None |
month | VARCHAR | True | None | False | None |
duration | BIGINT | True | None | False | None |
campaign | BIGINT | True | None | False | None |
pdays | BIGINT | True | None | False | None |
previous | BIGINT | True | None | False | None |
poutcome | VARCHAR | True | None | False | None |
y | VARCHAR | True | None | False | None |
Variable Expansion#
JupySQL supports variable expansion in the form of {{variable}}
. This allows you to write a query with placeholders that can be replaced by variables dynamically. The benefits of using parameterized SQL queries are that they can be reused with different values, prepared ahead of time, and used with dynamic data.
Let’s start with a simple query template with placeholders, and substitute the placeholders with a couple of variables using variable expansion.
We are going to select two columns (age
and job
) from the table bank
with the variable dynamic_column
and show only 5 rows. We can control this via another variable dynamic_limit
.
dynamic_limit = 5
dynamic_column = "age, job"
%sql SELECT {{dynamic_column}} FROM bank LIMIT {{dynamic_limit}}
age | job |
---|---|
30 | unemployed |
33 | services |
35 | management |
30 | management |
59 | blue-collar |
Sample case#
The HR department would like to determine top five oldest and youngest employees in two different jobs: services, management.
We can combine Python with SQL code using the %sql
magic. With Python we can iterate over the jobs in the list jobs
. With parameterization, we can
then insert the job into the SQL query via the {{variable}}
notation.
jobs = ["services", "management"]
for job in jobs:
print(f"Top 5 oldest in {job}")
order = "DESC"
o_ = %sql SELECT * FROM bank WHERE job='{{job}}' ORDER BY age {{order}} LIMIT 5
print(o_)
print(f"Top 5 youngest in {job}")
order = "ASC"
y_ = %sql SELECT * FROM bank WHERE job='{{job}}' ORDER BY age {{order}} LIMIT 5
print(y_)
Top 5 oldest in services
+-----+----------+----------+-----------+---------+---------+---------+------+----------+-----+-------+----------+----------+-------+----------+----------+-----+
| age | job | marital | education | default | balance | housing | loan | contact | day | month | duration | campaign | pdays | previous | poutcome | y |
+-----+----------+----------+-----------+---------+---------+---------+------+----------+-----+-------+----------+----------+-------+----------+----------+-----+
| 60 | services | married | tertiary | no | 4256 | yes | no | cellular | 16 | aug | 239 | 3 | 761 | 1 | other | yes |
| 59 | services | married | secondary | no | 204 | no | no | cellular | 23 | jul | 286 | 2 | -1 | 0 | unknown | no |
| 59 | services | married | secondary | no | 0 | yes | no | cellular | 12 | may | 214 | 3 | -1 | 0 | unknown | no |
| 58 | services | divorced | secondary | no | 462 | no | no | cellular | 13 | nov | 495 | 1 | 93 | 1 | success | yes |
| 58 | services | married | primary | no | 65 | no | no | unknown | 27 | jun | 162 | 1 | -1 | 0 | unknown | no |
+-----+----------+----------+-----------+---------+---------+---------+------+----------+-----+-------+----------+----------+-------+----------+----------+-----+
Top 5 youngest in services
+-----+----------+---------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+----+
| age | job | marital | education | default | balance | housing | loan | contact | day | month | duration | campaign | pdays | previous | poutcome | y |
+-----+----------+---------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+----+
| 21 | services | single | secondary | no | 361 | no | no | telephone | 5 | jun | 329 | 1 | 95 | 1 | other | no |
| 21 | services | single | secondary | no | 1903 | yes | no | unknown | 29 | may | 107 | 2 | -1 | 0 | unknown | no |
| 23 | services | single | tertiary | no | 363 | yes | no | unknown | 30 | may | 16 | 18 | -1 | 0 | unknown | no |
| 23 | services | married | secondary | no | -19 | yes | no | unknown | 20 | jun | 13 | 10 | -1 | 0 | unknown | no |
| 23 | services | single | secondary | no | 500 | yes | no | unknown | 4 | jun | 111 | 3 | -1 | 0 | unknown | no |
+-----+----------+---------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+----+
Top 5 oldest in management
+-----+------------+---------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+----+
| age | job | marital | education | default | balance | housing | loan | contact | day | month | duration | campaign | pdays | previous | poutcome | y |
+-----+------------+---------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+----+
| 80 | management | married | primary | no | 6483 | no | no | telephone | 19 | oct | 123 | 1 | -1 | 0 | unknown | no |
| 66 | management | married | tertiary | no | 1048 | yes | no | cellular | 23 | jun | 971 | 2 | -1 | 0 | unknown | no |
| 66 | management | married | tertiary | no | 2169 | no | no | cellular | 7 | jul | 282 | 1 | -1 | 0 | unknown | no |
| 66 | management | single | tertiary | no | 6483 | no | no | cellular | 13 | aug | 275 | 1 | -1 | 0 | unknown | no |
| 64 | management | married | tertiary | no | 679 | no | no | telephone | 22 | apr | 135 | 3 | -1 | 0 | unknown | no |
+-----+------------+---------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+----+
Top 5 youngest in management
+-----+------------+---------+-----------+---------+---------+---------+------+----------+-----+-------+----------+----------+-------+----------+----------+-----+
| age | job | marital | education | default | balance | housing | loan | contact | day | month | duration | campaign | pdays | previous | poutcome | y |
+-----+------------+---------+-----------+---------+---------+---------+------+----------+-----+-------+----------+----------+-------+----------+----------+-----+
| 23 | management | single | tertiary | no | 736 | yes | no | cellular | 30 | apr | 119 | 1 | -1 | 0 | unknown | no |
| 24 | management | single | tertiary | no | 172 | no | no | cellular | 3 | jul | 67 | 2 | -1 | 0 | unknown | no |
| 25 | management | single | tertiary | no | 553 | yes | no | cellular | 17 | sep | 113 | 4 | 479 | 1 | failure | no |
| 25 | management | single | tertiary | no | 2506 | no | no | cellular | 17 | feb | 485 | 1 | -1 | 0 | unknown | yes |
| 25 | management | single | tertiary | no | 808 | no | no | cellular | 18 | sep | 267 | 2 | 114 | 2 | failure | yes |
+-----+------------+---------+-----------+---------+---------+---------+------+----------+-----+-------+----------+----------+-------+----------+----------+-----+
Mini Exercise:#
The marketing team is interested in understanding the distribution of age for their marketing campaigns. Select the top 5 oldest and youngest people who have been part of the marketing campaign (campaign
column).
Your task: use a loop with variable expansion to accomplish this.
Answers
campaign = %sql SELECT DISTINCT campaign FROM bank
campaign = campaign.DataFrame()
campaigns = campaign["campaign"].tolist()[:2]
for campaign in campaigns:
print(f"Top 5 oldest in campaign {campaign}")
order = "DESC"
oldest_camppaing = %sql SELECT * FROM bank WHERE campaign={{campaign}} ORDER BY age {{order}} LIMIT 5
print(oldest_camppaing)
print(f"Top 5 youngest in campaign {campaign}")
order = "ASC"
youngest_campaign = %sql SELECT * FROM bank WHERE campaign={{campaign}} ORDER BY age {{order}} LIMIT 5
print(youngest_campaign)
Top 5 oldest in campaign 1
+-----+------------+----------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+-----+
| age | job | marital | education | default | balance | housing | loan | contact | day | month | duration | campaign | pdays | previous | poutcome | y |
+-----+------------+----------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+-----+
| 87 | retired | married | primary | no | 230 | no | no | cellular | 30 | oct | 144 | 1 | -1 | 0 | unknown | yes |
| 83 | retired | married | primary | no | 425 | no | no | telephone | 22 | jun | 883 | 1 | 140 | 1 | failure | yes |
| 83 | retired | divorced | primary | no | 0 | no | no | telephone | 31 | may | 664 | 1 | 77 | 3 | success | no |
| 83 | retired | divorced | primary | no | 1097 | no | no | telephone | 5 | mar | 181 | 1 | -1 | 0 | unknown | yes |
| 80 | management | married | primary | no | 6483 | no | no | telephone | 19 | oct | 123 | 1 | -1 | 0 | unknown | no |
+-----+------------+----------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+-----+
Top 5 youngest in campaign 1
+-----+----------+---------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+-----+
| age | job | marital | education | default | balance | housing | loan | contact | day | month | duration | campaign | pdays | previous | poutcome | y |
+-----+----------+---------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+-----+
| 19 | student | single | secondary | no | 302 | no | no | cellular | 16 | jul | 205 | 1 | -1 | 0 | unknown | yes |
| 20 | student | single | secondary | no | 502 | no | no | cellular | 30 | apr | 261 | 1 | -1 | 0 | unknown | yes |
| 20 | student | single | secondary | no | 1191 | no | no | cellular | 12 | feb | 274 | 1 | -1 | 0 | unknown | no |
| 21 | services | single | secondary | no | 361 | no | no | telephone | 5 | jun | 329 | 1 | 95 | 1 | other | no |
| 21 | student | single | secondary | no | 681 | no | no | unknown | 20 | aug | 6 | 1 | -1 | 0 | unknown | no |
+-----+----------+---------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+-----+
Top 5 oldest in campaign 4
+-----+---------------+---------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+-----+
| age | job | marital | education | default | balance | housing | loan | contact | day | month | duration | campaign | pdays | previous | poutcome | y |
+-----+---------------+---------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+-----+
| 78 | housemaid | married | secondary | no | 499 | no | no | telephone | 16 | mar | 80 | 4 | -1 | 0 | unknown | no |
| 77 | retired | married | primary | no | 680 | no | no | telephone | 27 | nov | 341 | 4 | 94 | 3 | failure | no |
| 66 | admin. | single | secondary | no | 10177 | no | no | cellular | 5 | may | 272 | 4 | -1 | 0 | unknown | no |
| 61 | technician | married | secondary | no | 1191 | no | no | cellular | 21 | may | 214 | 4 | -1 | 0 | unknown | yes |
| 60 | self-employed | single | tertiary | no | 179 | no | yes | cellular | 29 | jul | 79 | 4 | -1 | 0 | unknown | no |
+-----+---------------+---------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+-----+
Top 5 youngest in campaign 4
+-----+-------------+---------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+-----+
| age | job | marital | education | default | balance | housing | loan | contact | day | month | duration | campaign | pdays | previous | poutcome | y |
+-----+-------------+---------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+-----+
| 24 | blue-collar | single | primary | no | 111 | yes | yes | unknown | 27 | may | 42 | 4 | -1 | 0 | unknown | no |
| 25 | student | single | secondary | no | 331 | no | no | telephone | 26 | aug | 170 | 4 | -1 | 0 | unknown | yes |
| 25 | management | single | tertiary | no | 553 | yes | no | cellular | 17 | sep | 113 | 4 | 479 | 1 | failure | no |
| 25 | blue-collar | single | secondary | no | 913 | yes | no | cellular | 7 | jul | 113 | 4 | -1 | 0 | unknown | no |
| 25 | management | married | tertiary | no | -242 | yes | yes | cellular | 18 | may | 600 | 4 | 332 | 4 | other | yes |
+-----+-------------+---------+-----------+---------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+-----+
Variable Expansion Using Loops#
We can also use loops inside SQL to parameterize our queries. For example, let’s calculate the average age for each job role in the bank
table.
The query below will iterate over four kinds of jobs: blue collar, technician, services and management. For each job, it will compute the average age for people with that job and create a new column for the average of each job.
The resulting query can be reused as it is being saved as avg_age
with the --save
option.
%%sql --save avg_age
{% set jobs = [ "blue_collar", "technician", "services", "management"] %}
select
{% for job in jobs %}
avg(case when job = '{{job.replace('_', '-')}}' then age end) as avg_age_{{job}},
{% endfor %}
from bank
avg_age_blue_collar | avg_age_technician | avg_age_services | avg_age_management |
---|---|---|---|
40.15644820295983 | 39.470052083333336 | 38.57074340527578 | 40.54076367389061 |
We can see the final compiled query using:
final = %sqlrender avg_age
print(final)
select
avg(case when job = 'blue-collar' then age end) as avg_age_blue_collar,
avg(case when job = 'technician' then age end) as avg_age_technician,
avg(case when job = 'services' then age end) as avg_age_services,
avg(case when job = 'management' then age end) as avg_age_management,
from bank
/home/docs/checkouts/readthedocs.org/user_builds/ploomber-sql/conda/latest/lib/python3.11/site-packages/sql/magic.py:84: FutureWarning:
'%sqlrender' will be deprecated soon, please use '%sqlcmd snippets avg_age' instead.
For documentation, follow this link : https://jupysql.ploomber.io/en/latest/api/magic-snippets.html#id1
warnings.warn(
Sample case#
The HR department would like to determine the average balance for employees by job (for jobs technician, services and management) and education level. We can achieve this by using the job
and education
columns from the bank
table.
%%sql --save avg_balance
{% set jobs = [ "technician", "services", "management"] %}
{% set education_levels = [ "primary", "secondary", "tertiary", "unknown"] %}
select
{% for job in jobs %}
{% for education in education_levels %}
avg(case when job = '{{job}}' and education = '{{education}}' then balance end) as avg_balance_{{job}}_{{education}},
{% endfor %}
{% endfor %}
from bank
avg_balance_technician_primary | avg_balance_technician_secondary | avg_balance_technician_tertiary | avg_balance_technician_unknown | avg_balance_services_primary | avg_balance_services_secondary | avg_balance_services_tertiary | avg_balance_services_unknown | avg_balance_management_primary | avg_balance_management_secondary | avg_balance_management_tertiary | avg_balance_management_unknown |
---|---|---|---|---|---|---|---|---|---|---|---|
2593.0 | 1153.6076923076923 | 1631.6303317535544 | 1780.0 | 1107.32 | 998.8842975206611 | 1894.875 | 3058.0 | 2685.4102564102564 | 1250.103448275862 | 1776.3430749682339 | 2386.259259259259 |
Mini Exercise:#
Calculate the count of people for each job role for jobs “blue_collar”, “technician”, “services”, “management”.
Your task: use Variable Expansion Using Loops and the %%sql
magic
Answers
%%sql --save count_jobs
{% set jobs = [ "blue_collar", "technician", "services", "management"] %}
select
{% for job in jobs %}
count(case when job = '{{job.replace('_', '-')}}' then 1 end) as count_{{job}},
{% endfor %}
from bank
count_blue_collar | count_technician | count_services | count_management |
---|---|---|---|
946 | 768 | 417 | 969 |
Macros + Variable Expansion#
Macros
is a construct analogous to functions that promote re-usability. We define a macro
using the {% macro %} tag and use this macro in the query using variable expansion.
For example, let’s define a macro to convert a value from days to years, and use this macro in our query. This macro will be called days_to_years
and it takes as input the column_name
and precision
. With the default precision
set to 2.
We can obtain the number of years by dividing the number of days by 365.
We can then use the days_to_years
macro in the query, where we are passing age
as the input column.
%%sql --save convert
{% macro days_to_years(column_name, precision=2) %}
({{ column_name }} / 365)::numeric(16, {{ precision }})
{% endmacro %}
select
job, marital,
{{ days_to_years('age') }} as age_in_years
from bank
job | marital | age_in_years |
---|---|---|
unemployed | married | 0.08 |
services | married | 0.09 |
management | single | 0.10 |
management | married | 0.08 |
blue-collar | married | 0.16 |
management | single | 0.10 |
self-employed | married | 0.10 |
technician | married | 0.11 |
entrepreneur | married | 0.11 |
services | married | 0.12 |
In this query, we’re converting age from days to years using our macro. The results are saved in a variable convert
.
The final rendered query can be viewed using:
final = %sqlrender convert
print(final)
select
job, marital,
(age / 365)::numeric(16, 2)
as age_in_years
from bank
/home/docs/checkouts/readthedocs.org/user_builds/ploomber-sql/conda/latest/lib/python3.11/site-packages/sql/magic.py:84: FutureWarning:
'%sqlrender' will be deprecated soon, please use '%sqlcmd snippets convert' instead.
For documentation, follow this link : https://jupysql.ploomber.io/en/latest/api/magic-snippets.html#id1
warnings.warn(
Sample usage#
Let’s use macros to handle NULL
values. The macro below checks if the chosen column has NULL
values.
The code block defines a macro that will be used later in the SQL query. The macro is named months_to_years_handle_null
, and its purpose is to convert the values in a specified column from months to years. Additionally, this macro is designed to handle null values in a column, hence its name.
If the value in the column is null, the macro will return NULL
; otherwise, it will convert the value from months to years.
%%sql --save convert_duration_handle_null
{% macro months_to_years_handle_null(column_name, precision=2) %}
(case when {{ column_name }} is not null then {{ column_name }} / 12 else null end)::numeric(16, {{ precision }})
{% endmacro %}
select
job, marital,
{{ months_to_years_handle_null('duration') }} as duration_in_years
from bank
job | marital | duration_in_years |
---|---|---|
unemployed | married | 6.58 |
services | married | 18.33 |
management | single | 15.42 |
management | married | 16.58 |
blue-collar | married | 18.83 |
management | single | 11.75 |
self-employed | married | 28.42 |
technician | married | 12.58 |
entrepreneur | married | 4.75 |
services | married | 26.08 |
final = %sqlrender convert_duration_handle_null
print(final)
select
job, marital,
(case when duration is not null then duration / 12 else null end)::numeric(16, 2)
as duration_in_years
from bank
/home/docs/checkouts/readthedocs.org/user_builds/ploomber-sql/conda/latest/lib/python3.11/site-packages/sql/magic.py:84: FutureWarning:
'%sqlrender' will be deprecated soon, please use '%sqlcmd snippets convert_duration_handle_null' instead.
For documentation, follow this link : https://jupysql.ploomber.io/en/latest/api/magic-snippets.html#id1
warnings.warn(
Mini exercise#
Use the months_to_years_handle_null
macro to iterate over the jobs
: “blue_collar”, “technician”, “services”, “management”. For each job in the loop, calculate the average duration for the specified job. Save this query as avg_duration
. Display the query.
Answers
This modification is using the months_to_years_handle_null
macro that you had defined earlier. This macro will convert duration from months to years for each job category, while also handling null values. The macro will apply to the average duration of each job category (where the job is equal to the current job in the loop). The output for each job category is then given an alias that indicates the job category and that it represents average duration in years.
Just like before, the code below will dynamically generate the SQL query, and the –save directive will save the query result into the avg_duration variable.
%%sql --save avg_duration
{% macro months_to_years_handle_null(column_name, precision=2) %}
(case when {{ column_name }} is not null then {{ column_name }} / 12 else null end)::numeric(16, {{ precision }})
{% endmacro %}
{% set jobs = [ "blue_collar", "technician", "services", "management"] %}
select
{% for job in jobs %}
{{ months_to_years_handle_null('avg(case when job = \'' + job.replace('_', '-') + '\' then duration end)') }} as avg_duration_{{job}},
{% endfor %}
from bank
avg_duration_blue_collar | avg_duration_technician | avg_duration_services | avg_duration_management |
---|---|---|---|
23.18 | 21.01 | 21.87 | 21.71 |
Display the query:
final = %sqlrender avg_duration
print(final)
select
(case when avg(case when job = 'blue-collar' then duration end) is not null then avg(case when job = 'blue-collar' then duration end) / 12 else null end)::numeric(16, 2)
as avg_duration_blue_collar,
(case when avg(case when job = 'technician' then duration end) is not null then avg(case when job = 'technician' then duration end) / 12 else null end)::numeric(16, 2)
as avg_duration_technician,
(case when avg(case when job = 'services' then duration end) is not null then avg(case when job = 'services' then duration end) / 12 else null end)::numeric(16, 2)
as avg_duration_services,
(case when avg(case when job = 'management' then duration end) is not null then avg(case when job = 'management' then duration end) / 12 else null end)::numeric(16, 2)
as avg_duration_management,
from bank
/home/docs/checkouts/readthedocs.org/user_builds/ploomber-sql/conda/latest/lib/python3.11/site-packages/sql/magic.py:84: FutureWarning:
'%sqlrender' will be deprecated soon, please use '%sqlcmd snippets avg_duration' instead.
For documentation, follow this link : https://jupysql.ploomber.io/en/latest/api/magic-snippets.html#id1
warnings.warn(
Creating multiple tables dynamically#
We’ll finish off this section by showing you how you can generate multiple tables dynamically.
Your task now is to create separate tables for each job. This can easily be accomplished using a loop and variable expansion:
%%sql
DROP TABLE IF EXISTS services;
DROP TABLE IF EXISTS management;
Success |
---|
jobs = ["services", "management"]
for job in jobs:
%sql CREATE TABLE {{job}} AS (SELECT * from bank WHERE job = '{{job}}')
Let’s check the tables in our schema:
%sqlcmd tables
Name |
---|
management |
services |
bank |
Delete table
%%sql
DROP TABLE bank;
DROP TABLE services;
DROP TABLE management;
Success |
---|
Conclusion#
In this module, we have explored how to parameterize SQL queries and effectively integrate with Python for a more interactive data analysis workflow within Jupyter notebooks. This method is facilitated using JupySQL and DuckDB. We’ve also demonstrated how to dynamically create and use variables within SQL queries, further enhancing the flexibility and interactivity of the notebooks.
We delved into the basics of variable expansion, using loops within SQL to iterate over arrays, and how to define and use macros to promote reusability and reduce redundancy. We further illustrated the potential of these techniques through several practical examples, including determining the top five oldest and youngest employees in various jobs, calculating average ages and balances by job and education level, handling NULL
values with macros
, and dynamically creating multiple tables.
Through these exercises, it has been made evident that integrating SQL into Jupyter notebooks provides an intuitive and powerful approach to data exploration and analysis. The techniques demonstrated in this module can be widely applied across various datasets, making it a useful skillset for data scientists and analysts. By merging the strengths of Python’s flexibility and SQL’s data management capabilities, we can create more powerful and interactive data workflows.
In the next section, you’ll learn how to combine paramaterization with ipywidgets
.