Connecting to Database Engines#
In this tutorial you will learn how to connect to various databases using JupySQL.
!pip install jupysql duckdb-engine -q
We shall start by importing all required libraries:
from os import environ
import urllib
Connect with a URL string#
Connection strings follow the SQLAlchemy URL format. This is the fastest way to connect to your database and the recommended way if you’re using SQLite or DuckDB.
Database URLs have the following format:
dialect+driver://username:password@host:port/database
Important
If you’re using a database that requires a password, keep reading for more secure methods.
Building URL strings securely#
To connect in a more secure way, you can dynamically build your URL string so your password isn’t hardcoded:
password = getpass.getpass()
When you execute the cell above in a notebook, a text box will appear and whatever you type will be stored in the password
variable.
Then, you can build your connection string:
db_url = f"postgresql://user:{password}@localhost/database"
Secure Connections#
It is highly recommended that you do not pass plain credentials.
Important
Unlike ipython-sql
, JupySQL doesn’t allow expanding your database URL with the $
character, we use {{}}
instead:
db_url = f"dialect+driver://username:{password}@host:port/database"
%sql {{db_url}}
Securely storing your password#
If you want to store your password securely (and don’t get prompted whenever you start a connection), you can use keyring:
%pip install keyring --quiet
Execute the following in your notebook:
import keyring
keyring.set_password("my_database", "my_username", "my_password")
Then, delete the cell above (so your password isn’t hardcoded!). Now, you can retrieve your password with:
password = keyring.get_password("my_database", "my_username")
You can then connect to the database using JupySQL
db_url = f"postgresql://user:{password}@localhost/database"
%sql {{db_url}}
Tip
If you have issues using keyring
, send us a message on Slack.
Passing custom arguments to a URL#
Connection arguments not whitelisted by SQLALchemy can be provided with --connection_arguments
. See SQLAlchemy Args.
Here’s an example using SQLite:
%reload_ext sql
%sql --connection_arguments '{"timeout":10}' sqlite://
Connecting via an environment variable#
You can create a local .env
file with a db_password
variable and use python-dotenv
to load it to your environment.
Set the DATABASE_URL
environment variable, and %sql
will automatically load it. You can do this either by setting the environment variable from your terminal or in your notebook:
from dotenv import load_dotenv
load_dotenv(".env")
password = os.environ.get("db_password")
environ["DATABASE_URL"] = f"postgresql://user:{password}@localhost/database"
%reload_ext sql
%sql
Custom Connection#
If you are using a database that is not supported by SQLAlchemy but follows the DB API 2.0 specification, you can still use JupySQL.
Note
We currently support %sql
, %sqlplot
, and the ggplot
API
when using custom connection.
However, please be advised that there may be some
features/functionalities that won’t be fully compatible with JupySQL.
For this example we’ll generate a DuckDB
connection, using its native connect
method.
Now, load %sql
and initialize it with our DuckDB connection.
%reload_ext sql
%sql duckdb://
Config | value |
---|---|
displaycon | False |
feedback | True |
autopandas | False |
named_parameters | True |
Download some data.
urllib.request.urlretrieve(
"https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv", # noqa
"penguins.csv",
)
('penguins.csv', <http.client.HTTPMessage at 0x7f580a904050>)
You’re all set!
%sql select * from penguins.csv limit 3
species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
---|---|---|---|---|---|---|
Adelie | Torgersen | 39.1 | 18.7 | 181 | 3750 | MALE |
Adelie | Torgersen | 39.5 | 17.4 | 186 | 3800 | FEMALE |
Adelie | Torgersen | 40.3 | 18.0 | 195 | 3250 | FEMALE |
Use JupySQL to perform the queries and answer the questions.#
Question 1 (Easy):#
Load a CSV file into a DuckDB instance. The Bonus section can help you with this.
Show Answers
Recall that a connection string has the following format:
dialect+driver://username:password@host:port/database
To connect to a DuckDB database, you can use the %sql
magic command the appropriate duckdb://
URL string:
%sql duckdb://
Download CSV data from GitHub:
urllib.request.urlretrieve(
"https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv", # noqa
"penguins.csv",
)
('penguins.csv', <http.client.HTTPMessage at 0x7f580a878190>)
You’re all set!
%sql select * from penguins.csv limit 3
species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
---|---|---|---|---|---|---|
Adelie | Torgersen | 39.1 | 18.7 | 181 | 3750 | MALE |
Adelie | Torgersen | 39.5 | 17.4 | 186 | 3800 | FEMALE |
Adelie | Torgersen | 40.3 | 18.0 | 195 | 3250 | FEMALE |
Question 2 (Medium):#
Write a code snippet to establish a secure connection for a PostgreSQL database by using a connection string, the get_pass()
function, and by creating an engine.
Show Answers
To securely connect to a PostgreSQL database, you can use the getpass
function from the getpass
module to prompt the user for a password. This way, the password is not hardcoded in the notebook.
import getpass
password = getpass()
Then, you can build your connection string:
db_url = f"postgresql://user:{password}@localhost/database" #noqa
Create an engine and connect:
engine = create_engine(db_url)
Question 3 (Hard):#
If you have a database that is not supported by SQLAlchemy but follows the DB API 2.0 specification, how can you still use JupySQL?
Show Answers
The answer is using a Custom Connection. For this example, we’ll generate a SQLite
connection, using its native connect
method, and a custom table to query from.
First, let’s import the library and create a new database connection to our custom table, my_numbers
.
with sqlite3.connect("a.db") as conn: # noqa
conn.execute("DROP TABLE IF EXISTS my_numbers") # noqa
conn.execute("CREATE TABLE my_numbers (number FLOAT)") # noqa
conn.execute("INSERT INTO my_numbers VALUES (1)") # noqa
conn.execute("INSERT INTO my_numbers VALUES (2)") # noqa
conn.execute("INSERT INTO my_numbers VALUES (3)") # noqa
Next, load %sql
and create a schema, a_schema
, for the table.
%%sql
ATTACH DATABASE 'a.db' AS a_schema
You’re all set!
%sql select * from a_schema.my_numbers limit 3
Bonus#
In-memory Database with DuckDB#
Although URL-based connections are more secure, can handle various types of workloads, and offer more functionality, in-memory databases are a great option for quick querying and testing. In this tutorial, we’ll use DuckDB to create an in-memory database with JupySQL.
The first step is to install the dependencies:
%pip install jupysql duckdb duckdb-engine --quiet
Note: you may need to restart the kernel to use updated packages.
Then, load the ipython-sql library using the %load_ext
iPython extension syntax and connect to the database:
%load_ext sql
Finally, load %sql
and initialize the database:
%sql duckdb://
Download some data:
urllib.request.urlretrieve(
"https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv", # noqa
"penguins.csv",
)
('penguins.csv', <http.client.HTTPMessage at 0x7f580a884050>)
You’re all set!
%sql select * from penguins.csv limit 3
species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
---|---|---|---|---|---|---|
Adelie | Torgersen | 39.1 | 18.7 | 181 | 3750 | MALE |
Adelie | Torgersen | 39.5 | 17.4 | 186 | 3800 | FEMALE |
Adelie | Torgersen | 40.3 | 18.0 | 195 | 3250 | FEMALE |
Databases Supported by JupySQL#
Check out our guides for connecting to supported databases: