Connecting to Database Engines

Connecting to Database Engines#

In this tutorial you will learn how to connect to various databases using JupySQL.

!pip install jupysql duckdb-engine -q

We shall start by importing all required libraries:

from os import environ
import urllib

Connect with a URL string#

Connection strings follow the SQLAlchemy URL format. This is the fastest way to connect to your database and the recommended way if you’re using SQLite or DuckDB.

Database URLs have the following format:

dialect+driver://username:password@host:port/database

Important

If you’re using a database that requires a password, keep reading for more secure methods.

Building URL strings securely#

To connect in a more secure way, you can dynamically build your URL string so your password isn’t hardcoded:

password = getpass.getpass()

When you execute the cell above in a notebook, a text box will appear and whatever you type will be stored in the password variable.

Then, you can build your connection string:

db_url = f"postgresql://user:{password}@localhost/database"

Secure Connections#

It is highly recommended that you do not pass plain credentials.

Important

Unlike ipython-sql, JupySQL doesn’t allow expanding your database URL with the $ character, we use {{}} instead:

db_url = f"dialect+driver://username:{password}@host:port/database"
%sql {{db_url}}

Securely storing your password#

If you want to store your password securely (and don’t get prompted whenever you start a connection), you can use keyring:

%pip install keyring --quiet

Execute the following in your notebook:

import keyring
keyring.set_password("my_database", "my_username", "my_password")

Then, delete the cell above (so your password isn’t hardcoded!). Now, you can retrieve your password with:

password = keyring.get_password("my_database", "my_username")

You can then connect to the database using JupySQL

db_url = f"postgresql://user:{password}@localhost/database"
%sql {{db_url}}

Tip

If you have issues using keyring, send us a message on Slack.

Passing custom arguments to a URL#

Connection arguments not whitelisted by SQLALchemy can be provided with --connection_arguments. See SQLAlchemy Args.

Here’s an example using SQLite:

%reload_ext sql

%sql --connection_arguments '{"timeout":10}' sqlite://

Connecting via an environment variable#

You can create a local .env file with a db_password variable and use python-dotenv to load it to your environment.

Set the DATABASE_URL environment variable, and %sql will automatically load it. You can do this either by setting the environment variable from your terminal or in your notebook:

from dotenv import load_dotenv
load_dotenv(".env")
password = os.environ.get("db_password")
environ["DATABASE_URL"] = f"postgresql://user:{password}@localhost/database"

%reload_ext sql

%sql

Custom Connection#

If you are using a database that is not supported by SQLAlchemy but follows the DB API 2.0 specification, you can still use JupySQL.

Note

We currently support %sql, %sqlplot, and the ggplot API when using custom connection. However, please be advised that there may be some features/functionalities that won’t be fully compatible with JupySQL.

For this example we’ll generate a DuckDB connection, using its native connect method.

Now, load %sql and initialize it with our DuckDB connection.

%reload_ext sql
%sql duckdb://

Found pyproject.toml from '/home/docs/checkouts/readthedocs.org/user_builds/ploomber-sql/checkouts/latest'

Settings changed:

Config	value
displaycon	False
feedback	True
autopandas	False
named_parameters	True

Connecting and switching to connection duckdb://

Download some data.

urllib.request.urlretrieve(
    "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv",  # noqa
    "penguins.csv",
)

('penguins.csv', <http.client.HTTPMessage at 0x7f580a904050>)

You’re all set!

%sql select * from penguins.csv limit 3

species	island	bill_length_mm	bill_depth_mm	flipper_length_mm	body_mass_g	sex
Adelie	Torgersen	39.1	18.7	181	3750	MALE
Adelie	Torgersen	39.5	17.4	186	3800	FEMALE
Adelie	Torgersen	40.3	18.0	195	3250	FEMALE

Use JupySQL to perform the queries and answer the questions.#

Question 1 (Easy):#

Load a CSV file into a DuckDB instance. The Bonus section can help you with this.

Show Answers

Recall that a connection string has the following format:

dialect+driver://username:password@host:port/database

To connect to a DuckDB database, you can use the %sql magic command the appropriate duckdb:// URL string:

%sql duckdb://

Download CSV data from GitHub:

urllib.request.urlretrieve(
    "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv",  # noqa
    "penguins.csv",
)

('penguins.csv', <http.client.HTTPMessage at 0x7f580a878190>)

You’re all set!

%sql select * from penguins.csv limit 3

species	island	bill_length_mm	bill_depth_mm	flipper_length_mm	body_mass_g	sex
Adelie	Torgersen	39.1	18.7	181	3750	MALE
Adelie	Torgersen	39.5	17.4	186	3800	FEMALE
Adelie	Torgersen	40.3	18.0	195	3250	FEMALE

Question 2 (Medium):#

Write a code snippet to establish a secure connection for a PostgreSQL database by using a connection string, the get_pass() function, and by creating an engine.

Show Answers

To securely connect to a PostgreSQL database, you can use the getpass function from the getpass module to prompt the user for a password. This way, the password is not hardcoded in the notebook.

import getpass
password = getpass()

Then, you can build your connection string:

db_url = f"postgresql://user:{password}@localhost/database" #noqa

Create an engine and connect:

engine = create_engine(db_url)

Question 3 (Hard):#

If you have a database that is not supported by SQLAlchemy but follows the DB API 2.0 specification, how can you still use JupySQL?

Show Answers

The answer is using a Custom Connection. For this example, we’ll generate a SQLite connection, using its native connect method, and a custom table to query from.

First, let’s import the library and create a new database connection to our custom table, my_numbers.

with sqlite3.connect("a.db") as conn:  # noqa
    conn.execute("DROP TABLE IF EXISTS my_numbers")  # noqa
    conn.execute("CREATE TABLE my_numbers (number FLOAT)")  # noqa
    conn.execute("INSERT INTO my_numbers VALUES (1)")  # noqa
    conn.execute("INSERT INTO my_numbers VALUES (2)")  # noqa
    conn.execute("INSERT INTO my_numbers VALUES (3)")  # noqa

Next, load %sql and create a schema, a_schema, for the table.

%%sql
ATTACH DATABASE 'a.db' AS a_schema

You’re all set!

%sql select * from a_schema.my_numbers limit 3

Bonus#

In-memory Database with DuckDB#

Although URL-based connections are more secure, can handle various types of workloads, and offer more functionality, in-memory databases are a great option for quick querying and testing. In this tutorial, we’ll use DuckDB to create an in-memory database with JupySQL.

The first step is to install the dependencies:

%pip install jupysql duckdb duckdb-engine --quiet

Note: you may need to restart the kernel to use updated packages.

Then, load the ipython-sql library using the %load_ext iPython extension syntax and connect to the database:

%load_ext sql

Finally, load %sql and initialize the database:

%sql duckdb://

Download some data:

urllib.request.urlretrieve(
    "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv",  # noqa
    "penguins.csv",
)

('penguins.csv', <http.client.HTTPMessage at 0x7f580a884050>)

You’re all set!

%sql select * from penguins.csv limit 3

species	island	bill_length_mm	bill_depth_mm	flipper_length_mm	body_mass_g	sex
Adelie	Torgersen	39.1	18.7	181	3750	MALE
Adelie	Torgersen	39.5	17.4	186	3800	FEMALE
Adelie	Torgersen	40.3	18.0	195	3250	FEMALE

Databases Supported by JupySQL#

Check out our guides for connecting to supported databases:

Connecting to Database Engines

Contents

Connecting to Database Engines#

Connect with a URL string#

Building URL strings securely#

Secure Connections#

Securely storing your password#

Passing custom arguments to a URL#

Connecting via an environment variable#

Custom Connection#

Use JupySQL to perform the queries and answer the questions.#

Question 1 (Easy):#

Question 2 (Medium):#

Question 3 (Hard):#

Bonus#

In-memory Database with DuckDB#

Databases Supported by JupySQL#