Sample Ploomber Cloud Deployment

In this blog post, we will explore how to deploy Python applications with Ploomber Cloud and GitHub Actions, using a sample project to demonstrate the process. Imagine you need to extract weather data from an API periodically and store it for analysis. You can achieve this by creating a Python script and scheduling its execution with GitHub Actions.

Important

Please ensure you have reviewed the deployment with Ploomber Cloud and GitHub Actions tutorials before proceeding.

Initialize the data extraction and loading script

This script defines functions to extract weather data, transform it into a DataFrame, and populate a MotherDuck instance. To learn how to initialize a MotherDuck instance, visit the MotherDuck documentation; you will need to create an account and generate a token.
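
One simple way to supply the token (and the RapidAPI key) locally is a .env file. The variable names below match the os.getenv calls used later in this post; the values are placeholders:

```
# .env -- keep this file out of version control
RapidAPI=your-rapidapi-key
motherduck=your-motherduck-token
```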

The executable script takes a list of coordinates as input and extracts weather data for each one. It then concatenates all the data frames and saves the result to a CSV file, which is uploaded to a MotherDuck instance.

You can create a Python script called dataextraction.py with the following code. The code has four functions: extract_weather_by_lat_lon, transform_json_to_dataframe, extraction_df_lat_lon, and save_to_motherduck.

  • The extract_weather_by_lat_lon function extracts weather data via RapidAPI.

  • The transform_json_to_dataframe function transforms the JSON response into a DataFrame.

  • The extraction_df_lat_lon function combines the two: it extracts weather data via RapidAPI and transforms it into a DataFrame.

  • The save_to_motherduck function saves the DataFrame to a MotherDuck instance.

The entry point of the script is the if __name__ == "__main__" block. It loads the API key and MotherDuck token from environment variables, extracts weather data for a list of coordinates, concatenates the data frames, and saves the result to a CSV file, which is then uploaded to a MotherDuck instance.

import requests
import pandas as pd
import duckdb


def extract_weather_by_lat_lon(api_key, lat, lon):
    """
    Extracts weather data from RapidAPI

    Parameters
    ----------
    api_key : str
        API key for RapidAPI
    lat : float
        Latitude
    lon : float
        Longitude
    """
    try:
        # Perform call
        url = "https://weatherapi-com.p.rapidapi.com/forecast.json"

        querystring = {"q": f"{lat},{lon}", "days": "5"}

        headers = {
            "X-RapidAPI-Key": api_key,
            "X-RapidAPI-Host": "weatherapi-com.p.rapidapi.com",
        }

        response = requests.get(url, headers=headers, params=querystring)
        response.raise_for_status()
        return response.json()
    except requests.exceptions.HTTPError as e:
        print(e.response.text)
        return {}


def transform_json_to_dataframe(response):
    """
    Transforms JSON response to dataframe

    Parameters
    ----------
    response : dict
        Response from API call
    """
    try:
        unnested_df = pd.json_normalize(
            response,
            record_path=["forecast", "forecastday", "hour"],
            meta=["location", "current"],
        )
        unnested_df.drop(columns=["location", "current"], inplace=True)

        location_df = pd.json_normalize(response["location"])

        for col_name in location_df.columns:
            unnested_df[col_name] = location_df[col_name][0]

        return unnested_df

    except KeyError as e:
        print("Key Error:", e)
        return pd.DataFrame()
    except Exception as e:
        print("Other Error:", e)
        return pd.DataFrame()


def extraction_df_lat_lon(api_key, lat, lon):
    """
    Extracts weather data from RapidAPI and transforms it to a dataframe

    Parameters
    ----------
    api_key : str
        API key for RapidAPI
    lat : float
    lon : float

    Returns
    -------
    df : pandas.DataFrame
        Weather data

    """
    response = extract_weather_by_lat_lon(api_key, lat, lon)
    return transform_json_to_dataframe(response)


def save_to_motherduck(df, motherduck):
    """
    Saves dataframe to MotherDuck

    Parameters
    ----------
    df : pandas.DataFrame
        Dataframe to save
    motherduck : str
        MotherDuck service token
    """
    try:
        # Save to csv
        df.to_csv("weather_data.csv", index=False)

        # Initiate the MotherDuck connection through a service token
        con = duckdb.connect(f"md:?motherduck_token={motherduck}")

        # Delete table weatherdata if exists
        con.execute("DROP TABLE IF EXISTS weatherdata")

        # Create table weatherdata
        con.sql("CREATE TABLE weatherdata AS SELECT * FROM 'weather_data.csv'")

    except Exception as e:
        print("Error:", e)
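
To sanity-check the transformation without calling the API, you can run the same pd.json_normalize logic on a small handcrafted payload that mirrors the response shape. The field names here are illustrative assumptions, not the API's full schema:

```python
import pandas as pd

# Hypothetical minimal payload mirroring the WeatherAPI response shape
response = {
    "location": {"name": "London", "country": "UK", "lat": 51.5, "lon": -0.13},
    "forecast": {
        "forecastday": [
            {
                "hour": [
                    {"time": "2024-01-01 00:00", "temp_c": 10.0, "wind_kph": 5.0},
                    {"time": "2024-01-01 01:00", "temp_c": 9.5, "wind_kph": 6.0},
                ]
            }
        ]
    },
}

# Unnest the hourly records, as transform_json_to_dataframe does
df = pd.json_normalize(response, record_path=["forecast", "forecastday", "hour"])

# Attach the location columns to every hourly row
location_df = pd.json_normalize(response["location"])
for col in location_df.columns:
    df[col] = location_df[col].iloc[0]

# df now has one row per hour, with the location fields repeated on each row
```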

To download weather data for different locations, replace the latitude and longitude coordinates in the script. The sample script uses the coordinates of the cities listed below:

| Continent     | Cities                              |
| ------------- | ----------------------------------- |
| North America | New York City, Los Angeles, Toronto |
| South America | São Paulo, Buenos Aires, Bogotá     |
| Europe        | London, Paris, Berlin               |
| Asia          | Tokyo, Beijing, Mumbai              |
| Africa        | Cairo, Lagos, Johannesburg          |
| Australia     | Sydney, Melbourne, Brisbane         |
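
The coordinate lists in the script below pair positionally with the cities in the table. Making that mapping explicit (city names from the table, coordinates from the script) keeps the pairing easy to verify and extend:

```python
# City -> (latitude, longitude), in the same order as the script's lists
cities = {
    "New York City": (40.7128, -74.0060),
    "Los Angeles": (34.0522, -118.2437),
    "Toronto": (43.6532, -79.3832),
    "São Paulo": (-23.5505, -46.6333),
    "Buenos Aires": (-34.6037, -58.3816),
    "Bogotá": (4.7110, -74.0721),
    "London": (51.5074, -0.1278),
    "Paris": (48.8566, 2.3522),
    "Berlin": (52.5200, 13.4050),
    "Tokyo": (35.6762, 139.6503),
    "Beijing": (39.9042, 116.4074),
    "Mumbai": (19.0760, 72.8777),
    "Cairo": (30.0444, 31.2357),
    "Lagos": (6.5244, 3.3792),
    "Johannesburg": (-26.2041, 28.0473),
    "Sydney": (-33.8688, 151.2093),
    "Melbourne": (-37.8136, 144.9631),
    "Brisbane": (-27.4698, 153.0251),
}

# Recover the positional lists used by the script
latitudes = [lat for lat, _ in cities.values()]
longitudes = [lon for _, lon in cities.values()]
```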

import os
from dotenv import load_dotenv
import pandas as pd
from dataextraction import extraction_df_lat_lon, save_to_motherduck

if __name__ == "__main__":
    # Load api key
    load_dotenv()
    api_key = os.getenv("RapidAPI")
    motherduck = os.getenv("motherduck")

    # Extract data
    latitudes = [
        40.7128,
        34.0522,
        43.6532,
        -23.5505,
        -34.6037,
        4.7110,
        51.5074,
        48.8566,
        52.5200,
        35.6762,
        39.9042,
        19.0760,
        30.0444,
        6.5244,
        -26.2041,
        -33.8688,
        -37.8136,
        -27.4698,
    ]
    longitudes = [
        -74.0060,
        -118.2437,
        -79.3832,
        -46.6333,
        -58.3816,
        -74.0721,
        -0.1278,
        2.3522,
        13.4050,
        139.6503,
        116.4074,
        72.8777,
        31.2357,
        3.3792,
        28.0473,
        151.2093,
        144.9631,
        153.0251,
    ]
    master_list = []
    for lat, lon in zip(latitudes, longitudes):
        master_list.append(extraction_df_lat_lon(api_key, lat, lon))

    # Concatenate all data frames
    df = pd.concat(master_list)

    # Save to MotherDuck
    save_to_motherduck(df, motherduck)

Once you have created the script, you can run it to extract the data. You can also schedule its execution with GitHub Actions.
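
The scheduled execution can be sketched as a GitHub Actions workflow. This is a minimal sketch: the file path, cron cadence, and the pipeline.py script name are assumptions, while the RapidAPI and motherduck secret names match the environment variables the script reads.

```yaml
# .github/workflows/weather.yml (hypothetical path)
name: weather-extraction
on:
  schedule:
    - cron: "0 6 * * *"   # daily at 06:00 UTC
  workflow_dispatch:       # allow manual runs

jobs:
  extract:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install requests pandas duckdb python-dotenv
      - run: python pipeline.py  # assumed name of the entry-point script
        env:
          RapidAPI: ${{ secrets.RapidAPI }}
          motherduck: ${{ secrets.motherduck }}
```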

Visualize the data

Let’s visualize the data to see what it looks like. We will use the Plotly package. For the purposes of this blog post, we read the CSV file directly; to see how to load the data from MotherDuck instead, please review this notebook.

import pandas as pd  # noqa E402
import plotly.express as px  # noqa E402


df = pd.read_csv("weather_data.csv")

fig = px.scatter_geo(
    df,
    lat="lat",
    lon="lon",
    color="region",
    hover_name="country",
    size="wind_kph",
    animation_frame="time",
    projection="natural earth",
    title="Wind forecast (next 5 days) in kph for cities in the world",
)

fig.show()