Sample Ploomber Cloud Deployment

In this blog post, we will explore how to deploy Python applications with Ploomber Cloud and GitHub Actions, using a sample project to demonstrate the process. Imagine you need to extract weather data from an API periodically and store it for analysis. You can achieve this by creating a Python script and scheduling its execution with GitHub Actions.

Important

Please ensure you have reviewed the deployment with Ploomber Cloud and GitHub Actions tutorials before proceeding.

Initialize the data extraction and loading script

This script defines functions to extract weather data, transform it into a DataFrame, and populate a MotherDuck instance. To learn how to initialize a MotherDuck instance, visit the MotherDuck documentation; you will need to create an account and generate a token.
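
One simple way to supply the token (and the RapidAPI key) locally is a .env file. The variable names below match the os.getenv calls used later in this post; the values are placeholders:

```
# .env -- keep this file out of version control
RapidAPI=your-rapidapi-key
motherduck=your-motherduck-token
```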

The executable script takes a list of coordinates as input and extracts weather data for each one. It then concatenates all the data frames and saves the result to a CSV file, which is uploaded to a MotherDuck instance.

You can create a Python script called dataextraction.py with the following code. The code has four functions: extract_weather_by_lat_lon, transform_json_to_dataframe, extraction_df_lat_lon, and save_to_motherduck.

  • The extract_weather_by_lat_lon function extracts weather data via RapidAPI.

  • The transform_json_to_dataframe function transforms the JSON response into a DataFrame.

  • The extraction_df_lat_lon function combines the two: it extracts weather data via RapidAPI and transforms it into a DataFrame.

  • The save_to_motherduck function saves the DataFrame to a MotherDuck instance.

The entry point of the script is the if __name__ == "__main__" block. It loads the API key and MotherDuck token from environment variables, extracts weather data for a list of coordinates, concatenates the data frames, and saves the result to a CSV file, which is then uploaded to a MotherDuck instance.

import requests
import pandas as pd
import duckdb


def extract_weather_by_lat_lon(api_key, lat, lon):
    """
    Extracts weather data from RapidAPI

    Parameters
    ----------
    api_key : str
        API key for RapidAPI
    lat : float
        Latitude
    lon : float
        Longitude
    """
    try:
        # Perform call
        url = "https://weatherapi-com.p.rapidapi.com/forecast.json"

        querystring = {"q": f"{lat},{lon}", "days": "5"}

        headers = {
            "X-RapidAPI-Key": api_key,
            "X-RapidAPI-Host": "weatherapi-com.p.rapidapi.com",
        }

        response = requests.get(url, headers=headers, params=querystring)
        response.raise_for_status()
        return response.json()
    except requests.exceptions.HTTPError as e:
        print(e.response.text)
        return {}


def transform_json_to_dataframe(response):
    """
    Transforms JSON response to dataframe

    Parameters
    ----------
    response : dict
        Response from API call
    """
    try:
        unnested_df = pd.json_normalize(
            response,
            record_path=["forecast", "forecastday", "hour"],
            meta=["location", "current"],
        )
        unnested_df.drop(columns=["location", "current"], inplace=True)

        location_df = pd.json_normalize(response["location"])

        for col_name in location_df.columns:
            unnested_df[col_name] = location_df[col_name][0]

        return unnested_df

    except KeyError as e:
        print("Key Error:", e)
        return pd.DataFrame()
    except Exception as e:
        print("Other Error:", e)
        return pd.DataFrame()


def extraction_df_lat_lon(api_key, lat, lon):
    """
    Extracts weather data from RapidAPI and transforms it to a dataframe

    Parameters
    ----------
    api_key : str
        API key for RapidAPI
    lat : float
    lon : float

    Returns
    -------
    df : pandas.DataFrame
        Weather data

    """
    response = extract_weather_by_lat_lon(api_key, lat, lon)
    return transform_json_to_dataframe(response)


def save_to_motherduck(df, motherduck):
    """
    Saves dataframe to MotherDuck

    Parameters
    ----------
    df : pandas.DataFrame
        Dataframe to save
    motherduck : str
        MotherDuck service token
    """
    try:
        # Save to csv
        df.to_csv("weather_data.csv", index=False)

        # Initiate the MotherDuck connection through a service token
        con = duckdb.connect(f"md:?motherduck_token={motherduck}")

        # Delete table weatherdata if exists
        con.execute("DROP TABLE IF EXISTS weatherdata")

        # Create table weatherdata
        con.sql("CREATE TABLE weatherdata AS SELECT * FROM 'weather_data.csv'")

    except Exception as e:
        print("Error:", e)
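
To sanity-check the transformation without calling the API, you can run the same pd.json_normalize logic on a small handcrafted payload that mirrors the response shape. The field names here are illustrative assumptions, not the API's full schema:

```python
import pandas as pd

# Hypothetical minimal payload mirroring the WeatherAPI response shape
response = {
    "location": {"name": "London", "country": "UK", "lat": 51.5, "lon": -0.13},
    "forecast": {
        "forecastday": [
            {
                "hour": [
                    {"time": "2024-01-01 00:00", "temp_c": 10.0, "wind_kph": 5.0},
                    {"time": "2024-01-01 01:00", "temp_c": 9.5, "wind_kph": 6.0},
                ]
            }
        ]
    },
}

# Unnest the hourly records, as transform_json_to_dataframe does
df = pd.json_normalize(response, record_path=["forecast", "forecastday", "hour"])

# Attach the location columns to every hourly row
location_df = pd.json_normalize(response["location"])
for col in location_df.columns:
    df[col] = location_df[col].iloc[0]

# df now has one row per hour, with the location fields repeated on each row
```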

To download weather data for different locations, replace the latitude and longitude coordinates in the script. The sample script uses the coordinates of the cities listed below:

| Continent     | Cities                              |
| ------------- | ----------------------------------- |
| North America | New York City, Los Angeles, Toronto |
| South America | São Paulo, Buenos Aires, Bogotá     |
| Europe        | London, Paris, Berlin               |
| Asia          | Tokyo, Beijing, Mumbai              |
| Africa        | Cairo, Lagos, Johannesburg          |
| Australia     | Sydney, Melbourne, Brisbane         |
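
The coordinate lists in the script below pair positionally with the cities in the table. Making that mapping explicit (city names from the table, coordinates from the script) keeps the pairing easy to verify and extend:

```python
# City -> (latitude, longitude), in the same order as the script's lists
cities = {
    "New York City": (40.7128, -74.0060),
    "Los Angeles": (34.0522, -118.2437),
    "Toronto": (43.6532, -79.3832),
    "São Paulo": (-23.5505, -46.6333),
    "Buenos Aires": (-34.6037, -58.3816),
    "Bogotá": (4.7110, -74.0721),
    "London": (51.5074, -0.1278),
    "Paris": (48.8566, 2.3522),
    "Berlin": (52.5200, 13.4050),
    "Tokyo": (35.6762, 139.6503),
    "Beijing": (39.9042, 116.4074),
    "Mumbai": (19.0760, 72.8777),
    "Cairo": (30.0444, 31.2357),
    "Lagos": (6.5244, 3.3792),
    "Johannesburg": (-26.2041, 28.0473),
    "Sydney": (-33.8688, 151.2093),
    "Melbourne": (-37.8136, 144.9631),
    "Brisbane": (-27.4698, 153.0251),
}

# Recover the positional lists used by the script
latitudes = [lat for lat, _ in cities.values()]
longitudes = [lon for _, lon in cities.values()]
```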

import os
from dotenv import load_dotenv
import pandas as pd
from dataextraction import extraction_df_lat_lon, save_to_motherduck

if __name__ == "__main__":
    # Load api key
    load_dotenv()
    api_key = os.getenv("RapidAPI")
    motherduck = os.getenv("motherduck")

    # Extract data
    latitudes = [
        40.7128,
        34.0522,
        43.6532,
        -23.5505,
        -34.6037,
        4.7110,
        51.5074,
        48.8566,
        52.5200,
        35.6762,
        39.9042,
        19.0760,
        30.0444,
        6.5244,
        -26.2041,
        -33.8688,
        -37.8136,
        -27.4698,
    ]
    longitudes = [
        -74.0060,
        -118.2437,
        -79.3832,
        -46.6333,
        -58.3816,
        -74.0721,
        -0.1278,
        2.3522,
        13.4050,
        139.6503,
        116.4074,
        72.8777,
        31.2357,
        3.3792,
        28.0473,
        151.2093,
        144.9631,
        153.0251,
    ]
    master_list = []
    for lat, lon in zip(latitudes, longitudes):
        master_list.append(extraction_df_lat_lon(api_key, lat, lon))

    # Concatenate all data frames
    df = pd.concat(master_list)

    # Save to MotherDuck
    save_to_motherduck(df, motherduck)

Once you have created the script, you can run it to extract the data. You can also schedule its execution with GitHub Actions.
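
The scheduled execution can be sketched as a GitHub Actions workflow. This is a minimal sketch: the file path, cron cadence, and the pipeline.py script name are assumptions, while the RapidAPI and motherduck secret names match the environment variables the script reads.

```yaml
# .github/workflows/weather.yml (hypothetical path)
name: weather-extraction
on:
  schedule:
    - cron: "0 6 * * *"   # daily at 06:00 UTC
  workflow_dispatch:       # allow manual runs

jobs:
  extract:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install requests pandas duckdb python-dotenv
      - run: python pipeline.py  # assumed name of the entry-point script
        env:
          RapidAPI: ${{ secrets.RapidAPI }}
          motherduck: ${{ secrets.motherduck }}
```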

Visualize the data

Let’s visualize the data to see what it looks like. We will use the Plotly package. For the purposes of this blog post, we read the CSV file directly; to see how to load the data from MotherDuck instead, please review this notebook.

import pandas as pd  # noqa E402
import plotly.express as px  # noqa E402


df = pd.read_csv("weather_data.csv")

fig = px.scatter_geo(
    df,
    lat="lat",
    lon="lon",
    color="region",
    hover_name="country",
    size="wind_kph",
    animation_frame="time",
    projection="natural earth",
    title="Wind forecast (next 5 days) in kph for cities in the world",
)

fig.show()