Analyzing Bluesky data

In this example, you'll build a pipeline with Dagster that:

  • Ingests data-related Bluesky posts
  • Models the data using dbt
  • Creates and validates the data files needed for an OpenAI fine-tuning job
  • Represents the data in a dashboard
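
Conceptually, these stages form a small Dagster asset graph. The sketch below is illustrative only: the asset names and bodies are placeholders, not the definitions used in the actual project.

    import dagster as dg

    # Placeholder assets mirroring the four stages above; the real project
    # defines its own assets (including dbt-backed ones).
    @dg.asset
    def bluesky_posts():
        """Ingest data-related posts from Bluesky."""
        ...

    @dg.asset(deps=[bluesky_posts])
    def dbt_models():
        """Model the raw posts with dbt."""
        ...

    @dg.asset(deps=[dbt_models])
    def fine_tuning_files():
        """Create and validate the files for an OpenAI fine-tuning job."""
        ...

    @dg.asset(deps=[dbt_models])
    def dashboard():
        """Present the modeled data in a dashboard."""
        ...

    defs = dg.Definitions(
        assets=[bluesky_posts, dbt_models, fine_tuning_files, dashboard]
    )
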
Prerequisites

To follow the steps in this guide, you'll need:

  • Basic Python knowledge
  • Python 3.9+ installed on your system. Refer to the Installation guide for more information.
  • Understanding of data pipelines and the extract, transform, and load (ETL) process
  • Familiarity with dbt and data transformation
  • Experience using BI tools for dashboards

Step 1: Set up your Dagster environment

First, set up a new Dagster project.

  1. Clone the Dagster repo and navigate to the project:

    git clone https://github.com/dagster-io/dagster.git
    cd dagster/examples/docs_project/project_atproto_dashboard
  2. Create and activate a virtual environment:

    uv venv dagster_example
    source dagster_example/bin/activate
  3. Install Dagster and the required dependencies:

    uv pip install -e ".[dev]"
  4. Ensure the required environment variables have been populated in your .env file. Start by copying the template:

    cp .env.example .env

    Then populate the fields with your own values (see the illustrative example below).
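
For illustration only, a populated .env file contains one KEY=value pair per line. The variable names below are placeholders; use the exact names that appear in .env.example:

    # Placeholder names -- copy the real variable names from .env.example
    BSKY_LOGIN="your-bluesky-handle"
    BSKY_APP_PASSWORD="your-bluesky-app-password"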

Step 2: Launch the Dagster webserver

To make sure Dagster and its dependencies were installed correctly, navigate to the project root directory and start the Dagster webserver:

dagster dev
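
If everything is installed correctly, this command starts the Dagster UI, which is served at http://localhost:3000 by default; open it in your browser to explore the project's asset graph.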

Next steps