Dagster & Azure Data Lake Storage Gen 2

Dagster helps you use Azure Storage Accounts as part of your data pipeline. Azure Data Lake Storage Gen 2 (ADLS2) is our primary focus, but we also provide utilities for Azure Blob Storage.


pip install dagster-azure


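import pandas as pd
from dagster_azure.adls2 import ADLS2Resource, ADLS2SASToken

import dagster as dg


@dg.asset
def example_adls2_asset(adls2: ADLS2Resource):
    # Build a small DataFrame and serialize it to CSV text.
    df = pd.DataFrame({"column1": [1, 2, 3], "column2": ["A", "B", "C"]})
    csv_data = df.to_csv(index=False)

    # Get a client for the target file and upload, replacing any existing file.
    file_client = adls2.adls2_client.get_file_client(
        "my-file-system", "path/to/my_dataframe.csv"
    )
    file_client.upload_data(csv_data, overwrite=True)


defs = dg.Definitions(
    assets=[example_adls2_asset],
    resources={
        # Placeholder values (assumed); substitute your own account and SAS token.
        "adls2": ADLS2Resource(
            storage_account="my_storage_account",
            credential=ADLS2SASToken(token="my_sas_token"),
        )
    },
)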
This example uses the ADLS2Resource class directly instead of the function-based adls2_resource. The configuration is passed to ADLS2Resource when it is instantiated.
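
The upload step in the example above relies on only two client calls, get_file_client and upload_data. The following is a minimal sketch of how that logic could be exercised locally without an Azure account. It assumes a MagicMock stand-in for the object exposed as adls2.adls2_client, and the helper names rows_to_csv and upload_csv are hypothetical, not part of dagster-azure. The standard-library csv module replaces pandas here only to keep the sketch dependency-free.

```python
import csv
import io
from unittest.mock import MagicMock


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text (stand-in for df.to_csv(index=False))."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def upload_csv(service_client, file_system, path, csv_text):
    """Upload CSV text with the same two calls the asset makes on the client."""
    file_client = service_client.get_file_client(file_system, path)
    file_client.upload_data(csv_text, overwrite=True)


# Hypothetical stand-in for the real client; it records calls instead of
# contacting Azure.
fake_client = MagicMock()

rows = [
    {"column1": 1, "column2": "A"},
    {"column1": 2, "column2": "B"},
    {"column1": 3, "column2": "C"},
]
csv_text = rows_to_csv(rows, fieldnames=["column1", "column2"])
upload_csv(fake_client, "my-file-system", "path/to/my_dataframe.csv", csv_text)

# Confirm the file was addressed and uploaded as expected.
fake_client.get_file_client.assert_called_once_with(
    "my-file-system", "path/to/my_dataframe.csv"
)
fake_client.get_file_client.return_value.upload_data.assert_called_once_with(
    csv_text, overwrite=True
)
```

Keeping the upload logic in a function that takes the client as an argument is one way to make it checkable in isolation. Inside a real asset, the same function could be called with adls2.adls2_client.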

About Azure Data Lake Storage Gen 2 (ADLS2)

Azure Data Lake Storage Gen 2 (ADLS2) is a set of capabilities dedicated to big data analytics, built on Azure Blob Storage. ADLS2 combines the scalability, cost-effectiveness, security, and rich capabilities of Azure Blob Storage with a high-performance file system that's built for analytics and is compatible with the Hadoop Distributed File System (HDFS). This makes it an ideal choice for data lakes and big data analytics.