dbt (dagster_dbt)

This library provides a Dagster integration with dbt (data build tool), created by Fishtown Analytics.

CLI

dagster_dbt.dbt_cli_compile(*args, **kwargs)[source]

This solid executes dbt compile via the dbt CLI.

dagster_dbt.dbt_cli_run(*args, **kwargs)[source]

This solid executes dbt run via the dbt CLI.

dagster_dbt.dbt_cli_run_operation(*args, **kwargs)[source]

This solid executes dbt run-operation via the dbt CLI.

dagster_dbt.dbt_cli_snapshot(*args, **kwargs)[source]

This solid executes dbt snapshot via the dbt CLI.

dagster_dbt.dbt_cli_snapshot_freshness(*args, **kwargs)[source]

This solid executes dbt source snapshot-freshness via the dbt CLI.

dagster_dbt.dbt_cli_test(*args, **kwargs)[source]

This solid executes dbt test via the dbt CLI.

class dagster_dbt.DbtCliOutput[source]

The results of executing a dbt command, along with additional metadata about the dbt CLI process that was run.

Note that users should not construct instances of this class directly. This class is intended to be constructed from the JSON output of dbt commands.

If the executed dbt command is either run or test, then the .num_* attributes will contain non-None integer values. Otherwise, they will be None.

command

The full shell command that was executed.

Type

str

return_code

The return code of the dbt CLI process.

Type

int

raw_output

The raw output (stdout) of the dbt CLI process.

Type

str

num_pass

The number of dbt nodes (models) that passed.

Type

Optional[int]

num_warn

The number of dbt nodes (models) that emitted warnings.

Type

Optional[int]

num_error

The number of dbt nodes (models) that emitted errors.

Type

Optional[int]

num_skip

The number of dbt nodes (models) that were skipped.

Type

Optional[int]

num_total

The total number of dbt nodes (models) that were processed.

Type

Optional[int]

classmethod from_dict(d: Dict[str, Any]) → dagster_dbt.cli.types.DbtCliOutput[source]

Constructs an instance of DbtCliOutput from a dictionary.

Parameters

d (Dict[str, Any]) – A dictionary with key-values to construct a DbtCliOutput.

Returns

An instance of DbtCliOutput.

Return type

DbtCliOutput

RPC

dagster_dbt.create_dbt_rpc_run_sql_solid(name: str, output_def: Optional[dagster.core.definitions.output.OutputDefinition] = None, **kwargs) → Callable[source]

This function is a factory which constructs a solid that will copy the results of a SQL query run within the context of a dbt project to a pandas DataFrame.

Any kwargs passed to this function will be passed along to the underlying @solid decorator. However, note that overriding config_schema, input_defs, and required_resource_keys is not allowed and will throw a DagsterInvalidDefinitionError.

If you would like to configure this solid with different config fields, you could consider using @composite_solid to wrap this solid.

Parameters
  • name (str) – The name of this solid.

  • output_def (OutputDefinition, optional) – The OutputDefinition for the solid. This value should always be a representation of a pandas DataFrame. If not specified, the solid will default to an OutputDefinition named “df” with a DataFrame dagster type.

Returns

Returns the constructed solid definition.

Return type

SolidDefinition

dagster_dbt.dbt_rpc_compile_sql(*args, **kwargs)[source]

This solid sends the dbt compile command to a dbt RPC server and returns the request token.

This dbt RPC solid is asynchronous. The request token can be used in subsequent RPC requests to poll the progress of the running dbt process.

dagster_dbt.dbt_rpc_run(*args, **kwargs)[source]

This solid sends the dbt run command to a dbt RPC server and returns the request token.

This dbt RPC solid is asynchronous. The request token can be used in subsequent RPC requests to poll the progress of the running dbt process.

dagster_dbt.dbt_rpc_run_and_wait(*args, **kwargs)[source]

This solid sends the dbt run command to a dbt RPC server and returns the result of the executed dbt process.

This dbt RPC solid is synchronous, and will periodically poll the dbt RPC server until the dbt process is completed.

dagster_dbt.dbt_rpc_run_operation(*args, **kwargs)[source]

This solid sends the dbt run-operation command to a dbt RPC server and returns the request token.

This dbt RPC solid is asynchronous. The request token can be used in subsequent RPC requests to poll the progress of the running dbt process.

dagster_dbt.dbt_rpc_run_operation_and_wait(*args, **kwargs)[source]

This solid sends the dbt run-operation command to a dbt RPC server and returns the result of the executed dbt process.

This dbt RPC solid is synchronous, and will periodically poll the dbt RPC server until the dbt process is completed.

dagster_dbt.dbt_rpc_snapshot(*args, **kwargs)[source]

This solid sends the dbt snapshot command to a dbt RPC server and returns the request token.

This dbt RPC solid is asynchronous. The request token can be used in subsequent RPC requests to poll the progress of the running dbt process.

dagster_dbt.dbt_rpc_snapshot_and_wait(*args, **kwargs)[source]

This solid sends the dbt snapshot command to a dbt RPC server and returns the result of the executed dbt process.

This dbt RPC solid is synchronous, and will periodically poll the dbt RPC server until the dbt process is completed.

dagster_dbt.dbt_rpc_snapshot_freshness(*args, **kwargs)[source]

This solid sends the dbt source snapshot-freshness command to a dbt RPC server and returns the request token.

This dbt RPC solid is asynchronous. The request token can be used in subsequent RPC requests to poll the progress of the running dbt process.

dagster_dbt.dbt_rpc_snapshot_freshness_and_wait(*args, **kwargs)[source]

This solid sends the dbt source snapshot command to a dbt RPC server and returns the result of the executed dbt process.

This dbt RPC solid is synchronous, and will periodically poll the dbt RPC server until the dbt process is completed.

dagster_dbt.dbt_rpc_test(*args, **kwargs)[source]

This solid sends the dbt test command to a dbt RPC server and returns the request token.

This dbt RPC solid is asynchronous. The request token can be used in subsequent RPC requests to poll the progress of the running dbt process.

dagster_dbt.dbt_rpc_test_and_wait(*args, **kwargs)[source]

This solid sends the dbt test command to a dbt RPC server and returns the result of the executed dbt process.

This dbt RPC solid is synchronous, and will periodically poll the dbt RPC server until the dbt process is completed.

dagster_dbt.dbt_rpc_resource ResourceDefinition[source]

This resource defines a dbt RPC client.

To configure this resource, we recommend using the configured method.

Examples:

custom_dbt_rpc_resource = dbt_rpc_resource.configured({"host": "80.80.80.80","port": 8080,})

@pipeline(mode_defs=[ModeDefinition(resource_defs={"dbt_rpc": custom_dbt_rpc_resource})])
def dbt_rpc_pipeline():
    # Run solids with `required_resource_keys={"dbt_rpc", ...}`.
dagster_dbt.local_dbt_rpc_resource ResourceDefinition

This resource defines a dbt RPC client for an RPC server running on 0.0.0.0:8580.

class dagster_dbt.DbtRpcClient(host: str = '0.0.0.0', port: int = 8580, jsonrpc_version: str = '2.0', logger: Optional[Any] = None, **_)[source]

A client for a dbt RPC server.

If you are need a dbt RPC server as a Dagster resource, we recommend that you use dbt_rpc_resource.

cli(*, cli: str, **kwargs) → requests.models.Response[source]

Sends a request with CLI syntax to the dbt RPC server, and returns the response. For more details, see the dbt docs for running CLI commands via RPC.

Parameters

cli (str) – a dbt command in CLI syntax.

Returns

the HTTP response from the dbt RPC server.

Return type

Response

compile(*, models: List[str] = None, exclude: List[str] = None, **kwargs) → requests.models.Response[source]

Sends a request with the method compile to the dbt RPC server, and returns the response. For more details, see the dbt docs for compiling projects via RPC.

Parameters
  • models (List[str], optional) – the models to include in compilation.

  • exclude (List[str]), optional) – the models to exclude from compilation.

Returns

the HTTP response from the dbt RPC server.

Return type

Response

compile_sql(*, sql: str, name: str) → requests.models.Response[source]

Sends a request with the method compile_sql to the dbt RPC server, and returns the response. For more details, see the dbt docs for compiling SQL via RPC.

Parameters
  • sql (str) – the SQL to compile in base-64 encoding.

  • name (str) – a name for the compiled SQL.

Returns

the HTTP response from the dbt RPC server.

Return type

Response

generate_docs(*, models: List[str] = None, exclude: List[str] = None, compile: bool = False, **kwargs) → requests.models.Response[source]

Sends a request with the method docs.generate to the dbt RPC server, and returns the response. For more details, see the dbt docs for the RPC method docs.generate.

Parameters
  • models (List[str], optional) – the models to include in docs generation.

  • exclude (List[str], optional) – the models to exclude from docs generation.

  • compile (bool, optional) – If True (default), then compile the project before generating docs.

Returns

the HTTP response from the dbt RPC server.

Return type

Response

property host

The IP address of the host of the dbt RPC server.

Type

str

property jsonrpc_version

The JSON-RPC version to send in RPC requests.

Type

str

kill(*, task_id: str) → requests.models.Response[source]

Sends a request with the method kill to the dbt RPC server, and returns the response. For more details, see the dbt docs for the RPC method kill.

Parameters

task_id (str) – the ID of the task to terminate.

Returns

the HTTP response from the dbt RPC server.

Return type

Response

property logger

A property for injecting a logger dependency.

Type

Optional[Any]

poll(*, request_token: str, logs: bool = False, logs_start: int = 0) → requests.models.Response[source]

Sends a request with the method poll to the dbt RPC server, and returns the response. For more details, see the dbt docs for the RPC method poll.

Parameters
  • request_token (str) – the token to poll responses for.

  • logs (bool) – Whether logs should be returned in the response. Defaults to False.

  • logs_start (int) – The zero-indexed log line to fetch logs from. Defaults to 0.

Returns

the HTTP response from the dbt RPC server.

Return type

Response

property port

The port of the dbt RPC server.

Type

int

ps(*, completed: bool = False) → requests.models.Response[source]

Sends a request with the method ps to the dbt RPC server, and returns the response. For more details, see the dbt docs for the RPC method ps.

Parameters

compelted (bool) – If True, then also return completed tasks. Defaults to False.

Returns

the HTTP response from the dbt RPC server.

Return type

Response

run(*, models: List[str] = None, exclude: List[str] = None, **kwargs) → requests.models.Response[source]

Sends a request with the method run to the dbt RPC server, and returns the response. For more details, see the dbt docs for the RPC method run.

Parameters
  • models (List[str], optional) – the models to include in the run.

  • exclude (List[str]), optional) – the models to exclude from the run.

Returns

the HTTP response from the dbt RPC server.

Return type

Response

run_operation(*, macro: str, args: Optional[Dict[str, Any]] = None, **kwargs) → requests.models.Response[source]

Sends a request with the method run-operation to the dbt RPC server, and returns the response. For more details, see the dbt docs for the command run-operation.

Parameters
  • macro (str) – the dbt macro to invoke.

  • args (Dict[str, Any], optional) – the keyword arguments to be supplied to the macro.

Returns

the HTTP response from the dbt RPC server.

Return type

Response

run_sql(*, sql: str, name: str) → requests.models.Response[source]

Sends a request with the method run_sql to the dbt RPC server, and returns the response. For more details, see the dbt docs for running SQL via RPC.

Parameters
  • sql (str) – the SQL to run in base-64 encoding.

  • name (str) – a name for the compiled SQL.

Returns

the HTTP response from the dbt RPC server.

Return type

Response

seed(*, show: bool = False, **kwargs) → requests.models.Response[source]

Sends a request with the method seed to the dbt RPC server, and returns the response. For more details, see the dbt docs for the RPC method seed.

Parameters

show (bool, optional) – If True, then show a sample of the seeded data in the response. Defaults to False.

Returns

the HTTP response from the dbt RPC server.

Return type

Response

snapshot(*, select: List[str] = None, exclude: List[str] = None, **kwargs) → requests.models.Response[source]

Sends a request with the method snapshot to the dbt RPC server, and returns the response. For more details, see the dbt docs for the command snapshot.

Parameters
  • select (List[str], optional) – the snapshots to include in the run.

  • exclude (List[str], optional) – the snapshots to exclude from the run.

Returns

the HTTP response from the dbt RPC server.

Return type

Response

snapshot_freshness(*, select: Optional[List[str]] = None, **kwargs) → requests.models.Response[source]

Sends a request with the method snapshot-freshness to the dbt RPC server, and returns the response. For more details, see the dbt docs for the command source snapshot-freshness.

Parameters

select (List[str], optional) – the models to include in calculating snapshot freshness.

Returns

the HTTP response from the dbt RPC server.

Return type

Response

status()[source]

Sends a request with the method status to the dbt RPC server, and returns the response. For more details, see the dbt docs for the RPC method status.

Returns

the HTTP response from the dbt RPC server.

Return type

Response

test(*, models: List[str] = None, exclude: List[str] = None, data: bool = True, schema: bool = True, **kwargs) → requests.models.Response[source]

Sends a request with the method test to the dbt RPC server, and returns the response. For more details, see the dbt docs for the RPC method test.

Parameters
  • models (List[str], optional) – the models to include in testing.

  • exclude (List[str], optional) – the models to exclude from testing.

  • data (bool, optional) – If True (default), then run data tests.

  • schema (bool, optional) – If True (default), then run schema tests.

Returns

the HTTP response from the dbt RPC server.

Return type

Response

property url

The URL for sending dbt RPC requests.

Type

str

class dagster_dbt.DbtRpcOutput[source]

The output from executing a dbt command via the dbt RPC server.

Note that users should not construct instances of this class directly. This class is intended to be constructed from the JSON output of dbt commands.

result

The dbt results from the executed command.

Type

DbtResult

state

The state of the polled dbt process.

Type

str

start

An ISO string timestamp of when the dbt process started.

Type

str

end

An ISO string timestamp of when the dbt process ended.

Type

str

elapsed

The duration (in seconds) for which the dbt process was running.

Type

float

classmethod from_dict(d: Dict[str, Any]) → dagster_dbt.rpc.types.DbtRpcOutput[source]

Constructs an instance of DbtRpcOutput from a dictionary.

Parameters

d (Dict[str, Any]) – A dictionary with key-values to construct a DbtRpcOutput.

Returns

An instance of DbtRpcOutput.

Return type

DbtRpcOutput

Types

class dagster_dbt.DbtResult[source]

The results of executing a dbt command.

Note that users should not construct instances of this class directly. This class is intended to be constructed from the JSON output of dbt commands.

logs

JSON log output from the dbt process.

Type

List[Dict[str, Any]]

results

Details about each executed dbt node (model) in the run.

Type

List[NodeResult]]

generated_at

An ISO string timestamp of when the run result was generated by dbt.

Type

str

elapsed_time

The execution duration (in seconds) of the run.

Type

float

classmethod from_dict(d: Dict[str, Any]) → dagster_dbt.types.DbtResult[source]

Constructs an instance of DbtResult from a dictionary.

Parameters

d (Dict[str, Any]) – A dictionary with key-values to construct a DbtResult.

Returns

An instance of DbtResult.

Return type

DbtResult

class dagster_dbt.NodeResult[source]

The result of executing a dbt node (model).

Note that users should not construct instances of this class directly. This class is intended to be constructed from the JSON output of dbt commands.

node

Details about the executed dbt node (model).

Type

Dict[str, Any]

error

An error message if an error occurred.

Type

Optional[str]

fail

The fail field from the results of the executed dbt node.

Type

Optional[Any]

warn

The warn field from the results of the executed dbt node.

Type

Optional[Any]

skip

The skip field from the results of the executed dbt node.

Type

Optional[Any]

status

The status of the executed dbt node (model).

Type

Optional[Union[str,int]]

execution_time

The execution duration (in seconds) of the dbt node (model).

Type

float

thread_id

The dbt thread identifier that executed the dbt node (model).

Type

str

step_timings

The timings for each step in the executed dbt node (model).

Type

List[StepTiming]

table

Details about the table/view that is created from executing a run_sql command on an dbt RPC server.

Type

Optional[Dict]

classmethod from_dict(d: Dict[str, Any]) → dagster_dbt.types.NodeResult[source]

Constructs an instance of NodeResult from a dictionary.

Parameters

d (Dict[str, Any]) – A dictionary with key-values to construct a NodeResult.

Returns

An instance of NodeResult.

Return type

NodeResult

class dagster_dbt.StepTiming[source]

The timing information of an executed step for a dbt node (model).

Note that users should not construct instances of this class directly. This class is intended to be constructed from the JSON output of dbt commands.

name

The name of the executed step.

Type

str

started_at

An ISO string timestamp of when the step started executing.

Type

datetime.datetime

completed_at

An ISO string timestamp of when the step completed execution.

Type

datetime.datetime

property duration

The execution duration of the step.

Type

datetime.timedelta

Errors

exception dagster_dbt.DagsterDbtError(description=None, metadata_entries=None)[source]

The base exception of the dagster-dbt library.

exception dagster_dbt.DagsterDbtCliRuntimeError(description: str, logs: List[Dict[str, Any]], raw_output: str)[source]

Represents an error while executing a dbt CLI command.

exception dagster_dbt.DagsterDbtCliFatalRuntimeError(logs: List[Dict[str, Any]], raw_output: str)[source]

Represents a fatal error in the dbt CLI (return code 2).

exception dagster_dbt.DagsterDbtCliHandledRuntimeError(logs: List[Dict[str, Any]], raw_output: str)[source]

Represents a model error reported by the dbt CLI at runtime (return code 1).

exception dagster_dbt.DagsterDbtCliOutputsNotFoundError(path: str)[source]

Represents a problem in finding the target/run_results.json artifact when executing a dbt CLI command.

For more details on target/run_results.json, see https://docs.getdbt.com/reference/dbt-artifacts#run_resultsjson.

exception dagster_dbt.DagsterDbtCliUnexpectedOutputError(invalid_line_nos: List[int])[source]

Represents an error when parsing the output of a dbt CLI command.

exception dagster_dbt.DagsterDbtRpcUnexpectedPollOutputError(description=None, metadata_entries=None)[source]

Represents an unexpected response when polling the dbt RPC server.