
Profiling hanging or slow code with py-spy

If your Dagster code is hanging or taking longer than you expect to execute, we recommend using py-spy to profile your code.

For hanging code, py-spy dump prints the current call stack of each thread, which usually makes it immediately clear where the hang is happening.
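
For example, assuming the hung process is still running and you know its PID (shown here as a placeholder; you may also need elevated permissions, as described below):

py-spy dump --pid <pid>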

For slow code, py-spy record can produce a file with a flame graph showing where the process is spending the most time. (We recommend running py-spy record -f speedscope --idle, which produces a speedscope profile and includes idle CPU time in the results.)
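
For example, one way to record a 60-second speedscope profile of a running process (the PID, duration, and output filename here are placeholders you can adjust) is:

py-spy record -f speedscope --idle -o profile.speedscope.json --duration 60 --pid <pid>

The resulting file can then be opened at https://www.speedscope.app to explore the flame graph.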

Permissions required to run py-spy

py-spy usually requires elevated permissions in order to run.

For example, to run py-spy locally to understand why definitions are taking a long time to import:

sudo py-spy record -f speedscope --idle -- dagster definitions validate
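
Similarly, attaching to a process that is already running typically requires sudo, for example:

sudo py-spy dump --pid <pid>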

Generating a py-spy dump for a hanging run in Kubernetes

  1. Configure your Dagster deployment so that each run pod is using a security context that can run py-spy. Note that this gives the pod elevated permissions, so check with your cluster admins to make sure this is an acceptable change to make temporarily.

If you're using the Dagster Open Source Helm chart, you can configure the run launcher to launch each run with the following configuration:

runLauncher:
  type: K8sRunLauncher
  config:
    k8sRunLauncher:
      runK8sConfig:
        containerConfig:
          securityContext:
            capabilities:
              add:
                - SYS_PTRACE
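
If you manage these values in a values.yaml file, one way to apply the change is with helm upgrade (a sketch; the release name dagster and the dagster/dagster chart are assumptions, so adjust them for your deployment):

# release name, chart, and values file are examples; adjust for your deployment
helm upgrade dagster dagster/dagster -f values.yaml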

For more information on applying this type of configuration to your Kubernetes pod in Dagster OSS, see Customizing your Kubernetes deployment.

Info: For more information on running py-spy in Kubernetes, see this py-spy guide.

  2. Launch a run and wait until it hangs.

  3. Check the event logs for the run to find the run pod, then kubectl exec into the pod to run py-spy:

    kubectl exec -it <pod name here> -- /bin/bash
  4. Install py-spy, then run it:

    pip install py-spy
    py-spy dump --pid 1

    This should output a dump of what each thread in the process is doing.
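
If you also want a flame graph of where the hanging run is spending its time, one approach (a sketch; the duration and file paths are placeholders) is to record a speedscope profile inside the pod and copy it back to your machine:

    # inside the pod; PID 1 is the run process, as above
    py-spy record -f speedscope --idle -o /tmp/profile.speedscope.json --duration 60 --pid 1

    # from your local machine
    kubectl cp <pod name here>:/tmp/profile.speedscope.json ./profile.speedscope.json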