Rate limiting

One of the hurdles in getting data from Bluesky is working within its rate limits. Let's go back and look at the get_all_feed_items function that extracts feed information. It uses tenacity to retry the inner function _get_feed_with_retries: if a request fails, for example because we have begun to hit our limits, it waits 150 seconds (math.ceil(60 * 2.5)) and tries again, giving up after five attempts.

def get_all_feed_items(client: Client, actor: str) -> list["models.AppBskyFeedDefs.FeedViewPost"]:
    """Retrieves all author feed items for a given `actor`.

    Args:
        client (Client): AT Protocol client
        actor (str): author identifier (did)

    Returns:
        list['models.AppBskyFeedDefs.FeedViewPost']: all feed items for the actor

    """
    import math

    import tenacity

    @tenacity.retry(
        stop=tenacity.stop_after_attempt(5),
        wait=tenacity.wait_fixed(math.ceil(60 * 2.5)),
    )
    def _get_feed_with_retries(client: Client, actor: str, cursor: Optional[str]):
        return client.get_author_feed(actor=actor, cursor=cursor, limit=100)

    feed = []
    cursor = None
    while True:
        data = _get_feed_with_retries(client, actor, cursor)
        feed.extend(data.feed)
        cursor = data.cursor
        if not cursor:
            break

    return feed
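
To see the retry-and-paginate pattern in isolation, the following is a minimal sketch that uses only the standard library. Page, FakeClient, retry_fixed, and collect_feed are hypothetical stand-ins written for this illustration; they are not part of atproto or tenacity. The fake client fails on its first call to simulate a rate limit error, and the wait is set to zero so the example finishes immediately. Note that this helper re-raises the original exception after the last attempt, whereas tenacity by default wraps the final failure in a RetryError.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


@dataclass
class Page:
    """Hypothetical stand-in for one page of an author feed response."""

    feed: list
    cursor: Optional[str]


class FakeClient:
    """Hypothetical client: serves three pages and fails on the first call."""

    def __init__(self) -> None:
        self.calls = 0
        self._pages = {
            None: Page(feed=["post-1", "post-2"], cursor="a"),
            "a": Page(feed=["post-3", "post-4"], cursor="b"),
            "b": Page(feed=["post-5"], cursor=None),
        }

    def get_author_feed(self, actor: str, cursor: Optional[str]) -> Page:
        self.calls += 1
        if self.calls == 1:
            raise RuntimeError("simulated rate limit error")
        return self._pages[cursor]


def retry_fixed(fn: Callable[[], T], attempts: int, wait_seconds: float) -> T:
    """Call fn up to `attempts` times, sleeping a fixed interval after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(wait_seconds)
    raise ValueError("attempts must be at least 1")


def collect_feed(client: FakeClient, actor: str, wait_seconds: float = 0.0) -> list:
    """Follow the cursor until the response no longer returns one."""
    feed = []
    cursor = None
    while True:
        page = retry_fixed(lambda: client.get_author_feed(actor, cursor), 5, wait_seconds)
        feed.extend(page.feed)
        cursor = page.cursor
        if not cursor:
            break
    return feed


client = FakeClient()
items = collect_feed(client, actor="did:example:alice")
```

The loop ends when a page comes back without a cursor, which is the same termination condition get_all_feed_items relies on.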

If we then look at the actor_feed_snapshot asset, which uses get_all_feed_items, we see one additional parameter in the decorator.

    op_tags={"dagster/concurrency_key": "ingestion"},

This tag tells Dagster to apply the concurrency limit defined in dagster.yaml, the top-level configuration file for the Dagster instance, to this asset.

concurrency:
  default_op_concurrency_limit: 1

We already mentioned that the actor_feed_snapshot asset is dynamically partitioned by user feeds. This means that without concurrency controls, all of those partitions could execute in parallel. Because Bluesky is the limiting factor, and the client is a resource shared by all of the assets, we want to ensure that only one asset runs at a time. Applying the concurrency control lets Dagster enforce this without any additional code in our assets.
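
The effect of a concurrency limit of 1 can be illustrated with a small standard-library sketch. This is only a conceptual analogy and not how Dagster implements its concurrency controls: materialize_partition, the partition names, and the semaphore are all hypothetical. Four workers are available, but the semaphore lets only one "partition" use the shared client at a time.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

LIMIT = 1  # plays the role of default_op_concurrency_limit: 1
slots = threading.Semaphore(LIMIT)
lock = threading.Lock()
stats = {"running": 0, "max_running": 0}


def materialize_partition(partition_key: str) -> str:
    """Hypothetical stand-in for materializing one partition of the asset."""
    with slots:  # blocks until a concurrency slot is free
        with lock:
            stats["running"] += 1
            stats["max_running"] = max(stats["max_running"], stats["running"])
        time.sleep(0.01)  # stand-in for the rate-limited Bluesky request
        with lock:
            stats["running"] -= 1
    return partition_key


partitions = ["alice", "bob", "carol", "dave"]
with ThreadPoolExecutor(max_workers=4) as pool:
    finished = list(pool.map(materialize_partition, partitions))
```

Even though all four tasks are submitted at once, stats["max_running"] never exceeds the limit. In the Dagster project, the same guarantee comes from the op tag and the dagster.yaml setting rather than from code inside the asset.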

Now that we know how to extract and store this data, we can talk about how to model it.

Next steps

Continue this example with modeling
