How can I specify the retry behavior for a specific task?

Prefect 2.0

You can set the maximum number of retries as an integer via the retries argument on the @task decorator, and the retry delay in seconds via retry_delay_seconds:

@task(retries=2, retry_delay_seconds=60)
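
For example (a minimal sketch; flaky_lookup and its body are made-up placeholders):

from prefect import flow, task


@task(retries=2, retry_delay_seconds=60)
def flaky_lookup(x: int) -> int:
    # any exception raised here triggers up to 2 retries,
    # each scheduled 60 seconds apart
    if x < 0:
        raise ValueError("negative input")
    return x * 2


@flow
def my_flow():
    return flaky_lookup(7)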

Prefect 1.0

Prefect 1.0 uses a slightly different syntax: the number of retries is set via max_retries, and retry_delay expects a datetime.timedelta object:

@task(max_retries=2, retry_delay=datetime.timedelta(minutes=1))
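
The Prefect 1.0 equivalent with the required import (again a sketch; the task body is a placeholder):

import datetime

from prefect import task


@task(max_retries=2, retry_delay=datetime.timedelta(minutes=1))
def flaky_lookup(x):
    # retried up to 2 times, one minute apart
    return x * 2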

(If you’d like me to open a new thread, I will do so - I wasn’t sure if I should post it here).

Hi, we have been in love with Prefect for the last ~1.5 years. Keep up the amazing work.

We could not find a way to implement conditional retries in Prefect 2.0.
Here is a typical scenario we have:

  • We execute tasks with the DaskExecutor; some tasks fail with exceptions that are unrecoverable, i.e., they should not be retried.
  • Some fail with errors that can be retried, e.g. TimeoutError, and we'd like those to follow the default Prefect 2.0 mechanism (@task(retries …)).

I tried to use prefect.orion.orchestration.dependencies.temporary_task_policy and override CoreTaskPolicy with a custom list returned from priority(), using a ConditionalRetryFailedTasks rule instead of RetryFailedTasks, but I couldn't get it to work.

As a side note, in prefect 1.0 we successfully used a state_handler to achieve this.

Any other ideas?
Thanks,
Shay

instead of using a state handler, you can operate on a state directly within a flow using if/else
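
for example, something along these lines (a minimal sketch, assuming a Prefect 2 release where task calls accept return_state=True; fragile_task is a made-up name):

from prefect import flow, task


@task(retries=3, retry_delay_seconds=10)
def fragile_task(x):
    return 1 / x


@flow
def my_flow(x):
    state = fragile_task(x, return_state=True)
    if state.is_failed():
        # handle the failure here instead of in a state handler
        return None
    return state.result()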

and the retries argument still exists on a task decorator

if this doesn’t help, can you share your state handler?

If I understand correctly, I'll have to re-raise the exception if retries > 1, which makes my code a bit verbose.
Here is an example:

# prefect 2.0
import random

from prefect import task


def should_exception_be_retried(exception: Exception) -> bool:
    return isinstance(exception, TimeoutError)


@task()
def conditional_failure(x):
    retries = 3
    for attempt in range(1, retries + 1):
        # TODO: we should replace print with the prefect logger
        print("ATTEMPT", attempt)
        try:
            # choice = random.choice([0, 1, 2])
            # weighted choice: 0 -> ZeroDivisionError (unrecoverable),
            # 2 -> TimeoutError (retriable), 1 -> success
            choice = random.choice([0]*1 + [1]*2 + [2]*8)
            print("DENOMINATOR", choice)
            if choice == 2:
                raise TimeoutError("A timeout error")
            return x / choice
        except Exception as e:
            if attempt < retries and should_exception_be_retried(e):
                # here we will log a warning
                continue
            raise
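
A hypothetical driver flow for the task above (demo_flow is a made-up name; it assumes a Prefect 2 release where calling a task inside a flow returns its result directly):

from prefect import flow


@flow
def demo_flow():
    # each call retries internally on TimeoutError only;
    # a ZeroDivisionError (choice == 0) fails immediately
    return [conditional_failure(x) for x in range(5)]


if __name__ == "__main__":
    demo_flow()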

For completeness, this is how we used to solve it with Prefect 1.0:

# prefect 1.0 code
# here is our old state_handler
import logging
from typing import cast

import numpy as np
from toolz import curry

import prefect
from prefect import Task
from prefect.engine import state

# NOTE: TrackedObjectType (from Prefect's notification utilities) and our
# _format_message_and_create_slack_api_request helper are not shown here


@curry
def when_retry_exhausted_return_a_managed_failure_state_handler(
        tracked_obj: TrackedObjectType,
        old_state: state.State,
        new_state: state.State,
        # ignore_states: list = None,
        # only_states: list = None,
        webhook_secret: str = None,
        backend_info: bool = True,
) -> state.State:  # pragma: no cover
    """
    A state handler that acts when attempts are exhausted and returns a managed failure (quasi-state.Success) state notification.

    Copied from prefect open source and extended.
    For more information, see `better_slack_notifier()` method documentation.

    Args:
        tracked_obj: Task or Flow object the handler is
            registered with
        old_state: previous state of tracked object
        new_state: new state of tracked object
        webhook_secret: the name of the Prefect Secret that stores your slack
            webhook URL; defaults to `"SLACK_WEBHOOK_URL"`
        backend_info: Whether to supply slack notification with urls
            pointing to backend pages; defaults to True

    Returns:
        the `new_state` object that was provided
    """

    _logger: logging.Logger = prefect.context.get("logger")

    # while discarding flows is not mandatory, it simplifies the method
    if not isinstance(tracked_obj, Task):
        return new_state  # no notification

    task = cast(Task, tracked_obj)

    webhook_url = cast(
        str, prefect.client.Secret(webhook_secret or "SLACK_WEBHOOK_URL").get()
    )

    def managed_failure_state(log_msg: str) -> state.State:
        """A helper function that returns a quasi failure state."""
        new_state.context.update(managed_failure=True)  # Let others know this is a quasi-Successful state

        # this is our generated signals.FAIL?
        assert new_state.context.get("fail_on_mapped", False), \
            "While we have a `managed_failure`, `fail_on_mapped` is not True. How can this be?"

        managed_failed_state = state.Success(new_state.message,
                                             result=new_state.result, context=new_state.context)
        _logger.info(log_msg)
        return _format_message_and_create_slack_api_request(tracked_obj, new_state=managed_failed_state,
                                                            webhook_url=webhook_url, backend_info=backend_info)

    if new_state.is_failed():
        task_run_name = new_state.context.get('task_run_name',
                                              task.task_run_name)  # get task run name from context (on task, it is unformatted)
        # NOTE: the `should_retry` depends on the type of the error. It shall be updated by the task that failed resulting signals.FAIL()
        should_retry = new_state.context.get("should_retry", True)
        if not should_retry:
            return managed_failure_state(
                f"The task `{task_run_name}` encountered an exception with should_retry=False and hence will result in a managed failure.")

        attempted = prefect.context.get("task_run_count", 1)
        max_retries = task.max_retries if task.max_retries else np.inf
        if attempted == max_retries:
            return managed_failure_state(
                f"All retries of the task `{task_run_name}` were exhausted and hence this will result in a managed failure.")

    return new_state
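
For reference, we attach the handler to a task via the Prefect 1.0 state_handlers argument (a sketch; the task and secret name below are placeholders):

import datetime

from prefect import task


@task(
    max_retries=3,
    retry_delay=datetime.timedelta(minutes=1),
    state_handlers=[
        when_retry_exhausted_return_a_managed_failure_state_handler(
            webhook_secret="SLACK_WEBHOOK_URL"
        )
    ],
)
def some_conditional_task(x):
    return x / 2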

thanks for sharing the code!

could you add more background about what problem you are trying to solve this way? do you want to retry on a specific exception type, e.g. retry here on a TimeoutError?

we have an open issue for that here:

It looks like a similar request to the one I have. I'll try it 🙂
Thanks for the answers
