Result location vs. targets

anna_geller · February 20, 2022, 12:20pm

View in #prefect-community on Slack

@John_Shearer: Is it expected that running a local flow with PREFECT__FLOWS__CHECKPOINTING=false, but with checkpoint data present in the Prefect result directory, would read from those results? I would not expect this, but this is the current behaviour (on my machine …)
fyi - in this case my result locations are set by only the current day, so today I ran a couple of flows with checkpointing=true, but now further jobs with checkpointing=false are using that (old) checkpoint data

@Anna_Geller: @John_Shearer afaik, what you set on @task(checkpoint=False) is important

@John_Shearer: In all cases I have @task(checkpoint=True) for the tasks, but checkpointing has been disabled at the flow level (either config file, or environment variable)

@Kevin_Kho: That env variable is overridden to True for Cloud or Server runs, so it needs to be done at the task level

@John_Shearer: for tasks with @task(checkpoint=True), they don’t write out results when env var PREFECT__FLOWS__CHECKPOINTING=false, but they do appear to read results in that same case (if they are already present)
This is running without Cloud or Server (from local pytests)

@Kevin_Kho: They won’t for local runs without a backend. That’s right because that env variable is respected

@John_Shearer: I think my case is a little odd, I’ll try to go through step by step. one minute …
• I have a number of tasks in my flow with @task(checkpoint=True), with a result location based on the current date (no time component).
• The results directory is initially empty.

1. I run my flow from pytest with environment variable PREFECT__FLOWS__CHECKPOINTING=false
   a. this creates no files in the results directory - YAY
2. I run my flow from pytest with environment variable PREFECT__FLOWS__CHECKPOINTING=true
   a. this creates some files in the results directory - YAY
3. I run my flow (again) from pytest with environment variable PREFECT__FLOWS__CHECKPOINTING=true
   a. this reads from the files in the results directory - YAY
4. I run my flow from pytest with environment variable PREFECT__FLOWS__CHECKPOINTING=false
   a. this reads from the files in the results directory - Unexpected

Does that make sense?

@Kevin_Kho: Ah ok I see what you mean. Can you show me how you defined the Result location?

@John_Shearer:
@task(result=pandas_result, target=parquet_location)
def some_task():
    ...

import pendulum

def pickle_location(**kwargs) -> str:
    return location_by_extension(suffix="pickle", **kwargs)

def parquet_location(**kwargs) -> str:
    return location_by_extension(suffix="parquet", **kwargs)

def location_by_extension(flow_name, scheduled_start_time, task_slug, suffix="parquet", **kwargs):
    date: str = pendulum.instance(scheduled_start_time).format("Y/M/D")
    # time: str = slugify(pendulum.instance(scheduled_start_time).time().isoformat())

    # return f"{date}/{time}__{flow_run_id}/{task_slug}-prefect_result.{suffix}"
    return f"{flow_name}/{date}/{task_slug}-prefect_result.{suffix}"
sure. Sorry, it’s a little ugly
I wouldn’t expect result or target to be used with PREFECT__FLOWS__CHECKPOINTING=false, though I likely have a misunderstanding somewhere
(FYI - no urgency on this. It’s not blocking me, and I’m signing off for today anyway) - thanks
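
To illustrate why a date-only location matters here, the following is a minimal standard-library sketch. It uses neither Prefect nor pendulum: `render_location` is a hypothetical stand-in for `location_by_extension` above, and the flow name, task slug, and timestamps are made up. It assumes the "Y/M/D" format yields an unpadded year/month/day. Two runs scheduled on the same day resolve to the same path, so a file written by the first run is visible to the second.

```python
from datetime import datetime


def render_location(flow_name: str, scheduled_start_time: datetime,
                    task_slug: str, suffix: str = "parquet") -> str:
    # Date only, no time component (assumed equivalent of the "Y/M/D" format above).
    t = scheduled_start_time
    date = f"{t.year}/{t.month}/{t.day}"
    return f"{flow_name}/{date}/{task_slug}-prefect_result.{suffix}"


# Hypothetical flow name, task slug, and start times, for illustration only.
morning = render_location("my_flow", datetime(2022, 2, 20, 9, 0), "some_task-1")
evening = render_location("my_flow", datetime(2022, 2, 20, 17, 30), "some_task-1")
next_day = render_location("my_flow", datetime(2022, 2, 21, 9, 0), "some_task-1")

print(morning)
print(morning == evening)   # same day, same path
print(morning == next_day)  # different day, different path
```

Adding a time component or a run identifier to the path (as in the commented-out line above) would give each run its own file.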

@Kevin_Kho: It’s the target that is causing this behavior. Targets are file-based caching mechanisms, so if the file exists, it will load the file instead of executing the task. You can use the result.location instead, without the target, and I think this will work as intended
@task(result=Result(..,location="..."))

@John_Shearer: Oh great. That’ll do nicely
Thanks so much
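
The behaviour discussed in this thread can be sketched with a toy model. This is not Prefect's implementation: `run_task` is a hypothetical function that only mimics what was observed above, namely that the target check happens whether or not checkpointing is enabled, while the flag gates writes only.

```python
import os
import pickle
import tempfile


def run_task(fn, target_path, checkpointing):
    """Toy model: an existing target file short-circuits the task,
    independent of the checkpointing flag."""
    if os.path.exists(target_path):
        with open(target_path, "rb") as f:
            return pickle.load(f), "cached"
    value = fn()
    if checkpointing:
        # Only the write is controlled by the flag.
        with open(target_path, "wb") as f:
            pickle.dump(value, f)
    return value, "executed"


calls = []


def some_task():
    calls.append(1)
    return len(calls)


with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "some_task-prefect_result.pickle")
    first = run_task(some_task, target, checkpointing=False)   # runs, writes nothing
    wrote_after_first = os.path.exists(target)
    second = run_task(some_task, target, checkpointing=True)   # runs, writes the file
    third = run_task(some_task, target, checkpointing=False)   # file exists, so it is read

print(first, wrote_after_first, second, third)
```

The third call corresponds to the unexpected step in the walkthrough: checkpointing is off, yet the old file is still read because it sits at the target path.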
