View in #prefect-community on Slack
@John_Shearer: Is it expected that running a local flow with PREFECT__FLOWS__CHECKPOINTING=false but with checkpoint data present in the Prefect result directory would read from those results? I wouldn't expect this, but it is the current behaviour (on my machine …)
FYI: in this case my result locations are determined only by the current day, so today I ran a couple of flows with checkpointing=true, and now further jobs with checkpointing=false are using that (old) checkpoint data
@Anna_Geller: @John_Shearer afaik, what you set on @task(checkpoint=False) is important
@John_Shearer: In all cases I have @task(checkpoint=True) for the tasks, but checkpointing has been disabled at the flow level (either config file or environment variable)
@Kevin_Kho: That env variable is overridden to True for Cloud or Server runs, so it needs to be done at the task level
@John_Shearer: for tasks with @task(checkpoint=True), they don’t write out results when env var PREFECT__FLOWS__CHECKPOINTING=false, but they do appear to read results in that same case (if they are already present)
This is running without cloud or server (from local pytests)
@Kevin_Kho: They won’t for local runs without a backend. That’s right because that env variable is respected
@John_Shearer: I think my case is a little odd, I’ll try to go through step by step. one minute …
• I have a number of tasks in my flow with @task(checkpoint=True)
- with a result location based on the current date (no time component). The results directory is initially empty
- I run my flow from pytest with environment variable PREFECT__FLOWS__CHECKPOINTING=false
  a. this creates no files in the results directory - YAY
- I run my flow from pytest with environment variable PREFECT__FLOWS__CHECKPOINTING=true
  a. this creates some files in the results directory - YAY
- I run my flow (again) from pytest with environment variable PREFECT__FLOWS__CHECKPOINTING=true
  a. this reads from the files in the results directory - YAY
- I run my flow from pytest with environment variable PREFECT__FLOWS__CHECKPOINTING=false
  a. this reads from the files in the results directory - Unexpected
Does that make sense?
@Kevin_Kho: Ah ok I see what you mean. Can you show me how you defined the Result location?
@John_Shearer:
@task(result=pandas_result, target=parquet_location)
def some_task():
    ...

import pendulum

def pickle_location(**kwargs) -> str:
    return location_by_extension(suffix="pickle", **kwargs)

def parquet_location(**kwargs) -> str:
    return location_by_extension(suffix="parquet", **kwargs)

def location_by_extension(flow_name, scheduled_start_time, task_slug, suffix="parquet", **kwargs):
    date: str = pendulum.instance(scheduled_start_time).format("Y/M/D")
    # time: str = slugify(pendulum.instance(scheduled_start_time).time().isoformat())
    # return f"{date}/{time}__{flow_run_id}/{task_slug}-prefect_result.{suffix}"
    return f"{flow_name}/{date}/{task_slug}-prefect_result.{suffix}"
sure. Sorry, it’s a little ugly
I wouldn’t expect result or target to be used with PREFECT__FLOWS__CHECKPOINTING=false, though I likely have a misunderstanding somewhere
(FYI - no urgency on this. It’s not blocking me, and I’m signing off for today anyway) - thanks
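[Editor's sketch] The location template above has no time component, so every run on the same day resolves to the same file path. A minimal stdlib-only illustration, assuming that pendulum's "Y/M/D" format yields unpadded year/month/day; the helper name location_by_date and the flow/task names are hypothetical stand-ins, not part of the original code:

```python
from datetime import datetime


def location_by_date(flow_name: str, scheduled_start_time: datetime,
                     task_slug: str, suffix: str = "parquet") -> str:
    # Stand-in for pendulum's format("Y/M/D"): date only, no time component.
    date = (f"{scheduled_start_time.year}/"
            f"{scheduled_start_time.month}/"
            f"{scheduled_start_time.day}")
    return f"{flow_name}/{date}/{task_slug}-prefect_result.{suffix}"


# Two runs on the same day, hours apart (hypothetical flow and task names).
morning = location_by_date("my_flow", datetime(2022, 3, 7, 9, 0), "some_task-1")
evening = location_by_date("my_flow", datetime(2022, 3, 7, 18, 30), "some_task-1")
# A run on the following day.
next_day = location_by_date("my_flow", datetime(2022, 3, 8, 9, 0), "some_task-1")

print(morning)
print("same day collides:", morning == evening)
print("next day collides:", morning == next_day)
```

Because the morning and evening runs share one path, a file written by the earlier run is what a later run on the same day finds at that location.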
@Kevin_Kho: It’s the target that is causing this behavior. Targets are a file-based caching mechanism, so if the file exists, it will load the file instead of executing the task. You can use the result.location instead without the target and I think this will work as intended:
@task(result=Result(..,location="..."))
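[Editor's sketch] The behaviour described in this thread can be modelled in a few lines of plain Python. This is a toy model built only from what the thread reports (an existing target file is loaded instead of running the task, and the checkpointing flag only controls writing); it is not Prefect's implementation, and run_task and its parameters are hypothetical names:

```python
import pickle
import tempfile
from pathlib import Path


def run_task(fn, *, target=None, checkpointing=True):
    """Toy model of a task run. Returns (value, how_it_was_obtained)."""
    if target is not None and target.exists():
        # Target hit: the file is loaded and the task body is skipped.
        # Note that the checkpointing flag is not consulted here.
        return pickle.loads(target.read_bytes()), "loaded from target"
    value = fn()
    if checkpointing and target is not None:
        # Checkpointing only decides whether the result is written out.
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(pickle.dumps(value))
    return value, "executed"


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "my_flow" / "some_task-prefect_result.pickle"
    # Same sequence as the four runs in the thread: false, true, true, false.
    runs = [
        run_task(lambda: 42, target=target, checkpointing=flag)[1]
        for flag in (False, True, True, False)
    ]
    # Without a target there is no file to find, so the task always executes.
    no_target = run_task(lambda: 42, target=None, checkpointing=False)[1]

print(runs)
print(no_target)
```

In this model the final run with checkpointing disabled still loads the file, which matches the "Unexpected" step reported earlier, while dropping the target removes the read path entirely.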
@John_Shearer: Oh great. That’ll do nicely
Thanks so much