Idiomatic handling of external files

Let’s say I want to run an external command-line tool that generates some data:

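from pathlib import Path
from subprocess import run

from prefect import task

@task
def fetch_robots() -> Path:
    # wget writes robots.txt into the current working directory
    run(["wget", "www.google.com/robots.txt"], check=True)  # raise if wget fails
    # Return only the path; the file contents never pass through Python
    return Path("robots.txt")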

This seems like a bad design because the file is managed outside of Prefect. Is there a way to tell Prefect to track the newly generated file? Is Path the right type to return here, or should I be loading the contents of the file into memory? That would be a problem if the file is enormous, and I don’t need Python to read it directly. Should I be using blocks here? I’m lost.
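
To make the "path versus contents" question concrete, here is a rough sketch of a middle ground I have been considering: return a small record describing the file instead of the file's bytes. The names `FileRecord` and `describe_file` are hypothetical helpers of my own, not part of Prefect, and the sketch uses only the standard library. The file is hashed in chunks, so it is never held in memory all at once.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical lightweight description of a file on disk."""
    path: str
    size_bytes: int
    sha256: str


def describe_file(path: Path, chunk_size: int = 1 << 20) -> FileRecord:
    """Hash the file in fixed-size chunks so it is never fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read until fh.read() returns an empty bytes object (end of file)
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return FileRecord(
        path=str(path.resolve()),
        size_bytes=path.stat().st_size,
        sha256=digest.hexdigest(),
    )
```

A task could then end with `return describe_file(Path("robots.txt"))`, so downstream tasks receive the location plus a checksum rather than the data itself. Whether Prefect would treat such a record any differently from a bare Path is part of what I am unsure about.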
