When should I use FileStorageBlock vs. a cloud-specific storage block such as S3StorageBlock, GoogleCloudStorageBlock, or AzureBlobStorageBlock?

1. FileStorageBlock

Our general recommendation is to use the FileStorageBlock when:

  • you need a simple setup,
  • your execution layer (i.e. your agent) is already authenticated with the cloud provider’s object storage (S3, GCS, or Azure Blob Storage), so you don’t need to store those credentials on the Prefect backend.

1.1 Installation

pip install s3fs
pip install gcsfs
pip install adlfs  # for Azure Data Lake and Azure Blob Storage

1.2 Usage examples

from prefect.deployments import DeploymentSpec
from prefect.blocks.storage import FileStorageBlock

# hello_flow is assumed to be a @flow-decorated function defined or imported above

DeploymentSpec(
    name="s3",
    flow=hello_flow,
    tags=["local"],
    flow_storage=FileStorageBlock(base_path="s3://prefect-orion/flows"),
)

DeploymentSpec(
    name="gcs",
    flow=hello_flow,
    tags=["local"],
    flow_storage=FileStorageBlock(base_path="gcs://prefect-orion/flows"),
)

DeploymentSpec(
    name="azure",
    flow=hello_flow,
    tags=["local"],
    flow_storage=FileStorageBlock(base_path="az://orion"),
)
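The scheme of base_path decides which of the packages from section 1.1 has to be installed on the machine that reads or writes the flow. A minimal sketch of that mapping, using only the standard library; the helper name backend_package and the FSSPEC_BACKENDS table are hypothetical and not part of Prefect, and the table only covers the three schemes used in the examples above:

```python
from typing import Optional
from urllib.parse import urlparse

# Scheme -> pip package, taken from the installation list in section 1.1.
# Hypothetical lookup table for illustration; not a Prefect API.
FSSPEC_BACKENDS = {
    "s3": "s3fs",
    "gcs": "gcsfs",
    "az": "adlfs",
}


def backend_package(base_path: str) -> Optional[str]:
    """Return the package needed for base_path, or None for a local path."""
    scheme = urlparse(base_path).scheme
    if not scheme:
        # No scheme: treated here as a plain local file system path.
        return None
    if scheme not in FSSPEC_BACKENDS:
        raise ValueError(f"No known backend for scheme {scheme!r}")
    return FSSPEC_BACKENDS[scheme]


for path in ("s3://prefect-orion/flows", "gcs://prefect-orion/flows", "az://orion"):
    print(path, "->", backend_package(path))
```

A check like this can run in CI before creating deployments, so that a missing backend package shows up early rather than when the agent first tries to load the flow.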

2. Cloud vendor-specific object storage services

Use these when you:

  • use a single cloud provider only,
  • want the credentials stored on the block object, so that no additional credentials need to be set up on your execution layer (i.e. your agent and flow runners). Regardless of whether you deploy your flows with a KubernetesFlowRunner, DockerFlowRunner, or SubprocessFlowRunner, the credentials need to be set up only once, during deployment creation, e.g. from a CI/CD pipeline. They are then stored encrypted on the API backend and can be reused across flow runners and execution environments.

2.1 Installation

Choose the library relevant to your environment:

pip install azure-storage-blob
pip install google-cloud-storage
pip install boto3

Note: currently, all of these libraries are installed as part of the core Prefect 2.0 library, but they will eventually be moved into separate Prefect Collections to avoid extraneous dependencies in the core library.

2.2 Usage examples

from prefect.deployments import DeploymentSpec
from prefect.blocks.storage import (
    AzureBlobStorageBlock,
    S3StorageBlock,
    GoogleCloudStorageBlock,
)

# hello_flow is assumed to be a @flow-decorated function defined or imported above

DeploymentSpec(
    name="s3",
    flow=hello_flow,
    tags=["local"],
    flow_storage=S3StorageBlock(bucket="prefect-orion"),
)

DeploymentSpec(
    name="gcs",
    flow=hello_flow,
    tags=["local"],
    flow_storage=GoogleCloudStorageBlock(bucket="prefect-orion"),
)

DeploymentSpec(
    name="azure",
    flow=hello_flow,
    tags=["local"],
    flow_storage=AzureBlobStorageBlock(container="orion"),
)

3. Default storage

Use it when you:

  • want to configure storage only once and don’t want to explicitly define it on your DeploymentSpec
  • want a hands-off approach in which Prefect takes care of storage for you behind the scenes, while still keeping the benefits of the hybrid execution model

3.1 Installation

Same as above: choose the library relevant to your environment.

3.2 Usage examples

Execute the command below and follow the CLI instructions:

prefect storage create

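4. Define storage by ID

You can also use storage configured as above but reference it explicitly by its ID. List the configured storage to find the ID:

prefect storage ls

Then pass that ID as flow_storage on the DeploymentSpec: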

from prefect import task, flow


@task
def say_hi():
    result = "Hi!"
    print(result)
    return result


@flow
def hello_world():
    say_hi()


from prefect.deployments import DeploymentSpec

DeploymentSpec(
    flow=hello_world,
    name="local-hw",
    flow_storage="0623f503-8ed4-473a-b308-ff7af2384cbd",
    tags=["local"],
)
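The ID passed as flow_storage above has the shape of a UUID. A minimal sketch for checking a copied ID before using it in a DeploymentSpec; the helper name is_valid_storage_id is hypothetical, and it only checks the format, not whether that storage actually exists on the backend:

```python
import uuid


def is_valid_storage_id(value: str) -> bool:
    """Return True if value is a UUID in canonical hyphenated form."""
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return False
    # uuid.UUID also accepts variants without hyphens or with braces;
    # require the canonical form as it appears in the example above.
    return str(parsed) == value.lower()


print(is_valid_storage_id("0623f503-8ed4-473a-b308-ff7af2384cbd"))
print(is_valid_storage_id("local-hw"))
```

This catches the common mistake of pasting a storage name or a truncated ID instead of the full identifier.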

How do I configure a FileStorageBlock for MinIO?

To use MinIO, specify "endpoint_url" in your client_kwargs:

import os

from prefect.blocks.storage import FileStorageBlock

# Credentials picked up by boto
os.environ["AWS_ACCESS_KEY_ID"] = "minio"
os.environ["AWS_SECRET_ACCESS_KEY"] = "miniominio"

minio_storage = FileStorageBlock(
    base_path="s3://prefect/deployments",
    key_type="hash",
    # Additional options to pass to the underlying fsspec file system.
    # This is a gamble: it assumes Prefect uses s3fs (which in turn builds on
    # fsspec) for s3:// paths, so that these options are consumed by fsspec.
    options={"client_kwargs": {"endpoint_url": "http://minio:9000"}},
)
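If the MinIO endpoint differs between environments, the options dictionary can be built by a small helper rather than hard-coded. A minimal sketch, assuming the endpoint must be a full http(s) URL; the helper name minio_options is hypothetical and not part of Prefect, and the returned dictionary simply mirrors the shape used in the example above:

```python
from urllib.parse import urlparse


def minio_options(endpoint_url: str) -> dict:
    """Build the options dict for FileStorageBlock from a MinIO endpoint URL."""
    parsed = urlparse(endpoint_url)
    # Assumption: the S3 client needs an explicit scheme and host,
    # e.g. "http://minio:9000" rather than "minio:9000".
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Expected a full http(s) URL, got {endpoint_url!r}")
    return {"client_kwargs": {"endpoint_url": endpoint_url}}


print(minio_options("http://minio:9000"))
```

The result can then be passed as options=minio_options(...) when constructing the block.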
