⏱️ How to schedule CRON jobs on PaaS like Heroku or Render? ⏱️

Hi :grinning:

We have a couple of large jobs that run around 2-3 times per week on a predefined CRON expression and we want to schedule them in Prefect. We also use Render as our hosting service.

As far as I understand, currently, if you are using a PaaS like Heroku or Render, there are two ways to achieve this with Prefect:

  • Keep an instance running 24/7 in Render with an agent.
  • Use a “parallel” CRON job in Render (it will wake up at the same time and start the agent which will process the flow)

The first option is obviously costly and inefficient. Those jobs need a significant amount of memory, and keeping Render instances running 24/7 is pouring money down the drain.

The second option is doable, but it’s super brittle and takes away a big part of the robustness that Prefect brings.

I know about the “serverless” AWS solution and read Prefect’s blog about it (this), but I am specifically talking about high-level PaaS like Heroku and Render.

Currently, we use Render (as Heroku is slowly dying :wink:) and it would be amazing if we could define in a couple of clicks an integration to run Prefect CRON schedules on Render instances automatically.

:pray: I would happily help to write such an integration with Render if possible and try to make it happen. This will be super helpful for us!

Also, I strongly believe that if such an option existed out of the box, many (many) users would use it, since currently there is just no simple and “codeless” way to run infrequent CRON jobs on Prefect.

There are a huge number of devs (solo and teams) who use Heroku/Render (most of them moving to Render) and who don’t want to use AWS and get their hands “dirty”.

Another option would be for Prefect Cloud to actually provision such instances, but I believe that would be quite a big distraction from Prefect’s mission.

Would love to hear your thoughts about the issue, and again, would love to help and contribute regarding the Render integration. :heart:

we would be definitely interested and if you would like to contribute, feel free to take a stab at it by creating a custom collection, or if you want to chat first, the #prefect-contributors channel in the community slack is a great place to ask about it - you can sign up here Prefect - Prefect

Contribution intro docs:

regarding the actual problem, you could even solve it using GitHub Actions, or use the same principle on Render as described here:

Thanks for info Anna,

But the solution you’ve proposed in “Scheduled Data Pipelines in 5 Minutes with Prefect and GitHub Actions” misses the whole point of Prefect scheduler. I do want to use Prefect for the scheduling itself, so, for example, I can see future scheduled jobs (yellow circles in the cloud UI) and late jobs.

To sum it up: I would like some “code hook” so that when Prefect Cloud sees that it is time for a new CRON-scheduled run, it will give me time to run some code, and only then dispatch an event to a queue/agent.

This way, inside this “code hook” I will get a chance to create a new instance in Render which will start a new agent.

it seems that you are asking for serverless execution (provisioning infrastructure on demand when needed and shutting it down after completion). I think what you are looking for is AWS ECSTask, GCP CloudRunJob or Azure AzureContainerInstance infrastructure blocks. I don’t know Render well enough to judge whether this is possible, but if you wanted to follow the same approach as those 3 blocks do, we would accept a contribution of such an infrastructure block for deploying scheduled flow runs on that kind of on-demand compute infrastructure.

definitely feel free to explore and see whether Render as a platform fits into the concept of infrastructure blocks, if so then this is a great fit for contribution - LMK if that makes sense

This is exactly what I want to do.

Can you point me to a place I can find info on how to build such an infrastructure block in Prefect? (Also if we can do a quick zoom with one of your engineers that would be great).

If I understand correctly, when I start building such an infrastructure block for Render, I will get a chance inside the block implementation to tell Render to “provision some resources”?

I think such an infrastructure block will be super useful for both Render and Prefect. I am sure Render will also help from their side.

this is a great example:

We don’t do Zoom calls, sorry, but the Slack channel can help unblock you

you tell me :smile: dunno how Render works, but this is the intention, yes
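
A minimal sketch of that intention, not a working Prefect block: the class below only shows where a call to Render would sit inside a block's `run()`. The class name `RenderJobSketch`, the service ID and the plan ID are hypothetical, and the endpoint path and the `startCommand`/`planId` fields are assumptions about Render's API that should be checked against Render's documentation. A real block would subclass Prefect's `Infrastructure` base class and would also have to forward the environment the agent prepares for the flow run.

```python
import json
import urllib.request
from dataclasses import dataclass, field
from typing import List

# Assumption: Render exposes a "create job" endpoint under this base URL.
RENDER_API = "https://api.render.com/v1"


@dataclass
class RenderJobSketch:
    """Hypothetical stand-in for a Prefect infrastructure block targeting Render."""

    service_id: str  # hypothetical Render service the job runs in
    api_key: str     # Render API key
    # Command the provisioned instance should execute for the flow run.
    command: List[str] = field(
        default_factory=lambda: ["python", "-m", "prefect.engine"]
    )
    plan_id: str = ""  # optional instance size; field name is an assumption

    def build_request(self) -> urllib.request.Request:
        """Build (but do not send) the HTTP request that would start the job."""
        payload = {"startCommand": " ".join(self.command)}
        if self.plan_id:
            payload["planId"] = self.plan_id
        return urllib.request.Request(
            url=f"{RENDER_API}/services/{self.service_id}/jobs",
            data=json.dumps(payload).encode("utf-8"),
            headers={
                "Authorization": f"Bearer {self.api_key}",
                "Content-Type": "application/json",
            },
            method="POST",
        )

    def run(self) -> dict:
        """Send the request; an agent would call this once per scheduled flow run."""
        with urllib.request.urlopen(self.build_request()) as response:
            return json.load(response)
```

Keeping `build_request()` separate from `run()` makes it possible to inspect the payload without touching the network, which helps while the real API shape is still being confirmed.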

nice, can’t wait! please share here once you have a working PoC

Ok super interesting… but I don’t understand one thing.
Looking at:

ECSTask(command=["echo", "hello world"], launch_type="FARGATE_SPOT").run()

The above command would provision a new FARGATE_SPOT task running some Prefect Python code containing a @flow and maybe some @tasks, looking like this, right?

from prefect import flow

@flow
def someFunc():
    # some code
    pass

But inside that Python code, the context is lost (meaning it no longer knows which Prefect flow it is and which CRON run is the current one).

To further explain, let’s say I log some data using Prefect’s logger

from prefect import get_run_logger
logger = get_run_logger()
logger.info('some log')

How will this log be routed back to Prefect Cloud and shown correctly? Because this is not the regular use case, in which the @flow is started by an agent that listens on a queue.

I am missing something here…

it is! this is one of many infrastructure blocks that can be used for scheduled flow runs deployed by an agent. Perhaps check the Process block first in the core library, this should be easier to understand
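
As a rough illustration of how the run context can survive the jump to freshly provisioned infrastructure: as far as I recall, in Prefect 2 the agent hands the flow run ID to the started process through an environment variable, and the engine running there looks the run up via the API, which is how logs end up attached to the right run. The sketch below imitates only that hand-off with a plain subprocess. The variable name `PREFECT__FLOW_RUN_ID` is stated from memory and should be treated as an assumption, and the child command is a stand-in, not the real engine.

```python
import os
import subprocess
import sys
import uuid
from typing import List


def run_with_flow_run_context(flow_run_id: str, command: List[str]) -> str:
    """Start `command` in a child process with the flow run ID in its environment."""
    # Assumption: this is the variable name the agent sets for the flow run.
    env = {**os.environ, "PREFECT__FLOW_RUN_ID": flow_run_id}
    result = subprocess.run(
        command, env=env, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


# Stand-in for the real engine: the child only reports which run it belongs to.
child = [
    sys.executable,
    "-c",
    "import os; print(os.environ['PREFECT__FLOW_RUN_ID'])",
]

flow_run_id = str(uuid.uuid4())
print(run_with_flow_run_context(flow_run_id, child) == flow_run_id)  # prints True
```

An infrastructure block for another platform would need to carry the same environment over to whatever it provisions, whether that is a container, a task or a job.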

OK, I’ll check the code, maybe it will make things clearer for me.
But just to make sure I understand the overall architecture:

The scheduling itself is done via Prefect Cloud. Meaning there is a deployment.yaml file with some CRON expression, and it was pushed to Prefect Cloud using a:

prefect deployment apply deployment.yaml 

So an agent will need to sit somewhere and listen?
Where does this agent sit? (meaning who runs it?)

yup, you’re exactly right :100: – the user runs the agent - Prefect doesn’t have access to the user’s code or data for privacy reasons, so the agent must be hosted on your infrastructure, and this agent then provisions infrastructure for scheduled flow runs

ok, but if I run the agent we are back to the same problem I started with. Because now I have two options. Either run the agent 24/7 or wake it on a schedule.

Keeping it up 24/7 is wasteful for a job that runs once a week.
And using Render’s CRON schedule to create an instance with an agent at that specific time makes the whole process very brittle: two CRON schedules to maintain and keep in sync. I want Prefect Cloud to be the only place where scheduling is done.

To make things clearer regarding Render. This is how you would run an agent 24/7, using a background worker:

But again as I said, it will have a monthly cost depending on the size of the instance:

So for a massive job that runs once a week and needs a very big instance, I will have to keep the background worker above running 24/7 on a very high plan. This is a pure waste of money (-:

The second option is to run Render’s CRON job:

It’s like Render’s background worker, but it runs (and costs money) only when the CRON expression fires. It also has an instance size.

But like I said, this is also problematic because I will have to maintain those CRON jobs in Render. It makes the whole process easy to break. If something happens to Render’s CRON job, the Prefect run will be late (no agent will be running to pick it up).

Currently, there are two options:

  1. You run flows without deployments (e.g. from your laptop, a serverless function, Render, a K8s pod) and you only set PREFECT_API_URL so that Prefect can observe that flow function call
  2. You deploy a Prefect agent on your infrastructure and this agent runs flows from deployments, and the infrastructure specified on a deployment defines where the flow run gets deployed - e.g. if you built a Render infra block, your flow deployments could use that infra, but the agent is still needed to deploy those scheduled runs.

You can think of an agent as a lightweight process you could run anywhere - it could be your laptop, on-prem server, or a tiny e2-micro instance on GCP or t2.micro instance on AWS. But the agent is always needed to deploy flow runs from deployments. Does it clarify it a bit more?
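
To make the "lightweight process" point concrete, here is a simplified model of one polling cycle, written with the standard library only. It is not Prefect's agent code: `ScheduledRun`, `poll_once` and the `submit` callback are hypothetical names for illustration. The point is that the polling side does almost no work itself and only hands due runs to whatever provisions the real compute.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, List


@dataclass
class ScheduledRun:
    """Hypothetical record of a run waiting in a work queue."""

    run_id: str
    scheduled_for: datetime
    submitted: bool = False


def poll_once(
    queue: List[ScheduledRun],
    submit: Callable[[ScheduledRun], None],
    now: datetime,
) -> List[str]:
    """One polling cycle: submit every run that is due and not yet submitted."""
    started = []
    for run in queue:
        if not run.submitted and run.scheduled_for <= now:
            submit(run)  # e.g. ask an infrastructure block to provision compute
            run.submitted = True
            started.append(run.run_id)
    return started


now = datetime(2023, 1, 2, 9, 0, tzinfo=timezone.utc)
queue = [
    ScheduledRun("due", now - timedelta(minutes=1)),
    ScheduledRun("future", now + timedelta(days=3)),
]
print(poll_once(queue, submit=lambda run: None, now=now))  # ['due']
```

A second call with the same `now` returns an empty list, because the due run is already marked as submitted.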

for sure, perhaps you could then leverage the example I shared with GitHub Actions? you could then leverage the resources that GitHub kindly provides us and you could offload any heavy computation to e.g. a container running on Render (or anywhere you see fit)

[quote=“yaron, post:11, topic:2017”]
it will have a monthly cost depending on the size of the instance:
[/quote]
$7 per month sounds quite awesome, I didn’t know this option to run an agent on Render exists, thanks for sharing this!

Thank you for all the help! (-:
I’ll keep looking into infrastructure blocks, and then maybe I’ll get the answer I am looking for.

@yaron the agent can be used to spin up ephemeral infrastructure to support your individual flows. In other words you can use a very cheap instance to poll for flows 24/7 and then within the flow provision the necessary, ephemeral, expensive infrastructure.
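
One way to picture the "provision inside the flow" pattern is a context manager that creates the expensive instance, runs the heavy step, and always tears the instance down again, even when the step fails. This is a generic Python sketch: `fake_provision` and `fake_teardown` are hypothetical placeholders for real provider API calls, and nothing here is Prefect or Render API.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List

events: List[str] = []  # records what happened, for demonstration only


@contextmanager
def ephemeral_instance(
    provision: Callable[[], str],
    teardown: Callable[[str], None],
) -> Iterator[str]:
    """Provision an instance, hand its ID to the caller, and always tear it down."""
    instance_id = provision()
    try:
        yield instance_id
    finally:
        teardown(instance_id)


def fake_provision() -> str:
    # Placeholder for a call that starts a large, expensive instance.
    events.append("provision")
    return "instance-1"


def fake_teardown(instance_id: str) -> None:
    # Placeholder for a call that shuts the instance down again.
    events.append(f"teardown {instance_id}")


def heavy_job() -> str:
    """The body that would live inside a flow running on the cheap agent host."""
    with ephemeral_instance(fake_provision, fake_teardown) as instance_id:
        events.append(f"run on {instance_id}")
    return "done"
```

The `finally` clause is the important part: the expensive instance is released even if the heavy step raises, so a failed run does not keep billing.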

Hi @secrettoad,
I am just about to finish my work on a custom infrastructure block. I’ll create a community post about the process, so others can easily create a custom infrastructure block for their specific needs.