Click on “Launch a new EC2 instance” within your preferred region and then select the Ubuntu 20.04 AMI:
Follow the defaults until you reach the following step:
Create a key pair if needed:
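If you prefer the AWS CLI over the console for this step, a rough equivalent is sketched below (the AMI ID, key pair name, and security group are placeholders; look up the current Ubuntu 20.04 AMI ID for your region):
aws ec2 run-instances \
    --image-id ami-xxxxxxxxxxxxxxxxx \
    --instance-type t2.micro \
    --key-name your-key-pair \
    --security-group-ids sg-xxxxxxxx \
    --count 1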
Use the connect instructions explained under the Connect section to SSH to the instance:
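For example, assuming your key file is called prefect-agent.pem and using the public DNS name shown in the console (both are placeholders):
chmod 400 prefect-agent.pem
ssh -i prefect-agent.pem ubuntu@ec2-12-34-56-78.compute-1.amazonaws.com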
Then, once you’ve SSH’ed into the instance, switch to the root user so that everything runs as sudo:
sudo su
Create a file called install_script.bash, using e.g. vim or simply with the cat command:
cat >> install_script.bash
# paste the lines below and then press Ctrl+C to exit
sudo apt-get update -y
sudo apt-get upgrade -y
sudo apt-get install software-properties-common -y
sudo apt-get install python3-dateutil -y
sudo apt-get install python3-pip -y
# make "python" point to python3
sudo ln -s /usr/bin/python3 /usr/bin/python
# sudo apt install docker.io -y
# make sure pip-installed executables end up on the PATH
PATH="$HOME/.local/bin:$PATH"
export PATH
pip3 install prefect supervisor s3fs
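To execute the script (we are already running as the root user, so no extra sudo is needed):
bash install_script.bash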
After running this script, you can check if Prefect was properly installed using the prefect version command:
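That is:
prefect version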
Attach IAM role with S3 access to the instance
To allow flow storage with S3, attach an IAM role to the instance:
For convenience in a simple PoC, you may create a role with the AmazonS3FullAccess policy attached.
Once you save it, your instance should have access to S3! This is required so that your execution environment can pull your flow code from S3 storage.
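If you’d rather script this step than click through the console, a rough AWS CLI sketch follows; the role name, instance profile name, and instance ID are placeholders, and the trust policy simply allows EC2 to assume the role:
cat > trust-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
aws iam create-role --role-name prefect-s3-role --assume-role-policy-document file://trust-policy.json
aws iam attach-role-policy --role-name prefect-s3-role --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
aws iam create-instance-profile --instance-profile-name prefect-s3-profile
aws iam add-role-to-instance-profile --instance-profile-name prefect-s3-profile --role-name prefect-s3-role
aws ec2 associate-iam-instance-profile --instance-id i-0123456789abcdef0 --iam-instance-profile Name=prefect-s3-profile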
Authenticate with Prefect Cloud 2.0
Create a new API key in Prefect Cloud:
Copy the key and use it with this command:
prefect cloud login --key YOUR_API_KEY
This will prompt you to select a workspace - choose your workspace and hit Enter.
The result of the above command is several environment variables stored in your default profile. You can view those using prefect config view:
PREFECT_PROFILE='default'
PREFECT_API_URL='https://api.prefect.cloud/api/accounts/acc_id/workspaces/workspace_id'
PREFECT_API_KEY='YOUR_API_KEY'
Create a work-queue for flows deployed to this EC2 instance
In this example, we create a work queue named “ubuntu”. An agent polling that queue will pick up flow runs from any deployment tagged ubuntu:
prefect work-queue create -t ubuntu ubuntu
The output of that command:
The above command generated a UUID that you’ll need to start an agent, as explained in the next section.
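If you ever misplace that UUID, you can look it up again with:
prefect work-queue ls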
Starting a supervisor process
Create a file called supervisord.conf with the following contents (replace WORK_QUEUE_ID with the UUID generated in the previous step):
[unix_http_server]
file=/tmp/supervisor.sock ; the path to the socket file
[supervisord]
loglevel=debug ; log level; default info; others: debug,warn,trace
[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface
[supervisorctl]
serverurl=unix:///tmp/supervisor.sock ; use a unix:// URL for a unix socket
[program:prefect-agent]
command=prefect agent start WORK_QUEUE_ID
Now to start the agent, run:
supervisord -c ./supervisord.conf
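To verify that supervisor actually started the agent, you can point supervisorctl at the same config file and check the process status:
supervisorctl -c ./supervisord.conf status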
Start a simple flow to test that it worked!
Create a flow file called work_queue_test_flow.py:
import platform
import sys

import prefect
from prefect import task, flow
from prefect import get_run_logger
from prefect.orion.api.server import ORION_API_VERSION


@task
def log_platform_info():
    logger = get_run_logger()
    logger.info("Host's network name = %s", platform.node())
    logger.info("Python version = %s", platform.python_version())
    logger.info("Platform information (instance type) = %s ", platform.platform())
    logger.info("OS/Arch = %s/%s", sys.platform, platform.machine())
    logger.info("Prefect Version = %s 🚀", prefect.__version__)
    logger.info("Prefect API Version = %s", ORION_API_VERSION)


@flow
def healthcheck():
    log_platform_info()


if __name__ == "__main__":
    healthcheck()
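Since the file includes a __main__ block, you can also run it directly for a quick sanity check before creating a deployment:
python work_queue_test_flow.py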
You can run the following commands locally (not from EC2):
prefect deployment build work_queue_test_flow.py:healthcheck --name cicd -q ubuntu -t ubuntu -o deploy/s3.yaml -sb s3/prod -v GITHUB_SHA
prefect deployment apply deploy/s3.yaml
prefect deployment run healthcheck/cicd
Note that it generated a flow run with the name lavender-pigeon.
If you go to your Cloud 2.0 dashboard, you should see a flow run confirming the success. In the flow run logs, you should see confirmation that the flow run was executed on the remote Ubuntu EC2 instance, even though we triggered it from the terminal on our local development machine.
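If you prefer the terminal over the dashboard, you can also list recent flow runs with:
prefect flow-run ls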
Ensuring the agent starts on VM reboot
But what if you stop your VM for the night and restart it later? To ensure you don’t have to go through this entire process again, you can use the following command:
echo "@reboot root supervisord -c /home/ubuntu/supervisord.conf -l /home/ubuntu/supervisord.log -u root" >> /etc/crontab
After running this command, stop your instance and try to rerun the same deployment from the UI:
Note that this flow run will be shown as late, since it cannot be picked up by the agent while the instance is stopped:
Now let’s start the instance again:
Once the instance boots up, we should see that the late flow run gets automatically picked up and executed!
Delightful, indeed!