How to use logging in Prefect - a tutorial by Andrew Brookins

anna_geller · February 7, 2022, 6:20pm

This blog post includes:

logging in tasks and flows
filtering logs in the UI
setting extra loggers using environment variable PREFECT_LOGGING_EXTRA_LOGGERS
formatting logs
custom log configuration via a YAML file

https://www.prefect.io/blog/logs-the-prefect-way/

simon_mackenzie · April 15, 2022, 12:28pm

Very useful, thanks. I have some questions about logging in Prefect 2:

Can I use the standard logging.getLogger() instead of get_run_logger()? This seems to work via the root log configuration. The only difference seems to be that it does not carry task/flow metadata, but is it likely to cause other issues?
Can I put log = logging.getLogger() at the top of the file to define one logger for all tasks? That won’t work with get_run_logger(), since the module may be imported outside a task or flow run. It does seem to work so far, but I am not sure whether it will cause problems with multiprocessing.

anna_geller · April 15, 2022, 1:07pm

Yes, you can! But this is considered an extra logger, so you would need to add it to your prefect configuration e.g., as an environment variable. Check this usage example:

Yes, exactly as shown in the example. This is true only for the extra logger, though. Defining Prefect logger globally wouldn’t work - see this topic:
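
To illustrate the second answer, here is a minimal standard-library sketch (no Prefect involved) of why a module-level logging.getLogger() can be created at import time. The logger object is only a named handle, so handlers and levels attached later still apply to it. The logger name test1 is borrowed from this thread; the helper configure_later and the format string are hypothetical.

```python
import io
import logging

# Created at import time, before any logging configuration exists.
log = logging.getLogger("test1")


def configure_later(stream: io.StringIO) -> None:
    """Attach a handler after the fact, as a framework might do at run time."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s | %(name)s - %(message)s"))
    target = logging.getLogger("test1")  # same object as the module-level `log`
    target.addHandler(handler)
    target.setLevel(logging.INFO)
    target.propagate = False  # keep the demo output in one place


buffer = io.StringIO()
configure_later(buffer)
log.info("info extra log")  # uses the handler that was attached afterwards
print(buffer.getvalue(), end="")
```

Because getLogger() returns the same object for the same name, the configuration step does not need to know where the module-level variable lives.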

simon_mackenzie · June 25, 2022, 10:41am

This all works fine for SequentialTaskRunner. However, the output does not seem to be formatted for Dask or Ray. The output I get and the test code are below. What am I missing?

[I know there is a known issue that prevents the logs from showing in Orion, but here I am talking about console output.]

sequential

11:15:29.331 | WARNING | Task run 'testtask-3dde1f43-0' - warning hi
11:15:29.333 | WARNING | root - warning std root log
11:15:29.335 | WARNING | test1 - warning extra log

dask

warning hi
warning std root log
warning extra log

ray

(begin_task_run pid=6347) warning hi
(begin_task_run pid=6347) warning std root log
(begin_task_run pid=6347) warning extra log

import os

os.environ.update(
    PREFECT_ORION_DATABASE_CONNECTION_TIMEOUT="60.0",
    PREFECT_LOGGING_EXTRA_LOGGERS="test1",
    PREFECT_API_URL="http://127.0.0.1:4200/api",
)

from prefect.flows import flow
from prefect.tasks import task

from prefect_dask.task_runners import DaskTaskRunner as runner

# from prefect.task_runners import SequentialTaskRunner as runner
# from prefect_ray.task_runners import RayTaskRunner as runner
from prefect import get_run_logger
import logging


@flow(task_runner=runner())
def testflow():
    testtask()


@task
def testtask():
    log = get_run_logger()
    log.warning("warning hi")
    log.debug("debug hi")

    log = logging.getLogger()
    log.warning("warning std root log")
    log.debug("debug std root log")

    log = logging.getLogger("test1")
    log.warning("warning extra log")
    log.debug("debug extra log")


if __name__ == "__main__":
    testflow()
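
One possible reading of the bare Dask and Ray lines above (an assumption, not confirmed in this thread): the task runs in a worker process where Prefect's handlers are not installed, so Python falls back to its handler of last resort, which prints only the message, and only for WARNING and above. The sketch below reproduces that behaviour in a single process using the standard library alone; the logger name and the helper emit_unconfigured are hypothetical.

```python
import contextlib
import io
import logging


def emit_unconfigured(name: str) -> str:
    """Log through a logger that has no handlers and return what reached stderr."""
    logger = logging.getLogger(name)
    logger.propagate = False        # stand-in for a process where nothing is configured
    logger.setLevel(logging.DEBUG)  # the logger itself lets everything through
    captured = io.StringIO()
    with contextlib.redirect_stderr(captured):
        logger.warning("warning hi")  # emitted by logging.lastResort, message only
        logger.debug("debug hi")      # lastResort only handles WARNING and above
    return captured.getvalue()


output = emit_unconfigured("unconfigured.worker.demo")
print(repr(output))
```

The result has no timestamp, level, or logger name, which matches the shape of the Dask output in this post.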

ahuang11 · June 27, 2022, 5:11pm

Thanks for sharing the example! I created an issue here:

github.com/PrefectHQ/prefect-dask

DaskTaskRunner logs aren't formatted with timestamps

opened 05:10PM - 27 Jun 22 UTC

ahuang11

From https://discourse.prefect.io/t/how-to-use-logging-in-orion-a-tutorial-by-an…drew-brookins/197/4?u=ahuang11

```
import os
import dask

os.environ.update(
    PREFECT_ORION_DATABASE_CONNECTION_TIMEOUT="60.0",
    PREFECT_LOGGING_EXTRA_LOGGERS="test1",
    PREFECT_API_URL="http://127.0.0.1:4200/api",
)

from prefect.flows import flow
from prefect.tasks import task

from prefect_dask.task_runners import DaskTaskRunner as runner

# from prefect.task_runners import SequentialTaskRunner as runner
# from prefect_ray.task_runners import RayTaskRunner as runner
from prefect.logging import get_logger
import logging


@flow(task_runner=runner())
def testflow():
    testtask()


@task
def testtask():
    log = get_logger("task_runner.dask")
    log.warning("warning hi")
    log.debug("debug hi")

    log = logging.getLogger()
    log.warning("warning std root log")
    log.debug("debug std root log")

    log = logging.getLogger("test1")
    log.warning("warning extra log")
    log.debug("debug extra log")


if __name__ == "__main__":
    testflow()
```

Outputs (highlighted with <HERE>):

```
10:07:25.570 | INFO | prefect.task_runner.dask - The Dask dashboard is available at http://127.0.0.1:8787/status
10:07:25.595 | WARNING | Flow run 'true-jacamar' - No default storage is configured on the server. Results from this flow run will be stored in a temporary directory in its runtime environment.
10:07:25.656 | INFO | Flow run 'true-jacamar' - Created task run 'testtask-e0d484d7-0' for task 'testtask'
<HERE>
warning hi
warning std root log
warning extra log
<HERE>
10:07:29.678 | INFO | Flow run 'true-jacamar' - Finished in state Completed('All states completed.')
```

ahuang11 · June 30, 2022, 12:15am

At the moment, there seems to be a bug(?) in Dask: the workers’ logging configuration can only be set through a ~/.config/dask/distributed.yaml file.

github.com/dask/distributed

worker config set by config.set is not read by worker

opened 09:46PM - 10 Jun 20 UTC

samaust

The configuration directly within Python is explained in the documentation here …: [Configuration - Directly within Python](https://docs.dask.org/en/latest/configuration.html#directly-within-python)

When using `dask.config.set`, I expect the worker to use those values. Instead, the worker reads the default values and does not use the values set using `dask.config.set`.

I modified distributed\worker.py as below to print the values received by the worker.

```python
if "memory_spill_fraction" in kwargs:
    self.memory_spill_fraction = kwargs.pop("memory_spill_fraction")
    print("self.memory_spill_fraction from kwargs = {}".format(self.memory_spill_fraction))
else:
    self.memory_spill_fraction = dask.config.get(
        "distributed.worker.memory.spill"
    )
    print("self.memory_spill_fraction from dask.config = {}".format(self.memory_spill_fraction))
```

```python
import dask
import dask.dataframe as dd
from dask.distributed import Client, LocalCluster
import pandas as pd

cluster = LocalCluster()
client = Client(cluster)

new = {"distributed.worker.memory.target": 0.1,
       "distributed.worker.memory.spill": 0.2,
       "distributed.worker.memory.pause": 0.3}

with dask.config.set(new):
    print(dask.config.get("distributed.worker.memory"))
    timestamp = pd.date_range('2018-01-01', periods=4, freq='S')
    col1 = pd.Series(["1", "3", "5", "7"], dtype="string")
    df = pd.DataFrame({"timestamp": timestamp,"col1": col1}).set_index('timestamp')
    ddf = dd.from_pandas(df, npartitions=1)
    ddf.compute()
    ddf.head(2)
```

Outputs

```python
self.memory_spill_fraction from dask.config = 0.7
self.memory_spill_fraction from dask.config = 0.7
self.memory_spill_fraction from dask.config = 0.7
self.memory_spill_fraction from dask.config = 0.7
{'target': 0.1, 'spill': 0.2, 'pause': 0.3, 'terminate': 0.4}
```

Notice the 0.7 value which is the default.

Passing the configuration by kwargs works.

```python
import dask
import dask.dataframe as dd
from dask.distributed import Client, LocalCluster
import pandas as pd

cluster = LocalCluster(
    memory_target_fraction=0.1,
    memory_spill_fraction=0.2,
    memory_pause_fraction=0.3)
client = Client(cluster)

timestamp = pd.date_range('2018-01-01', periods=4, freq='S')
col1 = pd.Series(["1", "3", "5", "7"], dtype="string")
df = pd.DataFrame({"timestamp": timestamp,"col1": col1}).set_index('timestamp')
ddf = dd.from_pandas(df, npartitions=1)
ddf.compute()
ddf.head(2)
```

Outputs

```python
self.memory_spill_fraction from kwargs = 0.2
self.memory_spill_fraction from kwargs = 0.2
self.memory_spill_fraction from kwargs = 0.2
self.memory_spill_fraction from kwargs = 0.2
```

**Environment**:
- Dask version: 2.18.1
- distributed version : 2.18.0
- Python version: 3.8.3
- Operating System: Windows
- Install method : pip

So you can populate that file like this:

logging:
  version: 1
  admin:
    log-format: '%(name)s - %(levelname)s - %(message)s'
  formatters:
    custom:
      format: "DASK ADMIN %(asctime)s __ %(levelname)-7s __ %(name)s __ %(message)s"
      datefmt: "%H:%M:%S"
  handlers:
    console:
      formatter: custom
      class: logging.StreamHandler
      level: INFO
  loggers:
    prefect.task_runs:
      level: INFO
      handlers:
        - console

And this:

import dask
from prefect import flow, task, get_run_logger
from prefect_dask import DaskTaskRunner


@task
def lazy_exponent(args):
    logger = get_run_logger()
    x, y = args
    result = x**y
    # the logging call to keep tabs on the computation
    logger.warning(f"Computed exponent {x}^{y} = {result}")
    return result


@flow(task_runner=DaskTaskRunner())
def test_flow():
    inputs = [[1, 2], [3, 4]]
    results = []
    for i in inputs:
        results.append(lazy_exponent(i))
    return results

test_flow().result()

Should output:

17:18:53.170 | INFO    | prefect.engine - Created flow run 'encouraging-frog' for flow 'test-flow'
17:18:53.171 | INFO    | Flow run 'encouraging-frog' - Using task runner 'DaskTaskRunner'
17:18:53.173 | INFO    | prefect.task_runner.dask - Creating a new Dask cluster with `distributed.deploy.local.LocalCluster`
17:18:53.653 | WARNING | bokeh.server.util - Host wildcard '*' will allow connections originating from multiple (or possibly all) hostnames or IPs. Use non-wildcard values to restrict access explicitly
17:18:55.548 | INFO    | prefect.task_runner.dask - The Dask dashboard is available at http://127.0.0.1:8787/status
17:18:55.946 | INFO    | Flow run 'encouraging-frog' - Created task run 'lazy_exponent-fddbc240-0' for task 'lazy_exponent'
17:18:56.272 | INFO    | Flow run 'encouraging-frog' - Created task run 'lazy_exponent-fddbc240-1' for task 'lazy_exponent'
DASK ADMIN 17:18:58 __ WARNING __ prefect.task_runs __ Computed exponent 1^2 = 1
WARNING:prefect.task_runs:Computed exponent 1^2 = 1
DASK ADMIN 17:18:58 __ WARNING __ prefect.task_runs __ Computed exponent 3^4 = 81
WARNING:prefect.task_runs:Computed exponent 3^4 = 81
DASK ADMIN 17:18:58 __ INFO    __ prefect.task_runs __ Finished in state Completed()
INFO:prefect.task_runs:Finished in state Completed()
DASK ADMIN 17:18:58 __ INFO    __ prefect.task_runs __ Finished in state Completed()
INFO:prefect.task_runs:Finished in state Completed()
17:19:00.407 | INFO    | Flow run 'encouraging-frog' - Finished in state Completed('All states completed.')
[Completed(message=None, type=COMPLETED, result=1, task_run_id=897ebb3b-9c28-4ff5-9eff-10e1d61c2af9),
 Completed(message=None, type=COMPLETED, result=81, task_run_id=42ead4a3-d1e2-4c5c-b822-1e31cc35ad4a)]
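
The formatter from the YAML above can be tried out without a Dask cluster. The sketch below feeds an equivalent dictionary to the standard library's logging.config.dictConfig and writes to an in-memory stream instead of the console. The admin key is left out because it is not part of the dictConfig schema. Setting propagate to False is an addition that is not in the YAML; it may be what avoids the duplicated WARNING:prefect.task_runs:... lines in the output above, assuming those come from a root handler on the worker.

```python
import io
import logging
import logging.config

stream = io.StringIO()  # in-memory target so the result can be inspected

config = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "custom": {
            "format": "DASK ADMIN %(asctime)s __ %(levelname)-7s __ %(name)s __ %(message)s",
            "datefmt": "%H:%M:%S",
        }
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "formatter": "custom",
            "level": "INFO",
            "stream": stream,  # the YAML relies on the default stream (stderr)
        }
    },
    "loggers": {
        "prefect.task_runs": {
            "level": "INFO",
            "handlers": ["console"],
            "propagate": False,  # not in the YAML; stops a second copy via the root logger
        }
    },
}

logging.config.dictConfig(config)
logging.getLogger("prefect.task_runs").warning("Computed exponent 3^4 = 81")
formatted = stream.getvalue()
print(formatted, end="")
```

The printed line should have the same shape as the DASK ADMIN lines in the output above, with the current time in place of 17:18:58.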

simon_mackenzie · July 3, 2022, 10:16pm

This partly works. I read the logging.yml file, put its contents under a “logging” key, and saved the result in the Dask config folder. The logs are now formatted correctly. However, the “extra” loggers are at level WARNING even though I set the level to INFO, which works with the sequential runner.

I tried DaskTaskRunner(cluster_kwargs=dict(env=dict(PREFECT_LOGGING_EXTRA_LOGGERS="mylog")))

anna_geller · July 3, 2022, 11:31pm

I believe you would need to configure the level on the extra logger itself. AFAIK, the log level set with PREFECT_LOGGING_LEVEL applies only to the Prefect logger.
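
A minimal sketch of setting the level on the extra logger itself, using only the standard library. The logger name mylog comes from this thread. Where this call has to run for a Dask worker to pick it up (for example inside the task) is an assumption and is not confirmed here.

```python
import logging

extra = logging.getLogger("mylog")

# Set the level explicitly on the extra logger instead of relying on
# whatever it would inherit from the root logger.
extra.setLevel(logging.INFO)

info_enabled = extra.isEnabledFor(logging.INFO)    # INFO records pass the level check
debug_enabled = extra.isEnabledFor(logging.DEBUG)  # DEBUG records are still filtered out
print(info_enabled, debug_enabled)
```

A handler is still needed for the records to show up anywhere; this only covers the level.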

simon_mackenzie · July 3, 2022, 11:43pm

Yes, I set the extra logger to INFO. It shows as INFO with the sequential runner but not with Dask.

anna_geller · July 4, 2022, 12:18am

Ahh, that probably has to be configured somewhere on the Dask side. I don’t know how, though. Keep us posted if you find out more.

simon_mackenzie · July 4, 2022, 12:32pm

OK. I have now done that using the code below. Logs are now handled as expected, both on the console and in Orion. However, I suggest this should be done by Prefect 2 itself. Something similar will also be needed for Ray, which does not handle extra loggers either.

import os
import re
import shutil

import yaml  # PyYAML


def setup_dask_logging():
    # read log settings and expand ${VAR} placeholders from the environment
    # (placeholders with no matching variable fall back to "INFO")
    with open(os.environ["PREFECT_LOGGING_SETTINGS_PATH"]) as f:
        logset = f.read()
    # raw string and non-greedy class so two placeholders on one line stay separate
    for x in set(re.findall(r"\$\{[^}]*\}", logset)):
        logset = logset.replace(x, os.environ.get(x[2:-1], "INFO"))
    logset = yaml.safe_load(logset)

    # set extra loggers, skipping empty names (e.g. when the variable is unset)
    extra_settings = logset["loggers"]["prefect.extra"].copy()
    extras = [
        x.strip()
        for x in os.environ.get("PREFECT_LOGGING_EXTRA_LOGGERS", "").split(",")
        if x.strip()
    ]
    for extra in extras:
        logset["loggers"][extra] = extra_settings

    # TODO update rather than overwrite?
    # save and copy to dask location
    HOME = os.path.expanduser("~")
    PREFECTX = os.path.abspath(os.path.dirname(__file__))
    DASK_SRC = f"{PREFECTX}/logging_daskdistributed.yaml"
    DASK_TGT = f"{HOME}/.config/dask/distributed.yaml"
    with open(DASK_SRC, "w") as f:
        f.write(yaml.dump(dict(logging=logset)))
    os.makedirs(os.path.dirname(DASK_TGT), exist_ok=True)
    shutil.copy(DASK_SRC, DASK_TGT)
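
The placeholder-expansion step in the function above can also be isolated and checked on its own. This is a sketch using re.sub with a callback; the helper name expand_placeholders and its parameters are hypothetical, and the "INFO" fallback mirrors the code above.

```python
import os
import re
from typing import Mapping, Optional

_PLACEHOLDER = re.compile(r"\$\{([^}]*)\}")  # matches ${NAME}, capturing NAME


def expand_placeholders(
    text: str,
    env: Optional[Mapping[str, str]] = None,
    default: str = "INFO",
) -> str:
    """Replace each ${NAME} with env[NAME], or with `default` if NAME is unset."""
    values = os.environ if env is None else env
    return _PLACEHOLDER.sub(lambda m: values.get(m.group(1), default), text)


sample = 'level: "${PREFECT_LOGGING_LEVEL}"\nserver: "${NOT_SET_ANYWHERE}"'
expanded = expand_placeholders(sample, env={"PREFECT_LOGGING_LEVEL": "DEBUG"})
print(expanded)
```

Passing the environment as a parameter keeps the step testable without touching os.environ.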

anna_geller · July 4, 2022, 1:11pm

Wow, this looks complicated, and it is definitely too much boilerplate if you have to do that in every flow. Let me open an issue.

