View in #show-us-what-you-got on Slack
@Evgeniya_Sukhodolskaya: Hi, Prefect community!
I am a data evangelist from toloka.ai, a crowdsourcing data labeling platform. I want to share our work on an integration with Prefect, which aims to help Big Data and Machine Learning engineers painlessly create data gathering & cleaning pipelines.
Our engineering team created toloka-prefect, a Python package for orchestrating crowdsourcing pipelines in Prefect. With this integration and Prefect's failure management capabilities, if you need to collect large and varied amounts of data or validate an existing dataset, you can do it without the headache of losing control over the crowd.
Let me continue in thread:)
P.S. A question from my side: are there examples of Prefect being used to build Machine Learning pipelines?
Toloka: Data solutions to drive AI
In Toloka, each labeling pipeline may consist of several projects created by requesters in which tasks of a particular nature are solved with the help of a diverse crowd from all over the world.
Because the barrier to entry is low and the requester pays for the labeling of each task, any failure in the pipeline means lost money. That is why Prefect features such as caching and persisting data became key to a vast improvement & budget preservation!
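To make the budget argument concrete, here is a minimal plain-Python sketch of the idea behind caching and persisting results. It is not Prefect's API and not code from toloka-prefect (Prefect provides its own caching and result persistence). The names `cached_step` and `request_labels` are hypothetical, and the "paid" call is simulated.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def cached_step(cache_dir, name, inputs, compute):
    """Run compute(inputs) once per distinct input and persist the result as JSON."""
    # The cache key is a hash of the step inputs, so identical inputs reuse the stored result.
    key = hashlib.sha256(json.dumps(inputs, sort_keys=True).encode()).hexdigest()
    path = Path(cache_dir) / f"{name}-{key}.json"
    if path.exists():
        return json.loads(path.read_text())
    result = compute(inputs)
    path.write_text(json.dumps(result))
    return result


paid_calls = []  # records every simulated paid request


def request_labels(inputs):
    # Hypothetical stand-in for a paid crowdsourcing request.
    paid_calls.append(inputs["texts"])
    return {text: "positive" for text in inputs["texts"]}


cache_dir = tempfile.mkdtemp()
batch = {"texts": ["great product", "works well"]}

first = cached_step(cache_dir, "labels", batch, request_labels)
# Simulate rerunning the pipeline after a later step failed:
second = cached_step(cache_dir, "labels", batch, request_labels)
```

On the rerun, the labels are read back from disk and the simulated paid request is not repeated, which is the property that protects the budget.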
We gave a talk, "Launching human-in-the-loop process on Toloka using Prefect", based on a popular example of a data-labeling task, and want to share it with you.
We are super happy to be part of the Prefect community and are looking forward to deepening our collaboration:)
If you have any questions or feedback regarding the integration, I will be happy to comment on them in the thread here.
If you want to share your pain points, ideas, and proposals with our engineering team directly, you’re welcome to join our Toloka Global Community.
YouTube Video: Data-Driven AI meetup: Launching Human-in-the-loop Processes on Toloka using Prefect
@Anna_Geller: Hi @Evgeniya_Sukhodolskaya, welcome to the community, great to have you with us!
Thank you so much for contributing and for this excellent notebook explaining how to use this integration with Prefect Cloud!
I will cross-post it on Discourse and I’ll make sure to recommend it to any users asking about data labeling use cases for ML.
To answer your question: Prefect is a general-purpose workflow orchestration platform that supports basically all data-flow automation use cases you can think of, definitely including ML pipelines!
Thanks again for sharing and have a wonderful weekend!
GitHub: toloka-prefect/text_classification.ipynb at main · Toloka/toloka-prefect