
Label Studio + Galileo: Fix your ML data quality 10x faster

Vikram Chatterji, Co-founder
2 min read, March 26, 2023

With machine learning increasingly woven into everyday applications, data science teams have to sweat the details like never before to ensure high-performing models.


The ‘details’ mostly involve ensuring high-quality training data, the inputs into the model, so that the model's outputs can be high quality too. 80% of a data scientist’s work involves painstaking, time-consuming data-related tasks: data selection, data labeling, labeled data evaluation, data inspection during training and more.

[source: past, present and future of ML Data]

To help speed up the ML workflow, it becomes critical to choose the right data-centric tools.

This comes down to two things:

  1. Data labeling
  2. Data quality inspection


For data labeling, Heartex (creator of Label Studio) provides the only end-to-end solution for managing internal data labeling projects. Label Studio is a widely used and well-loved open source labeling platform that in-house subject matter experts, labelers and data science teams use to quickly label text, images, video, audio and more.


For data inspection, Galileo automatically identifies low-quality data across the ML workflow. This leads to models moving to production 10x faster, labeling costs reduced by over 40% and model performance improved by over 20%. Getting visibility and control over your data makes all the difference.

Used together, Label Studio and Galileo can therefore bring a step-function change: building ML models considerably faster, better and cheaper than before.

  1. Quickly identifying and fixing data and label errors

One important type of error commonly found in ML datasets is the annotation mistake, or "mislabel". Galileo integrates with Label Studio so that users can automatically detect annotation mistakes in their data and send them directly to Label Studio with a single button click.
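To make the idea concrete, here is a minimal sketch of flagging likely mislabels and packaging them for relabeling. It is not Galileo's detection algorithm, which this post does not describe; it assumes a simple confidence heuristic (the model confidently disagrees with the given label), and the function names, the `threshold` value and the sample data are all hypothetical. The only Label Studio detail it relies on is that imported tasks are JSON objects with a `data` field.

```python
import json

def flag_likely_mislabels(samples, threshold=0.2):
    """Flag samples where the model disagrees with the given label and
    assigns that label a probability below `threshold` (assumed heuristic)."""
    flagged = []
    for s in samples:
        p_given = s["probs"].get(s["label"], 0.0)
        predicted = max(s["probs"], key=s["probs"].get)
        if predicted != s["label"] and p_given < threshold:
            flagged.append({**s, "suggested": predicted})
    return flagged

def to_label_studio_tasks(flagged):
    """Wrap each flagged sample in the {"data": {...}} task shape.
    The keys inside "data" are illustrative, not a fixed schema."""
    return [
        {
            "data": {
                "text": s["text"],
                "current_label": s["label"],
                "model_suggestion": s["suggested"],
            }
        }
        for s in flagged
    ]

# Hypothetical labeled samples with per-class model probabilities.
samples = [
    {"text": "Great battery life", "label": "negative",
     "probs": {"positive": 0.93, "negative": 0.07}},
    {"text": "Screen cracked in a week", "label": "negative",
     "probs": {"positive": 0.10, "negative": 0.90}},
    {"text": "It is okay I guess", "label": "positive",
     "probs": {"positive": 0.45, "negative": 0.55}},
]

tasks = to_label_studio_tasks(flag_likely_mislabels(samples))
print(json.dumps(tasks, indent=2))
```

Only the first sample is flagged here: the third also disagrees with the model, but not confidently enough to clear the threshold, which keeps borderline cases out of the relabeling queue.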


  2. Monitoring and identifying the right production data to train with next

Galileo also uses advanced algorithms to detect semantically new clusters of ML data that the model is receiving from production, also known as "drifted data". Its "Send to Labeler" feature lets a user, in a single button click, identify unlabeled data belonging to a new topic or cluster, create a Label Studio project from within the platform, and send the data over to Label Studio along with instructions for the labeler.
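As a rough illustration of what "semantically new" can mean, the sketch below flags production items whose embeddings lie far from every training-class centroid and bundles them with labeler instructions. This is an assumed nearest-centroid heuristic, not Galileo's algorithm; the `radius` value, the two-dimensional embeddings and the `batch` payload keys are hypothetical stand-ins.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def find_drifted(train_by_label, production, radius):
    """Return production items farther than `radius` from every
    training-class centroid (assumed drift heuristic)."""
    centroids = [centroid(vs) for vs in train_by_label.values()]
    drifted = []
    for item in production:
        nearest = min(math.dist(item["embedding"], c) for c in centroids)
        if nearest > radius:
            drifted.append(item)
    return drifted

# Hypothetical training embeddings grouped by label.
train_by_label = {
    "billing": [[0.0, 0.0], [0.2, 0.0]],
    "shipping": [[5.0, 5.0], [5.0, 5.2]],
}

# Hypothetical unlabeled production traffic.
production = [
    {"text": "Why was I charged twice?", "embedding": [0.1, 0.1]},
    {"text": "Where is my parcel?", "embedding": [4.9, 5.0]},
    {"text": "How do I delete my account?", "embedding": [10.0, -4.0]},
]

drifted = find_drifted(train_by_label, production, radius=1.0)

# Illustrative payload: tasks in the {"data": {...}} shape plus
# free-text instructions; the top-level keys are not a real API schema.
batch = {
    "instructions": "These items may belong to a new topic. Assign a label or propose a new one.",
    "tasks": [{"data": {"text": item["text"]}} for item in drifted],
}
print(batch)
```

A fixed radius is the simplest possible choice; in practice the cutoff would more likely be derived from the spread of the training data itself.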

Benefits

  1. Accelerate Labeling Workflows: use the single-click integration to generate correctly labeled, clean ML datasets in hours rather than weeks.
  2. Automate Quality Checks: upload your labeled dataset into Galileo Auto to check your human annotations.
  3. Continuously Improve Model Performance: debug models and identify and fix labeling errors in one workflow to increase model accuracy.


Get Started today

Galileo: reach out to us here and we will be happy to walk you through a demo! Feel free to explore the Galileo documentation and how-to videos here.

Heartex: get started for free here

