1 Star 0 Fork 3

morre/label-studio-ml-backend

加入 Gitee
与超过 1200万 开发者一起发现、参与优秀开源项目,私有仓库也完全免费 :)
免费加入
克隆/下载
贡献代码
同步代码
取消
提示: 由于 Git 不支持空文件夾,创建文件夹后会生成空的 .keep 文件
Loading...
README
Apache-2.0

What is the Label Studio ML backend?

The Label Studio ML backend is an SDK that lets you wrap your machine learning code and turn it into a web server. The web server can be connected to a running Label Studio instance to automate labeling tasks.

If you just need to load static pre-annotated data into Label Studio, running an ML backend might be overkill for you. Instead, you can import pre-annotated data.

Quickstart

To start using the models, use docker-compose to run the ML backend server.

Use the following command to start serving the ML backend at http://localhost:9090:

git clone https://github.com/HumanSignal/label-studio-ml-backend.git
cd label-studio-ml-backend/label_studio_ml/examples/{MODEL_NAME}
docker-compose up

Replace {MODEL_NAME} with the name of the model you want to use (see below).

Allow the ML backend to access Label Studio data

In most cases, you will need to set LABEL_STUDIO_URL and LABEL_STUDIO_API_KEY environment variables to allow the ML backend access to the media data in Label Studio. Read more in the documentation.

Warning: Currently, ML backends support only Legacy Tokens and do not support Personal Tokens. You will encounter an Unauthorized Error if you use Personal Tokens.

Models

The following models are supported in the repository. Some of them work without any additional setup, and some of them require additional parameters to be set.

Check the Required parameters column to see if you need to set any additional parameters.

  • Pre-annotation column indicates if the model can be used for pre-annotation in Label Studio:
    you can see pre-annotated data when opening the labeling page or after running predictions for a batch of data.
  • Interactive mode column indicates if the model can be used for interactive labeling in Label Studio: see interactive predictions when performing actions on labeling page.
  • Training column indicates if the model can be used for training in Label Studio: update the model state based the submitted annotations.
MODEL_NAME Description Pre-annotation Interactive mode Training Required parameters Arbitrary or Set Labels?
bert_classifier Text classification with Huggingface None Arbitrary
easyocr Automated OCR. EasyOCR None Set (characters)
flair NER by flair None Arbitrary
gliner NER by GLiNER None Arbitrary
grounding_dino Object detection with prompts. Details None Arbitrary
grounding_sam Object Detection with Prompts and SAM 2 None Arbitrary
huggingface_llm LLM inference with Hugging Face None Arbitrary
huggingface_ner NER by Hugging Face None Arbitrary
interactive_substring_matching Simple keywords search None Arbitrary
langchain_search_agent RAG pipeline with Google Search and Langchain OPENAI_API_KEY, GOOGLE_CSE_ID, GOOGLE_API_KEY Arbitrary
llm_interactive Prompt engineering with OpenAI, Azure LLMs. OPENAI_API_KEY Arbitrary
mmdetection Object Detection with OpenMMLab None Arbitrary
nemo_asr Speech ASR by NVIDIA NeMo None Set (vocabulary and characters)
segment_anything_2_image Image segmentation with SAM 2 None Arbitrary
segment_anything_model Image segmentation by Meta None Arbitrary
sklearn_text_classifier Text classification with scikit-learn None Arbitrary
spacy NER by SpaCy None Set (see documentation)
tesseract Interactive OCR. Details None Set (characters)
watsonX LLM inference with WatsonX and integration with WatsonX.data None Arbitrary
yolo All YOLO tasks are supported: YOLO None Arbitrary

(Advanced usage) Develop your model

To start developing your own ML backend, follow the instructions below.

1. Installation

Download and install label-studio-ml from the repository:

git clone https://github.com/HumanSignal/label-studio-ml-backend.git
cd label-studio-ml-backend/
pip install -e .

2. Create empty ML backend:

label-studio-ml create my_ml_backend

You can go to the my_ml_backend directory and modify the code to implement your own inference logic.

The directory structure should look like this:

my_ml_backend/
├── Dockerfile
├── docker-compose.yml
├── model.py
├── _wsgi.py
├── README.md
└── requirements.txt

Dockefile and docker-compose.yml are used to run the ML backend with Docker. model.py is the main file where you can implement your own training and inference logic. _wsgi.py is a helper file that is used to run the ML backend with Docker (you don't need to modify it). README.md is a readme file with instructions on how to run the ML backend. requirements.txt is a file with Python dependencies.

3. Implement prediction logic

In your model directory, locate the model.py file (for example, my_ml_backend/model.py).

The model.py file contains a class declaration inherited from LabelStudioMLBase. This class provides wrappers for the API methods that are used by Label Studio to communicate with the ML backend. You can override the methods to implement your own logic:

def predict(self, tasks, context, **kwargs):
    """Make predictions for the tasks."""
    return predictions

The predict method is used to make predictions for the tasks. It uses the following:

Once you implement the predict method, you can see predictions from the connected ML backend in Label Studio.

4. Implement training logic (optional)

You can also implement the fit method to train your model. The fit method is typically used to train the model on the labeled data, although it can be used for any arbitrary operations that require data persistence (for example, storing labeled data in a database, saving model weights, keeping LLM prompts history, etc).

By default, the fit method is called at any data action in Label Studio, like creating a new task or updating annotations. You can modify this behavior from the project settings under Webhooks.

To implement the fit method, you need to override the fit method in your model.py file:

def fit(self, event, data, **kwargs):
    """Train the model on the labeled data."""
    old_model = self.get('old_model')
    # write your logic to update the model
    self.set('new_model', new_model)

with

  • event: event type can be 'ANNOTATION_CREATED', 'ANNOTATION_UPDATED', etc.
  • data the payload received from the event (check more on Webhook event reference)

Additionally, there are two helper methods that you can use to store and retrieve data from the ML backend:

  • self.set(key, value) - store data in the ML backend
  • self.get(key) - retrieve data from the ML backend

Both methods can be used elsewhere in the ML backend code, for example, in the predict method to get the new model weights.

Other methods and parameters

Other methods and parameters are available within the LabelStudioMLBase class:

  • self.label_config - returns the Label Studio labeling config as XML string.
  • self.parsed_label_config - returns the Label Studio labeling config as JSON.
  • self.model_version - returns the current model version.
  • self.get_local_path(url, task_id) - this helper function is used to download and cache an url that is typically stored in task['data'], and to return the local path to it. The URL can be: LS uploaded file, LS Local Storage, LS Cloud Storage or any other http(s) URL.

Run without Docker

To run without Docker (for example, for debugging purposes), you can use the following command:

label-studio-ml start my_ml_backend

Test your ML backend

Modify the my_ml_backend/test_api.py to ensure that your ML backend works as expected.

Modify the port

To modify the port, use the -p parameter:

label-studio-ml start my_ml_backend -p 9091

Deploy your ML backend to GCP

Before you start:

  1. Install gcloud.
  2. Initialize billing for your account if it's not activated.
  3. Initialize gcloud, enter the following commands and login with your browser:
gcloud auth login
  1. Activate your Cloud Build API.
  2. Find your GCP project ID.
  3. (Optional) Add GCP_REGION with your default region to your ENV variables.

To start deployment:

  1. Create your own ML backend
  2. Start deployment to GCP:
label-studio-ml deploy gcp {ml-backend-local-dir} \
--from={model-python-script} \
--gcp-project-id {gcp-project-id} \
--label-studio-host {https://app.heartex.com} \
--label-studio-api-key {YOUR-LABEL-STUDIO-API-KEY}
  1. After Label Studio deploys the model, you can find the model endpoint in the console.

Troubleshooting

Troubleshooting Docker Build on Windows

If you encounter an error similar to the following when running docker-compose up --build on Windows:

exec /app/start.sh : No such file or directory
exited with code 1

This issue is likely caused by Windows' handling of line endings in text files, which can affect scripts like start.sh. To resolve this issue, follow the steps below:

Step 1: Adjust Git Configuration

Before cloning the repository, ensure your Git is configured to not automatically convert line endings to Windows-style (CRLF) when checking out files. This can be achieved by setting core.autocrlf to false. Open Git Bash or your preferred terminal and execute the following command:

git config --global core.autocrlf false

Step 2: Clone the Repository Again

If you have already cloned the repository before adjusting your Git configuration, you'll need to clone it again to ensure that the line endings are preserved correctly:

  1. Delete the existing local repository. Ensure you have backed up any changes or work in progress.
  2. Clone the repository again. Use the standard Git clone command to clone the repository to your local machine.

Step 3: Build and Run the Docker Containers

Navigate to the appropriate directory within your cloned repository that contains the Dockerfile and docker-compose.yml. Then, proceed with the Docker commands:

  1. Build the Docker containers: Run docker-compose build to build the Docker containers based on the configuration specified in docker-compose.yml.

  2. Start the Docker containers: Once the build process is complete, start the containers using docker-compose up.

Additional Notes

  • This solution specifically addresses issues encountered on Windows due to the automatic conversion of line endings. If you're using another operating system, this solution may not apply.
  • Remember to check your project's .gitattributes file, if it exists, as it can also influence how Git handles line endings in your files.

By following these steps, you should be able to resolve issues related to Docker not recognizing the start.sh script on Windows due to line ending conversions.

Troubleshooting Pip Cache Reset in Docker Images

Sometimes, you want to reset the pip cache to ensure that the latest versions of the dependencies are installed. For example, Label Studio ML Backend library is used as label-studio-ml @ git+https://github.com/HumanSignal/label-studio-ml-backend.git in requirements.txt. Let's assume that it is updated, and you want to jump on the latest version in your docker image with the ML model.

You can rebuild a docker image from scratch with the following command:

docker compose build --no-cache

Troubleshooting Bad Gateway and Service Unavailable errors

You might see these errors if you send multiple concurrent requests.

Note that the provided ML backend examples are offered in development mode, and do not support production-level inference serving.

Troubleshooting the ML backend failing to make simple auto-annotations or unable to see predictions

You must ensure that the ML backend can access your Label Studio data. If it can't, you might encounter the following issues:

  • no such file or directory errors in the server logs.
  • You are unable to see predictions when loading tasks in Label Studio.
  • Your ML backend appears to be connected properly, but cannot seem to complete any auto annotations within tasks.

To remedy this, ensure you have set the LABEL_STUDIO_URL and LABEL_STUDIO_API_KEY environment variables. For more information, see Allow the ML backend to access Label Studio data.

Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 1. Definitions. "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document. "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License. "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity. "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License. "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files. "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types. "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below). "Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof. "Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution." "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work. 2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form. 3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed. 4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions: (a) You must give any other recipients of the Work or Derivative Works a copy of this License; and (b) You must cause any modified files to carry prominent notices stating that You changed the files; and (c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and (d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License. You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License. 5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions. 6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file. 7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License. 8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages. 9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability. END OF TERMS AND CONDITIONS APPENDIX: How to apply the Apache License to your work. To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets "{}" replaced with your own identifying information. (Don't include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same "printed page" as the copyright notice for easier identification within third-party archives. Copyright 2019 Heartex, Inc Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

简介

The Personal mirror of https://github.com/HumanSignal/label-studio-ml-backend.git 展开 收起
README
Apache-2.0
取消

发行版

暂无发行版

贡献者

全部

近期动态

不能加载更多了
马建仓 AI 助手
尝试更多
代码解读
代码找茬
代码优化
1
https://gitee.com/morre/label-studio-ml-backend.git
git@gitee.com:morre/label-studio-ml-backend.git
morre
label-studio-ml-backend
label-studio-ml-backend
master

搜索帮助