# mislabeled **Repository Path**: mirrors_Orange-OpenSource/mislabeled ## Basic Information - **Project Name**: mislabeled - **Description**: No description available - **Primary Language**: Unknown - **License**: MIT - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2024-08-21 - **Last Updated**: 2026-02-14 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # mislabeled > Model-probing mislabeled examples detection in machine learning datasets A `ModelProbingDetector` assigns `trust_scores` to training examples $(x, y)$ from a dataset by `probing` an `Ensemble` of machine learning `model`. ## Install ```console pip install git+https://github.com/orange-opensource/mislabeled ``` ## Find suspicious digits in MNIST ### 1. Train a MLP on MNIST ```python X, y = fetch_openml("mnist_784", return_X_y=True, as_frame=False) y = LabelEncoder().fit_transform(y) mlp = make_pipeline(MinMaxScaler(), MLPClassifier()) mlp.fit(X, y) ``` ### 2. Compute Representer values of the MLP ```python probe = Representer() representer_values = probe(mlp, X, y) ``` ### 3. Inspect your training data ```python supicious = np.argsort(-representer_values)[0:top_k] for i in suspicious: plt.imshow(X[i].reshape(28, 28)) ``` ### 4. Wanna get the variance of the Representer values during training ? ```python detector = ModelProbingDetector(mlp, Representer(), ProgressiveEnsemble(), "var") var_representer_values = detector.trust_scores(X, y) ``` ## Predefined detectors | Detector | Paper | Code (`from mislabeled.detect.detectors`) | | - | - | - | | Area Under the Margin (AUM) | [NeurIPS 2020](https://proceedings.neurips.cc/paper/2020/file/c6102b3727b2a7d8b1bb6981147081ef-Paper.pdf) | `import AreaUnderMargin` | | Influence | [Paper 1974](https://www.tandfonline.com/doi/abs/10.1080/01621459.1974.10482962) | `import SelfInfluenceDetector` | | Cook's Distance | [Paper 1977](https://www.jstor.org/stable/1268249) | `import CookDistanceDetector` | | Approximate Leave-One-Out | [Paper 1981](https://www.jstor.org/stable/2240841) | `import ApproximateLOODetector` | | Representer | [Paper 1972](https://www.jstor.org/stable/2240067) | `import RepresenterDetector` | | TracIn | [NeurIPS 2020](https://proceedings.neurips.cc/paper_files/paper/2020/file/e6385d39ec9394f2f3a354d9d2b88eec-Paper.pdf) | `import TracIn` | | Forget Scores | [ICLR 2019](https://openreview.net/pdf?id=BJlxm30cKm) | `import ForgetScores` | | VoG | [CVPR 2022](https://openaccess.thecvf.com/content/CVPR2022/papers/Agarwal_Estimating_Example_Difficulty_Using_Variance_of_Gradients_CVPR_2022_paper.pdf) | `import FiniteDiffVoG, FiniteDiffVoLG, VoLG`| | Small Loss | [ICML 2018](https://proceedings.mlr.press/v80/jiang18c/jiang18c.pdf) | `import SmallLoss`| | CleanLab | [JAIR 2021](https://www.jair.org/index.php/jair/article/view/12125/26676) | `import ConfidentLearning` | | Consensus (C-Scores) | [Applied Intelligence 2011](https://link.springer.com/article/10.1007/s10489-010-0225-4) | `import ConsensusConsistency`| | AGRA | [ECML 2023](https://dl.acm.org/doi/10.1007/978-3-031-43412-9_14) | `import AGRA` | and other limitless combinations by using `ModelProbingDetector` with any `probe` and `Ensembles` from the library. Most of these detectors work for both regression and classification diagnostics. ## Tutorials For more details and examples, check the [notebooks](https://github.com/orange-opensource/mislabeled/tree/master/examples) ! ## Paper If you use this library in a research project, please consider citing the corresponding paper with the following bibtex entry: @article{george2024mislabeled, title={Mislabeled examples detection viewed as probing machine learning models: concepts, survey and extensive benchmark}, author={Thomas George and Pierre Nodet and Alexis Bondu and Vincent Lemaire}, journal={Transactions on Machine Learning Research}, issn={2835-8856}, year={2024}, url={https://openreview.net/forum?id=3YlOr7BHkx}, note={} } ## Development Formatting and linting is done with ruff as a [pre-commit](https://pre-commit.com/): - install: ```pre-commit install```, - format and lint: ```pre-commit run --all-files``` (automatically done before a commit). Run tests with [uv](https://docs.astral.sh/uv/getting-started/installation/#standalone-installer): ```uv run pytest```.