# ml-sampleplan

**Repository Path**: mirrors_apple/ml-sampleplan

## Basic Information

- **Project Name**: ml-sampleplan
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-08-21
- **Last Updated**: 2026-03-28

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# sampleplan

This software project accompanies the research paper, [On Efficient and Statistical Quality Estimation for Data Annotation](https://arxiv.org/abs/2405.11919). Find a demo of the tool at: https://www.acceptancesampling.com/.

`sampleplan` is a tool to determine sample sizes required for data quality control. It supports sample size calculation for:

- Clopper-Pearson exact confidence intervals with and without mid-P
- Single Acceptance Sampling
- Double Acceptance Sampling
- Sequential Acceptance Sampling

for the binomial (sampling with replacement) and hypergeometric (sampling without replacement) distributions. 

> This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.

## Citation

```
@inproceedings{klie2024onEfficient,
	title        = {On Efficient and Statistical Quality Estimation for Data Annotation},
	author       = {Jan-Christoph Klie and Juan Haladjian and Marc Kirchner and Rahul Nair},
	year         = 2024,
	booktitle    = {The 62nd Annual Meeting of the Association for Computational Linguistics}
}
```

## Installation

This software is not yet published to PyPI. To install, run:

```bash
pip install git+https://github.com/apple/ml-sampleplan
```

## Usage

This section describes how to use the package to compute sample sizes for confidence intervals
and acceptance sampling plans that provide given statistical guarantees. Every method is implemented for both the binomial (sampling with replacement)
and hypergeometric (sampling without replacement) distributions.

The following parameters describe the statistical guarantees one wants from the tests:

- **alpha**: Probability of rejecting a lot that should have been accepted (producer's risk)
- **beta**: Probability of accepting a lot that should have been rejected (consumer's risk)
- **p0/p**: The assumed error rate. The closer it is to 0.5, the larger the required sample size typically is for both the binomial and hypergeometric distributions
- **half_width**: The width of the confidence interval in one direction
- **p_a**: Acceptable ratio of defective items in a lot
- **p_r**: Unacceptable ratio of defective items in a lot

### Confidence Intervals

#### Binomial

```python
from sampleplan.confidence_interval import sample_size_exact_binomial

p0 = 0.01
alpha = 0.05
ci_half_width = 0.01

n_binomial = sample_size_exact_binomial(p0, alpha, ci_half_width)
```
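
To make `alpha` and the half width concrete, the following is a minimal standard-library sketch of a two-sided Clopper-Pearson interval, computed by bisection on the binomial CDF. It is not the package's implementation; the helper names `binom_cdf`, `solve_increasing`, and `clopper_pearson` are hypothetical and exist only for this illustration, and the mid-P variant is not covered.

```python
from math import comb


def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def solve_increasing(g, target: float, tol: float = 1e-12) -> float:
    """Bisection on [0, 1] for g(p) = target, where g is increasing in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float) -> tuple[float, float]:
    """Two-sided exact interval for x errors observed among n inspected items."""
    # Lower bound: P(X >= x | p) = alpha / 2, and P(X >= x | p) increases with p.
    lower = 0.0 if x == 0 else solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2
    )
    # Upper bound: P(X <= x | p) = alpha / 2, i.e. 1 - P(X <= x | p) = 1 - alpha / 2.
    upper = 1.0 if x == n else solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2
    )
    return lower, upper


# Illustrative numbers only: 5 errors found among 500 inspected items.
lower, upper = clopper_pearson(x=5, n=500, alpha=0.05)
print(f"interval=({lower:.4f}, {upper:.4f}), half width={(upper - lower) / 2:.4f}")
```

A sample size calculation can be thought of as searching for the smallest `n` at which this half width drops below the requested `ci_half_width`; how the package performs that search is not shown here.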

#### Hypergeometric

```python
from sampleplan.confidence_interval import sample_size_exact_hypergeometric

lot_size = 1000
p0 = 0.01
alpha = 0.05
ci_half_width = 0.01

n_hypergeometric = sample_size_exact_hypergeometric(lot_size, p0, alpha, ci_half_width)
```

### Single Sampling

A single sample of size *n* is inspected. If it contains more annotation errors than the critical value *c*, the lot is
rejected; otherwise, it is accepted.

#### Binomial

```python
from sampleplan.acceptance_sampling import SingleSamplingPlan

alpha = 0.05
beta = 0.2
p_a = 0.01
p_r = 0.05

plan = SingleSamplingPlan.binomial(p_a, p_r, alpha, beta)
print(f"Single Sampling Plan binomial: n={plan.n}, c={plan.c}")
```
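
As a rough illustration of what such a plan guarantees, the sketch below brute-forces the smallest binomial plan (*n*, *c*) that accepts lots with error rate `p_a` with probability at least `1 - alpha` and accepts lots with error rate `p_r` with probability at most `beta`. It uses only the standard library; `binom_cdf` and `find_single_plan` are hypothetical helpers, and the package may use a different search strategy and return different values.

```python
from math import comb


def binom_cdf(c: int, n: int, p: float) -> float:
    """P(X <= c) for X ~ Binomial(n, p): the chance a lot with error rate p is accepted."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1))


def find_single_plan(p_a: float, p_r: float, alpha: float, beta: float, max_n: int = 2000):
    """Return the smallest n (and a matching c) meeting both risk constraints."""
    for n in range(1, max_n + 1):
        for c in range(n + 1):
            # Consumer's risk: probability of accepting a bad lot must stay <= beta.
            # It only grows with c, so larger c cannot help once it is violated.
            if binom_cdf(c, n, p_r) > beta:
                break
            # Producer's risk: a good lot must be accepted with probability >= 1 - alpha.
            if binom_cdf(c, n, p_a) >= 1 - alpha:
                return n, c
    raise ValueError("no plan found up to max_n")


n, c = find_single_plan(p_a=0.01, p_r=0.05, alpha=0.05, beta=0.2)
print(f"n={n}, c={c}")
print(f"P(accept | p_a)={binom_cdf(c, n, 0.01):.3f}, P(accept | p_r)={binom_cdf(c, n, 0.05):.3f}")
```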

#### Hypergeometric

```python
from sampleplan.acceptance_sampling import SingleSamplingPlan

alpha = 0.05
beta = 0.2
p_a = 0.01
p_r = 0.05
lot_size = 1000

plan = SingleSamplingPlan.hypergeometric(p_a, p_r, alpha, beta, lot_size)
print(f"Single Sampling Plan hypergeometric: n={plan.n}, c={plan.c}")
```
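
The difference between the two distributions can be seen by evaluating the acceptance probability of one fixed plan under both. The sketch below is a standard-library illustration only; the function names and the plan values (`n=100`, `c=2`) are hypothetical and are not output of the package.

```python
from math import comb


def hypergeom_accept_prob(lot_size: int, defects_in_lot: int, n: int, c: int) -> float:
    """P(at most c defects in a sample of n items drawn without replacement)."""
    favourable = sum(
        # math.comb returns 0 when k exceeds the number of defects in the lot.
        comb(defects_in_lot, k) * comb(lot_size - defects_in_lot, n - k)
        for k in range(min(c, n) + 1)
    )
    return favourable / comb(lot_size, n)


def binom_accept_prob(p: float, n: int, c: int) -> float:
    """P(at most c defects in n draws with replacement at error rate p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1))


lot_size, defects_in_lot = 1000, 50  # a lot at the unacceptable rate p_r = 0.05
n, c = 100, 2                        # illustrative plan, not computed by the package

print(f"hypergeometric: {hypergeom_accept_prob(lot_size, defects_in_lot, n, c):.4f}")
print(f"binomial:       {binom_accept_prob(defects_in_lot / lot_size, n, c):.4f}")
```

When the sample is a sizeable fraction of the lot, as here, the two probabilities differ noticeably; for lots much larger than the sample they converge.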

### Double Sampling

Instead of taking a single sample, double sampling accepts or rejects batches based on two (usually smaller) samples.
First, a sample of size *n_1* is taken and inspected.
If it contains fewer defects than a lower limit *c_1*, the batch is accepted; if it contains more defects than an upper limit *c_2*, it is rejected.
If the number of defects lies between the two limits, a second sample is taken; the batch is rejected if the number of defects in both samples combined is larger than *c_2*.
The advantage is that in the best case, only *n_1* items need to be inspected, thereby saving time and money.
To keep the computation tractable, we only use double-stage plans with *n_1 = n_2*.
We implement two versions of double sampling: **full**, where samples are always completely inspected, and **curtailed**, where inspection of the second sample stops as soon as more than *c_2* defects are found.
It is recommended to always inspect at least the first *n_1* items in order to get an estimate of the error rate; we follow this textbook advice.
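
The decision procedure above can be turned into an acceptance probability and an average sample number for the full (non-curtailed) binomial case. This is a minimal sketch under stated assumptions: it uses the common textbook convention that the first sample leads to acceptance when it has at most *c_1* defects, which may differ from how the package treats the boundary; the function names and the plan values are hypothetical.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(c: int, n: int, p: float) -> float:
    """P(X <= c) for X ~ Binomial(n, p); zero when c is negative."""
    return sum(binom_pmf(k, n, p) for k in range(c + 1))


def double_accept_prob(n: int, c1: int, c2: int, p: float) -> float:
    """Acceptance probability of a double plan with n_1 = n_2 = n."""
    # Accept immediately when the first sample has at most c1 defects (assumed convention).
    first_stage = binom_cdf(c1, n, p)
    # Otherwise, for c1 < d1 <= c2, accept if both samples together have at most c2 defects.
    second_stage = sum(
        binom_pmf(d1, n, p) * binom_cdf(c2 - d1, n, p) for d1 in range(c1 + 1, c2 + 1)
    )
    return first_stage + second_stage


def average_sample_number(n: int, c1: int, c2: int, p: float) -> float:
    """Expected items inspected under full inspection of every sample taken."""
    prob_second_sample = binom_cdf(c2, n, p) - binom_cdf(c1, n, p)
    return n + n * prob_second_sample


n, c1, c2 = 50, 0, 3  # illustrative plan, not computed by the package
for p in (0.01, 0.05):
    print(f"p={p}: P(accept)={double_accept_prob(n, c1, c2, p):.3f}, "
          f"ASN={average_sample_number(n, c1, c2, p):.1f}")
```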


#### Binomial

```python
from sampleplan.acceptance_sampling import DoubleSamplingPlan

alpha = 0.05
beta = 0.2
p_a = 0.01
p_r = 0.05

plan = DoubleSamplingPlan.binomial(p_a, p_r, alpha, beta)
print(f"Double Sampling Plan binomial: n1=n2=n={plan.n}, c1={plan.c1}, c2={plan.c2}")
```

#### Hypergeometric

```python
from sampleplan.acceptance_sampling import DoubleSamplingPlan

alpha = 0.05
beta = 0.2
p_a = 0.01
p_r = 0.05
lot_size = 1000

plan = DoubleSamplingPlan.hypergeometric(p_a, p_r, alpha, beta, lot_size)
print(f"Double Sampling Plan hypergeometric: n1=n2=n={plan.n}, c1={plan.c1}, c2={plan.c2}")
```

### Sequential Sampling

A generalization of double sampling is sequential sampling.
It is based on the sequential probability ratio test by Wald.
In this setting, instances in a batch are inspected one by one, and after each step it is decided whether to continue, or to stop and accept or reject.
The acceptance and rejection boundaries are computed at every step from *p_a*, *p_r*, *alpha*, and *beta*, as well as the number of incorrect and total instances inspected so far.
It can happen that the whole batch needs to be inspected, especially if the actual error rate is between *p_a* and *p_r*.
As this is an undesirable outcome, we truncate at the sample size of single sampling and accept or reject based on its critical value.
Note that this is an approximation; computing an optimal curtailment in general is quite difficult.
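
For the binomial case, the boundaries of Wald's sequential probability ratio test are two parallel lines in the plane of (items inspected, defects found). The sketch below computes them with the textbook formulas and applies the resulting decision rule without any truncation. It is not the package's implementation; `wald_lines` and `decide` are hypothetical names used only for this illustration.

```python
from math import log


def wald_lines(p_a: float, p_r: float, alpha: float, beta: float):
    """Return (h_accept, h_reject, slope) of the untruncated binomial SPRT."""
    k = log((p_r * (1 - p_a)) / (p_a * (1 - p_r)))
    h_accept = log((1 - alpha) / beta) / k
    h_reject = log((1 - beta) / alpha) / k
    slope = log((1 - p_a) / (1 - p_r)) / k
    return h_accept, h_reject, slope


def decide(n_inspected: int, n_defects: int, p_a: float, p_r: float,
           alpha: float, beta: float) -> str:
    """Decision after n_inspected items with n_defects errors found so far."""
    h_accept, h_reject, slope = wald_lines(p_a, p_r, alpha, beta)
    # Acceptance line: defects <= -h_accept + slope * n
    if n_defects <= -h_accept + slope * n_inspected:
        return "accept"
    # Rejection line: defects >= h_reject + slope * n
    if n_defects >= h_reject + slope * n_inspected:
        return "reject"
    return "continue"


p_a, p_r, alpha, beta = 0.01, 0.05, 0.05, 0.2
for n_inspected, n_defects in [(10, 0), (10, 2), (40, 0), (40, 1)]:
    outcome = decide(n_inspected, n_defects, p_a, p_r, alpha, beta)
    print(f"inspected={n_inspected}, defects={n_defects}: {outcome}")
```

The slope of both lines always lies between *p_a* and *p_r*, which is why a batch whose true error rate falls in that range can stay in the "continue" region for a long time.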

#### Binomial

```python
from sampleplan.acceptance_sampling import BinomialSequentialSamplingPlan, SingleSamplingPlan

alpha = 0.05
beta = 0.2
p_a = 0.01
p_r = 0.05

p = 0.07

plan = BinomialSequentialSamplingPlan(p_a, p_r, alpha, beta)
asn = plan.average_sample_number(p)

# For curtailing, we stop if we did not make a decision when reaching 3x the Single Sampling Plan sample size
# This is recommended by Montgomery, D. (2005), Introduction to Statistical Quality Control 
# as binomial sequential sampling has no natural stopping point due to sampling with replacement.
ssp = SingleSamplingPlan.binomial(p_a, p_r, alpha, beta)
asn_curtailed = plan.average_sample_with_cutoff(p, ssp.n * 3)
```

#### Hypergeometric

```python
from sampleplan.acceptance_sampling import HypergeometricSequentialSamplingPlan, SingleSamplingPlan

alpha = 0.05
beta = 0.2
p_a = 0.01
p_r = 0.05
lot_size = 1000
defects_in_lot = 4  # For simulation purposes; in practice this is only known after inspecting the whole lot

plan = HypergeometricSequentialSamplingPlan(p_a, p_r, alpha, beta, lot_size)
borders = plan.compute_truncated_wald_region()
asn = plan.average_sample_number(defects_in_lot)

# For curtailing, we stop if we did not make a decision when reaching the Single Sampling Plan sample size 
ssp = SingleSamplingPlan.hypergeometric(p_a, p_r, alpha, beta, lot_size)
borders_curtailed = plan.compute_truncated_wald_region(cutoff=ssp.n)

for i in range(borders_curtailed.num_trials):
    print(f"Items inspected: \t{i}\tAccept if num_errors < {borders_curtailed.lower_limits[i]}\tReject if num_errors > {borders_curtailed.upper_limits[i]}")
```

## Development

This project uses `poetry` to manage the build process as well as dependencies.

You can format the code via

    make format

which should be run before every commit.

You can run the tests via

    make test