# probabilistic-pca

**Repository Path**: LiangShuo1997/probabilistic-pca

## Basic Information

- **Project Name**: probabilistic-pca
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-12-03
- **Last Updated**: 2024-12-05

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Probabilistic PCA in Python

This repository contains a simple implementation of probabilistic PCA as
introduced in [1].

    [1] Michael E. Tipping and Christopher M. Bishop.
        Probabilistic Principal Component Analysis
        Journal of the Royal Statistical Society. Series B (Statistical Methodology)
        Vol. 61, No. 3 (1999), pp. 611-622.

Also consider citing this master thesis for which this version of probabilistic
PCA was implemented:

    @misc{Stutz2017,
        author = {David Stutz},
        title = {Learning Shape Completion from Bounding Boxes with CAD Shape Priors},
        month = {September},
        year = {2017},
        institution = {RWTH Aachen University},
        address = {Aachen, Germany},
        howpublished = {http://davidstutz.de/},
    }

For theoretical background, consider reading [1], or see the discussion in
Section B.1 of the [master thesis](http://davidstutz.de/projects/shape-completion/#).

## Requirements

Python packages:

* NumPy
* SciPy (specifically `scipy.sparse.linalg.svds`)
* HDF5, i.e. h5py

For visualization:

* Matplotlib

## Usage

To compute a probabilistic PCA, use `ppca_train.py`:

    usage: ppca_train.py [-h] [--input INPUT] [--code CODE]
                         [--approximate_k APPROXIMATE_K] [--mean_file MEAN_FILE]
                         [--V_file V_FILE] [--var_file VAR_FILE]
    
    optional arguments:
      -h, --help            show this help message and exit
      --input INPUT         path input HDF5 file
      --code CODE           size of latent space
      --approximate_k APPROXIMATE_K
                            approximate the variance using approximate_k singular
                            values
      --mean_file MEAN_FILE
                            path to HDF5 mean file
      --V_file V_FILE       path to HDF5 matrix file
      --var_file VAR_FILE   path to HDF5 variance file

The main parameter is the input, which has to be a HDF5 file where the first
dimension is the number of samples, the remaining dimensions do not matter
as they are reshaped. Then, `--code` determines the number of principal
components to use.

As probabilistic PCA requires to compute the variance (see [1]) for which
_all_ eigenvalues are required, the computation can become infeasible for high
dimensionality. Therefore, the variance can be approximated using the first `k`
eigenvalues instead which can be set using `--approximate_k` and should be significantly
larger than `--code` but can also be smaller than the total dimensionality.

The output is stored separately in `--mean_file`, `V_file` and `var_file` --
all HDF5 files.

Using `ppca_test.py`, the computed probabilistic PCA can be tested; for example
on a test or validation set:

    usage: ppca_test.py [-h] [--input INPUT] [--mean_file MEAN_FILE]
                        [--V_file V_FILE] [--var_file VAR_FILE] [--output OUTPUT]
    
    optional arguments:
      -h, --help            show this help message and exit
      --input INPUT         path input HDF5 file
      --mean_file MEAN_FILE
                            path to HDF5 mean file
      --V_file V_FILE       path to HDF5 matrix file
      --var_file VAR_FILE   path to HDF5 variance file
      --output OUTPUT       path to output HDF5 file

Here, the input is a HDF5 file containing the test/validation data.

## Example

As example, we provide a simple dataset of rotated and slightly translated
binary rectangles in `32 x 32` resolution. Probabilistic PCA can be applied
as follows:

    python ppca_train.py --input=/BS/dstutz/work/data/2d/outputs_training_prior_moderate.h5 --code=10

In order to test the decomposition:

    python ppca_test.py --input=/BS/dstutz/work/data/2d/outputs_validation_moderate.h5 --output=predictions.h5

The results can be viewed using:

    python view_hdf5.py --predictions=predictions.h5 --target=/BS/dstutz/work/data/2d/outputs_validation_moderate.h5

## License

License for source code corresponding to:

D. Stutz. **Learning Shape Completion from Bounding Boxes with CAD Shape Priors.** Master Thesis, RWTH Aachen University, 2017.

Copyright (c) 2018 David Stutz, Max-Planck-Gesellschaft

**Please read carefully the following terms and conditions and any accompanying documentation before you download and/or use this software and associated documentation files (the "Software").**

The authors hereby grant you a non-exclusive, non-transferable, free of charge right to copy, modify, merge, publish, distribute, and sublicense the Software for the sole purpose of performing non-commercial scientific research, non-commercial education, or non-commercial artistic projects.

Any other use, in particular any use for commercial purposes, is prohibited. This includes, without limitation, incorporation in a commercial product, use in a commercial service, or production of other artefacts for commercial purposes.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

You understand and agree that the authors are under no obligation to provide either maintenance services, update services, notices of latent defects, or corrections of defects with regard to the Software. The authors nevertheless reserve the right to update, modify, or discontinue the Software at any time.

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. You agree to cite the corresponding papers (see above) in documents and papers that report on research using the Software.