# misvm **Repository Path**: frontxiang/misvm ## Basic Information - **Project Name**: misvm - **Description**: Multiple-Instance Support Vector Machines - **Primary Language**: Python - **License**: BSD-3-Clause - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2020-06-07 - **Last Updated**: 2020-12-19 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README MISVM: Multiple-Instance Support Vector Machines ================================================ by Gary Doran () Overview -------- MISVM contains a Python implementation of numerous support vector machine (SVM) algorithms for the multiple-instance (MI) learning framework. The implementations were created for use in the following publication: > Doran, Gary and Soumya Ray. **A theoretical and empirical analysis of support > vector machine methods for multiple-instance classification.** _To appear in > Machine Learning Journal._ 2013. Installation ------------ This package can be installed in two ways (the easy way): # If needed: # pip install numpy # pip install scipy # pip install cvxopt pip install -e git+https://github.com/garydoranjr/misvm.git#egg=misvm or by running the setup file manually git clone [the url for misvm] cd misvm python setup.py install Note the code depends on the `numpy`, `scipy`, and `cvxopt` packages. So have those installed first. The build will likely fail if it can't find them. For more information, see: + [NumPy](http://www.numpy.org/): Library for efficient matrix math in Python + [SciPy](http://www.scipy.org/): Library for more MATLAB-like functionality + [CVXOPT](http://cvxopt.org/): Efficient convex (including quadratic program) optimization Contents -------- The MISVM package currently implements the following algorithms: ### SIL Single-Instance Learning (SIL) is a "naive" approach that assigns each instance the label of its bag, creating a supervised learning problem but mislabeling negative instances in positive bags. It works surprisingly well for many problems. > Ray, Soumya, and Mark Craven. **Supervised versus multiple instance learning: > an empirical comparison.** _Proceedings of the 22nd International Conference > on Machine Learning._ 2005. ### MI-SVM and mi-SVM These approaches modify the standard SVM formulation so that the constraints on instance labels correspond to the MI assumption that at least one instance in each bag is positive. For more information, see: > Andrews, Stuart, Ioannis Tsochantaridis, and Thomas Hofmann. **Support vector > machines for multiple-instance learning.** _Advances in Neural Information > Processing Systems._ 2002. ### NSK and STK The normalized set kernel (NSK) and statistics kernel (STK) approaches use kernels to map entire bags into a features, then use the standard SVM formulation to find bag classifiers: > Gärtner, Thomas, Peter A. Flach, Adam Kowalczyk, and Alex J. Smola. > **Multi-instance kernels.** _Proceedings of the 19th International Conference on > Machine Learning._ 2002. ### MissSVM MissSVM uses a semi-supervised learning approach, treating the instances in positive bags as unlabeled data: > Zhou, Zhi-Hua, and Jun-Ming Xu. **On the relation between multi-instance > learning and semi-supervised learning.** _Proceedings of the 24th > International Conference on Machine Learning._ 2007. ### MICA The "multiple-instance classification algorithm" (MICA) represents each bag using a convex combinations of its instances. The optimization program is then solved by iteratively solving a series of linear programs. In our formulation, we use L2 regularization, so we solve alternating linear and quadratic programs. For more information on the original algorithm, see: > Mangasarian, Olvi L., and Edward W. Wild. **Multiple instance classification > via successive linear programming.** _Journal of Optimization Theory and > Applications_ 137.3 (2008): 555-568. ### sMIL, stMIL, and sbMIL This family of approaches intentionally bias SVM formulations to handle the assumption that there are very few positive instances in each positive bag. In the case of sbMIL, prior knowledge on the "sparsity" of positive bags can be specified or found via cross-validation: > Bunescu, Razvan C., and Raymond J. Mooney. **Multiple instance learning for > sparse positive bags.** _Proceedings of the 24th International Conference on > Machine Learning._ 2007. How to Use ---------- The classifier implementations are loosely based on those found in the [scikit-learn](http://scikit-learn.org/stable/) library. First, construct a classifier with the desired parameters: >>> import misvm >>> classifier = misvm.MISVM(kernel='linear', C=1.0, max_iters=50) Use Python's `help` functionality as in `help(misvm.MISVM)` or read the documentation in the code to see which arguments each classifier takes. Then, call the `fit` function with some data: >>> classifier.fit(bags, labels) Here, the `bags` argument is a list of "array-like" (could be NumPy arrays, or a list of lists) objects representing each bag. Each (array-like) bag has m rows and f columns, which correspond to m instances, each with f features. Of course, m can be different across bags, but f must be the same. Then `labels` is an array-like object containing a label corresponding to each bag. *Each label must be either +1 or -1.* You will likely get strange results if you try using 0/1-valued labels. After training the classifier, you can call the `predict` function as: >>> labels = classifier.predict(bags) Here `bags` has the same format as for `fit`, and the function returns an array of real-valued predictions (use `numpy.sign(labels)` to get -1/+1 class predictions). In order to get instance-level predictions from a classifier, use the `instancePrediction` flag, as in: >>> bag_labels, instance_labels = classifier.predict(bags, instancePrediction=True) The `instancePrediction` flag is not available for bag-level classifiers such as the NSK. However, you can always predict the labels of "singleton" bags containing a single instance to assign a label to that instance. In this case, one should use caution in interpreting the label of an instance produced by a bag-level classifier, since these classifiers are designed to make predictions based on properties of an entire bag. An example script is included that trains classifiers on the [musk1 dataset](http://archive.ics.uci.edu/ml/datasets/Musk+(Version+1)); see: > Bache, K. & Lichman, M. (2013). UCI Machine Learning Repository > [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School > of Information and Computer Science. Install the package or add the `misvm` directory to the `PYTHONPATH` environment variable before attempting to run the example using `python example.py` within the `example` directory. Questions and Issues -------------------- If you find any bugs or have any questions about this code, please create an issue on [GitHub](https://github.com/garydoranjr/misvm/issues), or contact Gary Doran at . Of course, I cannot guarantee any support for this software.