# dami

**Repository Path**: lgnlgn/dami

## Basic Information

- **Project Name**: dami
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-04-17
- **Last Updated**: 2021-04-17

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

**dami**
=============

Scalable algorithms in **da**ta **mi**ning.

(***I am shifting this project to feluca and will refactor it there, so this project is being deprecated.***)

dami is written in Java. Our goal is to provide algorithms that can handle hundreds of millions of records on a PC with limited memory.

Currently we have:

- **utility**: buffered vector pool for dataset IO; a high-performance, simple text parser. (*More tests needed*)
- **classification**: SGD for logistic regression
- **recommendation**: SlopeOne, SVD, RSVD, item-neighborhood SVD (see `movielens_converter.py`)
- **significance testing**: swap randomization
- **graph**: PageRank

Future:

- **similarity**: simhash

---------

>*2012/10/22 Release Notes:*
> - L1- and L2-regularized logistic regression
> - memory cost estimation
> - simple command-line integration for LR

>*2012/7/22 Release Notes:*
> - asynchronous vector buffer for dataset IO
> - high-performance, simple text parser (digit-related characters only)
> - small refactorings

>*2012/7/12 Release Notes:*
> - code refactoring for recommendation and IO
> - to compute RMSE for recommendation, first see *`movielens_convert.py`* for converting and/or splitting the MovieLens data, then see *`CFDataConverter`* and *`TestSVD`*

----------

To achieve computational efficiency and good memory utilization, we adopt two approaches:
*1: Use the "id" as the array index when fetching data.*

*2: Keep only the model in memory; the data itself is converted to bytes and streamed from disk for IO.*

So it is highly recommended that you use contiguous ids with these algorithms :)

My Chinese blog: [http://blog.csdn.net/lgnlgn](http://blog.csdn.net/lgnlgn)

E-mail: gnliang10 [at] 126.com
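As an illustration of the id-as-array-index idea combined with the SGD logistic regression mentioned above, here is a minimal, hypothetical sketch (not dami's actual code; the class and method names are invented for this example). Feature ids index directly into a dense weight array, so a lookup is a single array access with no hashing, and only the model lives in memory.

```java
// Hypothetical sketch, not dami's actual implementation:
// one SGD step for L2-regularized logistic regression, with
// feature ids used directly as indices into a dense weight array.
public class SgdLrSketch {

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // One SGD update on a sparse example given as parallel arrays
    // (feature ids and their values), with label y in {0, 1}.
    static void update(double[] w, int[] ids, double[] values,
                       int y, double learningRate, double lambda) {
        double z = 0.0;
        for (int k = 0; k < ids.length; k++) {
            z += w[ids[k]] * values[k];        // id -> array index, no hashing
        }
        double gradScale = sigmoid(z) - y;     // d(logloss)/dz
        for (int k = 0; k < ids.length; k++) {
            int i = ids[k];
            // gradient step plus L2 penalty on the touched weights
            w[i] -= learningRate * (gradScale * values[k] + lambda * w[i]);
        }
    }

    public static void main(String[] args) {
        double[] w = new double[4];            // the whole model in memory
        int[] ids = {0, 2};                    // contiguous ids, as recommended
        double[] vals = {1.0, 1.0};
        for (int epoch = 0; epoch < 100; epoch++) {
            update(w, ids, vals, 1, 0.5, 0.001);   // repeat a positive example
        }
        System.out.println(sigmoid(w[0] + w[2]) > 0.9);  // prediction approaches the label
    }
}
```

This is also why contiguous ids matter here: the weight array must be sized to the largest id, so sparse, scattered ids would waste memory.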
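The "digit-related characters only" parser mentioned in the release notes is presumably in this spirit; the sketch below is an assumption for illustration (invented names, not dami's actual API). It reads non-negative integers straight from a byte buffer, which avoids allocating intermediate Strings and is a common trick in fast dataset loaders.

```java
// Hypothetical sketch, not dami's actual parser: extract a
// non-negative integer from a slice of a byte buffer without
// creating any String objects.
public class DigitParserSketch {

    // Parses the digits in line[from, to) as a base-10 integer.
    // Assumes the slice contains only '0'..'9'.
    static int parseInt(byte[] line, int from, int to) {
        int value = 0;
        for (int i = from; i < to; i++) {
            value = value * 10 + (line[i] - '0');  // '0'..'9' -> 0..9
        }
        return value;
    }

    public static void main(String[] args) {
        byte[] line = "123 4567".getBytes();
        System.out.println(parseInt(line, 0, 3));  // 123
        System.out.println(parseInt(line, 4, 8));  // 4567
    }
}
```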