# data_mine

**Repository Path**: sliver-king/data_mine

## Basic Information

- **Project Name**: data_mine
- **Description**: Apriori and FP-growth implemented in Python
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 1
- **Forks**: 0
- **Created**: 2021-01-14
- **Last Updated**: 2021-09-30

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

Please give this repository a star!

## introduction

This repository implements six association rule mining algorithms:

1. Apriori (`apriori.py`)
   > the Apriori algorithm
2. Apriori_compress (`apriori_compress.py`)
   > Apriori with transaction compression
3. Apriori_hash (`apriori_hash.py`)
   > Apriori with the hash method
4. Apriori_plus (`apriori_plus.py`)
   > transaction compression + dataset compression + hash + Apriori
5. Fp_growth (`fp_growth.py`)
   > the FP-growth algorithm
6. Fp_growth_plus (`fp_growth_plus.py`)
   > dataset compression + FP-growth

- running progress

  ![](imgs/run_progress.png)

- the result of association rule mining

  ![](imgs/result.png)

## how to use

- download the repository

```
git clone https://github.com/blackAndrechen/data_mine
```

- change into the folder

```
cd data_mine
```

- write your own code, taking the Apriori algorithm as an example

```
from apriori import *

# each inner list is one transaction; the item names are placeholders
data=[["l1","l2","l3","l4"],
      ["l1","l3","l5"],
      ["l1","l3","l4"]]
min_support=2       # minimum support
min_confident=0.6   # minimum confidence
apr=Apriori()
rule_list=apr.generate_R(data,min_support,min_confident)
```

## tips

- the other algorithms are used in the same way, for example

```
from fp_growth import *

fp=Fp_growth()
rule_list=fp.generate_R(data,min_support,min_confident)
```

- my code uses the data files `groceries.csv` and `药方.xls` (prescriptions); you can try running it

```
import os

filename="groceries.csv"
min_support=25
min_conf=0.7
# filename="药方.xls"
# min_support=600
# min_conf=0.9

# build the dataset path relative to the current working directory
current_path=os.getcwd()
path=current_path+"/dataset/"+filename
#path='/home/czpchen/文档/github/data_mine/dataset/groceries.csv'
data=load_data(path)
apr=Apriori()
rule_list=apr.generate_R(data,min_support,min_conf)
```

- to use your own dataset, write your own function to read it, and make sure the loaded data looks like this: a list of transactions, each one a list of items

```
data=[["l1","l2","l3","l4"],
      ["l1","l3","l5"],
      ["l1","l3","l4"]]
```

- to save the resulting association rules

```
save_path=current_path+"/log/"+filename.split(".")[0]+"_apriori.txt"
#save_path='/home/czpchen/文档/github/data_mine/log/groceries_apriori.txt'
save_rule(rule_list,save_path)
```

## Performance analysis

A simple analysis on my datasets:

![](imgs/base_4.png)

![](imgs/improve_2.png)

## Reference

[Data Mining, 3rd Edition (数据挖掘 第三版)](https://book.douban.com/subject/11542972/)
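
## example: reading your own dataset

The tips suggest writing your own reader for a custom dataset. The following is a minimal sketch of one, assuming a CSV file with one transaction per row and one item per cell. The name `load_transactions` is hypothetical and is not the repository's `load_data`; it only shows how to produce the list-of-lists shape described above.

```python
import csv
import io


def load_transactions(source):
    """Read one transaction per CSV row and return a list of item lists.

    `source` is any iterable of text lines (an open file, io.StringIO, ...).
    Assumption: each row is one transaction and each cell is one item.
    Blank cells and surrounding whitespace are dropped, and duplicate items
    within a row are removed while keeping first-seen order.
    """
    transactions = []
    for row in csv.reader(source):
        items = []
        for cell in row:
            item = cell.strip()
            if item and item not in items:
                items.append(item)
        if items:  # skip completely empty rows
            transactions.append(items)
    return transactions


# Small in-memory sample so the sketch runs without any file on disk.
sample = io.StringIO("l1,l2,l3,l4\nl1,l3,l5\n\nl1, l3 ,l4,l3\n")
data = load_transactions(sample)
print(data)
```

For a real file, open it with `open(path, newline="", encoding="utf-8")` and pass the file object to the function. The returned list should then be usable as the `data` argument of `generate_R`, provided the algorithms accept string items.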