This is the official repository for the 2019 Recommender Systems course at Polimi.
Developed by Maurizio Ferrari Dacrema, postdoc researcher at Politecnico di Milano. See our website for more information on our research group and the available theses. The introductory slides are available here. For installation instructions see the Installation section below.
A simple wrapper of scikit-optimize allowing for simple and fast parameter tuning. The BayesianSkoptSearch object saves the results of the search to a set of files.
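To give an idea of what the wrapper builds on, here is a minimal Bayesian search done with plain scikit-optimize. This is not the API of the BayesianSkoptSearch object itself; evaluate_on_validation is a hypothetical helper standing in for fitting and evaluating a recommender, and the parameter names and ranges are only examples.

from skopt import gp_minimize
from skopt.space import Integer, Real

def objective(params):
    topK, shrink = params
    # Fit a recommender with these parameters, evaluate it on validation data,
    # and return a value to minimize (e.g., the negative MAP).
    # evaluate_on_validation is a hypothetical helper, not part of this repository.
    return -evaluate_on_validation(topK=topK, shrink=shrink)

search_space = [Integer(5, 800, name="topK"),
                Real(0, 1000, name="shrink")]

result = gp_minimize(objective, search_space, n_calls=35, random_state=42)
print("Best parameters:", result.x)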
Cython code is already compiled for Linux and Windows x86 (your usual personal computer architecture) and ppc64 (IBM Power PC). To recompile the code just run the Cython compilation script as described in the Installation section. The code has been developed for Linux and Windows.
Note that this repository requires Python 3.6
We suggest you create an environment for this project using virtualenv (or another tool like conda).
First checkout this repository, then enter the repository folder and run the following commands to create and activate a new environment:
If you are using virtualenv:
virtualenv -p python3 RecSysFramework
source RecSysFramework/bin/activate
If you are using conda:
conda create -n RecSysFramework python=3.6 anaconda
conda activate RecSysFramework
Then install all the requirements and dependencies using the following command.
pip install -r requirements.txt
At this point you have to compile all Cython algorithms. In order to compile them you must first have gcc and the Python 3 development headers installed. Under Linux they can be installed with the following commands:
sudo apt install gcc
sudo apt-get install python3-dev
If you are using Windows as your operating system, the installation procedure is a bit more complex. You may refer to THIS guide.
Now you can compile all Cython algorithms by running the following command. The script will compile within the currently active environment. The code has been developed for Linux and Windows platforms. During the compilation you may see some warnings.
python run_compile_all_cython.py
Contains some basic modules and the base classes for different Recommender types.
The Evaluator class is used to evaluate a recommender object, computing various metrics.
The evaluator takes as input the URM against which you want to test the recommender, a list of cutoff values (e.g., 5, 20) and, if necessary, an object to compute diversity. The evaluateRecommender function takes as input only the recommender object you want to evaluate and returns both a dictionary of the form {cutoff: results}, where results is {metric: value}, and a well-formatted printable string.
from Base.Evaluation.Evaluator import EvaluatorHoldout
evaluator_test = EvaluatorHoldout(URM_test, [5, 20])
results_run_dict, results_run_string = evaluator_test.evaluateRecommender(recommender_instance)
print(results_run_string)
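The returned dictionary can then be indexed by cutoff and metric name. The metric key used below ("MAP") is an assumption, shown only to illustrate the structure:

map_at_5 = results_run_dict[5]["MAP"]   # "MAP" is an assumed metric key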
The similarity module allows you to compute the item-item or user-user similarity. It is used by instantiating the Compute_Similarity class, passing the desired similarity type and the sparse matrix you wish to use.
It is able to compute the following similarities: Cosine, Adjusted Cosine, Jaccard, Tanimoto, Pearson and Euclidean (linear and exponential)
from Base.Similarity.Compute_Similarity import Compute_Similarity   # module path may differ

similarity = Compute_Similarity(URM_train, shrink=shrink, topK=topK, normalize=normalize, similarity="cosine")
W_sparse = similarity.compute_similarity()
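Once computed, the item-item similarity matrix can be combined with a user's interaction profile to score items. The following is a minimal sketch of that idea (standard item-based collaborative filtering scoring), not a method exposed by the module:

import numpy as np

user_id = 158
user_profile = URM_train[user_id]           # sparse row with the user's interactions
item_scores = user_profile.dot(W_sparse)    # 1 x n_items score vector
item_scores = np.asarray(item_scores.todense()).ravel()

# Pick the highest-scoring items (seen items are not removed in this sketch).
top_items = np.argsort(-item_scores)[:10]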
All recommenders inherit from BaseRecommender and therefore share the same interface. You must provide the data when instantiating the recommender and then call the fit function to build the corresponding model.
Each recommender has a _compute_item_score function which, given an array of user IDs, computes the prediction or score for all items. Further operations, like removing seen items and computing the recommendation list of the desired length, are done by the recommend function of BaseRecommender.
As an example:
user_id = 158

# Item-based collaborative filtering KNN
recommender_instance = ItemKNNCFRecommender(URM_train)
recommender_instance.fit(topK=150)
recommended_items = recommender_instance.recommend(user_id, cutoff=20, remove_seen_flag=True)

# SLIM with ElasticNet regularization
recommender_instance = SLIM_ElasticNet(URM_train)
recommender_instance.fit(topK=150, l1_ratio=0.1, alpha=1.0)
recommended_items = recommender_instance.recommend(user_id, cutoff=20, remove_seen_flag=True)
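Because the scoring logic is isolated in _compute_item_score, a new model can be plugged into the framework by implementing just fit and that method. The following toy sketch assumes the base constructor stores the URM as self.URM_train and that _compute_item_score receives an array of user indices, as described above; the import path and the exact signature may differ in the repository.

import numpy as np
from Base.BaseRecommender import BaseRecommender   # assumed module path

class ExamplePopularityRecommender(BaseRecommender):
    """Toy popularity-based recommender, shown only to illustrate the interface."""

    def fit(self):
        # Popularity of each item = number of interactions it has in the URM
        # (self.URM_train is assumed to be stored by the base constructor).
        self.item_popularity = np.asarray(self.URM_train.sum(axis=0)).ravel()

    def _compute_item_score(self, user_id_array, items_to_compute=None):
        # Return the same popularity scores for every requested user.
        return np.tile(self.item_popularity, (len(user_id_array), 1))

An instance built this way can then be used with recommend and passed to the evaluator exactly like the built-in recommenders.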
DataReader objects read the dataset from its original file and save it as a sparse matrix.
DataSplitter objects take as input a DataReader and split the corresponding dataset in the chosen way. At each step the data is automatically saved in a folder, though it is possible to prevent this by setting save_folder_path = False when calling load_data. If a DataReader or DataSplitter is called for a dataset which was already processed, the saved data is loaded.
DataPostprocessing objects can also be applied between the DataReader and the DataSplitter, and they can be nested within one another.
When you have built the desired combination of dataset/postprocessing/split, get the data by calling load_data.
# Read the dataset (Movielens 1M)
dataset = Movielens1MReader()

# Apply postprocessing: keep only users and items with at least 25 interactions,
# sample 30% of the users, and make the URM implicit
dataset = DataPostprocessing_K_Cores(dataset, k_cores_value=25)
dataset = DataPostprocessing_User_sample(dataset, user_quota=0.3)
dataset = DataPostprocessing_Implicit_URM(dataset)

# Split the data with a leave-k-out strategy and load (or reuse) the saved split
dataSplitter = DataSplitter_leave_k_out(dataset)
dataSplitter.load_data()

URM_train, URM_validation, URM_test = dataSplitter.get_holdout_split()
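The resulting matrices can then be passed directly to the evaluator described above, for example:

from Base.Evaluation.Evaluator import EvaluatorHoldout

evaluator_validation = EvaluatorHoldout(URM_validation, [5])
evaluator_test = EvaluatorHoldout(URM_test, [5, 20])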