# FAQ-semantic-matching

**Repository Path**: cunzhonshi/FAQ-semantic-matching

## Basic Information

- **Project Name**: FAQ-semantic-matching
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-10-10
- **Last Updated**: 2020-12-19

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

## Semantic Matching FAQ Application

### Problem Description

Given 50 frequently asked questions (FAQs) about hummingbirds, the task is to implement both a naïve approach and a more sophisticated approach, using natural language processing techniques, for retrieving the FAQs most relevant to a user's question about hummingbirds. For instance, when the user asks "How many years will a hummingbird live?", the system must return a ranked list of the most relevant FAQs.

The naïve approach treats the questions and answers as bags of words and matches the word tokens against the user's question (an illustrative sketch appears at the end of this README). The more sophisticated approach uses tokens, lemmas, stems, parts of speech, parse trees, and WordNet to find matches in a more intelligent way (a sketch of the WordNet-based similarity also appears at the end of this README).

### Various Modules in the Solution

- Collection of features
- Processing of features
- Calculation of weights
- Learning of weights
- Final score calculation
- Ranking of answers

A detailed description of each module can be found in the [final report](project_report.pdf).

### Programming Tools

* **Python 3**
  Python 3 is the primary programming language used for this project.
* **Java**
  No Java code was written by the team members, but jar libraries from the Stanford Parser were used for dependency parsing.
* **NLTK**
  The Natural Language Toolkit, a Python library that provides a host of natural language processing tools, including access to WordNet, various corpora, and dependency parsers.
* **Stanford Parser**
  Stanford's dependency parser was used to obtain dependency trees from questions and answers. We used a Python library that wraps the Java libraries.
* **WordNet**
  The WordNet corpus was used via NLTK. We collected synsets with the Lesk algorithm and used their similarities, definitions, and examples.
* **Brown Corpus**
  The Brown corpus provided with WordNet supplied the information content used when computing the JCN similarity of synsets.
* **NumPy**
  A Python library used for linear algebra operations such as taking the norm of a vector.
* **scikit-learn (sklearn)**
  A Python library used for cosine similarity calculations.

### Output

Please refer to the [final report](project_report.pdf).
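### Illustrative Sketches

The following is a minimal sketch of the naïve bag-of-words approach described above, not the project's actual code. The FAQ dictionary keys (`question`, `answer`) and the toy hummingbird data are assumptions made for the example; the real system works over the 50 FAQs described in the report.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a question or answer and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bow_overlap_score(user_question, faq_entry):
    """Count how many of the user's tokens appear in the FAQ's
    question-plus-answer bag of words."""
    user_tokens = tokenize(user_question)
    faq_bag = Counter(tokenize(faq_entry["question"]) + tokenize(faq_entry["answer"]))
    return sum(1 for tok in user_tokens if faq_bag[tok] > 0)


def rank_faqs(user_question, faqs, top_k=5):
    """Return the FAQs sorted by descending bag-of-words overlap."""
    scored = [(bow_overlap_score(user_question, faq), faq) for faq in faqs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy FAQ list, assumed purely for illustration.
    faqs = [
        {"question": "What is the lifespan of a hummingbird?",
         "answer": "Most hummingbirds live three to five years in the wild."},
        {"question": "What do hummingbirds eat?",
         "answer": "Hummingbirds feed on nectar and small insects."},
    ]
    for score, faq in rank_faqs("How many years will a hummingbird live?", faqs):
        print(score, faq["question"])
```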
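The second sketch illustrates, under stated assumptions, the WordNet-based tools listed above: the Lesk algorithm for picking synsets, JCN similarity with Brown-corpus information content, and scikit-learn's cosine similarity on a plain bag-of-words vector. It is not the project's actual feature pipeline; the function names and the simple averaging scheme are assumptions made for the example.

```python
import nltk
from nltk import word_tokenize
from nltk.wsd import lesk
from nltk.corpus import wordnet_ic
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# One-time downloads (uncomment on first run):
# nltk.download('punkt'); nltk.download('wordnet'); nltk.download('wordnet_ic')

brown_ic = wordnet_ic.ic('ic-brown.dat')  # Brown-corpus information content


def sentence_synsets(sentence):
    """Disambiguate each token with the Lesk algorithm; keep noun/verb synsets."""
    tokens = word_tokenize(sentence.lower())
    synsets = []
    for tok in tokens:
        ss = lesk(tokens, tok)
        if ss is not None and ss.pos() in ('n', 'v'):
            synsets.append(ss)
    return synsets


def jcn_score(sent_a, sent_b):
    """Average, over the synsets of sentence A, the best JCN similarity against
    sentence B. JCN is only defined for same-POS pairs, so mismatches are skipped."""
    syns_a, syns_b = sentence_synsets(sent_a), sentence_synsets(sent_b)
    if not syns_a or not syns_b:
        return 0.0
    best = []
    for sa in syns_a:
        sims = []
        for sb in syns_b:
            if sa.pos() == sb.pos():
                try:
                    sims.append(sa.jcn_similarity(sb, brown_ic))
                except Exception:
                    pass  # skip pairs WordNet cannot compare
        best.append(max(sims) if sims else 0.0)
    return sum(best) / len(best)


def cosine_score(sent_a, sent_b):
    """Plain bag-of-words cosine similarity via scikit-learn."""
    vectors = CountVectorizer().fit_transform([sent_a, sent_b])
    return float(cosine_similarity(vectors[0], vectors[1])[0, 0])


if __name__ == "__main__":
    q = "How many years will a hummingbird live?"
    faq = "What is the lifespan of a hummingbird?"
    print("JCN score:", jcn_score(q, faq))
    print("Cosine score:", cosine_score(q, faq))
```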