# milvus-retrieval
**Repository Path**: liusssyang/milvus-retrieval
## Basic Information
- **Project Name**: milvus-retrieval
- **Default Branch**: master
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-11-05
- **Last Updated**: 2024-11-05
## README
# Hybrid Search App
This document describes a Streamlit application for hybrid search that uses the Milvus vector database for retrieval.
## Table of Contents
1. [Overview](#overview)
2. [Requirements](#requirements)
3. [Code Explanation](#code-explanation)
4. [Usage](#usage)
5. [Configuration](#configuration)
## Overview
The Hybrid Search App lets users enter a query and retrieves results by combining BM25 matching on file names and body text with semantic matching on body text. Milvus stores and queries the underlying high-dimensional vectors.
## Requirements
Before running the application, ensure you have the following libraries installed:
- Streamlit
- pymilvus (the Milvus Python SDK)
- The project's own `milvus` and `preprocess` packages, which the app imports as `milvus.milvus_op` and `preprocess.preprocess_data`
You can install the required libraries using pip:
```bash
pip install streamlit pymilvus
```
## Code Explanation
The main components of the code are outlined below:
```python
import streamlit as st
import os
from milvus.milvus_op import MilvusOP
from preprocess.preprocess_data import image_saved_dir
from time import time
```
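1. **Imports**: Streamlit provides the web UI, `os` is used to build image file paths, `time` measures how long a search takes, and the project-local `MilvusOP` class wraps the Milvus database operations. `image_saved_dir` is the base directory that result image paths are joined to.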
```python
config = {'ns': 1.0, 'ts': 1.0, 'e': 1.0, 'td': 0.5}
milvus_op = MilvusOP(db_name='state_vector_db', collection_name="hybrid3")
```
2. **Initialization**: A configuration dictionary holds the default hybrid search parameters (`ns`, `ts`, `e`, `td`), and a `MilvusOP` instance is created for the `hybrid3` collection in the `state_vector_db` database.
```python
st.title("Hybrid Search App")
st.sidebar.header("检索配置")  # "Retrieval settings"
```
3. **User Interface**: The title and sidebar header of the Streamlit app are defined.
```python
# Labels: "输入查询:" = "Enter query:", placeholder = "Type your query here...", "查询" = "Search"
query = st.sidebar.text_area("输入查询:", placeholder="在此输入您的查询...", height=100, max_chars=500).strip()
query_button = st.sidebar.button("查询", type='primary')
```
4. **Query Input**: A text area is provided for users to input their search query, along with a button to submit the query.
```python
# Labels: NS = file-name BM25 match, TS = body-text BM25 match, TD = body-text semantic match
config['ns'] = st.sidebar.slider("NS (文件名BM25匹配)", min_value=0.0, max_value=1.0, value=config['ns'], step=0.01)
config['ts'] = st.sidebar.slider("TS (正文BM25匹配)", min_value=0.0, max_value=1.0, value=config['ts'], step=0.01)
config['td'] = st.sidebar.slider("TD (正文语义匹配)", min_value=0.0, max_value=1.0, value=config['td'], step=0.01)
```
5. **Configuration Sliders**: Three sliders set the weights (0.0 to 1.0, in steps of 0.01) for file-name BM25 matching (NS), body-text BM25 matching (TS), and body-text semantic matching (TD).
```python
# "额外字段匹配" = "Match extra fields"; "选择结果数量:" = "Number of results:"
config['e'] = 1 if st.sidebar.checkbox("额外字段匹配", value=True) else 0
limit_options = [3, 5, 10, 20, 50]
limit = st.sidebar.selectbox("选择结果数量:", limit_options, index=1)
```
6. **Additional Configurations**: A checkbox toggles extra-field matching (`e` is 1 when checked, 0 otherwise), and a select box sets the maximum number of results (3, 5, 10, 20, or 50; the default is 5).
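Keeping the widget values separate from the dictionary that is passed to the search makes the configuration easy to check without running Streamlit. The following is a minimal sketch under that assumption; `build_config` is a hypothetical helper, not part of the project, and the clamping to the slider range is an added safety check rather than something the app is known to do.

```python
def build_config(ns, ts, td, extra_enabled):
    """Assemble a config dict with the same keys the app uses.

    Hypothetical helper: weights are clamped to the slider range [0.0, 1.0].
    """
    def clamp(weight):
        return max(0.0, min(1.0, float(weight)))

    return {
        'ns': clamp(ns),                 # file-name BM25 weight
        'ts': clamp(ts),                 # body-text BM25 weight
        'e': 1 if extra_enabled else 0,  # extra-field matching on/off
        'td': clamp(td),                 # body-text semantic weight
    }


# Same values as the app's defaults.
print(build_config(1.0, 1.0, 0.5, True))
```

In the app, the arguments would come from the slider and checkbox return values.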
```python
if query_button or query:
    stime = time()
    search_res = milvus_op.hybrid_search([query], limit=limit, config=config)
    etime = time()
7. **Search Execution**: The hybrid search runs when the query button is clicked or the query text is non-empty. `time()` is called immediately before and after the search so its duration can be computed.
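The timing pattern above can be factored into a small reusable helper. This is a sketch only: `timed_call` and `fake_search` are hypothetical names introduced for illustration, with `fake_search` standing in for `milvus_op.hybrid_search` so the example runs without a database. It uses `time.perf_counter`, which is generally better suited to measuring intervals than `time.time`.

```python
from time import perf_counter


def timed_call(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds)."""
    start = perf_counter()
    result = func(*args, **kwargs)
    return result, perf_counter() - start


# Stand-in for milvus_op.hybrid_search (hypothetical, for illustration only).
def fake_search(queries, limit, config):
    return [{"id": n, "score": 1.0} for n in range(limit)]


results, elapsed = timed_call(fake_search, ["example query"], limit=3, config={})
print(f"{len(results)} results in {elapsed:.3f} s")
```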
```python
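    # Continues inside the `if query_button or query:` block.
    search_duration = etime - stime
    if search_duration < 0.5:
        color = 'green'
    elif search_duration < 1:
        color = 'yellow'
    else:
        color = 'red'
    # "检索耗时 ... 秒" = "Search time ... s". The HTML markup was lost from this snippet;
    # a colored <span> is assumed here so that `color` is actually used.
    st.sidebar.markdown(f"<span style='color:{color}'>检索耗时: {search_duration:.3f} 秒</span>", unsafe_allow_html=True)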
```
8. **Search Duration Display**: The sidebar shows the search duration, color-coded by speed: green below 0.5 s, yellow below 1 s, and red otherwise.
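The thresholds can be isolated in a pure function, which makes them easy to test. A minimal sketch follows; the thresholds come from the snippet above, while the function names and the `<span>` markup are assumptions made for illustration.

```python
def duration_color(seconds):
    """Map a search duration to a color using the app's thresholds."""
    if seconds < 0.5:
        return 'green'
    if seconds < 1:
        return 'yellow'
    return 'red'


def duration_html(seconds):
    """Build an HTML snippet for st.markdown(..., unsafe_allow_html=True).

    The <span> markup is an assumption, not taken from the project.
    """
    color = duration_color(seconds)
    return f"<span style='color:{color}'>Search time: {seconds:.3f} s</span>"


print(duration_html(0.25))
```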
```python
    for j, i in enumerate(search_res):
        st.write('-' * 45)
        file_name = f"{i['name']}{i['type']}"
        st.markdown(f"Result {j + 1}: {file_name}", unsafe_allow_html=True)
        st.write(f"ID: {i['id']}")
        st.write(f"Score: {i['score']} ({i['remark']})")
```
9. **Results Display**: Each search result is shown with a separator line, its file name (name plus type), its ID, and its score together with the accompanying remark.
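The loop above implies that each result is a dictionary with at least the keys `name`, `type`, `id`, `score`, and `remark`. The sketch below formats one such result as plain text; `format_result` is a hypothetical helper and the sample values are made up for illustration.

```python
def format_result(rank, result):
    """Return the display lines for one search result."""
    file_name = f"{result['name']}{result['type']}"
    return [
        f"Result {rank}: {file_name}",
        f"ID: {result['id']}",
        f"Score: {result['score']} ({result['remark']})",
    ]


# Made-up sample; real values come from milvus_op.hybrid_search.
sample = {"name": "report", "type": ".pdf", "id": 42, "score": 0.87, "remark": "ts"}
for line in format_result(1, sample):
    print(line)
```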
```python
        # Continues inside the `for` loop.
        if len(i['image_path']) > 0:
            image_path = os.path.join(image_saved_dir, i['image_path'])
            st.image(image_path, caption=image_path)
        else:
            if '|-|' not in i['text']:
                # The replacement string was lost from this snippet; '<br>' is assumed.
                st.markdown(i['text'].replace('\n', '<br>'), unsafe_allow_html=True)
            else:
                st.markdown(i['text'])
```
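10. **Image Handling**: If the result has an image path, the image is displayed with its path as the caption. Otherwise the text is shown: text containing `|-|` (which looks like a Markdown table separator) is passed to `st.markdown` unchanged, and any other text has its newlines replaced with HTML line breaks.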
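The text branch can be expressed as a pure function that decides what to pass to `st.markdown`. This is a sketch under two assumptions: `prepare_text` is a hypothetical name, and `<br>` is assumed as the line-break replacement.

```python
def prepare_text(text):
    """Return (markdown_source, allow_html) for a result's text.

    Text containing '|-|' is treated as a Markdown table and left untouched;
    otherwise newlines become '<br>' (assumed) and HTML must be allowed.
    """
    if '|-|' in text:
        return text, False
    return text.replace('\n', '<br>'), True


source, allow_html = prepare_text("first line\nsecond line")
print(source, allow_html)
```

In the app, the pair would be used as `st.markdown(source, unsafe_allow_html=allow_html)`.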
```python
else:
    st.sidebar.warning("请输入查询以进行检索。")  # "Please enter a query to search."
```
11. **No Query Warning**: If the button has not been clicked and the query is empty, a warning in the sidebar asks the user to enter a query.
## Usage
1. Run the Streamlit application using the command:
```bash
streamlit run app.py
```
2. Access the app through the provided local URL (typically `http://localhost:8501`).
3. Enter a search query and adjust the configuration sliders as needed.
4. Click the "查询" (Search) button to run the search and view the results.
## Configuration
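- **NS**: Weight for file-name BM25 matching (0.0 to 1.0, default 1.0).
- **TS**: Weight for body-text BM25 matching (0.0 to 1.0, default 1.0).
- **TD**: Weight for body-text semantic matching (0.0 to 1.0, default 0.5).
- **E**: Toggle for extra-field matching (1 when enabled, 0 when disabled; enabled by default).
- **Limit**: Maximum number of results to return (3, 5, 10, 20, or 50; default 5).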
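How `MilvusOP.hybrid_search` combines these weights is not shown in this document. One common approach is a weighted sum of per-signal scores, sketched below purely as an illustration. `combine_scores`, `WEIGHT_KEYS`, and the sample scores are hypothetical, and the project's actual fusion logic may differ.

```python
WEIGHT_KEYS = ("ns", "ts", "td", "e")


def combine_scores(scores, config):
    """Weighted sum of per-signal scores; missing signals count as 0.

    Assumed fusion rule for illustration, not the project's confirmed method.
    """
    return sum(config.get(k, 0.0) * scores.get(k, 0.0) for k in WEIGHT_KEYS)


config = {"ns": 1.0, "ts": 1.0, "e": 1.0, "td": 0.5}          # the app's defaults
doc_scores = {"ns": 0.5, "ts": 0.25, "td": 0.5, "e": 0.0}     # made-up scores
print(combine_scores(doc_scores, config))
```

Under this assumption, setting a weight to 0 removes that signal from the final score entirely.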
Adjust these parameters and try different queries to find the settings that work best for your data.