# milvus-retrieval

**Repository Path**: liusssyang/milvus-retrieval

## Basic Information

- **Project Name**: milvus-retrieval
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-11-05
- **Last Updated**: 2024-11-05

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Hybrid Search App

This document describes a Streamlit application for hybrid search that uses the Milvus vector database for retrieval.

## Table of Contents

1. [Overview](#overview)
2. [Requirements](#requirements)
3. [Code Explanation](#code-explanation)
4. [Usage](#usage)
5. [Configuration](#configuration)

## Overview

The Hybrid Search App lets users enter a query and retrieve relevant results by combining several matching techniques: BM25 matching on file names, BM25 matching on body text, semantic matching on body text, and optional matching on extra fields. It uses the Milvus database to store and query high-dimensional vectors.

## Requirements

Before running the application, make sure the following libraries are installed:

- Streamlit
- pymilvus (the Milvus Python SDK)
- Any other dependencies specific to your project structure

You can install the required libraries using pip:

```bash
pip install streamlit pymilvus
```

## Code Explanation

The main components of the code are outlined below. The snippets are consecutive parts of one script, so the indentation of later snippets continues from earlier ones.

```python
import streamlit as st
import os
from milvus.milvus_op import MilvusOP
from preprocess.preprocess_data import image_saved_dir
from time import time
```

1. **Imports**: Streamlit provides the web app, `os` is used to build file paths, `MilvusOP` (from the project's own `milvus.milvus_op` module) wraps the database operations, `image_saved_dir` is the directory where result images are stored, and `time` is used to measure search duration.

```python
config = {'ns': 1.0, 'ts': 1.0, 'e': 1.0, 'td': 0.5}
milvus_op = MilvusOP(db_name='state_vector_db', collection_name="hybrid3")
```

2. **Initialization**: A configuration dictionary holds the default hybrid search weights, and a `MilvusOP` instance is created for the `hybrid3` collection in the `state_vector_db` database.

```python
st.title("Hybrid Search App")
st.sidebar.header("检索配置")  # "Search configuration"
```

3. **User Interface**: The title and sidebar header of the Streamlit app are defined.

```python
# Label: "Enter query:"; placeholder: "Enter your query here..."
query = st.sidebar.text_area("输入查询:", placeholder="在此输入您的查询...", height=100, max_chars=500).strip()
query_button = st.sidebar.button("查询", type='primary')  # "Query"
```

4. **Query Input**: A text area (up to 500 characters, whitespace stripped) accepts the search query, and a button submits it.

```python
# "NS (filename BM25 matching)"
config['ns'] = st.sidebar.slider("NS (文件名BM25匹配)", min_value=0.0, max_value=1.0, value=config['ns'], step=0.01)
# "TS (body text BM25 matching)"
config['ts'] = st.sidebar.slider("TS (正文BM25匹配)", min_value=0.0, max_value=1.0, value=config['ts'], step=0.01)
# "TD (body text semantic matching)"
config['td'] = st.sidebar.slider("TD (正文语义匹配)", min_value=0.0, max_value=1.0, value=config['td'], step=0.01)
```

5. **Configuration Sliders**: Three sliders, each ranging from 0.0 to 1.0 in steps of 0.01, set the weights for filename BM25 matching, body text BM25 matching, and body text semantic matching.

```python
# Checkbox label: "Extra field matching"
config['e'] = 1 if st.sidebar.checkbox("额外字段匹配", value=True) else 0
limit_options = [3, 5, 10, 20, 50]
limit = st.sidebar.selectbox("选择结果数量:", limit_options, index=1)  # "Select number of results:"
```

6. **Additional Configurations**: A checkbox toggles extra field matching (`e` is 1 when checked, 0 otherwise), and a select box sets the maximum number of results. With `index=1`, the default limit is 5.

```python
if query_button or query:
    stime = time()
    search_res = milvus_op.hybrid_search([query], limit=limit, config=config)
    etime = time()
```

7. **Search Execution**: When the query button is clicked or the query text is non-empty, the hybrid search runs with the current limit and weights, and timestamps are taken before and after the call.

```python
    search_duration = etime - stime
    if search_duration < 0.5:
        color = 'green'
    elif search_duration < 1:
        color = 'yellow'
    else:
        color = 'red'
    # "Search time: ... s"; the inline <span> that applies the color is assumed
    st.sidebar.markdown(f"<span style='color:{color}'>检索耗时: {search_duration:.3f} 秒</span>", unsafe_allow_html=True)
```

8. **Search Duration Display**: The sidebar shows the search duration to three decimal places, color-coded green below 0.5 s, yellow below 1 s, and red otherwise.

```python
    for j, i in enumerate(search_res):
        st.write('-' * 45)  # visual separator between results
        file_name = f"{i['name']}{i['type']}"
        st.markdown(f"Result {j + 1}: {file_name}", unsafe_allow_html=True)
        st.write(f"ID: {i['id']}")
        st.write(f"Score: {i['score']} ({i['remark']})")
```

9. **Results Display**: Each result is shown with its file name (the `name` and `type` fields joined), its ID, and its score followed by the `remark` field.

```python
        if len(i['image_path']) > 0:
            image_path = os.path.join(image_saved_dir, i['image_path'])
            st.image(image_path, caption=image_path)
        else:
            if '|-|' not in i['text']:
                # Convert newlines to HTML line breaks
                st.markdown(i['text'].replace('\n', '<br>'), unsafe_allow_html=True)
            else:
                st.markdown(i['text'])
```

10. **Image Handling**: If the result has an image path, the image is loaded from `image_saved_dir` and displayed with its path as the caption. Otherwise the text is shown: newlines are converted to HTML line breaks unless the text contains `|-|`, in which case it is rendered as Markdown unchanged.

```python
else:
    st.sidebar.warning("请输入查询以进行检索。")  # "Please enter a query to search."
```

11. **No Query Warning**: If no query is entered, a warning is displayed in the sidebar.

## Usage

1. Run the Streamlit application:

   ```bash
   streamlit run app.py
   ```

2. Open the app at the local URL that Streamlit prints (typically `http://localhost:8501`).
3. Enter a search query and adjust the configuration sliders as needed.
4. Click the "查询" (Query) button to run the search and view the results.

## Configuration

- **NS**: Weight for filename BM25 matching (0.0 to 1.0, default 1.0).
- **TS**: Weight for body text BM25 matching (0.0 to 1.0, default 1.0).
- **TD**: Weight for body text semantic matching (0.0 to 1.0, default 0.5).
- **E**: Toggle for extra field matching (1 when enabled, 0 when disabled; enabled by default).
- **Limit**: Maximum number of results to return (3, 5, 10, 20, or 50; default 5).

Adjust these parameters and try different queries to see how each weight affects the ranking of results.
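
The README does not show how `hybrid_search` combines the individual match scores. As a minimal sketch, assuming the weights act as coefficients in a plain weighted sum (an assumption, not the project's confirmed method), the effect of the NS, TS, TD, and E settings could look like the following. The function `combine_scores` and the example score values are hypothetical and exist only for illustration.

```python
# Hypothetical illustration of weighted score fusion.
# This is NOT the MilvusOP implementation; a simple weighted sum is assumed.

def combine_scores(channel_scores, config):
    """Return the weighted sum of per-channel scores.

    channel_scores: dict mapping a channel key ('ns', 'ts', 'td', 'e') to a score.
    config: dict mapping the same keys to weights; missing keys count as 0.
    """
    return sum(config.get(key, 0.0) * score
               for key, score in channel_scores.items())


# Default weights taken from the app's config dictionary.
config = {'ns': 1.0, 'ts': 1.0, 'e': 1.0, 'td': 0.5}
# Made-up per-channel scores for a single result.
scores = {'ns': 0.2, 'ts': 0.6, 'td': 0.8, 'e': 0.0}

print(combine_scores(scores, config))
```

Under this assumption, setting a weight to 0 removes that channel's contribution entirely, which is one way to check how much a single matching technique influences the ranking.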
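
The color thresholds used for the search duration display (step 8 of the code explanation) could be factored into a small helper so they can be checked without running Streamlit. The thresholds below are the ones in the app code; the function name `duration_color` is hypothetical.

```python
def duration_color(seconds):
    """Map a search duration in seconds to the sidebar display color."""
    if seconds < 0.5:
        return 'green'   # fast: under half a second
    if seconds < 1:
        return 'yellow'  # acceptable: under one second
    return 'red'         # slow: one second or more


for sample in (0.12, 0.75, 1.8):
    print(sample, duration_color(sample))
```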