# t2kg: Knowledge Graph Generation from Any Text

Welcome! `t2kg` helps you generate knowledge graphs from any source text using AI. It can process both small and large text inputs, and it can also handle messages in a conversation format.

Why generate knowledge graphs? `t2kg` is great if you want to:

- Create a graph to assist with RAG (Retrieval-Augmented Generation)
- Create synthetic graph data for model training and testing
- Structure any text into a graph
- Analyze the relationships between concepts in your source text

We support every model provider available through [LiteLLM](https://docs.litellm.ai/docs/providers), and we use [DSPy](https://dspy.ai/) for structured output generation.

## Quick start

Install the module:

```bash
pip install t2kg
```

Then import and use `t2kg`. You can provide your text input in one of two formats:

1. A single string
2. A list of Message objects (each with a role and content)

Below are some example snippets:

```python
from t2kg import T2KG

# Initialize T2KG with optional configuration
kg = T2KG(
    model="openai/gpt-4o",   # Default model
    temperature=0.0,         # Default temperature
    api_key="YOUR_API_KEY"   # Optional if set in environment
)

# EXAMPLE 1: Single string with context
text_input = "Linda is Josh's mother. Ben is Josh's brother. Andrew is Josh's father."
graph_1 = kg.generate(
    input_data=text_input,
    context="Family relationships"
)
# Output:
# entities={'Linda', 'Ben', 'Andrew', 'Josh'}
# edges={'is brother of', 'is father of', 'is mother of'}
# relations={('Ben', 'is brother of', 'Josh'),
#            ('Andrew', 'is father of', 'Josh'),
#            ('Linda', 'is mother of', 'Josh')}
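
# A minimal sketch of inspecting the result programmatically. This assumes the
# returned graph exposes `entities`, `edges`, and `relations` attributes, as the
# printed output above suggests; check the library's Graph type for the exact names.
for subject, predicate, obj in sorted(graph_1.relations):
    print(f"{subject} --[{predicate}]--> {obj}")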

# EXAMPLE 2: Large text with chunking and clustering
with open('large_text.txt', 'r') as f:
    large_text = f.read()
# Example input text:
# """
# Neural networks are a type of machine learning model. Deep learning is a subset of machine learning
# that uses multiple layers of neural networks. Supervised learning requires training data to learn
# patterns. Machine learning is a type of AI technology that enables computers to learn from data.
# AI, also known as artificial intelligence, is related to the broader field of artificial intelligence.
# Neural nets (NN) are commonly used in ML applications. Machine learning (ML) has revolutionized
# many fields of study.
# ...
# """
graph_2 = kg.generate(
    input_data=large_text,
    chunk_size=5000,  # Process text in chunks of 5000 chars
    cluster=True      # Cluster similar entities and relations
)
# Output:
# entities={'neural networks', 'deep learning', 'machine learning', 'AI', 'artificial intelligence',
#           'supervised learning', 'unsupervised learning', 'training data', ...}
# edges={'is type of', 'requires', 'is subset of', 'uses', 'is related to', ...}
# relations={('neural networks', 'is type of', 'machine learning'),
#            ('deep learning', 'is subset of', 'machine learning'),
#            ('supervised learning', 'requires', 'training data'),
#            ('machine learning', 'is type of', 'AI'),
#            ('AI', 'is related to', 'artificial intelligence'), ...}
# entity_clusters={
#   'artificial intelligence': {'AI', 'artificial intelligence'},
#   'machine learning': {'machine learning', 'ML'},
#   'neural networks': {'neural networks', 'neural nets', 'NN'},
#   ...
# }
# edge_clusters={
#   'is type of': {'is type of', 'is a type of', 'is a kind of'},
#   'is related to': {'is related to', 'is connected to', 'is associated with'},
#   ...
# }

# EXAMPLE 3: Messages array
messages = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."}
]
graph_3 = kg.generate(input_data=messages)
# Output:
# entities={'Paris', 'France'}
# edges={'has capital'}
# relations={('France', 'has capital', 'Paris')}

# EXAMPLE 4: Combining multiple graphs
text1 = "Linda is Joe's mother. Ben is Joe's brother."
text2 = "Andrew is Joseph's father. Judy is Andrew's sister. Joseph also goes by Joe."

graph4_a = kg.generate(input_data=text1)
graph4_b = kg.generate(input_data=text2)

# Combine the graphs
combined_graph = kg.aggregate([graph4_a, graph4_b])

# Optionally cluster the combined graph
clustered_graph = kg.cluster(
    combined_graph,
    context="Family relationships"
)
# Output:
# entities={'Linda', 'Ben', 'Andrew', 'Joe', 'Joseph', 'Judy'}
# edges={'is mother of', 'is father of', 'is brother of', 'is sister of'}
# relations={('Linda', 'is mother of', 'Joe'),
#            ('Ben', 'is brother of', 'Joe'),
#            ('Andrew', 'is father of', 'Joe'),
#            ('Judy', 'is sister of', 'Andrew')}
# entity_clusters={
#   'Joe': {'Joe', 'Joseph'},
#   ...
# }
# edge_clusters={ ... }
```

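Because model access goes through LiteLLM, you can also point `t2kg` at other providers and override the model, temperature, or API key on a per-call basis (these overrides are listed in the API reference below). A minimal sketch, assuming an Anthropic key is available; the model identifier follows LiteLLM's `provider/model` convention and is illustrative:

```python
from t2kg import T2KG

kg = T2KG(model="openai/gpt-4o")  # default model chosen at construction time

# Override the model, temperature, and API key for a single call.
# The model id below is an illustrative LiteLLM identifier, not a t2kg default.
graph_alt = kg.generate(
    input_data="GPUs accelerate deep learning workloads.",
    model="anthropic/claude-3-5-sonnet-20240620",
    temperature=0.2,
    api_key="YOUR_ANTHROPIC_API_KEY",  # placeholder
)
```
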
## Features

### Chunking Large Texts

For large texts, you can specify a `chunk_size` parameter to process the text in smaller chunks:

```python
graph = kg.generate(
    input_data=large_text,
    chunk_size=5000  # Process in chunks of 5000 characters
)
```

### Clustering Similar Entities and Relations

You can cluster similar entities and relations either during generation or afterwards:

```python
# During generation
graph = kg.generate(
    input_data=text,
    cluster=True,
    context="Optional context to guide clustering"
)

# Or after generation
clustered_graph = kg.cluster(
    graph,
    context="Optional context to guide clustering"
)
```

### Aggregating Multiple Graphs

You can combine multiple graphs using the `aggregate` method:

```python
graph1 = kg.generate(input_data=text1)
graph2 = kg.generate(input_data=text2)
combined_graph = kg.aggregate([graph1, graph2])
```

## API Reference

### T2KG Class

#### Constructor Parameters

- `model`: str = "openai/gpt-4o" - The model to use for generation
- `temperature`: float = 0.0 - Temperature for model sampling
- `api_key`: Optional[str] = None - API key for model access

#### generate() Method Parameters

- `input_data`: Union[str, List[Dict]] - Text string or list of message dicts
- `model`: Optional[str] - Override the default model
- `api_key`: Optional[str] - Override the default API key
- `context`: str = "" - Description of data context
- `chunk_size`: Optional[int] - Size of text chunks to process
- `cluster`: bool = False - Whether to cluster the graph after generation
- `temperature`: Optional[float] - Override the default temperature
- `output_folder`: Optional[str] - Path to save partial progress

A sketch that combines `chunk_size`, `cluster`, and `output_folder` appears at the end of this README.

#### cluster() Method Parameters

- `graph`: Graph - The graph to cluster
- `context`: str = "" - Description of data context
- `model`: Optional[str] - Override the default model
- `temperature`: Optional[float] - Override the default temperature
- `api_key`: Optional[str] - Override the default API key

#### aggregate() Method Parameters

- `graphs`: List[Graph] - List of graphs to combine

## License

The MIT License.
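
As a closing sketch, the snippet below combines the `generate()` parameters from the API reference above, including `output_folder`, which the earlier examples do not use. The file name, context string, and folder path are illustrative, and how partial progress is written to `output_folder` depends on the library's implementation.

```python
from t2kg import T2KG

kg = T2KG(model="openai/gpt-4o", temperature=0.0)

# Long-running job: process a large document in chunks, cluster the result,
# and save partial progress along the way (`output_folder` is documented in
# the API reference above). All literal values here are illustrative.
with open("large_text.txt", "r") as f:
    graph = kg.generate(
        input_data=f.read(),
        context="Machine learning survey",
        chunk_size=5000,
        cluster=True,
        output_folder="t2kg_checkpoints/",
    )

# Assumes the returned graph exposes `entities` and `relations`,
# as the Quick start output suggests.
print(len(graph.entities), "entities,", len(graph.relations), "relations")
```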