{"id":69241,"date":"2025-01-16T10:05:11","date_gmt":"2025-01-16T04:35:11","guid":{"rendered":"https:\/\/www.tothenew.com\/blog\/?p=69241"},"modified":"2025-01-16T11:27:22","modified_gmt":"2025-01-16T05:57:22","slug":"how-to-develop-a-rag-based-search-system-and-connect-it-to-drupal-cms","status":"publish","type":"post","link":"https:\/\/www.tothenew.com\/blog\/how-to-develop-a-rag-based-search-system-and-connect-it-to-drupal-cms\/","title":{"rendered":"How to Develop a RAG-Based Search System and Connect It to Drupal CMS"},"content":{"rendered":"<h2>Introduction<\/h2>\n<p>In today\u2019s information-driven world, delivering precise and contextually relevant search results is a critical challenge. This is where <strong>RAG (Retrieval-Augmented Generation)<\/strong> systems shine, combining the power of retrieval-based approaches with advanced generation models to enhance search accuracy and relevance.<\/p>\n<p>In this blog, we\u2019ll explore how to build a robust RAG-based search system using a <strong>Vector Database<\/strong>\u2014a cutting-edge technology designed for handling high-dimensional data like embeddings from machine learning models. 
We\u2019ll also demonstrate how to integrate this advanced search system seamlessly with <strong>Drupal CMS<\/strong>, a popular content management platform, ensuring a smooth user experience for both developers and end-users.<\/p>\n<p>By the end of this guide, you\u2019ll gain insights into:<\/p>\n<ul style=\"list-style-type: square;\">\n<li>The fundamentals of RAG-based search systems and their benefits.<\/li>\n<li>How vector databases work and why they are ideal for RAG.<\/li>\n<li>Step-by-step integration of a RAG system with Drupal CMS to optimize content discovery.<\/li>\n<\/ul>\n<h2>What is a RAG System?<\/h2>\n<p>A <em>Retrieval-Augmented Generation<\/em> (RAG) system uses two main components:<\/p>\n<ol>\n<li><strong>Retrieval Component:<\/strong> Fetches relevant documents or data from a dataset based on a query.<\/li>\n<li><strong>Generation Component:<\/strong> Uses the retrieved documents to generate a response that is contextually accurate and relevant.<\/li>\n<\/ol>\n<h2>Why Use Qdrant?<\/h2>\n<p>Qdrant is an open-source vector database designed for real-time search and scalable applications. It\u2019s optimized for handling large-scale data with high performance and offers features like:<\/p>\n<ul style=\"list-style-type: square;\">\n<li>High-speed vector similarity search<\/li>\n<li>Scalability and distribution<\/li>\n<li>Advanced filtering and querying capabilities<\/li>\n<\/ul>\n<h2>Integrating Qdrant with a RAG System<\/h2>\n<h3>Step 1: Setting Up Qdrant<\/h3>\n<ul>\n<li><strong>Installation<\/strong>: Follow the <a href=\"https:\/\/qdrant.tech\/documentation\/quickstart\/\">Qdrant installation<\/a> guide to set up Qdrant on your server.<\/li>\n<li><strong>Configuration<\/strong>: Configure Qdrant based on your requirements, such as the number of shards and replicas for distributed setups.<\/li>\n<\/ul>\n<h3>Step 2: Prepare Your Data<\/h3>\n<p>To make your RAG system accessible, you can wrap its functionality in a FastAPI-based REST API.
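<\/p>
<p>Before the API can index anything, the raw text has to be prepared: long documents are split into overlapping chunks and only then embedded. The snippet below is a stdlib-only sketch of that idea (illustrative only; the actual implementation later in this post uses LangChain\u2019s RecursiveCharacterTextSplitter with chunk_size=900 and chunk_overlap=150):<\/p>

```python
def chunk_text(text, chunk_size=900, overlap=150):
    """Split text into fixed-size character chunks that share `overlap` characters."""
    # Advance by less than chunk_size so consecutive chunks share context.
    step = chunk_size - overlap
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), step)]
    # Drop any empty trailing chunk produced by the final range step.
    return [c for c in chunks if c]

# Toy example: 1,400 characters of text, 100-character chunks, 20-character overlap.
sections = chunk_text("drupal " * 200, chunk_size=100, overlap=20)
```

<p>Because consecutive chunks overlap, a sentence that straddles a chunk boundary still appears intact in at least one chunk, which matters for retrieval quality.<\/p>
<p>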
FastAPI is lightweight, easy to use, and ideal for deploying APIs.<\/p>\n<p><strong>API Structure<\/strong><br \/>\nWe&#8217;ll create two endpoints:<\/p>\n<ul>\n<li><strong>Feed API:<\/strong> Accepts data, generates embeddings, and uploads them to Qdrant.<\/li>\n<li><strong>Ask API:<\/strong> Accepts a user query, retrieves relevant context from Qdrant, and returns an AI-generated response.<\/li>\n<\/ul>\n<p>Here\u2019s how you can implement the FastAPI app (main.py):<\/p>\n<pre>from fastapi import FastAPI, HTTPException\r\nfrom pydantic import BaseModel, ValidationError\r\nimport logging\r\nimport numpy as np\r\nfrom vector_db_manager import QdrantManager\r\nfrom config import Config\r\nfrom data_chunk_manager import DataChunkManager\r\n\r\nlogging.basicConfig(level=logging.DEBUG)\r\n\r\napp = FastAPI()\r\napp_logger = logging.getLogger(__name__)\r\n\r\n# Chunk the incoming text, embed it, and upload the vectors to Qdrant.\r\ndef add_docs(data: str):\r\n    chunk_manager = DataChunkManager()\r\n    document = chunk_manager.create_document([data])\r\n    embeddings = chunk_manager.generate_embeddings(document)\r\n    embeddings_array = np.array(embeddings)\r\n    if embeddings_array.ndim == 2 and embeddings_array.shape[1] &gt; 0:\r\n        vector_size = embeddings_array.shape[1]\r\n        qdrant_manager = QdrantManager(collection_name=Config.collection_name, vector_size=vector_size)\r\n        return qdrant_manager.upload_documents(embeddings, document)\r\n    app_logger.error(\"Unable to create the embeddings. The embeddings array is empty or malformed.\")\r\n    raise HTTPException(status_code=500, detail=\"Embedding generation failed.\")\r\n\r\n# Embed the query, retrieve matching context from Qdrant, and ask the LLM.\r\ndef find_docs(query):\r\n    chunk_manager = DataChunkManager()\r\n    query_embedding = chunk_manager.query_embeddings(query)\r\n    qdrant_manager = QdrantManager()\r\n    query_context = qdrant_manager.search_query(query_embedding)\r\n    # Build a prompt from the retrieved context and the original question.\r\n    query_prompt = chunk_manager.create_prompt(query_context, query)\r\n    ai_answer = qdrant_manager.get_ai_response(query_prompt)\r\n    return ai_answer\r\n\r\nclass FeedData(BaseModel):\r\n    data: str\r\n\r\nclass SearchData(BaseModel):\r\n    data: str\r\n\r\n@app.post(\"\/ask\")\r\ndef ask(search_data: SearchData):\r\n    query = search_data.data\r\n    ai_answer = find_docs(query)\r\n    return {\"response\": ai_answer}\r\n\r\n@app.post(\"\/feed\/add\")\r\ndef feed_add(feed_data: FeedData):\r\n    try:\r\n        data = feed_data.data\r\n        ids = add_docs(data=data)\r\n        return {\"response\": \"Document successfully added.\", \"doc_uuids\": ids}\r\n    except ValidationError:\r\n        raise HTTPException(status_code=400, detail=\"Invalid input data.\")\r\n\r\nif __name__ == \"__main__\":\r\n    import uvicorn\r\n\r\n    uvicorn.run(app, host=Config.HOST, port=Config.PORT, reload=Config.DEBUG)\r\n<\/pre>\n<h2>Understanding Chunking and Embedding in RAG<\/h2>\n<p>Chunking and embedding are critical steps in building a Retrieval-Augmented Generation (RAG) system. These processes allow large documents to be broken down into manageable pieces and transformed into vector representations for efficient similarity search.<\/p>\n<h4>What is Chunking?<\/h4>\n<p>Chunking involves breaking down a large piece of text into smaller, more manageable sections (or &#8220;chunks&#8221;).
This ensures that:<\/p>\n<ul>\n<li>The data remains contextually relevant.<\/li>\n<li>It avoids exceeding token limits when generating embeddings or querying a model.<\/li>\n<li>It improves retrieval precision by associating queries with specific parts of the document.<\/li>\n<\/ul>\n<h4><strong>How Chunking Works<\/strong><\/h4>\n<h5><span style=\"text-decoration: underline;\">Splitting the Text<\/span><\/h5>\n<ul style=\"list-style-type: circle;\">\n<li>Text is divided into smaller chunks based on a predefined strategy, such as sentence boundaries, paragraph lengths, or token counts.<\/li>\n<li>For instance, a chunk might be limited to 500 tokens to ensure compatibility with models like OpenAI embeddings.<\/li>\n<\/ul>\n<h5><span style=\"text-decoration: underline;\">Preserving Context<\/span><\/h5>\n<ul style=\"list-style-type: circle;\">\n<li>Overlap between chunks is sometimes added to retain context between sections.<\/li>\n<\/ul>\n<p>Here\u2019s how you can implement the chunking and embeddings (data_chunk_manager.py):<\/p>\n<pre>from langchain.text_splitter import RecursiveCharacterTextSplitter\r\nfrom langchain.vectorstores.utils import filter_complex_metadata\r\nfrom config import Config\r\nimport openai\r\n\r\n# Note: this module uses the legacy (pre-1.0) OpenAI Python SDK interface.\r\nclass DataChunkManager:\r\n\r\n    def __init__(self):\r\n        self.text_splitter = RecursiveCharacterTextSplitter(chunk_size=900, chunk_overlap=150)\r\n\r\n    def generate_chunk(self, document):\r\n        chunks = self.text_splitter.split_documents(document)\r\n        chunks = filter_complex_metadata(chunks)\r\n        return chunks\r\n\r\n    def create_document(self, text):\r\n        document = self.text_splitter.create_documents(text)\r\n        return document\r\n\r\n    def generate_embeddings(self, texts):\r\n        openai.api_key = Config.api_key\r\n        model_name = Config.embedding_model\r\n        texts_content = [doc.page_content for doc in texts]\r\n        response = openai.Embedding.create(\r\n            model=model_name,\r\n            input=texts_content\r\n        )\r\n        embeddings = [item['embedding'] for item in response['data']]\r\n        return embeddings\r\n\r\n    def query_embeddings(self, query):\r\n        openai.api_key = Config.api_key\r\n        model_name = Config.embedding_model\r\n        query_response = openai.Embedding.create(\r\n            model=model_name,\r\n            input=query\r\n        )\r\n        query_embedding = query_response['data'][0]['embedding']\r\n        return query_embedding\r\n\r\n    # Referenced from main.py: combine the retrieved context and the user\r\n    # question into a single prompt (a minimal template; adjust as needed).\r\n    def create_prompt(self, context, query):\r\n        prompt = (\r\n            \"Use the following context to answer the question.\\n\\n\"\r\n            \"Context:\\n\" + context + \"\\n\\nQuestion: \" + query\r\n        )\r\n        return prompt\r\n<\/pre>\n<h3><strong>Step 3: Upload Data to Qdrant &amp; Query the Database<\/strong><\/h3>\n<h4>Upload Data to Qdrant<\/h4>\n<p>Use the upload_documents function to:<\/p>\n<ul>\n<li>Create a unique ID for each chunk.<\/li>\n<li>Store the vector embeddings and the associated text in Qdrant.<\/li>\n<\/ul>\n<h4>When a user query is received<\/h4>\n<ul>\n<li>Generate its vector embedding.<\/li>\n<li>Perform a similarity search in Qdrant using search_query.<\/li>\n<li>Retrieve the most relevant text chunks for context.<\/li>\n<\/ul>\n<p>Here\u2019s the implementation (vector_db_manager.py):<\/p>\n<pre>from qdrant_client import QdrantClient\r\nfrom qdrant_client.http.models import Distance, VectorParams, PointStruct\r\nimport uuid\r\nimport openai\r\nfrom config import Config\r\nimport json\r\n\r\nclass QdrantManager:\r\n    def __init__(self, collection_name=Config.collection_name, vector_size=Config.vector_size_ai_model, url=Config.qdrant_host):\r\n        self.client = QdrantClient(url=url)\r\n        self.collection_name = collection_name\r\n        self.vector_size = vector_size\r\n        self._ensure_collection_exists()\r\n\r\n    def _ensure_collection_exists(self):\r\n        try:\r\n            self.client.get_collection(self.collection_name)\r\n            print(f\"Collection '{self.collection_name}' already exists.\")\r\n        except Exception:\r\n            # get_collection raises when the collection is missing, so create it.\r\n            self.client.create_collection(\r\n                collection_name=self.collection_name,\r\n                vectors_config=VectorParams(size=self.vector_size, distance=Distance.COSINE)\r\n            )\r\n            print(f\"Collection '{self.collection_name}' created successfully.\")\r\n\r\n    def upload_documents(self, embeddings, texts):\r\n        texts_content = [doc.page_content for doc in texts]\r\n        payloads = [{'text': text} for text in texts_content]\r\n        uploaded_ids = []\r\n        for embedding, payload in zip(embeddings, payloads):\r\n            unique_id = str(uuid.uuid4())\r\n            self.client.upsert(\r\n                collection_name=self.collection_name,\r\n                points=[\r\n                    PointStruct(\r\n                        id=unique_id,\r\n                        vector=embedding,\r\n                        payload=payload\r\n                    )\r\n                ]\r\n            )\r\n            uploaded_ids.append(unique_id)\r\n        print(\"Documents and embeddings uploaded successfully.\")\r\n        return json.dumps(uploaded_ids)\r\n\r\n    def search_query(self, query_embedding, limit=3):\r\n        search_results = self.client.search(\r\n            collection_name=self.collection_name,\r\n            query_vector=query_embedding,\r\n            limit=limit\r\n        )\r\n        # Retrieve and concatenate the contexts from the search results.\r\n        retrieved_context = \"\\n\\n\".join([result.payload['text'] for result in search_results])\r\n        return retrieved_context\r\n\r\n    # Uses the legacy (pre-1.0) OpenAI SDK chat completion interface.\r\n    def get_ai_response(self, prompt: str, max_tokens: int = 150, temperature: float = 0.2) -&gt; str:\r\n        if Config.api_key:\r\n            openai.api_key = Config.api_key\r\n        else:\r\n            raise ValueError(\"API key must be provided.\")\r\n\r\n        model = Config.ai_model\r\n        response = openai.ChatCompletion.create(\r\n            model=model,\r\n            messages=[\r\n                {\"role\": \"user\", \"content\": prompt}\r\n            ],\r\n            max_tokens=max_tokens,\r\n            temperature=temperature,\r\n            top_p=1,\r\n            n=1\r\n        )\r\n\r\n        # Extract the answer from the response.\r\n        answer = response['choices'][0]['message']['content'].strip()\r\n        return answer\r\n<\/pre>\n<h3><strong>Step 4: Test Your RAG Workflow<\/strong><\/h3>\n<ul>\n<li><strong>Add Data:<\/strong> Use the <em>\/feed\/add<\/em> API to upload documents.<\/li>\n<li><strong>Ask Questions:<\/strong> Use the <em>\/ask<\/em> API to query the system and receive AI-generated answers.<\/li>\n<\/ul>\n<h3><strong>Step 5: Integrating RAG API with Drupal<\/strong><\/h3>\n<p>Integrating your Retrieval-Augmented Generation (RAG) system with Drupal allows you to leverage its content management capabilities while enhancing it with intelligent retrieval and response generation. By connecting your FastAPI service to a Drupal instance, you can enable dynamic querying and content enhancement directly from your RAG system.<\/p>\n<p>Edit or create the rag_integration.module file in your custom module folder:<\/p>\n<pre>&lt;?php\r\n\r\nuse Drupal\\Core\\Entity\\EntityInterface;\r\n\r\n\/**\r\n * Implements hook_entity_presave().\r\n *\/\r\nfunction rag_integration_entity_presave(EntityInterface $entity) {\r\n  \/\/ Check if the entity is of type 'node' and the content type is 'article'.
if ($entity-&gt;getEntityTypeId() === 'node' &amp;&amp; $entity-&gt;bundle() === 'article') {\r\n    \/\/ Extract the body field value.\r\n    $body = $entity-&gt;get('body')-&gt;value;\r\n\r\n    \/\/ Ensure the body is not empty before calling the API.\r\n    if (!empty($body)) {\r\n      \/\/ Call the \/feed\/add API (supply the host and port of your RAG API service).\r\n      $rag_api_url = 'http:\/\/:\/feed\/add';\r\n      $client = \\Drupal::httpClient();\r\n\r\n      try {\r\n        $response = $client-&gt;post($rag_api_url, [\r\n          'json' =&gt; ['data' =&gt; $body],\r\n        ]);\r\n        $result = json_decode($response-&gt;getBody(), TRUE);\r\n\r\n        \/\/ Optionally, log the response or handle errors.\r\n        if (isset($result['response'])) {\r\n          \\Drupal::logger('rag_integration')-&gt;info('Content added to RAG: @response', ['@response' =&gt; $result['response']]);\r\n        } else {\r\n          \\Drupal::logger('rag_integration')-&gt;error('Failed to add content to RAG.');\r\n        }\r\n      } catch (\\Exception $e) {\r\n        \\Drupal::logger('rag_integration')-&gt;error('Error calling \/feed\/add API: @message', ['@message' =&gt; $e-&gt;getMessage()]);\r\n      }\r\n    }\r\n  }\r\n}\r\n<\/pre>\n<p>Similarly, we can create a page in Drupal that calls the <em>\/ask<\/em> API to fetch responses backed by the Qdrant database.<\/p>\n<h3>RAG Architecture<\/h3>\n<p>RAG (Retrieval-Augmented Generation) integrates information retrieval with language generation, enabling AI to fetch relevant data from external sources and generate accurate, context-rich responses.
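<\/p>
<p>Under the hood, the retrieval step is a nearest-neighbour search over embedding vectors; Qdrant performs it at scale using cosine distance (Distance.COSINE in vector_db_manager.py). Here is a toy, stdlib-only sketch of cosine-similarity retrieval, with made-up 2-d vectors standing in for real OpenAI embeddings:<\/p>

```python
import math

def cosine_similarity(u, v):
    # dot(u, v) / (|u| * |v|)
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve(query_vector, documents, limit=3):
    # Rank (text, vector) pairs by similarity to the query, best first --
    # the same idea QdrantManager.search_query delegates to Qdrant.
    ranked = sorted(documents, key=lambda d: cosine_similarity(query_vector, d[1]), reverse=True)
    return [text for text, _ in ranked[:limit]]

docs = [
    ("Drupal is a content management system.", (0.9, 0.1)),
    ("Qdrant stores high-dimensional vectors.", (0.7, 0.6)),
    ("Recipe: how to bake bread.", (0.0, 1.0)),
]
top = retrieve((1.0, 0.0), docs, limit=2)
```

<p>In production the vectors come from the embedding model and the ranking happens inside Qdrant, but the principle is identical.<\/p>
<p>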
It combines a retriever model for knowledge fetching and a generator model for response creation.<\/p>\n<div id=\"attachment_69406\" style=\"width: 635px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-69406\" decoding=\"async\" loading=\"lazy\" class=\"wp-image-69406 size-large\" src=\"https:\/\/www.tothenew.com\/blog\/wp-ttn-blog\/uploads\/2025\/01\/rag_diagram-1024x583.jpg\" alt=\"RAG Architecture\" width=\"625\" height=\"356\" srcset=\"\/blog\/wp-ttn-blog\/uploads\/2025\/01\/rag_diagram-1024x583.jpg 1024w, \/blog\/wp-ttn-blog\/uploads\/2025\/01\/rag_diagram-300x171.jpg 300w, \/blog\/wp-ttn-blog\/uploads\/2025\/01\/rag_diagram-768x437.jpg 768w, \/blog\/wp-ttn-blog\/uploads\/2025\/01\/rag_diagram-1536x875.jpg 1536w, \/blog\/wp-ttn-blog\/uploads\/2025\/01\/rag_diagram-624x355.jpg 624w, \/blog\/wp-ttn-blog\/uploads\/2025\/01\/rag_diagram.jpg 1672w\" sizes=\"(max-width: 625px) 100vw, 625px\" \/><p id=\"caption-attachment-69406\" class=\"wp-caption-text\">RAG Architecture<\/p><\/div>\n<h2>Conclusion<\/h2>\n<p>Integrating a Retrieval-Augmented Generation (RAG) system with Flask APIs and Drupal allows seamless content ingestion, semantic search, and dynamic content augmentation. By leveraging tools like Qdrant, LangChain, and Drupal, we created an efficient system for enhanced data retrieval and personalized user experiences, enabling scalable AI-driven content delivery.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction In today\u2019s information-driven world, delivering precise and contextually relevant search results is a critical challenge. This is where RAG (Retrieval-Augmented Generation) systems shine, combining the power of retrieval-based approaches with advanced generation models to enhance search accuracy and relevance. 
In this blog, we\u2019ll explore how to build a robust RAG-based search system using a [&hellip;]<\/p>\n","protected":false},"author":1509,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"iawp_total_views":335},"categories":[5871],"tags":[6963,4862,6966,6965,6964,6408],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/69241"}],"collection":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/users\/1509"}],"replies":[{"embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/comments?post=69241"}],"version-history":[{"count":17,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/69241\/revisions"}],"predecessor-version":[{"id":69422,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/69241\/revisions\/69422"}],"wp:attachment":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/media?parent=69241"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/categories?post=69241"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/tags?post=69241"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}