{"id":65053,"date":"2025-03-16T18:14:27","date_gmt":"2025-03-16T12:44:27","guid":{"rendered":"https:\/\/www.tothenew.com\/blog\/?p=65053"},"modified":"2025-05-07T19:19:53","modified_gmt":"2025-05-07T13:49:53","slug":"retrieval-augmented-generation-in-java","status":"publish","type":"post","link":"https:\/\/www.tothenew.com\/blog\/retrieval-augmented-generation-in-java\/","title":{"rendered":"Step-by-Step Guide to Implementing RAG with Spring Boot and PostgreSQL"},"content":{"rendered":"<h2><span style=\"text-decoration: underline;\">Introduction<\/span><\/h2>\n<p>Generative AI (Gen AI) has revolutionized how machines generate text, code, images, and more by leveraging deep learning models. However, one of its key limitations is its reliance on pre-trained knowledge, which may become outdated or lack domain-specific insights. This is where <strong>Retrieval-Augmented Generation(RAG)<\/strong> comes into play. RAG enhances Gen AI models capabilities by integrating real time retrieval mechanisms, allowing them to fetch relevant external knowledge before generating their final response. 
This significantly improves accuracy, reduces hallucinations, and ensures responses remain contextually relevant.<\/p>\n<div id=\"attachment_71749\" style=\"width: 390px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-71749\" decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-71749\" src=\"https:\/\/www.tothenew.com\/blog\/wp-ttn-blog\/uploads\/2025\/05\/RAG3.png\" alt=\"RAG Flow - Document Upload and Indexing\" width=\"380\" height=\"281\" srcset=\"\/blog\/wp-ttn-blog\/uploads\/2025\/05\/RAG3.png 380w, \/blog\/wp-ttn-blog\/uploads\/2025\/05\/RAG3-300x222.png 300w\" sizes=\"(max-width: 380px) 100vw, 380px\" \/><p id=\"caption-attachment-71749\" class=\"wp-caption-text\">RAG Flow &#8211; Document Upload and Indexing<\/p><\/div>\n<div id=\"attachment_71750\" style=\"width: 351px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-71750\" decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-71750\" src=\"https:\/\/www.tothenew.com\/blog\/wp-ttn-blog\/uploads\/2025\/05\/RAG4.png\" alt=\"RAG Flow - Query and Response\" width=\"341\" height=\"389\" srcset=\"\/blog\/wp-ttn-blog\/uploads\/2025\/05\/RAG4.png 341w, \/blog\/wp-ttn-blog\/uploads\/2025\/05\/RAG4-263x300.png 263w\" sizes=\"(max-width: 341px) 100vw, 341px\" \/><p id=\"caption-attachment-71750\" class=\"wp-caption-text\">RAG Flow &#8211; Query and Response<\/p><\/div>\n<p>This blog demonstrates how to integrate RAG into any existing LLM-based Chatbot using Spring Boot and PostgreSQL.
Specifically, we will:<\/p>\n<ul>\n<li>Implement a RAG pipeline using a Java-based Spring Boot application.<\/li>\n<li>Integrate a Vector Database, PostgreSQL, to enhance search efficiency.<\/li>\n<li>Enable Vector Search in PostgreSQL.<\/li>\n<li>Showcase the benefits of using RAG through a comparative approach.<\/li>\n<\/ul>\n<h2><span style=\"text-decoration: underline;\">Key Concepts around RAG<\/span><\/h2>\n<ul>\n<li><strong>Vector Database<br \/>\n<\/strong>To achieve RAG, we need an efficient way to store and retrieve high-dimensional representations of data. This is where Vector Databases come into the picture. A Vector Database stores embeddings\u2014numerical representations of text, images, or other data\u2014that facilitate similarity searches. When a query is made, the database retrieves the most relevant embeddings, improving the relevance of the generated content.<\/li>\n<li><strong>Embeddings<br \/>\n<\/strong>Embeddings are dense vector representations of text, images, or videos\u2014arrays of floating-point numbers\u2014that capture semantic meaning and relationships in a lower-dimensional space using pre-trained language models.<\/li>\n<li><strong>Vector Search<br \/>\n<\/strong>A technique used to find similar items by comparing embeddings.<\/li>\n<li><strong>System Prompt<br \/>\n<\/strong>The predefined instruction provided to the AI Model that guides its behavior and response generation.
Below is one of the examples we&#8217;ll be utilizing in our application.<\/li>\n<\/ul>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-71868 size-full\" src=\"https:\/\/www.tothenew.com\/blog\/wp-ttn-blog\/uploads\/2025\/03\/systemContext.jpg\" alt=\"\" width=\"1606\" height=\"170\" srcset=\"\/blog\/wp-ttn-blog\/uploads\/2025\/03\/systemContext.jpg 1606w, \/blog\/wp-ttn-blog\/uploads\/2025\/03\/systemContext-300x32.jpg 300w, \/blog\/wp-ttn-blog\/uploads\/2025\/03\/systemContext-1024x108.jpg 1024w, \/blog\/wp-ttn-blog\/uploads\/2025\/03\/systemContext-768x81.jpg 768w, \/blog\/wp-ttn-blog\/uploads\/2025\/03\/systemContext-1536x163.jpg 1536w, \/blog\/wp-ttn-blog\/uploads\/2025\/03\/systemContext-624x66.jpg 624w\" sizes=\"(max-width: 1606px) 100vw, 1606px\" \/><\/p>\n<h2><span style=\"text-decoration: underline;\"><br \/>\nLet&#8217;s look at implementing RAG with Spring Boot and PostgreSQL<\/span><\/h2>\n<p>Suppose we have an LLM-based Chatbot integrated into our Campaign Management Tool to help users with their queries, and we ask the Chatbot, <strong>&#8220;How to create Campaigns based on a target segment in our Campaign Management Tool?&#8221;<\/strong> The LLM may provide a generic response about creating Marketing, Social Media, or Political Campaigns in general, but it would not be able to explain the steps for creating a Campaign in our Campaign Management Tool, since the model is not aware of them.<\/p>\n<p><strong><em>Initial Postman Response Image (Without RAG) Returning Generic Answer<\/em><\/strong><\/p>\n<div id=\"attachment_71752\" style=\"width: 823px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-71752\" decoding=\"async\" loading=\"lazy\" class=\" wp-image-71752\" src=\"https:\/\/www.tothenew.com\/blog\/wp-ttn-blog\/uploads\/2025\/05\/campaign3.jpg\" alt=\"Search Without RAG\" width=\"813\" height=\"484\"
srcset=\"\/blog\/wp-ttn-blog\/uploads\/2025\/05\/campaign3.jpg 1378w, \/blog\/wp-ttn-blog\/uploads\/2025\/05\/campaign3-300x179.jpg 300w, \/blog\/wp-ttn-blog\/uploads\/2025\/05\/campaign3-1024x611.jpg 1024w, \/blog\/wp-ttn-blog\/uploads\/2025\/05\/campaign3-768x458.jpg 768w, \/blog\/wp-ttn-blog\/uploads\/2025\/05\/campaign3-624x372.jpg 624w\" sizes=\"(max-width: 813px) 100vw, 813px\" \/><p id=\"caption-attachment-71752\" class=\"wp-caption-text\">Search Without RAG<\/p><\/div>\n<p>With the help of RAG, we can pass additional information to the LLM so that it can retrieve the correct information and provide relevant knowledge to the user.<\/p>\n<p>We will build a Spring Boot application using RAG to explore its capabilities.<\/p>\n<p><strong>Prerequisites<\/strong><br \/>\n&#8211; OpenAI API key<br \/>\n&#8211; Postgres 13+ with the vector extension<\/p>\n<p><strong>Technology Stack<\/strong><br \/>\n&#8211; JDK 21<br \/>\n&#8211; Maven<br \/>\n&#8211; Spring Boot 3.4.5<\/p>\n<p>The application will have two key APIs, one for inserting content embeddings and another for performing semantic search on the stored embeddings, as outlined below:<\/p>\n<ol>\n<li>An <strong>\/insert-pdf API<\/strong> that accepts either a file path or a multipart file in the request body. It reads the Campaign FAQs PDF, generates embeddings for each chunk, and stores them in a PostgreSQL table.<\/li>\n<li>When the <strong>\/query API<\/strong> is invoked, the request body includes two keys: &#8216;<strong>language<\/strong>&#8217; and &#8216;<strong>input<\/strong>&#8217;.<\/li>\n<\/ol>\n<ul>\n<li style=\"list-style-type: none;\">\n<ul>\n<li><strong>language<\/strong> &#8211; The response language specified by the user, e.g.
English in this case.<\/li>\n<li><strong>input<\/strong> &#8211; The user prompt submitted to the Chatbot.<\/li>\n<li>The API converts &#8216;input&#8217; into embeddings, performs a similarity search against the Vector column in the table, and retrieves the top N most relevant results\u2014this is the <strong>Retrieval step<\/strong>. These results update the <strong>{pdf_extract}<\/strong> variable in the system context, forming the <strong>Augmentation step<\/strong>. The {<strong>language<\/strong>} variable is also updated based on the request body (optional, but useful for generating responses in the user&#8217;s preferred language).<\/li>\n<li>With both {<strong>pdf_extract<\/strong>} and {<strong>language<\/strong>} updated in the system context, we then pass &#8216;input&#8217; as the user context and call the call() method of Spring AI&#8217;s ChatClient to generate a response using any underlying LLM\u2014this is the <strong>Generation step<\/strong>. This ensures that the output is a precise, contextually relevant response derived from the company&#8217;s internal knowledge base.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>Now, let&#8217;s walk through the detailed steps to build your Spring Boot application with RAG integration.<\/p>\n<h3>Step 1 &#8211; Enabling Vector Search in PostgreSQL<\/h3>\n<p>For our implementation, we are using PostgreSQL as our database and will enable its Vector Database capabilities by installing the pgvector extension.<\/p>\n<p><em>You can refer to the following link for the installation guide: <a href=\"https:\/\/github.com\/pgvector\/pgvector\" target=\"_blank\" rel=\"noopener\">pgvector GitHub Repository<\/a><\/em><\/p>\n<p><strong>Add Vector Extension In Postgres Schema<\/strong><\/p>\n<ol>\n<li>Open PgAdmin, right-click on the schema, select &#8220;Create,&#8221; and then choose &#8220;Extension.&#8221;<\/li>\n<\/ol>\n<div id=\"attachment_65639\" style=\"width: 507px\" class=\"wp-caption
aligncenter\"><img aria-describedby=\"caption-attachment-65639\" decoding=\"async\" loading=\"lazy\" class=\" wp-image-65639\" src=\"https:\/\/www.tothenew.com\/blog\/wp-ttn-blog\/uploads\/2024\/09\/pgadmin.jpg\" alt=\"System Context\" width=\"497\" height=\"304\" srcset=\"\/blog\/wp-ttn-blog\/uploads\/2024\/09\/pgadmin.jpg 826w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/pgadmin-300x183.jpg 300w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/pgadmin-768x470.jpg 768w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/pgadmin-624x382.jpg 624w\" sizes=\"(max-width: 497px) 100vw, 497px\" \/><p id=\"caption-attachment-65639\" class=\"wp-caption-text\">pgadmin<\/p><\/div>\n<p>2. Search for the &#8216;Vector&#8217; extension and add it. Once added, the &#8216;vector&#8217; extension will be available in your schema, allowing you to use it as a datatype to store embeddings of your data.<\/p>\n<div id=\"attachment_65640\" style=\"width: 400px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-65640\" decoding=\"async\" loading=\"lazy\" class=\" wp-image-65640\" src=\"https:\/\/www.tothenew.com\/blog\/wp-ttn-blog\/uploads\/2024\/09\/pgadmin-vector.jpg\" alt=\"pgadmin-vector\" width=\"390\" height=\"429\" srcset=\"\/blog\/wp-ttn-blog\/uploads\/2024\/09\/pgadmin-vector.jpg 622w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/pgadmin-vector-273x300.jpg 273w\" sizes=\"(max-width: 390px) 100vw, 390px\" \/><p id=\"caption-attachment-65640\" class=\"wp-caption-text\">pgadmin-vector<\/p><\/div>\n<h3><strong>Step 2 &#8211; Create Vector Embeddings For Our Custom Data<\/strong><\/h3>\n<p>Consider that our steps to create a Campaign is available in the help guide. So we will create embeddings of our guide and then create vectors for the text.<\/p>\n<p>We will use our Spring Boot application to insert embeddings to our Vector Database i.e. 
PostgreSQL.<\/p>\n<ol>\n<li>Let&#8217;s specify the OpenAI API key we will use to connect to our LLM Model in the application.properties file.\n<div style=\"background-color: #f5f7fa; border-radius: 6px; font-family: 'Courier New', monospace; font-size: 14px; line-height: 1.6; border-left: 4px solid #4CAF50;\">\n<pre><code>\r\n spring.ai.openai.api-key=${OPENAI_KEY}\r\n spring.ai.openai.chat.model=gpt-4o\r\n\r\n<\/code><\/pre>\n<\/div>\n<p>Here, OPENAI_KEY is fetched from an environment variable, and the LLM Model being used is GPT-4o.<\/p>\n<\/li>\n<li>Convert the Help Guide PDF paragraph chunks into embeddings using the OpenAI Embedding Model.\n<div style=\"background-color: #f5f7fa; border-radius: 6px; font-family: 'Courier New', monospace; font-size: 14px; line-height: 1.6; border-left: 4px solid #4CAF50;\">\n<pre><code>\r\n  @Autowired\r\n  private EmbeddingModel embeddingModel;\r\n\r\n  private void storeTextChunksInPostgres(String chunk, long sequenceNumber) throws IOException {\r\n    float[] vector = embeddingModel.embed(chunk);\r\n    pgVectorService.insertRecord(sequenceNumber, chunk, vector);\r\n  }\r\n\r\n<\/code><\/pre>\n<\/div>\n<\/li>\n<li>Add the embeddings of each chunk into the table.\n<div style=\"background-color: #f5f7fa; border-radius: 6px; font-family: 'Courier New', monospace; font-size: 14px; line-height: 1.6; border-left: 4px solid #4CAF50;\">\n<pre><code>\r\n  public void insertRecord(Long id, String content, float[] contentEmbeddings) {\r\n   List&lt;Float&gt; floatList = new ArrayList&lt;&gt;();\r\n   for (float value : contentEmbeddings) {\r\n     floatList.add(value);\r\n   }\r\n   String contentEmbeddingsStr = floatList.toString().replace(\"[\", \"{\").replace(\"]\", \"}\");\r\n   jdbcClient.sql(\"INSERT INTO campaign_embeddings (id, content, content_embeddings) VALUES (:id, :content, :content_embeddings::double precision[])\")\r\n   .param(\"id\", id)\r\n   .param(\"content\", content)\r\n   .param(\"content_embeddings\",
contentEmbeddingsStr)\r\n   .update();\r\n   }\r\n\r\n<\/code><\/pre>\n<\/div>\n<\/li>\n<\/ol>\n<p>This concludes the steps for converting our PDF file to vector embeddings.<\/p>\n<h3>Step 3 &#8211; Performing Similarity Search in our Vector Database (PostgreSQL)<\/h3>\n<p>Since the required data has already been added to our Vector Database, we can now use it to perform a Similarity Search on the vector table from our Spring Boot application. This step can also be called the R in RAG.<\/p>\n<ol>\n<li>A user can provide any search prompt to the Chatbot API.<\/li>\n<li>The prompt is passed to the searchCampaignEmbeddings method for performing the Similarity Search.<\/li>\n<li>The OpenAI Embedding Model converts the user input into an embedding vector.<\/li>\n<li>This embedding vector is then used to perform a similarity search against the vector column in the table.<\/li>\n<li>pgvector&#8217;s cosine distance operator returns lower values for more similar vectors; the query converts the distance into a similarity score (1 minus the distance), so higher values mean higher similarity.<\/li>\n<li>A threshold condition (MATCH_THRESHOLD) ensures only sufficiently similar results are included.<\/li>\n<li>The results are sorted by similarity and limited to a set number (MATCH_CNT).<\/li>\n<li>The query is executed, returning a list of matching content strings.<\/li>\n<\/ol>\n<p>In summary, this method retrieves the most relevant content from the campaign_embeddings table by comparing vector representations of the input prompt and stored data.<\/p>\n<div style=\"background-color: #f5f7fa; border-radius: 6px; font-family: 'Courier New', monospace; font-size: 14px; line-height: 1.6; border-left: 4px solid #4CAF50;\">\n<pre> \r\n  public List&lt;String&gt; searchCampaignEmbeddings(String prompt) {\r\n   float[] promptEmbedding = embeddingModel.embed(prompt);\r\n   List&lt;Float&gt; userPromptEmbeddings = new ArrayList&lt;&gt;();\r\n   for (float value : promptEmbedding) {\r\n     userPromptEmbeddings.add(value);\r\n   }\r\n\r\n   JdbcClient.StatementSpec query = jdbcClient.sql(\r\n   
\"SELECT content \" +\r\n   \"FROM campaign_embeddings WHERE 1 - (content_embeddings :user_promt::vector) &gt; :match_threshold \"\r\n   +\r\n   \"ORDER BY content_embeddings :user_promt::vector LIMIT :match_cnt\")\r\n   .param(\"user_promt\", userPromptEmbeddings.toString())\r\n   .param(\"match_threshold\", MATCH_THRESHOLD)\r\n   .param(\"match_cnt\", MATCH_CNT);\r\n \r\n   return query.query(String.class).list();\r\n   }\r\n\r\n<\/pre>\n<\/div>\n<h3>Step 4 &#8211; Calling the LLM Model and provide the RAG context to it.<\/h3>\n<ol>\n<li>The top N similarity text returned from above Step 3 are passed as context in the &#8220;pdf_extract&#8221; variable in our system prompt.<\/li>\n<li>The system context is then updated by setting &#8216;pdf_extract&#8217; with the similarity results and &#8216;language&#8217; with the specified method parameter. This step can also be called the A in RAG.<\/li>\n<li>The Spring AI&#8217;s ChatClient object calls the LLM model (In our case it is Open AI), using the updated system context as the system prompt and the user&#8217;s input as the user prompt.<\/li>\n<li>The output will then provide an answer to the user&#8217;s query. 
This step is called the G in RAG.<\/li>\n<\/ol>\n<div style=\"background-color: #f5f7fa; border-radius: 6px; font-family: 'Courier New', monospace; font-size: 14px; line-height: 1.6; border-left: 4px solid #4CAF50;\">\n<pre><code>\r\n   public String searchIndex(String input, String language) throws IOException {\r\n        List&lt;String&gt; contextList = pgVectorService.searchCampaignEmbeddings(input);\r\n        String context = contextList.stream().collect(Collectors.joining(\"\\n\"));\r\n        return chatClient.prompt()\r\n                .system(s -&gt; {\r\n                    s.text(extKnowledgeBasePdf);\r\n                    s.param(\"pdf_extract\", context);\r\n                    s.param(\"language\", language);\r\n                })\r\n                .user(u -&gt; {\r\n                    u.text(input);\r\n                })\r\n                .call().content();\r\n   }\r\n\r\n<\/code><\/pre>\n<\/div>\n<p><b><i>Final Postman Response Image Displaying Information Retrieved by LLM Using RAG<\/i><\/b><\/p>\n<div id=\"attachment_71753\" style=\"width: 842px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-71753\" decoding=\"async\" loading=\"lazy\" class=\" wp-image-71753\" src=\"https:\/\/www.tothenew.com\/blog\/wp-ttn-blog\/uploads\/2025\/05\/campaign4.jpg\" alt=\"Search with RAG\" width=\"832\" height=\"354\" srcset=\"\/blog\/wp-ttn-blog\/uploads\/2025\/05\/campaign4.jpg 1387w, \/blog\/wp-ttn-blog\/uploads\/2025\/05\/campaign4-300x128.jpg 300w, \/blog\/wp-ttn-blog\/uploads\/2025\/05\/campaign4-1024x436.jpg 1024w, \/blog\/wp-ttn-blog\/uploads\/2025\/05\/campaign4-768x327.jpg 768w, \/blog\/wp-ttn-blog\/uploads\/2025\/05\/campaign4-624x265.jpg 624w\" sizes=\"(max-width: 832px) 100vw, 832px\" \/><p id=\"caption-attachment-71753\" class=\"wp-caption-text\">Search with RAG<\/p><\/div>\n<h2><span style=\"text-decoration: underline;\">Conclusion<\/span><\/h2>\n<p>By combining Gen AI with RAG and vector databases like PostgreSQL
with pgvector, we enhance the accuracy and relevance of AI-generated content. This setup enables real-time external knowledge retrieval, reducing hallucinations and improving response quality. Whether for chatbots, search engines, or domain-specific AI, RAG bridges the gap between static and dynamic knowledge.<\/p>\n<h2>GitHub Repositories Reference<\/h2>\n<p>For a hands-on implementation of RAG using PostgreSQL as a Vector Database, check out the following repository:<\/p>\n<ul>\n<li>GitHub: <a href=\"https:\/\/github.com\/slashadarsh\/semantic-search\">semantic-search<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Introduction Generative AI (Gen AI) has revolutionized how machines generate text, code, images, and more by leveraging deep learning models. However, one of its key limitations is its reliance on pre-trained knowledge, which may become outdated or lack domain-specific insights. This is where Retrieval-Augmented Generation (RAG) comes into play.
RAG enhances Gen AI models capabilities by [&hellip;]<\/p>\n","protected":false},"author":1909,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"iawp_total_views":1197},"categories":[5867,446],"tags":[4844,6925,1114,942,6408,4841,2072,6962],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/65053"}],"collection":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/users\/1909"}],"replies":[{"embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/comments?post=65053"}],"version-history":[{"count":159,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/65053\/revisions"}],"predecessor-version":[{"id":71887,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/65053\/revisions\/71887"}],"wp:attachment":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/media?parent=65053"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/categories?post=65053"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/tags?post=65053"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}