{"id":75380,"date":"2025-09-17T11:28:41","date_gmt":"2025-09-17T05:58:41","guid":{"rendered":"https:\/\/www.tothenew.com\/blog\/?p=75380"},"modified":"2025-09-23T10:50:08","modified_gmt":"2025-09-23T05:20:08","slug":"my-experiments-with-genai-2","status":"publish","type":"post","link":"https:\/\/www.tothenew.com\/blog\/my-experiments-with-genai-2\/","title":{"rendered":"Finding the Right GenAI Model for the Right Task"},"content":{"rendered":"<p><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-75381\" src=\"https:\/\/www.tothenew.com\/blog\/wp-ttn-blog\/uploads\/2025\/09\/Generative-AI-1.webp\" alt=\"Generative AI\" width=\"978\" height=\"550\" srcset=\"\/blog\/wp-ttn-blog\/uploads\/2025\/09\/Generative-AI-1.webp 978w, \/blog\/wp-ttn-blog\/uploads\/2025\/09\/Generative-AI-1-300x169.webp 300w, \/blog\/wp-ttn-blog\/uploads\/2025\/09\/Generative-AI-1-768x432.webp 768w, \/blog\/wp-ttn-blog\/uploads\/2025\/09\/Generative-AI-1-624x351.webp 624w\" sizes=\"(max-width: 978px) 100vw, 978px\" \/><\/p>\n<h1><strong>Where It All Began<\/strong><\/h1>\n<p>The inspiration for exploring this topic arose while developing a proof of concept (POC) to generate accurate graphical reports and charts from quantitative data.<\/p>\n<p>Naturally, the first thought was GPT. It\u2019s everywhere, the \u201cdefault\u201d AI for almost any task. ChatGPT was tried first. It worked to an extent, rendering text-based charts or generating HTML\/Python code that would draw one. But here\u2019s the catch: it wouldn\u2019t provide the final chart image. Copying the code into another tool just to get the actual chart defeated the purpose.<\/p>\n<p>Next came DALL\u00b7E. Since it\u2019s also from OpenAI, perhaps it could do the job. Well\u2026 no. Beautiful, creative pictures? Absolutely. Structured data-based charts? Not really. 
What seemed like an easy win quickly turned into a mini quest.<\/p>\n<hr \/>\n<h1><strong>Hunting for the Right Model<\/strong><\/h1>\n<p>Several models were tried in turn: GPT-3.5, DALL\u00b7E, Gemini 1.5 Flash, and DeepSeek R1. None produced the desired result.<\/p>\n<p>Then Claude by Anthropic entered the picture, specifically Claude Sonnet 4. Expectations were low, but the first attempt was a surprise: the chart came out clean, accurate, and properly labeled, exactly what was needed.<\/p>\n<p>Even with GPT-5 making waves, Claude remains the go-to choice for this type of work. Claude Opus 4 in particular takes a careful approach: it checks the dataset structure, avoids mismatched axes, and even explains why a certain type of chart was chosen.<\/p>\n<p>Of course, with AI evolving so rapidly, a new model could take the lead at any moment.<\/p>\n<hr \/>\n<h1><strong>The PDF Text Challenge<\/strong><\/h1>\n<p>Another challenge involved extracting raw text from PDFs, word for word. It sounds simple, but GPT consistently summarized the content instead of returning the full text, and sometimes cut sections short.<\/p>\n<p>After multiple trials, Gemini 1.5 Flash proved to be the most effective, with around 95% accuracy in tests. The trade-off: large files had to be split into batches. DeepSeek R1 was too slow at the time and carried unresolved security issues, making it less viable.<\/p>\n<p>This reinforced a key realization: every AI model has its own sweet spot. Choosing the right tool for the right task can save a great deal of frustration.<\/p>\n<hr \/>\n<h1><strong>Why Agentic AI Works Better for Coding<\/strong><\/h1>\n<p>For coding, \u201cagent-style\u201d tools often outperform general-purpose AI models. Cursor stands out as a strong option: it picks up context with minimal explanation, leaving more attention for the bigger picture.<\/p>\n<p>GitHub Copilot is solid, but Cursor integrates more naturally into certain workflows. 
While GPT and others can assist with code generation, Cursor feels more like an assistant who already understands the context.<\/p>\n<hr \/>\n<h1><strong>A Favourite GenAI Win \u2013 Automated Reporting<\/strong><\/h1>\n<p>One of the biggest successes with GenAI came in automating a reporting process that previously consumed hours or even days.<\/p>\n<p>Here\u2019s the approach:<\/p>\n<ul style=\"list-style-type: circle;\">\n<li>Provide Domain Data \u2013 Past global reports were supplied to the model as context.<\/li>\n<li>Pick Examples \u2013 From that set, the 3 most relevant examples were identified.<\/li>\n<li>Add Custom KPIs \u2013 KPI values from the user were integrated.<\/li>\n<li>Generate the Report \u2013 The AI produced a tailored report, charts included.<\/li>\n<\/ul>\n<p>The result? Reports were delivered in seconds instead of days, with noticeable improvements in quality.<\/p>\n<p>A key lesson emerged: prompt order matters. Change the sequence, and results can shift dramatically. For precise output, it\u2019s critical to specify exact requirements, ideally in the final line of the prompt.<\/p>\n<hr \/>\n<h1><strong>Quick Recap \u2013 Model Strengths<\/strong><\/h1>\n<p>From these experiments, here\u2019s how the strengths stack up:<\/p>\n<ul style=\"list-style-type: circle;\">\n<li>GPT-5 \/ GPT-4 \u2013 Excellent all-rounders, strong at reasoning and structured content.<\/li>\n<li>Claude Opus 4 \u2013 Best choice for structured data and accurate visualizations.<\/li>\n<li>Gemini 1.5 Flash \u2013 Reliable for extracting large chunks of plain text.<\/li>\n<li>DALL\u00b7E \u2013 Brilliant for creative\/artistic imagery, less so for data visuals.<\/li>\n<li>DeepSeek R1 \u2013 Still maturing; slower and less stable in testing.<\/li>\n<\/ul>\n<hr \/>\n<h1><strong>Wrapping Up<\/strong><\/h1>\n<p>The key takeaway: no single AI model is best at everything. Each has a niche.<\/p>\n<ul style=\"list-style-type: circle;\">\n<li>Need charts from data? 
Use Claude.<\/li>\n<li>Need full text from PDFs? Gemini.<\/li>\n<li>Want creative images? DALL\u00b7E.<\/li>\n<li>Need coding help? Cursor.<\/li>\n<\/ul>\n<p>The AI landscape shifts quickly, so any list like this may become outdated soon. But the principle holds\u2014don\u2019t chase the trend, pick the tool that works for the task at hand.<\/p>\n<p>That\u2019s when GenAI moves beyond \u201ccool tech\u201d and becomes a true productivity booster.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Where It All Began The inspiration for exploring this topic arose while developing a POC to generate accurate graphical reports and charts from quantitative data. Naturally, the first thought was GPT. It\u2019s everywhere\u2014the \u201cdefault\u201d AI for almost any task. ChatGPT was given a try. It worked to an extent, displaying text-based charts or even generating [&hellip;]<\/p>\n","protected":false},"author":1532,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"iawp_total_views":30},"categories":[5871],"tags":[7392,7757,7433,5929,7536,7758,7030,6598,5733,5918,5475],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/75380"}],"collection":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/users\/1532"}],"replies":[{"embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/comments?post=75380"}],"version-history":[{"count":10,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/75380\/revisions"}],"predecessor-version":[{"id":76565,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/75380\/revisions\/76565"}],"wp:attachment":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/media?parent=75380"}],"w
p:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/categories?post=75380"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/tags?post=75380"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}