{"id":77575,"date":"2026-02-12T16:36:00","date_gmt":"2026-02-12T11:06:00","guid":{"rendered":"https:\/\/www.tothenew.com\/blog\/?p=77575"},"modified":"2026-02-13T14:42:01","modified_gmt":"2026-02-13T09:12:01","slug":"solving-etl-dependency-bottlenecks-with-github-actions","status":"publish","type":"post","link":"https:\/\/www.tothenew.com\/blog\/solving-etl-dependency-bottlenecks-with-github-actions\/","title":{"rendered":"Solving ETL Dependency Bottlenecks with GitHub Actions"},"content":{"rendered":"<div class=\"markdown prose dark:prose-invert w-full wrap-break-word light markdown-new-styling\">\n<h2 data-start=\"547\" data-end=\"562\">Introduction<\/h2>\n<p data-start=\"564\" data-end=\"877\">In modern data platforms, ETL pipelines are rarely independent. They are deeply interconnected: one pipeline\u2019s output becomes another pipeline\u2019s input. In one of our production projects, we faced a classic orchestration challenge: pipelines were scheduled by the clock rather than by the actual completion of their upstream pipelines, even though those execution times were unpredictable.<\/p>\n<p data-start=\"1058\" data-end=\"1085\">This article walks through:<\/p>\n<ul data-start=\"1086\" data-end=\"1278\">\n<li data-start=\"1086\" data-end=\"1119\">\n<p data-start=\"1088\" data-end=\"1119\">The real problem we encountered<\/p>\n<\/li>\n<li data-start=\"1120\" data-end=\"1164\">\n<p data-start=\"1122\" data-end=\"1164\">Why ETL schedulers alone were insufficient<\/p>\n<\/li>\n<li data-start=\"1165\" data-end=\"1212\">\n<p data-start=\"1167\" data-end=\"1212\">How external orchestration changed everything<\/p>\n<\/li>\n<li data-start=\"1213\" data-end=\"1242\">\n<p data-start=\"1215\" data-end=\"1242\">Why we chose GitHub Actions<\/p>\n<\/li>\n<li data-start=\"1243\" data-end=\"1278\">\n<p data-start=\"1245\" data-end=\"1278\">How we implemented it in practice<\/p>\n<\/li>\n<\/ul>\n<hr data-start=\"1280\" data-end=\"1283\" \/>\n<h2 data-start=\"1285\" data-end=\"1333\">The 
Problem: Time-Based Scheduling vs Reality<\/h2>\n<p data-start=\"1335\" data-end=\"1438\">We were using an ETL tool to run multiple interdependent pipelines, each triggered at a fixed time of day.<\/p>\n<p data-start=\"1440\" data-end=\"1491\">A simplified version of our setup looked like this:<\/p>\n<ul data-start=\"1493\" data-end=\"1648\">\n<li data-start=\"1493\" data-end=\"1574\">\n<p data-start=\"1495\" data-end=\"1510\"><strong data-start=\"1495\" data-end=\"1510\">Pipeline P1<\/strong><\/p>\n<ul data-start=\"1513\" data-end=\"1574\">\n<li data-start=\"1513\" data-end=\"1539\">\n<p data-start=\"1515\" data-end=\"1539\">Scheduled at <strong data-start=\"1528\" data-end=\"1539\">1:00 PM<\/strong><\/p>\n<\/li>\n<li data-start=\"1542\" data-end=\"1574\">\n<p data-start=\"1544\" data-end=\"1574\">Typical runtime: <strong data-start=\"1561\" data-end=\"1574\">1\u20132 hours<\/strong><\/p>\n<\/li>\n<\/ul>\n<\/li>\n<li data-start=\"1575\" data-end=\"1648\">\n<p data-start=\"1577\" data-end=\"1592\"><strong data-start=\"1577\" data-end=\"1592\">Pipeline P2<\/strong><\/p>\n<ul data-start=\"1595\" data-end=\"1648\">\n<li data-start=\"1595\" data-end=\"1621\">\n<p data-start=\"1597\" data-end=\"1621\">Scheduled at <strong data-start=\"1610\" data-end=\"1621\">3:00 PM<\/strong><\/p>\n<\/li>\n<li data-start=\"1624\" data-end=\"1648\">\n<p data-start=\"1626\" data-end=\"1648\">Depends on P1\u2019s output<\/p>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h3 data-start=\"1650\" data-end=\"1683\">Where Things Started Breaking<\/h3>\n<p data-start=\"1685\" data-end=\"1707\">This approach assumes:<\/p>\n<ul data-start=\"1708\" data-end=\"1777\">\n<li data-start=\"1708\" data-end=\"1745\">\n<p data-start=\"1710\" data-end=\"1745\">P1 will <em data-start=\"1718\" data-end=\"1726\">always<\/em> finish before 3 PM<\/p>\n<\/li>\n<li data-start=\"1746\" data-end=\"1777\">\n<p data-start=\"1748\" data-end=\"1777\">Execution time is predictable<\/p>\n<\/li>\n<\/ul>\n<p data-start=\"1779\" data-end=\"1809\">But in 
real-world ETL systems:<\/p>\n<ul data-start=\"1810\" data-end=\"1926\">\n<li data-start=\"1810\" data-end=\"1841\">\n<p data-start=\"1812\" data-end=\"1841\">Source data volume fluctuates<\/p>\n<\/li>\n<li data-start=\"1842\" data-end=\"1867\">\n<p data-start=\"1844\" data-end=\"1867\">Network latency changes<\/p>\n<\/li>\n<li data-start=\"1868\" data-end=\"1898\">\n<p data-start=\"1870\" data-end=\"1898\">Downstream systems slow down<\/p>\n<\/li>\n<li data-start=\"1899\" data-end=\"1926\">\n<p data-start=\"1901\" data-end=\"1926\">Unexpected retries happen<\/p>\n<\/li>\n<\/ul>\n<p data-start=\"1928\" data-end=\"1965\">This led to two major problems:<\/p>\n<h3 data-start=\"1967\" data-end=\"1996\">Scenario 1: P1 Finishes Early<\/h3>\n<p data-start=\"1998\" data-end=\"2129\">If P1 finished in <strong data-start=\"2016\" data-end=\"2026\">1 hour<\/strong>, it completed at <strong data-start=\"2044\" data-end=\"2052\">2 PM<\/strong>.<br data-start=\"2053\" data-end=\"2056\" \/>But P2 would still wait until <strong data-start=\"2086\" data-end=\"2094\">3 PM<\/strong>, wasting a full hour of idle time.<\/p>\n<h3 data-start=\"2131\" data-end=\"2166\">Scenario 2: P1 Overruns Its Window<\/h3>\n<p data-start=\"2168\" data-end=\"2247\">If P1 ran longer, say <strong data-start=\"2189\" data-end=\"2202\">2.5 hours<\/strong>, it was still running at 3 PM and would not finish until 3:30 PM.<br data-start=\"2232\" data-end=\"2235\" \/>As a result, P2 could:<\/p>\n<ul data-start=\"2248\" data-end=\"2311\">\n<li data-start=\"2248\" data-end=\"2266\">\n<p data-start=\"2250\" data-end=\"2266\">Fail<\/p>\n<\/li>\n<li data-start=\"2267\" data-end=\"2283\">\n<p data-start=\"2269\" data-end=\"2283\">Get blocked<\/p>\n<\/li>\n<li data-start=\"2284\" data-end=\"2311\">\n<p data-start=\"2286\" data-end=\"2311\">Run on incomplete data<\/p>\n<\/li>\n<\/ul>\n<p data-start=\"2313\" data-end=\"2383\">This created operational instability and required manual intervention.<\/p>\n<hr data-start=\"2385\" data-end=\"2388\" \/>\n<h2 data-start=\"2390\" 
data-end=\"2433\">Why Native ETL Schedulers Weren\u2019t Enough<\/h2>\n<p data-start=\"2435\" data-end=\"2478\">Most ETL tools offer:<\/p>\n<ul data-start=\"2479\" data-end=\"2575\">\n<li data-start=\"2479\" data-end=\"2502\">\n<p data-start=\"2481\" data-end=\"2502\">Cron-based scheduling<\/p>\n<\/li>\n<li data-start=\"2503\" data-end=\"2530\">\n<p data-start=\"2505\" data-end=\"2530\">Basic dependency handling<\/p>\n<\/li>\n<li data-start=\"2531\" data-end=\"2575\">\n<p data-start=\"2533\" data-end=\"2575\">Trigger-on-success options (within limits)<\/p>\n<\/li>\n<\/ul>\n<p data-start=\"2577\" data-end=\"2615\">However, these features struggle when:<\/p>\n<ul data-start=\"2616\" data-end=\"2782\">\n<li data-start=\"2616\" data-end=\"2654\">\n<p data-start=\"2618\" data-end=\"2654\">Pipelines have <strong data-start=\"2633\" data-end=\"2654\">variable runtimes<\/strong><\/p>\n<\/li>\n<li data-start=\"2655\" data-end=\"2700\">\n<p data-start=\"2657\" data-end=\"2700\">Dependencies span <strong data-start=\"2675\" data-end=\"2700\">multiple environments<\/strong><\/p>\n<\/li>\n<li data-start=\"2701\" data-end=\"2737\">\n<p data-start=\"2703\" data-end=\"2737\">You need <strong data-start=\"2712\" data-end=\"2737\">dynamic orchestration<\/strong><\/p>\n<\/li>\n<li data-start=\"2738\" data-end=\"2782\">\n<p data-start=\"2740\" data-end=\"2782\">Monitoring and retries must be centralized<\/p>\n<\/li>\n<\/ul>\n<p data-start=\"2784\" data-end=\"2884\">At their core, ETL schedulers are <strong data-start=\"2818\" data-end=\"2833\">time-driven<\/strong>.<br data-start=\"2834\" data-end=\"2837\" \/>Our problem needed an <strong>event-driven<\/strong> solution.<\/p>\n<hr data-start=\"2886\" data-end=\"2889\" \/>\n<h2 data-start=\"2891\" data-end=\"2938\">The Core Insight: Orchestration \u2260 Scheduling<\/h2>\n<p data-start=\"2940\" data-end=\"2974\">This was the turning point for us.<\/p>\n<p data-start=\"2976\" data-end=\"3001\"><strong data-start=\"2976\" 
data-end=\"2999\">Scheduling answers:<\/strong><\/p>\n<blockquote data-start=\"3002\" data-end=\"3034\">\n<p data-start=\"3004\" data-end=\"3034\"><em data-start=\"3004\" data-end=\"3034\">When should something start?<\/em><\/p>\n<\/blockquote>\n<p data-start=\"3036\" data-end=\"3064\"><strong data-start=\"3036\" data-end=\"3062\">Orchestration answers:<\/strong><\/p>\n<blockquote data-start=\"3065\" data-end=\"3120\">\n<p data-start=\"3067\" data-end=\"3120\"><em data-start=\"3067\" data-end=\"3120\">What should happen next, and under what conditions?<\/em><\/p>\n<\/blockquote>\n<p data-start=\"3122\" data-end=\"3191\">We didn\u2019t need better schedules.<br data-start=\"3154\" data-end=\"3157\" \/>We needed a system that could say:<\/p>\n<blockquote data-start=\"3193\" data-end=\"3278\">\n<p data-start=\"3195\" data-end=\"3278\">\u201cStart Pipeline P2 <strong data-start=\"3214\" data-end=\"3228\">only after<\/strong> Pipeline P1 has actually completed successfully.\u201d<\/p>\n<\/blockquote>\n<hr data-start=\"3280\" data-end=\"3283\" \/>\n<h2 data-start=\"3285\" data-end=\"3319\">Why External Orchestration Wins<\/h2>\n<p data-start=\"3321\" data-end=\"3377\">By moving orchestration outside the ETL tool, we gained:<\/p>\n<ul data-start=\"3379\" data-end=\"3516\">\n<li data-start=\"3379\" data-end=\"3412\">\n<p data-start=\"3381\" data-end=\"3412\">Decoupling from rigid schedules<\/p>\n<\/li>\n<li data-start=\"3413\" data-end=\"3434\">\n<p data-start=\"3415\" data-end=\"3434\">Centralized control<\/p>\n<\/li>\n<li data-start=\"3435\" data-end=\"3460\">\n<p data-start=\"3437\" data-end=\"3460\">Clear dependency graphs<\/p>\n<\/li>\n<li data-start=\"3461\" data-end=\"3488\">\n<p data-start=\"3463\" data-end=\"3488\">Better failure visibility<\/p>\n<\/li>\n<li data-start=\"3489\" data-end=\"3516\">\n<p data-start=\"3491\" data-end=\"3516\">Easier retries and alerts<\/p>\n<\/li>\n<\/ul>\n<p data-start=\"3518\" data-end=\"3576\">We evaluated several options\u2014but 
GitHub Actions stood out.<\/p>\n<hr data-start=\"3578\" data-end=\"3581\" \/>\n<h2 data-start=\"3583\" data-end=\"3609\">What Is GitHub Actions?<\/h2>\n<p data-start=\"3611\" data-end=\"3687\"><strong data-start=\"3611\" data-end=\"3629\">GitHub Actions<\/strong> is a workflow automation tool built directly into GitHub.<\/p>\n<p data-start=\"3689\" data-end=\"3713\">It is commonly used for:<\/p>\n<ul data-start=\"3714\" data-end=\"3769\">\n<li data-start=\"3714\" data-end=\"3731\">\n<p data-start=\"3716\" data-end=\"3731\">CI\/CD pipelines<\/p>\n<\/li>\n<li data-start=\"3732\" data-end=\"3751\">\n<p data-start=\"3734\" data-end=\"3751\">Automated testing<\/p>\n<\/li>\n<li data-start=\"3752\" data-end=\"3769\">\n<p data-start=\"3754\" data-end=\"3769\">Code deployment<\/p>\n<\/li>\n<\/ul>\n<p data-start=\"3771\" data-end=\"3856\">Beyond these uses, GitHub Actions is a <strong data-start=\"3811\" data-end=\"3846\">general-purpose workflow engine<\/strong> that can:<\/p>\n<ul data-start=\"3858\" data-end=\"3993\">\n<li data-start=\"3858\" data-end=\"3881\">\n<p data-start=\"3860\" data-end=\"3881\">Run steps (and, with <code>needs<\/code>, jobs) in sequence<\/p>\n<\/li>\n<li data-start=\"3882\" data-end=\"3917\">\n<p data-start=\"3884\" data-end=\"3917\">Trigger external systems via APIs<\/p>\n<\/li>\n<li data-start=\"3918\" data-end=\"3939\">\n<p data-start=\"3920\" data-end=\"3939\">Wait for conditions (for example, by polling from a script step)<\/p>\n<\/li>\n<li data-start=\"3940\" data-end=\"3960\">\n<p data-start=\"3942\" data-end=\"3960\">Fail fast or retry<\/p>\n<\/li>\n<li data-start=\"3961\" data-end=\"3993\">\n<p data-start=\"3963\" data-end=\"3993\">Log everything with timestamps<\/p>\n<\/li>\n<\/ul>\n<p data-start=\"3995\" data-end=\"4057\">That makes it well suited to <strong data-start=\"4035\" data-end=\"4056\">ETL orchestration<\/strong>.<\/p>\n<hr data-start=\"4059\" data-end=\"4062\" \/>\n<h2 data-start=\"4064\" data-end=\"4114\">Why GitHub Actions Worked for Our ETL Pipelines<\/h2>\n<ul data-start=\"4170\" data-end=\"4374\">\n<li 
data-start=\"4170\" data-end=\"4207\">\n<p data-start=\"4172\" data-end=\"4207\"><strong data-start=\"4172\" data-end=\"4196\">Sequential execution<\/strong><\/p>\n<\/li>\n<li data-start=\"4208\" data-end=\"4255\">\n<p data-start=\"4210\" data-end=\"4255\"><strong data-start=\"4210\" data-end=\"4230\">Conditional flow<\/strong> based on success or failure<\/p>\n<\/li>\n<li data-start=\"4256\" data-end=\"4284\">\n<p data-start=\"4258\" data-end=\"4284\"><strong data-start=\"4258\" data-end=\"4284\">Event-driven execution<\/strong><\/p>\n<\/li>\n<li data-start=\"4285\" data-end=\"4325\">\n<p data-start=\"4287\" data-end=\"4325\"><strong data-start=\"4287\" data-end=\"4306\">Full visibility<\/strong> into pipeline runs with logs<\/p>\n<\/li>\n<li data-start=\"4326\" data-end=\"4374\">\n<p data-start=\"4328\" data-end=\"4374\"><strong data-start=\"4328\" data-end=\"4348\">External control<\/strong>, without ETL-tool lock-in<\/p>\n<\/li>\n<\/ul>\n<p data-start=\"4376\" data-end=\"4462\">Instead of guessing execution times, we started reacting to <strong data-start=\"4436\" data-end=\"4461\">actual pipeline state<\/strong>.<\/p>\n<hr data-start=\"4464\" data-end=\"4467\" \/>\n<h2 data-start=\"4469\" data-end=\"4495\">High-Level Architecture<\/h2>\n<p data-start=\"4497\" data-end=\"4541\">Our new orchestration flow looked like this:<\/p>\n<p>GitHub Actions workflow \u2192 Trigger Pipeline P1 \u2192 Poll P1 status until it moves from RUNNING to SUCCESS \u2192 Trigger Pipeline P2 \u2192 Poll P2 status \u2192 Trigger P3 \u2192 P4 \u2192 &#8230;<\/p>\n<p data-start=\"4773\" data-end=\"4848\">Each pipeline starts <strong data-start=\"4794\" data-end=\"4808\">only after<\/strong> the previous one finishes successfully.<\/p>\n<p data-start=\"4850\" data-end=\"4893\">No clocks between pipelines.<br data-start=\"4860\" data-end=\"4863\" \/>No guessing.<br data-start=\"4875\" data-end=\"4878\" \/>No wasted time.<\/p>\n<hr data-start=\"4895\" data-end=\"4898\" \/>\n<h2 data-start=\"4900\" data-end=\"4924\">How We Implemented It<\/h2>\n<h3 
data-start=\"4926\" data-end=\"4958\">Step 1: Using the ETL Tool\u2019s APIs<\/h3>\n<p data-start=\"4960\" data-end=\"5007\">Most ETL tools expose REST APIs that let you:<\/p>\n<ul data-start=\"5008\" data-end=\"5076\">\n<li data-start=\"5008\" data-end=\"5022\">\n<p data-start=\"5010\" data-end=\"5022\">Trigger jobs<\/p>\n<\/li>\n<li data-start=\"5023\" data-end=\"5047\">\n<p data-start=\"5025\" data-end=\"5047\">Fetch execution status<\/p>\n<\/li>\n<li data-start=\"5048\" data-end=\"5076\">\n<p data-start=\"5050\" data-end=\"5076\">Capture success or failure<\/p>\n<\/li>\n<\/ul>\n<p data-start=\"5078\" data-end=\"5109\">Each API call needs:<\/p>\n<ul data-start=\"5110\" data-end=\"5167\">\n<li data-start=\"5110\" data-end=\"5119\">\n<p data-start=\"5112\" data-end=\"5119\">Project<\/p>\n<\/li>\n<li data-start=\"5120\" data-end=\"5133\">\n<p data-start=\"5122\" data-end=\"5133\">Environment<\/p>\n<\/li>\n<li data-start=\"5134\" data-end=\"5144\">\n<p data-start=\"5136\" data-end=\"5144\">Job name<\/p>\n<\/li>\n<li data-start=\"5145\" data-end=\"5167\">\n<p data-start=\"5147\" data-end=\"5167\">Authentication token<\/p>\n<\/li>\n<\/ul>\n<hr data-start=\"5169\" data-end=\"5172\" \/>\n<h3 data-start=\"5174\" data-end=\"5220\">Step 2: Creating a GitHub Actions Workflow<\/h3>\n<p data-start=\"5222\" data-end=\"5272\">We created a workflow YAML file under <code>.github\/workflows\/<\/code> in our repository. Only the first pipeline is started by a clock (the <code>schedule<\/code> trigger, which GitHub evaluates in UTC); each later step runs only if the steps before it succeeded:<\/p>\n<div class=\"contain-inline-size rounded-2xl corner-superellipse\/1.1 relative bg-token-sidebar-surface-primary\">\n<div class=\"overflow-y-auto p-4\" dir=\"ltr\">\n<pre><code class=\"whitespace-pre! language-yaml\">name: ETL Orchestration\n\non:\n  workflow_dispatch:        # allow manual runs\n  schedule:\n    - cron: \"0 13 * * *\"    # daily at 13:00 UTC\n\njobs:\n  run-etl:\n    runs-on: ubuntu-latest\n    env:\n      # Assumption: the ETL API endpoints are stored as repository secrets\n      P1_API: ${{ secrets.P1_API }}\n      P2_API: ${{ secrets.P2_API }}\n    steps:\n      # check_status.sh lives in this repository, so check it out first\n      - uses: actions\/checkout@v4\n\n      - name: Trigger Pipeline P1\n        # --fail makes curl exit non-zero on HTTP errors, which fails the step\n        run: curl --fail -sS -X POST \"$P1_API\"\n\n      - name: Wait for P1 to complete\n        # Assumed contract: polls P1 and exits non-zero unless P1 ends in SUCCESS\n        run: bash .\/check_status.sh P1\n\n      - name: Trigger Pipeline P2\n        # Runs only if every earlier step succeeded (the default success() check)\n        run: curl --fail -sS -X POST \"$P2_API\"<\/code><\/pre>\n<\/div>\n<\/div>\n<hr data-start=\"5701\" data-end=\"5704\" \/>\n<h3 data-start=\"5706\" data-end=\"5739\">Step 3: Dependency Validation<\/h3>\n<p data-start=\"5741\" data-end=\"5793\">Instead of triggering the next pipeline at a fixed time, we:<\/p>\n<ul data-start=\"5794\" data-end=\"5887\">\n<li data-start=\"5794\" data-end=\"5818\">\n<p data-start=\"5796\" data-end=\"5818\">Polled the ETL tool\u2019s status API<\/p>\n<\/li>\n<li data-start=\"5819\" data-end=\"5844\">\n<p data-start=\"5821\" data-end=\"5844\">Checked the execution state<\/p>\n<\/li>\n<li data-start=\"5845\" data-end=\"5887\">\n<p data-start=\"5847\" data-end=\"5887\">Proceeded only when the status was <code data-start=\"5878\" data-end=\"5887\">SUCCESS<\/code><\/p>\n<\/li>\n<\/ul>\n<p data-start=\"5889\" data-end=\"5910\">If a pipeline failed:<\/p>\n<ul data-start=\"5911\" data-end=\"6019\">\n<li data-start=\"5911\" data-end=\"5942\">\n<p data-start=\"5913\" data-end=\"5942\">The GitHub Actions job failed<\/p>\n<\/li>\n<li data-start=\"5943\" data-end=\"5984\">\n<p data-start=\"5945\" data-end=\"5984\">Downstream pipelines were not triggered<\/p>\n<\/li>\n<li data-start=\"5985\" data-end=\"6019\">\n<p data-start=\"5987\" data-end=\"6019\">Alerts could be sent immediately via PagerDuty, Slack, etc.<\/p>\n<\/li>\n<\/ul>\n<hr data-start=\"6021\" data-end=\"6024\" \/>\n<h3 data-start=\"6026\" data-end=\"6061\">Step 4: Observability &amp; Control<\/h3>\n<p data-start=\"6063\" data-end=\"6087\">GitHub Actions provided:<\/p>\n<ul data-start=\"6088\" data-end=\"6199\">\n<li data-start=\"6088\" data-end=\"6110\">\n<p data-start=\"6090\" data-end=\"6110\">Clear execution logs<\/p>\n<\/li>\n<li data-start=\"6111\" data-end=\"6137\">\n<p data-start=\"6113\" data-end=\"6137\">Start and end 
timestamps<\/p>\n<\/li>\n<li data-start=\"6138\" data-end=\"6165\">\n<p data-start=\"6140\" data-end=\"6165\">Pipeline-level visibility<\/p>\n<\/li>\n<li data-start=\"6166\" data-end=\"6199\">\n<p data-start=\"6168\" data-end=\"6199\">A single place to monitor every pipeline run<\/p>\n<\/li>\n<li data-start=\"6166\" data-end=\"6199\">\n<p>Control over whether a failure stops the chain or lets the next step run anyway (for example, with <code>continue-on-error<\/code> or an <code>if:<\/code> condition)<\/p>\n<\/li>\n<\/ul>\n<p data-start=\"6201\" data-end=\"6221\">We could easily add:<\/p>\n<ul data-start=\"6222\" data-end=\"6281\">\n<li data-start=\"6222\" data-end=\"6243\">\n<p data-start=\"6224\" data-end=\"6243\">Slack notifications or PagerDuty alerts<\/p>\n<\/li>\n<li data-start=\"6244\" data-end=\"6257\">\n<p data-start=\"6246\" data-end=\"6257\">Retry logic<\/p>\n<\/li>\n<li data-start=\"6258\" data-end=\"6281\">\n<p data-start=\"6260\" data-end=\"6281\">Conditional branching<\/p>\n<\/li>\n<\/ul>\n<hr data-start=\"6283\" data-end=\"6286\" \/>\n<h2 data-start=\"6288\" data-end=\"6333\">Why This Was Better Than Native Scheduling<\/h2>\n<div class=\"TyagGW_tableContainer\">\n<div class=\"group TyagGW_tableWrapper flex flex-col-reverse w-fit\">\n<table class=\"w-fit min-w-(--thread-content-width)\" data-start=\"6335\" data-end=\"6663\">\n<thead data-start=\"6335\" data-end=\"6379\">\n<tr data-start=\"6335\" data-end=\"6379\">\n<th data-start=\"6335\" data-end=\"6345\" data-col-size=\"sm\">Feature<\/th>\n<th data-start=\"6345\" data-end=\"6361\" data-col-size=\"sm\">ETL Scheduler<\/th>\n<th data-start=\"6361\" data-end=\"6379\" data-col-size=\"sm\">GitHub Actions<\/th>\n<\/tr>\n<\/thead>\n<tbody data-start=\"6419\" data-end=\"6663\">\n<tr data-start=\"6419\" data-end=\"6450\">\n<td data-start=\"6419\" data-end=\"6437\" data-col-size=\"sm\">Time-based runs<\/td>\n<td data-start=\"6437\" data-end=\"6443\" data-col-size=\"sm\">Yes<\/td>\n<td data-start=\"6443\" data-end=\"6450\" data-col-size=\"sm\">Yes<\/td>\n<\/tr>\n<tr data-start=\"6451\" data-end=\"6495\">\n<td 
data-start=\"6451\" data-end=\"6475\" data-col-size=\"sm\">State-based execution<\/td>\n<td data-start=\"6475\" data-end=\"6485\" data-col-size=\"sm\">Limited<\/td>\n<td data-start=\"6485\" data-end=\"6495\" data-col-size=\"sm\">Strong<\/td>\n<\/tr>\n<tr data-start=\"6496\" data-end=\"6536\">\n<td data-start=\"6496\" data-end=\"6519\" data-col-size=\"sm\">Dynamic dependencies<\/td>\n<td data-start=\"6519\" data-end=\"6526\" data-col-size=\"sm\">Weak<\/td>\n<td data-start=\"6526\" data-end=\"6536\" data-col-size=\"sm\">Native<\/td>\n<\/tr>\n<tr data-start=\"6537\" data-end=\"6580\">\n<td data-start=\"6537\" data-end=\"6558\" data-col-size=\"sm\">Sequential control<\/td>\n<td data-start=\"6558\" data-end=\"6568\" data-col-size=\"sm\">Limited<\/td>\n<td data-start=\"6568\" data-end=\"6580\" data-col-size=\"sm\">Built-in<\/td>\n<\/tr>\n<tr data-start=\"6581\" data-end=\"6626\">\n<td data-start=\"6581\" data-end=\"6602\" data-col-size=\"sm\">Failure visibility<\/td>\n<td data-start=\"6602\" data-end=\"6613\" data-col-size=\"sm\">Moderate<\/td>\n<td data-start=\"6613\" data-end=\"6626\" data-col-size=\"sm\">Excellent<\/td>\n<\/tr>\n<tr data-start=\"6627\" data-end=\"6663\">\n<td data-start=\"6627\" data-end=\"6651\" data-col-size=\"sm\">Central orchestration<\/td>\n<td data-start=\"6651\" data-end=\"6656\" data-col-size=\"sm\">No<\/td>\n<td data-start=\"6656\" data-end=\"6663\" data-col-size=\"sm\">Yes<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<hr data-start=\"6665\" data-end=\"6668\" \/>\n<h2 data-start=\"6670\" data-end=\"6695\">Real Production Impact<\/h2>\n<p data-start=\"6697\" data-end=\"6731\">After switching to GitHub Actions:<\/p>\n<ul data-start=\"6733\" data-end=\"6889\">\n<li data-start=\"6733\" data-end=\"6790\">\n<p data-start=\"6735\" data-end=\"6790\">Pipelines started <strong 
data-start=\"6753\" data-end=\"6790\">as soon as their dependencies completed<\/strong><\/p>\n<\/li>\n<li data-start=\"6791\" data-end=\"6821\">\n<p data-start=\"6793\" data-end=\"6821\">No idle waiting between jobs<\/p>\n<\/li>\n<li data-start=\"6822\" data-end=\"6845\">\n<p data-start=\"6824\" data-end=\"6845\">No blocked executions<\/p>\n<\/li>\n<li data-start=\"6846\" data-end=\"6866\">\n<p data-start=\"6848\" data-end=\"6866\">No manual restarts<\/p>\n<\/li>\n<\/ul>\n<p data-start=\"6891\" data-end=\"6970\">We moved from a <strong data-start=\"6907\" data-end=\"6933\">time-driven ETL system<\/strong> to a <strong data-start=\"6939\" data-end=\"6969\">state-driven (event-driven) data pipeline<\/strong>.<\/p>\n<hr data-start=\"6972\" data-end=\"6975\" \/>\n<h2 data-start=\"6977\" data-end=\"6993\">Key Takeaways<\/h2>\n<ul data-start=\"6995\" data-end=\"7312\">\n<li data-start=\"6995\" data-end=\"7060\">\n<p data-start=\"6997\" data-end=\"7060\">ETL schedulers work well for simple time-based jobs, but for complex dependencies an external orchestrator such as GitHub Actions gives a cleaner high-level architecture<\/p>\n<\/li>\n<li data-start=\"7061\" data-end=\"7119\">\n<p data-start=\"7063\" data-end=\"7119\">Time-based scheduling breaks down when pipeline runtimes vary<\/p>\n<\/li>\n<li data-start=\"7120\" data-end=\"7179\">\n<p data-start=\"7122\" data-end=\"7179\">External orchestration brings flexibility and reliability<\/p>\n<\/li>\n<li data-start=\"7180\" data-end=\"7250\">\n<p data-start=\"7182\" data-end=\"7250\">GitHub Actions is not just for CI\/CD; it is a general-purpose workflow engine<\/p>\n<\/li>\n<li data-start=\"7251\" data-end=\"7312\">\n<p data-start=\"7253\" data-end=\"7312\">State-driven orchestration leads to faster, safer pipelines<\/p>\n<\/li>\n<\/ul>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Introduction In modern data platforms, ETL pipelines are rarely independent. 
They are deeply interconnected: one pipeline\u2019s output becomes another pipeline\u2019s input. In one of our production projects, we faced a classic orchestration challenge: pipelines were scheduled by the clock rather than by the actual completion of their upstream pipelines, even though those execution times were unpredictable. This article walks [&hellip;]<\/p>\n","protected":false},"author":2069,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"iawp_total_views":24},"categories":[6194],"tags":[4458,5627],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/77575"}],"collection":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/users\/2069"}],"replies":[{"embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/comments?post=77575"}],"version-history":[{"count":15,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/77575\/revisions"}],"predecessor-version":[{"id":77796,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/77575\/revisions\/77796"}],"wp:attachment":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/media?parent=77575"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/categories?post=77575"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/tags?post=77575"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}