{"id":60388,"date":"2024-02-23T10:29:11","date_gmt":"2024-02-23T04:59:11","guid":{"rendered":"https:\/\/www.tothenew.com\/blog\/?p=60388"},"modified":"2024-03-01T10:34:37","modified_gmt":"2024-03-01T05:04:37","slug":"headless-browser-heaven-a-deep-dive-into-puppeteer-and-its-possibilities","status":"publish","type":"post","link":"https:\/\/www.tothenew.com\/blog\/headless-browser-heaven-a-deep-dive-into-puppeteer-and-its-possibilities\/","title":{"rendered":"Headless Browser Heaven: A Deep Dive into Puppeteer and its Possibilities"},"content":{"rendered":"<p><span style=\"font-weight: 400;\">A headless browser is a web browser that runs without a graphical user interface (GUI). This makes it well suited to web scraping, automated testing, and other web-related tasks that need little or no human interaction. Because it does not have to draw pages on a screen, a headless browser can typically perform these tasks faster and with fewer resources than a full browser. Headless browsers are commonly employed in web development, testing, and data scraping applications where automation is necessary.<\/span><\/p>\n<h2><span style=\"font-weight: 400;\"><strong>What is Puppeteer?<\/strong> <\/span><\/h2>\n<p>Puppeteer is a versatile Node.js library that provides developers with a high-level API to control browsers such as Chrome and Chromium through the DevTools Protocol. It runs the browser headless by default, but developers can configure it to use full (non-headless) Chrome or Chromium during development. 
Its capabilities include automating tasks like webpage interaction, form filling, and data scraping, as well as features for capturing screenshots, generating PDFs, and conducting tests within a headless browsing environment.<\/p>\n<p>The appeal of Puppeteer lies in its developer-friendly APIs, making it a preferred choice among web developers and testers. With Puppeteer, users can replicate user interactions on webpages, conduct automated testing, submit forms, and perform data scraping with ease.<\/p>\n<p>Puppeteer is actively maintained by the Chrome team at Google, positioning it as the go-to choice for browser automation in the Node.js ecosystem. It simplifies the complexities of interacting with headless browsers, expanding the possibilities for web automation and testing, thus serving as an essential tool for many web development projects.<\/p>\n<h2>Some use cases of Puppeteer<\/h2>\n<p>Puppeteer is a powerful tool that allows for the automation of various tasks in a web browser, mirroring human actions. Here are a few examples to get you started utilizing Puppeteer for automation.<\/p>\n<ol>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Generate screenshots or PDFs of web pages.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Data scraping.\u00a0<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Take a screenshot of websites daily or save HTML to create a history of website UI.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Automate form submissions like feedback forms and contact us forms.\u00a0 <\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">UI testing or screenshots testing.\u00a0<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Emulate keyboard and mouse interactions. 
Some people misuse this to inflate video view counts, which is not recommended.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Performance testing of web pages.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Monitor uptime\/downtime.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Test Chrome extensions.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Sitemap generation.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">A\/B testing using multiple versions of Chromium.<\/span><\/li>\n<\/ol>\n<h2><strong>Growth<\/strong><\/h2>\n<div id=\"attachment_60384\" style=\"width: 718px\" class=\"wp-caption alignnone\"><img aria-describedby=\"caption-attachment-60384\" decoding=\"async\" loading=\"lazy\" class=\"wp-image-60384 size-full\" src=\"\/blog\/wp-ttn-blog\/uploads\/2024\/02\/pupeeter-year-growth.png\" alt=\"Puppeteer 5 years growth\" width=\"708\" height=\"360\" srcset=\"\/blog\/wp-ttn-blog\/uploads\/2024\/02\/pupeeter-year-growth.png 708w, \/blog\/wp-ttn-blog\/uploads\/2024\/02\/pupeeter-year-growth-300x153.png 300w, \/blog\/wp-ttn-blog\/uploads\/2024\/02\/pupeeter-year-growth-624x317.png 624w\" sizes=\"(max-width: 708px) 100vw, 708px\" \/><p id=\"caption-attachment-60384\" class=\"wp-caption-text\">Puppeteer growth over five years<\/p><\/div>\n<p>Over the past five years, demand for Puppeteer appears to have been on an upward trajectory. 
With the rise of AI-driven automation, we can expect usage of Puppeteer and headless browsers to grow further.<\/p>\n<h2><span style=\"font-weight: 400;\"><strong>Competitors<\/strong> <\/span><\/h2>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-60385\" src=\"\/blog\/wp-ttn-blog\/uploads\/2024\/02\/competetiors-download.png\" alt=\"puppeteer's competitors\" width=\"850\" height=\"339\" srcset=\"\/blog\/wp-ttn-blog\/uploads\/2024\/02\/competetiors-download.png 850w, \/blog\/wp-ttn-blog\/uploads\/2024\/02\/competetiors-download-300x120.png 300w, \/blog\/wp-ttn-blog\/uploads\/2024\/02\/competetiors-download-768x306.png 768w, \/blog\/wp-ttn-blog\/uploads\/2024\/02\/competetiors-download-624x249.png 624w\" sizes=\"(max-width: 850px) 100vw, 850px\" \/><\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-60386\" src=\"\/blog\/wp-ttn-blog\/uploads\/2024\/02\/competetiors-stats.png\" alt=\"puppeteer's competitors stats\" width=\"850\" height=\"214\" srcset=\"\/blog\/wp-ttn-blog\/uploads\/2024\/02\/competetiors-stats.png 850w, \/blog\/wp-ttn-blog\/uploads\/2024\/02\/competetiors-stats-300x76.png 300w, \/blog\/wp-ttn-blog\/uploads\/2024\/02\/competetiors-stats-768x193.png 768w, \/blog\/wp-ttn-blog\/uploads\/2024\/02\/competetiors-stats-624x157.png 624w\" sizes=\"(max-width: 850px) 100vw, 850px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">With Puppeteer growing from nearly 1 million downloads in 2019 to close to 5 million downloads, it&#8217;s quite evident that browser automation is in high demand. 
Meanwhile, demand for Selenium seems to be on a downward trend.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Let&#8217;s compare Puppeteer with some of its competitors in the table below.<\/span><\/p>\n<table>\n<tbody>\n<tr>\n<td><b>Feature<\/b><\/td>\n<td><b>Puppeteer<\/b><\/td>\n<td><b>Cypress<\/b><\/td>\n<td><b>Playwright<\/b><\/td>\n<td><b>Selenium<\/b><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Type<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Node.js library for controlling headless browsers<\/span><\/td>\n<td><span style=\"font-weight: 400;\">End-to-end testing framework for web applications<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Node.js library for cross-browser testing<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Open-source automation testing framework<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Browser Support<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Primarily supports Chromium<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Supports Chromium-based browsers (Chrome, Edge, Electron) and Firefox<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Supports Chromium, WebKit, and Firefox<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Supports multiple browsers including Firefox, Chrome, IE, Safari<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Automation Type<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Ideal for web scraping and automation tasks<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Focuses on end-to-end testing of web applications<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Designed for cross-browser testing<\/span><\/td>\n<td><span style=\"font-weight: 400;\">General-purpose automation testing<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Testing<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Does not have built-in testing features<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Includes a built-in 
testing framework<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Provides built-in testing capabilities<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Offers extensive testing capabilities<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Performance<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Fast and efficient<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Fast and efficient<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Fast and efficient<\/span><\/td>\n<td><span style=\"font-weight: 400;\">May have slower performance due to overhead<\/span><\/td>\n<\/tr>\n<tr>\n<td><span style=\"font-weight: 400;\">Community<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Active community support<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Backed by a strong community<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Active community support<\/span><\/td>\n<td><span style=\"font-weight: 400;\">Established community support<\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<h2><strong>Puppeteer architecture<\/strong><\/h2>\n<p><span style=\"font-weight: 400;\">The Puppeteer architecture can be pictured as a pyramid, reflecting the hierarchical organization of its components. Here&#8217;s how the Puppeteer architecture maps onto a pyramid model:<\/span><\/p>\n<h3><b>Base Layer &#8211; Node.js Environment<\/b><\/h3>\n<p><span style=\"font-weight: 400;\">Node.js forms the foundation at the pyramid&#8217;s base. It provides the runtime environment required to execute Puppeteer scripts and to communicate with the browser.<\/span><\/p>\n<h3><b>Middle Layer &#8211; DevTools Protocol and Headless Browser<\/b><\/h3>\n<p><span style=\"font-weight: 400;\">The middle layer consists of the DevTools Protocol and the headless browser (typically the Chromium build that Puppeteer downloads), which Puppeteer controls. 
Puppeteer communicates with the browser using the DevTools Protocol to enable automated interactions with web pages.<\/span><\/p>\n<h3><b>Upper Layer &#8211; Page and Browser Instances<\/b><\/h3>\n<p><span style=\"font-weight: 400;\">At the top, Puppeteer creates a browser instance, from which you open pages. Using the page object, developers can manipulate web page elements, emulate mouse\/keyboard operations, navigate websites, and manage browser actions through Puppeteer&#8217;s intuitive API.<\/span><\/p>\n<h2><strong>Puppeteer deep dive<\/strong><\/h2>\n<p><span style=\"font-weight: 400;\">Enough of the basics for now; it&#8217;s time to delve deeper into development using Puppeteer. Let&#8217;s learn how to install Puppeteer, initialize it, create a PDF, capture a screenshot, and explore some basic optimizations, tips, and tricks.<\/span><\/p>\n<ul>\n<li>Install Puppeteer.<\/li>\n<\/ul>\n<pre><code> npm install puppeteer<\/code><\/pre>\n<ul>\n<li>Launch Puppeteer.<\/li>\n<\/ul>\n<pre><code>\r\nconst puppeteer = require('puppeteer');\r\n\r\n(async () =&gt; {\r\n    \/\/ Launch the browser (headless by default)\r\n    const browser = await puppeteer.launch();\r\n\r\n    \/\/ Open a new tab and navigate to the target URL\r\n    const page = await browser.newPage();\r\n    await page.goto('https:\/\/www.tothenew.com\/about-us');\r\n})();\r\n<\/code><\/pre>\n<p>This is the minimum code required to open a webpage using Puppeteer. However, do not worry if you cannot see anything after trying the above code. This is because Puppeteer will open Chromium in headless mode. Let&#8217;s try again with headful mode.<\/p>\n<pre><code> const browser = await puppeteer.launch({\r\n        headless: false, \/\/ Headful mode\r\n    });<\/code><\/pre>\n<p><span style=\"font-weight: 400;\">If you see the output below, congratulations, you have won half the battle. If not, ensure that the installation was successful. If you face issues downloading Chromium, you can point Puppeteer at an already installed browser by passing executablePath in the launch options. 
<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Refer to <\/span><a href=\"https:\/\/pptr.dev\/api\/puppeteer.launchoptions\"><span style=\"font-weight: 400;\">https:\/\/pptr.dev\/api\/puppeteer.launchoptions<\/span><\/a><span style=\"font-weight: 400;\"> for more information.<\/span><\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-60387 size-full\" src=\"\/blog\/wp-ttn-blog\/uploads\/2024\/02\/pupeeter-to-the-new.png\" alt=\"pupeeter-to-the-new-webpage\" width=\"564\" height=\"360\" srcset=\"\/blog\/wp-ttn-blog\/uploads\/2024\/02\/pupeeter-to-the-new.png 564w, \/blog\/wp-ttn-blog\/uploads\/2024\/02\/pupeeter-to-the-new-300x191.png 300w\" sizes=\"(max-width: 564px) 100vw, 564px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">Do not forget to close the browser after completing your automation tasks. To do so, ensure that you include the following line at the end of your script:<\/span><\/p>\n<pre><code> await browser.close();\r\n<\/code><\/pre>\n<h3><span style=\"font-weight: 400;\"><strong>Create a PDF of a webpage<\/strong> <\/span><\/h3>\n<p>To generate a webpage PDF, simply add the code line below, and Puppeteer will save it to the path you provide. Optionally, you can omit the path if you prefer to work with buffers.<\/p>\n<pre><code> await page.pdf({ path: 'firstPdf.pdf' }) <\/code><\/pre>\n<h3><span style=\"font-weight: 400;\"><strong>Take Screenshots of the webpage<\/strong> <\/span><\/h3>\n<p><span style=\"font-weight: 400;\">You can capture a screenshot of the entire webpage by including fullPage in the screenshot options. In this tutorial, we will take a screenshot of a specific webpage section. 
To achieve this, you can either scroll through the document or call the native DOM scrollIntoView method on the element.<\/span><\/p>\n<pre><code>\r\nconst puppeteer = require('puppeteer');\r\n\r\n(async () =&gt; {\r\n    const browser = await puppeteer.launch({\r\n        headless: false\r\n    });\r\n\r\n    const page = await browser.newPage();\r\n    await page.goto('https:\/\/www.tothenew.com');\r\n\r\n    \/\/ Wait until the network has gone idle (default options)\r\n    await page.waitForNetworkIdle();\r\n\r\n    \/\/ Scroll the target section into view inside the page context\r\n    await page.evaluate(() =&gt; {\r\n        document.querySelector('#block-homepage address').scrollIntoView();\r\n    });\r\n\r\n    \/\/ Give the scroll time to settle before capturing\r\n    await new Promise((resolve) =&gt; setTimeout(resolve, 1500));\r\n    await page.screenshot({ path: 'screenshot.png' });\r\n\r\n    await browser.close();\r\n})();\r\n<\/code><\/pre>\n<p>In the above code, with <code>page.waitForNetworkIdle()<\/code>, we wait for the network to become idle, which helps ensure that the webpage has finished loading its files, including images.<\/p>\n<p>The <code>page.evaluate<\/code> function evaluates a function in the page&#8217;s context and returns the result. Essentially, it lets you run JavaScript inside the page, much as you would from the browser console. In this case, we are using the scrollIntoView method to scroll the webpage to the element matched by the provided selector.<\/p>\n<p>Additionally, a deliberate 1500 ms delay has been added to ensure that the webpage scrolling is completed before taking the screenshot.<\/p>\n<h2><strong>Conclusion<\/strong><\/h2>\n<p><span style=\"font-weight: 400;\">There is a lot you can do with Puppeteer. You can emulate a mouse and keyboard, which will help you fill out forms, click on buttons, and interact with web pages. You can refer to the Puppeteer documentation at https:\/\/pptr.dev\/ for prebuilt methods, or you can write your own custom methods.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Puppeteer is a powerful Node.js library that offers a wide range of high-level APIs for controlling headless browsers. 
By leveraging Puppeteer, you can automate tasks such as capturing webpage screenshots, scraping data, generating PDF documents, interacting with web pages, and testing your application. The ability to run Puppeteer in headful mode lets you visually observe browser interactions, which helps when developing and debugging automation. Additionally, Puppeteer&#8217;s strong community support and its promise- and event-based API make asynchronous interactions with web pages straightforward to program.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Overall, Puppeteer seems to be the go-to solution for a broad range of web-related tasks. While there may be potential drawbacks to using Puppeteer, and in some cases its competitors may offer better solutions, I believe there is still much to explore and achieve with Puppeteer.<\/span><\/p>\n<p>Stay tuned for more upcoming blogs on similar topics.<\/p>\n<div class=\"ap-custom-wrapper\"><\/div><!--ap-custom-wrapper-->","protected":false},"excerpt":{"rendered":"<p>A headless browser refers to a web browser that functions without a graphical user interface (GUI), making it suitable for activities like web scraping, automated testing, and other web-related tasks which you can automate. 
Headless browsers are typically used for tasks such as web scraping, automated testing, and other web-related activities requiring little or no [&hellip;]<\/p>\n","protected":false},"author":1722,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"iawp_total_views":87},"categories":[4684,1185,1],"tags":[5663,5665,5661,5662,5666,5664,5667,5660],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/60388"}],"collection":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/users\/1722"}],"replies":[{"embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/comments?post=60388"}],"version-history":[{"count":3,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/60388\/revisions"}],"predecessor-version":[{"id":60557,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/60388\/revisions\/60557"}],"wp:attachment":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/media?parent=60388"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/categories?post=60388"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/tags?post=60388"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}