Saturday, 1 February 2020

Headless (UI-less) Web Browsing, Crawling with Node.js, Puppeteer and Chromium

Top web automation library for Node.js:
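  • Puppeteer, without a doubt: developed by Google, fast, and built to drive Chrome/Chromium
  • Installation:
    • npm install puppeteer (also downloads a bundled Chromium)
    • Or:
    • npm install puppeteer-core (library only, no browser download)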
Puppeteer options:
  1. Use Google Chrome?
    Installing Google Chrome takes more than a one-line command: you first have to create a repo file for the Linux package manager. Using Chromium instead is simpler.
  2. Use Chromium binary installed together with puppeteer:
    npm install puppeteer
    Does not seem to work on CentOS
  3. Use separate Chromium binary:
    sudo yum install chromium -y #CentOS
    sudo apt install chromium-browser -y #Ubuntu
    npm install puppeteer-core

    Works fine on the Linux distributions tried here (CentOS and Ubuntu)
Chromium binary paths:
  1. CentOS
    /usr/lib64/chromium-browser/chromium-browser
    Or just:
    /usr/bin/chromium-browser
  2. Ubuntu
    /usr/bin/chromium-browser
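
Since the binary path differs between distributions, a small helper can pick the first path that actually exists. This is only a sketch: findChromium and CANDIDATES are hypothetical names, the candidate list is just the paths listed above, and it uses require so it runs as a plain script (use import fs from "fs" in an ES module).

```javascript
const fs = require("fs");

// Candidate paths taken from the list above; extend for other distributions.
const CANDIDATES = [
  "/usr/bin/chromium-browser",
  "/usr/lib64/chromium-browser/chromium-browser",
];

// Hypothetical helper: return the first candidate that exists on disk,
// or null if none of them does.
function findChromium(candidates = CANDIDATES) {
  for (const candidate of candidates) {
    if (fs.existsSync(candidate)) return candidate;
  }
  return null;
}

// Prints the detected path, or null when Chromium is not installed there.
console.log(findChromium());
```

The result can be passed as executablePath to puppeteer.launch; a null result is a hint that Chromium still needs to be installed.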
Test Puppeteer with Chromium binary:

//Libs
//ES module syntax: save the file as .mjs or set "type":"module" in package.json
import puppeteer_core from "puppeteer-core";

//Shortcuts
var log       = console.log;
var puppeteer = puppeteer_core;

//Constants
//Browser path
var BPATH ="/usr/bin/chromium-browser";

//TEST
(async ()=>{
  var Browser = await puppeteer.launch({executablePath:BPATH});

  try {
    var Page = await Browser.newPage();

    //Load a page and add jQuery
    await Page.goto("https://google.com");
    await Page.addScriptTag({
      url: "https://code.jquery.com/jquery-3.2.1.min.js"
    });

    //Wait 5s for load
    await new Promise(Resolve => setTimeout(Resolve,5000));

    //Evaluate some JS in the page (the loaded jQuery version)
    var Result = await Page.evaluate(`$.fn.jquery`);
    log(Result);
  }
  finally {
    //Close browser even if a step above fails
    await Browser.close();
  }
})();
//EOF
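
The fixed 5-second wait in the test above is either longer than needed or, on a slow connection, too short. A minimal sketch of a polling alternative follows; waitUntil is a hypothetical helper, not part of Puppeteer, and its default timeout and interval are assumptions. Puppeteer also provides Page.waitForFunction for waiting on a condition inside the page.

```javascript
// Hypothetical helper: call `check` repeatedly until it returns a truthy
// value, or throw once `timeoutMs` has elapsed.
async function waitUntil(check, timeoutMs = 5000, intervalMs = 100) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = await check();
    if (value) return value;
    if (Date.now() >= deadline) {
      throw new Error("waitUntil: timed out");
    }
    // Sleep before the next attempt
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Possible use inside the test, in place of the fixed wait:
//   await waitUntil(() => Page.evaluate("typeof window.jQuery === 'function'"));
```

The helper returns as soon as the condition holds, so the script only waits as long as the page really needs.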
