Tuesday, April 29, 2025

Block Scripts, Styles, Media in Playwright

Playwright is a browser automation tool from Microsoft, used for testing, scraping, and automating web interactions. You don't always want a page to load everything, though, especially when you are scraping content or trying to speed up test execution.

Unnecessary resources like JavaScript, stylesheets, images, videos, and even ads can:

  1. Slow down page loading
  2. Consume extra bandwidth
  3. Add noise to your scraped data

Syntax to block CSS files in Playwright

await page.route('**/*.css', (route) => {
  // and abort the request
  route.abort();
});
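
One thing to watch with extension globs: the pattern is matched against the whole URL, so a stylesheet requested as /site.css?v=3 would likely slip past '**/*.css'. route() also accepts a predicate that receives a URL object, which makes it possible to look at the path alone. Here is a minimal sketch of that idea; isStylesheetUrl is a hypothetical helper name and the example.com URLs are made up.

```javascript
// Hypothetical helper: true when the URL's path (query string ignored)
// ends in .css.
function isStylesheetUrl(url) {
  return url.pathname.toLowerCase().endsWith('.css');
}

// Intended wiring, assuming `page` is a Playwright Page:
//   await page.route(isStylesheetUrl, (route) => route.abort());

// Quick check with made-up URLs:
console.log(isStylesheetUrl(new URL('https://example.com/assets/site.css?v=3'))); // true
console.log(isStylesheetUrl(new URL('https://example.com/page?file=site.css'))); // false
```

The predicate form costs a little more typing than a glob, but it avoids both missed stylesheets and accidental matches on query values.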

Syntax to block JS files in Playwright

await context.route('**/*.js', (route) => route.abort());

 Block Requests by Domain

await page.route('**/*', (route) => {
  // block all traffic from the offending domain
  if (route.request().url().includes('www.yahoo.com')) {
    return route.abort();
  }

  // allow all other traffic through
  route.continue();
});
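
A substring check like includes('www.yahoo.com') also fires when the domain only appears in a path or query string. If that matters for your case, comparing the parsed hostname is stricter. The following is a sketch under assumptions: BLOCKED_HOSTS and isBlockedHost are hypothetical names, and the hosts listed are only examples.

```javascript
// Hypothetical blocklist; replace with the hosts you actually want to drop.
const BLOCKED_HOSTS = ['yahoo.com', 'ads.example.net'];

// True when the request URL's hostname equals a blocked host
// or is a subdomain of one.
function isBlockedHost(requestUrl) {
  let hostname;
  try {
    hostname = new URL(requestUrl).hostname;
  } catch {
    return false; // not a parseable absolute URL: let it through
  }
  return BLOCKED_HOSTS.some(
    (host) => hostname === host || hostname.endsWith('.' + host)
  );
}

// Intended wiring, assuming `page` is a Playwright Page:
//   await page.route('**/*', (route) =>
//     isBlockedHost(route.request().url()) ? route.abort() : route.continue()
//   );
```

With this check, https://www.yahoo.com/news is blocked, while https://example.com/?ref=www.yahoo.com and https://notyahoo.com/ are allowed through.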


 Block Requests by Content Type

await page.route('**/*', (route) => {
  if (route.request().resourceType() === 'image') {
    return route.abort();
  }

  route.continue();
});
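
If you want to drop several resource types at once, a Set keeps the check compact. This is a minimal sketch, assuming the handler is registered on '**/*'; blockHeavyResources is a hypothetical name and the selection of types is just one possible choice.

```javascript
// Resource types to drop. These strings are among the values that
// request.resourceType() reports; the selection itself is only an example.
const BLOCKED_TYPES = new Set(['image', 'media', 'font', 'stylesheet']);

// Route handler: abort blocked types, pass everything else through.
function blockHeavyResources(route) {
  if (BLOCKED_TYPES.has(route.request().resourceType())) {
    return route.abort();
  }
  return route.continue();
}

// Intended wiring, assuming `page` is a Playwright Page:
//   await page.route('**/*', blockHeavyResources);
```

Keeping the handler as a named function also means the same function can be passed to unroute() later.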

Block Requests by Arbitrary Logic

await page.route('**/*', (route) => {
  const req = route.request();

  // block by method
  if (req.method() === 'DELETE') {
    return route.abort();
  }

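  // block by header (headers() is synchronous and returns lower-cased names;
  // allHeaders() returns a Promise and cannot be indexed directly)
  if (req.headers()['x-source']?.includes('dangerous')) {
    return route.abort();
  }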

  // block by body (here: a JSON payload that is an array of three or more items)
  if (req.postDataJSON()?.length >= 3) {
    return route.abort();
  }

  route.continue();
});
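
Once the conditions pile up, it can help to name each rule so you can see which one fired. Below is one possible way to structure that, not a Playwright feature: RULES and matchRule are hypothetical names, and the two rules simply mirror the method and header checks above.

```javascript
// Each rule is a name plus a predicate over a Playwright Request.
// The rule names and conditions are illustrative only.
const RULES = [
  { name: 'delete-method', test: (req) => req.method() === 'DELETE' },
  {
    name: 'dangerous-source',
    // request.headers() returns lower-cased header names
    test: (req) => (req.headers()['x-source'] || '').includes('dangerous'),
  },
];

// Returns the name of the first matching rule, or null when nothing matches.
function matchRule(req) {
  const hit = RULES.find((rule) => rule.test(req));
  return hit ? hit.name : null;
}

// Intended wiring, assuming `page` is a Playwright Page:
//   await page.route('**/*', (route) => {
//     const rule = matchRule(route.request());
//     if (rule) {
//       console.log(`blocked by ${rule}: ${route.request().url()}`);
//       return route.abort();
//     }
//     return route.continue();
//   });
```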

Block Requests for a Single Page

To block requests on a single page, register the handler on that page. The Page class provides a route() method that intercepts matching requests, and a handler registered there affects only that page.

import { chromium } from 'playwright';

const browser = await chromium.launch();
const context = await browser.newContext();
const page = await context.newPage();

// watch traffic matching a pattern
await page.route('**/*.css', (route) => {
  // and abort the request
  route.abort();
});
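
To confirm that the route is actually doing something, you can collect the URLs it aborts. This is a small sketch under assumptions: blockAndRecord is a hypothetical helper, and it only relies on page.route(), route.request().url() and route.abort().

```javascript
// Hypothetical helper: installs a blocking route on `page` and returns
// the array that collects every URL it aborts.
async function blockAndRecord(page, pattern) {
  const blocked = [];
  await page.route(pattern, (route) => {
    blocked.push(route.request().url());
    return route.abort();
  });
  return blocked;
}

// Intended usage, assuming `page` is a Playwright Page and the URL is yours:
//   const blocked = await blockAndRecord(page, '**/*.css');
//   await page.goto('https://example.com');
//   console.log(`${blocked.length} stylesheet requests blocked`);
```

An empty array after navigation is a hint that the pattern never matched.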


Block Requests Across All Pages

To block requests across every page, set the route handlers on the Context object instead of the Page object. This goes for route() as well as unroute(), and the syntax is exactly the same.

import { chromium } from 'playwright';

const browser = await chromium.launch();
const context = await browser.newContext();
const page = await context.newPage();

// watch the entire browser context
await context.route('**/*.js', (route) => route.abort());

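// no JS loaded anywhere! (https://example.com is a placeholder URL;
// a bare '/' only works when the context is created with a baseURL)
const page1 = await context.newPage();
await page1.goto('https://example.com/');
const page2 = await context.newPage();
await page2.goto('https://example.com/');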

// enable JS on future requests
await context.unroute('**/*.js');
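
If scripts should only be blocked for part of a run, the route()/unroute() pair can be wrapped so the cleanup always happens. The sketch below assumes a hypothetical helper named withScriptsBlocked; it passes the handler to unroute(), which removes just that handler rather than every route registered for the pattern.

```javascript
// Hypothetical helper: block scripts on `context` only while `task` runs.
async function withScriptsBlocked(context, task) {
  const pattern = '**/*.js';
  const handler = (route) => route.abort();
  await context.route(pattern, handler);
  try {
    return await task();
  } finally {
    // Passing the handler removes only this route and leaves any others
    // registered for the same pattern in place.
    await context.unroute(pattern, handler);
  }
}

// Intended usage, assuming `context` is a Playwright BrowserContext:
//   await withScriptsBlocked(context, async () => {
//     const page = await context.newPage();
//     await page.goto('https://example.com/'); // placeholder URL
//   });
```

The try/finally means scripts are allowed again even when the task throws.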


That covers how to intercept and block these resources using Playwright's built-in request routing.
