Playwright is a powerful browser automation tool from Microsoft, used for testing, scraping, or automating web interactions. But sometimes, you don’t want to load everything, especially when you're scraping content or speeding up test execution.
Unnecessary resources like JavaScript, stylesheets, images, videos, and even ads can:
- Slow down page loading
- Consume extra bandwidth
- Add noise to your scraping data
✅ Syntax to block CSS files in Playwright
await page.route('**/*.css', (route) => {
  // abort every request whose URL matches the pattern
  route.abort();
});
✅ Syntax to block JS files in Playwright
await context.route('**/*.js', (route) => route.abort());
✅ Block Requests by Domain
await page.route('**/*', (route) => {
  // block all traffic from the offending domain
  if (route.request().url().includes('www.yahoo.com')) {
    return route.abort();
  }
  // allow all other traffic through
  route.continue();
});
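A substring check on the full URL also matches requests that merely mention the domain elsewhere, for example in a query string. A stricter variant, sketched below, compares the parsed hostname instead. The names BLOCKED_HOSTS, isBlockedHost, and blockByHost are illustrative and not part of Playwright, and the host list is only an example.

```javascript
// Hosts to block; subdomains of each entry are blocked as well (example list).
const BLOCKED_HOSTS = ['yahoo.com'];

// Pure predicate: true when the URL's hostname is a blocked host or one of its subdomains.
function isBlockedHost(requestUrl) {
  let hostname;
  try {
    hostname = new URL(requestUrl).hostname;
  } catch {
    // Unparseable URLs are let through rather than blocked.
    return false;
  }
  return BLOCKED_HOSTS.some(
    (host) => hostname === host || hostname.endsWith('.' + host)
  );
}

// Route handler: abort requests to blocked hosts, continue everything else.
function blockByHost(route) {
  if (isBlockedHost(route.request().url())) {
    return route.abort();
  }
  return route.continue();
}
```

It can be registered the same way as the handler above: await page.route('**/*', blockByHost);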
✅ Block Requests by Content Type
await page.route('**/*', (route) => {
  // block images, let every other resource type through
  if (route.request().resourceType() === 'image') {
    return route.abort();
  }
  route.continue();
});
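The single-type check above extends to several resource types at once. The following is a minimal sketch that assumes the type names reported by request.resourceType() ('image', 'media', 'font', 'stylesheet'). The names BLOCKED_TYPES, shouldBlock, and blockHeavyResources are illustrative, not part of Playwright.

```javascript
// Resource types to skip, using the values returned by request.resourceType().
const BLOCKED_TYPES = new Set(['image', 'media', 'font', 'stylesheet']);

// Pure predicate, kept separate so the decision can be checked without a browser.
function shouldBlock(resourceType) {
  return BLOCKED_TYPES.has(resourceType);
}

// Route handler: abort the blocked types, continue everything else.
function blockHeavyResources(route) {
  if (shouldBlock(route.request().resourceType())) {
    return route.abort();
  }
  return route.continue();
}
```

Registering it looks the same as before: await page.route('**/*', blockHeavyResources); Adding 'script' to the set would also block JavaScript, which many pages need in order to render their content.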
✅ Block Requests by Arbitrary Logic
await page.route('**/*', (route) => {
  const req = route.request();
  // block by method
  if (req.method() === 'DELETE') {
    return route.abort();
  }
  // block by header: headers() is synchronous and returns lower-cased names,
  // whereas allHeaders() returns a Promise and would need to be awaited
  if (req.headers()['x-source']?.includes('dangerous')) {
    return route.abort();
  }
  // block by body
  if (req.postDataJSON()?.length >= 3) {
    return route.abort();
  }
  route.continue();
});
Block Requests for a Single Page
This example shows how to block requests for a single page. Playwright's Page class provides the route() method for intercepting network traffic, which lets us control the requests made by that one page.
import { chromium } from 'playwright';

const browser = await chromium.launch();
const context = await browser.newContext();
const page = await context.newPage();

// watch traffic matching a pattern
await page.route('**/*.css', (route) => {
  // and abort the request
  route.abort();
});
Block Requests Across All Pages
To block requests on every page, set the route handlers on the Context object rather than on the Page object. This applies to both route() and unroute(), and the syntax is exactly the same.
import { chromium } from 'playwright';

const browser = await chromium.launch();
// placeholder baseURL so that the relative goto('/') calls below resolve
const context = await browser.newContext({ baseURL: 'https://example.com' });

// watch the entire browser context
await context.route('**/*.js', (route) => route.abort());

// no JS loaded anywhere!
const page1 = await context.newPage();
await page1.goto('/');
const page2 = await context.newPage();
await page2.goto('/');

// remove the handler so that future JS requests go through again
await context.unroute('**/*.js');
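When a block should apply only for part of a session, pairing route() with unroute() in one helper keeps the two calls together. The sketch below assumes a context object exposing route(url, handler) and unroute(url, handler), as Playwright's BrowserContext does. The helper name withBlocked is hypothetical.

```javascript
// Hypothetical helper: block `pattern` on the context only while `action` runs.
async function withBlocked(context, pattern, action) {
  // Keep a reference to the handler so unroute() removes exactly this one.
  const handler = (route) => route.abort();
  await context.route(pattern, handler);
  try {
    return await action();
  } finally {
    // Runs even if action() throws, so the block does not outlive this call.
    await context.unroute(pattern, handler);
  }
}
```

A call might look like: await withBlocked(context, '**/*.js', () => page1.goto('/')); Once it resolves, later navigations in the same context load JavaScript again.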
That covers how to intercept and block unwanted resources using Playwright's built-in request routing.