Wednesday, May 21, 2025

Verifying PDF file data in Playwright

In this example, we will explore how to verify PDF text in Playwright. Playwright, an open-source automation framework, does not have built-in PDF validation capabilities, but you can validate PDF content with a third-party Node.js library such as pdf-parse by importing it into your Playwright project.

Installing the pdf-parse library

Run the npm command below to install the library in your Playwright project.

npm install pdf-parse


In this example, we automate clicking a download button, save the PDF, and verify its text content and number of pages.

import fs from "fs";
import pdf from "pdf-parse";
import { test, expect } from "@playwright/test";

test("pdf verification example", async ({ page }) => {
  await page.goto("https://examplefile.com/document/pdf/1-mb-pdf");
  // Path where the downloaded PDF is saved.
  const filePath = "../download";

  // Start waiting for the download before clicking.
  const downloadPromise = page.waitForEvent("download");
  await page.locator("[class='lnr lnr-download']").click();
  const download = await downloadPromise;
  await download.saveAs(filePath);

  // Read the saved file and parse it with pdf-parse.
  const dataBuffer = fs.readFileSync(filePath);
  const data = await pdf(dataBuffer);

  // PDF text
  console.log(data.text);
  // PDF info
  console.log(data.info);
  // PDF metadata
  console.log(data.metadata);
  // Number of pages
  console.log(data.numpages);

  expect(data.text).toContain(`can save time and effort in your workflow. Download your free 1 MB sample PDF file today and start`);
  expect(data.numpages).toEqual(324);
});
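
Text extracted from a PDF often contains line breaks and repeated spaces wherever the layout wraps, so an exact substring check can fail even when the words are present. The sketch below shows one way to guard against that. Note that normalizePdfText is a hypothetical helper written for this post, not part of pdf-parse or Playwright, and the sample string only stands in for data.text.

```javascript
// Hypothetical helper (not part of pdf-parse or Playwright):
// collapse every run of whitespace, including line breaks, into a single space.
function normalizePdfText(text) {
  return text.replace(/\s+/g, " ").trim();
}

// Sample string standing in for data.text; real extracted text will differ.
const extracted = "Download your free 1 MB\nsample PDF   file today\n\nand start";
const normalized = normalizePdfText(extracted);

console.log(normalized);
```

In the test above, the assertion could then be written as expect(normalizePdfText(data.text)).toContain(...), with the expected phrase written using single spaces.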


Output:

[Image: Verifying PDF file data in Playwright]

That is all about PDF validation in Playwright.

