Engineering

Scraping Dynamic Websites: When Playwright Is the Right Tool (And When It Isn't)

TinyFishie·TinyFish Observer·Apr 21, 2026·8 min read

Playwright is the correct answer to scraping JavaScript-heavy websites. Until it isn't.

For single tasks, small volumes, and sites you control or monitor closely, Playwright handles dynamic content better than any alternative. It executes JavaScript, waits for network requests to complete, handles SPAs, and gives you CDP-level control over every interaction. The answer to "how do I scrape a dynamic website?" is almost always Playwright or a managed version of it.

But there's a point where Playwright stops being the right tool—not because something better exists for the same problem, but because the problem itself has changed.

Playwright is right when:

  1. You're scraping one site, up to a few hundred pages
  2. The target doesn't require persistent authentication across runs
  3. You have engineering time to maintain selectors when sites update
  4. Concurrency of 2–5 browsers is sufficient

The problem has changed when:

  1. You need 50+ concurrent sessions without infrastructure management
  2. You need authenticated sessions that persist across multiple runs
  3. Layout changes at the target keep breaking your scripts
  4. You're running against bot-protected sites at production volume

Where Playwright Solves the Problem

The core issue with scraping dynamic websites is that HTTP requests don't get you the final DOM—they get you the HTML skeleton that JavaScript populates after load. Playwright launches a real Chromium instance, executes the JavaScript, and returns the fully-rendered DOM. This solves the fundamental problem.
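As a sketch, a minimal Python helper along these lines (assumes the `playwright` package is installed and `playwright install chromium` has been run; the import is deferred only so the snippet can sit in a module that loads without Playwright present):

```python
def get_rendered_html(url: str, timeout_ms: int = 30_000) -> str:
    """Return the DOM after client-side JavaScript has executed."""
    # Deferred import: lets this module load even where Playwright isn't installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # "networkidle" waits out the XHR/fetch calls that populate the page.
        page.goto(url, wait_until="networkidle", timeout=timeout_ms)
        html = page.content()  # the fully-rendered DOM, not the raw response
        browser.close()
        return html
```

A plain `requests.get(url).text` against the same URL would return only the pre-JavaScript skeleton.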

For developer tools documentation, SaaS product pages, React-rendered content, and most modern sites that use frameworks to populate content client-side, Playwright is the right call. The API is clean, the tooling (Trace Viewer, codegen) is excellent, and Python and TypeScript bindings are both mature.

Crawl4AI, Selenium, and Splash all solve the same rendering problem with different trade-offs. Selenium has broader language support and legacy ecosystem depth. Crawl4AI outputs LLM-ready markdown natively. Splash integrates with Scrapy. Playwright tends to win on developer experience and execution speed for new projects.

Where the Problem Gets Harder

Rendering JavaScript is step one. The harder problems show up at step two.

Authentication and session state. Scraping authenticated portals—supplier pricing pages, internal dashboards, gated data—requires maintaining session state across runs. Playwright supports this through persistent browser profiles and context storage, but the implementation isn't trivial: you need to handle login flows, session expiry, re-authentication on timeout, and cookie management. Multiply this across 50 different portals with different session architectures and it becomes significant engineering work.
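The happy path with Playwright's storage state is short; the sketch below shows it in Python (the `/login` URL check and form selectors are hypothetical and site-specific, and real portals add session-expiry edge cases and 2FA on top):

```python
from pathlib import Path

def fetch_with_session(url: str, state_file: str = "auth_state.json") -> str:
    """Reuse a saved session if one exists; log in and save one otherwise."""
    # Deferred import: lets this module load even where Playwright isn't installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        state = state_file if Path(state_file).exists() else None
        context = browser.new_context(storage_state=state)
        page = context.new_page()
        page.goto(url)
        if "/login" in page.url:  # session missing or expired: re-authenticate
            page.fill("#email", "user@example.com")   # hypothetical selectors
            page.fill("#password", "secret")
            page.click("button[type=submit]")
            page.wait_for_load_state("networkidle")
            context.storage_state(path=state_file)    # persist cookies + localStorage
            page.goto(url)
        html = page.content()
        browser.close()
        return html
```

This is the per-portal logic that has to be rewritten, with different selectors and a different session architecture, for every one of those 50 portals.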

Selector maintenance. Every page.locator() call you write is a dependency on the current DOM structure of the target site. When the site redesigns—which happens on a schedule you don't control—your selectors break. This is the largest hidden cost of Playwright-based scraping at scale: not the initial development, but the ongoing maintenance of selectors against sites that update without notice. A 50-site scraping operation at realistic update frequencies means debugging broken selectors several times per week.
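A common mitigation is an ordered fallback list, so a redesign degrades into a logged miss instead of a crash. A sketch that works against any object exposing a Playwright-style `query_selector` (the selectors themselves are hypothetical):

```python
def first_matching(page, selectors):
    """Return text from the first selector that matches, or None."""
    for sel in selectors:
        el = page.query_selector(sel)  # same method name as Playwright's Page
        if el is not None:
            return el.inner_text()
    return None

# Ordered from most specific to most generic; redesigns mean updating this
# list in one place instead of every call site.
PRICE_SELECTORS = [
    "[data-testid=price]",   # stable hook, if the site provides one
    ".product-price",        # current class name
    "span.price",            # older markup, kept as a fallback
]
```

Fallbacks buy time, but they only defer the maintenance cost: someone still has to notice the `None` results and add the new selector.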

Concurrency economics. Playwright instances consume significant memory—each headless Chromium process uses 100–300 MB depending on the page. Running 10 concurrent instances on a standard server is manageable. Running 50 starts to require dedicated infrastructure, session pooling, and process management. Running 200 in parallel requires a distributed architecture. This overhead is real and scales linearly with the number of concurrent tasks.
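The arithmetic is worth doing before picking a server size. A back-of-envelope helper using the per-instance range above (worst case, no shared pages or session pooling):

```python
def browser_pool_memory_gb(concurrency: int, mb_per_instance: int = 300) -> float:
    """Worst-case resident memory for a pool of headless Chromium instances."""
    return concurrency * mb_per_instance / 1024

# 10 instances fit on a modest box; 50 already want a dedicated one.
print(round(browser_pool_memory_gb(10), 1))   # 2.9 GB
print(round(browser_pool_memory_gb(50), 1))   # 14.6 GB
print(round(browser_pool_memory_gb(200), 1))  # 58.6 GB
```

CPU, file descriptors, and /dev/shm limits in containers tend to bite before memory does, but the linear scaling shape is the same.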

Anti-bot evolution. Sites protected by DataDome, Kasada, or Cloudflare Bot Management update their detection logic on a schedule that isn't public. A Playwright script that worked last month may fail today not because you changed anything, but because the detection system got smarter. Managing this requires either ongoing tuning or a solution that maintains the counter-detection layer for you.

[Diagram: Playwright's maintenance loop versus an agent-based approach]

What Works Beyond This Point

The threshold where Playwright stops being the right tool is roughly: when the engineering overhead of maintaining the scraping infrastructure costs more than the value of the data, or when you need concurrency and session management at a scale that requires dedicated infrastructure.

At that point, the options are:

Managed scraping services (Zyte, Scrapfly, Firecrawl): Handle browser rendering in the cloud. You still write extraction logic, but you don't manage browsers. Better for extraction at scale; still selector-dependent.

AI-powered extraction (Firecrawl /extract, Crawl4AI with AI models): Use LLMs to identify and extract data without explicit selectors. More resilient to layout changes. Works well when the target structure is consistent enough for a model to generalize.

AI web agents (TinyFish): You describe what you want in plain English; the agent handles navigation, rendering, authentication, and extraction. No selectors to maintain. Effective for multi-step workflows and authenticated portals where writing and maintaining explicit navigation logic is the primary cost. The trade-off is cost per task compared to optimized custom scripts.

The honest decision framework is about where your engineering time goes. If you're spending more time maintaining scrapers than using the data, the architecture has shifted.

A Practical Decision Point

Playwright stays the right tool when:

  • Target is public, mostly-static, or has a predictable DOM structure
  • Volume is low enough that selector maintenance is manageable
  • You need direct CDP control for specific interactions
  • Per-request cost needs to be minimized

The architecture shifts when:

  • Authentication is required across many targets
  • Selectors break frequently enough that maintenance becomes the primary engineering task
  • You need 50+ concurrent tasks without building infrastructure
  • The target has serious bot protection that requires ongoing counter-measures
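The two lists above can be collapsed into a checklist. A toy helper that encodes them (the thresholds come from this article's framing, not hard rules):

```python
def playwright_still_fits(
    needs_auth_across_targets: bool,
    selector_breaks_per_week: int,
    concurrent_tasks: int,
    serious_bot_protection: bool,
) -> bool:
    """True if none of the 'architecture shifts' signals fire."""
    shifted = (
        needs_auth_across_targets
        or selector_breaks_per_week >= 3   # maintenance is the primary task
        or concurrent_tasks >= 50          # dedicated infrastructure territory
        or serious_bot_protection
    )
    return not shifted

# A single public site, stable DOM, a handful of sessions: Playwright fits.
print(playwright_still_fits(False, 0, 5, False))  # True
```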

TinyFish is designed for the second set of cases. Test it against your actual authenticated, protected, or multi-step targets: 500 free steps, no credit card, no setup.

**Start your free trial →**

---

FAQ

Can I scrape dynamic websites without Playwright?

Yes, but the right alternative depends on what "dynamic" means for your target. If the site loads content via JavaScript after the initial page load, you need a tool that executes JavaScript—Playwright, Selenium, Puppeteer, or a managed service like Firecrawl. If the site just has complex DOM structure but renders server-side, you can use requests + BeautifulSoup without a full browser. The key question is whether the data you need exists in the initial HTML response or only after client-side JavaScript executes.
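One quick diagnostic: fetch the raw HTML once and check whether the value you need is already present. A stdlib-only sketch (`marker` is any string you expect to see on the rendered page):

```python
import urllib.request

def fetch_initial_html(url: str) -> str:
    """The raw server response: what requests/BeautifulSoup see, no JS run."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")

def needs_js_rendering(initial_html: str, marker: str) -> bool:
    """True if the value you need is absent from the initial HTML."""
    return marker not in initial_html

# A typical SPA shell: the data is injected client-side after load.
spa_shell = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'
print(needs_js_rendering(spa_shell, "$10"))  # True -> you need a browser
```

If the marker is in the initial HTML, requests + BeautifulSoup is enough; if it only appears in the browser, reach for Playwright or an equivalent.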

Why does my Playwright script work locally but fail in production?

Several common causes: headless Chrome behaves differently from headed Chrome in ways that some sites detect (missing GPU, different canvas fingerprint, different WebGL signatures). Production environments often have different IP ranges that are more likely to be flagged. Anti-bot systems that allow low-volume traffic from a single IP will block the same traffic at higher volume or from cloud IP ranges. Resource constraints in containerized environments also cause timing issues that don't appear locally.

What's the difference between Playwright and Selenium for dynamic scraping?

Both execute JavaScript in a real browser. Playwright is generally faster (async architecture, parallel contexts), has better developer tooling (Trace Viewer, codegen), and handles modern async patterns more cleanly. Selenium has broader language support (Java, C#, Ruby) and a larger ecosystem of existing test infrastructure. For new Python or TypeScript scraping projects, Playwright is usually the better choice. For Java enterprise teams or projects with existing Selenium investment, Selenium's advantages outweigh the migration cost.

How do I scrape a site that requires login?

With Playwright: save cookies/session storage after logging in and restore them on subsequent runs. Playwright supports this through page.context().storageState(). The challenge at scale is managing session expiry, re-authentication, and session state across many targets simultaneously. For authenticated scraping across many portals, AI agents that handle login as part of the goal description are often more practical than maintaining session management logic per-site.

When should I use Firecrawl instead of Playwright?

Firecrawl is better when your goal is extracting clean, LLM-ready content from public pages—documentation, blog posts, product pages. It abstracts browser management and outputs clean markdown natively. Playwright is better when you need fine-grained control over interactions, authenticated access, or behavior that Firecrawl's extraction layer doesn't support. For anything requiring multi-step navigation or login, neither Firecrawl nor standard Playwright is purpose-built—that's where agents are more practical.

Related Reading

  • Pillar: The Best Web Scraping Tools in 2026
  • Why Your Stealth Plugin Isn't Working (And What Actually Does)
  • Best Puppeteer Alternatives for Browser Automation in 2026
  • What Is a Web Agent? The Complete Guide