Engineering

Scraping Dynamic Websites: When Playwright Is the Right Tool (And When It Isn't)

TinyFishie·TinyFish Observer·Apr 21, 2026·8 min read

Playwright is the correct answer to scraping JavaScript-heavy websites. Until it isn't.

For single tasks, small volumes, and sites you control or monitor closely, Playwright handles dynamic content better than any alternative. It executes JavaScript, waits for network requests to complete, handles SPAs, and gives you CDP-level control over every interaction. The answer to "how do I scrape a dynamic website?" is almost always Playwright or a managed version of it.

But there's a point where Playwright stops being the right tool—not because something better exists for the same problem, but because the problem itself has changed.

Playwright is right when:

  1. You're scraping one site, up to a few hundred pages
  2. The target doesn't require persistent authentication across runs
  3. You have engineering time to maintain selectors when sites update
  4. Concurrency of 2–5 browsers is sufficient

The problem has changed when:

  1. You need 50+ concurrent sessions without infrastructure management
  2. You need authenticated sessions that persist across multiple runs
  3. Layout changes at the target keep breaking your scripts
  4. You're running against bot-protected sites at production volume

Where Playwright Solves the Problem

The core issue with scraping dynamic websites is that HTTP requests don't get you the final DOM—they get you the HTML skeleton that JavaScript populates after load. Playwright launches a real Chromium instance, executes the JavaScript, and returns the fully-rendered DOM. This solves the fundamental problem.
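As a sketch, a minimal Python helper along these lines (assumes the `playwright` package is installed and `playwright install chromium` has been run; the import is deferred only so the snippet can sit in a module that loads without Playwright present):

```python
def get_rendered_html(url: str, timeout_ms: int = 30_000) -> str:
    """Return the DOM after client-side JavaScript has executed."""
    # Deferred import: lets this module load even where Playwright isn't installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # "networkidle" waits out the XHR/fetch calls that populate the page.
        page.goto(url, wait_until="networkidle", timeout=timeout_ms)
        html = page.content()  # the fully-rendered DOM, not the raw response
        browser.close()
        return html
```

A plain `requests.get(url).text` against the same URL would return only the pre-JavaScript skeleton.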

For developer tools documentation, SaaS product pages, React-rendered content, and most modern sites that use frameworks to populate content client-side, Playwright is the right call. The API is clean, the tooling (Trace Viewer, codegen) is excellent, and Python and TypeScript bindings are both mature.

Crawl4AI, Selenium, and Splash all solve the same rendering problem with different trade-offs. Selenium has broader language support and legacy ecosystem depth. Crawl4AI outputs LLM-ready markdown natively. Splash integrates with Scrapy. Playwright tends to win on developer experience and execution speed for new projects.

Where the Problem Gets Harder

Rendering JavaScript is step one. The harder problems show up at step two.

Authentication and session state. Scraping authenticated portals—supplier pricing pages, internal dashboards, gated data—requires maintaining session state across runs. Playwright supports this through persistent browser profiles and context storage, but the implementation isn't trivial: you need to handle login flows, session expiry, re-authentication on timeout, and cookie management. Multiply this across 50 different portals with different session architectures and it becomes significant engineering work.
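The happy path with Playwright's storage state is short; the sketch below shows it in Python (the `/login` URL check and form selectors are hypothetical and site-specific, and real portals add session-expiry edge cases and 2FA on top):

```python
from pathlib import Path

def fetch_with_session(url: str, state_file: str = "auth_state.json") -> str:
    """Reuse a saved session if one exists; log in and save one otherwise."""
    # Deferred import: lets this module load even where Playwright isn't installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        state = state_file if Path(state_file).exists() else None
        context = browser.new_context(storage_state=state)
        page = context.new_page()
        page.goto(url)
        if "/login" in page.url:  # session missing or expired: re-authenticate
            page.fill("#email", "user@example.com")   # hypothetical selectors
            page.fill("#password", "secret")
            page.click("button[type=submit]")
            page.wait_for_load_state("networkidle")
            context.storage_state(path=state_file)    # persist cookies + localStorage
            page.goto(url)
        html = page.content()
        browser.close()
        return html
```

This is the per-portal logic that has to be rewritten, with different selectors and a different session architecture, for every one of those 50 portals.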

Selector maintenance. Every page.locator() call you write is a dependency on the current DOM structure of the target site. When the site redesigns—which happens on a schedule you don't control—your selectors break. This is the largest hidden cost of Playwright-based scraping at scale: not the initial development, but the ongoing maintenance of selectors against sites that update without notice. A 50-site scraping operation at realistic update frequencies means debugging broken selectors several times per week.
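A common mitigation is an ordered fallback list, so a redesign degrades into a logged miss instead of a crash. A sketch that works against any object exposing a Playwright-style `query_selector` (the selectors themselves are hypothetical):

```python
def first_matching(page, selectors):
    """Return text from the first selector that matches, or None."""
    for sel in selectors:
        el = page.query_selector(sel)  # same method name as Playwright's Page
        if el is not None:
            return el.inner_text()
    return None

# Ordered from most specific to most generic; redesigns mean updating this
# list in one place instead of every call site.
PRICE_SELECTORS = [
    "[data-testid=price]",   # stable hook, if the site provides one
    ".product-price",        # current class name
    "span.price",            # older markup, kept as a fallback
]
```

Fallbacks buy time, but they only defer the maintenance cost: someone still has to notice the `None` results and add the new selector.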

Concurrency economics. Playwright instances consume significant memory—each headless Chromium process uses 100–300 MB depending on the page. Running 10 concurrent instances on a standard server is manageable. Running 50 starts to require dedicated infrastructure, session pooling, and process management. Running 200 in parallel requires a distributed architecture. This overhead is real and scales linearly with the number of concurrent tasks.
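The arithmetic is worth doing before picking a server size. A back-of-envelope helper using the per-instance range above (worst case, no shared pages or session pooling):

```python
def browser_pool_memory_gb(concurrency: int, mb_per_instance: int = 300) -> float:
    """Worst-case resident memory for a pool of headless Chromium instances."""
    return concurrency * mb_per_instance / 1024

# 10 instances fit on a modest box; 50 already want a dedicated one.
print(round(browser_pool_memory_gb(10), 1))   # 2.9 GB
print(round(browser_pool_memory_gb(50), 1))   # 14.6 GB
print(round(browser_pool_memory_gb(200), 1))  # 58.6 GB
```

CPU, file descriptors, and /dev/shm limits in containers tend to bite before memory does, but the linear scaling shape is the same.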

Anti-bot evolution. Sites protected by DataDome, Kasada, or Cloudflare Bot Management update their detection logic on a schedule that isn't public. A Playwright script that worked last month may fail today not because you changed anything, but because the detection system got smarter. Managing this requires either ongoing tuning or a solution that maintains the counter-detection layer for you.

[Diagram: Playwright's maintenance loop versus an agent-based approach]

What Works Beyond This Point

The threshold where Playwright stops being the right tool is roughly: when the engineering overhead of maintaining the scraping infrastructure costs more than the value of the data, or when you need concurrency and session management at a scale that requires dedicated infrastructure.

At that point, the options are:

Managed scraping services (Zyte, Scrapfly, Firecrawl): Handle browser rendering in the cloud. You still write extraction logic, but you don't manage browsers. Better for extraction at scale; still selector-dependent.

AI-powered extraction (Firecrawl /extract, Crawl4AI with AI models): Use LLMs to identify and extract data without explicit selectors. More resilient to layout changes. Works well when the target structure is consistent enough for a model to generalize.

AI web agents (TinyFish): You describe what you want in plain English; the agent handles navigation, rendering, authentication, and extraction. No selectors to maintain. Effective for multi-step workflows and authenticated portals where writing and maintaining explicit navigation logic is the primary cost. The trade-off is cost per task compared to optimized custom scripts.

The honest decision framework is about where your engineering time goes. If you're spending more time maintaining scrapers than using the data, the architecture has shifted.

A Practical Decision Point

Playwright stays the right tool when:

  • Target is public, mostly-static, or has a predictable DOM structure
  • Volume is low enough that selector maintenance is manageable
  • You need direct CDP control for specific interactions
  • Per-request cost needs to be minimized

The architecture shifts when:

  • Authentication is required across many targets
  • Selectors break frequently enough that maintenance becomes the primary engineering task
  • You need 50+ concurrent tasks without building infrastructure
  • The target has serious bot protection that requires ongoing counter-measures
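The two lists above can be collapsed into a checklist. A toy helper that encodes them (the thresholds come from this article's framing, not hard rules):

```python
def playwright_still_fits(
    needs_auth_across_targets: bool,
    selector_breaks_per_week: int,
    concurrent_tasks: int,
    serious_bot_protection: bool,
) -> bool:
    """True if none of the 'architecture shifts' signals fire."""
    shifted = (
        needs_auth_across_targets
        or selector_breaks_per_week >= 3   # maintenance is the primary task
        or concurrent_tasks >= 50          # dedicated infrastructure territory
        or serious_bot_protection
    )
    return not shifted

# A single public site, stable DOM, a handful of sessions: Playwright fits.
print(playwright_still_fits(False, 0, 5, False))  # True
```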

TinyFish is designed for the second set of cases. Test it against your actual authenticated, protected, or multi-step targets: 500 free steps, no credit card, no setup.

**Start your free trial →**

---

FAQ

Can I scrape dynamic websites without Playwright?

Yes, but the right alternative depends on what "dynamic" means for your target. If the site loads content via JavaScript after the initial page load, you need a tool that executes JavaScript—Playwright, Selenium, Puppeteer, or a managed service like Firecrawl. If the site just has complex DOM structure but renders server-side, you can use requests + BeautifulSoup without a full browser. The key question is whether the data you need exists in the initial HTML response or only after client-side JavaScript executes.
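One quick diagnostic: fetch the raw HTML once and check whether the value you need is already present. A stdlib-only sketch (`marker` is any string you expect to see on the rendered page):

```python
import urllib.request

def fetch_initial_html(url: str) -> str:
    """The raw server response: what requests/BeautifulSoup see, no JS run."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")

def needs_js_rendering(initial_html: str, marker: str) -> bool:
    """True if the value you need is absent from the initial HTML."""
    return marker not in initial_html

# A typical SPA shell: the data is injected client-side after load.
spa_shell = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'
print(needs_js_rendering(spa_shell, "$10"))  # True -> you need a browser
```

If the marker is in the initial HTML, requests + BeautifulSoup is enough; if it only appears in the browser, reach for Playwright or an equivalent.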

Why does my Playwright script work locally but fail in production?

Several common causes: headless Chrome behaves differently from headed Chrome in ways that some sites detect (missing GPU, different canvas fingerprint, different WebGL signatures). Production environments often have different IP ranges that are more likely to be flagged. Anti-bot systems that allow low-volume traffic from a single IP will block the same traffic at higher volume or from cloud IP ranges. Resource constraints in containerized environments also cause timing issues that don't appear locally.

What's the difference between Playwright and Selenium for dynamic scraping?

Both execute JavaScript in a real browser. Playwright is generally faster (async architecture, parallel contexts), has better developer tooling (Trace Viewer, codegen), and handles modern async patterns more cleanly. Selenium has broader language support (Java, C#, Ruby) and a larger ecosystem of existing test infrastructure. For new Python or TypeScript scraping projects, Playwright is usually the better choice. For Java enterprise teams or projects with existing Selenium investment, Selenium's advantages outweigh the migration cost.

How do I scrape a site that requires login?

With Playwright: save cookies/session storage after logging in and restore them on subsequent runs. Playwright supports this through page.context().storageState(). The challenge at scale is managing session expiry, re-authentication, and session state across many targets simultaneously. For authenticated scraping across many portals, AI agents that handle login as part of the goal description are often more practical than maintaining session management logic per-site.

When should I use Firecrawl instead of Playwright?

Firecrawl is better when your goal is extracting clean, LLM-ready content from public pages—documentation, blog posts, product pages. It abstracts browser management and outputs clean markdown natively. Playwright is better when you need fine-grained control over interactions, authenticated access, or behavior that Firecrawl's extraction layer doesn't support. For anything requiring multi-step navigation or login, neither Firecrawl nor standard Playwright is purpose-built—that's where agents are more practical.

Related Reading

  • Pillar: The Best Web Scraping Tools in 2026
  • Why Your Stealth Plugin Isn't Working (And What Actually Does)
  • Best Puppeteer Alternatives for Browser Automation in 2026
  • What Is a Web Agent? The Complete Guide