
Most web automation tool comparisons treat page volume as a footnote. It isn't.
The tool that handles 500 pages a day beautifully will silently degrade at 50,000. The infrastructure that's cost-effective at 10,000 pages becomes the most expensive option in the room at 500,000. And the free tier that feels like a reasonable starting point has a ceiling that catches most teams by surprise somewhere in the middle of a project.
Page volume and access requirement level are the two primary variables that determine your tool decision — more than AI capability, ease of use, or no-code vs. code. Get either wrong and you're either paying for infrastructure you don't need or running a pipeline that breaks under load exactly when it matters.
This guide maps each volume tier to the tools that actually work at that scale, with real cost estimates at each level so you can make the comparison with numbers rather than intuition.
Quick decision rules before the detail:
Before matching tools to volume tiers, you need an accurate number. Teams consistently underestimate this, and the underestimate is what causes mid-project tool switches.
The formula:
Daily pages = (number of target URLs) × (crawl frequency per day) × (pages per URL path)
A few scenarios to calibrate against:
The number that matters for tool selection isn't the total — it's the peak load your pipeline needs to sustain, and whether you need it done in a tight time window or can spread it across the day.
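The formula and the peak-load point above can be sketched in a few lines of Python. This is a minimal illustration, not a sizing tool: the function names and the example inputs (200 URLs, 2 crawls a day, 5 pages per URL) are hypothetical and chosen only to show how the same daily total turns into very different sustained rates depending on the crawl window.

```python
import math


def daily_pages(target_urls: int, crawls_per_day: float, pages_per_url: float) -> int:
    """Daily pages = target URLs x crawl frequency per day x pages per URL path."""
    return math.ceil(target_urls * crawls_per_day * pages_per_url)


def peak_pages_per_minute(pages: int, window_hours: float) -> float:
    """Sustained rate needed to finish `pages` inside a crawl window."""
    return pages / (window_hours * 60)


# Hypothetical inputs: 200 target URLs, crawled twice a day, 5 pages per URL path.
total = daily_pages(200, 2, 5)
spread_rate = peak_pages_per_minute(total, window_hours=24)  # spread across the day
tight_rate = peak_pages_per_minute(total, window_hours=1)    # squeezed into one hour

print(f"{total} pages/day")
print(f"{spread_rate:.2f} pages/min over 24h vs {tight_rate:.2f} pages/min in a 1h window")
```

The daily total is identical in both cases; only the window changes, and the one-hour window needs 24 times the throughput.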
Under 1,000 pages/day: one-off research pulls, small recurring monitors, and proof-of-concept scrapes before committing to a larger pipeline. Think of a freelancer pulling a client's competitor catalog, or a researcher collecting data from academic directories.
At this volume, almost any tool works. The decision is about convenience and your technical comfort level, not about infrastructure.
Free options that are genuinely capable here:
What to watch for: Free tiers hide their ceilings. ParseHub's 200-page-per-run limit is the one most teams hit mid-project. If your target has 250 product pages, you're already over the limit. Verify the ceiling against your actual target page count before building a workflow around any free tier.
| Tool | Monthly cost at ~500 pages/day | Notes |
|---|---|---|
| Web Scraper Extension | $0 | No scheduling, uses your IP |
| ParseHub | $0 (free tier) | 200 pages/run limit |
| Octoparse | $0 (free tier) | Local runs only |
| TinyFish | $0 (free tier) | 500 credits total, not per day |
| Scrapy (self-hosted) | $0 + server cost (~$5–10/mo VPS) | Requires Python setup |
1,000–10,000 pages/day: a small team's recurring data feed, daily price monitoring across dozens of sites, or a startup's competitive intelligence pipeline. Most "we scrape data to inform our product decisions" use cases live here.
This is where free tiers run out and you start paying for infrastructure. The key trade-off at this volume is between simplicity (managed cloud tools) and cost efficiency (self-hosted frameworks).
Managed cloud tools (simpler, higher per-page cost):
TinyFish's Browser API and Web Agent share the same credit pool, so you can mix both within one plan depending on what each target requires.
Self-hosted frameworks (more work, lower marginal cost):
What to watch for: At 1,000–10,000 pages/day, you're large enough that sites with strict access requirements become a more significant factor. A managed tool that includes proxy rotation (like TinyFish) absorbs that cost into the subscription. A self-hosted Scrapy setup needs a separate proxy budget — residential proxies (e.g., Bright Data) run ~$8/GB PAYG at this tier, which adds $20–80/month depending on page weight.
Estimated monthly cost at 5,000 pages/day:
| Tool | Base cost | Proxy cost | Estimated total/mo |
|---|---|---|---|
| Scrapy (self-hosted) | ~$20 (server) | $30–80 (if needed) | $20–100 |
| Apify (pay-as-you-go) | ~$40–60 (compute) | Separate | $40–140 |
| TinyFish Starter | $15 | Included | $15 |
| TinyFish Pro | $150 | Included | $150 |
Note: TinyFish pricing includes browsers, proxies, and AI inference. Apify and Scrapy costs are compute only — add proxy costs separately for protected sites.
10,000–100,000 pages/day: a mid-size company's market intelligence operation, an e-commerce brand monitoring pricing across hundreds of competitor sites, or a SaaS product that needs fresh web data as a core feature. This is where scraping stops being a side project and becomes infrastructure.
At this volume, the hidden cost of scraping is no longer the tool subscription — it's engineering time. Selector-based scrapers break when target sites update. Proxy pools need management. Failure monitoring becomes a dedicated function. The teams that underestimate this end up with a part-time engineer whose primary job is keeping the scraping pipeline alive.
Managed infrastructure wins on total cost here:
Browser API (billed per time — 1 credit = 4 minutes, minimum 1 minute):
10 sec/page → rounds up to 1 min → 0.25 credits/page
50,000 pages/day × 0.25 credits × 30 days = 375,000 credits/month
→ PAYG: ~$5,625/month | Pro plan (overage at $0.012/credit): ~$4,452/month
Web Agent (billed per step — 1 credit = 1 step; for complex multi-step workflows):
~3 steps/page × 50,000 pages/day × 30 days = 4,500,000 steps/month
→ PAYG: ~$67,500/month | Not practical for bulk simple extraction at this volume.
Most bulk extraction pipelines at this tier use the Browser API. Web Agent pricing is designed for complex authenticated workflows where the automation value per workflow justifies the cost.
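The credit arithmetic above can be reproduced with a short script, which makes it easier to rerun with your own page timings. This is a sketch under stated assumptions: the rates are the ones quoted in this article, the function names are hypothetical, and the rule that sessions longer than a minute are billed in whole minutes is an assumption (the text only states the 1-minute minimum).

```python
import math

USD_PER_CREDIT_PAYG = 0.015      # pay-as-you-go rate quoted in this article
BROWSER_MINUTES_PER_CREDIT = 4   # Browser API: 1 credit = 4 minutes
DAYS_PER_MONTH = 30


def browser_credits_per_page(seconds_per_page: float) -> float:
    # Assumption: billed in whole minutes, with a 1-minute minimum per page.
    billed_minutes = max(1, math.ceil(seconds_per_page / 60))
    return billed_minutes / BROWSER_MINUTES_PER_CREDIT


def agent_credits_per_page(steps_per_page: float) -> float:
    # Web Agent: 1 credit = 1 step.
    return float(steps_per_page)


def monthly_payg(pages_per_day: int, credits_per_page: float) -> tuple[float, float]:
    """Return (credits per month, PAYG cost in USD per month)."""
    credits = pages_per_day * credits_per_page * DAYS_PER_MONTH
    return credits, round(credits * USD_PER_CREDIT_PAYG, 2)


browser_credits, browser_usd = monthly_payg(50_000, browser_credits_per_page(10))
agent_credits, agent_usd = monthly_payg(50_000, agent_credits_per_page(3))

print(f"Browser API: {browser_credits:,.0f} credits, ${browser_usd:,.0f}/month")
print(f"Web Agent:   {agent_credits:,.0f} credits, ${agent_usd:,.0f}/month")
```

With the 10 sec/page and 3 steps/page inputs used above, the script gives the same 375,000 and 4,500,000 credit totals; changing `seconds_per_page` or `steps_per_page` shows how sensitive the bill is to those two assumptions.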
Self-hosted at this volume:
What to watch for: This is the volume tier where silent failure becomes a serious business problem. A pipeline that silently returns empty results for three days at 50,000 pages/day is a data quality incident, not a minor inconvenience. Factor monitoring and alerting into your tool evaluation — not just happy-path performance.
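One way to catch the silent-empty-results case is to check every batch for the share of records missing required fields and alert when it crosses a threshold. The sketch below is illustrative only: the field names, the 20% threshold, and the helper names are hypothetical, and a real pipeline would route the alert to whatever paging or chat system the team already uses.

```python
from typing import Any, Iterable, Mapping, Sequence


def is_blank(value: Any) -> bool:
    """Treat None and empty or whitespace-only strings as missing."""
    return value is None or (isinstance(value, str) and not value.strip())


def empty_rate(records: Iterable[Mapping[str, Any]], required_fields: Sequence[str]) -> float:
    """Share of records with at least one required field missing."""
    records = list(records)
    if not records:
        return 1.0  # a run that returns nothing counts as fully empty
    bad = sum(1 for r in records if any(is_blank(r.get(f)) for f in required_fields))
    return bad / len(records)


def should_alert(records, required_fields, threshold: float = 0.2) -> bool:
    # Hypothetical threshold: alert when more than 20% of records are incomplete.
    return empty_rate(records, required_fields) > threshold


batch = [
    {"title": "Widget A", "price": "9.99"},
    {"title": "", "price": None},          # looks like a blocked or changed page
    {"title": "Widget B", "price": "4.50"},
]
if should_alert(batch, ["title", "price"]):
    print(f"ALERT: {empty_rate(batch, ['title', 'price']):.0%} of records incomplete")
```

Treating a zero-record run as 100% empty is deliberate: a scraper that returns nothing without raising an error is exactly the failure this check is meant to surface.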
Estimated monthly cost at 50,000 pages/day, assuming a mixed target set of simple and JS-heavy sites requiring managed browser infrastructure:
| Tool | Estimated total/mo | Selector maintenance | Failure visibility |
|---|---|---|---|
| Scrapy + proxies | $2,000–2,300 ⁽¹⁾ | High (you own it) | Manual |
| Apify (custom Actors) | $500–900 | Medium (Actor updates) | Dashboard |
| Bright Data (proxy infra) | $4,500–6,000+ ⁽²⁾ | High (your scrapers) | Manual |
| TinyFish Browser API (PAYG) | ~$5,625 ⁽³⁾ | None | Built-in |
| TinyFish Browser API (Pro) | ~$4,452 ⁽³⁾ | None | Built-in |
⁽¹⁾ Scrapy estimate: ~$200–500/month compute (industry estimate, no official source; based on 3–5 VPS instances + job queue) + ~$1,800/month residential proxy for the ~30% of pages that are protected (15,000 pages/day × 500KB × 30 days = 225GB × $8/GB). Proxy, not compute, is the larger cost at this volume.
⁽²⁾ Bright Data: residential proxy at $8/GB PAYG (source: brightdata.com, April 2026). 750GB/month for a mixed site set × $8 = $6,000/month.
⁽³⁾ TinyFish: based on tinyfish.ai/pricing (April 2026) + assumed 10 sec/page (minimum 1 min billing = 0.25 credits/page). Actual costs vary with page load time. See calculation detail in the section above.
The TinyFish number looks higher than Scrapy until you add engineering time. At $150/hour for a developer, 20 hours/month of maintenance is $3,000 — not in the tool budget, but real cost.
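The residential proxy figure in footnote ⁽¹⁾ comes from a simple bandwidth calculation, sketched below so it can be rerun with a different protected-page share or page weight. The function name is hypothetical, and the conversion uses decimal units (1 GB = 1,000,000 KB), which is the convention the footnote's 225GB figure implies.

```python
def monthly_proxy_cost(
    pages_per_day: int,
    protected_share: float,
    page_kb: float,
    usd_per_gb: float,
    days: int = 30,
) -> tuple[float, float]:
    """Return (GB transferred through the proxy per month, USD per month)."""
    protected_pages_per_day = pages_per_day * protected_share
    gb = protected_pages_per_day * page_kb * days / 1_000_000  # KB -> GB, decimal
    return gb, gb * usd_per_gb


# Inputs from footnote (1): 50,000 pages/day, ~30% protected, 500KB per page, $8/GB.
gb, usd = monthly_proxy_cost(50_000, 0.30, 500, 8.0)
print(f"{gb:.0f} GB/month through the proxy, about ${usd:,.0f}/month")
```

Page weight is the input most worth measuring on real targets, because the cost scales linearly with it.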
100,000+ pages/day: enterprise-scale data operations. Google-scale inventory aggregation. A rideshare company collecting millions of pricing variables monthly. Financial services firms monitoring hundreds of regulatory portals in real time. This is not a side project.
At this volume, you're buying infrastructure, not tools. The question is whether you build it or buy it.
Build: A custom distributed scraping stack — Scrapy or custom crawlers running on Kubernetes, Bright Data or a private proxy pool for IP management, a data pipeline for cleaning and delivery. Engineering cost to build: 3–6 months of a senior engineer's time. Ongoing maintenance: a dedicated team. Justified for organizations with highly specific data requirements, existing data engineering capacity, and volume that makes the economics work.
Buy: TinyFish's enterprise tier is designed for this. The platform already runs production workflows at this scale for multiple enterprise customers, so the value proposition is not the per-page cost. It is that you are buying a system that has already been hardened at this volume, with the reliability and compliance requirements enterprise operations need. Pricing is custom at this tier; contact sales for specifics.
What to watch for: At 100,000+ pages/day, the decision isn't really between tools — it's between building and buying. Both have merit depending on your engineering resources and how central web data collection is to your product. The right question isn't "which tool is cheapest per page?" It's "how much of our engineering capacity do we want this to consume?"
Volume alone doesn't determine your tool. Site complexity — how much infrastructure the target requires — is the other axis. This matrix combines both:
| Volume | Static / simple pages | JS-heavy, requires managed browser | Authenticated access (your own accounts) |
|---|---|---|---|
| < 1K pages/day | Free tools (ParseHub, Octoparse) | TinyFish free tier | TinyFish free tier |
| 1K–10K pages/day | Scrapy (self-hosted) or Apify | Apify or TinyFish Starter | TinyFish Starter/Pro |
| 10K–100K pages/day | Scrapy + infra, Apify, or TinyFish Pro | Apify or TinyFish Pro | TinyFish Pro |
| 100K+ pages/day | Custom stack or TinyFish Enterprise | TinyFish Enterprise | TinyFish Enterprise |
The pattern: at low volume on simple sites, almost anything works and the cheapest option wins. As volume or site complexity increases, the tools that don't require ongoing maintenance become progressively more cost-effective when you count engineering time.
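The matrix can also be expressed as a small lookup, which is handy if you want to run a list of projects through it. The cell values below are copied from the table; the function names, the site-type keys, and the handling of exact boundary values (10,000 falls into the higher tier) are assumptions made for this sketch.

```python
RECOMMENDATIONS = {
    "<1K": {
        "static": "Free tools (ParseHub, Octoparse)",
        "js_heavy": "TinyFish free tier",
        "authenticated": "TinyFish free tier",
    },
    "1K-10K": {
        "static": "Scrapy (self-hosted) or Apify",
        "js_heavy": "Apify or TinyFish Starter",
        "authenticated": "TinyFish Starter/Pro",
    },
    "10K-100K": {
        "static": "Scrapy + infra, Apify, or TinyFish Pro",
        "js_heavy": "Apify or TinyFish Pro",
        "authenticated": "TinyFish Pro",
    },
    "100K+": {
        "static": "Custom stack or TinyFish Enterprise",
        "js_heavy": "TinyFish Enterprise",
        "authenticated": "TinyFish Enterprise",
    },
}


def volume_tier(pages_per_day: int) -> str:
    # Assumption: a value exactly on a boundary belongs to the higher tier.
    if pages_per_day < 1_000:
        return "<1K"
    if pages_per_day < 10_000:
        return "1K-10K"
    if pages_per_day < 100_000:
        return "10K-100K"
    return "100K+"


def recommend(pages_per_day: int, site_type: str) -> str:
    """site_type is one of 'static', 'js_heavy', 'authenticated'."""
    return RECOMMENDATIONS[volume_tier(pages_per_day)][site_type]


print(recommend(5_000, "js_heavy"))
print(recommend(250_000, "static"))
```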
Every tool comparison in this category focuses on subscription price. The number that actually determines total cost is:
Total cost = tool subscription + proxy costs + (engineering hours × hourly rate)
Scrapy is free. But if a developer spends 15 hours/month keeping selectors current, that's $2,250/month at $150/hour — more expensive than any managed tool at comparable volume. The teams that make this mistake are the ones who calculate tool cost from the pricing page and engineering time from zero.
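The total-cost formula is simple enough to put in a function and run for each option you are comparing. The sketch below uses figures quoted in this article ($150/hour, 15 or 20 maintenance hours, and the 50,000 pages/day estimates); treating the managed option as zero maintenance hours is an assumption taken from the comparison table, and the function name is hypothetical.

```python
def total_monthly_cost(
    subscription: float,
    proxy: float,
    engineering_hours: float,
    hourly_rate: float = 150.0,
) -> float:
    """Total cost = tool subscription + proxy costs + (engineering hours x hourly rate)."""
    return subscription + proxy + engineering_hours * hourly_rate


# "Free" Scrapy with 15 hours/month of selector maintenance.
scrapy_small = total_monthly_cost(subscription=0, proxy=0, engineering_hours=15)

# 50,000 pages/day: low-end compute ($200) plus residential proxy ($1,800),
# with 20 hours/month of maintenance, versus a managed plan at ~$4,452.
scrapy_large = total_monthly_cost(subscription=200, proxy=1_800, engineering_hours=20)
managed_large = total_monthly_cost(subscription=4_452, proxy=0, engineering_hours=0)

print(f"Scrapy, maintenance only:  ${scrapy_small:,.0f}/month")
print(f"Scrapy at 50K pages/day:   ${scrapy_large:,.0f}/month")
print(f"Managed at 50K pages/day:  ${managed_large:,.0f}/month")
```

The comparison flips depending on the hours figure, so it is worth tracking actual maintenance time for a month before trusting either number.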
The inversion point — where managed infrastructure becomes cheaper than self-hosted — happens somewhere between 5,000 and 20,000 pages/day for most teams, depending on target site complexity and how often sites update their frontend.
If you're not sure where your project falls, start with the TinyFish free tier (500 credits, no credit card). Run it against your actual target site. The results tell you three things at once: whether AI-based extraction handles your target's structure, what your step-per-page ratio looks like for cost projection, and whether there are strict access requirements you didn't know about.
That's a better calibration than any estimate you can make from a pricing page.
How much does web scraping cost?
It depends on volume and tool choice, but the honest answer is that the subscription price is rarely the whole number. At under 1,000 pages/day, free tiers from ParseHub, Octoparse, and TinyFish cover most use cases at zero cost. At 5,000 pages/day, expect $15–100/month depending on whether you need strict access handling. At 50,000 pages/day, total cost including infrastructure and proxy fees typically runs $2,000–5,600/month depending on tool and proxy requirements — and if you're on a self-hosted setup, add engineering maintenance time on top of that. The full formula is: tool subscription + proxy costs + (engineering hours × hourly rate). Teams that only look at the subscription line consistently underestimate real cost by 2–3x.
What counts as a "page" for automation tool pricing?
It depends on the tool. For Scrapy and most traditional scrapers, a page is one HTTP request. For AI web agents like TinyFish, the unit is a "step" — a discrete action (navigate, click, extract). A single page extraction might require 2–5 steps; a multi-step authenticated workflow might require 10–15. Always ask vendors for step-to-page ratios for your specific use case before committing to a plan.
Is Scrapy actually free at high volume?
The software is open source, but the infrastructure isn't free. At 50,000 pages/day you need distributed computing, job queues, monitoring, and proxy pools. A realistic infrastructure cost is $400–800/month before residential proxy fees, which the 50,000 pages/day estimate above puts at roughly $1,800/month for a mixed target set, plus ongoing engineering time. Scrapy is the most cost-efficient option when you have the engineering capacity to run it. It's not free; it's a trade of money for engineering time.
What happens if I exceed my plan's page or step limit?
Each tool handles this differently. Apify charges compute unit overages at the pay-as-you-go rate. TinyFish offers pay-as-you-go at $0.015/credit as an alternative to the monthly plan (Pro plan overages bill at $0.012/credit). Note that 1 credit covers 1 agent step, 4 minutes of browser session, or 15 page fetches, so the effective per-page cost depends on which API you use. Scrapy has no limit; your infrastructure is the ceiling. Plan for overages before you hit them; discovering them during a critical run is a bad time to learn the policy.
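Because one credit buys three different things, the effective per-page price depends on the API. The sketch below converts the credit definitions quoted above into a per-page figure at the $0.015 pay-as-you-go rate. The API labels, the function name, and the defaults (3 steps per page, 1 billed minute per page) are illustrative assumptions, not vendor terminology.

```python
USD_PER_CREDIT = 0.015  # pay-as-you-go rate quoted above


def cost_per_page(api: str, *, steps_per_page: float = 3, billed_minutes: float = 1) -> float:
    """Effective USD per page for each credit unit described above."""
    if api == "agent":        # 1 credit = 1 agent step
        credits = steps_per_page
    elif api == "browser":    # 1 credit = 4 minutes of browser session
        credits = billed_minutes / 4
    elif api == "fetch":      # 1 credit = 15 page fetches
        credits = 1 / 15
    else:
        raise ValueError(f"unknown api: {api!r}")
    return credits * USD_PER_CREDIT


for api in ("agent", "browser", "fetch"):
    print(f"{api:8s} ${cost_per_page(api):.5f} per page")
```

Under these defaults the agent path costs roughly 12 times the browser path and 45 times the fetch path per page, which is why matching the API to the target matters more than the plan tier.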
How do I know if my volume estimate is accurate?
It usually isn't, in the direction of underestimation. The most common mistake: counting target URLs but not accounting for crawl frequency, or not including the pages you need to navigate through to reach the data (pagination, category pages, authentication flows). Add 30–50% to your estimate before selecting a plan tier.
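Applying the 30–50% buffer is a one-line calculation, but it is worth doing explicitly because the buffered figure can land in a different volume tier than the raw one. The function name and the 8,000 pages/day input below are hypothetical.

```python
import math


def buffered_range(raw_daily_pages: int, low_pct: int = 30, high_pct: int = 50) -> tuple[int, int]:
    """Return the raw estimate inflated by the low and high buffer percentages."""
    low = math.ceil(raw_daily_pages * (100 + low_pct) / 100)
    high = math.ceil(raw_daily_pages * (100 + high_pct) / 100)
    return low, high


# Hypothetical raw estimate of 8,000 pages/day.
low, high = buffered_range(8_000)
print(f"Plan for {low:,} to {high:,} pages/day")
```

In this example the raw estimate sits in the 1,000–10,000 tier, while both buffered figures sit above 10,000, so the plan should be chosen from the next tier up.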
📸 IMAGE — Matrix showing web automation tool recommendations by page volume and access requirement level