Technology

Anti-Bot Protection for Web Agents: How TinyFish Gets Past the Front Door

TinyFishie·TinyFish Observer·Apr 9, 2026·11 min read

Anti-bot protection is a multi-layer detection system that analyzes IP reputation, TLS fingerprints, HTTP headers, browser fingerprints, behavioral patterns, and CAPTCHA challenges to distinguish bots from humans.

Your scraper works on localhost. You deploy it, run 200 requests, and every page throws a Cloudflare challenge. You add residential proxies — still blocked. You rotate user agents — still blocked. You patch navigator.webdriver — blocked again, but now for a different reason.

This is the current state of anti-bot protection in 2026. Sites don't check one signal. They check six or more, simultaneously. Fixing them one at a time is a losing game — beating it requires passing all layers at once.

How Modern Bot Detection Works: The Six Layers

Before talking about solutions, it helps to understand what web scraping bot detection actually looks like in 2026. Modern anti-bot systems — Cloudflare, PerimeterX (now HUMAN), DataDome, Akamai — don't rely on a single check. They layer detections so that passing one while failing another still gets you flagged. Understanding these layers is essential whether you're trying to bypass Cloudflare for web scraping or handle any other protection system.

Layer 1: IP reputation. Every request comes from an IP address, and anti-bot systems maintain reputation databases for them. Datacenter IPs (AWS, GCP, Azure) are flagged immediately — they're the obvious choice for automated traffic. Residential IPs carry more trust because they're associated with real users. Mobile IPs carry even more, because carrier-grade NAT shares each address across thousands of real users, making blanket blocks too costly.

Layer 2: TLS fingerprint. Before your HTTP request even reaches the server, your TLS handshake reveals what client you're using. Python's requests library, Go's net/http, Node's axios — each has a distinct TLS signature that looks nothing like a real browser. Cloudflare checks this in milliseconds.

Layer 3: HTTP headers and protocol. Real browsers send HTTP/2 with specific frame ordering and header patterns. Automation tools often default to HTTP/1.1 or send headers in the wrong order. Systems like Cloudflare flag these mismatches before you've even loaded a page.
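As an illustration of the header-ordering check, here's a toy detector that counts how far a client's header order deviates from a browser-like ordering. The header lists are simplified examples for the sketch, not actual current browser signatures:

```python
# Illustrative order a Chrome-like browser might send headers in.
# Simplified for the sketch; not a real current signature.
BROWSER_HEADER_ORDER = [
    "host", "user-agent", "accept", "accept-language", "accept-encoding",
]

def order_mismatch_score(received_headers):
    """Count adjacent inversions relative to the expected browser order.
    Each inversion means two headers arrived in swapped order."""
    positions = [
        BROWSER_HEADER_ORDER.index(h)
        for h in received_headers
        if h in BROWSER_HEADER_ORDER
    ]
    return sum(1 for a, b in zip(positions, positions[1:]) if a > b)

# A browser-like client matches the expected order exactly...
print(order_mismatch_score(
    ["host", "user-agent", "accept", "accept-language", "accept-encoding"]))  # 0

# ...while a script-default ordering produces inversions and raises suspicion.
print(order_mismatch_score(
    ["user-agent", "accept-encoding", "accept", "host"]))  # 2
```

Real systems compare against per-browser-version signatures and combine this with the HTTP/2 frame-ordering signal, but the shape of the check is the same: order is identity.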

Layer 4: Browser fingerprint. Headless Chrome has detectable properties: navigator.webdriver=true, missing plugins, inconsistent screen dimensions, no GPU renderer. Anti-bot systems check hundreds of these attributes and compare them against known browser signatures. A mismatch in any one of them raises your bot score.

Layer 5: Behavioral analysis. Real users move their mouse in non-linear paths, scroll at varying speeds, hesitate before clicking. Automated tools produce perfectly straight mouse paths or instant clicks. Modern systems use behavioral biometrics to detect this.

Layer 6: CAPTCHAs. When detection signals are ambiguous, the system throws a challenge. reCAPTCHA v3 doesn't even show a visible challenge — it scores your behavior silently and blocks you if the score is too low. Cloudflare Turnstile uses device fingerprinting and cryptographic challenges behind the scenes.

[Diagram: anti-bot detection layers, including fingerprinting, CAPTCHA, IP filtering, and behavioral analysis]

The critical insight: these layers compound. Fixing your IP while leaving your TLS fingerprint exposed still gets you blocked. Building a complete anti-bot stack means handling all six layers simultaneously.
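A back-of-envelope calculation shows why compounding is so punishing. The per-layer number below is illustrative, not a measured rate:

```python
# If each of six independent layers passes your traffic with probability p,
# surviving all of them happens with probability p**6. Even strong
# per-layer evasion collapses quickly. The 90% figure is illustrative.
per_layer_pass = 0.90
layers = 6
overall = per_layer_pass ** layers
print(f"{overall:.2%}")  # ~53% — beating each layer 9 times in 10 still fails almost half the time
```

The assumption of independence is generous to the bot: in practice a failure at one layer (say, a datacenter IP) often raises the scrutiny applied at the others.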

The DIY Anti-Bot Stack: What It Actually Takes

If you're building anti-bot handling yourself, here's what a production-grade stack looks like:

Residential proxy provider. Datacenter IPs are dead for serious scraping. Residential proxies cost $3–15/GB depending on provider and geography. You need rotation logic, failover handling, and geographic targeting. Budget: $200–2,000+/month for production volume.

Fortified browser. Standard headless Chrome leaks automation signals everywhere. You need patches for WebDriver detection, plugin simulation, canvas fingerprint randomization, WebGL rendering consistency, and HTTP/2 frame ordering. Libraries like playwright-extra with stealth plugins help, but they're in a constant arms race with detection systems.

TLS fingerprint matching. Tools like curl-impersonate replicate Chrome's exact TLS handshake. But Cloudflare has adapted to detect its specific patterns. You need to stay current with each browser version's TLS signature.

Behavioral simulation. For heavily protected sites, you need mouse movement simulation with natural acceleration/deceleration curves, variable scroll speeds, and realistic timing between actions. Basic randomization doesn't work — modern systems use Fitts' Law models to detect synthetic movement.
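To make the straight-path problem concrete, here's a minimal sketch contrasting a constant-speed linear path with one using sinusoidal easing. Real behavioral simulation layers on noise, overshoot, and Fitts'-Law-consistent timing, all of which this toy version omits:

```python
import math

def linear_path(x0, y0, x1, y1, steps=20):
    """Straight line, constant speed — the pattern detectors flag."""
    return [
        (x0 + (x1 - x0) * i / steps, y0 + (y1 - y0) * i / steps)
        for i in range(steps + 1)
    ]

def eased_path(x0, y0, x1, y1, steps=20):
    """Same endpoints with sinusoidal ease-in/ease-out: slow start,
    fast middle, slow arrival — closer to human acceleration curves."""
    pts = []
    for i in range(steps + 1):
        t = i / steps
        e = (1 - math.cos(math.pi * t)) / 2  # maps 0..1 onto an S-curve
        pts.append((x0 + (x1 - x0) * e, y0 + (y1 - y0) * e))
    return pts

def step_sizes(path):
    """Distance covered per tick; constant spacing is a bot tell."""
    return [math.dist(a, b) for a, b in zip(path, path[1:])]

# The eased path varies its speed; the linear one never does.
print(max(step_sizes(eased_path(0, 0, 100, 0))))   # fast mid-path step
print(max(step_sizes(linear_path(0, 0, 100, 0))))  # every step identical: 5.0
```

Easing alone is not enough to fool a modern behavioral model, but it illustrates the property being measured: humans accelerate and decelerate; naive automation does not.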

CAPTCHA solving service. When prevention fails, you need a fallback. Services like 2Captcha ($1–3/1K solves) and CapSolver ($0.40–0.90/1K) either use human workers or AI models. Token-based solvers add 15–30 seconds per solve. Browser-integrated solvers are faster but cost more.

Retry and adaptation logic. When a request fails, your system needs to diagnose why (IP burned? fingerprint detected? behavioral flag?) and adapt — switch proxy, rotate fingerprint, change timing pattern. This is the orchestration layer most teams underestimate.
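The orchestration loop can be sketched as follows. Everything here is hypothetical scaffolding — the function names, the block-type labels, and the string-matching diagnosis; a production system would classify blocks from status codes, challenge-page markup, and response headers:

```python
# Hypothetical adaptation strategies keyed by diagnosed block type.
STRATEGIES = {
    "ip_burned": "rotate_proxy",
    "fingerprint_detected": "rotate_fingerprint",
    "behavioral_flag": "slow_down",
}

def diagnose(response_text):
    """Toy classifier. A real one inspects status codes, challenge
    markup, and headers to tell block types apart."""
    if "captcha" in response_text:
        return "behavioral_flag"
    if "access denied" in response_text:
        return "ip_burned"
    return "fingerprint_detected"

def fetch_with_adaptation(fetch, url, max_attempts=4):
    """Retry a fetch, recording an adaptation after each diagnosed block."""
    applied = []
    for _ in range(max_attempts):
        body = fetch(url)
        if "blocked" not in body:
            return body, applied
        # In a real system this step would actually switch the proxy,
        # rotate the fingerprint, or add delays before retrying.
        applied.append(STRATEGIES[diagnose(body)])
    raise RuntimeError(f"gave up after {max_attempts} attempts: {applied}")

# Simulated target that relents after two blocked attempts:
state = {"calls": 0}
def fake_fetch(url):
    state["calls"] += 1
    return "blocked: access denied" if state["calls"] < 3 else "<html>ok</html>"

body, actions = fetch_with_adaptation(fake_fetch, "https://example.com")
print(body, actions)  # <html>ok</html> ['rotate_proxy', 'rotate_proxy']
```

The hard part in production is the `diagnose` step: misclassifying a fingerprint block as an IP block burns proxies without fixing anything, which is why this layer is so often underestimated.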

Total cost for a production DIY stack: $500–5,000/month in services alone, plus ongoing engineering time to keep it working as detection systems evolve. The real cost isn't the proxies — it's the engineer who maintains the stack.

When building your own makes sense: If your team needs full control over every component — for compliance auditing, custom fingerprint logic, or integration with existing infrastructure — the DIY path is the right one. It's also more cost-effective at very high volume (100K+ requests/month) where per-step pricing exceeds the fixed cost of maintaining your own stack. The key question is whether you have the dedicated engineering bandwidth to keep it running as detection systems evolve.

How TinyFish Handles It

TinyFish takes a different approach. Instead of exposing anti-bot components for you to assemble, it handles all six detection layers at the infrastructure level.

From your side, the entire configuration is one parameter:

curl -N -X POST https://agent.tinyfish.ai/v1/automation/run-sse \
  -H "X-API-Key: $TINYFISH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://protected-site.com",
    "goal": "Extract all product names and prices as JSON",
    "browser_profile": "stealth",
    "proxy_config": {
      "enabled": true,
      "country_code": "US"
    }
  }'

browser_profile: "stealth" activates the full anti-bot stack. proxy_config routes through residential proxies in a specific country. That's it.
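For reference, the same call sketched in Python using the third-party requests library. The exact shape of the SSE event payloads is an assumption here, so the parser simply decodes `data:` lines and skips everything else:

```python
import json

def parse_sse_events(raw_stream_lines):
    """Collect JSON payloads from the `data:` lines of an SSE stream.
    The event shape is an assumption for illustration."""
    events = []
    for line in raw_stream_lines:
        if line.startswith("data:"):
            events.append(json.loads(line[len("data:"):].strip()))
    return events

def run_stealth_task(api_key, url, goal):
    """Sketch of the streaming call; requires `pip install requests`."""
    import requests  # third-party; only needed for the live call
    resp = requests.post(
        "https://agent.tinyfish.ai/v1/automation/run-sse",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        json={
            "url": url,
            "goal": goal,
            "browser_profile": "stealth",
            "proxy_config": {"enabled": True, "country_code": "US"},
        },
        stream=True,
    )
    return parse_sse_events(line.decode() for line in resp.iter_lines() if line)

# Parser behavior on a sample stream (comment lines are ignored):
sample = ['data: {"step": 1, "status": "running"}', ': keep-alive', 'data: {"status": "done"}']
print(parse_sse_events(sample))  # [{'step': 1, 'status': 'running'}, {'status': 'done'}]
```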

What happens behind the scenes: TinyFish runs a real Chromium-based browser session with a native stealth layer that handles fingerprint consistency, proxy rotation, and detection evasion automatically. If a request gets blocked, the system detects the block and auto-reconfigures — switching proxy, adjusting session parameters — without any input from you.

This auto-reconfiguration matters. In the Mind2Web benchmark, Task #197 on kaggle.com initially failed on an anti-bot block. On a subsequent run, TinyFish automatically reconfigured and passed Cloudflare on its own. You can watch the full execution trace — every step is public.

The difference isn't just having anti-bot tooling. It's having a system that detects blocks and adapts in real time without human input.

[Image: comparison of standard web scraping vs TinyFish's web agent infrastructure, showing higher success rates]

What TinyFish Can and Can't Handle

Honesty about limitations builds more trust than claiming universal coverage.

What works well

The following success rates are based on internal production workload analysis across enterprise customer deployments (Q4 2025 – Q1 2026). Ranges reflect variation across different site configurations within each category:

| Site Category | Examples | Success Rate |
| --- | --- | --- |
| Global e-commerce | Amazon, eBay | 95–100% |
| European electronics retail | MediaMarkt (DE/AT/PL/ES/IT) | 91–95% |
| Professional platforms | LinkedIn, GitHub | 90–100% |
| Regional e-commerce | Otto, Alternate, Coolblue | 88–100% |
| Specialty markets | Chrono24, MercadoLibre | 90–100% |
| Video platforms | YouTube, Bilibili | 87–99% |
| Regional real estate | 99.co, EdgeProp (Singapore) | 85–100% |

For independent verification of TinyFish's web automation accuracy, see the Mind2Web benchmark results — 90% across 136 live websites, all 300 execution traces published publicly.

Cloudflare-protected sites, PerimeterX (HUMAN) systems, and most standard anti-bot configurations are handled automatically in stealth mode.

What has limitations

DataDome and hCaptcha. These are among the most aggressive protection systems. TinyFish can get through in some configurations, but success rates are lower and less consistent than with Cloudflare or PerimeterX. If your target site uses DataDome, test with TinyFish's free tier first. If success rates don't meet your threshold, consider pairing TinyFish with a dedicated CAPTCHA solving service like CapSolver or 2Captcha for those specific sites, or evaluate Bright Data's proxy network, which offers one of the largest residential IP pools for DataDome-heavy targets.

Full hard blocks. Some sites implement IP-level blocking that cannot be bypassed by any browser automation tool. If a site has decided to block all automated access, no amount of fingerprint sophistication will help.

CAPTCHAs requiring human solving. TinyFish's stealth layer is designed to prevent CAPTCHAs from being triggered in the first place. When prevention works, CAPTCHAs never appear. When a CAPTCHA does appear on a heavily protected site, the current system has limited ability to solve it automatically. This is the layer TinyFish is investing the most in right now.

Rate-sensitive sites. Sites that track request frequency over time (not just per-session) may flag even legitimate-looking traffic if volume is too high. For these sites, adding pacing to your goal description helps: "Wait 3 seconds between each action".

When to use stealth vs lite mode

TinyFish offers two browser profiles:

  • stealth — Full anti-bot stack active. Use for any site with Cloudflare, PerimeterX, or visible bot protection. Slightly slower due to additional handling.
  • lite — Faster execution, minimal anti-bot handling. Use for sites without protection, internal tools, or public APIs. Lower cost per step because tasks complete faster.

Default to stealth for any production workflow against external sites. Switch to lite only when you've confirmed the target site has no bot protection.

Prevention vs Solving: The Economics

The scraping industry has two schools of thought on CAPTCHAs: solve them or prevent them. The economics strongly favor prevention.

Solving CAPTCHAs: $1–3 per 1,000 solves via token-based services like 2Captcha. Each solve adds 15–30 seconds of latency. At 10,000 requests/day with a 20% CAPTCHA trigger rate, that's $2–6/day in solving costs plus 8–16 hours of cumulative latency.
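The arithmetic behind those numbers, spelled out:

```python
requests_per_day = 10_000
captcha_rate = 0.20                       # 20% of requests trigger a challenge
solves = requests_per_day * captcha_rate  # 2,000 solves/day

# Token-based solver pricing: $1-$3 per 1,000 solves.
cost_low = solves / 1000 * 1
cost_high = solves / 1000 * 3

# Each solve adds 15-30 seconds of latency; totals in hours.
latency_low = solves * 15 / 3600
latency_high = solves * 30 / 3600

print(f"${cost_low:.0f}-${cost_high:.0f}/day, "
      f"{latency_low:.1f}-{latency_high:.1f} hours of cumulative latency")
```

The dollar cost is modest; the latency is not. At that volume the solving queue alone consumes a third to two-thirds of a working day.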

Preventing CAPTCHAs: Make your traffic look human enough that CAPTCHAs never appear. Zero cost per prevented CAPTCHA. Zero latency added. The investment is in the stealth infrastructure, not per-solve fees.

TinyFish's approach is prevention-first. The stealth layer is designed to keep your bot score low enough that challenges are never triggered. When that works — and across most sites in the success rate table above, it does — you get faster execution and lower costs than any solve-based approach.

For sites where prevention isn't enough, dedicated CAPTCHA solving services can work alongside TinyFish. You'd handle the solving logic in your application layer and feed the results back into your workflow.

The Assembly Cost

Here's a side-by-side of what each approach costs for a team running 10,000 requests/month against Cloudflare-protected sites:

| Component | DIY Stack | TinyFish |
| --- | --- | --- |
| Residential proxies | $200–500/mo | Included |
| Browser infrastructure | $50–200/mo (cloud VMs) | Included |
| Stealth browser library | Free (OSS) + maintenance time | Included |
| CAPTCHA solver (fallback) | $10–30/mo | Prevention-first approach |
| LLM for agent reasoning | $50–200/mo | Included |
| Engineering maintenance | 10–20 hrs/mo ongoing | Zero |
| Full control & auditability | ✅ Every component inspectable | Managed — limited visibility into infra |
| Cost ceiling at very high volume | Predictable — no per-step billing | Per-step cost scales linearly |
| Total estimated cost | $310–930/mo + eng time | $150/mo (Pro plan, 16,500 steps) |

Cost estimates based on market rates for residential proxy providers (Bright Data, Oxylabs, IPRoyal), cloud compute (AWS/GCP on-demand), and CAPTCHA solving services (2Captcha, CapSolver) as of Q1 2026. Ranges reflect variation by provider and volume tier.

The Pro plan includes 16,500 steps/month — browser execution, residential proxy, LLM inference, and anti-bot handling all bundled at $0.012/step on overage. No separate line items.
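A rough break-even sketch under the figures above, assuming the DIY range holds and overage stays at $0.012/step:

```python
def tinyfish_monthly(steps, base=150, included=16_500, overage=0.012):
    """Pro plan cost: flat fee plus per-step overage past the bundle."""
    return base + max(0, steps - included) * overage

def breakeven_steps(diy_monthly, base=150, included=16_500, overage=0.012):
    """Steps/month at which per-step billing matches a fixed DIY bill."""
    if diy_monthly <= base:
        return included
    return included + (diy_monthly - base) / overage

print(tinyfish_monthly(16_500))      # 150.0 — fully within the bundle
print(round(breakeven_steps(930)))   # 81500 — vs the high end of the DIY range
print(round(breakeven_steps(310)))   # 29833 — vs the low end
```

In other words, against a $930/mo DIY stack the per-step model stays cheaper until roughly 80K steps/month — and this ignores the engineering hours, which the table can't price into a single line.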

A fair assessment: if your team already has a working DIY stack and the engineering resources to maintain it, the cost math may favor staying on it — especially at very high volume where per-step pricing adds up. The DIY path gives you full control over every component, which matters for compliance, auditing, and edge-case customization.

Where TinyFish wins is for teams that don't have (or don't want to maintain) that stack. A DIY anti-bot system is a living system — detection methods evolve, browser patches need updates, proxy pools need rotation. Someone on your team is maintaining this. TinyFish moves that burden to infrastructure you don't manage.

Try It Against Your Hardest Site

The best way to evaluate is to test against the site that's actually giving you trouble.

500 free steps. No credit card. Set browser_profile: "stealth", point it at your target, and see what comes back.

👉 Start free on TinyFish

FAQ

What protection systems does TinyFish handle?

TinyFish's stealth mode handles Cloudflare (including Turnstile), PerimeterX (HUMAN), and most standard anti-bot configurations automatically. Success rates range from 85–100% depending on site category and protection aggressiveness. DataDome and hCaptcha are handled in some configurations but with lower consistency. Full hard blocks at the IP level cannot be bypassed by any tool.

Do I need to configure proxies separately?

No. Residential proxy rotation is included in every TinyFish plan at no extra cost ($0/GB). Add proxy_config: { enabled: true, country_code: "US" } to your request to route through a specific country. Supported countries: US, GB, CA, DE, FR, JP, AU.

What happens when TinyFish gets blocked?

The system detects blocks automatically and attempts to reconfigure — switching proxy, adjusting session parameters — without your input. If reconfiguration succeeds, the task continues. If it fails, the run completes with a failure status that you can inspect via the streaming URL, which includes screenshots and execution logs for every step.

How does stealth mode affect speed?

Stealth mode is slightly slower than lite mode because of the additional anti-bot handling. Simple extractions in stealth typically take 10–30 seconds. Multi-step workflows take 30–90 seconds depending on complexity. For sites without bot protection, use browser_profile: "lite" for faster execution.

Can I use TinyFish alongside a CAPTCHA solving service?

Yes. TinyFish's approach is prevention-first — the stealth layer keeps CAPTCHAs from being triggered. For the minority of cases where a CAPTCHA still appears, you can integrate a third-party solving service (2Captcha, CapSolver) in your application layer and feed the token back into your workflow.

How does TinyFish's anti-bot compare to Browserbase or Firecrawl?

Browserbase provides cloud browsers and relies on JavaScript injection for stealth — you build and maintain the anti-bot logic via Stagehand or your own code. Firecrawl handles basic anti-bot through its rendering engine, but independent testing shows lower success rates on heavily protected sites. TinyFish handles anti-bot at the infrastructure level — native stealth layer, residential proxy rotation, and auto-reconfiguration — all activated with a single parameter. See our TinyFish vs Browserbase and TinyFish vs Firecrawl comparisons for detailed breakdowns.

Related Reading

  • Why AI Agents Need a Unified Web Infrastructure
  • TinyFish vs Browserbase: Cold Start, Pricing, and Real-World Performance
  • TinyFish vs Firecrawl: When Extraction Needs More Than a Crawl Endpoint