
Detect Google AdSense on "Tough" Sites with Playwright

· 5 min read
Serhii Hrekov
software engineer, creator, artist, programmer, projects founder

When a plain requests script gets a 403 Forbidden or a Cloudflare "Verify you are human" challenge, it is usually because the website expects signs of a real browser, such as JavaScript execution or mouse movement.

Playwright is a modern browser automation library that drives real browser engines: Chromium, Firefox, and WebKit (the engine behind Safari). Because it runs a full browser, it can get past simple bot detection and sees the page exactly as a user does, which makes it a strong fit for AdSense detection on "tough" sites.


🎭 Why Playwright for AdSense?​

Modern AdSense implementations often use lazy loading: the ads do not appear in the HTML until you scroll down or a script triggers them. Standard scrapers only see the "empty" initial HTML, whereas Playwright lets the page's scripts run before you inspect the DOM.
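To make the difference concrete, here is a minimal sketch using two made-up HTML strings: one as a plain HTTP client might receive it, and one as the DOM might look after scripts have run. The page content, the lazy-loader.js file name, and the has_adsense_marker helper are all hypothetical and only serve as illustration.

```python
# Hypothetical initial HTML: a custom loader script, no AdSense markers yet.
INITIAL_HTML = (
    '<html><head><script src="/static/lazy-loader.js"></script></head>'
    '<body><div id="ad-slot"></div></body></html>'
)

# Hypothetical DOM after the loader ran and injected an ad container.
RENDERED_HTML = INITIAL_HTML.replace(
    '<div id="ad-slot"></div>',
    '<div id="ad-slot"><ins class="adsbygoogle"></ins></div>',
)


def has_adsense_marker(html: str) -> bool:
    """Return True if the 'adsbygoogle' marker appears anywhere in the HTML."""
    return "adsbygoogle" in html.lower()


print("Initial HTML: ", has_adsense_marker(INITIAL_HTML))
print("Rendered HTML:", has_adsense_marker(RENDERED_HTML))
```

A static scraper only ever sees the first string, so it would report no AdSense even though the rendered page carries an ad container.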


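💻 The Implementation

This script launches a headless Chromium instance, navigates to the site, waits for network activity to settle, and then scans the live DOM for AdSense markers. Pages with constant background traffic may never go idle; in that case the 30-second timeout fires and the script prints the error instead of a verdict.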
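The marker scan is ordinary string matching on the rendered HTML, so it can be tried without a browser. The sketch below follows the article's pub- plus 16 digits pattern, with two assumptions added for illustration: word boundaries so that longer digit runs are not partially matched, and de-duplication of repeated IDs. The extract_publisher_ids name and the sample HTML are hypothetical.

```python
import re

# Same shape as the pattern used in the main script, plus word boundaries
# (an added assumption) so a 17-digit run is not matched by its first 16 digits.
PUB_ID_PATTERN = re.compile(r"\bpub-\d{16}\b")


def extract_publisher_ids(html: str) -> list[str]:
    """Return the unique publisher IDs found in the HTML, sorted."""
    return sorted(set(PUB_ID_PATTERN.findall(html)))


# Made-up HTML with one ID repeated and a second, different ID.
sample_html = (
    '<script src="adsbygoogle.js?client=ca-pub-1234567890123456"></script>'
    '<ins class="adsbygoogle" data-ad-client="ca-pub-1234567890123456"></ins>'
    '<ins class="adsbygoogle" data-ad-client="ca-pub-6543210987654321"></ins>'
)

print(extract_publisher_ids(sample_html))
```

Returning every distinct ID, rather than only the first match, helps on pages that embed units from more than one publisher account.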

1. Requirements​

You need the Playwright library and the browser binaries.

pip install playwright
playwright install chromium

2. The Code​

import asyncio
import re

from playwright.async_api import async_playwright


async def detect_adsense_advanced(url):
    async with async_playwright() as p:
        # Launch a real browser (headless=True means no window pops up)
        browser = await p.chromium.launch(headless=True)

        # Use a real User-Agent to look less like a bot
        context = await browser.new_context(
            user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
        )

        page = await context.new_page()
        print(f"🕵️ Navigating to {url}...")

        try:
            # Navigate and wait until the network is idle (scripts finished loading)
            await page.goto(url, wait_until="networkidle", timeout=30000)

            # Scroll down to trigger any lazy-loaded ads
            await page.mouse.wheel(0, 2000)
            await page.wait_for_timeout(2000)  # Wait 2 seconds for ads to pop in

            # 1. Check for the AdSense script in the live DOM
            # We look for <script> tags whose src contains 'adsbygoogle'
            adsense_scripts = await page.locator('script[src*="adsbygoogle"]').count()

            # 2. Check for the AdSense tag/container
            adsense_ins_tags = await page.locator('ins.adsbygoogle').count()

            # 3. Extract the Publisher ID from the rendered page content
            content = await page.content()
            pub_id_match = re.search(r'pub-\d{16}', content)
            pub_id = pub_id_match.group(0) if pub_id_match else "Not Found"

            # --- Results ---
            print("\n📊 Playwright Scan Results:")
            print(f"✅ AdSense Scripts Loaded: {adsense_scripts}")
            print(f"📦 Ad Containers (<ins>): {adsense_ins_tags}")
            print(f"🆔 Publisher ID: {pub_id}")

            # Any one of the three markers counts as a detection
            if adsense_scripts > 0 or adsense_ins_tags > 0 or pub_id != "Not Found":
                print("\n🚀 Verdict: AdSense DETECTED (even with bot protection).")
            else:
                print("\nVerdict: No AdSense found.")

        except Exception as e:
            print(f"❌ Error: {e}")
        finally:
            await browser.close()


# --- Run the Async Function ---
if __name__ == "__main__":
    asyncio.run(detect_adsense_advanced("https://www.example.com"))
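The script above scrolls once by 2000 pixels, which may not reach ads placed further down a long page. As a possible variation, the sketch below scrolls in smaller steps until the bottom is reached or a step limit is hit. The scroll_in_steps helper and its default values are assumptions for illustration; it only relies on page.mouse.wheel, page.wait_for_timeout, and page.evaluate from the Playwright page object.

```python
async def scroll_in_steps(page, step_px=800, max_steps=10, pause_ms=500):
    """Scroll down step by step; stop at the page bottom or after max_steps.

    Returns the number of scroll steps actually performed.
    """
    steps_taken = 0
    for _ in range(max_steps):
        await page.mouse.wheel(0, step_px)      # scroll down by step_px pixels
        await page.wait_for_timeout(pause_ms)   # give lazy loaders time to react
        steps_taken += 1

        # True once the viewport bottom has reached the end of the document
        at_bottom = await page.evaluate(
            "window.innerHeight + window.scrollY >= document.body.scrollHeight"
        )
        if at_bottom:
            break
    return steps_taken
```

Inside detect_adsense_advanced, a call such as `await scroll_in_steps(page)` could stand in for the single wheel call and the fixed two-second wait. The step limit keeps infinite-scroll pages from looping forever.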


πŸ›‘οΈ How to be Even More "Stealthy"​

If the site still blocks you, you can use the playwright-stealth plugin. It hides the "I am a robot" fingerprints that Playwright naturally leaves behind (like the navigator.webdriver property).

Installation:

pip install playwright-stealth

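Usage Snippet (stealth_async is the helper from the 1.x releases; newer releases expose a different interface, so check the README for the version you installed):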

from playwright_stealth import stealth_async

# Inside your async function, after creating the 'page':
await stealth_async(page)
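To see what one such fingerprint patch looks like, here is a minimal hand-rolled sketch that hides only navigator.webdriver by registering an init script on the browser context. This is not how playwright-stealth is implemented and covers far fewer signals; the hide_webdriver_flag and webdriver_flag helper names are hypothetical, while context.add_init_script and page.evaluate are standard Playwright calls.

```python
# JavaScript that runs before any page script and masks the automation flag.
HIDE_WEBDRIVER_JS = (
    "Object.defineProperty(navigator, 'webdriver', { get: () => undefined });"
)


async def hide_webdriver_flag(context):
    """Register the masking script for every page opened in this context."""
    await context.add_init_script(HIDE_WEBDRIVER_JS)


async def webdriver_flag(page):
    """Read navigator.webdriver from the page, to check whether the patch took effect."""
    return await page.evaluate("navigator.webdriver")
```

Call `await hide_webdriver_flag(context)` right after creating the context and before `context.new_page()`. Sites that use more advanced detection can still notice other signals, which is where a maintained plugin is likely to do better than a single patch.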


📊 Comparison: Requests vs. Playwright

| Feature | requests + BS4 | Playwright |
| --- | --- | --- |
| Speed | ⚡ Extremely Fast | 🐢 Slower (loads images/JS) |
| JS Rendering | ❌ No | ✅ Yes |
| Lazy Loading | ❌ Fails | ✅ Works |
| Bot Detection | 🛑 Easy to Block | 🛡️ Harder to Block |
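Given this trade-off, one possible workflow is to try the fast static request first and only start a browser when that result looks blocked or empty. The sketch below encodes that decision as a pure function. The needs_browser_scan name and the exact rules are assumptions for illustration, based on the 403 status and the "Verify you are human" challenge text mentioned at the start of this article.

```python
# Signals that a plain HTTP response was blocked rather than served normally.
BLOCK_STATUS_CODES = {403}
CHALLENGE_PHRASES = ("verify you are human",)


def needs_browser_scan(status_code: int, body: str) -> bool:
    """Decide whether a static fetch is inconclusive and Playwright should take over."""
    if status_code in BLOCK_STATUS_CODES:
        return True  # request was rejected outright

    lowered = body.lower()
    if any(phrase in lowered for phrase in CHALLENGE_PHRASES):
        return True  # got a challenge page instead of the real content

    # No marker in the static HTML: ads may still be lazy-loaded, so escalate.
    return "adsbygoogle" not in lowered


print(needs_browser_scan(403, ""))
print(needs_browser_scan(200, '<ins class="adsbygoogle"></ins>'))
```

With this rule, the slower browser run is reserved for pages where the static HTML alone cannot settle the question.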

