Detect Google AdSense on "Tough" Sites with Playwright
When standard requests scripts fail with a 403 Forbidden or a Cloudflare "Verify you are human" challenge, it's usually because the website is looking for real browser behavior (like rendering JavaScript or moving a mouse).
Playwright is a browser automation library that drives real browser engines: Chromium, Firefox, and WebKit (the engine behind Safari). Because the page runs in an actual browser, it gets past many simple bot checks and sees the same rendered DOM a visitor sees, which makes it well suited to AdSense detection on "tough" sites.
Why Playwright for AdSense?
Modern AdSense implementations often use "lazy loading": the ads don't appear in the HTML until you scroll down or a script triggers them. Standard scrapers only see the "empty" initial HTML, whereas Playwright executes the page's scripts and exposes the resulting DOM.
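To make the difference concrete, here is a minimal sketch that runs the same marker checks over two HTML strings. Both sample documents, the regular expressions, and the function name `find_adsense_markers` are illustrative assumptions, not output from a real site.

```python
import re

# Hypothetical marker patterns; adjust them to the signals you care about.
LOADER_SCRIPT = re.compile(r'<script[^>]+src="[^"]*adsbygoogle[^"]*"', re.IGNORECASE)
INS_TAG = re.compile(r'<ins[^>]+class="[^"]*\badsbygoogle\b[^"]*"', re.IGNORECASE)
PUBLISHER_ID = re.compile(r"pub-\d{16}")


def find_adsense_markers(html: str) -> dict:
    """Return counts of AdSense markers found in an HTML string."""
    pub = PUBLISHER_ID.search(html)
    return {
        "scripts": len(LOADER_SCRIPT.findall(html)),
        "ins_tags": len(INS_TAG.findall(html)),
        "publisher_id": pub.group(0) if pub else None,
    }


# Made-up sample: what a plain HTTP client might receive from a lazy-loading page.
INITIAL_HTML = (
    '<html><body><div id="ad-slot"></div>'
    '<script src="/js/lazy-ads.js"></script></body></html>'
)

# Made-up sample: the same page after its scripts have injected the ad code.
RENDERED_HTML = (
    '<html><body><div id="ad-slot">'
    '<ins class="adsbygoogle" data-ad-client="ca-pub-1234567890123456"></ins>'
    "</div>"
    '<script async src="https://pagead2.googlesyndication.com/pagead/js/'
    'adsbygoogle.js?client=ca-pub-1234567890123456"></script>'
    "</body></html>"
)

if __name__ == "__main__":
    print(find_adsense_markers(INITIAL_HTML))
    print(find_adsense_markers(RENDERED_HTML))
```

The initial HTML yields no markers, while the rendered version yields one script, one container, and a publisher ID. That gap is what a browser-based scan is meant to close.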
The Implementation
This script launches a Headless Chromium instance, navigates to the site, waits for network activity to settle, and then scans the live DOM for AdSense markers.
1. Requirements
You need the Playwright library and the browser binaries.
pip install playwright
playwright install chromium
2. The Code
import asyncio
import re

from playwright.async_api import async_playwright


async def detect_adsense_advanced(url):
    async with async_playwright() as p:
        # Launch a real browser (headless=True means no window pops up)
        browser = await p.chromium.launch(headless=True)
        # Use a real User-Agent to look less like a bot
        context = await browser.new_context(
            user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
        )
        page = await context.new_page()
        print(f"Navigating to {url}...")
        try:
            # Navigate and wait until the network is idle (scripts finished loading)
            await page.goto(url, wait_until="networkidle", timeout=30000)
            # Scroll down to trigger any lazy-loaded ads
            await page.mouse.wheel(0, 2000)
            await page.wait_for_timeout(2000)  # Wait 2 seconds for ads to pop in

            # 1. Check for the AdSense script in the live DOM
            # We look for script tags whose src contains 'adsbygoogle'
            adsense_scripts = await page.locator('script[src*="adsbygoogle"]').count()
            # 2. Check for the AdSense tag/container
            adsense_ins_tags = await page.locator('ins.adsbygoogle').count()
            # 3. Extract the Publisher ID from the page content
            content = await page.content()
            pub_id_match = re.search(r'pub-\d{16}', content)
            pub_id = pub_id_match.group(0) if pub_id_match else "Not Found"

            # --- Results ---
            print("\nPlaywright Scan Results:")
            print(f"AdSense Scripts Loaded: {adsense_scripts}")
            print(f"Ad Containers (<ins>): {adsense_ins_tags}")
            print(f"Publisher ID: {pub_id}")
            if adsense_scripts > 0 or pub_id != "Not Found":
                print("\nVerdict: AdSense DETECTED (even with bot protection).")
            else:
                print("\nVerdict: No AdSense found.")
        except Exception as e:
            print(f"Error: {e}")
        finally:
            await browser.close()


# --- Run the Async Function ---
if __name__ == "__main__":
    asyncio.run(detect_adsense_advanced("https://www.example.com"))
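A single 2000-pixel wheel event, as used in the script above, may not reach ad slots further down a long page. One possible variation is to scroll in several smaller steps with a short pause after each. The helper name `scroll_in_steps` and its default values are assumptions chosen for illustration; it only relies on the `page.mouse.wheel` and `page.wait_for_timeout` calls already used in the script.

```python
async def scroll_in_steps(page, steps: int = 5, delta_y: int = 800, pause_ms: int = 500) -> int:
    """Scroll down in several wheel events, pausing after each one.

    `page` is expected to be a Playwright async Page (or any object with the
    same `mouse.wheel` and `wait_for_timeout` coroutines).
    Returns the total number of pixels requested.
    """
    total = 0
    for _ in range(steps):
        # Each wheel event gives lazy-loading scripts a chance to react.
        await page.mouse.wheel(0, delta_y)
        await page.wait_for_timeout(pause_ms)
        total += delta_y
    return total
```

Inside `detect_adsense_advanced`, a call such as `await scroll_in_steps(page)` could stand in for the single wheel event and fixed wait. Whether more steps find more ads depends on how the site triggers its loading.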
How to be Even More "Stealthy"
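If the site still blocks you, you can use the playwright-stealth plugin. It hides the "I am a robot" fingerprints that Playwright naturally leaves behind (like the navigator.webdriver property). The stealth_async helper used here comes from the package's 1.x releases; later releases have changed the API, so check the README of the version you install.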
Installation:
pip install playwright-stealth
Usage Snippet:
from playwright_stealth import stealth_async
# Inside your async function, after creating the 'page':
await stealth_async(page)
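To see whether the plugin changed anything, you can read a few fingerprint values from inside the page before and after applying it. The sketch below is an assumption-laden example: the function names, the chosen signals, and the `looks_automated` heuristic are illustrative, and real bot-detection services check far more than this.

```python
# JavaScript evaluated in the page; collects a few commonly checked signals.
FINGERPRINT_JS = """() => ({
    webdriver: navigator.webdriver,
    userAgent: navigator.userAgent,
    pluginCount: navigator.plugins.length
})"""


async def read_fingerprint(page) -> dict:
    """Return the signals above from a Playwright async Page."""
    return await page.evaluate(FINGERPRINT_JS)


def looks_automated(fingerprint: dict) -> bool:
    """Heuristic (an assumption, not a standard): flag obvious automation hints."""
    webdriver_flag = bool(fingerprint.get("webdriver"))
    headless_ua = "HeadlessChrome" in (fingerprint.get("userAgent") or "")
    return webdriver_flag or headless_ua
```

Calling `await read_fingerprint(page)` once before and once after the stealth call gives a rough comparison. A clean result here does not guarantee that a site will let the request through.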
Comparison: Requests vs. Playwright
| Feature | requests + BS4 | Playwright |
|---|---|---|
| Speed | Extremely Fast | Slower (loads images/JS) |
| JS Rendering | No | Yes |
| Lazy Loading | Fails | Works |
| Bot Detection | Easy to Block | Harder to Block |
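Given this trade-off, one option is to try the fast static fetch first and fall back to a rendered fetch only when it is blocked or finds nothing. The sketch below takes both fetchers as parameters, so it makes no claim about how they are implemented. `detect_with_fallback`, `fetch_static`, and `fetch_rendered` are hypothetical names, and the hint pattern is a simplification of the checks in the script above.

```python
import re
from typing import Callable, Optional, Tuple

# Simplified hint: either the 'adsbygoogle' marker or a publisher ID.
ADSENSE_HINT = re.compile(r"adsbygoogle|pub-\d{16}")


def detect_with_fallback(
    url: str,
    fetch_static: Callable[[str], Optional[str]],
    fetch_rendered: Callable[[str], Optional[str]],
) -> Tuple[bool, str]:
    """Try the cheap static fetch first; render only when it is inconclusive.

    Each fetcher returns the page HTML, or None/"" when blocked or failed.
    Returns (found, method), where method names the last fetcher used.
    """
    html = fetch_static(url)
    if html and ADSENSE_HINT.search(html):
        return True, "static"

    # Static fetch was blocked or showed no markers: pay for a real browser.
    html = fetch_rendered(url)
    if html and ADSENSE_HINT.search(html):
        return True, "rendered"
    return False, "rendered"
```

In practice `fetch_static` could wrap a `requests.get` call that returns `None` on a 403, and `fetch_rendered` could wrap a Playwright routine that returns `page.content()`. Both wrappers are left out here because their details depend on your setup.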
