HTTPX + CaptchaAI Integration

HTTPX is a modern Python HTTP client with async and HTTP/2 support. This guide shows how to use it with CaptchaAI to solve CAPTCHAs from both synchronous and asynchronous code.

Requirements

Requirement	Details
Python	3.8+
httpx	0.24+
CaptchaAI API key	Get one from CaptchaAI

pip install httpx

Synchronous client

import httpx
import time
import os

class CaptchaAISync:
    def __init__(self, api_key):
        self.api_key = api_key
        self.base_url = "https://ocr.captchaai.com"
        self.client = httpx.Client(timeout=30)

    def solve(self, params, timeout=300):
        # Copy so the caller's dict is not mutated and the key does not leak into it
        params = {**params, "key": self.api_key}

        # Submit
        resp = self.client.get(f"{self.base_url}/in.php", params=params)
        text = resp.text

        if not text.startswith("OK|"):
            raise Exception(f"Submit failed: {text}")

        task_id = text.split("|")[1]

        # Poll
        # Use a monotonic clock so system clock changes cannot distort the deadline
        deadline = time.monotonic() + timeout
        poll_params = {"key": self.api_key, "action": "get", "id": task_id}

        while time.monotonic() < deadline:
            time.sleep(5)
            result = self.client.get(
                f"{self.base_url}/res.php", params=poll_params
            )

            if result.text == "CAPCHA_NOT_READY":
                continue
            if result.text.startswith("OK|"):
                return result.text.split("|", 1)[1]
            raise Exception(f"Solve failed: {result.text}")

        raise TimeoutError(f"Task {task_id} timed out")

    def get_balance(self):
        resp = self.client.get(f"{self.base_url}/res.php", params={
            "key": self.api_key, "action": "getbalance"
        })
        return float(resp.text)

    def close(self):
        self.client.close()

# Usage
solver = CaptchaAISync(os.environ["CAPTCHAAI_API_KEY"])

token = solver.solve({
    "method": "userrecaptcha",
    "googlekey": "6Le-wvkS...",
    "pageurl": "https://example.com",
})
print(f"Token: {token[:50]}...")
solver.close()
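
The submit and poll steps above read the same plain-text reply format: OK|<payload> on success, CAPCHA_NOT_READY while the task is pending, and anything else treated as an error. A minimal sketch of that parsing pulled into one helper is shown below. The names parse_reply and CaptchaAIError are hypothetical, introduced here for illustration, and are not part of httpx or CaptchaAI.

```python
from typing import Optional


class CaptchaAIError(Exception):
    """Hypothetical error type for any reply that is not OK or not-ready."""


def parse_reply(text: str) -> Optional[str]:
    """Interpret one plain-text reply from in.php or res.php.

    Returns the payload after "OK|", or None while the task is pending.
    Any other reply is raised as CaptchaAIError.
    """
    text = text.strip()
    if text == "CAPCHA_NOT_READY":
        return None
    if text.startswith("OK|"):
        # Split once so a payload that itself contains "|" stays intact
        return text.split("|", 1)[1]
    raise CaptchaAIError(text)
```

With a helper like this, the polling loop could call parse_reply(result.text) and continue whenever it returns None, which keeps the success, pending, and error branches in one place.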

Asynchronous client

import httpx
import asyncio
import os

class CaptchaAIAsync:
    def __init__(self, api_key):
        self.api_key = api_key
        self.base_url = "https://ocr.captchaai.com"
        self.client = httpx.AsyncClient(timeout=30)

    async def solve(self, params, timeout=300):
        # Copy so the caller's dict is not mutated and the key does not leak into it
        params = {**params, "key": self.api_key}

        # Submit
        resp = await self.client.get(
            f"{self.base_url}/in.php", params=params
        )
        text = resp.text

        if not text.startswith("OK|"):
            raise Exception(f"Submit failed: {text}")

        task_id = text.split("|")[1]

        # Poll
        # get_running_loop() is the preferred way to reach the loop from a coroutine
        loop = asyncio.get_running_loop()
        deadline = loop.time() + timeout
        poll_params = {"key": self.api_key, "action": "get", "id": task_id}

        while loop.time() < deadline:
            await asyncio.sleep(5)
            result = await self.client.get(
                f"{self.base_url}/res.php", params=poll_params
            )

            if result.text == "CAPCHA_NOT_READY":
                continue
            if result.text.startswith("OK|"):
                return result.text.split("|", 1)[1]
            raise Exception(f"Solve failed: {result.text}")

        raise TimeoutError(f"Task {task_id} timed out")

    async def get_balance(self):
        resp = await self.client.get(f"{self.base_url}/res.php", params={
            "key": self.api_key, "action": "getbalance"
        })
        return float(resp.text)

    async def close(self):
        await self.client.aclose()

# Usage
async def main():
    solver = CaptchaAIAsync(os.environ["CAPTCHAAI_API_KEY"])

    # Solve multiple concurrently
    tasks = [
        solver.solve({
            "method": "userrecaptcha",
            "googlekey": "6Le-wvkS...",
            "pageurl": f"https://example.com/page{i}",
        })
        for i in range(5)
    ]

    results = await asyncio.gather(*tasks, return_exceptions=True)
    for i, r in enumerate(results):
        if isinstance(r, Exception):
            print(f"Page {i}: FAILED - {r}")
        else:
            print(f"Page {i}: solved ({len(r)} chars)")

    await solver.close()

asyncio.run(main())
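
The example above starts all five tasks at once. With larger batches it may be worth capping how many solves run in parallel. The sketch below does this with asyncio.Semaphore; the helper name solve_bounded and its limit parameter are assumptions made for illustration, and any coroutine function with the same shape as solver.solve can be passed in.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List


async def solve_bounded(
    solve: Callable[[Dict[str, str]], Awaitable[str]],
    jobs: List[Dict[str, str]],
    limit: int = 3,
) -> List[Any]:
    """Run solve(job) for every job with at most `limit` in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(job: Dict[str, str]) -> str:
        # Each task waits here until one of the `limit` slots is free
        async with semaphore:
            return await solve(job)

    # As in the example above, failures come back as exception objects
    # in the result list instead of aborting the whole batch
    return await asyncio.gather(
        *(run_one(job) for job in jobs), return_exceptions=True
    )
```

Inside main(), this could be called as results = await solve_bounded(solver.solve, jobs, limit=3), where jobs is the list of parameter dicts.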

HTTP/2 support

HTTPX supports HTTP/2, which reduces connection overhead:

pip install "httpx[http2]"

client = httpx.AsyncClient(http2=True, timeout=30)

HTTP/2 multiplexes requests over a single connection, which improves performance when submitting and polling many CAPTCHAs.

Example: scraping with CAPTCHA handling

import asyncio
import os
import re

import httpx

# Assumes the CaptchaAIAsync class from the asynchronous client section is in scope

async def scrape_with_captcha(url, solver):
    async with httpx.AsyncClient() as client:
        # Fetch page
        resp = await client.get(url)
        html = resp.text

        # Check for reCAPTCHA
        match = re.search(
            r'data-sitekey=["\']([A-Za-z0-9_-]+)["\']', html
        )
        if not match:
            return html

        site_key = match.group(1)
        token = await solver.solve({
            "method": "userrecaptcha",
            "googlekey": site_key,
            "pageurl": url,
        })

        # Submit form with token
        resp = await client.post(url, data={
            "g-recaptcha-response": token,
        })
        return resp.text

async def main():
    solver = CaptchaAIAsync(os.environ["CAPTCHAAI_API_KEY"])
    content = await scrape_with_captcha("https://example.com", solver)
    print(f"Got {len(content)} chars")
    await solver.close()

asyncio.run(main())
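
The site-key detection step in the scraper can be isolated into a small pure function, which makes it easy to check against sample markup without any network access. This is a minimal sketch that reuses the same regular expression; the name extract_site_key is hypothetical.

```python
import re
from typing import Optional

# Same pattern as in the scraping example: the value of a data-sitekey
# attribute, in either single or double quotes
SITE_KEY_RE = re.compile(r'data-sitekey=["\']([A-Za-z0-9_-]+)["\']')


def extract_site_key(html: str) -> Optional[str]:
    """Return the first data-sitekey value found in the markup, or None."""
    match = SITE_KEY_RE.search(html)
    return match.group(1) if match else None
```

Note that this only finds site keys present in the static HTML. Pages that inject the widget with JavaScript would need a different detection approach.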

Comparison: httpx vs. requests and aiohttp

Feature	httpx (sync)	httpx (async)	requests	aiohttp
Async support	❌	✅	❌	✅
HTTP/2	✅	✅	❌	❌
Connection pooling	✅	✅	✅	✅
API compatibility	requests-like	requests-like	n/a	Different
Best for	Drop-in replacement	Modern async code	Quick scripts	High concurrency

FAQ

Should I use httpx instead of requests?

For new projects, yes. httpx offers a broadly requests-compatible API plus async and HTTP/2 support. For existing code that uses requests, migration is straightforward.

Is httpx faster than aiohttp?

aiohttp has slightly lower overhead for pure async workloads. httpx has the edge where HTTP/2 is needed, since aiohttp does not support it, and it is more convenient for mixed sync/async code.

Can I use httpx with Scrapy?

Not directly: Scrapy runs on Twisted's event loop. Use httpx in standalone scripts or with asyncio-based frameworks such as FastAPI.
