
[Python] Getting a Crawler Past a CAPTCHA

1. Download the CAPTCHA image to local disk

# Fetch the page that contains the CAPTCHA
import requests
from bs4 import BeautifulSoup

url = 'http://www.example.com/a.html'
resp = requests.get(url)
soup = BeautifulSoup(resp.content.decode('UTF-8'), 'html.parser')

# Locate the CAPTCHA <img> tag and read its address
src = soup.select_one('div.captcha-row img')['src']

# Download the CAPTCHA image to local disk
resp = requests.get(src)
with open('../images/verify.png', 'wb') as f:
    f.write(resp.content)
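
Note that the src attribute is often a relative path rather than a full URL; in that case it has to be resolved against the page URL before the download will work. A minimal sketch using urllib.parse.urljoin (a general safeguard, not something the original example necessarily needs):

from urllib.parse import urljoin

# Resolve a possibly relative src such as '/captcha.png' against the page URL
img_url = urljoin(url, src)
resp = requests.get(img_url)
with open('../images/verify.png', 'wb') as f:
    f.write(resp.content)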

2. Recognize the CAPTCHA

# Install the OCR library first: pip install ddddocr
import ddddocr

ocr = ddddocr.DdddOcr()
with open('../images/verify.png', 'rb') as f:
    img = f.read()
    # classification() returns the recognized characters as a string
    code = ocr.classification(img)
    print(code)
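
ddddocr will not read every image correctly. If the target site lets you reload the CAPTCHA, a common pattern is to re-fetch and re-recognize until the result looks plausible. A minimal sketch, assuming a hypothetical fetch_captcha() helper that repeats step 1 and assuming the code is 4 alphanumeric characters (both assumptions, not from the original post):

def recognize_captcha(path='../images/verify.png', max_tries=5):
    ocr = ddddocr.DdddOcr()
    code = ''
    for _ in range(max_tries):
        fetch_captcha(path)                    # hypothetical helper wrapping step 1
        with open(path, 'rb') as f:
            code = ocr.classification(f.read())
        if len(code) == 4 and code.isalnum():  # assumed CAPTCHA format
            return code
    return code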

3. Submit the CAPTCHA

# Grab the token; CAPTCHA forms usually carry a hidden token field
token = soup.find('input', {'name': 'csrfToken'}).get('value')

# URL that the submit button posts to
verify_url = 'https://www.example.com/verify'

# Which fields the form actually sends can be checked in the browser dev tools (F12) when submitting
data = {
    'vcode': code,
    'token': token,
    'btnPost':''
}

# Request headers (copy them from the request in the F12 network panel)
headers = {
    'content-type': 'application/x-www-form-urlencoded',
    'user-agent': 'Mozilla/5.0 (Macintosh;) AppleWebKit/537.36 (KHTML, like Gecko) Chrome'
}

response = requests.post(verify_url, data=data, headers=headers)

# Note: many sites return 200 even when verification fails,
# so it is safer to also check the response body.
if response.status_code == 200:
    print('CAPTCHA verification - success')
else:
    print('CAPTCHA verification - fail')
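
One detail the snippets above gloss over: the CAPTCHA value is normally tied to a server-side session cookie, so fetching the image and posting the form with separate bare requests calls can fail because each call starts a fresh cookie jar. A minimal sketch of the same three steps sharing one requests.Session (URLs and field names repeat the assumptions used above):

import requests
import ddddocr
from bs4 import BeautifulSoup
from urllib.parse import urljoin

session = requests.Session()   # one cookie jar for all three steps

# Step 1: fetch the page and the CAPTCHA image within the same session
page_url = 'http://www.example.com/a.html'
resp = session.get(page_url)
soup = BeautifulSoup(resp.content.decode('UTF-8'), 'html.parser')
img_url = urljoin(page_url, soup.select_one('div.captcha-row img')['src'])
img = session.get(img_url).content

# Step 2: recognize the CAPTCHA text
code = ddddocr.DdddOcr().classification(img)

# Step 3: post the form with the same session cookies
token = soup.find('input', {'name': 'csrfToken'}).get('value')
response = session.post('https://www.example.com/verify',
                        data={'vcode': code, 'token': token, 'btnPost': ''},
                        headers={'content-type': 'application/x-www-form-urlencoded'})
print(response.status_code)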

Original article: https://blog.csdn.net/weixin_42364929/article/details/143657691
