I won't name the specific page here.
Problem:
Printing the response gave:
print(resp.status_code)
print(111, resp.content.decode())  # 111 is only a marker to make the (empty) body easy to spot
304
111
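This surprised me. It was the first time I had run into a 304 and I had no idea where to start, so I searched Baidu and Google for a fix.
In the end I found a couple of articles explaining how 304 works: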
https://blog.csdn.net/soonfly/article/details/50953814
https://blog.csdn.net/huwei2003/article/details/70139062
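Both articles describe two key headers, If-Modified-Since and If-None-Match. They make the request conditional: if the page has not changed since the cached copy, the server answers 304 Not Modified with an empty body. I found the same two headers in my own request headers, and removing them before crawling is all it takes: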
headers = {
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3",
"accept-encoding": "gzip, deflate, br",
"accept-language": "zh-CN,zh;q=0.9",
"cache-control": "max-age=0",
"cookie": "PHPSESSID=mk0a6o889gjlmg6143nngqcqg3; Hm_lvt_475c542162a560b5bb02f9f6fc6cb31e=1560394732; Hm_lvt_a632bb02989bf5564a21489660475bda=1560394808; Hm_lpvt_a632bb02989bf5564a21489660475bda=1560406531; Hm_lpvt_475c542162a560b5bb02f9f6fc6cb31e=1560407160",
# The two conditional-request headers are commented out so the server sends the full page:
# "if-modified-since": "Sun, 05 May 2019 04:32:58 GMT",
# "if-none-match": "\"5cce677a-5b33\"",
"upgrade-insecure-requests": "1",
"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.80 Safari/537.36"
}
With that change the status code is 200, and the problem is solved.
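Rather than commenting the two headers out by hand each time, the same fix can be applied in code. The following is only a minimal sketch of that idea, not part of the original solution: the helper name strip_conditional_headers and the shortened browser_headers dict are made up for illustration, and it only handles the two headers discussed above.

```python
# Header names (lowercase) that make a request conditional and can trigger a 304.
CONDITIONAL_HEADERS = {"if-modified-since", "if-none-match"}


def strip_conditional_headers(headers):
    """Return a copy of `headers` without the conditional-request headers.

    Names are compared case-insensitively, because HTTP header names
    are case-insensitive. The input dict is left unchanged.
    """
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in CONDITIONAL_HEADERS
    }


# Shortened, illustrative version of the headers used in this post.
browser_headers = {
    "cache-control": "max-age=0",
    "If-Modified-Since": "Sun, 05 May 2019 04:32:58 GMT",
    "if-none-match": '"5cce677a-5b33"',
    "upgrade-insecure-requests": "1",
}

clean_headers = strip_conditional_headers(browser_headers)
print(sorted(clean_headers))  # ['cache-control', 'upgrade-insecure-requests']
```

The cleaned dict can then be passed as requests.get(url, headers=clean_headers), assuming url is the page being crawled.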