## **Common Requests methods**
### Commonly used functions in the Requests library
```
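requests.get()     Main method for fetching an HTML page; sends a GET request
requests.post()    Submits a POST request to a page
requests.put()     Submits a PUT request to a page
requests.patch()   Submits a partial-modification (PATCH) request to a page
requests.delete()  Submits a DELETE request to a page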
```
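To make the mapping between these helpers and HTTP methods concrete, here is a minimal sketch that builds one request per method without sending anything. It uses requests.Request and prepare(); the /anything URL on httpbin.org is an assumption chosen for illustration only.

```python
import requests

# Assumed test endpoint; any URL works because nothing is actually sent.
URL = "http://httpbin.org/anything"

# requests.get/post/put/patch/delete are shortcuts for these five HTTP methods.
METHODS = ["GET", "POST", "PUT", "PATCH", "DELETE"]

# Request(...).prepare() builds the request object that would go on the wire.
prepared = [requests.Request(method, URL).prepare() for method in METHODS]

for p in prepared:
    print(p.method, p.url)
```

Inspecting a prepared request like this is a convenient way to check a method or URL before any traffic is generated.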
### 1. GET requests
~~~
import requests
import json
r = requests.get('http://httpbin.org/get')
html = r.text  # response body as a str
html2 = json.loads(html)  # parse the JSON text into a dict
print(html)
print(type(html), type(html2))
print(html["url"])
print(html2["url"])  # never reached: the line above raises TypeError because html is a str
Output:
{
"args": {},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.22.0"
},
"origin": "114.248.162.218, 114.248.162.218",
"url": "https://httpbin.org/get"
}
<class 'str'> <class 'dict'>
Traceback (most recent call last):
  File "F:/Desktop/Project/課件代碼/1.py", line 8, in <module>
    print(html["url"])
TypeError: string indices must be integers
~~~
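Since r.text is a plain string, indexing it by key fails. A response object also offers a json() method that parses the body directly. The sketch below wraps that in a small helper; the name fetch_json and the 10-second default timeout are assumptions made for illustration, not part of the original example.

```python
import requests


def fetch_json(url, timeout=10):
    """GET `url` and return its JSON body as Python objects (hypothetical helper)."""
    r = requests.get(url, timeout=timeout)
    r.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return r.json()       # parses the body as JSON, much like json.loads(r.text)
```

With this helper, fetch_json('http://httpbin.org/get')['url'] indexes a dict rather than a str, so the lookup that raised TypeError above would succeed.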
### 2. POST requests
~~~
import requests
data = {'name': 'germey', 'age': '22'}
r = requests.post("http://httpbin.org/post", data=data)
print(r.text)
Output:
{
"args": {},
"data": "",
"files": {},
"form": {
"age": "22",
"name": "germey"
},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Content-Length": "18",
"Content-Type": "application/x-www-form-urlencoded",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.22.0"
},
"json": null,
"origin": "114.248.162.218, 114.248.162.218",
"url": "https://httpbin.org/post"
}
~~~
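The output above shows the submitted values under "form" while "json" is null, because data= encodes the dict as an HTML form. As a rough sketch of the difference, the following compares data= with the json= parameter by preparing both requests without sending them; the variable names are illustrative.

```python
import json
import requests

url = "http://httpbin.org/post"
data = {"name": "germey", "age": "22"}

# data= sends a form-encoded body (this is what fills "form" in the output above).
form_req = requests.Request("POST", url, data=data).prepare()

# json= serializes the dict to a JSON body instead.
json_req = requests.Request("POST", url, json=data).prepare()

print(form_req.headers["Content-Type"], form_req.body)
print(json_req.headers["Content-Type"], json.loads(json_req.body))
```

The form-encoded body is 18 characters long, which matches the Content-Length header in the output above.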
### 3. Adding headers
~~~
import requests
# Without a browser-like User-Agent, the site rejects the request
r1 = requests.get("https://www.zhihu.com/explore")
print(r1.text)
print("==============")
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko)'
}
r2 = requests.get("https://www.zhihu.com/explore", headers=headers)
print(r2.text)
Output:
<html>
<head><title>400 Bad Request</title></head>
<body bgcolor="white">
<center><h1>400 Bad Request</h1></center>
<hr><center>openresty</center>
</body>
</html>
==============
<!doctype html>
<html lang="zh" data-hairline="true" data-theme="light"><head><meta charSet="utf-8"/><title data-react-helmet="true">發現 - 知乎</title><meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1"/><meta name="renderer" content="webkit"/><meta name="force-rendering" content="webkit"/><meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"/><meta name="google-site-verification" content="FTeR0c8arOPKh8c5DYh_9uu98_zJbaWw53J-Sch9MTg"/><meta name="description" property="og:description" content="有問題,上知乎。知乎,可信賴的問答社區,以讓每個人高效獲得可信賴的解答為使命。知乎憑借認真、專業和友善的社區氛圍,結構化、易獲得的優質內容,基于問答的內容生產方式和獨特的社區機制,吸引、聚集了各行各業中大量的親歷者、內行人、領域專家、領域愛好者,將高質量的內容透過人的節點來成規模地生產和分享。用戶通過問答等交流方式建立信任和連接,打造和提升個人影響力,并發現、獲得新機會。"/><link data-react-helmet="true" rel="apple-touch-icon" href="https://static.zhihu.com/heifetz/assets/apple-touch-icon-152.67c7b278.png"/><link data-react-helmet="true" rel="apple-touch-icon" href="https://static.zhihu.com/heifetz/assets/apple-touch-icon-152.67c7b278.png" sizes="152x152"/><link data-react-helmet="true" rel="apple-touch-icon" href="https://static.zhihu.com/heifetz/assets/apple-touch-icon-120.b3e6278d.png" sizes="120x120"/><link data-react-helmet="true" rel="apple-touch-icon" href="https://static.zhihu.com/heifetz/assets/apple-touch-icon-76.7a750095.png" sizes="76x76"/><link data-react-helmet="true" rel="apple-touch-icon" href="https://static.zhihu.com/heifetz/assets/apple-touch-icon-60.a4a761d4.png" sizes="60x60"/><link rel="shortcut icon" type="image/x-icon" href="https://static.zhihu.com/static/favicon.ico"/><link rel="search" type="application/opensearchdescription+xml" href="https://static.zhihu.com/static/search.xml" title="知乎"/><link rel="dns-prefetch" href="//static.zhimg.com"/><link rel="dns-prefetch" href="//pic1.zhimg.com"/><link rel="dns-prefetch" href="//pic2.zhimg.com"/><link rel="dns-prefetch" href="//pic3.zhimg.com"/><link rel="dns-prefetch" href="//pic4.zhimg.com"/><style>
.u-safeAreaInset-top {
height: constant(safe-area-inset-top) !important;
height: env(safe-area-inset-top) !important;
}
.u-safeAreaInset-bottom {
height: constant(safe-area-inset-bottom) !important;
height: env(safe-area-inset-bottom) !important;
}
~~~
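The first request is rejected because, by default, requests identifies itself as python-requests/<version>, as the earlier httpbin output shows. The sketch below, which sends nothing, prints that default and shows how a per-request header overrides it. Using a Session and prepare_request here is only a way to inspect the final headers offline.

```python
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) "
                  "AppleWebKit/537.36 (KHTML, like Gecko)"
}

s = requests.Session()
default_ua = s.headers["User-Agent"]  # what requests sends when no header is given

# Per-request headers are merged over the session defaults.
req = requests.Request("GET", "https://www.zhihu.com/explore", headers=headers)
prepared = s.prepare_request(req)

print("default:", default_ua)
print("custom: ", prepared.headers["User-Agent"])
```

Other defaults such as Accept-Encoding remain in place; only the keys supplied in headers are replaced.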
### 4. File upload
~~~
import requests
# Open in binary mode; the with block closes the file after the upload
with open('favicon.png', 'rb') as f:
    files = {'file': f}
    r = requests.post("http://httpbin.org/post", files=files)
print(r.text)
Output:
{
"args": {},
"data": "",
"files": {
"file": "data:application/octet-stream;base64,iVBORw0KGgoAAAANSUhEUgAAAhwAAAECCAMAAACCFP44AAAACXBIWXMAAAsTAAALEwEAmpwYAAAKTWlDQ1BQaG90b3Nob3AgSUNDIHByb2ZpbGUAAHjanVN3WJP3Fj7f92UPVkLY8LGXbIEAIiOsCMgQWaIQkgBhhBASQMWFiApWFBURnEhVxILVCkidiOKgKLhnQYqIWotVXDjuH9yntX167+3t+9f7vOec5/zOec8PgBESJpHmomoAOVKFPDrYH49PS"
},
"form": {},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Content-Length": "8024",
"Content-Type": "multipart/form-data; boundary=ae576c1072214f7675389b19c437283d",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.22.0"
},
"json": null,
"origin": "114.248.162.218, 114.248.162.218",
"url": "https://httpbin.org/post"
}
~~~
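In the output above the file is reported as application/octet-stream. If the filename and content type should be set explicitly, the files value can be a (filename, file object, content type) tuple. Here is a minimal offline sketch, assuming placeholder bytes in place of a real favicon.png.

```python
import io
import requests

# Stand-in bytes for favicon.png so the sketch runs without a real file (assumption).
fake_png = io.BytesIO(b"\x89PNG\r\n\x1a\n placeholder image bytes")

# (filename, file object, content type) gives explicit control over the uploaded part.
files = {"file": ("favicon.png", fake_png, "image/png")}
prepared = requests.Request("POST", "http://httpbin.org/post", files=files).prepare()

print(prepared.headers["Content-Type"])  # multipart/form-data; boundary=...
print(len(prepared.body), "bytes in body")
```

The request is only prepared, not sent, so the multipart body can be inspected locally before uploading anything.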
### 5. Proxy settings
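For some sites, a handful of test requests return content normally. Once large-scale crawling begins, however, the site may react to the high volume of frequent requests by showing a CAPTCHA, redirecting to a login page, or even banning the client's IP outright, making the site unreachable for a period of time.

To prevent this, we can route requests through a proxy, which is what the proxies parameter is for. It can be set like this: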
~~~
import requests
proxies = {
    "http": "http://sun:qq123456.@192.168.66.211:520",
}
r1 = requests.get('http://httpbin.org/get')  # direct request
r2 = requests.get('http://httpbin.org/get', proxies=proxies)  # routed through the proxy
print(r1.text)
print(r2.text)
Output:
{
"args": {},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.22.0"
},
"origin": "114.248.162.218, 114.248.162.218",
"url": "https://httpbin.org/get"
}
{
"args": {},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.22.0"
},
"origin": "175.98.194.165, 175.98.194.165",
"url": "https://httpbin.org/get"
}
~~~
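The keys of the proxies dictionary are matched against the scheme of the target URL, so a dictionary with only an "http" entry does not cover https:// URLs. The sketch below illustrates this with select_proxy from requests.utils, a utility function rather than part of the high-level API. The proxy address and credentials are hypothetical placeholders.

```python
import requests
from requests.utils import select_proxy

# Hypothetical proxy address and credentials; replace with a proxy you control.
proxies = {
    "http": "http://user:password@proxy.example.com:8080",
    "https": "http://user:password@proxy.example.com:8080",
}

# The dictionary key is matched against the scheme of the target URL.
print(select_proxy("http://httpbin.org/get", proxies))
# With only an "http" entry, an https:// URL finds no matching proxy.
print(select_proxy("https://httpbin.org/get", {"http": proxies["http"]}))

# A Session can carry the proxy settings for every request made through it.
s = requests.Session()
s.proxies.update(proxies)
```

Setting the proxies once on a Session avoids repeating proxies=proxies on every call.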
### 6. Timeout settings
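When the local network is poor, or the server responds slowly or not at all, we may wait a very long time for a response, or never receive one and end up with an error. To avoid waiting on an unresponsive server, set a timeout: if no response arrives within that time, an exception is raised. This is done with the timeout parameter. The value is in seconds and bounds how long requests waits for the server to respond after the request is sent; it is not a limit on the total download time. For example: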
~~~
# Set a timeout
import requests
r = requests.get("https://www.taobao.com", timeout=0.0001)
print(r.status_code)
Output:
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='www.taobao.com', port=443): Read timed out. (read timeout=0.0001)
# Never time out
import requests
r = requests.get("https://www.taobao.com", timeout=1)  # give up after 1 second
print(r.status_code)
r = requests.get('https://www.google.com', timeout=None)  # timeout=None waits indefinitely
print(r.text)
~~~
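In a crawler, an uncaught timeout exception stops the whole script. One possible way to handle it is sketched below: catch requests.exceptions.Timeout and return None. The helper name get_status and the default of (3.05, 10) seconds are assumptions for illustration. The tuple form sets the connect and read timeouts separately.

```python
import requests


def get_status(url, timeout=(3.05, 10)):
    """Return the HTTP status code, or None if the request timed out.

    Hypothetical helper: timeout may be a single number or a
    (connect, read) tuple, both in seconds.
    """
    try:
        r = requests.get(url, timeout=timeout)
    except requests.exceptions.Timeout:
        # Timeout is the base class of both ConnectTimeout and ReadTimeout.
        return None
    return r.status_code
```

A caller can then treat None as "try again later" rather than letting the crawl abort.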
### 7. Session persistence
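In requests, calling get() or post() directly does simulate a web request, but each call effectively runs in its own session, as if you had opened the two pages in two different browsers.

Consider this scenario: the first request uses post() to log in to a site, and a second request uses get() to fetch your own profile page after logging in. In effect this opens two browsers with two completely unrelated sessions. Can the profile be fetched? Of course not.

Some readers may ask: why not just set the same cookies on both requests? That works, but it is tedious, and there is a simpler solution.

The key is to keep the same session alive, which is like opening a new tab in the same browser rather than launching a new browser. To avoid setting cookies by hand on every request, we can use the Session object.

With it, we can maintain a session easily without worrying about cookies; it handles them for us automatically.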
~~~
# Test with plain get():
import requests
requests.get('http://httpbin.org/cookies/set/number/123456789')
r = requests.get('http://httpbin.org/cookies')
print(r.text)
Output:
{
"cookies": {}
}
# Test with a Session:
import requests
s = requests.Session()
s.get('http://httpbin.org/cookies/set/number/123456789')
r = s.get('http://httpbin.org/cookies')
print(r.text)
Output:
{
"cookies": {
"number": "123456789"
}
}
~~~
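To see the mechanism behind this result, the sketch below places a cookie in a Session's cookie jar by hand and then prepares the next request without sending it. In the example above the server's response fills the jar; setting it manually here is only a stand-in so the sketch runs offline.

```python
import requests

with requests.Session() as s:
    # Put a cookie into the session's jar by hand; in the example above the
    # server's Set-Cookie response does this automatically.
    s.cookies.set("number", "123456789", domain="httpbin.org", path="/")

    # Build (without sending) the next request to see what the session attaches.
    req = requests.Request("GET", "http://httpbin.org/cookies")
    prepared = s.prepare_request(req)

print(prepared.headers.get("Cookie"))
```

Every request made through the same Session picks up the matching cookies from its jar, which is why the second s.get() above sees the number cookie while the plain requests.get() calls do not.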