
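# Python Requests tutorial

> Original: [http://zetcode.com/python/requests/](http://zetcode.com/python/requests/)

In this tutorial, we show how to work with the Python Requests module. We fetch data, post data, stream data, and connect to secure web pages. The examples use an online service, an nginx server, a Python HTTP server, and a Flask application.

ZetCode also has a concise [Python tutorial](/lang/python/).

The Hypertext Transfer Protocol (HTTP) is an application protocol for distributed, collaborative, hypermedia information systems. HTTP is the foundation of data communication for the World Wide Web.

## Python Requests

Requests is a simple and elegant Python HTTP library. It provides methods for accessing web resources via HTTP. Requests is not part of the Python standard library; it is a third-party package that can be installed with `pip install requests`.

```shell
$ sudo service nginx start
```

We run the nginx web server on localhost. Some of our examples use the `nginx` server.

## Python Requests version

The first program prints the version of the Requests library.

`version.py`

```py
#!/usr/bin/env python3

import requests

print(requests.__version__)
print(requests.__copyright__)
```

The program prints the version and the copyright of Requests.

```shell
$ ./version.py
2.21.0
Copyright 2018 Kenneth Reitz
```

This is a sample output of the example.

## Python Requests reading a web page

The `get()` method issues a GET request; it fetches the document identified by the given URL.

`read_webpage.py`

```py
#!/usr/bin/env python3

import requests as req

resp = req.get("http://www.webcode.me")

print(resp.text)
```

The script grabs the content of the `www.webcode.me` web page.

```py
resp = req.get("http://www.webcode.me")
```

The `get()` method returns a response object.

```py
print(resp.text)
```

The `text` attribute contains the content of the response, decoded to Unicode text.

```shell
$ ./read_webpage.py
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>My html page</title>
</head>
<body>

Today is a beautiful day. We go swimming and fishing.

Hello there. How are you?

</body>
</html>
```

This is the output of the `read_webpage.py` script.

The following program fetches a small web page and strips its HTML tags.

`strip_tags.py`

```py
#!/usr/bin/env python3

import requests as req
import re

resp = req.get("http://www.webcode.me")

content = resp.text

# Remove everything that looks like <...>
stripped = re.sub('<[^<]+?>', '', content)
print(stripped)
```

The script strips the HTML tags of the `www.webcode.me` web page.

```py
stripped = re.sub('<[^<]+?>', '', content)
```

A simple regular expression is used to strip the HTML tags. It removes every `<...>` sequence, which is adequate for a small page like this one; it is not a general HTML parser.

## HTTP request

An HTTP request is a message sent from the client to the server to retrieve some information or to perform some action.

The `request()` function of the `requests` module creates and sends a new request. Note that the `requests` module also has higher-level functions, such as `get()`, `post()`, or `put()`, which save us some typing.

`create_request.py`

```py
#!/usr/bin/env python3

import requests as req

resp = req.request(method='GET', url="http://www.webcode.me")
print(resp.text)
```

The example creates a GET request and sends it to `http://www.webcode.me`.

## Python Requests getting status

The `Response` object contains the server's response to an HTTP request. Its `status_code` attribute returns the HTTP status code of the response, such as 200 or 404.

`get_status.py`

```py
#!/usr/bin/env python3

import requests as req

resp = req.get("http://www.webcode.me")
print(resp.status_code)

resp = req.get("http://www.webcode.me/news")
print(resp.status_code)
```

We perform two HTTP requests with the `get()` method and check the returned status.

```shell
$ ./get_status.py
200
404
```

200 is the standard response for successful HTTP requests, and 404 indicates that the requested resource could not be found.

## Python Requests head method

The `head()` method retrieves the document headers. The headers consist of fields, including date, server, content type, and last modification time.

`head_request.py`

```py
#!/usr/bin/env python3

import requests as req

resp = req.head("http://www.webcode.me")

print("Server: " + resp.headers['server'])
print("Last modified: " + resp.headers['last-modified'])
print("Content type: " + resp.headers['content-type'])
```

The example prints the server, the last modification time, and the content type of the `www.webcode.me` web page.

```shell
$ ./head_request.py
Server: nginx/1.6.2
Last modified: Sat, 20 Jul 2019 11:49:25 GMT
Content type: text/html
```

This is the output of the `head_request.py` program.

## Python Requests get method

The `get()` method issues a GET request to the server. The GET method requests a representation of the specified resource.

`httpbin.org` is a freely available HTTP request & response service.

`mget.py`

```py
#!/usr/bin/env python3

import requests as req

resp = req.get("https://httpbin.org/get?name=Peter")
print(resp.text)
```

The script sends a variable with a value to the `httpbin.org` server. The variable is specified directly in the URL.

```shell
$ ./mget.py
{
  "args": {
    "name": "Peter"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.21.0"
  },
  ...
}
```

This is the output of the example.

`mget2.py`

```py
#!/usr/bin/env python3

import requests as req

payload = {'name': 'Peter', 'age': 23}
resp = req.get("https://httpbin.org/get", params=payload)

print(resp.url)
print(resp.text)
```

The `get()` method takes a `params` parameter where we can specify the query parameters.

```py
payload = {'name': 'Peter', 'age': 23}
```

The data is sent in a Python dictionary.

```py
resp = req.get("https://httpbin.org/get", params=payload)
```

We send a GET request to the `httpbin.org` site and pass the data, which is specified in the `params` parameter.

```py
print(resp.url)
print(resp.text)
```

We print the URL and the response content to the console.

```shell
$ ./mget2.py
https://httpbin.org/get?name=Peter&age=23
{
  "args": {
    "age": "23",
    "name": "Peter"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.21.0"
  },
  ...
}
```

This is the output of the example.

## Python Requests redirection

Redirection is the process of forwarding one URL to a different URL. The HTTP response status code 301 Moved Permanently is used for permanent URL redirection; 302 Found is used for a temporary redirection.

`redirect.py`

```py
#!/usr/bin/env python3

import requests as req

resp = req.get("https://httpbin.org/redirect-to?url=/")

print(resp.status_code)
print(resp.history)
print(resp.url)
```

In the example, we issue a GET request to the `https://httpbin.org/redirect-to` page. This page redirects to another page; the redirect responses are stored in the `history` attribute of the response.

```shell
$ ./redirect.py
200
[<Response [302]>]
https://httpbin.org/
```

The GET request to `https://httpbin.org/redirect-to` was redirected with status 302 to `https://httpbin.org/`.

In the second example, we do not follow the redirect.

`redirect2.py`

```py
#!/usr/bin/env python3

import requests as req

resp = req.get("https://httpbin.org/redirect-to?url=/", allow_redirects=False)

print(resp.status_code)
print(resp.url)
```

The `allow_redirects` parameter specifies whether redirects are followed. By default, `get()` follows redirects.

```shell
$ ./redirect2.py
302
https://httpbin.org/redirect-to?url=/
```

This is the output of the example.

## Redirect with nginx

In the next example, we show how to set up a page redirect in the nginx server.

```nginx
location = /oldpage.html {

        return 301 /newpage.html;
}
```

Add these lines to the nginx configuration file, which is located at `/etc/nginx/sites-available/default` on Debian.

```shell
$ sudo service nginx restart
```

After the file has been edited, we must restart nginx to apply the changes.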
`oldpage.html`

```html
<!DOCTYPE html>
<html>
<head>
<title>Old page</title>
</head>
<body>
This is old page
</body>
</html>
```

This is the `oldpage.html` file located in the nginx document root.

`newpage.html`

```html
<!DOCTYPE html>
<html>
<head>
<title>New page</title>
</head>
<body>
This is a new page
</body>
</html>
```

This is the `newpage.html` file.

`redirect3.py`

```py
#!/usr/bin/env python3

import requests as req

resp = req.get("http://localhost/oldpage.html")

print(resp.status_code)
print(resp.history)
print(resp.url)

print(resp.text)
```

The script accesses the old page and follows the redirect. As mentioned earlier, Requests follows redirects by default.

```shell
$ ./redirect3.py
200
(<Response [301]>,)
http://localhost/newpage.html
<!DOCTYPE html>
<html>
<head>
<title>New page</title>
</head>
<body>
This is a new page
</body>
</html>
```

This is the output of the example.

```shell
$ sudo tail -2 /var/log/nginx/access.log
127.0.0.1 - - [21/Jul/2019:07:41:27 -0400] "GET /oldpage.html HTTP/1.1" 301 184 "-" "python-requests/2.4.3 CPython/3.4.2 Linux/3.16.0-4-amd64"
127.0.0.1 - - [21/Jul/2019:07:41:27 -0400] "GET /newpage.html HTTP/1.1" 200 109 "-" "python-requests/2.4.3 CPython/3.4.2 Linux/3.16.0-4-amd64"
```

As we can see from the `access.log` file, the request was redirected to a new file name. The communication consisted of two GET requests.

## User agent

In this section, we specify the name of the user agent. We create our own Python HTTP server.

`http_server.py`

```py
#!/usr/bin/env python3

from http.server import BaseHTTPRequestHandler, HTTPServer

class MyHandler(BaseHTTPRequestHandler):

    def do_GET(self):

        message = "Hello there"

        self.send_response(200)

        # Echo the client's User-Agent header back for the /agent path
        if self.path == '/agent':
            message = self.headers['user-agent']

        self.send_header('Content-type', 'text/html')
        self.end_headers()

        self.wfile.write(bytes(message, "utf8"))

        return

def main():

    print('starting server on port 8081...')

    server_address = ('127.0.0.1', 8081)
    httpd = HTTPServer(server_address, MyHandler)
    httpd.serve_forever()

main()
```

We have a simple Python HTTP server.

```py
if self.path == '/agent':
    message = self.headers['user-agent']
```

If the path equals `'/agent'`, we return the user agent that was sent with the request.

`user_agent.py`

```py
#!/usr/bin/env python3

import requests as req

headers = {'user-agent': 'Python script'}

resp = req.get("http://localhost:8081/agent", headers=headers)
print(resp.text)
```

The script creates a simple GET request to our Python HTTP server. To add HTTP headers to a request, we pass a dictionary to the `headers` parameter.

```py
headers = {'user-agent': 'Python script'}
```

The header values are placed in a Python dictionary.

```py
resp = req.get("http://localhost:8081/agent", headers=headers)
```

The values are passed to the `headers` parameter.

```shell
$ ./http_server.py
starting server on port 8081...
```

First, we start the server.

```shell
$ ./user_agent.py
Python script
```

Then we run the script. The server responded with the name of the agent that we sent with the request.

## Python Requests post value

The `post()` method dispatches a POST request to the given URL, providing the key/value pairs for the fill-in form content.

`post_value.py`

```py
#!/usr/bin/env python3

import requests as req

data = {'name': 'Peter'}

resp = req.post("https://httpbin.org/post", data)
print(resp.text)
```

The script sends a request with a `name` key whose value is `Peter`. The POST request is issued with the `post()` method.

```shell
$ ./post_value.py
{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "name": "Peter"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "10",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.21.0"
  },
  "json": null,
  ...
}
```

This is the output of the `post_value.py` script.

## Python Requests upload image

In the following example, we upload an image. We create a web application with Flask.

`app.py`

```py
#!/usr/bin/env python3

import os
from flask import Flask, request

app = Flask(__name__)

@app.route("/")
def home():
    return 'This is home page'

@app.route("/upload", methods=['POST'])
def handleFileUpload():

    msg = 'failed to upload image'

    # The uploaded file is expected under the 'image' form field
    if 'image' in request.files:

        photo = request.files['image']

        if photo.filename != '':

            photo.save(os.path.join('.', photo.filename))
            msg = 'image uploaded successfully'

    return msg

if __name__ == '__main__':
    app.run()
```

This is a simple application with two endpoints. The `/upload` endpoint checks whether an image was sent and saves it to the current directory.

`upload_file.py`

```py
#!/usr/bin/env python3

import requests as req

url = 'http://localhost:5000/upload'

with open('sid.jpg', 'rb') as f:

    files = {'image': f}

    r = req.post(url, files=files)
    print(r.text)
```

We send the image to the Flask application. The file is specified in the `files` parameter of the `post()` method.

## JSON

JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write and for machines to parse and generate.

A JSON object is a collection of key/value pairs; in Python, it is represented by a dictionary.

### Read JSON

In the first example, we read JSON data from a PHP script.

`send_json.php`

```php
<?php

$data = [ 'name' => 'Jane', 'age' => 17 ];
header('Content-Type: application/json');

echo json_encode($data);
```

The PHP script sends JSON data. It uses the `json_encode()` function to do the job.

`read_json.py`

```py
#!/usr/bin/env python3

import requests as req

resp = req.get("http://localhost/send_json.php")
print(resp.json())
```

The `read_json.py` script reads the JSON data sent by the PHP script.

```py
print(resp.json())
```

The `json()` method parses the JSON-encoded content of the response and returns it as a Python object.

```shell
$ ./read_json.py
{'age': 17, 'name': 'Jane'}
```

This is the output of the example.

### Send JSON

Next, we send JSON data from a Python script to a PHP script.

`parse_json.php`

```php
<?php

$data = file_get_contents("php://input");

$json = json_decode($data , true);

foreach ($json as $key => $value) {

    if (!is_array($value)) {
        echo "The $key is $value\n";
    } else {
        // Nested array: print each inner value, not the array itself
        foreach ($value as $key => $val) {
            echo "The $key is $val\n";
        }
    }
}
```

This PHP script reads the JSON data and sends back a message with the parsed values.

`send_json.py`

```py
#!/usr/bin/env python3

import requests as req

data = {'name': 'Jane', 'age': 17}

resp = req.post("http://localhost/parse_json.php", json=data)
print(resp.text)
```

The script sends JSON data to the PHP application and reads its response.

```py
data = {'name': 'Jane', 'age': 17}
```

This is the data to be sent.

```py
resp = req.post("http://localhost/parse_json.php", json=data)
```

The dictionary containing the JSON data is passed to the `json` parameter.

```shell
$ ./send_json.py
The name is Jane
The age is 17
```

This is the output of the example.

## Retrieving definitions from a dictionary

In the following example, we find definitions of a term on [www.dictionary.com](http://www.dictionary.com). To parse HTML, we use the `lxml` module.

```shell
$ pip install lxml
```

We install the `lxml` module with the `pip` tool.

`get_term.py`

```py
#!/usr/bin/env python3

import requests as req
from lxml import html
import textwrap

term = "dog"

resp = req.get("http://www.dictionary.com/browse/" + term)
root = html.fromstring(resp.content)

for sel in root.xpath("//span[contains(@class, 'one-click-content')]"):

    if sel.text:

        s = sel.text.strip()

        # Skip stray fragments of three characters or fewer
        if (len(s) > 3):

            print(textwrap.fill(s, width=50))
```

In this script, we find the definitions of the term dog on `www.dictionary.com`. The `lxml` module is used to parse the HTML code.

> **Note:** The tags that contain the definitions may change overnight. In that case, we would need to adapt the script.

```py
from lxml import html
```

The `lxml` module can be used to parse HTML.

```py
import textwrap
```

The `textwrap` module is used to wrap text to a certain width.

```py
resp = req.get("http://www.dictionary.com/browse/" + term)
```

To perform a search, we append the term to the end of the URL.

```py
root = html.fromstring(resp.content)
```

We need to use `resp.content` rather than `resp.text` because `html.fromstring()` implicitly expects bytes as input. (`resp.content` returns the content as bytes, whereas `resp.text` returns it as Unicode text.)

```py
for sel in root.xpath("//span[contains(@class, 'one-click-content')]"):

    if sel.text:

        s = sel.text.strip()

        if (len(s) > 3):

            print(textwrap.fill(s, width=50))
```

We parse the content. The main definitions are located inside `span` tags whose `class` attribute contains `one-click-content`. We improve the formatting by removing excess white space and stray characters. The text is wrapped to a width of at most 50 characters. Note that this kind of parsing is subject to change whenever the page markup changes.

```shell
$ ./get_term.py
a domesticated canid,
any carnivore of the dog family Canidae, having
prominent canine teeth and, in the wild state, a
long and slender muzzle, a deep-chested muscular
body, a bushy tail, and large, erect ears.
...
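```

This is a partial list of the definitions.

## Python Requests streaming requests

Streaming is transmitting a continuous stream of audio and/or video data while earlier parts are being used. The `Response.iter_lines()` method iterates over the response data, one line at a time. Setting `stream=True` on the request avoids reading the whole content into memory at once for large responses.

`streaming.py`

```py
#!/usr/bin/env python3

import requests as req

url = "https://docs.oracle.com/javase/specs/jls/se8/jls8.pdf"

# Use the last path segment of the URL as the local file name
local_filename = url.split('/')[-1]

r = req.get(url, stream=True)

with open(local_filename, 'wb') as f:

    for chunk in r.iter_content(chunk_size=1024):

        f.write(chunk)
```

The example streams a PDF file and writes it to disk.

```py
r = req.get(url, stream=True)
```

When `stream` is set to `True` on a request, Requests cannot release the connection back to the pool unless we consume all the data or call `Response.close()`.

```py
with open(local_filename, 'wb') as f:

    for chunk in r.iter_content(chunk_size=1024):

        f.write(chunk)
```

We read the resource in chunks of 1 KB and write them to a local file.

## Python Requests credentials

The `auth` parameter provides basic HTTP authentication; it takes a tuple of a user name and a password to be used for a realm. A security realm is a mechanism used for protecting web application resources.

```shell
$ sudo apt-get install apache2-utils
$ sudo htpasswd -c /etc/nginx/.htpasswd user7
New password:
Re-type new password:
Adding password for user user7
```

We use the `htpasswd` tool to create a user name and a password for basic HTTP authentication.

```nginx
location /secure {

        auth_basic "Restricted Area";
        auth_basic_user_file /etc/nginx/.htpasswd;
}
```

Inside the nginx `/etc/nginx/sites-available/default` configuration file, we create a secured page. The name of the realm is `"Restricted Area"`.

`index.html`

```html
<!DOCTYPE html>
<html lang="en">
<head>
<title>Secure page</title>
</head>

<body>

This is a secure page.

</body>

</html>
```

In the `/usr/share/nginx/html/secure` directory, we have this HTML file.

`credentials.py`

```py
#!/usr/bin/env python3

import requests as req

user = 'user7'
passwd = '7user'

resp = req.get("http://localhost/secure/", auth=(user, passwd))
print(resp.text)
```

The script connects to the secure web page; it provides the user name and the password necessary to access the page.

```shell
$ ./credentials.py
<!DOCTYPE html>
<html lang="en">
<head>
<title>Secure page</title>
</head>

<body>

This is a secure page.

</body>

</html>
```

With the right credentials, the `credentials.py` script returns the secured page.

In this tutorial, we have worked with the Python Requests module. You might also be interested in the following related tutorials: [Python list comprehensions](/articles/pythonlistcomprehensions/), [Python SimpleJson tutorial](/python/simplejson/), [Python FTP tutorial](/python/ftp/), [OpenPyXL tutorial](/articles/openpyxl/), [Python CSV tutorial](/python/csv/), and [Python tutorial](/lang/python/). List [all Python tutorials](/all/#python).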
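As a supplement to the credentials section above, the following sketch shows roughly what basic HTTP authentication puts on the wire: the user name and password are joined with a colon, Base64-encoded, and sent in an `Authorization` header. The helper name `basic_auth_header()` is hypothetical and is not part of Requests; it only illustrates the scheme and assumes plain ASCII credentials.

```python
#!/usr/bin/env python3

import base64


def basic_auth_header(user, passwd):
    # Hypothetical helper: builds the value of the Authorization header
    # as "Basic " followed by base64("user:password").
    raw = f"{user}:{passwd}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return "Basic " + token


header = basic_auth_header('user7', '7user')
print(header)

# Base64 is an encoding, not encryption: anyone can decode it again.
print(base64.b64decode(header.split()[1]).decode("utf-8"))
```

Because the credentials are only encoded, not encrypted, basic authentication is usually combined with HTTPS in practice.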