Hypertext Transfer Protocol · UCB DS100 Principles and Techniques of Data Science

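# Hypertext Transfer Protocol

> Original: [HTTP](https://www.textbook.ds100.org/ch/07/web_http.html)
>
> Proofread by: [Kitty Du](https://github.com/miaoxiaozui2017)
>
> Proudly powered by [Google Translate](https://translate.google.cn/)

```python
# HIDDEN
# Clear previously defined variables
%reset -f

# Set directory for data loading to work properly
import os
os.chdir(os.path.expanduser('~/notebooks/07'))
```

## HTTP

HTTP (**H**yper**T**ext **T**ransfer **P**rotocol) is a *request-response* protocol that allows one computer to talk to another over the Internet.

## Requests and Responses

The Internet allows computers to send text to one another but places no restrictions on what that text contains. HTTP defines a structure for the text communication between one computer (the client) and another (the server). In this protocol, the client submits a *request* to the server: a specially formatted text message. The server then sends a text *response* back to the client.

The command-line tool `curl` gives us a simple way to send HTTP requests. In the output below, lines starting with `>` show the text sent in the request; the remaining lines are the server's response.

```bash
$ curl -v https://httpbin.org/html
```

```
> GET /html HTTP/1.1
> Host: httpbin.org
> User-Agent: curl/7.55.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Connection: keep-alive
< Server: meinheld/0.6.1
< Date: Wed, 11 Apr 2018 18:15:03 GMT
<
<html>
<body>
<h1>Herman Melville - Moby-Dick</h1>
<p>
Availing himself of the mild...
</p>
</body>
</html>
```

Running the `curl` command above causes the client's computer to construct a text message that looks like this:

```
GET /html HTTP/1.1
Host: httpbin.org
User-Agent: curl/7.55.1
Accept: */*
{blank_line}
```

This message follows a specific format. It starts with `GET /html HTTP/1.1`, which says that the message is an HTTP `GET` request for the `/html` page. The three lines that follow form the HTTP headers: optional information that `curl` sends to the server. Each HTTP header has the format `{name}: {value}`. Finally, the blank line at the end of the message tells the server that the message ends after the three headers. Note that we mark the blank line with `{blank_line}` in the snippet above; in the actual message, `{blank_line}` is replaced by a blank line.

The client's computer then uses the Internet to send this message to the `https://httpbin.org` web server. The server processes the request and sends the following response:

```
HTTP/1.1 200 OK
Connection: keep-alive
Server: meinheld/0.6.1
Date: Wed, 11 Apr 2018 18:15:03 GMT
{blank_line}
```

The first line of the response states that the request completed successfully. The three lines that follow form the HTTP response headers: optional information that the server sends back to the client. Finally, the blank line at the end of the message tells the client that the server has finished sending its response headers and will next send the response body:

```
<html>
<body>
<h1>Herman Melville - Moby-Dick</h1>
<p>
Availing himself of the mild...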
</p>
</body>
</html>
```

Nearly every application that interacts with the Internet uses HTTP. For example, visiting [https://httpbin.org/html](https://httpbin.org/html) in your web browser makes the same basic HTTP request as the `curl` command above. Instead of displaying the response as plain text as we did above, your browser recognizes that the text is an HTML document and renders it accordingly.

In practice, we do not write out full HTTP requests as text. Instead, we use tools like `curl` or Python libraries to construct the requests for us.

### In Python

The Python **requests** library allows us to make HTTP requests in Python. The code below makes the same HTTP request as running `curl -v https://httpbin.org/html`.

```python
import requests

url = "https://httpbin.org/html"
response = requests.get(url)
response
```

```
<Response [200]>
```

### The Request

Let's take a closer look at the request we made. We can access the original request through the `response` object; we display the request's HTTP headers below:

```python
request = response.request
for key in request.headers:  # The request headers are stored in a dictionary-like object.
    print(f'{key}: {request.headers[key]}')
```

```
User-Agent: python-requests/2.12.4
Accept-Encoding: gzip, deflate
Accept: */*
Connection: keep-alive
```

Every HTTP request has a type. In this case, we used a `GET` request, which retrieves information from a server.

```python
request.method
```

```
'GET'
```

### The Response

Let's examine the response we received from the server. First, we print the response's HTTP headers.

```python
for key in response.headers:
    print(f'{key}: {response.headers[key]}')
```

```
Connection: keep-alive
Server: gunicorn/19.7.1
Date: Wed, 25 Apr 2018 18:32:51 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 3741
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true
X-Powered-By: Flask
X-Processed-Time: 0
Via: 1.1 vegur
```

An HTTP response contains a status code, a special number that indicates whether the request succeeded or failed. The status code `200` indicates that the request succeeded.

```python
response.status_code
```

```
200
```

Finally, we display the first 100 characters of the response's content (the entire content is too long to display nicely here).

```python
response.text[:100]
```

```
'<!DOCTYPE html>\n<html>\n <head>\n </head>\n <body>\n <h1>Herman Melville - Moby-Dick</h1>\n\n '
```

## Types of Requests

The request we made above was a `GET` HTTP request. There are multiple HTTP request types; the two most important are `GET` and `POST` requests.

### GET Requests

A `GET` request retrieves information from the server. Since your web browser makes a `GET` request each time you enter a URL into its address bar, `GET` requests are the most common type of HTTP request.

`curl` uses `GET` requests by default, so running `curl https://www.google.com/` makes a `GET` request to `https://www.google.com/`.

### POST Requests
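A `POST` request sends information from the client to the server. For example, some web pages contain forms for the user to fill out, such as a login form. After the user clicks the "Submit" button, most web browsers make a `POST` request that sends the form data to the server for processing.

Let's look at an example of a `POST` request that sends `'sam'` as the parameter `'name'`. This can be done by running **`curl -d 'name=sam' https://httpbin.org/post`** on the command line.

Notice that this time our request has a body (filled with the parameters of the `POST` request), and the content of the response differs from our earlier `GET` response.

Like HTTP headers, the data sent in a `POST` request uses a key-value format. In Python, we can make a `POST` request by calling `requests.post` and passing in a dictionary as the `data` argument.

```python
post_response = requests.post("https://httpbin.org/post",
                              data={'name': 'sam'})
post_response
```

```
<Response [200]>
```

The server responds with a status code that indicates whether the `POST` request completed successfully. In addition, the server usually sends a response body to display to the client.

```python
post_response.status_code
```

```
200
```

```python
post_response.text
```

```
'{\n "args": {}, \n "data": "", \n "files": {}, \n "form": {\n "name": "sam"\n }, \n "headers": {\n "Accept": "*/*", \n "Accept-Encoding": "gzip, deflate", \n "Connection": "close", \n "Content-Length": "8", \n "Content-Type": "application/x-www-form-urlencoded", \n "Host": "httpbin.org", \n "User-Agent": "python-requests/2.12.4"\n }, \n "json": null, \n "origin": "136.152.143.72", \n "url": "https://httpbin.org/post"\n}\n'
```

## Types of Response Status Codes

The previous HTTP responses had the HTTP status code `200`, which indicates that the request completed successfully. There are many other HTTP status codes. Thankfully, they are grouped into categories that make them easier to remember:

* **100s** - Informational: more input is expected from the client or server _(e.g. 100 Continue, 102 Processing)_
* **200s** - Success: the client's request was successful _(e.g. 200 OK, 202 Accepted)_
* **300s** - Redirection: the requested URL is located elsewhere; further action from the user may be needed _(e.g. 300 Multiple Choices, 301 Moved Permanently)_
* **400s** - Client Error: the client made an error in its request _(e.g. 400 Bad Request, 403 Forbidden, 404 Not Found)_
* **500s** - Server Error: the server encountered an error or is unable to perform the request _(e.g. 500 Internal Server Error, 503 Service Unavailable)_

We can look at examples of some of these errors.

```python
# This page doesn't exist, so we get a 404 page not found error
url = "https://www.youtube.com/404errorwow"
errorResponse = requests.get(url)
print(errorResponse)
```

```
<Response [404]>
```

```python
# This specific page results in a 500 server error
url = "https://httpstat.us/500"
serverResponse = requests.get(url)
print(serverResponse)
```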
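```
<Response [500]>
```

## Summary

We have introduced the HTTP protocol, the basic communication method for applications that use the Web. Although the protocol specifies a particular text format, we typically rely on other tools to make HTTP requests for us, such as the command-line tool `curl` and the Python library `requests`.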
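To make the text format concrete, here is a minimal sketch that assembles a `GET` request message by hand and parses a response like the one `curl` printed earlier, without touching the network. The function names (`build_get_request`, `parse_response`, `status_category`) are hypothetical helpers written for illustration and are not part of `requests` or `curl`. The sketch assumes HTTP/1.1, where lines are separated by a carriage return plus newline (`\r\n`), and the sample body `<html>...</html>` is a placeholder rather than the real page.

```python
def build_get_request(path, host, headers=None):
    """Return the raw text of an HTTP/1.1 GET request (illustrative helper)."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")  # each header is "{name}: {value}"
    # The blank line (an empty line after the last header) ends the headers.
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_response(raw):
    """Split raw response text into (status_code, reason, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")  # blank line separates head and body
    status_line, *header_lines = head.split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")  # split on the first colon only
        headers[name.strip()] = value.strip()
    return int(code), reason, headers, body


def status_category(code):
    """Map a status code to the category of its leading digit."""
    categories = {1: "informational", 2: "success", 3: "redirection",
                  4: "client error", 5: "server error"}
    return categories.get(code // 100, "unknown")


request_text = build_get_request(
    "/html", "httpbin.org",
    {"User-Agent": "curl/7.55.1", "Accept": "*/*"},
)

# Sample response modeled on the curl output above; the body is a placeholder.
sample_response = (
    "HTTP/1.1 200 OK\r\n"
    "Connection: keep-alive\r\n"
    "Server: meinheld/0.6.1\r\n"
    "Date: Wed, 11 Apr 2018 18:15:03 GMT\r\n"
    "\r\n"
    "<html>...</html>"
)
code, reason, headers, body = parse_response(sample_response)

print(request_text)
print(code, reason, status_category(code))
```

Real clients such as `requests` also handle details this sketch ignores, for example repeated headers, encodings, and connection management, which is why we normally let a library build and parse these messages.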