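## **Scraping Dangdang's Top 500: How to Design a Complete Request Program**
Hands-on: scraping the Top 500 five-star-rated books on Dangdang
#URL pattern (the trailing number is the page number, 1 to 25)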
[http://bang.dangdang.com/books/fivestars/01.00.00.00.00.00-recent30-0-0-1-1](http://bang.dangdang.com/books/fivestars/01.00.00.00.00.00-recent30-0-0-1-1)
......
[http://bang.dangdang.com/books/fivestars/01.00.00.00.00.00-recent30-0-0-1-3](http://bang.dangdang.com/books/fivestars/01.00.00.00.00.00-recent30-0-0-1-3)
......
[http://bang.dangdang.com/books/fivestars/01.00.00.00.00.00-recent30-0-0-1-25](http://bang.dangdang.com/books/fivestars/01.00.00.00.00.00-recent30-0-0-1-25)
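The URL pattern above differs only in the trailing page number, so the full list of pages can be generated from a template. The following is a minimal sketch; the names `BASE_URL` and `build_page_urls` are illustrative and not part of the original program, and the page range of 1 to 25 is taken from the URLs listed above.

```python
# Template for the listing pages; {} is replaced by the page number.
BASE_URL = "http://bang.dangdang.com/books/fivestars/01.00.00.00.00.00-recent30-0-0-1-{}"


def build_page_urls(first_page=1, last_page=25):
    """Return the listing URLs for pages first_page..last_page (inclusive)."""
    return [BASE_URL.format(page) for page in range(first_page, last_page + 1)]


urls = build_page_urls()
print(len(urls))   # number of pages generated
print(urls[0])     # first page URL
print(urls[-1])    # last page URL
```

Building the list up front keeps the URL logic separate from the request logic, which makes the later loop easier to read.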
<span style="color:red;">1. First attempt</span>
~~~
import requests
import random

url = "http://bang.dangdang.com/books/fivestars/01.00.00.00.00.00-recent30-0-0-1-1"

# Pool of User-Agent strings; one is picked at random for the request
user_agent_list = [
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 9.50",
    "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko",
    "Mozilla/4.0 (compatible; MSIE 6.0; ) Opera/UCWEB7.0.2.37/28/999",
]
header = {
    "User-Agent": random.choice(user_agent_list)
}
print(header)

response = requests.get(url, headers=header)
print(response.status_code)  # 200 means the request succeeded
print(response.text)         # HTML of the first page
~~~
<span style="color:red;">2. Second iteration: loop over every page</span>
~~~
import requests
import random

user_agent_list = [
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 9.50",
    "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko",
    "Mozilla/4.0 (compatible; MSIE 6.0; ) Opera/UCWEB7.0.2.37/28/999",
]

# Pages are numbered 1 to 25, so start the range at 1 rather than 0
for x in range(1, 26):
    url = "http://bang.dangdang.com/books/fivestars/01.00.00.00.00.00-recent30-0-0-1-{}"
    # Pick a fresh User-Agent for every request
    header = {
        "User-Agent": random.choice(user_agent_list)
    }
    print(header)
    url = url.format(x)
    print(url)
    response = requests.get(url, headers=header)
    print(response.status_code)
~~~
<span style="color:red;">3. Third iteration: catch exceptions and wait between requests</span>
~~~
import requests
import random
import time

user_agent_list = [
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 9.50",
    "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko",
    "Mozilla/4.0 (compatible; MSIE 6.0; ) Opera/UCWEB7.0.2.37/28/999",
]

# Pages are numbered 1 to 25, so start the range at 1 rather than 0
for x in range(1, 26):
    url = "http://bang.dangdang.com/books/fivestars/01.00.00.00.00.00-recent30-0-0-1-{}"
    header = {
        "User-Agent": random.choice(user_agent_list)
    }
    print(header)
    url = url.format(x)
    print(url)
    time.sleep(1)  # pause before each request; the 1-second value is an arbitrary choice
    # Guard the request so that one failure does not terminate the whole program
    try:
        response = requests.get(url, headers=header)
    except requests.RequestException as e:
        print(e)  # print the error
        continue  # skip this page and move on to the next one
    print(response.status_code)
~~~
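The loop above can also be packaged as a reusable function. The following is a sketch under stated assumptions, not the original program: `crawl_pages`, `fake_fetch`, and `delay_seconds` are hypothetical names, and the request itself is passed in as a `fetch` callable so that the control flow (random User-Agent, exception handling, waiting) can be exercised without touching the network.

```python
import random
import time

USER_AGENTS = [
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 9.50",
    "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko",
    "Mozilla/4.0 (compatible; MSIE 6.0; ) Opera/UCWEB7.0.2.37/28/999",
]
URL_TEMPLATE = "http://bang.dangdang.com/books/fivestars/01.00.00.00.00.00-recent30-0-0-1-{}"


def crawl_pages(fetch, pages, delay_seconds=1.0):
    """Call fetch(url, headers) for each page number in pages.

    Returns (results, failures): results maps page -> fetch's return value,
    failures maps page -> the exception that was raised for that page.
    """
    results = {}
    failures = {}
    for page in pages:
        url = URL_TEMPLATE.format(page)
        headers = {"User-Agent": random.choice(USER_AGENTS)}
        try:
            results[page] = fetch(url, headers)
        except Exception as exc:
            # Record the failure and keep going instead of crashing.
            failures[page] = exc
        finally:
            # Wait after every attempt, successful or not.
            time.sleep(delay_seconds)
    return results, failures


def fake_fetch(url, headers):
    """Stand-in for a real request: page 2 fails, every other page returns 200."""
    if url.endswith("-1-2"):
        raise ConnectionError("simulated network failure")
    return 200


results, failures = crawl_pages(fake_fetch, range(1, 4), delay_seconds=0)
print(results)           # pages that succeeded
print(sorted(failures))  # page numbers that failed
```

To hit the real site, one option is to pass `lambda url, headers: requests.get(url, headers=headers).status_code` as `fetch`. Collecting failures in a dictionary also makes it possible to retry only the pages that failed.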
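<span style="color:red;">4. Key points</span>
1. Starting with Python 2.6, strings gained a new formatting method, str.format(), which extends what string formatting can do.
The basic syntax uses {} and : in place of the older % operator.
For example: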
~~~
res = "{} {}".format("hello", "world")  # no positions given, arguments are used in order
print(res)
# Output: hello world
~~~
~~~
res = "{0} {1}".format("hello", "world")  # explicit positions
print(res)
# Output: hello world
~~~
~~~
res = "{1} {0} {1}".format("hello", "world")  # explicit positions; an index can be reused
print(res)
# Output: world hello world
~~~
~~~
res = "My name is {name}, and I live in {add}".format(add="Shanxi", name="sunyuwei")  # keyword arguments
print(res)
# Output: My name is sunyuwei, and I live in Shanxi
~~~
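The examples above cover `{}` placeholders but not the `:` part of the syntax, which introduces a format specification. The following is a small sketch of a few common specifications; the variable names and sample values are made up for illustration.

```python
# Two decimal places for a float
two_decimals = "{:.2f}".format(3.14159)

# Right-align a string in a field that is 6 characters wide
right_aligned = "{:>6}".format("abc")

# Zero-pad an integer to 3 digits
zero_padded = "{:03d}".format(7)

# Keyword placeholder combined with a percentage specification
percent = "{name}: {score:.1%}".format(name="ratio", score=0.256)

print(two_decimals)
print(repr(right_aligned))  # repr makes the leading spaces visible
print(zero_padded)
print(percent)
```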
2. try-except exception handling
Reference:
[https://www.runoob.com/python/python-exceptions.html](https://www.runoob.com/python/python-exceptions.html)
~~~
try:
    ...  # normal operations
except:
    ...  # an exception occurred: run this block
else:
    ...  # no exception occurred: run this block
~~~
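As a concrete instance of the template above, here is a minimal sketch that converts text to a page number. The function name `parse_page_number` is hypothetical and chosen only for this illustration; it catches `ValueError` specifically rather than using a bare `except`, so unrelated errors are not silently hidden.

```python
def parse_page_number(text):
    """Return text as an int, or None if it is not a valid integer."""
    try:
        number = int(text)
    except ValueError:
        # Runs only when int() raised ValueError.
        return None
    else:
        # Runs only when the try block finished without an exception.
        return number


print(parse_page_number("25"))
print(parse_page_number("abc"))
```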
3. try-finally statement
In a try-finally statement, the code in the finally block runs whether or not an exception occurs.
~~~
try:
    ...  # normal operations
finally:
    ...  # always runs, whether or not an exception occurred
~~~
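The behavior described above can be observed by recording the order in which code runs. This is an illustrative sketch; `events` and `read_first_item` are made-up names, not part of the scraper.

```python
events = []


def read_first_item(items):
    """Return items[0]; the finally block runs on success and on failure."""
    events.append("start")
    try:
        return items[0]
    finally:
        # Executed before the function returns or the exception propagates.
        events.append("cleanup")


first = read_first_item([1, 2, 3])  # succeeds

try:
    read_first_item([])             # raises IndexError
except IndexError:
    events.append("error seen")

print(first)
print(events)
```

In both calls "cleanup" is recorded, which is why finally is a common place for releasing resources such as files or connections.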