Paginating through crawled pages · Scrapy tutorial

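## Example site: `http://m.soxs.cc/shuku/`

## Logic:

1. Define an entry point that hands the start URL to the spider.
2. Write a method that retrieves the next-page URL, filtering with the XPath selectors covered earlier.
3. In the main handler, call that method; if it returns a link, request it with `scrapy.Request`.
4. `yield scrapy.Request(url, callback=self.callback_method)`, where `url` is the page to fetch and `callback_method` is the method that handles the response.
5. The request is always written in this form.

```
import scrapy


class booklist2Spider(scrapy.Spider):
    name = 'booklist2'
    allowed_domains = ['m.soxs.cc']  # only crawl sites listed in this variable
    start_urls = ["http://m.soxs.cc/"]  # URLs the crawl starts from

    # Entry point
    def parse(self, response):
        # Scrapy calls parse() once the spider has started
        print("Spider loaded...")
        # Fixed form: the first argument is the URL, callback is the
        # method called when the request succeeds
        yield scrapy.Request("http://m.soxs.cc/shuku/", callback=self.next)

    # Main handler: processes the data on each page
    def next(self, response):
        print(response)  # prints the response status; 200 means the content was fetched
        nextLink = self.getnextlink(response)
        if nextLink is None:
            print("Reached the last page...")
            return
        yield scrapy.Request(nextLink, callback=self.next)

    # Return the next-page URL, or None on the last page
    def getnextlink(self, response):
        links = response.xpath("//div[@class='pagelist']/a")
        for link in links:
            # The text must match the site's "next page" link label exactly
            if link.xpath("text()").extract_first() == "下一頁":
                # Resolve the href against the URL of the current page
                return response.urljoin(link.xpath("@href").extract_first())
        return None
```

Result:

![](https://img.kancloud.cn/66/4f/664f649bdaca54593247e6350d2a064a_915x683.png)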
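The next-page lookup done by `getnextlink` can be tried without running a crawl. The following is a minimal sketch, assuming a page fragment shaped like the `pagelist` block that the XPath targets. The sample HTML, the `page_N.html` paths, `NextLinkParser`, and `find_next_link` are all hypothetical, and the standard library's `html.parser` and `urllib.parse.urljoin` stand in for Scrapy's selectors here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NextLinkParser(HTMLParser):
    """Collect (href, text) pairs for <a> tags inside <div class="pagelist">."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0        # > 0 while inside the pagelist div
        self.current_href = None  # href of the <a> currently being read
        self.current_text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            # Start counting at the pagelist div, then track nested divs
            if self.div_depth or attrs.get("class") == "pagelist":
                self.div_depth += 1
        elif tag == "a" and self.div_depth:
            self.current_href = attrs.get("href")
            self.current_text = []

    def handle_data(self, data):
        if self.current_href is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1
        elif tag == "a" and self.current_href is not None:
            text = "".join(self.current_text).strip()
            self.links.append((self.current_href, text))
            self.current_href = None


def find_next_link(html, base_url, label="下一頁"):
    """Return the absolute next-page URL, or None when there is none."""
    parser = NextLinkParser()
    parser.feed(html)
    parser.close()
    for href, text in parser.links:
        if text == label:
            # urljoin handles both relative and root-relative hrefs
            return urljoin(base_url, href)
    return None


# Hypothetical fragments, not copied from the real site
MIDDLE_PAGE = """
<div class="pagelist">
  <a href="/shuku/page_1.html">上一頁</a>
  <a href="/shuku/page_3.html">下一頁</a>
</div>
"""
LAST_PAGE = '<div class="pagelist"><a href="/shuku/page_8.html">上一頁</a></div>'

print(find_next_link(MIDDLE_PAGE, "http://m.soxs.cc/shuku/"))
print(find_next_link(LAST_PAGE, "http://m.soxs.cc/shuku/"))
```

Inside a spider, `response.urljoin(href)` performs the same resolution against the URL of the current response, which avoids assembling the address by hand from `allowed_domains`.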