### **Step 1: Create the project**

~~~
scrapy startproject douyu
~~~

### **Step 2: Create the spider**

~~~
scrapy genspider douyucdn capi.douyucdn.cn
~~~

### **Step 3: Write items.py to define the data to extract**

~~~
import scrapy


class DouyuItem(scrapy.Item):
    nickname = scrapy.Field()  # streamer's nickname
    headimg = scrapy.Field()   # URL of the streamer's photo
~~~

### **Step 4: Write spiders/xxx.py, the spider file that handles requests and responses and extracts the data (yield item)**

~~~
import json

import scrapy

from douyu.items import DouyuItem


class DouyucdnSpider(scrapy.Spider):
    name = 'douyucdn'
    allowed_domains = ['douyucdn.cn']
    baseUrl = 'http://capi.douyucdn.cn/api/v1/getVerticalRoom?limit=20&offset='
    offset = 0
    start_urls = [baseUrl + str(offset)]

    def parse(self, response):
        data_list = json.loads(response.text)['data']
        # An empty list means there are no more pages, so stop paginating
        if not data_list:
            return
        for data in data_list:
            item = DouyuItem()
            item['headimg'] = data['vertical_src']
            item['nickname'] = data['nickname']
            yield item
        # Request the next page of 20 rooms
        self.offset += 20
        yield scrapy.Request(self.baseUrl + str(self.offset), callback=self.parse)
~~~

### **Step 5: Write pipelines.py, the pipeline file that processes the items returned by the spider**

~~~
import os

import scrapy
from scrapy.exceptions import DropItem
from scrapy.pipelines.images import ImagesPipeline

from douyu.settings import IMAGES_STORE as images_store


class DouyuPipeline(ImagesPipeline):
    def get_media_requests(self, item, info):
        imgUrl = item['headimg']
        yield scrapy.Request(imgUrl)

    def item_completed(self, results, item, info):
        # Collect the relative file paths of the successful downloads
        image_path = [x["path"] for ok, x in results if ok]
        if not image_path:
            # The download failed, so there is no file to rename
            raise DropItem("Image download failed")
        # Build the full paths from IMAGES_STORE, imported from settings.py
        old_path = os.path.join(images_store, image_path[0])
        new_dir = os.path.join(images_store, 'named')
        os.makedirs(new_dir, exist_ok=True)  # os.rename needs the target directory to exist
        new_path = os.path.join(new_dir, item['nickname'] + '.jpg')
        os.rename(old_path, new_path)
        return item
~~~

### **Step 6: Write settings.py to enable the pipeline and configure the other related settings**

> To pose as a mobile client, set a user agent. You can look up the user-agent string of the phone you want to imitate at http://www.fynas.com/ua

~~~
USER_AGENT = 'Mozilla/5.0 (iPhone 84; CPU iPhone OS 10_3_3 like Mac OS X) AppleWebKit/603.3.8 (KHTML, like Gecko) Version/10.0 MQQBrowser/7.8.0 Mobile/14G60 Safari/8536.25 MttCustomUA/2 QBWebViewType/1 WKType/1'
~~~

> The streamers' photos are saved locally, so specify where to store them.

~~~
IMAGES_STORE = "C:/Users/Administrator/Desktop/douyu/images/"
~~~
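
The pipeline saves each photo as `named/<nickname>.jpg` under `IMAGES_STORE`, and a nickname is free-form user text that may contain characters a file system rejects. The following is a minimal sketch, not part of the original project, of one way to build that target path. `safe_filename` and `build_named_path` are hypothetical helper names, and the set of replaced characters is an assumption based on Windows file-name rules.

```python
import os
import re

# Characters that Windows does not allow in file names (assumption: the
# images are stored on Windows, as the IMAGES_STORE path suggests).
_INVALID_CHARS = re.compile(r'[\\/:*?"<>|]')


def safe_filename(nickname: str) -> str:
    """Replace characters that are invalid in file names with underscores."""
    cleaned = _INVALID_CHARS.sub("_", nickname).strip()
    # Fall back to a fixed name if nothing usable is left.
    return cleaned or "unnamed"


def build_named_path(images_store: str, nickname: str) -> str:
    """Return <images_store>/named/<nickname>.jpg without relying on a trailing slash."""
    return os.path.join(images_store, "named", safe_filename(nickname) + ".jpg")


if __name__ == "__main__":
    print(build_named_path("C:/Users/Administrator/Desktop/douyu/images/", "a/b:c"))
```

If a helper like this were used, the pipeline would call it in place of concatenating the nickname directly. Two different nicknames can still map to the same cleaned name, so this sketch does not guard against overwriting files.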
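
> Image processing depends on the third-party library Pillow. If it is not installed yet, install it first; otherwise you will get PIL-related errors.

~~~
pip install Pillow
~~~

Some sites restrict crawlers through robots.txt, so turn off robots.txt compliance:

~~~
ROBOTSTXT_OBEY = False
~~~

Then register the pipeline:

~~~
ITEM_PIPELINES = {
    'douyu.pipelines.DouyuPipeline': 300,
}
~~~

### **Step 7: Run the spider**

From the project root, run `scrapy crawl douyucdn`.

Note: how do you extract the `path` value from the following `results` list?

~~~
results = [(True, {
    'url': 'https://rpic.douyucdn.cn/live-cover/appCovers/2018/02/01/4189383_20180201171138_big.jpg',
    'path': 'full/811a893386a55177f36abcde290eaf16933e5888.jpg',
    'checksum': '0fd2746c8711d9eb6c7bc3db138f0ac4',
})]
~~~

Use a list comprehension that keeps only the successful entries:

~~~
path = [x["path"] for ok, x in results if ok]
~~~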
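
To try the comprehension without running the crawler, here is a small self-contained sketch. The `extract_paths` helper name and the second tuple, which simulates a failed download, are illustrative assumptions. In Scrapy the second element of a failed entry is a Failure object; a plain exception stands in for it here.

```python
def extract_paths(results):
    """Collect the 'path' of every successful download in an item_completed-style list."""
    return [info["path"] for ok, info in results if ok]


results = [
    (True, {
        "url": "https://rpic.douyucdn.cn/live-cover/appCovers/2018/02/01/4189383_20180201171138_big.jpg",
        "path": "full/811a893386a55177f36abcde290eaf16933e5888.jpg",
        "checksum": "0fd2746c8711d9eb6c7bc3db138f0ac4",
    }),
    # Hypothetical failed download: the flag is False, so the entry is skipped.
    (False, RuntimeError("download failed")),
]

paths = extract_paths(results)
print(paths)  # ['full/811a893386a55177f36abcde290eaf16933e5888.jpg']
```

Because failed entries are filtered out, the resulting list can be empty, so it is worth checking it before indexing `[0]`.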