Benefits of using Scrapy's built-in file downloading:

1. Avoids re-downloading data that was downloaded recently.
2. Makes it easy to specify where files are stored.
3. Can convert downloaded images to a common format such as PNG or JPG.
4. Can generate thumbnails.
5. Can check image width and height to make sure they meet a minimum size.
6. Downloads asynchronously, which is very efficient.

Scrapy provides two pipelines, `FilesPipeline` and `ImagesPipeline`, that automatically save files to your machine; you can think of them as downloaders. There is not much difference between the two, and they can be used at the same time.

The steps for using either pipeline are as follows:

**1. Define the `image_urls` and `images` fields in `items.py`**

```python
"""
@Date 2021/4/9
"""
import scrapy


class CareerstencentItem(scrapy.Item):
    # When using ImagesPipeline, define image_urls and images
    image_urls = scrapy.Field()
    images = scrapy.Field()
    # When using FilesPipeline, define file_urls and files instead
    # file_urls = scrapy.Field()
    # files = scrapy.Field()
```

**2. Configure the file-related settings in `settings.py`**

```python
############## Required settings ##################
# Enable the pipeline
ITEM_PIPELINES = {
    'scrapy.pipelines.images.ImagesPipeline': 1,
    # 'scrapy.pipelines.files.FilesPipeline': 1,
}

# Directory where downloaded files are stored on your machine.
# If the storage path is F:/images/, Scrapy saves the files under F:/images/full.
# Saved images are renamed to the SHA-1 hash of their URL: an image at
# http://www.example.com/image.jpg is stored as
# 3afec3b4765f8f0a07b78f98c07b83f013567a0a.jpg
IMAGES_STORE = "F:/images/"
# FILES_STORE = "F:/images/"

############## Optional settings ##################
# Do not re-download the same file within 90 (or 30) days,
# which avoids duplicate downloads
# FILES_EXPIRES = 90
IMAGES_EXPIRES = 30

# Generate thumbnails.
# If the configured size is larger than the original image, the original size is kept;
# if it is smaller, the configured size is used.
# Images are then saved under three paths:
# F:/images/full/63bbfea82b8880ed33cdb762aa11fab722a90a24.jpg   (original size)
# F:/images/thumbs/small/63bbfea82b8880ed33cdb762aa11fab722a90a24.jpg
# F:/images/thumbs/big/63bbfea82b8880ed33cdb762aa11fab722a90a24.jpg
IMAGES_THUMBS = {
    'small': (50, 50),
    'big': (270, 270),
}

# Drop images whose height or width is less than 110 pixels
# (this does not affect thumbnail generation)
IMAGES_MIN_HEIGHT = 110
IMAGES_MIN_WIDTH = 110
```

**3. Return the file URLs from your spider code**

```python
import scrapy

from CareersTencent.items import CareerstencentItem


class PicturesSpider(scrapy.Spider):
    name = 'pictures'
    allowed_domains = ['www.wxapp-union.com']
    start_urls = ['http://www.wxapp-union.com/']

    def parse(self, response):
        img_src_list = response.xpath(
            "//div[@id='diy_con1']//ul[@id='itemContainer']//img/@src"
        ).extract()
        for src in img_src_list:
            # src is a relative path such as
            # ./data/attachment/block/db/db034088ff685c241b1f7586886869c3.jpg
            item = CareerstencentItem()
            item["image_urls"] = ["https://www.wxapp-union.com/" + src]
            yield item
```
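When the spider runs, `ImagesPipeline` fills the item's `images` field with one dict per downloaded file, containing at least `url`, `path` (relative to `IMAGES_STORE`) and `checksum`. As a quick way to verify that downloads are working, you could add a small downstream pipeline; the sketch below is illustrative (the `ImagePathLoggerPipeline` name and its location in the project's `pipelines.py` are assumptions, not part of the original project).

```python
# CareersTencent/pipelines.py  (hypothetical location for this sketch)
class ImagePathLoggerPipeline:
    """Log where ImagesPipeline stored each downloaded image."""

    def process_item(self, item, spider):
        # ImagesPipeline populates `images` with dicts describing each
        # downloaded file: url, path (relative to IMAGES_STORE), checksum.
        for image in item.get("images", []):
            spider.logger.info("saved %s -> %s", image["url"], image["path"])
        return item
```

To enable it, register it in `ITEM_PIPELINES` with a higher priority number than the `ImagesPipeline` entry (for example `'CareersTencent.pipelines.ImagePathLoggerPipeline': 300`) so it runs after the images have been downloaded, then run `scrapy crawl pictures`.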
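Step 2 notes that saved files are renamed to the SHA-1 hash of their URL. If you would rather keep the original file names, `ImagesPipeline` lets you override its `file_path()` method. The following is a minimal sketch, assuming a recent Scrapy version (the `NamedImagesPipeline` class name is made up for this example); point `ITEM_PIPELINES` at this class instead of the stock `ImagesPipeline`.

```python
import os
from urllib.parse import urlparse

from scrapy.pipelines.images import ImagesPipeline


class NamedImagesPipeline(ImagesPipeline):
    """Save images under their original file name instead of the URL hash."""

    def file_path(self, request, response=None, info=None, *, item=None):
        # e.g. https://www.wxapp-union.com/data/attachment/block/db/db034088ff685c241b1f7586886869c3.jpg
        # is saved as full/db034088ff685c241b1f7586886869c3.jpg under IMAGES_STORE
        filename = os.path.basename(urlparse(request.url).path)
        return f"full/{filename}"
```

Note that giving up the hash-based name also gives up its collision safety: two different URLs that end in the same file name will overwrite each other.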