Define an image-download pipeline in pipelines.py
~~~
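from scrapy.pipelines.images import ImagesPipeline


class ImgPipeline(ImagesPipeline):
    def item_completed(self, results, item, info):
        # results is a list of (success, file_info) tuples, one per image URL.
        # file_info is a dict containing 'path' only when success is True;
        # for a failed download it holds the error instead.
        paths = [file_info['path'] for ok, file_info in results if ok]
        # Store the list of saved paths, or '' when nothing was downloaded.
        item['img_path'] = paths if paths else ''
        return item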
~~~
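To make the shape of `results` concrete, here is a minimal sketch that runs without Scrapy. The helper name `collect_image_paths` and the sample data are hypothetical, written only for illustration. In a real crawl the failed entries carry a Twisted `Failure`, for which a plain `Exception` stands in here.

```python
def collect_image_paths(results):
    """Return the storage paths of the images that downloaded successfully."""
    # Each entry is a (success, file_info_or_error) tuple; only successful
    # entries carry a dict with a 'path' key.
    return [file_info["path"] for ok, file_info in results if ok]


# Hand-written sample data (an assumption, not output from a real crawl).
sample_results = [
    (True, {"url": "http://example.com/a.jpg", "path": "full/a.jpg", "checksum": "abc"}),
    (False, Exception("download failed")),  # stand-in for a Twisted Failure
]

print(collect_image_paths(sample_results))
```

Checking the success flag first means the failed entries are never indexed, so a single bad URL does not break the whole item.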
Add this pipeline to ITEM_PIPELINES in settings.py
and define the directory where the downloaded images are saved
~~~
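import os

# Parent directory of the current working directory.
dir_path = os.path.dirname(os.path.abspath(os.curdir))
# Pitfall: never use __file__ here. It conflicts with scrapyd and raises
# an exception when the project is deployed and run.
IMAGES_URLS_FIELD = 'img_url'  # name of the item field that holds the image URLs
IMAGES_STORE = os.path.join(dir_path, 'images')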
~~~
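The ITEM_PIPELINES entry itself is not shown above. A minimal sketch could look like the following, assuming the project package is named `myproject` (a hypothetical name; substitute your own) and using an arbitrary priority of 300.

```python
# settings.py (fragment)
# 'myproject' is a placeholder package name; the integer sets the order in
# which pipelines run, with lower values running first.
ITEM_PIPELINES = {
    "myproject.pipelines.ImgPipeline": 300,
}
```

Scrapy's image pipeline also depends on the Pillow library, so it needs to be installed in the environment that runs the spider.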