Data Cleaning and Database Storage with pipelines.py · Python秘籍

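## Saving directly to a file

```
scrapy crawl cake -o cake.csv
scrapy crawl cake -o cake.xml
scrapy crawl cake -o cake.json
scrapy crawl cake -o cake.pickle
scrapy crawl cake -o cake.marshal
scrapy crawl cake -o ftp://user:pass@ftp.example.com/path/to/cake.csv
```

- Displaying Chinese text in Scrapy's JSON output

  When exporting with `-o filename.json`, Scrapy by default writes non-ASCII characters as `\uXXXX` escape sequences, so Chinese content is hard to read in the resulting JSON file. To change the default output encoding, add the following line to `settings.py` (it does not appear to be set by default, so it has to be added; if it is already there, just edit it):

  > FEED_EXPORT_ENCODING = 'utf-8'

- pipelines.py

```
# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html
import sqlite3


class MeituanPipeline(object):
    def open_spider(self, spider):
        # Connect to the database when the spider starts
        self.con = sqlite3.connect("meituan.sqlite")
        # self.cu is the cursor used to execute SQL statements
        self.cu = self.con.cursor()

    def process_item(self, item, spider):
        # print(spider.name)
        # Insert into the database. The "?" placeholders let sqlite3 bind the
        # values, so a quote character in a title cannot break the statement.
        insert_sql = "insert into cake (title, money) values (?, ?)"
        print(insert_sql, item['title'], item['money'])
        self.cu.execute(insert_sql, (item['title'], item['money']))
        # Every data modification has to be committed
        self.con.commit()
        return item

    # Close the connection when the spider finishes.
    # Scrapy calls this hook close_spider (not spider_close).
    def close_spider(self, spider):
        self.con.close()
```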
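The pipeline above inserts into a `cake` table, but nothing shown here creates that table. The following is a minimal sketch of one way to do so, under stated assumptions: the schema (two text columns, `title` and `money`) is a guess, since the source only shows the column names; `CakeStore` is a hypothetical helper that mirrors the pipeline's open, process and close steps without needing Scrapy; and an in-memory database stands in for `meituan.sqlite`. It also shows that an insert with `?` placeholders accepts a title containing a quote character.

```python
import sqlite3


class CakeStore:
    """Hypothetical helper (not part of Scrapy) mirroring the pipeline's hooks."""

    def __init__(self, path=":memory:"):
        # ":memory:" keeps the sketch self-contained; a real project would
        # pass a file path such as "meituan.sqlite"
        self.path = path

    def open(self):
        self.con = sqlite3.connect(self.path)
        # Assumed schema: the original only shows the column names
        self.con.execute(
            "create table if not exists cake (title text, money text)"
        )

    def save(self, item):
        # "?" placeholders: sqlite3 binds the values itself
        self.con.execute(
            "insert into cake (title, money) values (?, ?)",
            (item["title"], item["money"]),
        )
        self.con.commit()
        return item

    def rows(self):
        # Return the stored rows in insertion order
        return self.con.execute(
            "select title, money from cake order by rowid"
        ).fetchall()

    def close(self):
        self.con.close()


store = CakeStore()
store.open()
store.save({"title": "Mom's chocolate cake", "money": "128"})
store.save({"title": "草莓蛋糕", "money": "99"})
saved_rows = store.rows()
store.close()
print(saved_rows)
```

In the pipeline itself, the `create table if not exists` statement would most naturally go in `open_spider`, right after the connection is made.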