Release notes · Scrapy 1.6 Chinese documentation

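# Release notes

> Translator: [OSGeo China](https://www.osgeo.cn/)

## Scrapy 1.6.0 (2019-01-30)

Highlights:

* better Windows support;
* Python 3.7 compatibility;
* big documentation improvements, including a switch from the `.extract_first()` + `.extract()` API to the `.get()` + `.getall()` API;
* feed export, FilesPipeline and MediaPipeline improvements;
* better extensibility: [`item_error`](topics/signals.html#std:signal-item_error) and [`request_reached_downloader`](topics/signals.html#std:signal-request_reached_downloader) signals; `from_crawler` support for feed exporters, feed storages and dupefilters;
* `scrapy.contracts` fixes and new features;
* telnet console security improvements, first released as a backport in [Scrapy 1.5.2 (2019-01-22)](#release-1-5-2);
* clean-up of deprecated code;
* various bug fixes, small new features and usability improvements across the codebase.

### Selector API changes

These are not changes in Scrapy itself but in the parsel library, which Scrapy uses for XPath/CSS selectors. They are still worth mentioning here. Scrapy now depends on parsel >= 1.5, and the Scrapy documentation has been updated to follow recent `parsel` API conventions.

The most visible change is that the `.get()` and `.getall()` selector methods are now preferred over `.extract_first()` and `.extract()`. We feel that these new methods result in more concise and readable code. See [extract() and extract_first()](topics/selectors.html#old-extraction-api) for more details.

Note: there are currently **no plans** to deprecate the `.extract()` and `.extract_first()` methods.

Another useful new feature is the `Selector.attrib` and `SelectorList.attrib` properties, which make it easier to get the attributes of HTML elements. See [Selecting element attributes](topics/selectors.html#selecting-attributes).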

CSS selectors are cached in parsel >= 1.5, which makes them faster when the same CSS path is used many times. This is very common in Scrapy spiders: callbacks are usually called several times, on different pages.

If you are using custom `Selector` or `SelectorList` subclasses, a **backward incompatible** change in parsel may affect your code. See the [parsel changelog](https://parsel.readthedocs.io/en/latest/history.html) for a detailed description, as well as for the full list of improvements.

### Telnet console

Backward incompatible: Scrapy's telnet console now requires a username and password. See [Telnet Console](topics/telnetconsole.html#topics-telnetconsole) for more details. This change fixes a security issue; see the [Scrapy 1.5.2 (2019-01-22)](#release-1-5-2) release notes for details.

### New extensibility features

* `from_crawler` support is added to feed exporters and feed storages. Among other things, this allows access to Scrapy settings from custom feed storages and exporters ([issue 1605](https://github.com/scrapy/scrapy/issues/1605), [issue 3348](https://github.com/scrapy/scrapy/issues/3348)).
* `from_crawler` support is added to dupefilters ([issue 2956](https://github.com/scrapy/scrapy/issues/2956)); this allows access to, for example, settings or a spider from a dupefilter.
* [`item_error`](topics/signals.html#std:signal-item_error) is fired when an error happens in a pipeline ([issue 3256](https://github.com/scrapy/scrapy/issues/3256)).
* [`request_reached_downloader`](topics/signals.html#std:signal-request_reached_downloader) is fired when the downloader gets a new request; this signal can be useful, for example, for custom schedulers ([issue 3393](https://github.com/scrapy/scrapy/issues/3393)).
* New SitemapSpider [`sitemap_filter()`](topics/spiders.html#scrapy.spiders.SitemapSpider.sitemap_filter "scrapy.spiders.SitemapSpider.sitemap_filter") method, which allows sitemap entries to be selected based on their attributes in SitemapSpider subclasses ([issue 3512](https://github.com/scrapy/scrapy/issues/3512)).
* Lazy loading of downloader handlers is now optional; this enables better handling of initialization errors in custom downloader handlers ([issue 3394](https://github.com/scrapy/scrapy/issues/3394)).

### New FilesPipeline and MediaPipeline features

* More options are exposed for S3FilesStore: [`AWS_ENDPOINT_URL`](topics/settings.html#std:setting-AWS_ENDPOINT_URL), [`AWS_USE_SSL`](topics/settings.html#std:setting-AWS_USE_SSL), [`AWS_VERIFY`](topics/settings.html#std:setting-AWS_VERIFY), [`AWS_REGION_NAME`](topics/settings.html#std:setting-AWS_REGION_NAME).
For example, this allows using alternative or self-hosted AWS-compatible providers ([issue 2609](https://github.com/scrapy/scrapy/issues/2609), [issue 3548](https://github.com/scrapy/scrapy/issues/3548)).
* ACL support for Google Cloud Storage: [`FILES_STORE_GCS_ACL`](topics/media-pipeline.html#std:setting-FILES_STORE_GCS_ACL) and [`IMAGES_STORE_GCS_ACL`](topics/media-pipeline.html#std:setting-IMAGES_STORE_GCS_ACL) ([issue 3199](https://github.com/scrapy/scrapy/issues/3199)).

### `scrapy.contracts` improvements

* Exceptions in contracts code are handled better ([issue 3377](https://github.com/scrapy/scrapy/issues/3377)).
* `dont_filter=True` is used for contract requests, which allows different callbacks to be tested with the same URL ([issue 3381](https://github.com/scrapy/scrapy/issues/3381)).
* A `request_cls` attribute in Contract subclasses allows different Request classes to be used in contracts, for example FormRequest ([issue 3383](https://github.com/scrapy/scrapy/issues/3383)).
* Fixed errback handling in contracts, for example when a contract is executed for a URL that returns a non-200 response ([issue 3371](https://github.com/scrapy/scrapy/issues/3371)).

### Usability improvements

* More stats for RobotsTxtMiddleware ([issue 3100](https://github.com/scrapy/scrapy/issues/3100)).
* The INFO log level is used to show the telnet host/port ([issue 3115](https://github.com/scrapy/scrapy/issues/3115)).
* A message is added to IgnoreRequest in RobotsTxtMiddleware ([issue 3113](https://github.com/scrapy/scrapy/issues/3113)).
* Better validation of the `url` argument in `Response.follow` ([issue 3131](https://github.com/scrapy/scrapy/issues/3131)).
* A non-zero exit code is returned from Scrapy commands when an error happens on spider initialization ([issue 3226](https://github.com/scrapy/scrapy/issues/3226)).
* Link extraction improvements: "ftp" is added to the scheme list ([issue 3152](https://github.com/scrapy/scrapy/issues/3152)); "flv" is added to common video extensions ([issue 3165](https://github.com/scrapy/scrapy/issues/3165)).
* Better error message when an exporter is disabled ([issue 3358](https://github.com/scrapy/scrapy/issues/3358)).
* `scrapy shell --help` mentions the syntax required for local files (`./file.html`) ([issue 3496](https://github.com/scrapy/scrapy/issues/3496)).
* The Referer header value is added to RFPDupeFilter log messages ([issue 3588](https://github.com/scrapy/scrapy/issues/3588)).

### Bug fixes

* Fixed an issue with extra blank lines in .csv exports under Windows ([issue 3039](https://github.com/scrapy/scrapy/issues/3039)).
* Proper handling of pickling errors in Python 3 when serializing objects for disk queues ([issue 3082](https://github.com/scrapy/scrapy/issues/3082)).
* Flags are now preserved when copying requests ([issue 3342](https://github.com/scrapy/scrapy/issues/3342)).
* FormRequest.from_response clickdata should not ignore elements with `input[type=image]` ([issue 3153](https://github.com/scrapy/scrapy/issues/3153)).
* FormRequest.from_response should preserve duplicate keys ([issue 3247](https://github.com/scrapy/scrapy/issues/3247)).

### Documentation improvements

* The docs are rewritten to suggest the .get/.getall API instead of .extract/.extract_first. Also, the [Selectors](topics/selectors.html#topics-selectors) docs are updated and restructured to match the latest parsel docs; they now contain more topics, such as [Selecting element attributes](topics/selectors.html#selecting-attributes) or [Extensions to CSS Selectors](topics/selectors.html#topics-selectors-css-extensions) ([issue 3390](https://github.com/scrapy/scrapy/issues/3390)).
* [Using your browser's Developer Tools for scraping](topics/developer-tools.html#topics-developer-tools) is a new tutorial which replaces the old Firefox and Firebug tutorials ([issue 3400](https://github.com/scrapy/scrapy/issues/3400)).
* The SCRAPY_PROJECT environment variable is documented ([issue 3518](https://github.com/scrapy/scrapy/issues/3518)).
* A troubleshooting section is added to the install instructions ([issue 3517](https://github.com/scrapy/scrapy/issues/3517)).
* Improved links to beginner resources in the tutorial ([issue 3367](https://github.com/scrapy/scrapy/issues/3367), [issue 3468](https://github.com/scrapy/scrapy/issues/3468)).
* Fixed the [`RETRY_HTTP_CODES`](topics/downloader-middleware.html#std:setting-RETRY_HTTP_CODES) default values in the docs ([issue 3335](https://github.com/scrapy/scrapy/issues/3335)).
* Removed the unused `DEPTH_STATS` option from the docs ([issue 3245](https://github.com/scrapy/scrapy/issues/3245)).
* Other cleanups ([issue 3347](https://github.com/scrapy/scrapy/issues/3347), [issue 3350](https://github.com/scrapy/scrapy/issues/3350), [issue 3445](https://github.com/scrapy/scrapy/issues/3445), [issue 3544](https://github.com/scrapy/scrapy/issues/3544), [issue 3605](https://github.com/scrapy/scrapy/issues/3605)).

### Deprecation removals

Compatibility shims for pre-1.0 Scrapy module names are removed ([issue
3318](https://github.com/scrapy/scrapy/issues/3318)):

* `scrapy.command`
* `scrapy.contrib` (with all submodules)
* `scrapy.contrib_exp` (with all submodules)
* `scrapy.dupefilter`
* `scrapy.linkextractor`
* `scrapy.project`
* `scrapy.spider`
* `scrapy.spidermanager`
* `scrapy.squeue`
* `scrapy.stats`
* `scrapy.statscol`
* `scrapy.utils.decorator`

See [Module Relocations](#module-relocations) for more information, or use the suggestions from Scrapy 1.5.x deprecation warnings to update your code.

Other deprecation removals:

* The deprecated scrapy.interfaces.ISpiderManager is removed; please use scrapy.interfaces.ISpiderLoader.
* The deprecated `CrawlerSettings` class is removed ([issue 3327](https://github.com/scrapy/scrapy/issues/3327)).
* The deprecated `Settings.overrides` and `Settings.defaults` attributes are removed ([issue 3327](https://github.com/scrapy/scrapy/issues/3327), [issue 3359](https://github.com/scrapy/scrapy/issues/3359)).

### Other improvements, cleanups

* All Scrapy tests now pass on Windows; the Scrapy test suite is executed in a Windows environment on CI ([issue 3315](https://github.com/scrapy/scrapy/issues/3315)).
* Python 3.7 support ([issue 3326](https://github.com/scrapy/scrapy/issues/3326), [issue 3150](https://github.com/scrapy/scrapy/issues/3150), [issue 3547](https://github.com/scrapy/scrapy/issues/3547)).
* Testing and CI fixes ([issue 3526](https://github.com/scrapy/scrapy/issues/3526), [issue 3538](https://github.com/scrapy/scrapy/issues/3538), [issue 3308](https://github.com/scrapy/scrapy/issues/3308), [issue 3311](https://github.com/scrapy/scrapy/issues/3311), [issue 3309](https://github.com/scrapy/scrapy/issues/3309), [issue 3305](https://github.com/scrapy/scrapy/issues/3305), [issue 3210](https://github.com/scrapy/scrapy/issues/3210), [issue 3299](https://github.com/scrapy/scrapy/issues/3299)).
* `scrapy.http.cookies.CookieJar.clear` accepts optional "domain", "path" and "name" arguments ([issue 3231](https://github.com/scrapy/scrapy/issues/3231)).
* Additional files are included in the sdist ([issue 3495](https://github.com/scrapy/scrapy/issues/3495)).
* Code style fixes ([issue 3405](https://github.com/scrapy/scrapy/issues/3405), [issue 3304](https://github.com/scrapy/scrapy/issues/3304)).
* An unneeded .strip() call is removed ([issue 3519](https://github.com/scrapy/scrapy/issues/3519)).
* collections.deque is used to store MiddlewareManager methods instead of a list ([issue 3476](https://github.com/scrapy/scrapy/issues/3476)).

## Scrapy 1.5.2 (2019-01-22)

* Security bugfix: the telnet console extension can be easily exploited by rogue websites POSTing content to http://localhost:6023. We have not found a way to exploit it from Scrapy, but it is very easy to trick a browser into doing so, which raises the risk for local development environments. The fix is backward incompatible: by default it enables telnet user-password authentication with a randomly generated password. If you cannot upgrade right away, please consider setting `TELNETCONSOLE_PORT` to something other than its default value. See the [telnet console](topics/telnetconsole.html#topics-telnetconsole) documentation for more information.
* Backport: fix for the CI build failure under the GCE environment caused by a boto import error.

## Scrapy 1.5.1 (2018-07-12)

This is a maintenance release with important bug fixes, but no new features:

* An `O(N^2)` gzip decompression issue which affected Python 3 and PyPy is fixed ([issue 3281](https://github.com/scrapy/scrapy/issues/3281)).
* Skipping of TLS validation errors is improved ([issue 3166](https://github.com/scrapy/scrapy/issues/3166)).
* Ctrl-C handling is fixed in Python 3.5+ ([issue 3096](https://github.com/scrapy/scrapy/issues/3096)).
* Testing fixes ([issue 3092](https://github.com/scrapy/scrapy/issues/3092), [issue 3263](https://github.com/scrapy/scrapy/issues/3263)).
* Documentation improvements ([issue 3058](https://github.com/scrapy/scrapy/issues/3058), [issue 3059](https://github.com/scrapy/scrapy/issues/3059), [issue 3089](https://github.com/scrapy/scrapy/issues/3089), [issue 3123](https://github.com/scrapy/scrapy/issues/3123), [issue 3127](https://github.com/scrapy/scrapy/issues/3127), [issue 3189](https://github.com/scrapy/scrapy/issues/3189), [issue 3224](https://github.com/scrapy/scrapy/issues/3224), [issue 3280](https://github.com/scrapy/scrapy/issues/3280), [issue 3279](https://github.com/scrapy/scrapy/issues/3279), [issue 3201](https://github.com/scrapy/scrapy/issues/3201), [issue 3260](https://github.com/scrapy/scrapy/issues/3260), [issue 3284](https://github.com/scrapy/scrapy/issues/3284), [issue 3298](https://github.com/scrapy/scrapy/issues/3298), [issue 3294](https://github.com/scrapy/scrapy/issues/3294)).

## Scrapy 1.5.0 (2017-12-29)

This release brings small new features and improvements across the codebase. Some highlights:

* Google Cloud Storage is supported in FilesPipeline and ImagesPipeline.
* Crawling with proxy servers becomes more efficient, as connections to proxies can now be reused.
* Warnings, exceptions and log messages are improved to make debugging easier.
* The `scrapy parse` command now allows custom request meta to be set via the `--meta` argument.
* Compatibility with Python
3.6, PyPy and PyPy3 is improved; PyPy and PyPy3 are now officially supported, with tests run on CI.
* Better default handling of HTTP 308, 522 and 524 status codes.
* Documentation is improved, as usual.

### Backward incompatible changes

* Scrapy 1.5 drops support for Python 3.3.
* The default Scrapy User-Agent now uses an https link to scrapy.org ([issue 2983](https://github.com/scrapy/scrapy/issues/2983)). This is technically backward incompatible; override [`USER_AGENT`](topics/settings.html#std:setting-USER_AGENT) if you relied on the old value.
* Logging of settings overridden by `custom_settings` is fixed; **this is technically backward-incompatible** because the logger changes from `[scrapy.utils.log]` to `[scrapy.crawler]`. If you are parsing Scrapy logs, please update your log parsers ([issue 1343](https://github.com/scrapy/scrapy/issues/1343)).
* LinkExtractor now ignores the `m4v` extension by default; this is a change in behavior.
* 522 and 524 status codes are added to `RETRY_HTTP_CODES` ([issue 2851](https://github.com/scrapy/scrapy/issues/2851)).

### New features

* Support `<link>` tags in `Response.follow` ([issue 2785](https://github.com/scrapy/scrapy/issues/2785)).
* Support for the `ptpython` REPL ([issue 2654](https://github.com/scrapy/scrapy/issues/2654)).
* Google Cloud Storage support for FilesPipeline and ImagesPipeline ([issue 2923](https://github.com/scrapy/scrapy/issues/2923)).
* The new `--meta` option of the "scrapy parse" command allows additional request meta to be passed ([issue 2883](https://github.com/scrapy/scrapy/issues/2883)).
* Populate the spider variable when using `shell.inspect_response` ([issue 2812](https://github.com/scrapy/scrapy/issues/2812)).
* Handle HTTP 308 Permanent Redirect ([issue 2844](https://github.com/scrapy/scrapy/issues/2844)).
* Add 522 and 524 to `RETRY_HTTP_CODES` ([issue 2851](https://github.com/scrapy/scrapy/issues/2851)).
* Log version information at startup ([issue 2857](https://github.com/scrapy/scrapy/issues/2857)).
* `scrapy.mail.MailSender` now works in Python 3 (it requires Twisted 17.9.0).
* Connections to proxy servers are reused ([issue 2743](https://github.com/scrapy/scrapy/issues/2743)).
* Add a template for a downloader middleware ([issue 2755](https://github.com/scrapy/scrapy/issues/2755)).
* Explicit message for NotImplementedError when the parse callback is not defined ([issue 2831](https://github.com/scrapy/scrapy/issues/2831)).
* CrawlerProcess has an option to disable installation of the root log handler ([issue 2921](https://github.com/scrapy/scrapy/issues/2921)).
* LinkExtractor now ignores the `m4v` extension by default.
* Better log messages for responses over the [`DOWNLOAD_WARNSIZE`](topics/settings.html#std:setting-DOWNLOAD_WARNSIZE) and
[`DOWNLOAD_MAXSIZE`](topics/settings.html#std:setting-DOWNLOAD_MAXSIZE) limits ([issue 2927](https://github.com/scrapy/scrapy/issues/2927)).
* Show a warning when a URL is put in `Spider.allowed_domains` instead of a domain ([issue 2250](https://github.com/scrapy/scrapy/issues/2250)).

### Bug fixes

* Fix logging of settings overridden by `custom_settings`; **this is technically backward-incompatible** because the logger changes from `[scrapy.utils.log]` to `[scrapy.crawler]`, so please update your log parsers if needed ([issue 1343](https://github.com/scrapy/scrapy/issues/1343)).
* The default Scrapy User-Agent now uses an https link to scrapy.org ([issue 2983](https://github.com/scrapy/scrapy/issues/2983)). This is technically backward incompatible; override [`USER_AGENT`](topics/settings.html#std:setting-USER_AGENT) if you relied on the old value.
* Fix PyPy and PyPy3 test failures and support them officially ([issue 2793](https://github.com/scrapy/scrapy/issues/2793), [issue 2935](https://github.com/scrapy/scrapy/issues/2935), [issue 2990](https://github.com/scrapy/scrapy/issues/2990), [issue 3050](https://github.com/scrapy/scrapy/issues/3050), [issue 2213](https://github.com/scrapy/scrapy/issues/2213), [issue 3048](https://github.com/scrapy/scrapy/issues/3048)).
* Fix the DNS resolver when `DNSCACHE_ENABLED=False` ([issue 2811](https://github.com/scrapy/scrapy/issues/2811)).
* Add `cryptography` to the Debian Jessie tox test environment ([issue 2848](https://github.com/scrapy/scrapy/issues/2848)).
* Add verification that the Request callback is callable ([issue 2766](https://github.com/scrapy/scrapy/issues/2766)).
* Port `extras/qpsclient.py` to Python 3 ([issue 2849](https://github.com/scrapy/scrapy/issues/2849)).
* Use getfullargspec behind the scenes on Python 3 to stop a DeprecationWarning ([issue 2862](https://github.com/scrapy/scrapy/issues/2862)).
* Update deprecated test aliases ([issue 2876](https://github.com/scrapy/scrapy/issues/2876)).
* Fix `SitemapSpider` support for alternate links ([issue 2853](https://github.com/scrapy/scrapy/issues/2853)).

### Docs

* Added the missing bullet point for the `AUTOTHROTTLE_TARGET_CONCURRENCY` setting ([issue 2756](https://github.com/scrapy/scrapy/issues/2756)).
* Update the Contributing docs and document the new support channels ([issue 2762](https://github.com/scrapy/scrapy/issues/2762), [issue 3038](https://github.com/scrapy/scrapy/issues/3038)).
* Include references to the Scrapy subreddit in the docs.
* Fix broken links; use https:// for external links ([issue
2978](https://github.com/scrapy/scrapy/issues/2978), [issue 2982](https://github.com/scrapy/scrapy/issues/2982), [issue 2958](https://github.com/scrapy/scrapy/issues/2958)).
* Document the CloseSpider extension better ([issue 2759](https://github.com/scrapy/scrapy/issues/2759)).
* Use `pymongo.collection.Collection.insert_one()` in the MongoDB example ([issue 2781](https://github.com/scrapy/scrapy/issues/2781)).
* Spelling mistakes and typos ([issue 2828](https://github.com/scrapy/scrapy/issues/2828), [issue 2837](https://github.com/scrapy/scrapy/issues/2837), [issue 2884](https://github.com/scrapy/scrapy/issues/2884), [issue 2924](https://github.com/scrapy/scrapy/issues/2924)).
* Clarify the `CSVFeedSpider.headers` documentation ([issue 2826](https://github.com/scrapy/scrapy/issues/2826)).
* Document the `DontCloseSpider` exception and clarify `spider_idle` ([issue 2791](https://github.com/scrapy/scrapy/issues/2791)).
* Update the "Releases" section in the README ([issue 2764](https://github.com/scrapy/scrapy/issues/2764)).
* Fix RST syntax in the `DOWNLOAD_FAIL_ON_DATALOSS` docs ([issue 2763](https://github.com/scrapy/scrapy/issues/2763)).
* Small fix in the description of the startproject arguments ([issue 2866](https://github.com/scrapy/scrapy/issues/2866)).
* Clarify data types in the Response.body docs ([issue 2922](https://github.com/scrapy/scrapy/issues/2922)).
* Add a note about `request.meta['depth']` to the DepthMiddleware docs ([issue 2374](https://github.com/scrapy/scrapy/issues/2374)).
* Add a note about `request.meta['dont_merge_cookies']` to the CookiesMiddleware docs ([issue 2999](https://github.com/scrapy/scrapy/issues/2999)).
* Up-to-date example of project structure ([issue 2964](https://github.com/scrapy/scrapy/issues/2964), [issue 2976](https://github.com/scrapy/scrapy/issues/2976)).
* A better example of ItemExporters usage ([issue 2989](https://github.com/scrapy/scrapy/issues/2989)).
* Document `from_crawler` methods for spider and downloader middlewares ([issue 3019](https://github.com/scrapy/scrapy/issues/3019)).

## Scrapy 1.4.0 (2017-05-18)

Scrapy 1.4 does not bring that many breathtaking new features, but it does bring quite a few handy improvements.

Scrapy now supports anonymous FTP sessions, with customizable user and password via the new [`FTP_USER`](topics/settings.html#std:setting-FTP_USER) and [`FTP_PASSWORD`](topics/settings.html#std:setting-FTP_PASSWORD)
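settings. If you are using Twisted version 17.1.0 or above, FTP is now also available with Python 3.

There is a new [`response.follow`](topics/request-response.html#scrapy.http.TextResponse.follow "scrapy.http.TextResponse.follow") method for creating requests; it is now the recommended way to create Requests in Scrapy spiders. This method makes it easier to write correct spiders; `response.follow` has several advantages over creating `scrapy.Request` objects directly:

* it handles relative URLs;
* it works properly with non-ASCII URLs on non-UTF8 pages;
* in addition to absolute and relative URLs it supports Selectors; for `<a>` elements it can also extract their href values.

For example, instead of this:

```py
for href in response.css('li.page a::attr(href)').extract():
    url = response.urljoin(href)
    yield scrapy.Request(url, self.parse, encoding=response.encoding)
```

one can now write this:

```py
for a in response.css('li.page a'):
    yield response.follow(a, self.parse)
```

Link extractors are also improved. They work similarly to a regular modern browser: leading and trailing whitespace is removed from attributes (think `href="   http://example.com"`) when building `Link` objects. This whitespace stripping also happens for `action` attributes with `FormRequest`.

**Please also note that link extractors do not canonicalize URLs by default anymore.** This puzzled users every now and then, and it is not what browsers actually do, so we removed that extra transformation on extracted links.

For those who want more control over the `Referer:` header that Scrapy sends when following links, you can set your own `Referrer Policy`. Prior to Scrapy 1.4, the default `RefererMiddleware` would simply and blindly set it to the URL of the response that generated the HTTP request (which could leak information about your URL seeds). By default, Scrapy now behaves much like a regular browser does. This policy is fully customizable with W3C standard values (or with something custom of your own if you wish). See [`REFERRER_POLICY`](topics/spider-middleware.html#std:setting-REFERRER_POLICY) for details.

To make Scrapy spiders easier to debug, Scrapy logs more stats by default in 1.4: memory usage stats, detailed retry stats and detailed HTTP error code stats. A similar change is that the HTTP cache path is now also visible in logs.

Last but not least, Scrapy can now optionally pretty-print JSON and XML items using the new [`FEED_EXPORT_INDENT`](topics/feed-exports.html#std:setting-FEED_EXPORT_INDENT) setting.

Enjoy! (Or read on for the rest of the changes in this release.)

### Deprecations and backward incompatible changes

* Default to `canonicalize=False` in `scrapy.linkextractors.LinkExtractor` ([issue 2537](https://github.com/scrapy/scrapy/issues/2537), fixes [issue 1941](https://github.com/scrapy/scrapy/issues/1941) and [issue 1982](https://github.com/scrapy/scrapy/issues/1982)): **warning, this is technically backward-incompatible**.
* Enable the memusage extension by default ([issue 2539](https://github.com/scrapy/scrapy/issues/2539), fixes [issue 2187](https://github.com/scrapy/scrapy/issues/2187)); **this is technically backward-incompatible**, so please check whether you have any non-default `MEMUSAGE_***` options set.
* The `EDITOR` environment variable now takes precedence over the `EDITOR` option defined in settings.py ([issue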
1829](https://github.com/scrapy/scrapy/issues/1829)); Scrapy default settings no longer depend on environment variables. This is technically a backward incompatible change.
* `Spider.make_requests_from_url` is deprecated ([issue 1728](https://github.com/scrapy/scrapy/issues/1728), fixes [issue 1495](https://github.com/scrapy/scrapy/issues/1495)).

### New features

* Accept proxy credentials in the [`proxy`](topics/downloader-middleware.html#std:reqmeta-proxy) request meta key ([issue 2526](https://github.com/scrapy/scrapy/issues/2526)).
* Support [brotli](https://github.com/google/brotli)-compressed content; requires the optional [brotlipy](https://github.com/python-hyper/brotlipy/) ([issue 2535](https://github.com/scrapy/scrapy/issues/2535)).
* New [response.follow](intro/tutorial.html#response-follow-example) shortcut for creating requests ([issue 1940](https://github.com/scrapy/scrapy/issues/1940)).
* Added a `flags` argument and attribute to [`Request`](topics/request-response.html#scrapy.http.Request "scrapy.http.Request") objects ([issue 2047](https://github.com/scrapy/scrapy/issues/2047)).
* Support anonymous FTP ([issue 2342](https://github.com/scrapy/scrapy/issues/2342)).
* Added `retry/count`, `retry/max_reached` and `retry/reason_count/<reason>` stats to [`RetryMiddleware`](topics/downloader-middleware.html#scrapy.downloadermiddlewares.retry.RetryMiddleware "scrapy.downloadermiddlewares.retry.RetryMiddleware") ([issue 2543](https://github.com/scrapy/scrapy/issues/2543)).
* Added `httperror/response_ignored_count` and `httperror/response_ignored_status_count/<status>` stats to [`HttpErrorMiddleware`](topics/spider-middleware.html#scrapy.spidermiddlewares.httperror.HttpErrorMiddleware "scrapy.spidermiddlewares.httperror.HttpErrorMiddleware") ([issue 2566](https://github.com/scrapy/scrapy/issues/2566)).
* Customizable [`Referrer policy`](topics/spider-middleware.html#std:setting-REFERRER_POLICY) in [`RefererMiddleware`](topics/spider-middleware.html#scrapy.spidermiddlewares.referer.RefererMiddleware "scrapy.spidermiddlewares.referer.RefererMiddleware") ([issue 2306](https://github.com/scrapy/scrapy/issues/2306)).
* New `data:` URI download handler ([issue
2334](https://github.com/scrapy/scrapy/issues/2334), fixes [issue 2156](https://github.com/scrapy/scrapy/issues/2156)).
* Log the cache directory when the HTTP cache is used ([issue 2611](https://github.com/scrapy/scrapy/issues/2611), fixes [issue 2604](https://github.com/scrapy/scrapy/issues/2604)).
* Warn users when a project contains duplicate spider names (fixes [issue 2181](https://github.com/scrapy/scrapy/issues/2181)).
* `CaselessDict` now accepts `Mapping` instances and not only dicts ([issue 2646](https://github.com/scrapy/scrapy/issues/2646)).
* [Media downloads](topics/media-pipeline.html#topics-media-pipeline) with `FilesPipeline` or `ImagesPipeline` can now optionally handle HTTP redirects using the new [`MEDIA_ALLOW_REDIRECTS`](topics/media-pipeline.html#std:setting-MEDIA_ALLOW_REDIRECTS) setting ([issue 2616](https://github.com/scrapy/scrapy/issues/2616), fixes [issue 2004](https://github.com/scrapy/scrapy/issues/2004)).
* Accept incomplete responses from websites using the new [`DOWNLOAD_FAIL_ON_DATALOSS`](topics/settings.html#std:setting-DOWNLOAD_FAIL_ON_DATALOSS) setting ([issue 2590](https://github.com/scrapy/scrapy/issues/2590), fixes [issue 2586](https://github.com/scrapy/scrapy/issues/2586)).
* Optional pretty-printing of JSON and XML items via the [`FEED_EXPORT_INDENT`](topics/feed-exports.html#std:setting-FEED_EXPORT_INDENT) setting ([issue 2456](https://github.com/scrapy/scrapy/issues/2456), fixes [issue 1327](https://github.com/scrapy/scrapy/issues/1327)).
* Allow fields to be dropped from `FormRequest.from_response` formdata when a `None` value is passed ([issue 667](https://github.com/scrapy/scrapy/issues/667)).
* Per-request retry times with the new [`max_retry_times`](topics/request-response.html#std:reqmeta-max_retry_times) meta key ([issue 2642](https://github.com/scrapy/scrapy/issues/2642)).
* `python -m scrapy` as a more explicit alternative to the `scrapy` command ([issue 2740](https://github.com/scrapy/scrapy/issues/2740)).

### Bug fixes

* LinkExtractor now strips leading and trailing whitespace from attributes ([issue 2547](https://github.com/scrapy/scrapy/issues/2547), fixes [issue 1614](https://github.com/scrapy/scrapy/issues/1614)).
* Properly handle whitespace in the action attribute in `FormRequest` ([issue 2548](https://github.com/scrapy/scrapy/issues/2548)).
* Buffer CONNECT response bytes from the proxy until all HTTP headers are received ([issue 2495](https://github.com/scrapy/scrapy/issues/2495), fixes [issue
2491](https://github.com/scrapy/scrapy/issues/2491)).
* The FTP downloader now works on Python 3, provided you use Twisted >= 17.1 ([issue 2599](https://github.com/scrapy/scrapy/issues/2599)).
* Use the body to choose the response type after decompressing content ([issue 2393](https://github.com/scrapy/scrapy/issues/2393), fixes [issue 2145](https://github.com/scrapy/scrapy/issues/2145)).
* Always decompress `Content-Encoding: gzip` at the [`HttpCompressionMiddleware`](topics/downloader-middleware.html#scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware") stage ([issue 2391](https://github.com/scrapy/scrapy/issues/2391)).
* Respect the custom log level in `Spider.custom_settings` ([issue 2581](https://github.com/scrapy/scrapy/issues/2581), fixes [issue 1612](https://github.com/scrapy/scrapy/issues/1612)).
* "make htmlview" fix for macOS ([issue 2661](https://github.com/scrapy/scrapy/issues/2661)).
* Remove "commands" from the command list ([issue 2695](https://github.com/scrapy/scrapy/issues/2695)).
* Fix the duplicate Content-Length header for POST requests with an empty body ([issue 2677](https://github.com/scrapy/scrapy/issues/2677)).
* Properly cancel large downloads, that is, those above [`DOWNLOAD_MAXSIZE`](topics/settings.html#std:setting-DOWNLOAD_MAXSIZE) ([issue 1616](https://github.com/scrapy/scrapy/issues/1616)).
* ImagesPipeline: fixed processing of transparent PNG images with a palette ([issue 2675](https://github.com/scrapy/scrapy/issues/2675)).

### Cleanups and refactoring

* Tests: remove temporary files and folders ([issue 2570](https://github.com/scrapy/scrapy/issues/2570)), fix ProjectUtilsTest on OS X ([issue 2569](https://github.com/scrapy/scrapy/issues/2569)), use portable PyPy for Linux on Travis CI ([issue 2710](https://github.com/scrapy/scrapy/issues/2710)).
* Separate request building from `_requests_to_follow` in CrawlSpider ([issue 2562](https://github.com/scrapy/scrapy/issues/2562)).
* Remove the "Python 3 progress" badge ([issue 2567](https://github.com/scrapy/scrapy/issues/2567)).
* Add a couple more lines to `.gitignore` ([issue 2557](https://github.com/scrapy/scrapy/issues/2557)).
* Remove the bumpversion prerelease configuration ([issue 2159](https://github.com/scrapy/scrapy/issues/2159)).
* Add a codecov.yml file ([issue 2750](https://github.com/scrapy/scrapy/issues/2750)).
* Set the context factory implementation based on the Twisted version ([issue 2577](https://github.com/scrapy/scrapy/issues/2577), fixes [issue 2560](https://github.com/scrapy/scrapy/issues/2560)).
* Add omitted `self` arguments in the default project middleware template ([issue 2595](https://github.com/scrapy/scrapy/issues/2595)).
* Remove a redundant `slot.add_request()` call in ExecutionEngine ([issue 2617](https://github.com/scrapy/scrapy/issues/2617)).
* Catch the more specific `os.error` exception in `FSFilesStore` ([issue 2644](https://github.com/scrapy/scrapy/issues/2644)).
* Change the "localhost" test server certificate ([issue 2720](https://github.com/scrapy/scrapy/issues/2720)).
* Remove the unused `MEMUSAGE_REPORT` setting ([issue 2576](https://github.com/scrapy/scrapy/issues/2576)).

### Documentation

* Binary mode is required for exporters ([issue 2564](https://github.com/scrapy/scrapy/issues/2564), fixes [issue 2553](https://github.com/scrapy/scrapy/issues/2553)).
* Mention an issue with [`FormRequest.from_response`](topics/request-response.html#scrapy.http.FormRequest.from_response "scrapy.http.FormRequest.from_response") caused by a bug in lxml ([issue 2572](https://github.com/scrapy/scrapy/issues/2572)).
* Use single quotes uniformly in templates ([issue 2596](https://github.com/scrapy/scrapy/issues/2596)).
* Document the `ftp_user` and `ftp_password` meta keys ([issue 2587](https://github.com/scrapy/scrapy/issues/2587)).
* Removed the section on the deprecated `contrib/` ([issue 2636](https://github.com/scrapy/scrapy/issues/2636)).
* Recommend Anaconda when installing Scrapy on Windows ([issue 2477](https://github.com/scrapy/scrapy/issues/2477), fixes [issue 2475](https://github.com/scrapy/scrapy/issues/2475)).
* FAQ: rewrite the note on Python 3 support on Windows ([issue 2690](https://github.com/scrapy/scrapy/issues/2690)).
* Rearrange the selector sections ([issue 2705](https://github.com/scrapy/scrapy/issues/2705)).
* Remove `__nonzero__` from the `SelectorList` docs ([issue 2683](https://github.com/scrapy/scrapy/issues/2683)).
* Mention how to disable request filtering in the documentation of the [`DUPEFILTER_CLASS`](topics/settings.html#std:setting-DUPEFILTER_CLASS) setting ([issue 2714](https://github.com/scrapy/scrapy/issues/2714)).
* Add sphinx_rtd_theme to the docs setup readme ([issue 2668](https://github.com/scrapy/scrapy/issues/2668)).
* Open the file in text mode in the JSON item writer example ([issue
2729](https://github.com/scrapy/scrapy/issues/2729)).
* Clarify the `allowed_domains` example ([issue 2670](https://github.com/scrapy/scrapy/issues/2670)).

## Scrapy 1.3.3 (2017-03-10)

### Bug fixes

* Make `SpiderLoader` raise `ImportError` again by default for missing dependencies and wrong [`SPIDER_MODULES`](topics/settings.html#std:setting-SPIDER_MODULES). These exceptions had been silenced as warnings since 1.3.0. A new setting is introduced to toggle between warning and exception if needed; see [`SPIDER_LOADER_WARN_ONLY`](topics/settings.html#std:setting-SPIDER_LOADER_WARN_ONLY) for details.

## Scrapy 1.3.2 (2017-02-13)

### Bug fixes

* Preserve the request class when converting to/from dicts (utils.reqser) ([issue 2510](https://github.com/scrapy/scrapy/issues/2510)).
* Use consistent selectors for the author field in the tutorial ([issue 2551](https://github.com/scrapy/scrapy/issues/2551)).
* Fix TLS compatibility in Twisted 17+ ([issue 2558](https://github.com/scrapy/scrapy/issues/2558)).

## Scrapy 1.3.1 (2017-02-08)

### New features

* Support `'True'` and `'False'` string values for boolean settings ([issue 2519](https://github.com/scrapy/scrapy/issues/2519)); you can now do something like `scrapy crawl myspider -s REDIRECT_ENABLED=False`.
* Support kwargs with `response.xpath()` to use [XPath variables](topics/selectors.html#topics-selectors-xpath-variables) and ad-hoc namespace declarations; this requires at least Parsel v1.1 ([issue 2457](https://github.com/scrapy/scrapy/issues/2457)).
* Add support for Python 3.6 ([issue 2485](https://github.com/scrapy/scrapy/issues/2485)).
* Run tests on PyPy (warning: some tests still fail, so PyPy is not supported yet).

### Bug fixes

* Enforce the `DNS_TIMEOUT` setting ([issue 2496](https://github.com/scrapy/scrapy/issues/2496)).
* Fix the [`view`](topics/commands.html#std:command-view) command; it was a regression in v1.3.0 ([issue 2503](https://github.com/scrapy/scrapy/issues/2503)).
* Fix tests regarding `*_EXPIRES settings` with the Files/Images pipelines ([issue 2460](https://github.com/scrapy/scrapy/issues/2460)).
* Fix the name of the generated pipeline class when using the basic project template ([issue 2466](https://github.com/scrapy/scrapy/issues/2466)).
* Fix compatibility with Twisted 17+ ([issue 2496](https://github.com/scrapy/scrapy/issues/2496), [issue 2528](https://github.com/scrapy/scrapy/issues/2528)).
* Fix `scrapy.Item` inheritance on Python 3.6 ([issue 2511](https://github.com/scrapy/scrapy/issues/2511)).
* Enforce numeric values for component order in `SPIDER_MIDDLEWARES`, `DOWNLOADER_MIDDLEWARES`, `EXTENSIONS` and `SPIDER_CONTRACTS` (
[issue 2420](https://github.com/scrapy/scrapy/issues/2420)).

### Documentation

* Reword the Code of Conduct section and upgrade to Contributor Covenant v1.4 ([issue 2469](https://github.com/scrapy/scrapy/issues/2469)).
* Clarify that passing spider arguments converts them to spider attributes ([issue 2483](https://github.com/scrapy/scrapy/issues/2483)).
* Document the `formid` argument of `FormRequest.from_response()` ([issue 2497](https://github.com/scrapy/scrapy/issues/2497)).
* Add the .rst extension to README files ([issue 2507](https://github.com/scrapy/scrapy/issues/2507)).
* Mention the LevelDB cache storage backend ([issue 2525](https://github.com/scrapy/scrapy/issues/2525)).
* Use `yield` in sample callback code ([issue 2533](https://github.com/scrapy/scrapy/issues/2533)).
* Add a note about HTML entity decoding with `.re()/.re_first()` ([issue 1704](https://github.com/scrapy/scrapy/issues/1704)).
* Typos ([issue 2512](https://github.com/scrapy/scrapy/issues/2512), [issue 2534](https://github.com/scrapy/scrapy/issues/2534), [issue 2531](https://github.com/scrapy/scrapy/issues/2531)).

### Cleanups

* Remove a redundant check in `MetaRefreshMiddleware` ([issue 2542](https://github.com/scrapy/scrapy/issues/2542)).
* Faster checks in `LinkExtractor` for allow/deny patterns ([issue 2538](https://github.com/scrapy/scrapy/issues/2538)).
* Remove dead code supporting old Twisted versions ([issue 2544](https://github.com/scrapy/scrapy/issues/2544)).

## Scrapy 1.3.0 (2016-12-21)

This release comes rather soon after 1.2.2 for one main reason: it was found that releases from 0.18 up to 1.2.2 (inclusive) use some backported code from Twisted (`scrapy.xlib.tx.*`), even if newer Twisted modules are available. Scrapy now uses `twisted.web.client` and `twisted.internet.endpoints` directly. (See also the cleanups below.)

As this is a major change, we wanted to get the bug fix out quickly without breaking any projects that use the 1.2 series.

### New features

* `MailSender` now accepts single strings as values for the `to` and `cc` arguments ([issue 2272](https://github.com/scrapy/scrapy/issues/2272)).
* `scrapy fetch url`, `scrapy shell url` and `fetch(url)` inside the Scrapy shell now follow HTTP redirections by default ([issue 2290](https://github.com/scrapy/scrapy/issues/2290)); see [`fetch`](topics/commands.html#std:command-fetch) and [`shell`](topics/commands.html#std:command-shell) for details.
* `HttpErrorMiddleware` now logs errors at the `INFO` level instead of `DEBUG`; this is technically **backward incompatible**, so please check your log parsers.
* By default, logger names now use a long-form path, for example `[scrapy.extensions.logstats]`, instead of the shorter variant of prior releases (for example `[scrapy]`); this is **backward
incompatible** if you have log parsers that expect the short logger name part. You can switch back to short logger names by setting [`LOG_SHORT_NAMES`](topics/settings.html#std:setting-LOG_SHORT_NAMES) to `True`.

### Dependencies and cleanups

* Scrapy now requires Twisted >= 13.1, which is already the case for many Linux distributions.
* As a consequence, we got rid of the `scrapy.xlib.tx.*` modules, which copied some Twisted code for users stuck with an "old" Twisted version.
* `ChunkedTransferMiddleware` is deprecated and removed from the default downloader middlewares.

## Scrapy 1.2.3 (2017-03-03)

* Packaging fix: disallow unsupported Twisted versions in setup.py.

## Scrapy 1.2.2 (2016-12-06)

### Bug fixes

* Fix a cryptic traceback when a pipeline fails on `open_spider()` ([issue 2011](https://github.com/scrapy/scrapy/issues/2011)).
* Fix embedded IPython shell variables (fixing [issue 396](https://github.com/scrapy/scrapy/issues/396), which reappeared in 1.2.0; fixed in [issue 2418](https://github.com/scrapy/scrapy/issues/2418)).
* A couple of patches for dealing with robots.txt:
  * handle (non-standard) relative sitemap URLs ([issue 2390](https://github.com/scrapy/scrapy/issues/2390));
  * handle non-ASCII URLs and User-Agents in Python 2 ([issue 2373](https://github.com/scrapy/scrapy/issues/2373)).

### Documentation

* Document the `"download_latency"` key in the `meta` dict of `Request` ([issue 2033](https://github.com/scrapy/scrapy/issues/2033)).
* Remove the page on (deprecated and unsupported) Ubuntu packages from the table of contents ([issue 2335](https://github.com/scrapy/scrapy/issues/2335)).
* A few fixed typos ([issue 2346](https://github.com/scrapy/scrapy/issues/2346), [issue 2369](https://github.com/scrapy/scrapy/issues/2369), [issue 2369](https://github.com/scrapy/scrapy/issues/2369), [issue 2380](https://github.com/scrapy/scrapy/issues/2380)) and clarifications ([issue 2354](https://github.com/scrapy/scrapy/issues/2354), [issue 2325](https://github.com/scrapy/scrapy/issues/2325), [issue 2414](https://github.com/scrapy/scrapy/issues/2414)).

### Other changes

* Advertise [conda-forge](https://anaconda.org/conda-forge/scrapy) as Scrapy's official conda channel ([issue 2387](https://github.com/scrapy/scrapy/issues/2387)).
* More helpful error messages when trying to use `.css()` or `.xpath()` on non-text responses ([issue 2264](https://github.com/scrapy/scrapy/issues/2264)).
* The `startproject` command now generates a sample `middlewares.py` file ([issue 2335](https://github.com/scrapy/scrapy/issues/2335)).
* Add version information for more dependencies to the `scrapy version` verbose output ([issue 2404](https://github.com/scrapy/scrapy/issues/2404)).
* Remove all
`*.pyc` 源分發中的文件（ [issue 2386](https://github.com/scrapy/scrapy/issues/2386) ） ## Scrapy 1.2.1（2016-10-21） ### 錯誤修復 * 在建立TLS/SSL連接時包括OpenSSL更為允許的默認密碼（ [issue 2314](https://github.com/scrapy/scrapy/issues/2314) ） * 修復非ASCII URL重定向上的“位置”HTTP頭解碼（ [issue 2321](https://github.com/scrapy/scrapy/issues/2321) ） ### 文檔 * 修復jsonWriterPipeline示例（ [issue 2302](https://github.com/scrapy/scrapy/issues/2302) ） * 各種注釋： [issue 2330](https://github.com/scrapy/scrapy/issues/2330) 關于 Spider 的名字， [issue 2329](https://github.com/scrapy/scrapy/issues/2329) 在中間件方法處理順序上， [issue 2327](https://github.com/scrapy/scrapy/issues/2327) 以列表形式獲取多值HTTP頭。 ### 其他變化 * 遠離的 `www.` 從 `start_urls` 內置 Spider 模板（ [issue 2299](https://github.com/scrapy/scrapy/issues/2299) ） ## Scrapy 1.2.0（2016-10-03） ### 新特點 * 新的 [`FEED_EXPORT_ENCODING`](topics/feed-exports.html#std:setting-FEED_EXPORT_ENCODING) 用于自定義將項寫入文件時使用的編碼的設置。可用于關閉 `\uXXXX` 在JSON輸出中進行轉義。這對于那些希望XML或CSV輸出使用UTF-8以外的東西的人也很有用。（ [issue 2034](https://github.com/scrapy/scrapy/issues/2034) ） * `startproject` 命令現在支持一個可選的目標目錄，以根據項目名稱覆蓋默認目錄。（ [issue 2005](https://github.com/scrapy/scrapy/issues/2005) ） * 新的 [`SCHEDULER_DEBUG`](topics/settings.html#std:setting-SCHEDULER_DEBUG) 設置為日志請求序列化失敗（ [issue 1610](https://github.com/scrapy/scrapy/issues/1610) ） * JSON編碼器現在支持序列化 `set` 實例（實例） [issue 2058](https://github.com/scrapy/scrapy/issues/2058) ） * 解讀 `application/json-amazonui-streaming` 作為 `TextResponse` （ [issue 1503](https://github.com/scrapy/scrapy/issues/1503) ） * `scrapy` 在使用shell工具時默認導入（ [`shell`](topics/commands.html#std:command-shell) ， [inspect_response](topics/shell.html#topics-shell-inspect-response) （ [issue 2248](https://github.com/scrapy/scrapy/issues/2248) ） ### 錯誤修復 * defaultrequestheaders中間件現在在useragent中間件之前運行（ [issue 2088](https://github.com/scrapy/scrapy/issues/2088) ）警告：這在技術上是向后不兼容的, 盡管我們認為這是錯誤修復。 * HTTP緩存擴展和使用 `.scrapy` 數據目錄現在在項目外部工作（ [issue 1581](https://github.com/scrapy/scrapy/issues/1581) ）警告：這在技術上是向后不兼容的, 盡管我們認為這是錯誤修復。 * `Selector` 不允許兩者同時通過 
`response` 和 `text` 不再（ [issue 2153](https://github.com/scrapy/scrapy/issues/2153) ） * 修復了錯誤回調名稱的日志記錄 `scrapy parse` （ [issue 2169](https://github.com/scrapy/scrapy/issues/2169) ） * 修復一個奇怪的gzip解壓錯誤（ [issue 1606](https://github.com/scrapy/scrapy/issues/1606) ） * 使用時修復所選回調 `CrawlSpider` 具有 [`scrapy parse`](topics/commands.html#std:command-parse) （ [issue 2225](https://github.com/scrapy/scrapy/issues/2225) ） * 修復 Spider 不生成任何項時的無效JSON和XML文件（ [issue 872](https://github.com/scrapy/scrapy/issues/872) ） * 實施 `flush()` FPR `StreamLogger` 避免日志中出現警告（ [issue 2125](https://github.com/scrapy/scrapy/issues/2125) ） ### 重構 * `canonicalize_url` 已移至 [w3lib.url](https://w3lib.readthedocs.io/en/latest/w3lib.html#w3lib.url.canonicalize_url) ([issue 2168](https://github.com/scrapy/scrapy/issues/2168)) . ### 測試和要求 Scrapy的新需求基線是Debian8“Jessie”。它以前是Ubuntu12.04精確版。實際上，這意味著我們至少要用這些（主要）包版本運行連續集成測試：twisted 14.0、pyopenssl 0.14、lxml 3.4。 Scrapy可以很好地處理這些包的舊版本（例如，代碼庫中仍然有用于舊的扭曲版本的開關），但不能保證（因為它不再被測試）。 ### 文檔 * 語法修正： [issue 2128](https://github.com/scrapy/scrapy/issues/2128) ， [issue 1566](https://github.com/scrapy/scrapy/issues/1566) . 
* Remove the "Download stats" badge from the README ([issue 2160](https://github.com/scrapy/scrapy/issues/2160))
* New Scrapy [architecture diagram](topics/architecture.html#topics-architecture) ([issue 2165](https://github.com/scrapy/scrapy/issues/2165))
* Updated `Response` parameters documentation ([issue 2197](https://github.com/scrapy/scrapy/issues/2197))
* Reworded the misleading [`RANDOMIZE_DOWNLOAD_DELAY`](topics/settings.html#std:setting-RANDOMIZE_DOWNLOAD_DELAY) description ([issue 2190](https://github.com/scrapy/scrapy/issues/2190))
* Add StackOverflow as a support channel ([issue 2257](https://github.com/scrapy/scrapy/issues/2257))

## Scrapy 1.1.4 (2017-03-03)

* Packaging fix: disallow unsupported Twisted versions in setup.py

## Scrapy 1.1.3 (2016-09-22)

### Bug fixes

* Class attributes for subclasses of `ImagesPipeline` and `FilesPipeline` work as they did before 1.1.1 ([issue 2243](https://github.com/scrapy/scrapy/issues/2243), fixes [issue 2198](https://github.com/scrapy/scrapy/issues/2198))

### Documentation

* [Overview](intro/overview.html#intro-overview) and [tutorial](intro/tutorial.html#intro-tutorial) rewritten to use the http://toscrape.com websites ([issue 2236](https://github.com/scrapy/scrapy/issues/2236), [issue 2249](https://github.com/scrapy/scrapy/issues/2249), [issue 2252](https://github.com/scrapy/scrapy/issues/2252))

## Scrapy 1.1.2 (2016-08-18)

### Bug fixes

* Introduce a missing [`IMAGES_STORE_S3_ACL`](topics/media-pipeline.html#std:setting-IMAGES_STORE_S3_ACL) setting to override the default ACL policy in `ImagesPipeline` when uploading images to S3 (note that the default ACL policy has been "private", not "public-read", since Scrapy 1.1.0)
* [`IMAGES_EXPIRES`](topics/media-pipeline.html#std:setting-IMAGES_EXPIRES) default value set back to 90 (the regression was introduced in 1.1.1)

## Scrapy 1.1.1 (2016-07-13)

### Bug fixes

* Add the "Host" header in CONNECT requests to HTTPS proxies ([issue 2069](https://github.com/scrapy/scrapy/issues/2069))
* Use the response `body` when choosing the response class ([issue 2001](https://github.com/scrapy/scrapy/issues/2001), fixes [issue 2000](https://github.com/scrapy/scrapy/issues/2000))
* Do not fail when canonicalizing URLs with wrong netlocs ([issue 2038](https://github.com/scrapy/scrapy/issues/2038), fixes [issue 2010](https://github.com/scrapy/scrapy/issues/2010))
* A few fixes for `HttpCompressionMiddleware` (and `SitemapSpider`):
    * Do not decode HEAD responses（ [issue 
2008](https://github.com/scrapy/scrapy/issues/2008), fixes [issue 1899](https://github.com/scrapy/scrapy/issues/1899) ）
    * Handle the charset parameter in the gzip Content-Type header ([issue 2050](https://github.com/scrapy/scrapy/issues/2050), fixes [issue 2049](https://github.com/scrapy/scrapy/issues/2049))
    * Do not decompress gzip octet-stream responses ([issue 2065](https://github.com/scrapy/scrapy/issues/2065), fixes [issue 2063](https://github.com/scrapy/scrapy/issues/2063))
* Catch (and ignore with a warning) the exception raised when verifying a certificate against IP-address hosts ([issue 2094](https://github.com/scrapy/scrapy/issues/2094), fixes [issue 2092](https://github.com/scrapy/scrapy/issues/2092))
* Make `FilesPipeline` and `ImagesPipeline` backward compatible again regarding the use of legacy class attributes for customization ([issue 1989](https://github.com/scrapy/scrapy/issues/1989), fixes [issue 1985](https://github.com/scrapy/scrapy/issues/1985))

### New features

* Enable the genspider command outside a project folder ([issue 2052](https://github.com/scrapy/scrapy/issues/2052))
* Retry HTTPS CONNECT `TunnelError` by default ([issue 1974](https://github.com/scrapy/scrapy/issues/1974))

### Documentation

* `FEED_TEMPDIR` setting moved to its lexicographical position ([commit 9b3c72c](https://github.com/scrapy/scrapy/commit/9b3c72c))
* Use the idiomatic `.extract_first()` in the overview ([issue 1994](https://github.com/scrapy/scrapy/issues/1994))
* Update years in the copyright notice ([commit c2c8036](https://github.com/scrapy/scrapy/commit/c2c8036))
* Add information and an example on errbacks ([issue 1995](https://github.com/scrapy/scrapy/issues/1995))
* Use the "url" variable in the downloader middleware example ([issue 2015](https://github.com/scrapy/scrapy/issues/2015))
* Grammar fixes ([issue 2054](https://github.com/scrapy/scrapy/issues/2054), [issue 2120](https://github.com/scrapy/scrapy/issues/2120))
* New FAQ entry on using BeautifulSoup in spider callbacks ([issue 2048](https://github.com/scrapy/scrapy/issues/2048))
* Add notes about Scrapy not working on Windows with Python 3 ([issue 2060](https://github.com/scrapy/scrapy/issues/2060))
* Encourage complete titles in pull requests ([issue 2026](https://github.com/scrapy/scrapy/issues/2026))

### Tests

* Upgrade the py.test requirement on Travis CI and pin pytest-cov to 2.2.1 ([issue 2095](https://github.com/scrapy/scrapy/issues/2095))

## Scrapy 1.1.0 (2016-05-11)

This 1.1 release brings a lot of interesting features and bug fixes:

* Scrapy 1.1 has beta Python 3 support (requires Twisted >= 15.5). See [Beta Python 3 support](#news-betapy3) for more details and some limitations.
* Hot new features:
    * Item loaders now support nested loaders ([issue 1467](https://github.com/scrapy/scrapy/issues/1467))
    * `FormRequest.from_response` improvements ([issue 1382](https://github.com/scrapy/scrapy/issues/1382), [issue 1137](https://github.com/scrapy/scrapy/issues/1137))
    * Added the [`AUTOTHROTTLE_TARGET_CONCURRENCY`](topics/autothrottle.html#std:setting-AUTOTHROTTLE_TARGET_CONCURRENCY) setting and improved the AutoThrottle docs ([issue 1324](https://github.com/scrapy/scrapy/issues/1324))
    * Added `response.text` to get the body as unicode ([issue 1730](https://github.com/scrapy/scrapy/issues/1730))
    * Anonymous S3 connections ([issue 1358](https://github.com/scrapy/scrapy/issues/1358))
    * Deferreds in downloader middlewares ([issue 1473](https://github.com/scrapy/scrapy/issues/1473)). This enables better robots.txt handling ([issue 1471](https://github.com/scrapy/scrapy/issues/1471))
    * HTTP caching now follows RFC 2616 more closely; added the settings [`HTTPCACHE_ALWAYS_STORE`](topics/downloader-middleware.html#std:setting-HTTPCACHE_ALWAYS_STORE) and [`HTTPCACHE_IGNORE_RESPONSE_CACHE_CONTROLS`](topics/downloader-middleware.html#std:setting-HTTPCACHE_IGNORE_RESPONSE_CACHE_CONTROLS) ([issue 1151](https://github.com/scrapy/scrapy/issues/1151))
    * Selectors were extracted to the parsel library ([issue 1409](https://github.com/scrapy/scrapy/issues/1409)). This means you can use Scrapy selectors without Scrapy, and also upgrade the selector engine without needing to upgrade Scrapy.
    * The HTTPS downloader now performs TLS protocol negotiation by default, instead of forcing TLS 1.0. You can also set the SSL/TLS method using the new [`DOWNLOADER_CLIENT_TLS_METHOD`](topics/settings.html#std:setting-DOWNLOADER_CLIENT_TLS_METHOD) setting.
* These bug fixes may require your attention:
    * Bad requests (HTTP 400) are not retried by default ([issue 1289](https://github.com/scrapy/scrapy/issues/1289)). If you need the old behavior, add `400` to [`RETRY_HTTP_CODES`](topics/downloader-middleware.html#std:setting-RETRY_HTTP_CODES).
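    * Fix shell file argument handling ([issue 1710](https://github.com/scrapy/scrapy/issues/1710), [issue 1550](https://github.com/scrapy/scrapy/issues/1550)). If you try `scrapy shell index.html`, it will try to load the URL [http://index.html](http://index.html); use `scrapy shell ./index.html` to load a local file.
    * Robots.txt compliance is now enabled by default for newly created projects ([issue 1724](https://github.com/scrapy/scrapy/issues/1724)). Scrapy will also wait for robots.txt to be downloaded before proceeding with the crawl ([issue 1735](https://github.com/scrapy/scrapy/issues/1735)). If you want to disable this behavior, update [`ROBOTSTXT_OBEY`](topics/settings.html#std:setting-ROBOTSTXT_OBEY) in the `settings.py` file after creating a new project.
    * Exporters now work on unicode by default, instead of bytes ([issue 1080](https://github.com/scrapy/scrapy/issues/1080)). If you use `PythonItemExporter`, you may want to update your code to disable binary mode, which is now deprecated.
    * Accept XML node names containing dots as valid ([issue 1533](https://github.com/scrapy/scrapy/issues/1533))
    * When uploading files or images to S3 (with `FilesPipeline` or `ImagesPipeline`), the default ACL policy is now "private" instead of "public". **Warning: backward incompatible!** You can use [`FILES_STORE_S3_ACL`](topics/media-pipeline.html#std:setting-FILES_STORE_S3_ACL) to change it.
    * We have reimplemented `canonicalize_url()` for more correct output, especially for URLs with non-ASCII characters ([issue 1947](https://github.com/scrapy/scrapy/issues/1947)). This could change the output of link extractors compared to previous Scrapy versions. It may also invalidate some cache entries you still have from pre-1.1 runs. **Warning: backward incompatible!**

Keep reading for more details on other improvements and bug fixes.

### Beta Python 3 support

We have been [hard at work to make Scrapy run on Python 3](https://github.com/scrapy/scrapy/wiki/Python-3-Porting). As a result, you can now run spiders on Python 3.3, 3.4 and 3.5 (Twisted >= 15.5 required). Some features are still missing (and some may never be ported).

Almost all built-in extensions and middlewares are expected to work. However, we are aware of some limitations in Python 3:

* Scrapy does not work on Windows with Python 3
* Sending emails is not supported
* The FTP download handler is not supported
* The telnet console is not supported

### Additional new features and enhancements

* Scrapy now has a [Code of Conduct](https://github.com/scrapy/scrapy/blob/master/CODE_OF_CONDUCT.md) ([issue 1681](https://github.com/scrapy/scrapy/issues/1681)).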
* The command line tool now has completion for zsh ([issue 934](https://github.com/scrapy/scrapy/issues/934))
* Improvements to `scrapy shell`:
    * Support for bpython, and configuring the preferred Python shell via `SCRAPY_PYTHON_SHELL` ([issue 1100](https://github.com/scrapy/scrapy/issues/1100), [issue 1444](https://github.com/scrapy/scrapy/issues/1444))
    * Support for URLs without a scheme ([issue 1498](https://github.com/scrapy/scrapy/issues/1498)) **Warning: backward incompatible!**
    * Bring back support for relative file paths ([issue 1710](https://github.com/scrapy/scrapy/issues/1710), [issue 1550](https://github.com/scrapy/scrapy/issues/1550))
* Added the [`MEMUSAGE_CHECK_INTERVAL_SECONDS`](topics/settings.html#std:setting-MEMUSAGE_CHECK_INTERVAL_SECONDS) setting to change the default check interval ([issue 1282](https://github.com/scrapy/scrapy/issues/1282))
* Download handlers are now lazy-loaded on the first request that uses their scheme ([issue 1390](https://github.com/scrapy/scrapy/issues/1390), [issue 1421](https://github.com/scrapy/scrapy/issues/1421))
* HTTPS download handlers no longer force TLS 1.0; instead, OpenSSL's `SSLv23_method()/TLS_method()` is used, which allows negotiating with the remote host the highest TLS protocol version it supports ([issue 1794](https://github.com/scrapy/scrapy/issues/1794), [issue 1629](https://github.com/scrapy/scrapy/issues/1629))
* `RedirectMiddleware` now skips the status codes listed in `handle_httpstatus_list`, whether set as a spider attribute or in `Request`'s `meta` key ([issue 1334](https://github.com/scrapy/scrapy/issues/1334), [issue 1364](https://github.com/scrapy/scrapy/issues/1364), [issue 1447](https://github.com/scrapy/scrapy/issues/1447))
* Form submission:
    * now works with `<button>` elements too ([issue 1469](https://github.com/scrapy/scrapy/issues/1469))
    * an empty string is now used for submit buttons without a value ([issue 1472](https://github.com/scrapy/scrapy/issues/1472))
* Dict-like settings now have per-key priorities ([issue 1135](https://github.com/scrapy/scrapy/issues/1135), [issue 1149](https://github.com/scrapy/scrapy/issues/1149) and [issue 1586](https://github.com/scrapy/scrapy/issues/1586))
* Sending non-ASCII emails ([issue 1662](https://github.com/scrapy/scrapy/issues/1662))
* The `CloseSpider` and `SpiderState` extensions are now disabled if no relevant setting is set ([issue 1723](https://github.com/scrapy/scrapy/issues/1723), [issue 1725](https://github.com/scrapy/scrapy/issues/1725))
* Added the method `ExecutionEngine.close`（ [issue 
1423](https://github.com/scrapy/scrapy/issues/1423) ）
* Added the method `CrawlerRunner.create_crawler` ([issue 1528](https://github.com/scrapy/scrapy/issues/1528))
* The scheduler priority queue can now be customized via [`SCHEDULER_PRIORITY_QUEUE`](topics/settings.html#std:setting-SCHEDULER_PRIORITY_QUEUE) ([issue 1822](https://github.com/scrapy/scrapy/issues/1822))
* `.pps` links are now ignored by default in link extractors ([issue 1835](https://github.com/scrapy/scrapy/issues/1835))
* The temporary data folder for FTP and S3 feed storages can be customized using the new [`FEED_TEMPDIR`](topics/settings.html#std:setting-FEED_TEMPDIR) setting ([issue 1847](https://github.com/scrapy/scrapy/issues/1847))
* `FilesPipeline` and `ImagesPipeline` settings are now instance attributes instead of class attributes, enabling spider-specific behavior ([issue 1891](https://github.com/scrapy/scrapy/issues/1891))
* `JsonItemExporter` now puts the opening and closing square brackets on their own lines (the first and last lines of the output file) ([issue 1950](https://github.com/scrapy/scrapy/issues/1950))
* If available, `botocore` is used for `S3FeedStorage`, `S3DownloadHandler` and `S3FilesStore` ([issue 1761](https://github.com/scrapy/scrapy/issues/1761), [issue 1883](https://github.com/scrapy/scrapy/issues/1883))
* Tons of documentation updates and related fixes ([issue 1291](https://github.com/scrapy/scrapy/issues/1291), [issue 1302](https://github.com/scrapy/scrapy/issues/1302), [issue 1335](https://github.com/scrapy/scrapy/issues/1335), [issue 1683](https://github.com/scrapy/scrapy/issues/1683), [issue 1660](https://github.com/scrapy/scrapy/issues/1660), [issue 1642](https://github.com/scrapy/scrapy/issues/1642), [issue 1721](https://github.com/scrapy/scrapy/issues/1721), [issue 1727](https://github.com/scrapy/scrapy/issues/1727), [issue 1879](https://github.com/scrapy/scrapy/issues/1879))
* Other refactoring, optimizations and cleanup（ [issue 1476](https://github.com/scrapy/scrapy/issues/1476), [issue 1481](https://github.com/scrapy/scrapy/issues/1481), [issue 1477](https://github.com/scrapy/scrapy/issues/1477), [issue 1315](https://github.com/scrapy/scrapy/issues/1315), [issue 1290](https://github.com/scrapy/scrapy/issues/1290), [issue 1750](https://github.com/scrapy/scrapy/issues/1750), [issue 
1881](https://github.com/scrapy/scrapy/issues/1881) ）

### Deprecations and removals

* Added `to_bytes` and `to_unicode`; deprecated the `str_to_unicode` and `unicode_to_str` functions ([issue 778](https://github.com/scrapy/scrapy/issues/778))
* `binary_is_text` is introduced to replace `isbinarytext` (but with the inverse return value) ([issue 1851](https://github.com/scrapy/scrapy/issues/1851))
* The `optional_features` set has been removed ([issue 1359](https://github.com/scrapy/scrapy/issues/1359))
* The `--lsprof` command line option has been removed ([issue 1689](https://github.com/scrapy/scrapy/issues/1689)). Warning: backward incompatible, but it does not break user code.
* The following datatypes were deprecated ([issue 1720](https://github.com/scrapy/scrapy/issues/1720)):
    * `scrapy.utils.datatypes.MultiValueDictKeyError`
    * `scrapy.utils.datatypes.MultiValueDict`
    * `scrapy.utils.datatypes.SiteNode`
* The previously bundled `scrapy.xlib.pydispatch` library was deprecated and replaced by [pydispatcher](https://pypi.python.org/pypi/PyDispatcher).

### Relocations

* `telnetconsole` was relocated to `extensions/` ([issue 1524](https://github.com/scrapy/scrapy/issues/1524))
    * Note: telnet is not enabled on Python 3 ([https://github.com/scrapy/scrapy/pull/1524#issuecomment-146985595](https://github.com/scrapy/scrapy/pull/1524#issuecomment-146985595))

### Bug fixes

* Scrapy no longer retries requests that got an `HTTP 400 Bad Request` response ([issue 1289](https://github.com/scrapy/scrapy/issues/1289)). **Warning: backward incompatible!**
* Support an empty password in the http_proxy config ([issue 1274](https://github.com/scrapy/scrapy/issues/1274))
* Interpret `application/x-json` as `TextResponse` ([issue 1333](https://github.com/scrapy/scrapy/issues/1333))
* Support the link rel attribute with multiple values ([issue 1201](https://github.com/scrapy/scrapy/issues/1201))
* Fixed `scrapy.http.FormRequest.from_response` when there is a `<base>` tag ([issue 1564](https://github.com/scrapy/scrapy/issues/1564))
* Fixed [`TEMPLATES_DIR`](topics/settings.html#std:setting-TEMPLATES_DIR) handling ([issue 1575](https://github.com/scrapy/scrapy/issues/1575))
* Various `FormRequest` fixes ([issue 1595](https://github.com/scrapy/scrapy/issues/1595), [issue 1596](https://github.com/scrapy/scrapy/issues/1596), [issue 1597](https://github.com/scrapy/scrapy/issues/1597))
* Make `_monkeypatches` more robust（ [issue 
1634](https://github.com/scrapy/scrapy/issues/1634) ）
* Fixed a bug in `XMLItemExporter` with non-string fields in items ([issue 1738](https://github.com/scrapy/scrapy/issues/1738))
* Fixed the startproject command on OS X ([issue 1635](https://github.com/scrapy/scrapy/issues/1635))
* Fixed PythonItemExporter and CSVExporter for non-string item types ([issue 1737](https://github.com/scrapy/scrapy/issues/1737))
* Various logging-related fixes ([issue 1294](https://github.com/scrapy/scrapy/issues/1294), [issue 1419](https://github.com/scrapy/scrapy/issues/1419), [issue 1263](https://github.com/scrapy/scrapy/issues/1263), [issue 1624](https://github.com/scrapy/scrapy/issues/1624), [issue 1654](https://github.com/scrapy/scrapy/issues/1654), [issue 1722](https://github.com/scrapy/scrapy/issues/1722), [issue 1726](https://github.com/scrapy/scrapy/issues/1726) and [issue 1303](https://github.com/scrapy/scrapy/issues/1303))
* Fixed a bug in `utils.template.render_templatefile()` ([issue 1212](https://github.com/scrapy/scrapy/issues/1212))
* Sitemap extraction from `robots.txt` is now case-insensitive ([issue 1902](https://github.com/scrapy/scrapy/issues/1902))
* HTTPS+CONNECT tunnels could get mixed up when using multiple proxies to the same remote host ([issue 1912](https://github.com/scrapy/scrapy/issues/1912))

## Scrapy 1.0.7 (2017-03-03)

* Packaging fix: disallow unsupported Twisted versions in setup.py

## Scrapy 1.0.6 (2016-05-04)

* FIX: RetryMiddleware is now robust to non-standard HTTP status codes ([issue 1857](https://github.com/scrapy/scrapy/issues/1857))
* FIX: The filestorage HTTP cache was checking the wrong modified time ([issue 1875](https://github.com/scrapy/scrapy/issues/1875))
* DOC: Support for Sphinx 1.4+ ([issue 1893](https://github.com/scrapy/scrapy/issues/1893))
* DOC: Consistency in selector examples ([issue 1869](https://github.com/scrapy/scrapy/issues/1869))

## Scrapy 1.0.5 (2016-02-04)

* FIX: [Backport] Ignore bogus links in LinkExtractors (fixes [issue 907](https://github.com/scrapy/scrapy/issues/907), [commit 108195e](https://github.com/scrapy/scrapy/commit/108195e))
* TST: Changed the buildbot makefile to use "pytest" ([commit 1f3d90a](https://github.com/scrapy/scrapy/commit/1f3d90a))
* DOC: Fixed typos in the tutorial and media-pipeline docs（ [commit 808a9ea](https://github.com/scrapy/scrapy/commit/808a9ea) and [commit 
803bd87](https://github.com/scrapy/scrapy/commit/803bd87) ）
* DOC: Add AjaxCrawlMiddleware to DOWNLOADER_MIDDLEWARES_BASE in the settings docs ([commit aa94121](https://github.com/scrapy/scrapy/commit/aa94121))

## Scrapy 1.0.4 (2015-12-30)

* Ignore the xlib/tx folder, depending on the Twisted version ([commit 7dfa979](https://github.com/scrapy/scrapy/commit/7dfa979))
* Run on the new Travis CI infrastructure ([commit 6e42f0b](https://github.com/scrapy/scrapy/commit/6e42f0b))
* Spelling fixes ([commit 823a1cc](https://github.com/scrapy/scrapy/commit/823a1cc))
* Escape nodename in the xmliter regex ([commit da3c155](https://github.com/scrapy/scrapy/commit/da3c155))
* Test XML node names with dots ([commit 4418fc3](https://github.com/scrapy/scrapy/commit/4418fc3))
* TST: Do not use a broken Pillow version in tests ([commit a55078c](https://github.com/scrapy/scrapy/commit/a55078c))
* Disable logging on the version command; closes #1426 ([commit 86fc330](https://github.com/scrapy/scrapy/commit/86fc330))
* Disable logging on the startproject command ([commit db4c9fe](https://github.com/scrapy/scrapy/commit/db4c9fe))
* Add a PyPI download stats badge ([commit df2b944](https://github.com/scrapy/scrapy/commit/df2b944))
* Do not run tests twice on Travis if a PR is made from a scrapy/scrapy branch ([commit a83ab41](https://github.com/scrapy/scrapy/commit/a83ab41))
* Add a Python 3 porting status badge to the README ([commit 73ac80d](https://github.com/scrapy/scrapy/commit/73ac80d))
* Fixed RFPDupeFilter persistence ([commit 97d080e](https://github.com/scrapy/scrapy/commit/97d080e))
* TST: A test to show that dupefilter persistence is not working ([commit 97f2fb3](https://github.com/scrapy/scrapy/commit/97f2fb3))
* Explicitly close the file in the file:// scheme handler ([commit d9b4850](https://github.com/scrapy/scrapy/commit/d9b4850))
* Disable the dupefilter in the shell ([commit c0d0734](https://github.com/scrapy/scrapy/commit/c0d0734))
* DOC: Add captions to the toctrees that appear in the sidebar ([commit aa239ad](https://github.com/scrapy/scrapy/commit/aa239ad))
* DOC: Removed pywin32 from the install instructions, as it is already declared as a dependency ([commit 10eb400](https://github.com/scrapy/scrapy/commit/10eb400))
* Added installation notes about using Conda on Windows and other OSes ([commit 1c3600a](https://github.com/scrapy/scrapy/commit/1c3600a))
* Fixed minor grammar issues ([commit 7f4ddd5](https://github.com/scrapy/scrapy/commit/7f4ddd5))
* Fixed a typo in the documentation（ [commit 
b71f677](https://github.com/scrapy/scrapy/commit/b71f677) ）
* Version 1 now exists ([commit 5456c0e](https://github.com/scrapy/scrapy/commit/5456c0e))
* Fix another invalid XPath error ([commit 0a1366e](https://github.com/scrapy/scrapy/commit/0a1366e))
* Fix `ValueError: Invalid XPath: //div/[id="not-exists"]/text()` in selectors.rst ([commit ca8d60f](https://github.com/scrapy/scrapy/commit/ca8d60f))
* Typo corrections ([commit 7067117](https://github.com/scrapy/scrapy/commit/7067117))
* Fix typos in downloader-middleware.rst and exceptions.rst, middlware -> middleware ([commit 32f115c](https://github.com/scrapy/scrapy/commit/32f115c))
* Add a note about Debian compatibility to the Ubuntu install section ([commit 23fda69](https://github.com/scrapy/scrapy/commit/23fda69))
* Replace the alternative OS X install workaround with virtualenv ([commit 98b63ee](https://github.com/scrapy/scrapy/commit/98b63ee))
* Reference Homebrew's homepage for installation instructions ([commit 1925db1](https://github.com/scrapy/scrapy/commit/1925db1))
* Add the oldest supported tox version to the contributing docs ([commit 5d10d6d](https://github.com/scrapy/scrapy/commit/5d10d6d))
* Note in the install docs that pip is already included in Python >= 2.7.9 ([commit 85c980e](https://github.com/scrapy/scrapy/commit/85c980e))
* Add non-Python dependencies to the Ubuntu install section of the docs ([commit fbd010d](https://github.com/scrapy/scrapy/commit/fbd010d))
* Add an OS X installation section to the docs ([commit d8f4cba](https://github.com/scrapy/scrapy/commit/d8f4cba))
* DOC(ENH): Specify the path to the RTD theme explicitly ([commit de73b1a](https://github.com/scrapy/scrapy/commit/de73b1a))
* Minor: scrapy.Spider docs grammar ([commit 1ddcc7b](https://github.com/scrapy/scrapy/commit/1ddcc7b))
* Make the common practices sample code match the comments ([commit 1b85bcf](https://github.com/scrapy/scrapy/commit/1b85bcf))
* nextcall repetitive calls (heartbeats) ([commit 55f7104](https://github.com/scrapy/scrapy/commit/55f7104))
* Backport fix: compatibility with Twisted 15.4.0 ([commit b262411](https://github.com/scrapy/scrapy/commit/b262411))
* Pin pytest to 2.7.3 ([commit a6535c2](https://github.com/scrapy/scrapy/commit/a6535c2))
* Merge pull request #1512 from mgedmin/patch-1 ([commit 8876111](https://github.com/scrapy/scrapy/commit/8876111))
* Merge pull request #1513 from mgedmin/patch-2 ([commit 5d4daf8](https://github.com/scrapy/scrapy/commit/5d4daf8))
* Typo（ [commit 
f8d0682](https://github.com/scrapy/scrapy/commit/f8d0682) ）
* Fix list formatting ([commit 5f83a93](https://github.com/scrapy/scrapy/commit/5f83a93))
* Fix the Scrapy squeue tests after recent changes to queuelib ([commit 3365c01](https://github.com/scrapy/scrapy/commit/3365c01))
* Merge pull request #1475 from rweindl/patch-1 ([commit 2d688cd](https://github.com/scrapy/scrapy/commit/2d688cd))
* Update tutorial.rst ([commit fbc1f25](https://github.com/scrapy/scrapy/commit/fbc1f25))
* Merge pull request #1449 from rhoekman/patch-1 ([commit 7d6538c](https://github.com/scrapy/scrapy/commit/7d6538c))
* Small grammatical change ([commit 8752294](https://github.com/scrapy/scrapy/commit/8752294))
* Add the OpenSSL version to the version command ([commit 13c45ac](https://github.com/scrapy/scrapy/commit/13c45ac))

## Scrapy 1.0.3 (2015-08-11)

* Add service_identity to Scrapy's install_requires ([commit cbc2501](https://github.com/scrapy/scrapy/commit/cbc2501))
* Workaround for travis#296 ([commit 66af9cd](https://github.com/scrapy/scrapy/commit/66af9cd))

## Scrapy 1.0.2 (2015-08-06)

* Twisted 15.3.0 does not raise PicklingError when serializing lambda functions ([commit b04dd7d](https://github.com/scrapy/scrapy/commit/b04dd7d))
* Minor method name fix ([commit 6f85c7f](https://github.com/scrapy/scrapy/commit/6f85c7f))
* Minor: scrapy.Spider grammar and clarity ([commit 9c9d2e0](https://github.com/scrapy/scrapy/commit/9c9d2e0))
* Advertise the support channels ([commit c63882b](https://github.com/scrapy/scrapy/commit/c63882b))
* Fixed typos ([commit a9ae7b0](https://github.com/scrapy/scrapy/commit/a9ae7b0))
* Fix a doc reference ([commit 7c8a4fe](https://github.com/scrapy/scrapy/commit/7c8a4fe))

## Scrapy 1.0.1 (2015-07-01)

* Unquote the request path before passing it to FTPClient, which already escapes paths ([commit cc00ad2](https://github.com/scrapy/scrapy/commit/cc00ad2))
* Include tests/ in the source distribution via MANIFEST.in ([commit eca227e](https://github.com/scrapy/scrapy/commit/eca227e))
* DOC: Fix the SelectJmes documentation ([commit b8567bc](https://github.com/scrapy/scrapy/commit/b8567bc))
* DOC: Bring Ubuntu and Arch Linux outside of the Windows subsection ([commit 392233f](https://github.com/scrapy/scrapy/commit/392233f))
* DOC: Remove the version suffix from the Ubuntu package ([commit 5303c66](https://github.com/scrapy/scrapy/commit/5303c66))
* DOC: Update the release date for 1.0（ [commit 
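c89fa29](https://github.com/scrapy/scrapy/commit/c89fa29) ）

## Scrapy 1.0.0 (2015-06-19)

You will find a lot of new features and bug fixes in this major release. Make sure to check our updated [overview](intro/overview.html#intro-overview) to get a glimpse of some of the changes, along with our brushed-up [tutorial](intro/tutorial.html#intro-tutorial).

### Support for returning dictionaries in spiders

Declaring and returning Scrapy Items is no longer necessary to collect the scraped data from your spider; you can now return explicit dictionaries instead.

_Classic version_

```py
import scrapy


class MyItem(scrapy.Item):
    url = scrapy.Field()


class MySpider(scrapy.Spider):
    name = "myspider"  # illustrative name; every spider needs one

    def parse(self, response):
        # Wrap the scraped data in a declared Item class.
        return MyItem(url=response.url)
```

_New version_

```py
import scrapy


class MySpider(scrapy.Spider):
    name = "myspider"  # illustrative name; every spider needs one

    def parse(self, response):
        # A plain dict is now accepted in place of an Item.
        return {'url': response.url}
```

### Per-spider settings (GSoC 2014)

Last year's Google Summer of Code project accomplished an important redesign of the mechanism used for populating settings, introducing explicit priorities to override any given setting. As an extension of that goal, we included a new priority level for settings that apply exclusively to a single spider, allowing them to redefine project settings.

Start using it by defining a [`custom_settings`](topics/spiders.html#scrapy.spiders.Spider.custom_settings "scrapy.spiders.Spider.custom_settings") class variable in your spider:

```py
import scrapy


class MySpider(scrapy.Spider):
    name = "myspider"  # illustrative name; every spider needs one

    # These values override the project-wide settings for this spider only.
    custom_settings = {
        "DOWNLOAD_DELAY": 5.0,
        "RETRY_ENABLED": False,
    }
```

Read more about settings population: [Settings](topics/settings.html#topics-settings)

### Python logging

Scrapy 1.0 has moved away from Twisted logging to Python's built-in logging as the default logging system. We are maintaining backward compatibility for most of the old custom interface for calling logging functions, but you will get warnings asking you to switch to the Python logging API entirely.

_Old version_

```py
from scrapy import log

log.msg('MESSAGE', log.INFO)
```

_New version_

```py
import logging

logging.info('MESSAGE')
```

Logging from spiders remains the same, but on top of the [`log()`](topics/spiders.html#scrapy.spiders.Spider.log "scrapy.spiders.Spider.log") method you now have access to a custom [`logger`](topics/spiders.html#scrapy.spiders.Spider.logger "scrapy.spiders.Spider.logger") created for the spider to issue log events:

```py
import scrapy


class MySpider(scrapy.Spider):
    name = "myspider"  # illustrative name; every spider needs one

    def parse(self, response):
        self.logger.info('Response received')
```

Read more in the logging documentation: [Logging](topics/logging.html#topics-logging)

### Crawler API refactoring (GSoC 2014)

Another milestone of the last Google Summer of Code was a refactoring of the internal API, seeking simpler and easier usage. Check the new core interface in: [Core API](topics/api.html#topics-api)

A common situation where you will face these changes is when running Scrapy from scripts. Here is a quick example of how to run a spider manually with the new API:

```py
from scrapy.crawler import CrawlerProcess

# MySpider is a spider class such as the ones defined above.
process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})

process.crawl(MySpider)
process.start()  # the script blocks here until the crawl is finished
```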
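Because Scrapy 1.0 routes its messages through Python's standard `logging` module, a script like the one above can shape log output with the standard library alone. The following is a minimal sketch, not Scrapy's own configuration code: the helper `build_spider_logger`, the logger name `myspider` and the in-memory stream (standing in for a log file) are all assumptions made for illustration.

```python
import io
import logging


def build_spider_logger(name, stream, level=logging.INFO):
    """Return a logger called `name` that writes formatted records to `stream`.

    Hypothetical helper for illustration; it uses only the stdlib logging API.
    """
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.propagate = False  # keep this example's output away from the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("[%(name)s] %(levelname)s: %(message)s"))
    logger.handlers = [handler]  # replace any handlers left by earlier calls
    return logger


# An in-memory buffer stands in for a log file so the result is easy to inspect.
buffer = io.StringIO()
log = build_spider_logger("myspider", buffer)

log.debug("dropped: below the INFO threshold")
log.info("Response received")

print(buffer.getvalue(), end="")
```

Only the `INFO` record reaches the buffer, because the logger's level filters out the `DEBUG` call before any handler sees it. The same per-logger level mechanism is what lets you quiet individual named loggers without touching the rest.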
Bear in mind that running spiders through this API is still under development, and the API may change until it reaches a stable status. See more examples of scripts running Scrapy: [Common Practices](topics/practices.html#topics-practices)

### Module relocations

There has been a large rearrangement of modules to improve the general structure of Scrapy. The main changes were separating various subpackages into new projects and dissolving both `scrapy.contrib` and `scrapy.contrib_exp` into top-level packages. Backward compatibility was kept for internal relocations; when importing deprecated modules, expect warnings indicating their new location.

#### Full list of relocations

Outsourced packages

Note: These extensions went through some minor changes, e.g. some setting names were changed. Please check the documentation in each new repository to get familiar with the new usage.

| Old location | New location |
| --- | --- |
| scrapy.commands.deploy | [scrapyd-client](https://github.com/scrapy/scrapyd-client) (see other alternatives: [Deploying Spiders](topics/deploy.html#topics-deploy)) |
| scrapy.contrib.djangoitem | [scrapy-djangoitem](https://github.com/scrapy-plugins/scrapy-djangoitem) |
| scrapy.webservice | [scrapy-jsonrpc](https://github.com/scrapy-plugins/scrapy-jsonrpc) |

`scrapy.contrib_exp` and `scrapy.contrib` dissolutions

| Old location | New location |
| --- | --- |
| scrapy.contrib_exp.downloadermiddleware.decompression | scrapy.downloadermiddlewares.decompression |
| scrapy.contrib_exp.iterators | scrapy.utils.iterators |
| scrapy.contrib.downloadermiddleware | scrapy.downloadermiddlewares |
| scrapy.contrib.exporter | scrapy.exporters |
| scrapy.contrib.linkextractors | scrapy.linkextractors |
| scrapy.contrib.loader | scrapy.loader |
| scrapy.contrib.loader.processor | scrapy.loader.processors |
| scrapy.contrib.pipeline | scrapy.pipelines |
| scrapy.contrib.spidermiddleware | scrapy.spidermiddlewares |
| scrapy.contrib.spiders | scrapy.spiders |
| scrapy.contrib.closespider, scrapy.contrib.corestats, scrapy.contrib.debug, scrapy.contrib.feedexport, scrapy.contrib.httpcache, scrapy.contrib.logstats, scrapy.contrib.memdebug, scrapy.contrib.memusage, scrapy.contrib.spiderstate, scrapy.contrib.statsmailer, scrapy.contrib.throttle | scrapy.extensions.* |

Plural renames and module unification

| Old location | New location |
| --- | --- |
| scrapy.command | scrapy.commands |
| scrapy.dupefilter | scrapy.dupefilters |
| scrapy.linkextractor | scrapy.linkextractors |
| scrapy.spider | scrapy.spiders |
| scrapy.squeue | scrapy.squeues |
| scrapy.statscol | scrapy.statscollectors |
| scrapy.utils.decorator | scrapy.utils.decorators |

Class renames

| Old location | New location |
| --- | --- |
| scrapy.spidermanager.SpiderManager | scrapy.spiderloader.SpiderLoader |

Settings renames

| Old location | New location |
| --- | --- |
| SPIDER_MANAGER_CLASS | SPIDER_LOADER_CLASS |

### Changelog

New features and enhancements

* Python logging ([issue 1060](https://github.com/scrapy/scrapy/issues/1060), [issue 1235](https://github.com/scrapy/scrapy/issues/1235), [issue 1236](https://github.com/scrapy/scrapy/issues/1236), [issue 1240](https://github.com/scrapy/scrapy/issues/1240), [issue 1259](https://github.com/scrapy/scrapy/issues/1259), [issue 1278](https://github.com/scrapy/scrapy/issues/1278), [issue 1286](https://github.com/scrapy/scrapy/issues/1286))
* FEED_EXPORT_FIELDS option ([issue 1159](https://github.com/scrapy/scrapy/issues/1159), [issue 1224](https://github.com/scrapy/scrapy/issues/1224))
* DNS cache size and timeout options ([issue 1132](https://github.com/scrapy/scrapy/issues/1132))
* Support namespace prefixes in xmliter_lxml ([issue 963](https://github.com/scrapy/scrapy/issues/963))
* Reactor thread pool maximum size setting ([issue 1123](https://github.com/scrapy/scrapy/issues/1123))
* Allow spiders to return dicts ([issue 1081](https://github.com/scrapy/scrapy/issues/1081))
* Add the Response.urljoin() helper ([issue 1086](https://github.com/scrapy/scrapy/issues/1086))
* Look in ~/.config/scrapy.cfg for user configuration ([issue 1098](https://github.com/scrapy/scrapy/issues/1098))
* Handle TLS SNI ([issue 1101](https://github.com/scrapy/scrapy/issues/1101))
* SelectorList extract first ([issue 624](https://github.com/scrapy/scrapy/issues/624), [issue 1145](https://github.com/scrapy/scrapy/issues/1145))
* Added JmesSelect ([issue 1016](https://github.com/scrapy/scrapy/issues/1016))
* Add gzip compression to the filesystem HTTP cache backend ([issue 1020](https://github.com/scrapy/scrapy/issues/1020))
* CSS support in link extractors ([issue 983](https://github.com/scrapy/scrapy/issues/983))
* httpcache dont_cache meta #19 #689 ([issue 821](https://github.com/scrapy/scrapy/issues/821))
* Add a signal to be sent when a request is dropped by the scheduler ([issue 961](https://github.com/scrapy/scrapy/issues/961))
* Avoid downloading large responses（ [issue 
946](https://github.com/scrapy/scrapy/issues/946) ）
* Allow specifying the quotechar in CSVFeedSpider ([issue 882](https://github.com/scrapy/scrapy/issues/882))
* Add the referer to the "Spider error processing" log message ([issue 795](https://github.com/scrapy/scrapy/issues/795))
* Process robots.txt once ([issue 896](https://github.com/scrapy/scrapy/issues/896))
* GSoC per-spider settings ([issue 854](https://github.com/scrapy/scrapy/issues/854))
* Add project name validation ([issue 817](https://github.com/scrapy/scrapy/issues/817))
* GSoC API cleanup ([issue 816](https://github.com/scrapy/scrapy/issues/816), [issue 1128](https://github.com/scrapy/scrapy/issues/1128), [issue 1147](https://github.com/scrapy/scrapy/issues/1147), [issue 1148](https://github.com/scrapy/scrapy/issues/1148), [issue 1156](https://github.com/scrapy/scrapy/issues/1156), [issue 1185](https://github.com/scrapy/scrapy/issues/1185), [issue 1187](https://github.com/scrapy/scrapy/issues/1187), [issue 1258](https://github.com/scrapy/scrapy/issues/1258), [issue 1268](https://github.com/scrapy/scrapy/issues/1268), [issue 1276](https://github.com/scrapy/scrapy/issues/1276), [issue 1285](https://github.com/scrapy/scrapy/issues/1285), [issue 1284](https://github.com/scrapy/scrapy/issues/1284))
* Be more responsive with IO operations ([issue 1074](https://github.com/scrapy/scrapy/issues/1074) and [issue 1075](https://github.com/scrapy/scrapy/issues/1075))
* Do LevelDB compaction for httpcache on closing ([issue 1297](https://github.com/scrapy/scrapy/issues/1297))

Deprecations and removals

* Deprecate the htmlparser link extractor ([issue 1205](https://github.com/scrapy/scrapy/issues/1205))
* Remove deprecated code from FeedExporter ([issue 1155](https://github.com/scrapy/scrapy/issues/1155))
* A leftover for 0.15 compatibility ([issue 925](https://github.com/scrapy/scrapy/issues/925))
* Drop support for CONCURRENT_REQUESTS_PER_SPIDER ([issue 895](https://github.com/scrapy/scrapy/issues/895))
* Drop old engine code ([issue 911](https://github.com/scrapy/scrapy/issues/911))
* Deprecate SgmlLinkExtractor ([issue 777](https://github.com/scrapy/scrapy/issues/777))

Relocations

* Move exporters/\_\_init\_\_.py to exporters.py（ [issue 
1242](https://github.com/scrapy/scrapy/issues/1242))
* Move base classes to their packages ([issue 1218](https://github.com/scrapy/scrapy/issues/1218), [issue 1233](https://github.com/scrapy/scrapy/issues/1233))
* Module relocation ([issue 1181](https://github.com/scrapy/scrapy/issues/1181), [issue 1210](https://github.com/scrapy/scrapy/issues/1210))
* Rename SpiderManager to SpiderLoader ([issue 1166](https://github.com/scrapy/scrapy/issues/1166))
* Remove djangoitem ([issue 1177](https://github.com/scrapy/scrapy/issues/1177))
* Remove the scrapy deploy command ([issue 1102](https://github.com/scrapy/scrapy/issues/1102))
* Dissolve contrib_exp ([issue 1134](https://github.com/scrapy/scrapy/issues/1134))
* Deleted bin folder from root, fixes #913 ([issue 914](https://github.com/scrapy/scrapy/issues/914))
* Remove the JSON-RPC based webservice ([issue 859](https://github.com/scrapy/scrapy/issues/859))
* Move test cases under the project root dir ([issue 827](https://github.com/scrapy/scrapy/issues/827), [issue 841](https://github.com/scrapy/scrapy/issues/841))
* Fix backward incompatibility for relocated paths in settings ([issue 1267](https://github.com/scrapy/scrapy/issues/1267))

Documentation

* CrawlerProcess documentation ([issue 1190](https://github.com/scrapy/scrapy/issues/1190))
* Favor web scraping over screen scraping in the descriptions ([issue 1188](https://github.com/scrapy/scrapy/issues/1188))
* Some improvements to the Scrapy tutorial ([issue 1180](https://github.com/scrapy/scrapy/issues/1180))
* Document the Files Pipeline together with the Images Pipeline ([issue 1150](https://github.com/scrapy/scrapy/issues/1150))
* Deployment docs tweaks ([issue 1164](https://github.com/scrapy/scrapy/issues/1164))
* Added deployment section covering scrapyd-deploy and shub ([issue 1124](https://github.com/scrapy/scrapy/issues/1124))
* Add more settings to the project template ([issue 1073](https://github.com/scrapy/scrapy/issues/1073))
* Some improvements to the overview page ([issue 1106](https://github.com/scrapy/scrapy/issues/1106))
* Updated link in docs/topics/architecture.rst ([issue 647](https://github.com/scrapy/scrapy/issues/647))
* DOC reorder topics ([issue 1022](https://github.com/scrapy/scrapy/issues/1022))
* Update list of Request.meta special keys ([issue 1071](https://github.com/scrapy/scrapy/issues/1071))
* DOC document download_timeout ([issue 898](https://github.com/scrapy/scrapy/issues/898))
* DOC simplify extension docs ([issue
893](https://github.com/scrapy/scrapy/issues/893))
* Leaks docs ([issue 894](https://github.com/scrapy/scrapy/issues/894))
* DOC document from_crawler method for item pipelines ([issue 904](https://github.com/scrapy/scrapy/issues/904))
* spider_error doesn't support deferreds ([issue 1292](https://github.com/scrapy/scrapy/issues/1292))
* Corrections and Sphinx-related fixes ([issue 1220](https://github.com/scrapy/scrapy/issues/1220), [issue 1219](https://github.com/scrapy/scrapy/issues/1219), [issue 1196](https://github.com/scrapy/scrapy/issues/1196), [issue 1172](https://github.com/scrapy/scrapy/issues/1172), [issue 1171](https://github.com/scrapy/scrapy/issues/1171), [issue 1169](https://github.com/scrapy/scrapy/issues/1169), [issue 1160](https://github.com/scrapy/scrapy/issues/1160), [issue 1154](https://github.com/scrapy/scrapy/issues/1154), [issue 1127](https://github.com/scrapy/scrapy/issues/1127), [issue 1112](https://github.com/scrapy/scrapy/issues/1112), [issue 1105](https://github.com/scrapy/scrapy/issues/1105), [issue 1041](https://github.com/scrapy/scrapy/issues/1041), [issue 1082](https://github.com/scrapy/scrapy/issues/1082), [issue 1033](https://github.com/scrapy/scrapy/issues/1033), [issue 944](https://github.com/scrapy/scrapy/issues/944), [issue 866](https://github.com/scrapy/scrapy/issues/866), [issue 864](https://github.com/scrapy/scrapy/issues/864), [issue 796](https://github.com/scrapy/scrapy/issues/796), [issue 1260](https://github.com/scrapy/scrapy/issues/1260), [issue 1271](https://github.com/scrapy/scrapy/issues/1271), [issue 1293](https://github.com/scrapy/scrapy/issues/1293), [issue 1298](https://github.com/scrapy/scrapy/issues/1298))

Bug fixes

* Item multi-inheritance fix ([issue 353](https://github.com/scrapy/scrapy/issues/353), [issue 1228](https://github.com/scrapy/scrapy/issues/1228))
* ItemLoader.load_item: iterate over a copy of fields ([issue 722](https://github.com/scrapy/scrapy/issues/722))
* Fix unhandled error in Deferred (RobotsTxtMiddleware) ([issue 1131](https://github.com/scrapy/scrapy/issues/1131), [issue
1197](https://github.com/scrapy/scrapy/issues/1197))
* Force reading DOWNLOAD_TIMEOUT as int ([issue 954](https://github.com/scrapy/scrapy/issues/954))
* scrapy.utils.misc.load_object should print the full traceback ([issue 902](https://github.com/scrapy/scrapy/issues/902))
* Fix bug for ".local" host name ([issue 878](https://github.com/scrapy/scrapy/issues/878))
* Fix enabled extensions, middlewares and pipelines info not being printed anymore ([issue 879](https://github.com/scrapy/scrapy/issues/879))
* Fix dont_merge_cookies bad behaviour when set to false on meta ([issue 846](https://github.com/scrapy/scrapy/issues/846))

Python 3 in-progress support

* Disable scrapy.telnet if twisted.conch is not available ([issue 1161](https://github.com/scrapy/scrapy/issues/1161))
* Fix Python 3 syntax errors in ajaxcrawl.py ([issue 1162](https://github.com/scrapy/scrapy/issues/1162))
* More Python 3 compatibility changes for urllib ([issue 1121](https://github.com/scrapy/scrapy/issues/1121))
* assertItemsEqual was renamed to assertCountEqual in Python 3 ([issue 1070](https://github.com/scrapy/scrapy/issues/1070))
* Import unittest.mock if available ([issue 1066](https://github.com/scrapy/scrapy/issues/1066))
* Updated deprecated cgi.parse_qsl to use six's parse_qsl ([issue 909](https://github.com/scrapy/scrapy/issues/909))
* Prevent Python 3 port regressions ([issue 830](https://github.com/scrapy/scrapy/issues/830))
* PY3: use MutableMapping for Python 3 ([issue 810](https://github.com/scrapy/scrapy/issues/810))
* PY3: use six.BytesIO and six.moves.cStringIO ([issue 803](https://github.com/scrapy/scrapy/issues/803))
* PY3: fix xmlrpclib and email imports ([issue 801](https://github.com/scrapy/scrapy/issues/801))
* PY3: use six for robotparser and urlparse ([issue 800](https://github.com/scrapy/scrapy/issues/800))
* PY3: use six.iterkeys, six.iteritems, and tempfile ([issue 799](https://github.com/scrapy/scrapy/issues/799))
* PY3: fix has_key and use six.moves.configparser ([issue 798](https://github.com/scrapy/scrapy/issues/798))
* PY3: use six.moves.cPickle ([issue 797](https://github.com/scrapy/scrapy/issues/797))
* PY3: make it possible to run some tests in Python 3 ([issue 776](https://github.com/scrapy/scrapy/issues/776))

Tests

* Remove unnecessary lines from py3-ignores ([issue 1243](https://github.com/scrapy/scrapy/issues/1243))
* Fix remaining warnings from pytest while collecting tests ([issue
1206](https://github.com/scrapy/scrapy/issues/1206))
* Add docs build to Travis ([issue 1234](https://github.com/scrapy/scrapy/issues/1234))
* TST don't collect tests from deprecated modules ([issue 1165](https://github.com/scrapy/scrapy/issues/1165))
* Install the service_identity package in tests to prevent warnings ([issue 1168](https://github.com/scrapy/scrapy/issues/1168))
* Fix deprecated settings API in tests ([issue 1152](https://github.com/scrapy/scrapy/issues/1152))
* Add test for webclient with POST method and no body given ([issue 1089](https://github.com/scrapy/scrapy/issues/1089))
* py3-ignores.txt supports comments ([issue 1044](https://github.com/scrapy/scrapy/issues/1044))
* Modernize some of the asserts ([issue 835](https://github.com/scrapy/scrapy/issues/835))
* `selector.__repr__` test ([issue 779](https://github.com/scrapy/scrapy/issues/779))

Code refactoring

* CSVFeedSpider cleanup: use iterate_spider_output ([issue 1079](https://github.com/scrapy/scrapy/issues/1079))
* Remove unnecessary check from scrapy.utils.spider.iter_spider_output ([issue 1078](https://github.com/scrapy/scrapy/issues/1078))
* Pydispatch pep8 ([issue 992](https://github.com/scrapy/scrapy/issues/992))
* Removed unused "load=False" parameter from walk_modules() ([issue 871](https://github.com/scrapy/scrapy/issues/871))
* For consistency, use the `job_dir` helper in the `SpiderState` extension ([issue 805](https://github.com/scrapy/scrapy/issues/805))
* Rename "sflo" local variables to the less cryptic "log_observer" ([issue 775](https://github.com/scrapy/scrapy/issues/775))

## Scrapy 0.24.6 (2015-04-20)

* Encode invalid xpath with unicode_escape under PY2 ([commit 07cb3e5](https://github.com/scrapy/scrapy/commit/07cb3e5))
* Fix IPython shell scope issue and load IPython user config ([commit 2c8e573](https://github.com/scrapy/scrapy/commit/2c8e573))
* Fix small typo in the docs ([commit d694019](https://github.com/scrapy/scrapy/commit/d694019))
* Fix small typo ([commit f92fa83](https://github.com/scrapy/scrapy/commit/f92fa83))
* Converted sel.xpath() calls to response.xpath() in "Extracting the data" ([commit c2c6d15](https://github.com/scrapy/scrapy/commit/c2c6d15))

## Scrapy 0.24.5 (2015-02-25)

* Support new _getEndpoint Agent signatures on Twisted 15.0.0 ([commit 540b9bc](https://github.com/scrapy/scrapy/commit/540b9bc))
* DOC a couple more references are fixed ([commit
b4c454b](https://github.com/scrapy/scrapy/commit/b4c454b))
* DOC fix a reference ([commit e3c1260](https://github.com/scrapy/scrapy/commit/e3c1260))
* t.i.b.ThreadedResolver is now a new-style class ([commit 9e13f42](https://github.com/scrapy/scrapy/commit/9e13f42))
* S3DownloadHandler: fix auth for requests with quoted paths/query params ([commit cdb9a0b](https://github.com/scrapy/scrapy/commit/cdb9a0b))
* Fixed the variable types in the MailSender documentation ([commit bb3a848](https://github.com/scrapy/scrapy/commit/bb3a848))
* Reset items_scraped instead of item_count ([commit edb07a4](https://github.com/scrapy/scrapy/commit/edb07a4))
* Tentative attention message about what document to read for contributions ([commit 7ee6f7a](https://github.com/scrapy/scrapy/commit/7ee6f7a))
* mitmproxy 0.10.1 needs netlib 0.10.1 too ([commit 874fcdd](https://github.com/scrapy/scrapy/commit/874fcdd))
* Pin mitmproxy 0.10.1, as >0.11 does not work with the tests ([commit c6b21f0](https://github.com/scrapy/scrapy/commit/c6b21f0))
* Test the parse command locally instead of against an external URL ([commit c3a6628](https://github.com/scrapy/scrapy/commit/c3a6628))
* Patch a Twisted issue while closing the connection pool on HTTPDownloadHandler ([commit d0bf957](https://github.com/scrapy/scrapy/commit/d0bf957))
* Update documentation on dynamic item classes ([commit eeb589a](https://github.com/scrapy/scrapy/commit/eeb589a))
* Merge pull request #943 from Lazar-T/patch-3 ([commit 5fdab02](https://github.com/scrapy/scrapy/commit/5fdab02))
* Typo ([commit b0ae199](https://github.com/scrapy/scrapy/commit/b0ae199))
* pywin32 is required by Twisted. Closes #937 ([commit 5cb0cfb](https://github.com/scrapy/scrapy/commit/5cb0cfb))
* Update install.rst ([commit 781286b](https://github.com/scrapy/scrapy/commit/781286b))
* Merge pull request #928 from Lazar-T/patch-1 ([commit b415d04](https://github.com/scrapy/scrapy/commit/b415d04))
* Comma instead of full stop ([commit 627b9ba](https://github.com/scrapy/scrapy/commit/627b9ba))
* Merge pull request #885 from jsma/patch-1 ([commit de909ad](https://github.com/scrapy/scrapy/commit/de909ad))
* Update request-response.rst ([commit 3f3263d](https://github.com/scrapy/scrapy/commit/3f3263d))
* SgmlLinkExtractor: fix for parsing `<area>` tag with Unicode present ([commit 49b40f0](https://github.com/scrapy/scrapy/commit/49b40f0))

## Scrapy 0.24.4 (2014-08-09)

* The PEM file is used by mockserver and required by scrapy
bench ([commit 5eddc68](https://github.com/scrapy/scrapy/commit/5eddc68))
* scrapy bench needs `scrapy.tests*` ([commit d6cb999](https://github.com/scrapy/scrapy/commit/d6cb999))

## Scrapy 0.24.3 (2014-08-09)

* No need to waste Travis CI time on PY3 for 0.24 ([commit 8e080c1](https://github.com/scrapy/scrapy/commit/8e080c1))
* Update installation docs ([commit 1d0c096](https://github.com/scrapy/scrapy/commit/1d0c096))
* There is a trove classifier for the Scrapy framework! ([commit 4c701d7](https://github.com/scrapy/scrapy/commit/4c701d7))
* Update other places where the w3lib version is mentioned ([commit d109c13](https://github.com/scrapy/scrapy/commit/d109c13))
* Update w3lib requirement to 1.8.0 ([commit 39d2ce5](https://github.com/scrapy/scrapy/commit/39d2ce5))
* Use w3lib.html.replace_entities() (remove_entities() is deprecated) ([commit 180d3ad](https://github.com/scrapy/scrapy/commit/180d3ad))
* Set zip_safe=False ([commit a51ee8b](https://github.com/scrapy/scrapy/commit/a51ee8b))
* Do not ship the tests package ([commit ee3b371](https://github.com/scrapy/scrapy/commit/ee3b371))
* scrapy.bat is not needed anymore ([commit c3861cf](https://github.com/scrapy/scrapy/commit/c3861cf))
* Modernize setup.py ([commit 362e322](https://github.com/scrapy/scrapy/commit/362e322))
* Headers cannot handle non-string values ([commit 94a5c65](https://github.com/scrapy/scrapy/commit/94a5c65))
* Fix FTP test cases ([commit a274a7f](https://github.com/scrapy/scrapy/commit/a274a7f))
* The Travis CI builds take about 50 minutes in total to complete ([commit ae1e2cc](https://github.com/scrapy/scrapy/commit/ae1e2cc))
* Update shell.rst typo ([commit e49c96a](https://github.com/scrapy/scrapy/commit/e49c96a))
* Remove weird indentation in the shell results ([commit 1ca489d](https://github.com/scrapy/scrapy/commit/1ca489d))
* Improved explanations, clarified the blog post as the source, added a link to the XPath string functions in the spec ([commit 65c8f05](https://github.com/scrapy/scrapy/commit/65c8f05))
* Renamed UserTimeoutError and ServerTimeoutError, #583 ([commit 037f6ab](https://github.com/scrapy/scrapy/commit/037f6ab))
* Add some XPath tips to the selectors docs ([commit 2d103e0](https://github.com/scrapy/scrapy/commit/2d103e0))
* Fix tests to account for https://github.com/scrapy/w3lib/pull/23 ([commit f8d366a](https://github.com/scrapy/scrapy/commit/f8d366a))
* get_func_args maximum recursion fix, #728 ([commit
81344ea](https://github.com/scrapy/scrapy/commit/81344ea))
* Updated input/output processor example according to #560 ([commit f7c4ea8](https://github.com/scrapy/scrapy/commit/f7c4ea8))
* Fixed Python syntax in the tutorial ([commit db59ed9](https://github.com/scrapy/scrapy/commit/db59ed9))
* Add test case for tunneling proxy ([commit f090260](https://github.com/scrapy/scrapy/commit/f090260))
* Bug fix for leaking the Proxy-Authorization header to the remote host when using tunneling ([commit d8793af](https://github.com/scrapy/scrapy/commit/d8793af))
* Extract links from XHTML documents with MIME type "application/xml" ([commit ed1f376](https://github.com/scrapy/scrapy/commit/ed1f376))
* Merge pull request #793 from roysc/patch-1 ([commit 91a1106](https://github.com/scrapy/scrapy/commit/91a1106))
* Fix typo in commands.rst ([commit 743e1e2](https://github.com/scrapy/scrapy/commit/743e1e2))
* Better test case for settings.overrides.setdefault ([commit e22daaf](https://github.com/scrapy/scrapy/commit/e22daaf))
* Use CRLF as line marker according to the HTTP 1.1 definition ([commit 5ec430b](https://github.com/scrapy/scrapy/commit/5ec430b))

## Scrapy 0.24.2 (2014-07-08)

* Use a mutable mapping to proxy the deprecated settings.overrides and settings.defaults attributes ([commit e5e8133](https://github.com/scrapy/scrapy/commit/e5e8133))
* There is no support for Python 3 yet ([commit 3cd6146](https://github.com/scrapy/scrapy/commit/3cd6146))
* Update the set of compatible Python versions for the Debian packages ([commit fa5d76b](https://github.com/scrapy/scrapy/commit/fa5d76b))
* DOC fix formatting in release notes ([commit c6a9e20](https://github.com/scrapy/scrapy/commit/c6a9e20))

## Scrapy 0.24.1 (2014-06-27)

* Fix deprecated CrawlerSettings and increase backward compatibility with the .defaults attribute ([commit 8e3f20a](https://github.com/scrapy/scrapy/commit/8e3f20a))

## Scrapy 0.24.0 (2014-06-26)

### Enhancements

* Improve Scrapy top-level namespace ([issue 494](https://github.com/scrapy/scrapy/issues/494), [issue 684](https://github.com/scrapy/scrapy/issues/684))
* Add selector shortcuts to responses ([issue 554](https://github.com/scrapy/scrapy/issues/554), [issue 690](https://github.com/scrapy/scrapy/issues/690))
* Add new lxml-based LinkExtractor to replace the unmaintained SgmlLinkExtractor ([issue 559](https://github.com/scrapy/scrapy/issues/559), [issue 761](https://github.com/scrapy/scrapy/issues/761), [issue
763](https://github.com/scrapy/scrapy/issues/763))
* Clean up the settings API, part of the per-spider settings **GSoC project** ([issue 737](https://github.com/scrapy/scrapy/issues/737))
* Add UTF-8 encoding header to templates ([issue 688](https://github.com/scrapy/scrapy/issues/688), [issue 762](https://github.com/scrapy/scrapy/issues/762))
* Telnet console now binds to 127.0.0.1 by default ([issue 699](https://github.com/scrapy/scrapy/issues/699))
* Update Debian/Ubuntu install instructions ([issue 509](https://github.com/scrapy/scrapy/issues/509), [issue 549](https://github.com/scrapy/scrapy/issues/549))
* Disable smart strings in lxml XPath evaluations ([issue 535](https://github.com/scrapy/scrapy/issues/535))
* Restore the filesystem-based cache as the default for the HTTP cache middleware ([issue 541](https://github.com/scrapy/scrapy/issues/541), [issue 500](https://github.com/scrapy/scrapy/issues/500), [issue 571](https://github.com/scrapy/scrapy/issues/571))
* Expose the current crawler in the Scrapy shell ([issue 557](https://github.com/scrapy/scrapy/issues/557))
* Improve the test suite comparing CSV and XML exporters ([issue 570](https://github.com/scrapy/scrapy/issues/570))
* New `offsite/filtered` and `offsite/domains` stats ([issue 566](https://github.com/scrapy/scrapy/issues/566))
* Support process_links as a generator in CrawlSpider ([issue 555](https://github.com/scrapy/scrapy/issues/555))
* Verbose logging and new stats counters for DupeFilter ([issue 553](https://github.com/scrapy/scrapy/issues/553))
* Add a mimetype parameter to `MailSender.send()` ([issue 602](https://github.com/scrapy/scrapy/issues/602))
* Generalize file pipeline log messages ([issue 622](https://github.com/scrapy/scrapy/issues/622))
* Replace unencodeable codepoints with HTML entities in SgmlLinkExtractor ([issue 565](https://github.com/scrapy/scrapy/issues/565))
* Converted SEP documents to rst format ([issue 629](https://github.com/scrapy/scrapy/issues/629), [issue 630](https://github.com/scrapy/scrapy/issues/630), [issue 638](https://github.com/scrapy/scrapy/issues/638), [issue 632](https://github.com/scrapy/scrapy/issues/632), [issue 636](https://github.com/scrapy/scrapy/issues/636), [issue 640](https://github.com/scrapy/scrapy/issues/640), [issue 635](https://github.com/scrapy/scrapy/issues/635), [issue
634](https://github.com/scrapy/scrapy/issues/634), [issue 639](https://github.com/scrapy/scrapy/issues/639), [issue 637](https://github.com/scrapy/scrapy/issues/637), [issue 631](https://github.com/scrapy/scrapy/issues/631), [issue 633](https://github.com/scrapy/scrapy/issues/633), [issue 641](https://github.com/scrapy/scrapy/issues/641), [issue 642](https://github.com/scrapy/scrapy/issues/642))
* Tests and docs for clickdata's nr index in FormRequest ([issue 646](https://github.com/scrapy/scrapy/issues/646), [issue 645](https://github.com/scrapy/scrapy/issues/645))
* Allow disabling a downloader handler just like any other component ([issue 650](https://github.com/scrapy/scrapy/issues/650))
* Log when a request is discarded after too many redirections ([issue 654](https://github.com/scrapy/scrapy/issues/654))
* Log error responses if they are not handled by spider callbacks ([issue 612](https://github.com/scrapy/scrapy/issues/612), [issue 656](https://github.com/scrapy/scrapy/issues/656))
* Add content-type check to the HTTP compression middleware ([issue 193](https://github.com/scrapy/scrapy/issues/193), [issue 660](https://github.com/scrapy/scrapy/issues/660))
* Run PyPy tests using the latest PyPy from the PPA ([issue 674](https://github.com/scrapy/scrapy/issues/674))
* Run the test suite using pytest instead of trial ([issue 679](https://github.com/scrapy/scrapy/issues/679))
* Build docs and check for dead links in the tox environment ([issue 687](https://github.com/scrapy/scrapy/issues/687))
* Make scrapy.version_info a tuple of integers ([issue 681](https://github.com/scrapy/scrapy/issues/681), [issue 692](https://github.com/scrapy/scrapy/issues/692))
* Infer the exporter's output format from filename extensions ([issue 546](https://github.com/scrapy/scrapy/issues/546), [issue 659](https://github.com/scrapy/scrapy/issues/659), [issue 760](https://github.com/scrapy/scrapy/issues/760))
* Support case-insensitive domains in `url_is_from_any_domain()` ([issue 693](https://github.com/scrapy/scrapy/issues/693))
* Remove pep8 warnings in project and spider templates ([issue 698](https://github.com/scrapy/scrapy/issues/698))
* Tests and docs for the `request_fingerprint` function ([issue 597](https://github.com/scrapy/scrapy/issues/597))
* Update SEP-19 for the GSoC project `per-spider settings` ([issue 705](https://github.com/scrapy/scrapy/issues/705))
* Set exit code to non-zero when contracts fail ([issue 727](https://github.com/scrapy/scrapy/issues/727))
* Add a setting to control which class is instantiated as the Downloader component ([issue 738](https://github.com/scrapy/scrapy/issues/738))
* Pass response in the `item_dropped` signal ([issue 724](https://github.com/scrapy/scrapy/issues/724))
* Improve the `scrapy check` contracts command ([issue 733](https://github.com/scrapy/scrapy/issues/733), [issue 752](https://github.com/scrapy/scrapy/issues/752))
* Document the `spider.closed()` shortcut ([issue 719](https://github.com/scrapy/scrapy/issues/719))
* Document the `request_scheduled` signal ([issue 746](https://github.com/scrapy/scrapy/issues/746))
* Add a note about reporting security issues ([issue 697](https://github.com/scrapy/scrapy/issues/697))
* Add LevelDB HTTP cache storage backend ([issue 626](https://github.com/scrapy/scrapy/issues/626), [issue 500](https://github.com/scrapy/scrapy/issues/500))
* Sort the spider list output of the `scrapy list` command ([issue 742](https://github.com/scrapy/scrapy/issues/742))
* Multiple documentation enhancements and fixes ([issue 575](https://github.com/scrapy/scrapy/issues/575), [issue 587](https://github.com/scrapy/scrapy/issues/587), [issue 590](https://github.com/scrapy/scrapy/issues/590), [issue 596](https://github.com/scrapy/scrapy/issues/596), [issue 610](https://github.com/scrapy/scrapy/issues/610), [issue 617](https://github.com/scrapy/scrapy/issues/617), [issue 618](https://github.com/scrapy/scrapy/issues/618), [issue 627](https://github.com/scrapy/scrapy/issues/627), [issue 613](https://github.com/scrapy/scrapy/issues/613), [issue 643](https://github.com/scrapy/scrapy/issues/643), [issue 654](https://github.com/scrapy/scrapy/issues/654), [issue 675](https://github.com/scrapy/scrapy/issues/675), [issue 663](https://github.com/scrapy/scrapy/issues/663), [issue 711](https://github.com/scrapy/scrapy/issues/711), [issue 714](https://github.com/scrapy/scrapy/issues/714))

### Bug fixes

* Encode unicode URL value when creating Links in RegexLinkExtractor ([issue 561](https://github.com/scrapy/scrapy/issues/561))
* Ignore None values in ItemLoader processors ([issue 556](https://github.com/scrapy/scrapy/issues/556))
* Fix link text when there is an inner tag in SgmlLinkExtractor and HtmlParserLinkExtractor ([issue 485](https://github.com/scrapy/scrapy/issues/485), [issue 574](https://github.com/scrapy/scrapy/issues/574))
* Fix wrong checks on subclassing of deprecated classes ([issue 581](https://github.com/scrapy/scrapy/issues/581), [issue 584](https://github.com/scrapy/scrapy/issues/584))
* Handle errors caused by inspect.stack() failures ([issue 582](https://github.com/scrapy/scrapy/issues/582))
* Fix a reference to a nonexistent engine attribute ([issue 593](https://github.com/scrapy/scrapy/issues/593), [issue 594](https://github.com/scrapy/scrapy/issues/594))
* Fix the dynamic item class example's usage of type() ([issue 603](https://github.com/scrapy/scrapy/issues/603))
* Use lucasdemarchi/codespell to fix typos ([issue 628](https://github.com/scrapy/scrapy/issues/628))
* Fix the default value of the attrs argument in SgmlLinkExtractor to be a tuple ([issue 661](https://github.com/scrapy/scrapy/issues/661))
* Fix XXE flaw in the sitemap reader ([issue 676](https://github.com/scrapy/scrapy/issues/676))
* Fix the engine to support filtered start requests ([issue 707](https://github.com/scrapy/scrapy/issues/707))
* Fix the offsite middleware case on URLs with no hostnames ([issue 745](https://github.com/scrapy/scrapy/issues/745))
* The test suite doesn't require PIL anymore ([issue 585](https://github.com/scrapy/scrapy/issues/585))

## Scrapy 0.22.2 (released 2014-02-14)

* Fix a reference to the nonexistent engine.slots. Closes #593 ([commit 13c099a](https://github.com/scrapy/scrapy/commit/13c099a))
* downloaderMW doc typo (spiderMW doc copy remnant) ([commit 8ae11bf](https://github.com/scrapy/scrapy/commit/8ae11bf))
* Correct typos ([commit 1346037](https://github.com/scrapy/scrapy/commit/1346037))

## Scrapy 0.22.1 (released 2014-02-08)

* localhost666 can resolve under certain circumstances ([commit 2ec2279](https://github.com/scrapy/scrapy/commit/2ec2279))
* Test inspect.stack failure ([commit cc3eda3](https://github.com/scrapy/scrapy/commit/cc3eda3))
* Handle cases when inspect.stack() fails ([commit 8cb44f9](https://github.com/scrapy/scrapy/commit/8cb44f9))
* Fix wrong checks on subclassing of deprecated classes. Closes #581 ([commit 46d98d6](https://github.com/scrapy/scrapy/commit/46d98d6))
* Docs: 4-space indent for the final spider example ([commit 13846de](https://github.com/scrapy/scrapy/commit/13846de))
* Fix HtmlParserLinkExtractor and tests after the #485 merge ([commit
368a946](https://github.com/scrapy/scrapy/commit/368a946))
* BaseSgmlLinkExtractor: fixed the missing space when the link has an inner tag ([commit b566388](https://github.com/scrapy/scrapy/commit/b566388))
* BaseSgmlLinkExtractor: added unit test of a link with an inner tag ([commit c1cb418](https://github.com/scrapy/scrapy/commit/c1cb418))
* BaseSgmlLinkExtractor: fixed unknown_endtag() so that it only sets current_link=None when the end tag matches the opening tag ([commit 7e4d627](https://github.com/scrapy/scrapy/commit/7e4d627))
* Fix tests for the Travis CI build ([commit 76c7e20](https://github.com/scrapy/scrapy/commit/76c7e20))
* Replace unencodeable codepoints with HTML entities. Fixes #562 and #285 ([commit 5f87b17](https://github.com/scrapy/scrapy/commit/5f87b17))
* RegexLinkExtractor: encode URL unicode value when creating Links ([commit d0ee545](https://github.com/scrapy/scrapy/commit/d0ee545))
* Updated the tutorial crawl output with the latest output ([commit 8da65de](https://github.com/scrapy/scrapy/commit/8da65de))
* Updated the shell docs with the crawler reference and fixed the actual shell output ([commit 875b9ab](https://github.com/scrapy/scrapy/commit/875b9ab))
* PEP8 minor edits ([commit f89efaf](https://github.com/scrapy/scrapy/commit/f89efaf))
* Expose the current crawler in the Scrapy shell ([commit 5349cec](https://github.com/scrapy/scrapy/commit/5349cec))
* Unused re import and PEP8 minor edits ([commit 387f414](https://github.com/scrapy/scrapy/commit/387f414))
* Ignore None values when using the ItemLoader ([commit 0632546](https://github.com/scrapy/scrapy/commit/0632546))
* DOC fixed HTTPCACHE_STORAGE typo in the default value, which is now Filesystem instead of Dbm ([commit cde9a8c](https://github.com/scrapy/scrapy/commit/cde9a8c))
* Show Ubuntu setup instructions as literal code ([commit fb5c9c5](https://github.com/scrapy/scrapy/commit/fb5c9c5))
* Update Ubuntu installation instructions ([commit 70fb105](https://github.com/scrapy/scrapy/commit/70fb105))
* Merge pull request #550 from stray-leone/patch-1 ([commit 6f70b6a](https://github.com/scrapy/scrapy/commit/6f70b6a))
* Modify the version of the Scrapy Ubuntu package ([commit 725900d](https://github.com/scrapy/scrapy/commit/725900d))
* Fix the 0.22.0 release date ([commit af0219a](https://github.com/scrapy/scrapy/commit/af0219a))
* Fix typos in news.rst and remove the (not released yet) header ([commit b7f58f4](https://github.com/scrapy/scrapy/commit/b7f58f4))

## Scrapy 0.22.0 (released 2014-01-17)

### Enhancements

* [**Backward incompatible**] Switched the HTTPCacheMiddleware backend to filesystem ([issue 541](https://github.com/scrapy/scrapy/issues/541)). To restore the old backend, set `HTTPCACHE_STORAGE` to `scrapy.contrib.httpcache.DbmCacheStorage`
* Proxy https:// URLs using the CONNECT method ([issue 392](https://github.com/scrapy/scrapy/issues/392), [issue 397](https://github.com/scrapy/scrapy/issues/397))
* Add a middleware to crawl AJAX crawlable pages as defined by Google ([issue 343](https://github.com/scrapy/scrapy/issues/343))
* Rename scrapy.spider.BaseSpider to scrapy.spider.Spider ([issue 510](https://github.com/scrapy/scrapy/issues/510), [issue 519](https://github.com/scrapy/scrapy/issues/519))
* Selectors register EXSLT namespaces by default ([issue 472](https://github.com/scrapy/scrapy/issues/472))
* Unify item loaders, similar to the selectors renaming ([issue 461](https://github.com/scrapy/scrapy/issues/461))
* Make the `RFPDupeFilter` class easily subclassable ([issue 533](https://github.com/scrapy/scrapy/issues/533))
* Improve test coverage and forthcoming Python 3 support ([issue 525](https://github.com/scrapy/scrapy/issues/525))
* Promote startup info on settings and middleware to INFO level ([issue 520](https://github.com/scrapy/scrapy/issues/520))
* Support partials in the `get_func_args` util ([issue 506](https://github.com/scrapy/scrapy/issues/506), issue 504)
* Allow running individual tests via tox ([issue 503](https://github.com/scrapy/scrapy/issues/503))
* Update the extensions ignored by link extractors ([issue 498](https://github.com/scrapy/scrapy/issues/498))
* Add middleware methods to get files/images/thumbs paths ([issue 490](https://github.com/scrapy/scrapy/issues/490))
* Improve offsite middleware tests ([issue 478](https://github.com/scrapy/scrapy/issues/478))
* Add a way to skip the default Referer header set by RefererMiddleware ([issue 475](https://github.com/scrapy/scrapy/issues/475))
* Do not send `x-gzip` in the default `Accept-Encoding` header ([issue 469](https://github.com/scrapy/scrapy/issues/469))
* Support defining HTTP error handling using settings ([issue 466](https://github.com/scrapy/scrapy/issues/466))
* Use modern Python idioms wherever legacy code is found ([issue 497](https://github.com/scrapy/scrapy/issues/497))
* Improve and correct documentation ([issue 527](https://github.com/scrapy/scrapy/issues/527), [issue 524](https://github.com/scrapy/scrapy/issues/524), [issue 521](https://github.com/scrapy/scrapy/issues/521), [issue
517](https://github.com/scrapy/scrapy/issues/517), [issue 512](https://github.com/scrapy/scrapy/issues/512), [issue 505](https://github.com/scrapy/scrapy/issues/505), [issue 502](https://github.com/scrapy/scrapy/issues/502), [issue 489](https://github.com/scrapy/scrapy/issues/489), [issue 465](https://github.com/scrapy/scrapy/issues/465), [issue 460](https://github.com/scrapy/scrapy/issues/460), [issue 425](https://github.com/scrapy/scrapy/issues/425), [issue 536](https://github.com/scrapy/scrapy/issues/536))

### Fixes

* Update Selector class imports in the CrawlSpider template ([issue 484](https://github.com/scrapy/scrapy/issues/484))
* Fix a nonexistent reference to `engine.slots` ([issue 464](https://github.com/scrapy/scrapy/issues/464))
* Do not try to call `body_as_unicode()` on a non-TextResponse instance ([issue 462](https://github.com/scrapy/scrapy/issues/462))
* Warn when subclassing XPathItemLoader; previously it only warned on instantiation ([issue 523](https://github.com/scrapy/scrapy/issues/523))
* Warn when subclassing XPathSelector; previously it only warned on instantiation ([issue 537](https://github.com/scrapy/scrapy/issues/537))
* Multiple fixes to memory stats ([issue 531](https://github.com/scrapy/scrapy/issues/531), [issue 530](https://github.com/scrapy/scrapy/issues/530), [issue 529](https://github.com/scrapy/scrapy/issues/529))
* Fix overriding the URL in `FormRequest.from_response()` ([issue 507](https://github.com/scrapy/scrapy/issues/507))
* Fix the tests runner under pip 1.5 ([issue 513](https://github.com/scrapy/scrapy/issues/513))
* Fix logging error when the spider name is unicode ([issue 479](https://github.com/scrapy/scrapy/issues/479))

## Scrapy 0.20.2 (released 2013-12-09)

* Update the CrawlSpider template with Selector changes ([commit 6d1457d](https://github.com/scrapy/scrapy/commit/6d1457d))
* Fix method name in the tutorial. Closes GH-480 ([commit b4fc359](https://github.com/scrapy/scrapy/commit/b4fc359))

## Scrapy 0.20.1 (released 2013-11-28)

* include_package_data is required to build wheels from published sources ([commit 5ba1ad5](https://github.com/scrapy/scrapy/commit/5ba1ad5))
* process_parallel was leaking the failures on its internal deferreds. Closes #458 ([commit 419a780](https://github.com/scrapy/scrapy/commit/419a780))

## Scrapy 0.20.0 (released 2013-11-08)

### Enhancements

* New Selector API including CSS selectors ([issue
395](https://github.com/scrapy/scrapy/issues/395) and [issue 426](https://github.com/scrapy/scrapy/issues/426))
* Request/Response url/body attributes are now immutable (modifying them had been deprecated for a long time)
* [`ITEM_PIPELINES`](topics/settings.html#std:setting-ITEM_PIPELINES) is now defined as a dict (instead of a list)
* SitemapSpider can fetch alternate URLs ([issue 360](https://github.com/scrapy/scrapy/issues/360))
* `Selector.remove_namespaces()` now removes namespaces from element attributes ([issue 416](https://github.com/scrapy/scrapy/issues/416))
* Paved the road for Python 3.3+ ([issue 435](https://github.com/scrapy/scrapy/issues/435), [issue 436](https://github.com/scrapy/scrapy/issues/436), [issue 431](https://github.com/scrapy/scrapy/issues/431), [issue 452](https://github.com/scrapy/scrapy/issues/452))
* New item exporter using native Python types with nesting support ([issue 366](https://github.com/scrapy/scrapy/issues/366))
* Tune the HTTP1.1 pool size so it matches the concurrency defined by settings ([commit b43b5f575](https://github.com/scrapy/scrapy/commit/b43b5f575))
* scrapy.mail.MailSender can now connect over TLS or upgrade using STARTTLS ([issue 327](https://github.com/scrapy/scrapy/issues/327))
* New FilesPipeline with functionality factored out from ImagesPipeline ([issue 370](https://github.com/scrapy/scrapy/issues/370), [issue 409](https://github.com/scrapy/scrapy/issues/409))
* Recommend Pillow instead of PIL for image handling ([issue 317](https://github.com/scrapy/scrapy/issues/317))
* Added Debian packages for Ubuntu Quantal and Raring ([commit 86230c0](https://github.com/scrapy/scrapy/commit/86230c0))
* Mock server (used for tests) can listen for HTTPS requests ([issue 410](https://github.com/scrapy/scrapy/issues/410))
* Remove multi-spider support from multiple core components ([issue 422](https://github.com/scrapy/scrapy/issues/422), [issue 421](https://github.com/scrapy/scrapy/issues/421), [issue 420](https://github.com/scrapy/scrapy/issues/420), [issue 419](https://github.com/scrapy/scrapy/issues/419), [issue 423](https://github.com/scrapy/scrapy/issues/423), [issue 418](https://github.com/scrapy/scrapy/issues/418))
* Travis CI now tests Scrapy changes against development versions of the `w3lib` and `queuelib` Python packages.
* Add PyPy 2.1 to the continuous integration tests ([commit ecfa7431](https://github.com/scrapy/scrapy/commit/ecfa7431))
* Pylinted, pep8 and removed old-style exceptions from source ([issue
430](https://github.com/scrapy/scrapy/issues/430), [issue 432](https://github.com/scrapy/scrapy/issues/432))
* Use importlib for parametric imports ([issue 445](https://github.com/scrapy/scrapy/issues/445))
* Handle a regression introduced in Python 2.7.5 that affects XmlItemExporter ([issue 372](https://github.com/scrapy/scrapy/issues/372))
* Bug fix for crawling shutdown on SIGINT ([issue 450](https://github.com/scrapy/scrapy/issues/450))
* Do not submit `reset` type inputs in FormRequest.from_response ([commit b326b87](https://github.com/scrapy/scrapy/commit/b326b87))
* Do not silence download errors when the request errback raises an exception ([commit 684cfc0](https://github.com/scrapy/scrapy/commit/684cfc0))

### Bug fixes

* Fix tests under Django 1.6 ([commit b6bed44c](https://github.com/scrapy/scrapy/commit/b6bed44c))
* Lots of bug fixes to the retry middleware under disconnections using the HTTP 1.1 download handler
* Fix inconsistencies among Twisted releases ([issue 406](https://github.com/scrapy/scrapy/issues/406))
* Fix Scrapy shell bugs ([issue 418](https://github.com/scrapy/scrapy/issues/418), [issue 407](https://github.com/scrapy/scrapy/issues/407))
* Fix invalid variable name in setup.py ([issue 429](https://github.com/scrapy/scrapy/issues/429))
* Fix tutorial references ([issue 387](https://github.com/scrapy/scrapy/issues/387))
* Improve request-response docs ([issue 391](https://github.com/scrapy/scrapy/issues/391))
* Improve best practices docs ([issue 399](https://github.com/scrapy/scrapy/issues/399), [issue 400](https://github.com/scrapy/scrapy/issues/400), [issue 401](https://github.com/scrapy/scrapy/issues/401), [issue 402](https://github.com/scrapy/scrapy/issues/402))
* Improve Django integration docs ([issue 404](https://github.com/scrapy/scrapy/issues/404))
* Document the `bindaddress` request meta ([commit 37c24e01d7](https://github.com/scrapy/scrapy/commit/37c24e01d7))
* Improve the `Request` class documentation ([issue 226](https://github.com/scrapy/scrapy/issues/226))

### Other

* Dropped Python 2.6 support ([issue 448](https://github.com/scrapy/scrapy/issues/448))
* Add the [cssselect](https://github.com/SimonSapin/cssselect) Python package as an install dependency
* Drop libxml2 and multi-selector backend support; [lxml](http://lxml.de/) is required from now on.
* Minimum Twisted version increased to 10.0.0; dropped Twisted 8.0 support.
* Running the test suite now requires the `mock` Python library ([issue 390](https://github.com/scrapy/scrapy/issues/390))

### Thanks

Thanks to everyone who contributed to this release! List of contributors sorted by number of commits:

```py
69 Daniel Graña <dangra@...>
37 Pablo Hoffman <pablo@...>
13 Mikhail Korobov <kmike84@...>
 9 Alex Cepoi <alex.cepoi@...>
 9 alexanderlukanin13 <alexander.lukanin.13@...>
 8 Rolando Espinoza La fuente <darkrho@...>
 8 Lukasz Biedrycki <lukasz.biedrycki@...>
 6 Nicolas Ramirez <nramirez.uy@...>
 3 Paul Tremberth <paul.tremberth@...>
 2 Martin Olveyra <molveyra@...>
 2 Stefan <misc@...>
 2 Rolando Espinoza <darkrho@...>
 2 Loren Davie <loren@...>
 2 irgmedeiros <irgmedeiros@...>
 1 Stefan Koch <taikano@...>
 1 Stefan <cct@...>
 1 scraperdragon <dragon@...>
 1 Kumara Tharmalingam <ktharmal@...>
 1 Francesco Piccinno <stack.box@...>
 1 Marcos Campal <duendex@...>
 1 Dragon Dave <dragon@...>
 1 Capi Etheriel <barraponto@...>
 1 cacovsky <amarquesferraz@...>
 1 Berend Iwema <berend@...>
```

## Scrapy 0.18.4 (released 2013-10-10)

* IPython refuses to update the namespace. Fixes #396 ([commit 3d32c4f](https://github.com/scrapy/scrapy/commit/3d32c4f))
* Fix AlreadyCalledError when replacing a request in the shell command. Closes #407 ([commit b1d8919](https://github.com/scrapy/scrapy/commit/b1d8919))
* Fix start_requests laziness and early hangs ([commit 89faf52](https://github.com/scrapy/scrapy/commit/89faf52))

## Scrapy 0.18.3 (released 2013-10-03)

* Fix regression on lazy evaluation of start requests ([commit 12693a5](https://github.com/scrapy/scrapy/commit/12693a5))
* Forms: do not submit reset inputs ([commit e429f63](https://github.com/scrapy/scrapy/commit/e429f63))
* Increase unittest timeouts to decrease Travis false-positive failures ([commit 912202e](https://github.com/scrapy/scrapy/commit/912202e))
* Backport master fixes to the JSON exporter ([commit cfc2d46](https://github.com/scrapy/scrapy/commit/cfc2d46))
* Fix permissions and set umask before generating the sdist tarball ([commit 06149e0](https://github.com/scrapy/scrapy/commit/06149e0))

## Scrapy 0.18.2 (released 2013-09-03)

* Backport `scrapy check` command fixes and backward-compatible multi-crawler process ([issue 339](https://github.com/scrapy/scrapy/issues/339))

## Scrapy 0.18.1 (released 2013-08-27)

* Remove extra import added by cherry-picked changes ([commit d20304e](https://github.com/scrapy/scrapy/commit/d20304e))
* Fix crawling tests under Twisted pre 11.0.0 ([commit 1994f38](https://github.com/scrapy/scrapy/commit/1994f38))
* py26 cannot format zero-length fields `{}` ([commit
abf756f](https://github.com/scrapy/scrapy/commit/abf756f))
* Test PotentialDataLoss errors on unbound responses ([commit b15470d](https://github.com/scrapy/scrapy/commit/b15470d))
* Treat responses without Content-Length or Transfer-Encoding as good responses ([commit c4bf324](https://github.com/scrapy/scrapy/commit/c4bf324))
* Do not include ResponseFailed if the http11 handler is not enabled ([commit 6cbe684](https://github.com/scrapy/scrapy/commit/6cbe684))
* New HTTP client wraps lost connections in a ResponseFailed exception. Fixes #373 ([commit 1a20bba](https://github.com/scrapy/scrapy/commit/1a20bba))
* Limit the Travis CI build matrix ([commit 3b01bb8](https://github.com/scrapy/scrapy/commit/3b01bb8))
* Merge pull request #375 from peterarenot/patch-1 ([commit fa766d7](https://github.com/scrapy/scrapy/commit/fa766d7))
* Fixed so it refers to the correct folder ([commit 3283809](https://github.com/scrapy/scrapy/commit/3283809))
* Added Quantal and Raring to the supported Ubuntu releases ([commit 1411923](https://github.com/scrapy/scrapy/commit/1411923))
* Fix retry middleware, which did not retry certain connection errors after the upgrade to the http1 client. Closes GH-373 ([commit bb35ed0](https://github.com/scrapy/scrapy/commit/bb35ed0))
* Fix XmlItemExporter in Python 2.7.4 and 2.7.5 ([commit de3e451](https://github.com/scrapy/scrapy/commit/de3e451))
* Minor updates to the 0.18 release notes ([commit c45e5f1](https://github.com/scrapy/scrapy/commit/c45e5f1))
* Fix contributors list format ([commit 0b60031](https://github.com/scrapy/scrapy/commit/0b60031))

## Scrapy 0.18.0 (released 2013-08-09)

* Many improvements to test suite runs using tox, including a way to test on pypi
* Handle GET parameters for AJAX crawlable URLs ([commit 3fe2a32](https://github.com/scrapy/scrapy/commit/3fe2a32))
* Use the lxml recover option to parse sitemaps ([issue 347](https://github.com/scrapy/scrapy/issues/347))
* Fix cookie merging to go by hostname and not by netloc ([issue 352](https://github.com/scrapy/scrapy/issues/352))
* Support disabling `HttpCompressionMiddleware` using a flag setting ([issue 359](https://github.com/scrapy/scrapy/issues/359))
* Support XML namespaces using the `iternodes` parser in `XMLFeedSpider` ([issue 12](https://github.com/scrapy/scrapy/issues/12))
* Support the `dont_cache` request meta flag ([issue 19](https://github.com/scrapy/scrapy/issues/19))
* Fix `scrapy.utils.gz.gunzip`, broken by changes in Python 2.7.4 ([commit 4dc76e](https://github.com/scrapy/scrapy/commit/4dc76e))
* Fix URL encoding on
`SgmlLinkExtractor` ([issue 24](https://github.com/scrapy/scrapy/issues/24))
* Fix the `TakeFirst` processor so it does not discard a zero (0) value ([issue 59](https://github.com/scrapy/scrapy/issues/59))
* Support nested items in the XML exporter ([issue 66](https://github.com/scrapy/scrapy/issues/66))
* Improve cookie handling performance ([issue 77](https://github.com/scrapy/scrapy/issues/77))
* Log dupe-filtered requests once ([issue 105](https://github.com/scrapy/scrapy/issues/105))
* Split the redirection middleware into status-based and meta-based middlewares ([issue 78](https://github.com/scrapy/scrapy/issues/78))
* Use HTTP 1.1 as the default downloader handler ([issue 109](https://github.com/scrapy/scrapy/issues/109) and [issue 318](https://github.com/scrapy/scrapy/issues/318))
* Support XPath form selection on `FormRequest.from_response` ([issue 185](https://github.com/scrapy/scrapy/issues/185))
* Fix Unicode decoding error on `SgmlLinkExtractor` ([issue 199](https://github.com/scrapy/scrapy/issues/199))
* Fix signal dispatching on the PyPy interpreter ([issue 205](https://github.com/scrapy/scrapy/issues/205))
* Improve request delay and concurrency handling ([issue 206](https://github.com/scrapy/scrapy/issues/206))
* Add RFC 2616 cache policy to `HttpCacheMiddleware` ([issue 212](https://github.com/scrapy/scrapy/issues/212))
* Allow customization of the messages logged by the engine ([issue 214](https://github.com/scrapy/scrapy/issues/214))
* Multiple improvements to `DjangoItem` ([issue 217](https://github.com/scrapy/scrapy/issues/217), [issue 218](https://github.com/scrapy/scrapy/issues/218), [issue 221](https://github.com/scrapy/scrapy/issues/221))
* Extend Scrapy commands using setuptools entry points ([issue 260](https://github.com/scrapy/scrapy/issues/260))
* Allow the spider `allowed_domains` value to be a set or tuple ([issue 261](https://github.com/scrapy/scrapy/issues/261))
* Support `settings.getdict` ([issue 269](https://github.com/scrapy/scrapy/issues/269))
* Simplify internal `scrapy.core.scraper` slot handling ([issue 271](https://github.com/scrapy/scrapy/issues/271))
* Added `Item.copy` ([issue 290](https://github.com/scrapy/scrapy/issues/290))
* Collect idle downloader slots ([issue 297](https://github.com/scrapy/scrapy/issues/297))
* Add an `ftp://` scheme downloader handler ([issue 329](https://github.com/scrapy/scrapy/issues/329))
* Added downloader benchmark web server and spider tools
[Benchmarking](topics/benchmarking.html#benchmarking)
* Moved persistent (on-disk) queues to a separate project (queuelib), which Scrapy now depends on
* Add Scrapy commands using external libraries ([issue 260](https://github.com/scrapy/scrapy/issues/260))
* Added the `--pdb` option to the `scrapy` command-line tool
* Added `XPathSelector.remove_namespaces()`, which removes all namespaces from XML documents for convenience (to work with namespace-less XPaths). Documented in [Selectors](topics/selectors.html#topics-selectors).
* Several improvements to spider contracts
* New default middleware named MetaRefreshMiddleware that handles meta-refresh HTML tag redirections
* MetaRefreshMiddleware and RedirectMiddleware have different priorities to address #62
* Added the from_crawler method to spiders
* Added system tests with a mock server
* More improvements to Mac OS compatibility (thanks Alex Cepoi)
* Several more cleanups to singletons and multi-spider support (thanks Nicolas Ramirez)
* Support custom download slots
* Added the --spider option to the "shell" command
* Log overridden settings when Scrapy starts

Thanks to everyone who contributed to this release. Here is a list of contributors sorted by number of commits:

```
130 Pablo Hoffman <pablo@...>
 97 Daniel Graña <dangra@...>
 20 Nicolás Ramírez <nramirez.uy@...>
 13 Mikhail Korobov <kmike84@...>
 12 Pedro Faustino <pedrobandim@...>
 11 Steven Almeroth <sroth77@...>
  5 Rolando Espinoza La fuente <darkrho@...>
  4 Michal Danilak <mimino.coder@...>
  4 Alex Cepoi <alex.cepoi@...>
  4 Alexandr N Zamaraev (aka tonal) <tonal@...>
  3 paul <paul.tremberth@...>
  3 Martin Olveyra <molveyra@...>
  3 Jordi Llonch <llonchj@...>
  3 arijitchakraborty <myself.arijit@...>
  2 Shane Evans <shane.evans@...>
  2 joehillen <joehillen@...>
  2 Hart <HartSimha@...>
  2 Dan <ellisd23@...>
  1 Zuhao Wan <wanzuhao@...>
  1 whodatninja <blake@...>
  1 vkrest <v.krestiannykov@...>
  1 tpeng <pengtaoo@...>
  1 Tom Mortimer-Jones <tom@...>
  1 Rocio Aramberri <roschegel@...>
  1 Pedro <pedro@...>
  1 notsobad <wangxiaohugg@...>
  1 Natan L <kuyanatan.nlao@...>
  1 Mark Grey <mark.grey@...>
  1 Luan <luanpab@...>
  1 Libor Nenadál <libor.nenadal@...>
  1 Juan M Uys <opyate@...>
  1 Jonas Brunsgaard <jonas.brunsgaard@...>
  1 Ilya Baryshev <baryshev@...>
  1 Hasnain Lakhani <m.hasnain.lakhani@...>
  1 Emanuel Schorsch <emschorsch@...>
  1 Chris Tilden <chris.tilden@...>
  1 Capi Etheriel <barraponto@...>
  1 cacovsky <amarquesferraz@...>
  1 Berend Iwema <berend@...>
```

## Scrapy 0.16.5 (released 2013-05-30)

* Obey the request method when Scrapy deploy is redirected to a new endpoint ([commit
8c4fcee](https://github.com/scrapy/scrapy/commit/8c4fcee))
* Fix inaccurate downloader middleware documentation. Refs #280 ([commit 40667cb](https://github.com/scrapy/scrapy/commit/40667cb))
* Docs: remove links to diveintopython.org, which is no longer available. Closes #246 ([commit bd58bfa](https://github.com/scrapy/scrapy/commit/bd58bfa))
* Find form nodes in invalid HTML5 documents ([commit e3d6945](https://github.com/scrapy/scrapy/commit/e3d6945))
* Fix typo labeling the attrs type as bool instead of list ([commit a274276](https://github.com/scrapy/scrapy/commit/a274276))

## Scrapy 0.16.4 (released 2013-01-23)

* Fix typos in docs ([commit 6d2b3aa](https://github.com/scrapy/scrapy/commit/6d2b3aa))
* Add docs about disabling an extension. Refs #132 ([commit c90de33](https://github.com/scrapy/scrapy/commit/c90de33))
* Fixed error message formatting. log.err() does not support cool formatting, so when an error occurred the message was: "ERROR: Error processing %(item)s" ([commit c16150c](https://github.com/scrapy/scrapy/commit/c16150c))
* Lint and improve images pipeline error logging ([commit 56b45fc](https://github.com/scrapy/scrapy/commit/56b45fc))
* Fixed doc typos ([commit 243be84](https://github.com/scrapy/scrapy/commit/243be84))
* Add documentation topics: Broad Crawls and Common Practices ([commit 1fbb715](https://github.com/scrapy/scrapy/commit/1fbb715))
* Fix bug in the scrapy parse command when the spider is not specified explicitly. Closes #209 ([commit c72e682](https://github.com/scrapy/scrapy/commit/c72e682))
* Update docs/topics/commands.rst ([commit 28eac7a](https://github.com/scrapy/scrapy/commit/28eac7a))

## Scrapy 0.16.3 (released 2012-12-07)

* Remove the concurrency limitation when using download delays, while still ensuring inter-request delays are enforced ([commit 487b9b5](https://github.com/scrapy/scrapy/commit/487b9b5))
* Add error details when the images pipeline fails ([commit 8232569](https://github.com/scrapy/scrapy/commit/8232569))
* Improve Mac OS compatibility ([commit 8dcf8aa](https://github.com/scrapy/scrapy/commit/8dcf8aa))
* setup.py: use README.rst to populate long_description ([commit 7b5310d](https://github.com/scrapy/scrapy/commit/7b5310d))
* Docs: removed obsolete references to ClientForm ([commit 80f9bb6](https://github.com/scrapy/scrapy/commit/80f9bb6))
* Correct docs for the default storage backend ([commit 2aa491b](https://github.com/scrapy/scrapy/commit/2aa491b))
* Docs: removed broken proxyhub link from the FAQ ([commit bdf61c4](https://github.com/scrapy/scrapy/commit/bdf61c4))
* Fixed docs typo in the SpiderOpenCloseLogging example ([commit
7184094](https://github.com/scrapy/scrapy/commit/7184094))

## Scrapy 0.16.2 (released 2012-11-09)

* Scrapy contracts: Python 2.6 compatibility ([commit a4a9199](https://github.com/scrapy/scrapy/commit/a4a9199))
* Scrapy contracts verbose option ([commit ec41673](https://github.com/scrapy/scrapy/commit/ec41673))
* Proper unittest-like output for Scrapy contracts ([commit 86635e4](https://github.com/scrapy/scrapy/commit/86635e4))
* Added "open in browser" to the debugging docs ([commit c9b690d](https://github.com/scrapy/scrapy/commit/c9b690d))
* Removed reference to global Scrapy stats from the settings docs ([commit dd55067](https://github.com/scrapy/scrapy/commit/dd55067))
* Fix SpiderState bug on Windows platforms ([commit 58998f4](https://github.com/scrapy/scrapy/commit/58998f4))

## Scrapy 0.16.1 (released 2012-10-26)

* Fixed the LogStats extension, which was broken by a wrong merge before the 0.16 release ([commit 8c780fd](https://github.com/scrapy/scrapy/commit/8c780fd))
* Better backwards compatibility for scrapy.conf.settings ([commit 3403089](https://github.com/scrapy/scrapy/commit/3403089))
* Extended documentation on how to access crawler stats from extensions ([commit c4da0b5](https://github.com/scrapy/scrapy/commit/c4da0b5))
* Removed .hgtags (no longer needed now that Scrapy uses git) ([commit d52c188](https://github.com/scrapy/scrapy/commit/d52c188))
* Fix dashes under RST headers ([commit fa4f7f9](https://github.com/scrapy/scrapy/commit/fa4f7f9))
* Set the release date for 0.16.0 in the news ([commit e292246](https://github.com/scrapy/scrapy/commit/e292246))

## Scrapy 0.16.0 (released 2012-10-18)

Scrapy changes:

* Added [Spider Contracts](topics/contracts.html#topics-contracts), a mechanism for testing spiders in a formal/reproducible way.
* Added the `-o` and `-t` options to the [`runspider`](topics/commands.html#std:command-runspider) command
* Documented the [AutoThrottle extension](topics/autothrottle.html) and added it to the extensions installed by default. You still need to enable it with [`AUTOTHROTTLE_ENABLED`](topics/autothrottle.html#std:setting-AUTOTHROTTLE_ENABLED)
* Major stats collection refactoring: removed the separation of global/per-spider stats and removed stats-related signals (`stats_spider_opened`, etc.). Stats are much simpler now, and backwards compatibility is kept on the Stats Collector API and signals.
* Added the `process_start_requests()` method to spider middlewares
* Dropped the Signals singleton. Signals should now be accessed through the crawler.signals attribute. See the signals documentation for more information.
* Dropped the Stats Collector singleton. Stats can now be accessed through the crawler.stats attribute. See the stats collection documentation for details.
* Documented the [Core API](topics/api.html#topics-api)
* `lxml` is now the default
selectors backend instead of `libxml2`
* Ported FormRequest.from_response() to use [lxml](http://lxml.de/) instead of [ClientForm](http://wwwsearch.sourceforge.net/old/ClientForm/)
* Removed modules: `scrapy.xlib.BeautifulSoup` and `scrapy.xlib.ClientForm`
* SitemapSpider: added support for sitemap URLs ending in .xml and .xml.gz, even if they advertise a wrong content type ([commit 10ed28b](https://github.com/scrapy/scrapy/commit/10ed28b))
* StackTraceDump extension: also dump trackref live references ([commit fe2ce93](https://github.com/scrapy/scrapy/commit/fe2ce93))
* Nested items are now fully supported in the JSON and JSONLines exporters
* Added the [`cookiejar`](topics/downloader-middleware.html#std:reqmeta-cookiejar) request meta key to support multiple cookie sessions per spider
* Decoupled encoding detection code to [w3lib.encoding](https://github.com/scrapy/w3lib/blob/master/w3lib/encoding.py), and ported Scrapy code to use that module
* Dropped support for Python 2.5. See https://blog.scrapinghub.com/2012/02/27/scrapy-0-15-dropping-support-for-python-2-5/
* Dropped support for Twisted 2.5
* Added the [`REFERER_ENABLED`](topics/spider-middleware.html#std:setting-REFERER_ENABLED) setting to control the referer middleware
* Changed the default user agent to: `Scrapy/VERSION (+http://scrapy.org)`
* Removed the (undocumented) `HTMLImageLinkExtractor` class from `scrapy.contrib.linkextractors.image`
* Removed per-spider settings (to be replaced by instantiating multiple crawler objects)
* The `USER_AGENT` spider attribute will no longer work; use the `user_agent` attribute instead
* The `DOWNLOAD_TIMEOUT` spider attribute will no longer work; use the `download_timeout` attribute instead
* Removed the `ENCODING_ALIASES` setting, as encoding auto-detection has been moved to the [w3lib](https://github.com/scrapy/w3lib) library
* Promoted [DjangoItem](topics/djangoitem.html#topics-djangoitem) to main contrib
* LogFormatter methods now return dicts (instead of strings) to support lazy formatting ([issue 164](https://github.com/scrapy/scrapy/issues/164), [commit dcef7b0](https://github.com/scrapy/scrapy/commit/dcef7b0))
* Downloader handlers ([`DOWNLOAD_HANDLERS`](topics/settings.html#std:setting-DOWNLOAD_HANDLERS) setting) now receive settings as the first argument of the constructor
* Replaced memory usage accounting with the (more portable) [resource](https://docs.python.org/2/library/resource.html) module; removed the `scrapy.utils.memory` module
* Removed signal: `scrapy.mail.mail_sent`
* Removed the `TRACK_REFS` setting; [trackrefs](topics/leaks.html#topics-leaks-trackrefs) is now always enabled
* DBM is now the default storage backend for the HTTP cache middleware
* The number of log messages (per level) is now tracked through Scrapy stats (stat name: `log_count/LEVEL`)
* The number of received responses is now tracked through Scrapy stats (stat name: `response_received_count`
)
* Removed the `scrapy.log.started` attribute

## Scrapy 0.14.4

* Added Precise to the supported Ubuntu distros ([commit b7e46df](https://github.com/scrapy/scrapy/commit/b7e46df))
* Fixed a bug in the json-rpc webservice reported on the scrapy-users Google Group (https://groups.google.com/forum/, topic `QGVBMFYBNAQ`). Also removed the no longer supported "run" command from extras/scrapy-ws.py ([commit 340fbdb](https://github.com/scrapy/scrapy/commit/340fbdb))
* Meta tag attributes for content-type http-equiv can be in any order. #123 ([commit 0cb68af](https://github.com/scrapy/scrapy/commit/0cb68af))
* Replace "import Image" with the more standard "from PIL import Image". Closes #88 ([commit 4d17048](https://github.com/scrapy/scrapy/commit/4d17048))
* Return the trial status as the bin/runtests.sh exit value. #118 ([commit b7b2e7f](https://github.com/scrapy/scrapy/commit/b7b2e7f))

## Scrapy 0.14.3

* Forgot to include the PyDispatch license. #118 ([commit fd85f9c](https://github.com/scrapy/scrapy/commit/fd85f9c))
* Include egg files used by the test suite in the source distribution. #118 ([commit c897793](https://github.com/scrapy/scrapy/commit/c897793))
* Update the docstring in the project template to avoid confusion with the genspider command, which may be considered an advanced feature. Refs #107 ([commit 2548dcc](https://github.com/scrapy/scrapy/commit/2548dcc))
* Added a note to docs/topics/firebug.rst about the Google directory being shut down ([commit 668e352](https://github.com/scrapy/scrapy/commit/668e352))
* Do not discard a slot when it is empty; save it in another dict so it can be recycled if needed again ([commit 8e9f607](https://github.com/scrapy/scrapy/commit/8e9f607))
* Do not fail when handling unicode XPaths in libxml2-backed selectors ([commit b830e95](https://github.com/scrapy/scrapy/commit/b830e95))
* Fixed a minor mistake in the Request objects documentation ([commit bf3c9ee](https://github.com/scrapy/scrapy/commit/bf3c9ee))
* Fixed a minor defect in the link extractors documentation ([commit ba14f38](https://github.com/scrapy/scrapy/commit/ba14f38))
* Removed some obsolete leftover code related to sqlite support in Scrapy ([commit 0665175](https://github.com/scrapy/scrapy/commit/0665175))

## Scrapy 0.14.2

* Move the buffer to point at the start of the file before computing the checksum. Refs #92 ([commit 6a5bef2](https://github.com/scrapy/scrapy/commit/6a5bef2))
* Compute the image checksum before persisting images. Closes #92 ([commit 9817df1](https://github.com/scrapy/scrapy/commit/9817df1))
* Remove leaking references in cached failures ([commit 673a120](https://github.com/scrapy/scrapy/commit/673a120))
* Fixed a bug in the MemoryUsage extension: get_engine_status() takes exactly 1 argument (0 given) ([commit
11133e9](https://github.com/scrapy/scrapy/commit/11133e9))
* Fixed struct.error on the HTTP compression middleware. Closes #87 ([commit 1423140](https://github.com/scrapy/scrapy/commit/1423140))
* AJAX crawling was not expanding Unicode URLs ([commit 0de3fb4](https://github.com/scrapy/scrapy/commit/0de3fb4))
* Catch start_requests iterator errors. Refs #83 ([commit 454a21d](https://github.com/scrapy/scrapy/commit/454a21d))
* Speed up the libxml2 XPathSelector ([commit 2fbd662](https://github.com/scrapy/scrapy/commit/2fbd662))
* Updated the versioning doc according to recent changes ([commit 0a070f5](https://github.com/scrapy/scrapy/commit/0a070f5))
* Scrapy: fixed documentation link ([commit 2b4e4c3](https://github.com/scrapy/scrapy/commit/2b4e4c3))
* extras/makedeb.py: no longer obtaining the version from git ([commit caffe0e](https://github.com/scrapy/scrapy/commit/caffe0e))

## Scrapy 0.14.1

* extras/makedeb.py: no longer obtaining the version from git ([commit caffe0e](https://github.com/scrapy/scrapy/commit/caffe0e))
* Bumped version to 0.14.1 ([commit 6cb9e1c](https://github.com/scrapy/scrapy/commit/6cb9e1c))
* Fixed reference to the tutorial directory ([commit 4b86bd6](https://github.com/scrapy/scrapy/commit/4b86bd6))
* Docs: removed the duplicated callback argument from Request.replace() ([commit 1aeccdd](https://github.com/scrapy/scrapy/commit/1aeccdd))
* Fixed formatting of the Scrapy doc ([commit 8bf19e6](https://github.com/scrapy/scrapy/commit/8bf19e6))
* Dump stacks for all running threads and fix the engine status dumped by the StackTraceDump extension ([commit 14a8e6e](https://github.com/scrapy/scrapy/commit/14a8e6e))
* Added a comment about why SSL is disabled on boto image uploads ([commit 5223575](https://github.com/scrapy/scrapy/commit/5223575))
* SSL handshaking hangs when making too many parallel connections to S3 ([commit 63d583d](https://github.com/scrapy/scrapy/commit/63d583d))
* Changed the tutorial to follow changes on the dmoz site ([commit bcb3198](https://github.com/scrapy/scrapy/commit/bcb3198))
* Avoid the disconnected-deferred AttributeError exception in Twisted>=11.1.0 ([commit 98f3f87](https://github.com/scrapy/scrapy/commit/98f3f87))
* Allow spiders to set the AutoThrottle max concurrency ([commit 175a4b5](https://github.com/scrapy/scrapy/commit/175a4b5))

## Scrapy 0.14

### New features and settings

* Support for [AJAX crawleable urls](https://developers.google.com/webmasters/ajax-crawling/docs/getting-started?csw=1)
* New persistent scheduler that stores requests on disk, allowing crawls to be suspended and resumed (
[r2737](http://hg.scrapy.org/scrapy/changeset/2737))
* Added the `-o` option to `scrapy crawl`, a shortcut for dumping scraped items into a file (or to standard output using `-`)
* Added support for passing custom settings to the Scrapyd `schedule.json` API ([r2779](http://hg.scrapy.org/scrapy/changeset/2779), [r2783](http://hg.scrapy.org/scrapy/changeset/2783))
* New `ChunkedTransferMiddleware` (enabled by default) to support [chunked transfer encoding](https://en.wikipedia.org/wiki/Chunked_transfer_encoding) ([r2769](http://hg.scrapy.org/scrapy/changeset/2769))
* Added boto 2.0 support for the S3 downloader handler ([r2763](http://hg.scrapy.org/scrapy/changeset/2763))
* Added [marshal](https://docs.python.org/2/library/marshal.html) to the formats supported by feed exports ([r2744](http://hg.scrapy.org/scrapy/changeset/2744))
* In request errbacks, offending requests are now received in the `failure.request` attribute ([r2738](http://hg.scrapy.org/scrapy/changeset/2738))
* Big downloader refactoring to support per domain/IP concurrency limits
  * The `CONCURRENT_REQUESTS_PER_SPIDER` setting has been deprecated and replaced by:
    * [`CONCURRENT_REQUESTS`](topics/settings.html#std:setting-CONCURRENT_REQUESTS), [`CONCURRENT_REQUESTS_PER_DOMAIN`](topics/settings.html#std:setting-CONCURRENT_REQUESTS_PER_DOMAIN), [`CONCURRENT_REQUESTS_PER_IP`](topics/settings.html#std:setting-CONCURRENT_REQUESTS_PER_IP)
  * Check the documentation for more details
* Added a built-in caching DNS resolver ([r2728](http://hg.scrapy.org/scrapy/changeset/2728))
* Moved Amazon AWS-related components/extensions (SQS spider queue, SimpleDB stats collector) to a separate project: [scaws](https://github.com/scrapinghub/scaws) ([r2706](http://hg.scrapy.org/scrapy/changeset/2706), [r2714](http://hg.scrapy.org/scrapy/changeset/2714))
* Moved spider queues to scrapyd: `scrapy.spiderqueue` > `scrapyd.spiderqueue` ([r2708](http://hg.scrapy.org/scrapy/changeset/2708))
* Moved sqlite utils to scrapyd: `scrapy.utils.sqlite` > `scrapyd.sqlite` ([r2781](http://hg.scrapy.org/scrapy/changeset/2781))
* Real support for returning iterators from the `start_requests()` method. The iterator is now consumed during the crawl when the spider is getting idle ([r2704](http://hg.scrapy.org/scrapy/changeset/2704))
* Added the [`REDIRECT_ENABLED`](topics/downloader-middleware.html#std:setting-REDIRECT_ENABLED) setting to quickly enable/disable the redirect middleware (
[r2697](http://hg.scrapy.org/scrapy/changeset/2697))
* Added the [`RETRY_ENABLED`](topics/downloader-middleware.html#std:setting-RETRY_ENABLED) setting to quickly enable/disable the retry middleware ([r2694](http://hg.scrapy.org/scrapy/changeset/2694))
* Added the `CloseSpider` exception to manually close spiders ([r2691](http://hg.scrapy.org/scrapy/changeset/2691))
* Improved encoding detection by adding support for the HTML5 meta charset declaration ([r2690](http://hg.scrapy.org/scrapy/changeset/2690))
* Refactored the close-spider behavior to wait for all downloads to finish and be processed by spiders before closing the spider ([r2688](http://hg.scrapy.org/scrapy/changeset/2688))
* Added `SitemapSpider` (see the documentation in the Spiders page) ([r2658](http://hg.scrapy.org/scrapy/changeset/2658))
* Added the `LogStats` extension for periodically logging basic stats (like crawled pages and scraped items) ([r2657](http://hg.scrapy.org/scrapy/changeset/2657))
* Made handling of gzipped responses more robust (#319, [r2643](http://hg.scrapy.org/scrapy/changeset/2643)). Scrapy will now try to decompress as much as possible from a gzipped response, instead of failing with an `IOError`.
* Simplified the MemoryDebugger extension, which dumps memory debugging info ([r2639](http://hg.scrapy.org/scrapy/changeset/2639))
* Added a new command to edit spiders: `scrapy edit` ([r2636](http://hg.scrapy.org/scrapy/changeset/2636)) and a `-e` flag to the `genspider` command that uses it ([r2653](http://hg.scrapy.org/scrapy/changeset/2653))
* Changed the default representation of items to pretty-printed dicts ([r2631](http://hg.scrapy.org/scrapy/changeset/2631)). This improves default logging by making the log more readable in the default case, for both Scraped and Dropped lines.
* Added the [`spider_error`](topics/signals.html#std:signal-spider_error) signal ([r2628](http://hg.scrapy.org/scrapy/changeset/2628))
* Added the [`COOKIES_ENABLED`](topics/downloader-middleware.html#std:setting-COOKIES_ENABLED) setting ([r2625](http://hg.scrapy.org/scrapy/changeset/2625))
* Stats are now dumped to the Scrapy log (the default value of the [`STATS_DUMP`](topics/settings.html#std:setting-STATS_DUMP) setting has been changed to `True`). This is to make Scrapy users more aware of Scrapy stats and the data collected there.
* Added support for dynamically adjusting the download delay and maximum concurrent requests ([r2599](http://hg.scrapy.org/scrapy/changeset/2599))
* Added a new DBM HTTP cache storage backend ([r2576](http://hg.scrapy.org/scrapy/changeset/2576))
* Added the `listjobs.json` API to Scrapyd ([r2571](http://hg.scrapy.org/scrapy/changeset/2571))
* `CsvItemExporter`: added the `join_multivalued` parameter ([r2578](http://hg.scrapy.org/scrapy/changeset/2578))
* Added namespace support to `xmliter_lxml` (
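[r2552](http://hg.scrapy.org/scrapy/changeset/2552))
* Improved the cookies middleware by making `COOKIES_DEBUG` nicer and documenting it ([r2579](http://hg.scrapy.org/scrapy/changeset/2579))
* Several improvements to Scrapyd and link extractors

### Code rearranged and removed

* Merged the item passed and item scraped concepts, as they have often proved confusing in the past. This means:
  * The original item_scraped signal was removed
  * The original item_passed signal was renamed to item_scraped
  * Old log lines `Scraped Item...` were removed
  * Old log lines `Passed Item...` were renamed to `Scraped Item...` and downgraded to `DEBUG` level
* Reduced the Scrapy codebase by splitting part of the Scrapy code into two new libraries:
  * [w3lib](https://github.com/scrapy/w3lib) (several functions from `scrapy.utils.{http,markup,multipart,response,url}`, done in [r2584](http://hg.scrapy.org/scrapy/changeset/2584))
  * [scrapely](https://github.com/scrapy/scrapely) (was `scrapy.contrib.ibl`, done in [r2586](http://hg.scrapy.org/scrapy/changeset/2586))
* Removed unused function: `scrapy.utils.request.request_info()` ([r2577](http://hg.scrapy.org/scrapy/changeset/2577))
* Removed the googledir project from `examples/googledir`. There is now a new example project called `dirbot` available on GitHub: [https://github.com/scrapy/dirbot](https://github.com/scrapy/dirbot)
* Removed support for default field values in Scrapy items ([r2616](http://hg.scrapy.org/scrapy/changeset/2616))
* Removed the experimental crawlspider v2 ([r2632](http://hg.scrapy.org/scrapy/changeset/2632))
* Removed the scheduler middleware to simplify the architecture. Duplicate filtering is now done in the scheduler itself, using the same dupe filtering class as before (`DUPEFILTER_CLASS` setting) ([r2640](http://hg.scrapy.org/scrapy/changeset/2640))
* Removed support for passing URLs to the `scrapy crawl` command (use `scrapy parse` instead) ([r2704](http://hg.scrapy.org/scrapy/changeset/2704))
* Removed the deprecated Execution Queue ([r2704](http://hg.scrapy.org/scrapy/changeset/2704))
* Removed the (undocumented) spider context extension (from scrapy.contrib.spiderContext) ([r2780](http://hg.scrapy.org/scrapy/changeset/2780))
* Removed the `CONCURRENT_SPIDERS` setting (use the Scrapyd maxproc instead) ([r2789](http://hg.scrapy.org/scrapy/changeset/2789))
* Renamed attributes of core components: downloader.sites -> downloader.slots, scraper.sites -> scraper.slots ([r2717](http://hg.scrapy.org/scrapy/changeset/2717), [r2718](http://hg.scrapy.org/scrapy/changeset/2718))
* Renamed the setting `CLOSESPIDER_ITEMPASSED` to [`CLOSESPIDER_ITEMCOUNT`](topics/extensions.html#std:setting-CLOSESPIDER_ITEMCOUNT) ([r2655](http://hg.scrapy.org/scrapy/changeset/2655)). Backwards compatibility is kept.

## Scrapy 0.12

Numbers like #NNN refer to tickets in the old issue tracker (Trac), which is no longer available.

### New features and improvements

* The passed item is now sent in the `item` argument of `item_passed` (#273)
* Added a verbose option to the `scrapy version` command, useful for bug reports (#298)
* The HTTP cache is now stored by default in the project data directory (#279)
* Added a project data storage directory (#276, #277)
* Documented the file structure of Scrapy projects (see the command-line tool docs)
* New lxml backend for XPath selectors (#147)
* Per-spider settings (#245)
* Support exit codes to signal errors in Scrapy commands (#248)
* Added the `-c` argument to the `scrapy shell` command
* Made `libxml2` optional (#260)
* New `deploy` command (#261)
* Added the [`CLOSESPIDER_PAGECOUNT`](topics/extensions.html#std:setting-CLOSESPIDER_PAGECOUNT) setting (#253)
* Added the [`CLOSESPIDER_ERRORCOUNT`](topics/extensions.html#std:setting-CLOSESPIDER_ERRORCOUNT) setting (#254)

### Scrapyd changes

* Scrapyd now uses one process per spider
* It stores one log file per spider run, and rotates them, keeping the latest 5 logs per spider (by default)
* A minimal web UI was added, available at http://localhost:6800 by default
* There is now a `scrapy server` command to start a Scrapyd server for the current project

### Changes to settings

* Added the `HTTPCACHE_ENABLED` setting (False by default) to enable the HTTP cache middleware
* Changed `HTTPCACHE_EXPIRATION_SECS` semantics: zero now means "never expire".

### Deprecated/obsoleted functionality

* Deprecated the `runserver` command in favor of the `server` command, which starts a Scrapyd server. See also: Scrapyd changes
* Deprecated the `queue` command in favor of using the Scrapyd `schedule.json` API. See also: Scrapyd changes
* Removed the LxmlItemLoader (experimental contrib that was never promoted to main contrib)

## Scrapy 0.10

Numbers like #NNN refer to tickets in the old issue tracker (Trac), which is no longer available.

### New features and improvements

* New Scrapy service called `scrapyd` for deploying Scrapy crawlers in production (#218) (documentation available)
* Simplified images pipeline usage, which no longer requires subclassing your own images pipeline (#217)
* The Scrapy shell now shows the Scrapy log by default (#206)
* Refactored the execution queue into common base code and pluggable backends called "spider queues" (#220)
* New persistent spider queue (based on SQLite) (#198), available by default, which allows starting Scrapy in server mode and then scheduling spiders to run.
* Added documentation for the Scrapy command-line tool and all its available sub-commands (documentation available)
* Feed exporters with pluggable backends (#197) (documentation available)
* Deferred signals (#193)
* Added two new item pipeline methods, open_spider() and close_spider(), with deferred support (#195)
* Support for overriding default request headers per spider (#181)
* Replaced the default spider manager with one with similar functionality that does not depend on Twisted plugins (#186)
* Split the Debian package into two packages: the library and the service (#187)
* Scrapy log refactoring (#188)
* New extension for keeping persistent spider contexts among different runs (#203)
* Added the `dont_redirect` request.meta key for avoiding redirects (#233)
* Added the `dont_retry` request.meta key for avoiding retries (#234)

### Command-line tool changes

* New `scrapy` command which replaces the old `scrapy-ctl.py` (#199): there is now only one global `scrapy` command instead of one `scrapy-ctl.py` per project. Added a `scrapy.bat` script for running more conveniently from Windows.
* Added bash completion to the command-line tool (#210)
* Renamed the command `start` to `runserver` (#209)

### API changes

* The `url` and `body` attributes of Request objects are now read-only (#230)
* `Request.copy()` and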
`Request.replace()` now also copy their `callback` and `errback` attributes (#231)
* Removed `UrlFilterMiddleware` from `scrapy.contrib` (already disabled by default)
* The offsite middleware does not filter out any request coming from a spider that has no allowed_domains attribute (#225)
* Removed the spider manager `load()` method. Spiders are now loaded in the constructor itself.
* Changes to the Scrapy Manager (now called "Crawler"):
  * `scrapy.core.manager.ScrapyManager` class renamed to `scrapy.crawler.Crawler`
  * `scrapy.core.manager.scrapymanager` singleton moved to `scrapy.project.crawler`
* Moved module: `scrapy.contrib.spidermanager` to `scrapy.spidermanager`
* The spider manager singleton moved from `scrapy.spider.spiders` to the `spiders` attribute of the `scrapy.project.crawler` singleton.
* Moved Stats Collector classes: (#204)
  * `scrapy.stats.collector.StatsCollector` to `scrapy.statscol.StatsCollector`
  * `scrapy.stats.collector.SimpledbStatsCollector` to `scrapy.contrib.statscol.SimpledbStatsCollector`
* Default per-command settings are now specified in the `default_settings` attribute of the command object class (#201)
* Changed the arguments of the item pipeline `process_item()` method from `(spider, item)` to `(item, spider)`
  * Backwards compatibility kept (with deprecation warning)
* Moved the `scrapy.core.signals` module to `scrapy.signals`
  * Backwards compatibility kept (with deprecation warning)
* Moved the `scrapy.core.exceptions` module to `scrapy.exceptions`
  * Backwards compatibility kept (with deprecation warning)
* Added the `handles_request()` class method to `BaseSpider`
* Dropped the `scrapy.log.exc()` function (use `scrapy.log.err()` instead)
* Dropped the `component` argument of the `scrapy.log.msg()` function
* Dropped the `scrapy.log.log_level` attribute
* Added `from_settings()` class methods to the spider manager and the item pipeline manager

### Changes to settings

* Added the `HTTPCACHE_IGNORE_SCHEMES` setting to ignore certain schemes on HttpCacheMiddleware (#225)
* Added the `SPIDER_QUEUE_CLASS` setting, which defines the spider queue to use (#220)
* Added the `KEEP_ALIVE` setting (#220)
* Removed the `SERVICE_QUEUE` setting (#220)
* Removed the `COMMANDS_SETTINGS_MODULE` setting (#201)
* Renamed `REQUEST_HANDLERS` to `DOWNLOAD_HANDLERS` and made download handlers classes (instead of functions)

## Scrapy 0.9

Numbers like #NNN refer to tickets in the old issue tracker (Trac), which is no longer available.

### New features and improvements

* Added SMTP-AUTH support to scrapy.mail
* New settings added: `MAIL_USER`, `MAIL_PASS` ([r2065](http://hg.scrapy.org/scrapy/changeset/2065), #149)
* Added a new scrapy-ctl view command, to view a URL in the browser as seen by Scrapy ([r2039](http://hg.scrapy.org/scrapy/changeset/2039))
* Added a web service for controlling the Scrapy process (this also deprecates the web console) ([r2053](http://hg.scrapy.org/scrapy/changeset/2053), #167)
* Support for running Scrapy as a service, for production systems ([r1988](http://hg.scrapy.org/scrapy/changeset/1988), [r2054](http://hg.scrapy.org/scrapy/changeset/2054), [r2055](http://hg.scrapy.org/scrapy/changeset/2055), [r2056](http://hg.scrapy.org/scrapy/changeset/2056), [r2057](http://hg.scrapy.org/scrapy/changeset/2057), #168)
* Added a wrapper induction library (documentation currently available only in the source code) ([r2011](http://hg.scrapy.org/scrapy/changeset/2011))
* Simplified and improved response encoding support ([r1961](http://hg.scrapy.org/scrapy/changeset/1961), [r1969](http://hg.scrapy.org/scrapy/changeset/1969))
* Added the `LOG_ENCODING` setting ([r1956](http://hg.scrapy.org/scrapy/changeset/1956), documentation available)
* Added the `RANDOMIZE_DOWNLOAD_DELAY` setting (enabled by default) ([r1923](http://hg.scrapy.org/scrapy/changeset/1923), documentation available)
* `MailSender` is no longer IO-blocking ([r1955](http://hg.scrapy.org/scrapy/changeset/1955), #146)
* Link extractors and the new CrawlSpider now handle relative base tag URLs ([r1960](http://hg.scrapy.org/scrapy/changeset/1960), #148)
* Several improvements to item loaders and processors ([r2022](http://hg.scrapy.org/scrapy/changeset/2022), [r2023](http://hg.scrapy.org/scrapy/changeset/2023), [r2024](http://hg.scrapy.org/scrapy/changeset/2024), [r2025](http://hg.scrapy.org/scrapy/changeset/2025), [r2026](http://hg.scrapy.org/scrapy/changeset/2026), [r2027](http://hg.scrapy.org/scrapy/changeset/2027), [r2028](http://hg.scrapy.org/scrapy/changeset/2028), [r2029](http://hg.scrapy.org/scrapy/changeset/2029), [r2030](http://hg.scrapy.org/scrapy/changeset/2030))
* Added support for adding variables to the telnet console ([r2047](http://hg.scrapy.org/scrapy/changeset/2047), #165)
* Support for requests without callbacks ([r2050](http://hg.scrapy.org/scrapy/changeset/2050), #166)

### API changes

* Changed `Spider.domain_name` to `Spider.name` (SEP-012, [r1975](http://hg.scrapy.org/scrapy/changeset/1975))
* `Response.encoding` is now the detected encoding ([r1961](http://hg.scrapy.org/scrapy/changeset/1961))
* `HttpErrorMiddleware` now returns None or raises an exception ([r2006](http://hg.scrapy.org/scrapy/changeset/2006), #157)
* `scrapy.command` modules relocation ([r2035](http://hg.scrapy.org/scrapy/changeset/2035), [r2036](http://hg.scrapy.org/scrapy/changeset/2036),
[r2037](http://hg.scrapy.org/scrapy/changeset/2037))
* Added `ExecutionQueue` for feeding spiders to scrape ([r2034](http://hg.scrapy.org/scrapy/changeset/2034))
* Removed the `ExecutionEngine` singleton ([r2039](http://hg.scrapy.org/scrapy/changeset/2039))
* Ported `S3ImagesStore` (images pipeline) to use boto and threads ([r2033](http://hg.scrapy.org/scrapy/changeset/2033))
* Moved module: `scrapy.management.telnet` to `scrapy.telnet` ([r2047](http://hg.scrapy.org/scrapy/changeset/2047))

### Changes to default settings

* Changed the default `SCHEDULER_ORDER` to `DFO` ([r1939](http://hg.scrapy.org/scrapy/changeset/1939))

## Scrapy 0.8

Numbers like #NNN refer to tickets in the old issue tracker (Trac), which is no longer available.

### New features

* Added the `DEFAULT_RESPONSE_ENCODING` setting ([r1809](http://hg.scrapy.org/scrapy/changeset/1809))
* Added the `dont_click` argument to the `FormRequest.from_response()` method ([r1813](http://hg.scrapy.org/scrapy/changeset/1813), [r1816](http://hg.scrapy.org/scrapy/changeset/1816))
* Added the `clickdata` argument to the `FormRequest.from_response()` method ([r1802](http://hg.scrapy.org/scrapy/changeset/1802), [r1803](http://hg.scrapy.org/scrapy/changeset/1803))
* Added support for HTTP proxies (`HttpProxyMiddleware`) ([r1781](http://hg.scrapy.org/scrapy/changeset/1781), [r1785](http://hg.scrapy.org/scrapy/changeset/1785))
* The offsite spider middleware now logs messages when filtering out requests ([r1841](http://hg.scrapy.org/scrapy/changeset/1841))

### Backward-incompatible changes

* Changed the `scrapy.utils.response.get_meta_refresh()` signature ([r1804](http://hg.scrapy.org/scrapy/changeset/1804))
* Removed the deprecated `scrapy.item.ScrapedItem` class; use `scrapy.item.Item` instead ([r1838](http://hg.scrapy.org/scrapy/changeset/1838))
* Removed the deprecated `scrapy.xpath` module; use `scrapy.selector` instead ([r1836](http://hg.scrapy.org/scrapy/changeset/1836))
* Removed the deprecated `core.signals.domain_open` signal; use `core.signals.domain_opened` instead ([r1822](http://hg.scrapy.org/scrapy/changeset/1822))
* `log.msg()` now receives a `spider` argument
  * The old domain argument has been deprecated and will be removed in 0.9. For spiders, you should always use the `spider` argument and pass spider references. If you really want to pass a string, use the `component` argument instead.
* Changed the core signals `domain_opened`, `domain_closed`, `domain_idle`
* Changed the item pipeline to use spiders instead of domains
  * The `domain` argument of the `process_item()` item pipeline method was changed to `spider`; the new signature is:
`process_item(spider, item)` ([r1827](http://hg.scrapy.org/scrapy/changeset/1827), #105)
  * To quickly port your code (to work with Scrapy 0.8), just use `spider.domain_name` where you previously used `domain`.
* Changed the stats API to use spiders instead of domains (#113)
  * `StatsCollector` was changed to receive spider references (instead of domains) in its methods (`set_value`, `inc_value`, etc.).
  * Added the `StatsCollector.iter_spider_stats()` method
  * Removed the `StatsCollector.list_domains()` method
  * Also, stats signals were renamed and now pass spider references (instead of domains). Here is a summary of the changes:
  * To quickly port your code (to work with Scrapy 0.8), just use `spider.domain_name` where you previously used `domain`. `spider_stats` contains exactly the same data as `domain_stats`.
* The `CloseDomain` extension moved to `scrapy.contrib.closespider.CloseSpider`
  * Its settings were also renamed:
    * `CLOSEDOMAIN_TIMEOUT` to `CLOSESPIDER_TIMEOUT`
    * `CLOSEDOMAIN_ITEMCOUNT` to `CLOSESPIDER_ITEMCOUNT`
* Removed the deprecated `SCRAPYSETTINGS_MODULE` environment variable; use `SCRAPY_SETTINGS_MODULE` instead ([r1840](http://hg.scrapy.org/scrapy/changeset/1840))
* Renamed setting: `REQUESTS_PER_DOMAIN` to `CONCURRENT_REQUESTS_PER_SPIDER` ([r1830](http://hg.scrapy.org/scrapy/changeset/1830), [r1844](http://hg.scrapy.org/scrapy/changeset/1844))
* Renamed setting: `CONCURRENT_DOMAINS` to `CONCURRENT_SPIDERS` ([r1830](http://hg.scrapy.org/scrapy/changeset/1830))
* Refactored the HTTP cache middleware
* The HTTP cache middleware has been heavily refactored, retaining the same functionality except for the domain sectorization, which was removed ([r1843](http://hg.scrapy.org/scrapy/changeset/1843))
* Renamed exception: `DontCloseDomain` to `DontCloseSpider` ([r1859](http://hg.scrapy.org/scrapy/changeset/1859), #120)
* Renamed extension: `DelayedCloseDomain` to `SpiderCloseDelay` ([r1861](http://hg.scrapy.org/scrapy/changeset/1861), #121)
* Removed the obsolete `scrapy.utils.markup.remove_escape_chars` function; use `scrapy.utils.markup.replace_escape_chars` instead ([r1865](http://hg.scrapy.org/scrapy/changeset/1865))

## Scrapy 0.7

First release of Scrapy.