Setting Up a Development Environment for Scrapy Crawlers · Python Crawlers

This chapter covers installing Scrapy, the problems you may run into along the way, and how to fix them. I have tried the installation on both Mac and Ubuntu, so problems from both platforms are recorded here for reference. Before installing Scrapy, the following components need to be in place:

* python 2.7
* pip
* lxml
* openssl

Each is covered in turn below.

### 1\. Install python 2.7

At the time of writing, Scrapy 1.x supports only Python 2.x; the maintainers have said Python 3.x support is coming, but it is not there yet. Most systems ship with Python preinstalled, and you can check the version with the `-V` flag:

    GuoDaniel:~ nkcoder$ python -V
    Python 2.7.10

You can use the system Python directly, but a better approach is to create an isolated Python environment with `virtualenv`, which keeps it separate from the system Python and avoids dependency conflicts. Install `virtualenv`:

~~~
$ [sudo] pip install virtualenv
~~~

Create a Python virtual environment:

~~~
GuoDaniel:start_scrapy nkcoder$ virtualenv startenv
New python executable in /Users/nkcoder/Projects/python/start_scrapy/startenv/bin/python
Installing setuptools, pip, wheel...done.
GuoDaniel:start_scrapy nkcoder$ source startenv/bin/activate
(startenv) GuoDaniel:start_scrapy nkcoder$ ls startenv
(startenv) GuoDaniel:start_scrapy nkcoder$ which pip
/Users/nkcoder/Projects/python/start_scrapy/startenv/bin/pip
(startenv) GuoDaniel:start_scrapy nkcoder$ python -V
Python 2.7.10
~~~

By default virtualenv uses the Python interpreter found on `PATH`, but you can also point it at a specific interpreter with `-p`. For example, to create a Python 3 virtual environment:

~~~
GuoDaniel:start_scrapy nkcoder$ virtualenv -p /Library/Frameworks/Python.framework/Versions/3.5/bin/python3 python3env
Running virtualenv with interpreter /Library/Frameworks/Python.framework/Versions/3.5/bin/python3
Using base prefix '/Library/Frameworks/Python.framework/Versions/3.5'
New python executable in /Users/nkcoder/Projects/python/start_scrapy/python3env/bin/python3
Also creating executable in /Users/nkcoder/Projects/python/start_scrapy/python3env/bin/python
Installing setuptools, pip, wheel...done.
GuoDaniel:start_scrapy nkcoder$ source python3env/bin/activate
(python3env) GuoDaniel:start_scrapy nkcoder$ which python
/Users/nkcoder/Projects/python/start_scrapy/python3env/bin/python
(python3env) GuoDaniel:start_scrapy nkcoder$ python -V
Python 3.5.0
~~~

* Reference: [VirtualEnv Installation](https://virtualenv.readthedocs.org/en/latest/installation.html)

### 2\. Install pip

A Python environment created with virtualenv already includes the matching version of pip. Check the pip version:

~~~
GuoDaniel:start_scrapy nkcoder$ source startenv/bin/activate
(startenv) GuoDaniel:start_scrapy nkcoder$ which pip
/Users/nkcoder/Projects/python/start_scrapy/startenv/bin/pip
(startenv) GuoDaniel:start_scrapy nkcoder$ pip -V
pip 7.1.2 from /Users/nkcoder/Projects/python/start_scrapy/startenv/lib/python2.7/site-packages (python 2.7)
~~~

If you need to install pip yourself, that is easy too. First [download the get-pip.py script](https://bootstrap.pypa.io/get-pip.py), then run it:

~~~
$ python get-pip.py
~~~

* Reference: [Pip Installation](http://pip.readthedocs.org/en/stable/installing/)

### 3\. Install lxml

Install lxml with `pip` (inside an activated virtualenv, `sudo` is not needed and should be dropped so the package lands in the virtual environment):

~~~
$ sudo pip install lxml
~~~

**On Mac, installing lxml may fail with the following error**:

~~~
In file included from src/lxml/lxml.etree.c:314:
/private/tmp/pip_build_root/lxml/src/lxml/includes/etree_defs.h:9:10: fatal error: 'libxml/xmlversion.h' file not found
#include "libxml/xmlversion.h"
         ^
1 error generated.
error: command 'cc' failed with exit status 1
~~~

Installing or updating the Xcode command line tools fixes it:

~~~
$ xcode-select --install
~~~

* Reference: [Cannot install Lxml on Mac os x 10.9](http://stackoverflow.com/questions/19548011/cannot-install-lxml-on-mac-os-x-10-9)

**On Ubuntu, you may see the following error**:

~~~
libxml/xmlversion.h: No such file or directory
compilation terminated.
~~~

The corresponding dev packages need to be installed:

~~~
$ sudo apt-get install libxml2-dev libxslt1-dev python-dev
~~~

* Reference: [how-to-install-lxml-on-ubuntu](http://stackoverflow.com/questions/6504810/how-to-install-lxml-on-ubuntu)

### 4\. Install openssl

Most systems come with openssl preinstalled:

~~~
$ openssl version
OpenSSL 0.9.8zg 14 July 2015
~~~

### 5\. Install Scrapy

Install Scrapy with pip:

~~~
$ sudo pip install scrapy
~~~

**On Mac, if the system is `OS X El Capitan` and you are using the system Python rather than a virtualenv environment, you may hit the following problem**:

~~~
OSError: [Errno 1] Operation not permitted: '/tmp/pip-nIfswi-uninstall/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/six-1.4.1-py2.7.egg-info'
~~~

The cause is a conflict between the six library that Scrapy depends on and the copy of six that ships with the system: as the path in the error shows, pip tries to uninstall the system copy (six 1.4.1), and the operation is not permitted. One workaround is to temporarily disable the system's `System Integrity Protection`; the better fix, of course, is to use virtualenv to create an isolated Python environment.

* Reference: [“OSError: [Errno 1] Operation not permitted”](http://stackoverflow.com/questions/31900008/oserror-errno-1-operation-not-permitted-when-installing-scrapy-in-osx-10-11)

**On Ubuntu, you may run into this problem**:

~~~
fatal error: openssl/aes.h: No such file or directory
~~~

Installing `libssl-dev` fixes it:

~~~
$ sudo apt-get install libssl-dev
~~~

* Reference: [How to fix “fatal error: openssl/aes.h: No such file or directory”](http://ask.xmodulo.com/fix-fatal-error-openssl.html)

By nkcoder (Jianshu author). Original link: http://www.jianshu.com/p/434bc6574c95. Copyright belongs to the author; to repost, contact the author for permission and credit the “Jianshu author”.
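
Once the installation steps are done, a quick way to confirm that the environment is usable is a small check script. The following is only a minimal sketch and not part of the original walkthrough: the helper names `in_virtualenv` and `check_modules` are made up for illustration, and the script assumes that `OpenSSL` is the import name of the pyOpenSSL package that Scrapy pulls in. It is written to run on both Python 2.7 and Python 3.

```python
import importlib
import ssl
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a virtualenv or venv."""
    # Older virtualenv releases set sys.real_prefix; venv and newer
    # virtualenv releases make sys.base_prefix differ from sys.prefix.
    return (hasattr(sys, "real_prefix")
            or getattr(sys, "base_prefix", sys.prefix) != sys.prefix)


def check_modules(names):
    """Map each module name to a version string, or None if it cannot be imported."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
        else:
            # Not every module exposes __version__, so fall back to a marker.
            report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    print("python     : %s" % sys.version.split()[0])
    print("virtualenv : %s" % in_virtualenv())
    # OpenSSL build that this Python's ssl module is linked against.
    print("openssl    : %s" % ssl.OPENSSL_VERSION)
    for name, version in sorted(check_modules(["lxml", "OpenSSL", "scrapy"]).items()):
        print("%-11s: %s" % (name, version or "NOT INSTALLED"))
```

Save it under any name (for example `check_env.py`, a hypothetical file name) and run it with the `python` of the activated virtualenv. A module reported as `NOT INSTALLED` points back to the corresponding section above.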