selenium + driver · Python Web Scraping

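Combining selenium with a browser driver lets you scrape data that a web page loads dynamically.

**selenium:** selenium is a web automation testing tool, originally developed for automated website testing. It runs directly against a real browser and supports all major browsers. It accepts commands that make the browser load pages automatically, extract the data you need, and even take screenshots of the page.

Install: `pip install selenium`

Official documentation: [http://selenium-python.readthedocs.io/api.html](http://selenium-python.readthedocs.io/api.html)

**driver:** The driver is the browser's driver program. Each browser has its own driver, and Python can only control a browser through that driver.

Driver download links:

- ChromeDriver: https://sites.google.com/a/chromium.org/chromedriver/downloads
- ChromeDriver (Taobao mirror): https://npm.taobao.org/mirrors/chromedriver/
- FirefoxDriver: https://github.com/mozilla/geckodriver/releases
- EdgeDriver: https://developer.microsoft.com/en-us/microsoft-edge/tools/webdriver/
- SafariDriver: https://webkit.org/blog/6900/webdriver-support-in-safari-10/

Install: after downloading, unzip the driver into a directory whose path contains only English (ASCII) characters.

**PhantomJS:** PhantomJS is a headless (no user interface) browser based on WebKit. It loads a website into memory and executes the JavaScript on the page.

**1. Example (Chrome)**

```python
"""
@Date 2021/3/18
"""
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# 1. Load the driver (Selenium 4 style: the driver path goes through Service)
# If the driver is on PATH, e.g. in Python's Scripts directory, webdriver.Chrome() is enough
service = Service("D:/Drivers/ChromeDriver/chromedriver_win32/chromedriver.exe")
driver = webdriver.Chrome(service=service)

# 2. Load the page; the browser window itself was started by webdriver.Chrome()
driver.get("https://www.baidu.com")

# 3. Take a screenshot of the current page
driver.save_screenshot("E:/python/driver/baidu.png")

# Locate elements and interact with them
# (find_element_by_id was removed in Selenium 4; use find_element with By)
driver.find_element(By.ID, "kw").send_keys("長城")
driver.find_element(By.ID, "su").click()

# Get the page source
page_source = driver.page_source
print(page_source)

# Get the cookies of the current session (a list of dicts)
cookies = driver.get_cookies()
print(cookies)

# Get the URL the browser is currently on
current_url = driver.current_url
print(current_url)

# 4. Quit the browser
driver.quit()
```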
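The example above prints the result of `driver.get_cookies()`, which is a list of dicts, one per cookie, each with at least a `name` and a `value` key. A common next step is to flatten that list so the cookies can be reused elsewhere, for example in a plain HTTP client. The following is only a minimal sketch: the helper names `cookies_to_dict` and `cookie_header` are hypothetical, and the sample data is made up so that the block runs without a browser.

```python
def cookies_to_dict(cookies):
    """Flatten Selenium-style cookie records into a {name: value} dict."""
    return {cookie["name"]: cookie["value"] for cookie in cookies}


def cookie_header(cookies):
    """Join cookies into a single value suitable for a 'Cookie' request header."""
    pairs = cookies_to_dict(cookies).items()
    return "; ".join(f"{name}={value}" for name, value in pairs)


# Made-up sample in the shape returned by driver.get_cookies();
# real records usually carry more keys (expiry, secure, httpOnly, ...).
sample_cookies = [
    {"name": "session_id", "value": "abc123", "domain": ".example.com", "path": "/"},
    {"name": "theme", "value": "dark", "domain": ".example.com", "path": "/"},
]

print(cookies_to_dict(sample_cookies))
print(cookie_header(sample_cookies))
```

In a real run you would pass the `cookies` variable from the example above instead of `sample_cookies`. Only the name and value are kept here, so attributes such as domain or expiry are lost; whether that matters depends on where the cookies are reused.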