### 1. Overview

Use Selenium to crawl Taobao product listings and parse them with PyQuery, extracting each product's image, name, price, number of buyers, shop name, and shop location, then save the results to MongoDB.

### 2. Preparation

[Install Selenium](/1kai-fa-huan-jing-pei-zhi/12-qing-qiu-ku-de-an-zhuang/122-seleniumde-an-zhuang.md)

[Install ChromeDriver](/1kai-fa-huan-jing-pei-zhi/12-qing-qiu-ku-de-an-zhuang/123-chromedriverde-an-zhuang.md)

### 3. Interface analysis

![](/assets/7.4.1.png)

### 4. Page data analysis

The goal is to scrape product information.

![](/assets/7.4.2.png)

Basic product information: product image, name, price, number of buyers, shop name, shop location

Page to scrape: [https://s.taobao.com/search?q=蘋果plus正品](https://s.taobao.com/search?q=蘋果plus正品)

Pagination:

![](/assets/7.4.3.png)

### 5. Getting the product list

Target URL: [https://s.taobao.com/search?q=蘋果plus正品](https://s.taobao.com/search?q=蘋果plus正品)

```
# `browser`, `pq`, and `save_to_mongo` are defined in the full source in section 11
def get_products():
    """
    Extract product data from the current result page
    """
    print("Extracting data....")
    html = browser.page_source
    doc = pq(html)
    items = doc('#mainsrp-itemlist .items .item').items()
    for item in items:
        # Images are lazy-loaded, so the real URL is in data-src rather than src
        image = item.find(".pic-link .img").attr('data-src')
        # if image:
        #     result = re.match('.*?!!(.*?)\..*?', image)
        #     result = re.sub('[a-z\_\-]', '', result)
        #     if result:
        #         id = result.group(1)
        #         print(id)
        data = {
            'title': item.find(".title").text().strip(),
            'image': image,
            'price': item.find(".price").text().strip(),
            'deal': item.find('.deal-cnt').text().strip(),
            'shop': item.find('.shop').text().strip(),
            'location': item.find(".location").text().strip(),
        }
        print(data)
        save_to_mongo(data)
```

### 6. Storing the data in MongoDB

```
import pymongo

MONGO_URL = "localhost"
MONGO_DB = "taobao"
MONGO_COLLECTION = "products"
client = pymongo.MongoClient(MONGO_URL)
db = client[MONGO_DB]
collection = db[MONGO_COLLECTION]


def save_to_mongo(data):
    """
    Save one product record to MongoDB
    :param data: dict of product fields
    :return:
    """
    try:
        # insert_one() replaces the legacy insert(), which PyMongo 4 removed
        if collection.insert_one(data):
            print("success")
    except pymongo.errors.PyMongoError:
        print("fail")
```

### 7. Iterating over the pages

```
def main():
    # Crawl result pages 1 to 100
    for i in range(1, 101):
        index_page(i)
    # browser.close()
```

### 8. Chrome Headless mode

Chrome can run without a visible window when it is started with the headless argument:

```
from selenium import webdriver

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
# Current Selenium releases take `options=`; `chrome_options=` is the deprecated spelling
browser = webdriver.Chrome(options=chrome_options)
```

### 9. Using Firefox

```
browser = webdriver.Firefox()
```

### 10. Using PhantomJS

PhantomJS support was deprecated in Selenium 3 and removed in Selenium 4, so the snippets in this section require a Selenium 3.x installation.

```
browser = webdriver.PhantomJS()
```

Enable the disk cache and disable image loading to further improve crawling efficiency:

```
SERVICE_ARGS = ['--load-images=false', '--disk-cache=true']
browser = webdriver.PhantomJS(service_args=SERVICE_ARGS)
```

### 11. Source code

```
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.common.by import By
from pyquery import PyQuery as pq
import re
import pymongo

browser = webdriver.Chrome()
wait = WebDriverWait(browser, 10)
keyword = ""
MAX_RETRIES = 3  # give up on a page after this many timeouts

MONGO_URL = "localhost"
MONGO_DB = "taobao"
MONGO_COLLECTION = "products"
client = pymongo.MongoClient(MONGO_URL)
db = client[MONGO_DB]
collection = db[MONGO_COLLECTION]


def index_page(page, retries=MAX_RETRIES):
    """
    Crawl one index (search result) page
    :param page: page number
    :param retries: remaining retry attempts after a timeout
    """
    print("Crawling page {}".format(page))
    try:
        url = "https://s.taobao.com/search?q={}".format(keyword)
        browser.get(url)
        if page > 1:
            # Jump to the requested page through the pager's input box
            page_input = wait.until(expected_conditions.presence_of_element_located(
                (By.CSS_SELECTOR, "#mainsrp-pager div.form > input")))
            submit = wait.until(expected_conditions.element_to_be_clickable(
                (By.CSS_SELECTOR, "#mainsrp-pager div.form > span.btn.J_Submit")))
            page_input.clear()
            page_input.send_keys(str(page))
            submit.click()
        # The highlighted pager item must show the requested page number
        wait.until(expected_conditions.text_to_be_present_in_element(
            (By.CSS_SELECTOR, "#mainsrp-pager li.item.active > span"), str(page)))
        # Product entries
        wait.until(expected_conditions.presence_of_element_located(
            (By.CSS_SELECTOR, ".m-itemlist .items .item")))
        get_products()
    except TimeoutException:
        # Retry a bounded number of times instead of recursing forever
        if retries > 0:
            index_page(page, retries - 1)
        else:
            print("Giving up on page {}".format(page))


def get_products():
    """
    Extract product data from the current result page
    """
    print("Extracting data....")
    html = browser.page_source
    doc = pq(html)
    items = doc('#mainsrp-itemlist .items .item').items()
    for item in items:
        # Images are lazy-loaded, so the real URL is in data-src rather than src
        image = item.find(".pic-link .img").attr('data-src')
        # if image:
        #     result = re.match('.*?!!(.*?)\..*?', image)
        #     result = re.sub('[a-z\_\-]', '', result)
        #     if result:
        #         id = result.group(1)
        #         print(id)
        data = {
            'title': item.find(".title").text().strip(),
            'image': image,
            'price': item.find(".price").text().strip(),
            'deal': item.find('.deal-cnt').text().strip(),
            'shop': item.find('.shop').text().strip(),
            'location': item.find(".location").text().strip(),
        }
        print(data)
        save_to_mongo(data)


def save_to_mongo(data):
    """
    Save one product record to MongoDB
    :param data: dict of product fields
    :return:
    """
    try:
        # insert_one() replaces the legacy insert(), which PyMongo 4 removed
        if collection.insert_one(data):
            print("success")
    except pymongo.errors.PyMongoError:
        print("fail")


def main():
    # Crawl result pages 1 to 100
    for i in range(1, 101):
        index_page(i)
    # browser.close()


if __name__ == "__main__":
    keyword = input("Enter a keyword: ")
    main()
```
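
The search URL in `index_page` is built by formatting the raw keyword straight into the query string. Browsers usually tolerate this, but a keyword containing characters such as `&`, `#`, or spaces can change how the URL is parsed. As a minimal sketch (the helper name `build_search_url` and the `SEARCH_URL` constant are illustrative and not part of the original code), the keyword can be percent-encoded with the standard library's `urllib.parse.quote` before the URL is passed to `browser.get`:

```python
from urllib.parse import quote, unquote

# Hypothetical constant: the same search endpoint used in index_page
SEARCH_URL = "https://s.taobao.com/search?q={}"


def build_search_url(keyword):
    """Return the search URL with the keyword percent-encoded as UTF-8."""
    # quote() escapes everything except letters, digits, and "_.-~";
    # safe="" makes it escape "/" as well
    return SEARCH_URL.format(quote(keyword, safe=""))


url = build_search_url("蘋果plus正品")
print(url)
# Decoding the query value gives back the original keyword
print(unquote(url.split("q=", 1)[1]))
```

Inside `index_page`, this would amount to replacing the `format(keyword)` call with `build_search_url(keyword)`; the rest of the crawl logic stays the same.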