# XPath Selectors

| Expression | Description |
| --- | --- |
| `/` | Selects from the root node. |
| `//` | Selects nodes anywhere below the current node that match the selection, no matter where they are. |
| `.` | Selects the current node. |
| `..` | Selects the parent of the current node. |
| `@` | Selects attributes. |
| `/text()` | Gets the text inside a tag. |
| `/img/@src` | Gets the `src` attribute of an `img` tag. |
| `/a[1]` | Selects the first `a` tag (XPath positions start at 1, not 0). |
| `div[@class='main_pic']` | Selects a `div` by its `class`. Matching on `@id` works the same way. |

# Examples

* Get the image URL from the `img` tag inside the `a` tag whose class is `logo-image`

```
<a class="logo-image">
    <img src="https://fanyi-cdn.cdn.bcebos.com/static/translation/img/header/logo_e835568.png">
</a>
```

```
print(response.xpath("//a[@class='logo-image']/img/@src").extract_first())
```

* Get the text of every `a` link inside the `div` whose class is `copyright`

```
<div class="copyright">
    <ul>
        <li>
            <a href="https://www.baidu.com/duty/" target="_blank">使用百度前必讀</a>
            <span class="split-line">|</span>
        </li>
        <li>
            <a href="https://www.baidu.com/duty/" target="_blank">使用百度前必讀</a>
            <span class="split-line">|</span>
        </li>
    </ul>
</div>
```

```
# Named `links` rather than `list` so the built-in list type is not shadowed
links = response.xpath("//div[@class='copyright']/ul/li/a")
for link in links:
    print(link.xpath('text()').extract())  # extract() returns a list of strings
```

# Practice

Practice URL: `http://m.soxs.cc/shuku/`

This page shows a long list of books. The goal is to use XPath selectors to extract the details of every entry in that list.

* Sample HTML

```
<ul class="list">
    <li>
        <a href="/book/JingLingShouFu.html">
            <img src="//img.soxs.cc/310272/1451007.jpg" alt="精靈收服">
        </a>
        <p class="bookname">
            <a href="/JingLingShouFu/">精靈收服</a>
        </p>
        <p class="data">
            <a href="/author/多肉的多/" class="layui-btn layui-btn-xs layui-bg-cyan">多肉的多</a>
            <span class="layui-btn layui-btn-xs layui-btn-radius">玄幻奇幻</span>
            <span class="layui-btn layui-btn-xs layui-btn-radius layui-btn-normal">連載</span>
        </p>
        <p class="intro">馮宇熙生活在精靈世界,這里有各種神奇的精靈,等你來收服</p>
        <p class="data">最新:
            <a href="/JingLingShouFu/2412122.html">第105章圍剿邪靈</a>
        </p>
    </li>
</ul>
```

* Create a new file `booklist.py` in the Scrapy project's `spiders/` directory, so that `scrapy crawl` can find it

```
import scrapy


class BookListSpider(scrapy.Spider):
    name = 'booklist'
    allowed_domains = ['m.soxs.cc']  # only crawl pages on these domains
    start_urls = ["http://m.soxs.cc/shuku/"]  # the URLs the crawl starts from

    def parse(self, response):
        # Scrapy calls parse() with the response for each downloaded start URL
        print(response)  # prints the status code and URL; 200 means the page was fetched successfully
        books = response.xpath("//ul[@class='list']/li")
        for book in books:
            # get(default="") yields "" instead of None (extract_first) or an IndexError ([0]) when a node is missing
            img_url = book.xpath("a[1]/img/@src").get(default="")
            bookName = book.xpath("p[1]/a/text()").get(default="")
            bookLink = book.xpath("p[1]/a/@href").get(default="")
            author = book.xpath("p[2]/a/text()").get(default="")
            category = book.xpath("p[2]/span[1]/text()").get(default="")
            status = book.xpath("p[2]/span[2]/text()").get(default="")
            info = book.xpath("p[3]/text()").get(default="")
            news = book.xpath("p[4]/a/text()").get(default="")
            print("==============================================================")
            print("img_url => " + img_url)
            print("bookName => " + bookName)
            print("bookLink => " + bookLink)
            print("author => " + author)
            print("category => " + category)
            print("status => " + status)
            print("info => " + info)
            print("news => " + news)
```

* Run the spider: `scrapy crawl booklist`

![Output of scrapy crawl booklist](https://img.kancloud.cn/a1/41/a141c0c41f75b5f6c55a2c45c653f70e_899x721.png)