<ruby id="bdb3f"></ruby>

    <p id="bdb3f"><cite id="bdb3f"></cite></p>

      <p id="bdb3f"><cite id="bdb3f"><th id="bdb3f"></th></cite></p><p id="bdb3f"></p>
        <p id="bdb3f"><cite id="bdb3f"></cite></p>

          <pre id="bdb3f"></pre>
          <pre id="bdb3f"><del id="bdb3f"><thead id="bdb3f"></thead></del></pre>

          <ruby id="bdb3f"><mark id="bdb3f"></mark></ruby><ruby id="bdb3f"></ruby>
          <pre id="bdb3f"><pre id="bdb3f"><mark id="bdb3f"></mark></pre></pre><output id="bdb3f"></output><p id="bdb3f"></p><p id="bdb3f"></p>

          <pre id="bdb3f"><del id="bdb3f"><progress id="bdb3f"></progress></del></pre>

                <ruby id="bdb3f"></ruby>

                ??一站式輕松地調用各大LLM模型接口,支持GPT4、智譜、豆包、星火、月之暗面及文生圖、文生視頻 廣告
**1. Install Scrapy**

```shell
D:\>pip install Scrapy
```

**2. Create a crawler project in your workspace**

```shell
D:\>cd PycharmWorkspace

# mySpider is the project name
# This creates the D:\PycharmWorkspace\mySpider directory
D:\PycharmWorkspace>scrapy startproject mySpider
```

**3. Create the spider**

```shell
# 1. Change into the project directory
D:\PycharmWorkspace>cd mySpider

# 2. Create the spider
# books        -- the spider's name
# book.jd.com  -- the target URL; do not include the http:// or https:// scheme
D:\PycharmWorkspace\mySpider>scrapy genspider books book.jd.com
```

At this point, Scrapy will have generated the following project structure:

```
mySpider
├── mySpider
│   ├── spiders
│   │   ├── __init__.py
│   │   └── books.py
│   ├── __init__.py
│   ├── items.py
│   ├── middlewares.py
│   ├── pipelines.py
│   └── settings.py
└── scrapy.cfg
```

**4. Issue the request in `books.py`**

```python
"""
@Date 2021/4/7
"""
import scrapy


class BooksSpider(scrapy.Spider):
    name = 'books'                        # spider name
    allowed_domains = ['book.jd.com']     # crawl scope
    start_urls = ['http://book.jd.com/']  # the spider's entry URL

    def parse(self, response):
        """
        (1) This method handles the spider's entry request.
        (2) Scrapy creates a scrapy.Request object for every URL in the
            Spider's start_urls attribute and assigns this parse method
            to each Request as its callback.
        (3) Once scheduled and executed, each Request produces a
            scrapy.http.Response object, which is passed back to parse().
        """
        # Prints: https://book.jd.com/
        print(response.url)
```

**5. Run the spider**

Option 1: run the following command from the project directory:

```shell
# scrapy crawl <spider name>
D:\PycharmWorkspace\mySpider>scrapy crawl books
```

Option 2: start it from code by creating a `start.py` script in the project directory:

```python
from scrapy import cmdline

if __name__ == "__main__":
    cmdline.execute("scrapy crawl books".split())
    # or, equivalently:
    # cmdline.execute(["scrapy", "crawl", "books"])
```

**6. Running the spider prints a lot of logging output; if you don't want it flooding the console, add the following setting to `settings.py`:**

```python
LOG_LEVEL = 'WARNING'
```
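Once the skeleton above runs, the natural next step is to extract data inside `parse()` rather than just printing the URL. Below is a minimal sketch of what that might look like using Scrapy's built-in CSS selectors; the `.book-title` selector is a hypothetical placeholder for illustration only, since jd.com's real page markup isn't covered here.

```python
import scrapy


class BooksSpider(scrapy.Spider):
    name = 'books'
    allowed_domains = ['book.jd.com']
    start_urls = ['http://book.jd.com/']

    def parse(self, response):
        # response.css() / response.xpath() are Scrapy's built-in selectors.
        # '.book-title' is a made-up selector used only for illustration;
        # inspect the actual page to find the real markup.
        for title in response.css('.book-title::text').getall():
            # Each yielded dict becomes a scraped item, which Scrapy routes
            # through any item pipelines enabled in settings.py.
            yield {'title': title}
```

With items being yielded, you can dump them straight to a file from the command line, e.g. `scrapy crawl books -o books.json`.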
                  <ruby id="bdb3f"></ruby>

                  <p id="bdb3f"><cite id="bdb3f"></cite></p>

                    <p id="bdb3f"><cite id="bdb3f"><th id="bdb3f"></th></cite></p><p id="bdb3f"></p>
                      <p id="bdb3f"><cite id="bdb3f"></cite></p>

                        <pre id="bdb3f"></pre>
                        <pre id="bdb3f"><del id="bdb3f"><thead id="bdb3f"></thead></del></pre>

                        <ruby id="bdb3f"><mark id="bdb3f"></mark></ruby><ruby id="bdb3f"></ruby>
                        <pre id="bdb3f"><pre id="bdb3f"><mark id="bdb3f"></mark></pre></pre><output id="bdb3f"></output><p id="bdb3f"></p><p id="bdb3f"></p>

                        <pre id="bdb3f"><del id="bdb3f"><progress id="bdb3f"></progress></del></pre>

                              <ruby id="bdb3f"></ruby>

                              哎呀哎呀视频在线观看