There are two approaches to logging in with Scrapy:

1. **Carry the cookie directly.** Typical scenarios:
   (1) the cookie has a very long expiry time, common on loosely run sites;
   (2) you can fetch all the data before the cookie expires;
   (3) used together with another program, e.g. Selenium performs the login, saves the resulting cookie locally, and Scrapy reads that local cookie before sending requests.
2. **Find the login URL and send a POST request to obtain the cookie.** Example: logging in to GitHub.

**1. Carrying the cookie directly**

(1) Create the crawler project:

```shell
> scrapy startproject git
> cd git
> scrapy genspider git1 github.com
```

(2) Configure `settings.py`:

```python
# Crawl responsibly by identifying yourself (and your website) on the user-agent
USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36'

# Obey robots.txt rules
ROBOTSTXT_OBEY = False
```

(3) Log in to GitHub manually in a browser and copy the cookie:

![](https://img.kancloud.cn/7a/ae/7aae9e676664575c584101b6874be222_1303x445.jpg)

(4) Override the `start_requests` method:

```python
import scrapy


class Git1Spider(scrapy.Spider):
    name = 'git1'
    allowed_domains = ['github.com']
    # Note: the request URL should be https://github.com/<your GitHub username>
    start_urls = ['https://github.com/<your GitHub username>']

    def parse(self, response):
        # Before logging in, the GitHub page title is "GitHub · GitHub";
        # after a successful login it becomes "<username> · GitHub".
        # If this prints "<username> · GitHub", the login worked.
        print(response.xpath('/html/head/title/text()').extract_first())

    def start_requests(self):
        """Override this method to attach the copied cookie."""
        url = self.start_urls[0]
        cookie = '_ga=GA1.2.534025100(cookie omitted here, it is too long)...3D'
        # 1. Convert the cookie string into a dict
        cookies = {data.split('=')[0]: data.split('=')[-1]
                   for data in cookie.split(';')}
        # 2. Send the request with the cookies attached
        yield scrapy.Request(
            url=url,
            callback=self.parse,
            cookies=cookies
        )
```

**2. Find the login URL and send a POST request with the required parameters**

The analysis of the login page is omitted here; the code below only shows how to send the POST request in Scrapy.

```python
import scrapy


class Git2Spider(scrapy.Spider):
    name = 'git2'
    allowed_domains = ['github.com']
    start_urls = ['http://github.com/login']

    def parse(self, response):
        # 1. Parse out all the parameters needed for login
        post_data = {}
        # 2. Find the login URL and submit the request.
        #    A POST request can be sent with scrapy.FormRequest,
        #    or with scrapy.Request(url, method='POST').
        yield scrapy.FormRequest(
            url='https://github.com/session',  # the address the login form submits to
            callback=self.login_github,        # callback to run after logging in
            formdata=post_data                 # the parameters required for login
        )
```
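The cookie-string-to-dict conversion in `start_requests` above can be sketched as a standalone function. `COOKIE_STRING` here is a made-up example value, not a real GitHub cookie. Note that using `partition('=')` instead of the original `split('=')[-1]` keeps the whole value even when the value itself contains an `=` (common in Base64/URL-encoded cookie values such as the `...3D` tail above):

```python
# A made-up 'k=v; k2=v2' cookie header for illustration only.
COOKIE_STRING = "_ga=GA1.2.534025100; logged_in=yes; _octo=GH1.1.123"


def cookie_str_to_dict(cookie_str):
    """Split a raw 'k=v; k2=v2' Cookie header into the dict form
    that scrapy.Request's `cookies` argument accepts."""
    cookies = {}
    for pair in cookie_str.split(';'):
        # partition splits on the FIRST '=', so values containing '='
        # (e.g. URL-encoded padding like '%3D') survive intact.
        key, _, value = pair.strip().partition('=')
        cookies[key] = value
    return cookies
```
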
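Step 1 of `Git2Spider.parse` ("parse out all the parameters needed for login") is left empty above. A minimal sketch of that step, using only the standard library: collect the hidden `<input>` fields of the login form (GitHub-style forms carry a CSRF token in a hidden field), then add the credentials. `LOGIN_HTML`, the field names, and the credentials below are made-up illustrations; in a real spider you would read them from `response.xpath(...)` instead, or simply let `scrapy.FormRequest.from_response(response, formdata=...)` collect the hidden fields for you:

```python
from html.parser import HTMLParser

# A made-up miniature of a GitHub-style login page, for illustration.
LOGIN_HTML = '''
<form action="/session" method="post">
  <input type="hidden" name="authenticity_token" value="abc123">
  <input type="text" name="login">
  <input type="password" name="password">
</form>
'''


class HiddenFieldCollector(HTMLParser):
    """Collects name/value pairs of all hidden <input> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'input' and attrs.get('type') == 'hidden':
            self.fields[attrs.get('name')] = attrs.get('value', '')


def collect_hidden_fields(html):
    parser = HiddenFieldCollector()
    parser.feed(html)
    return parser.fields


post_data = collect_hidden_fields(LOGIN_HTML)
post_data['login'] = 'your-username'        # hypothetical credentials
post_data['password'] = 'your-password'
```
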