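[TOC]

> 1. In Python 3.x, the urllib and urllib2 libraries were merged into a single urllib package.
> 2. urllib2.urlopen() became urllib.request.urlopen().
> 3. urllib2.Request() became urllib.request.Request().

## 1. URL encoding

> URL encoding and decoding, also called percent-encoding, is the encoding scheme used for Uniform Resource Locators (URLs). A URL may contain digits and letters directly, along with a set of reserved special characters (such as `/`, `:` and `@`); every other character must be written in `%xx` form.

This is now a standard, and nearly every programming language supports it: JavaScript has encodeURI and encodeURIComponent, PHP has urlencode and urldecode, and so on. The rule is simple: write each byte as its two-digit hexadecimal value, prefixed with `%`. For example, the space character has ASCII code 32, which is 20 in hexadecimal, so its percent-encoded form is `%20`.

For example, when you search Baidu for 中华人民共和国, the Chinese characters are URL-encoded. Each character takes three bytes in UTF-8, so every group of three `%xx` sequences represents one character.

![](https://box.kancloud.cn/1397fce0f2f35ce64396830124d9563f_806x154.png)

Copied from the address bar, the URL looks like this:

`https://www.baidu.com/s?ie=utf-8&f=3&rsv_bp=1&tn=monline_3_dg&wd=%E4%B8%AD%E5%8D%8E%E4%BA%BA%E6%B0%91%E5%85%B1%E5%92%8C%E5%9B%BD`

~~~
In [3]: from urllib import parse

In [8]: parse.urlencode({"wq":"中华人民共和国"})  # pass a dict of the request's key/value pairs
Out[8]: 'wq=%E4%B8%AD%E5%8D%8E%E4%BA%BA%E6%B0%91%E5%85%B1%E5%92%8C%E5%9B%BD'

In [9]: print(parse.unquote("%E4%B8%AD"))
中
~~~

## 2. A simple Baidu Tieba crawler

~~~
"""
Crawl the pages of a Baidu Tieba forum for a given keyword.
"""
from urllib import request, parse

header = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:55.0) Gecko/20100101 Firefox/55.0"
}


# "with" closes the file automatically. The platform default encoding may be
# gbk, but our data is utf-8, so pass encoding="utf-8" explicitly.
def writeFile(filename, data):
    with open(str(filename), "w", encoding="utf-8") as file:
        file.write(data)


# Download a single page and return its decoded HTML.
def loadPage(url):
    req = request.Request(url, headers=header)
    response = request.urlopen(req)
    data = response.read().decode("utf-8")
    return data


# Loop over every page in the range [start, end].
def detectPage(url, start, end):
    for page in range(start, end + 1):
        print(page)
        # Tieba lists 50 posts per page; "pn" is the offset of the first post.
        page_num = (page - 1) * 50
        num = parse.urlencode({"pn": page_num})
        search_url = url + "&" + num
        print(search_url)
        result = loadPage(search_url)
        # Each page is saved to a file named after its page number.
        writeFile(page, result)


if __name__ == "__main__":
    key = input("Enter the Tieba keyword: ")
    url = "http://tieba.baidu.com/f?ie=utf-8&"
    start_page = input("Enter the first page: ")
    end_page = input("Enter the last page: ")
    searchurl = url + parse.urlencode({"kw": key})
    detectPage(searchurl, int(start_page), int(end_page))
~~~

## 3. Cookies

~~~
# Python 2
import cookielib

# Python 3
from http import cookiejar
~~~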
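
The Python 3 `cookiejar` import is typically used together with `urllib.request.HTTPCookieProcessor`, which stores the cookies a server sends and replays them on later requests. The following is a minimal sketch of that pattern, not the original author's code: the helper names `build_cookie_opener` and `fetch_cookies` are hypothetical and exist only for illustration, and the URL you pass in is your own choice.

```python
from http import cookiejar
from urllib import request


def build_cookie_opener():
    """Return (opener, jar): an opener that records cookies in the jar.

    Hypothetical helper name, introduced only for this sketch.
    """
    jar = cookiejar.CookieJar()                 # in-memory cookie store
    handler = request.HTTPCookieProcessor(jar)  # saves and resends cookies
    opener = request.build_opener(handler)
    return opener, jar


def fetch_cookies(url, timeout=10):
    """Open the URL once and return the cookies it set as a dict.

    Hypothetical helper name; requires network access when called.
    """
    opener, jar = build_cookie_opener()
    with opener.open(url, timeout=timeout) as response:
        response.read()
    return {cookie.name: cookie.value for cookie in jar}
```

Calling `fetch_cookies("http://tieba.baidu.com/")` would return whatever cookies that site sets on the first request. Reusing the same opener for subsequent requests sends those cookies back automatically.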