## 1. Packet capture analysis and debug log output

Simulating a logged-in visit requires setting request headers. If this is unfamiliar, see the simulated login process described in the Java crawler earlier in this series; the key step is adding request headers.

In Python, you can turn on urllib2's debug log so that the content of each exchange is printed. That removes the need for a separate packet capture tool, because the request headers are visible directly.

~~~
import urllib2

# debuglevel=1 prints the request and response headers to stdout
httpHandler = urllib2.HTTPHandler(debuglevel=1)
httpsHandler = urllib2.HTTPSHandler(debuglevel=1)
opener = urllib2.build_opener(httpHandler, httpsHandler)
urllib2.install_opener(opener)

response = urllib2.urlopen('http://www.baidu.com')
html = response.read()
~~~

![](https://box.kancloud.cn/2016-02-18_56c5641d2bc3a.jpg)

As for packet capture itself, after comparing the developer tools built into several browsers, I find Firefox's easier to use than Chrome's: packets are displayed more clearly, the operations are more convenient, and it has some features Chrome lacks.

Let's analyze the packets exchanged when logging in to Sina Weibo.

The page before login:

![](https://box.kancloud.cn/2016-02-18_56c5641d40d8a.jpg)

Click login and watch the process:

![](https://box.kancloud.cn/2016-02-18_56c5641d5b25e.jpg)

Opening a packet shows the detailed request headers, the cookies sent, the response headers, and the returned files/data.

![](https://box.kancloud.cn/2016-02-18_56c5641d77ccc.jpg)

The Network tab shows the details. Here are the request headers:

![](https://box.kancloud.cn/2016-02-18_56c5641d929e8.jpg)

The cookie stores myuid and the un account name. This is the cookie information needed later for the simulated login:

![](https://box.kancloud.cn/2016-02-18_56c5641db90d9.jpg)

## 2. Setting headers on an HTTP request

First, an example from the official tutorial:

~~~
import urllib
import urllib2

url = 'http://s.weibo.com'
user_agent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11'
values = {'name': 'denny',
          'location': 'BUPT',
          'language': 'Python'}
headers = {'User-Agent': user_agent}

# Passing data makes this a POST request
data = urllib.urlencode(values, 1)
request = urllib2.Request(url, data, headers)
response = urllib2.urlopen(request)
the_page = response.read()
print the_page
~~~

A complete example:

~~~
# -*- coding:utf8 -*-
import urllib2
import re
import StringIO
import gzip

ua = {
    #'User-Agent':'Mozilla/5.0 (compatible; Googlebot/2.1; +Googlebot - Webmaster Tools Help)',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.124 Safari/537.36',
    'Connection': 'Keep-Alive',
    'Accept-Language': 'zh-CN,zh;q=0.8',
    # Advertise only gzip, the one encoding decoded in controller()
    'Accept-Encoding': 'gzip',
    'Accept': '*/*',
    'Accept-Charset': 'GBK,utf-8;q=0.7,*;q=0.3',
    'Cache-Control': 'max-age=0'
}

def get_html(url_address):
    '''open url and read it'''
    req_http = urllib2.Request(url_address, headers=ua)
    html = urllib2.urlopen(req_http).read()
    return html

def controller():
    '''make url list and download page'''
    # %d is filled with the page number in the loop below
    url = 'http://s.weibo.com/wb/iPhone&nodup=1&page=%d'
    reget = re.compile('(<div class=\"post-wrapper.*?)<p class=\"pagination\">', re.DOTALL)
    fp = open("e:/weibo/head.txt", "w+")
    for i in range(1, 131):
        html_c = get_html(url % (i))
        print url % (i)
        # Assumes the server honored Accept-Encoding and returned a gzip body
        html_c = gzip.GzipFile(fileobj=StringIO.StringIO(html_c)).read()
        res = reget.findall(html_c)
        for x in res:
            fp.write(x)
            fp.write("\n\n\n")
    fp.close()
    return

if __name__ == '__main__':
    controller()
~~~

Original article. Please credit the source when reposting: [http://blog.csdn.net/dianacody/article/details/39742711](http://blog.csdn.net/dianacody/article/details/39742711)
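The header-setting step above is written for Python 2's urllib2. As a rough sketch of the same idea under Python 3, where urllib2 became `urllib.request`, the block below attaches headers to a request and gunzips the body only when the response declares gzip. The names `DEFAULT_HEADERS`, `build_request`, `decode_body`, and `fetch` are hypothetical helpers introduced for illustration and are not part of the original article.

```python
import gzip
import urllib.request

# Hypothetical header set for illustration; values mirror the article's example.
DEFAULT_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/37.0.2062.124 Safari/537.36",
    "Accept-Language": "zh-CN,zh;q=0.8",
    "Accept-Encoding": "gzip",
}


def build_request(url, headers=None):
    """Return a Request carrying the given headers; nothing is sent yet."""
    return urllib.request.Request(url, headers=headers or DEFAULT_HEADERS)


def decode_body(raw, content_encoding):
    """Gunzip the body only when the server reports gzip encoding."""
    if content_encoding and "gzip" in content_encoding.lower():
        return gzip.decompress(raw)
    return raw


def fetch(url):
    """Send the request and return the (possibly decompressed) body bytes."""
    with urllib.request.urlopen(build_request(url)) as response:
        encoding = response.headers.get("Content-Encoding")
        return decode_body(response.read(), encoding)
```

Checking `Content-Encoding` before decompressing avoids a failure when the server ignores the `Accept-Encoding` request header and returns an uncompressed page. Calling `fetch(url)` performs the network request; `build_request` and `decode_body` can be exercised offline.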