                # Introduction to HtmlParser <div><p style="color: rgb(54, 46, 43); font-family: Arial;"><span style="font-size:24px;">1. Related resources</span></p><p style="color: rgb(54, 46, 43); font-family: Arial;"> Official documentation (samples): http://htmlparser.sourceforge.net/samples.html</p><p style="color: rgb(54, 46, 43); font-family: Arial;"> API: http://htmlparser.sourceforge.net/javadoc/index.html</p><p style="color: rgb(54, 46, 43); font-family: Arial;"> Other HTML parsers: jsoup, among others. Because HtmlParser has seen little maintenance since around 2006, many people now recommend using jsoup instead.</p><p style="color: rgb(54, 46, 43); font-family: Arial;"><br></p><p style="color: rgb(54, 46, 43); font-family: Arial;"><span style="font-size:24px;">2. Key steps for using HtmlParser</span></p><p style="color: rgb(54, 46, 43); font-family: Arial;"> (1) Create a parser with the Parser class</p><p style="color: rgb(54, 46, 43); font-family: Arial;"> (2) Create a Filter or a Visitor</p><p style="color: rgb(54, 46, 43); font-family: Arial;"> (3) Use the parser with the filter or visitor to collect every node that matches</p><p style="color: rgb(54, 46, 43); font-family: Arial;"> (4) Process the content of the matching nodes</p><p style="color: rgb(54, 46, 43); font-family: Arial;"><br></p><p style="color: rgb(54, 46, 43); font-family: Arial;"><span style="font-size:24px;">3. Creating a parser with the Parser constructors</span></p><p style="color: rgb(54, 46, 43); font-family: Arial;"></p><table border="1" cellpadding="2" cellspacing="0" width="100%" style="color: rgb(0, 0, 0); font-family: Simsun; font-size: 14px;"><tbody><tr style="background-color:rgb(238,238,238);"><td style="height: 41px;"><code><strong><a href="http://htmlparser.sourceforge.net/javadoc/org/htmlparser/Parser.html#Parser%28%29" target="_blank" style="color:rgb(106,57,6);">Parser</a></strong>()</code>&nbsp;<br> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Zero argument constructor.</td></tr><tr style="background-color:rgb(238,238,238);"><td><code><strong><a href="http://htmlparser.sourceforge.net/javadoc/org/htmlparser/Parser.html#Parser%28org.htmlparser.lexer.Lexer%29" target="_blank" style="color:rgb(106,57,6);">Parser</a></strong>(<a title="class in 
org.htmlparser.lexer" href="http://htmlparser.sourceforge.net/javadoc/org/htmlparser/lexer/Lexer.html" target="_blank" style="color:rgb(106,57,6);">Lexer</a>&nbsp;lexer)</code>&nbsp;<br> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Construct a parser using the provided lexer.</td></tr><tr style="background-color:rgb(238,238,238);"><td><code><strong><a href="http://htmlparser.sourceforge.net/javadoc/org/htmlparser/Parser.html#Parser%28org.htmlparser.lexer.Lexer,%20org.htmlparser.util.ParserFeedback%29" target="_blank" style="color:rgb(106,57,6);">Parser</a></strong>(<a title="class in org.htmlparser.lexer" href="http://htmlparser.sourceforge.net/javadoc/org/htmlparser/lexer/Lexer.html" target="_blank" style="color:rgb(106,57,6);">Lexer</a>&nbsp;lexer,&nbsp;<a title="interface in org.htmlparser.util" href="http://htmlparser.sourceforge.net/javadoc/org/htmlparser/util/ParserFeedback.html" target="_blank" style="color:rgb(106,57,6);">ParserFeedback</a>&nbsp;fb)</code>&nbsp;<br> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Construct a parser using the provided lexer and feedback object.</td></tr><tr style="background-color:rgb(238,238,238);"><td><code><strong><a href="http://htmlparser.sourceforge.net/javadoc/org/htmlparser/Parser.html#Parser%28java.lang.String%29" target="_blank" style="color:rgb(106,57,6);">Parser</a></strong>(<a title="class or interface in java.lang" href="http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html" target="_blank" style="color:rgb(106,57,6);">String</a>&nbsp;resource)</code>&nbsp;<br> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Creates a Parser object with the location of the resource (URL or file).</td></tr><tr style="background-color:rgb(238,238,238);"><td><code><strong><a href="http://htmlparser.sourceforge.net/javadoc/org/htmlparser/Parser.html#Parser%28java.lang.String,%20org.htmlparser.util.ParserFeedback%29" target="_blank" style="color:rgb(106,57,6);">Parser</a></strong>(<a 
title="class or interface in java.lang" href="http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html" target="_blank" style="color:rgb(106,57,6);">String</a>&nbsp;resource,&nbsp;<a title="interface in org.htmlparser.util" href="http://htmlparser.sourceforge.net/javadoc/org/htmlparser/util/ParserFeedback.html" target="_blank" style="color:rgb(106,57,6);">ParserFeedback</a>&nbsp;feedback)</code>&nbsp;<br> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Creates a Parser object with the location of the resource (URL or file) You would typically create a DefaultHTMLParserFeedback object and pass it in.</td></tr><tr style="background-color:rgb(238,238,238);"><td><code><strong><a href="http://htmlparser.sourceforge.net/javadoc/org/htmlparser/Parser.html#Parser%28java.net.URLConnection%29" target="_blank" style="color:rgb(106,57,6);">Parser</a></strong>(<a title="class or interface in java.net" href="http://java.sun.com/j2se/1.4.2/docs/api/java/net/URLConnection.html" target="_blank" style="color:rgb(106,57,6);">URLConnection</a>&nbsp;connection)</code>&nbsp;<br> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Construct a parser using the provided URLConnection.</td></tr><tr style="background-color:rgb(238,238,238);"><td><code><strong><a href="http://htmlparser.sourceforge.net/javadoc/org/htmlparser/Parser.html#Parser%28java.net.URLConnection,%20org.htmlparser.util.ParserFeedback%29" target="_blank" style="color:rgb(106,57,6);">Parser</a></strong>(<a title="class or interface in java.net" href="http://java.sun.com/j2se/1.4.2/docs/api/java/net/URLConnection.html" target="_blank" style="color:rgb(106,57,6);">URLConnection</a>&nbsp;connection,&nbsp;<a title="interface in org.htmlparser.util" href="http://htmlparser.sourceforge.net/javadoc/org/htmlparser/util/ParserFeedback.html" target="_blank" style="color:rgb(106,57,6);">ParserFeedback</a>&nbsp;fb)</code>&nbsp;<br> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Constructor for 
custom HTTP access.</td></tr></tbody></table><span style="color: rgb(54, 46, 43); font-family: Arial;">&nbsp; &nbsp; &nbsp; &nbsp; Most users initialize a Parser from a </span><span style="color: blue; font-family: Arial;">URLConnection</span><span style="color: rgb(54, 46, 43); font-family: Arial;"> or from a String resource (a URL or file location, per the table above), or obtain a Parser object from a static factory function. The code of </span><span style="color: blue; font-family: Arial;">ParserFeedback</span><span style="color: rgb(54, 46, 43); font-family: Arial;"> is simple; it exists for debugging and tracing the parsing process and rarely needs to be changed. Using a </span><span style="color: green; font-family: Arial;">Lexer</span><span style="color: rgb(54, 46, 43); font-family: Arial;"> is a comparatively advanced topic and is left for a later discussion.</span><br style="color: rgb(54, 46, 43); font-family: Arial;"><span style="color: rgb(54, 46, 43); font-family: Arial;">&nbsp; &nbsp; &nbsp; &nbsp; One interesting point: if you need to set the page encoding without using a Lexer, the constructors do not help, since none of them takes an encoding argument; the static factory function does. For most Chinese-language pages this is likely to be the approach used most often.</span><br style="color: rgb(54, 46, 43); font-family: Arial;"><p style="color: rgb(54, 46, 43); font-family: Arial;"></p><p style="color: rgb(54, 46, 43); font-family: Arial;"><br></p><p style="color: rgb(54, 46, 43); font-family: Arial;"><span style="font-size:24px;">4. HtmlParser stores node information in Node objects</span></p><p style="color: rgb(54, 46, 43); font-family: Arial;"><img src="http://note.youdao.com/yws/res/10738/977917BD60E34D578F9EB0747420F7BB" data-media-type="image" /><br></p><p style="color: rgb(54, 46, 43); font-family: Arial;"> (1) Methods for navigating between nodes<br> Node&nbsp;<span style="color:blue;">getParent</span>&nbsp;(): returns the parent node<br> NodeList&nbsp;<span style="color:blue;">getChildren</span>&nbsp;(): returns the list of child nodes<br> Node&nbsp;<span style="color:blue;">getFirstChild</span>&nbsp;(): returns the first child node<br> Node&nbsp;<span style="color:blue;">getLastChild</span>&nbsp;(): returns the last child node<br> Node&nbsp;<span style="color:blue;">getPreviousSibling</span>&nbsp;(): returns the previous sibling node<br> Node&nbsp;<span style="color:blue;">getNextSibling</span>&nbsp;(): returns the next sibling node<br> (2) Methods for retrieving the content of a <span style="color:fuchsia;">Node</span><br> String&nbsp;<span 
style="color:blue;">getText</span>&nbsp;(): returns the node's text<br> String&nbsp;<span style="color:blue;">toPlainTextString</span>(): returns the plain-text content<br> String&nbsp;<span style="color:blue;">toHtml</span>&nbsp;()&nbsp;: returns the <span style="color:green;">HTML</span> (raw <span style="color:green;">HTML</span>)<br> String&nbsp;<span style="color:blue;">toHtml</span>&nbsp;(boolean verbatim): returns the <span style="color:green;">HTML</span> (raw <span style="color:green;">HTML</span>)<br> String&nbsp;<span style="color:blue;">toString</span>&nbsp;(): returns a string representation (raw <span style="color:green;">HTML</span>)<br> Page&nbsp;<span style="color:blue;">getPage</span>&nbsp;(): returns the <span style="color:green;">Page</span> object associated with this <span style="color:green;">Node</span><br> int&nbsp;<span style="color:blue;">getStartPosition</span>&nbsp;(): returns the start position of this <span style="color:green;">Node</span> in the <span style="color:green;">HTML</span> page<br> int&nbsp;<span style="color:blue;">getEndPosition</span>&nbsp;(): returns the end position of this <span style="color:green;">Node</span> in the <span style="color:green;">HTML</span> page</p><p style="color: rgb(54, 46, 43); font-family: Arial;"><br></p><p style="color: rgb(54, 46, 43); font-family: Arial;"><span style="font-size:24px;">5. Using Filters to access Nodes and their content</span></p><p style="color: rgb(54, 46, 43); font-family: Arial;"><span style="font-size:18px;">(1) Types of Filter</span></p><p style="color: rgb(54, 46, 43); font-family: Arial;"> As the name suggests, a Filter filters the results so that only the content you need remains.</p><p style="color: rgb(54, 46, 43); font-family: Arial;"> Every Filter implements the NodeFilter interface, which has a single method, boolean accept(Node node), that decides whether a given node falls within the Filter's scope.</p><p style="color: rgb(54, 46, 43); font-family: Arial;"> HTMLParser defines 16 different Filters in the org.htmlparser.filters package, and they fall into a few groups.<br><span style="color:green;"><a href="http://www.baizeju.com/html/HTMLParser/200807/07-121.html#%E5%88%A4%E6%96%AD%E7%B1%BBFilter" target="_blank" style="color:rgb(16,138,198);"><strong>Predicate <span style="color:green;">Filter</span>s:</strong></a></span><br><span style="color:blue;">TagNameFilter</span><span 
style="color:blue;"><br> HasAttributeFilter</span><br> HasChildFilter<br> HasParentFilter<br> HasSiblingFilter<br> IsEqualFilter<br><span style="color:green;"><a href="http://www.baizeju.com/html/HTMLParser/200807/07-121.html#%E9%80%BB%E8%BE%91%E8%BF%90%E7%AE%97Filter" target="_blank" style="color:rgb(16,138,198);"><strong>Logical <span style="color:green;">Filter</span>s:</strong></a></span><br><span style="color:blue;">AndFilter</span><span style="color:blue;"><br> NotFilter</span><br> OrFilter<br> XorFilter<br><span style="color:green;"><a href="http://www.baizeju.com/html/HTMLParser/200807/07-121.html#%E5%85%B6%E4%BB%96Filter" target="_blank" style="color:rgb(16,138,198);"><strong>Other <span style="color:green;">Filter</span>s:</strong></a></span><br><span style="color:blue;">NodeClassFilter</span><span style="color:blue;"><br> StringFilter</span><br> LinkStringFilter<br> LinkRegexFilter<br> RegexFilter<br> CssSelectorNodeFilter</p><p style="color: rgb(54, 46, 43); font-family: Arial;"> You can also define custom Filters for filtering needs that the built-in ones do not cover.<br><span style="font-size:18px;">(2) Filter usage example</span></p><p style="color: rgb(54, 46, 43); font-family: Arial;"> The following example extracts the links from an HTML document.</p><p style="color: rgb(54, 46, 43); font-family: Arial;"></p><div style="background-color:rgb(231,229,220);color:rgb(54,46,43);font-family:Consolas,'Courier New',Courier,mono,serif;"><div><div style="background-color:rgb(248,248,248);color:silver;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:9px;"><strong>[java]</strong><div></div></div></div><ol start="1" style="background-color:rgb(255,255,255);color:rgb(92,92,92);"><li style="color:inherit;"><span style="color: black;"><span style="color: rgb(0, 102, 153); font-weight: bold;">package</span>&nbsp;org.ljh.search.html;&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;"><span style="color: rgb(0, 102, 153); font-weight: bold;">import</span>&nbsp;java.util.HashSet;&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;"><span style="color: rgb(0, 102, 153); font-weight: bold;">import</span>&nbsp;java.util.Set;&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;"><span style="color: rgb(0, 102, 153); font-weight: bold;">import</span>&nbsp;org.htmlparser.Node;&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;"><span style="color: rgb(0, 102, 153); font-weight: bold;">import</span>&nbsp;org.htmlparser.NodeFilter;&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;"><span style="color: rgb(0, 102, 153); font-weight: bold;">import</span>&nbsp;org.htmlparser.Parser;&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;"><span style="color: rgb(0, 102, 153); font-weight: bold;">import</span>&nbsp;org.htmlparser.filters.NodeClassFilter;&nbsp;&nbsp;</span></li><li 
style="background-color:rgb(248,248,248);"><span style="color: black;"><span style="color: rgb(0, 102, 153); font-weight: bold;">import</span>&nbsp;org.htmlparser.filters.OrFilter;&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;"><span style="color: rgb(0, 102, 153); font-weight: bold;">import</span>&nbsp;org.htmlparser.tags.LinkTag;&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;"><span style="color: rgb(0, 102, 153); font-weight: bold;">import</span>&nbsp;org.htmlparser.util.NodeList;&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;"><span style="color: rgb(0, 102, 153); font-weight: bold;">import</span>&nbsp;org.htmlparser.util.ParserException;&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;"><span style="color: rgb(0, 130, 0);">//&nbsp;Utility&nbsp;class&nbsp;for&nbsp;parsing&nbsp;HTML&nbsp;documents</span>&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;"><span style="color: rgb(0, 102, 153); font-weight: bold;">public</span>&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">class</span>&nbsp;HtmlParserTool&nbsp;{&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 130, 0);">//&nbsp;Extracts&nbsp;the&nbsp;links&nbsp;embedded&nbsp;in&nbsp;an&nbsp;HTML&nbsp;document</span>&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">public</span>&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">static</span>&nbsp;Set&lt;String&gt;&nbsp;extractLinks(String&nbsp;url,&nbsp;LinkFilter&nbsp;filter)&nbsp;{&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: 
black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Set&lt;String&gt;&nbsp;links&nbsp;=&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">new</span>&nbsp;HashSet&lt;String&gt;();&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">try</span>&nbsp;{&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 130, 0);">//&nbsp;1.&nbsp;Construct&nbsp;a&nbsp;Parser&nbsp;and&nbsp;set&nbsp;its&nbsp;properties</span>&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Parser&nbsp;parser&nbsp;=&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">new</span>&nbsp;Parser(url);&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;parser.setEncoding(<span style="color: blue;">"gb2312"</span>);&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 130, 0);">//&nbsp;2.1&nbsp;Define&nbsp;a&nbsp;custom&nbsp;Filter&nbsp;that&nbsp;matches&nbsp;&lt;frame&gt;&nbsp;tags,&nbsp;whose&nbsp;src&nbsp;attribute&nbsp;is&nbsp;read&nbsp;later</span>&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;NodeFilter&nbsp;frameNodeFilter&nbsp;=&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">new</span>&nbsp;NodeFilter()&nbsp;{&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span 
style="color: rgb(100, 100, 100);">@Override</span>&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">public</span>&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">boolean</span>&nbsp;accept(Node&nbsp;node)&nbsp;{&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">if</span>&nbsp;(node.getText().startsWith(<span style="color: blue;">"frame&nbsp;src="</span>))&nbsp;{&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">return</span>&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">true</span>;&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">else</span>&nbsp;{&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">return</span>&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">false</span>;&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: 
black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;};&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 130, 0);">//&nbsp;2.2&nbsp;Create&nbsp;a&nbsp;second&nbsp;Filter&nbsp;that&nbsp;matches&nbsp;&lt;a&gt;&nbsp;tags</span>&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;NodeFilter&nbsp;aNodeFilter&nbsp;=&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">new</span>&nbsp;NodeClassFilter(LinkTag.<span style="color: rgb(0, 102, 153); font-weight: bold;">class</span>);&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 130, 0);">//&nbsp;2.3&nbsp;Combine&nbsp;the&nbsp;two&nbsp;Filters&nbsp;above&nbsp;into&nbsp;one&nbsp;logical&nbsp;(OR)&nbsp;Filter</span>&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;OrFilter&nbsp;linkFilter&nbsp;=&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">new</span>&nbsp;OrFilter(frameNodeFilter,&nbsp;aNodeFilter);&nbsp;&nbsp;</span></li><li 
style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 130, 0);">//&nbsp;3.&nbsp;Use&nbsp;the&nbsp;parser&nbsp;with&nbsp;the&nbsp;filter&nbsp;to&nbsp;collect&nbsp;every&nbsp;matching&nbsp;node</span>&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;NodeList&nbsp;nodeList&nbsp;=&nbsp;parser.extractAllNodesThatMatch(linkFilter);&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 130, 0);">//&nbsp;4.&nbsp;Process&nbsp;the&nbsp;matching&nbsp;Nodes</span>&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">for</span>(<span style="color: rgb(0, 102, 153); font-weight: bold;">int</span>&nbsp;i&nbsp;=&nbsp;<span style="color: rgb(192, 0, 0);">0</span>;&nbsp;i&lt;nodeList.size();i++){&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Node&nbsp;node&nbsp;=&nbsp;nodeList.elementAt(i);&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;String&nbsp;linkURL&nbsp;=&nbsp;<span style="color: blue;">""</span>;&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: 
black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 130, 0);">//&nbsp;Case&nbsp;1:&nbsp;the&nbsp;node&nbsp;is&nbsp;an&nbsp;&lt;a&gt;&nbsp;link</span>&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">if</span>(node&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">instanceof</span>&nbsp;LinkTag){&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;LinkTag&nbsp;link&nbsp;=&nbsp;(LinkTag)node;&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;linkURL=&nbsp;link.getLink();&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<span style="color: rgb(0, 102, 153); font-weight: bold;">else</span>{&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 130, 0);">//&nbsp;Case&nbsp;2:&nbsp;the&nbsp;node&nbsp;is&nbsp;a&nbsp;&lt;frame&gt;&nbsp;tag</span>&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;String&nbsp;nodeText&nbsp;=&nbsp;node.getText();&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: 
black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">int</span>&nbsp;beginPosition&nbsp;=&nbsp;nodeText.indexOf(<span style="color: blue;">"src="</span>);&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;nodeText&nbsp;=&nbsp;nodeText.substring(beginPosition);&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">int</span>&nbsp;endPosition&nbsp;=&nbsp;nodeText.indexOf(<span style="color: blue;">"&nbsp;"</span>);&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">if</span>(endPosition&nbsp;==&nbsp;-<span style="color: rgb(192, 0, 0);">1</span>){&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;endPosition&nbsp;=&nbsp;nodeText.indexOf(<span style="color: blue;">"&gt;"</span>);&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: 
black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;linkURL&nbsp;=&nbsp;nodeText.substring(<span style="color: rgb(192, 0, 0);">5</span>,&nbsp;endPosition&nbsp;-&nbsp;<span style="color: rgb(192, 0, 0);">1</span>);&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 130, 0);">//&nbsp;Keep&nbsp;the&nbsp;URL&nbsp;only&nbsp;if&nbsp;it&nbsp;falls&nbsp;within&nbsp;the&nbsp;scope&nbsp;of&nbsp;this&nbsp;search</span>&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">if</span>(filter.accept(linkURL)){&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;links.add(linkURL);&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;<span style="color: rgb(0, 102, 153); font-weight: 
bold;">catch</span>&nbsp;(ParserException&nbsp;e)&nbsp;{&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;e.printStackTrace();&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">return</span>&nbsp;links;&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">}&nbsp;&nbsp;</span></li></ol></div><p style="color: rgb(54, 46, 43); font-family: Arial;"> Notes on the program:</p><p style="color: rgb(54, 46, 43); font-family: Arial;"> (1) Node#getText() returns the node's text as a String.</p><p style="color: rgb(54, 46, 43); font-family: Arial;"> (2) node instanceof LinkTag tests for an &lt;a&gt; node. Many similar tag classes exist, such as TableTag; most common HTML tags have a corresponding tag class. The official documentation describes the packages as follows:</p><p style="color: rgb(54, 46, 43); font-family: Arial;"></p><table border="1" cellpadding="2" cellspacing="0" width="100%" style="color: rgb(0, 0, 0); font-family: Simsun; font-size: 14px;"><tbody><tr><td><strong><a href="http://htmlparser.sourceforge.net/javadoc/org/htmlparser/nodes/package-summary.html" target="_blank" style="color:rgb(106,57,6);">org.htmlparser.nodes</a></strong></td><td>The nodes package has the concrete node implementations.</td></tr><tr><td><strong><a href="http://htmlparser.sourceforge.net/javadoc/org/htmlparser/tags/package-summary.html" target="_blank" style="color:rgb(106,57,6);">org.htmlparser.tags</a></strong></td><td>The tags package contains specific tags.</td></tr></tbody></table><span style="color: rgb(54, 46, 43); font-family: Arial;">An instanceof check therefore tells you directly whether a node is a particular kind of tag.</span><p style="color: rgb(54, 46, 43); font-family: 
Arial;"></p><p style="color: rgb(54, 46, 43); font-family: Arial;"><br></p><p style="color: rgb(54, 46, 43); font-family: Arial;"> The LinkFilter interface used above is defined as follows:</p><p style="color: rgb(54, 46, 43); font-family: Arial;"></p><div style="background-color:rgb(231,229,220);color:rgb(54,46,43);font-family:Consolas,'Courier New',Courier,mono,serif;"><div><div style="background-color:rgb(248,248,248);color:silver;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:9px;"><strong>[java]</strong><div></div></div></div><ol start="1" style="background-color:rgb(255,255,255);color:rgb(92,92,92);"><li style="color:inherit;"><span style="color: black;"><span style="color: rgb(0, 102, 153); font-weight: bold;">package</span>&nbsp;org.ljh.search.html;&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;"><span style="color: rgb(0, 130, 0);">// A filter that decides whether a URL falls within the scope of the current search.</span>&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;"><span style="color: rgb(0, 102, 153); font-weight: bold;">public</span>&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">interface</span>&nbsp;LinkFilter&nbsp;{&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">public</span>&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">boolean</span>&nbsp;accept(String&nbsp;url);&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">}&nbsp;&nbsp;</span></li></ol></div><p style="color: rgb(54, 46, 43); font-family: Arial;"><br></p><p style="color: rgb(54, 46, 43); font-family: Arial;"> The test program is as follows:</p><p style="color: rgb(54, 46, 43); font-family: Arial;"></p><div style="background-color:rgb(231,229,220);color:rgb(54,46,43);font-family:Consolas,'Courier 
New',Courier,mono,serif;"><div><div style="background-color:rgb(248,248,248);color:silver;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:9px;"><strong>[java]</strong>&nbsp;<div></div></div></div><ol start="1" style="background-color:rgb(255,255,255);color:rgb(92,92,92);"><li style="color:inherit;"><span style="color: black;"><span style="color: rgb(0, 102, 153); font-weight: bold;">package</span>&nbsp;org.ljh.search.html;&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;"><span style="color: rgb(0, 102, 153); font-weight: bold;">import</span>&nbsp;java.util.Iterator;&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;"><span style="color: rgb(0, 102, 153); font-weight: bold;">import</span>&nbsp;java.util.Set;&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;"><span style="color: rgb(0, 102, 153); font-weight: bold;">import</span>&nbsp;org.junit.Test;&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;"><span style="color: rgb(0, 102, 153); font-weight: bold;">public</span>&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">class</span>&nbsp;HtmlParserToolTest&nbsp;{&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(100, 100, 100);">@Test</span>&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">public</span>&nbsp;<span style="color: rgb(0, 102, 
153); font-weight: bold;">void</span>&nbsp;testExtractLinks()&nbsp;{&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;String&nbsp;url&nbsp;=&nbsp;<span style="color: blue;">"http://www.baidu.com"</span>;&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;LinkFilter&nbsp;linkFilter&nbsp;=&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">new</span>&nbsp;LinkFilter(){&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(100, 100, 100);">@Override</span>&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">public</span>&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">boolean</span>&nbsp;accept(String&nbsp;url)&nbsp;{&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">if</span>(url.contains(<span style="color: blue;">"baidu"</span>)){&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">return</span>&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">true</span>;&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<span style="color: 
rgb(0, 102, 153); font-weight: bold;">else</span>{&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">return</span>&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">false</span>;&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;};&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Set&lt;String&gt;&nbsp;urlSet&nbsp;=&nbsp;HtmlParserTool.extractLinks(url,&nbsp;linkFilter);&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Iterator&lt;String&gt;&nbsp;it&nbsp;=&nbsp;urlSet.iterator();&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">while</span>(it.hasNext()){&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: 
black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;System.out.println(it.next());&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">}&nbsp;&nbsp;</span></li></ol></div><p style="color: rgb(54, 46, 43); font-family: Arial;"><br></p><span style="color: rgb(54, 46, 43); font-family: Arial;">The output is as follows:</span><p style="color: rgb(54, 46, 43); font-family: Arial;"></p><p style="color: rgb(54, 46, 43); font-family: Arial;"> http://www.hao123.com<br> http://www.baidu.com/<br> http://www.baidu.com/duty/<br> http://v.baidu.com/v?ct=301989888&amp;rn=20&amp;pn=0&amp;db=0&amp;s=25&amp;word=<br> http://music.baidu.com<br> http://ir.baidu.com<br> http://www.baidu.com/gaoji/preferences.html<br> http://news.baidu.com<br> http://map.baidu.com<br> http://music.baidu.com/search?fr=ps&amp;key=<br> http://image.baidu.com<br> http://zhidao.baidu.com<br> http://image.baidu.com/i?tn=baiduimage&amp;ct=201326592&amp;lm=-1&amp;cl=2&amp;nc=1&amp;word=<br> http://www.baidu.com/more/<br> http://shouji.baidu.com/baidusearch/mobisearch.html?ref=pcjg&amp;from=1000139w<br> http://wenku.baidu.com<br> http://news.baidu.com/ns?cl=2&amp;rn=20&amp;tn=news&amp;word=<br> https://passport.baidu.com/v2/?login&amp;tpl=mn&amp;u=http%3A%2F%2Fwww.baidu.com%2F<br> http://www.baidu.com/cache/sethelp/index.html<br> http://zhidao.baidu.com/q?ct=17&amp;pn=0&amp;tn=ikaslist&amp;rn=10&amp;word=&amp;fr=wwwt<br> http://tieba.baidu.com/f?kw=&amp;fr=wwwt<br> http://home.baidu.com<br> https://passport.baidu.com/v2/?reg&amp;regType=1&amp;tpl=mn&amp;u=http%3A%2F%2Fwww.baidu.com%2F<br> http://v.baidu.com<br> 
http://e.baidu.com/?refer=888<br> ;<br> http://tieba.baidu.com<br> http://baike.baidu.com<br> http://wenku.baidu.com/search?word=&amp;lm=0&amp;od=0<br> http://top.baidu.com<br> http://map.baidu.com/m?word=&amp;fr=ps01000</p></div>
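<p style="color: rgb(54, 46, 43); font-family: Arial;"> The LinkFilter callback can also be tried on its own, without HtmlParser or network access. The following is a minimal sketch under stated assumptions: the class name LinkFilterSketch and the helper filterLinks are hypothetical and are not part of HtmlParser or of the HtmlParserTool class above; only the shape of the LinkFilter interface is taken from the article. Because LinkFilter has a single abstract method, on Java 8 or later a lambda can stand in for the anonymous class used in the test.</p>

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class LinkFilterSketch {

    // Same shape as the article's LinkFilter: decides whether a URL is in scope.
    public interface LinkFilter {
        boolean accept(String url);
    }

    // Hypothetical helper: keeps only the URLs the filter accepts,
    // dropping duplicates and preserving first-seen order.
    public static Set<String> filterLinks(List<String> urls, LinkFilter filter) {
        Set<String> accepted = new LinkedHashSet<>();
        for (String url : urls) {
            if (url != null && filter.accept(url)) {
                accepted.add(url);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // Hard-coded candidates stand in for links extracted from a page.
        List<String> candidates = Arrays.asList(
                "http://www.baidu.com/",
                "http://news.baidu.com",
                "http://www.hao123.com",
                "http://news.baidu.com");

        // A lambda replaces the anonymous LinkFilter class from the test.
        LinkFilter baiduOnly = url -> url.contains("baidu");

        for (String url : filterLinks(candidates, baiduOnly)) {
            System.out.println(url);
        }
    }
}
```

<p style="color: rgb(54, 46, 43); font-family: Arial;"> Keeping the filter as a one-method interface lets the caller define the search scope (a host name, a path prefix, a regular expression) without touching the extraction code.</p>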