# Parsing only part of a document
If you only want Beautiful Soup to look at a document's `<a>` tags, it's a waste of time and memory to parse the entire document. It's much faster to ignore everything that isn't an `<a>` tag in the first place. The `SoupStrainer` class lets you specify which parts of a document you care about, so the document doesn't have to be fully parsed before you search it; only the parts matched by the `SoupStrainer` are parsed. Create a `SoupStrainer` object and pass it in as the `parse_only` argument to the `BeautifulSoup` constructor.
## SoupStrainer
The `SoupStrainer` class takes the same arguments as a typical search method: [name](#id32), [attrs](#css), [recursive](#recursive), [text](#text), and [**kwargs](#keyword). Here are three `SoupStrainer` objects:
```
from bs4 import BeautifulSoup, SoupStrainer

only_a_tags = SoupStrainer("a")
only_tags_with_id_link2 = SoupStrainer(id="link2")

# Match only strings shorter than 10 characters.
def is_short_string(string):
    return len(string) < 10

only_short_strings = SoupStrainer(text=is_short_string)
```
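Just as with `find_all()`, these criteria can be combined. The strainers below are illustrative additions rather than part of the original set; the names `only_sister_links` and `only_lacie_links` are hypothetical:
```
import re

# Hypothetical strainers for illustration: a tag name combined with an
# attribute value, and an attribute matched against a regular expression.
only_sister_links = SoupStrainer("a", class_="sister")
only_lacie_links = SoupStrainer(href=re.compile("lacie"))
```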
Bringing back the "Alice" document once more, here's what it looks like when parsed with each of these three `SoupStrainer` objects:
```
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
print(BeautifulSoup(html_doc, "html.parser", parse_only=only_a_tags).prettify())
# <a class="sister" href="http://example.com/elsie" id="link1">
# Elsie
# </a>
# <a class="sister" href="http://example.com/lacie" id="link2">
# Lacie
# </a>
# <a class="sister" href="http://example.com/tillie" id="link3">
# Tillie
# </a>
print(BeautifulSoup(html_doc, "html.parser", parse_only=only_tags_with_id_link2).prettify())
# <a class="sister" href="http://example.com/lacie" id="link2">
# Lacie
# </a>
print(BeautifulSoup(html_doc, "html.parser", parse_only=only_short_strings).prettify())
# Elsie
# ,
# Lacie
# and
# Tillie
# ...
#
```
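To make the time-and-memory argument concrete, here is a small sketch (not part of the original example) comparing a full parse with a strained parse. Exact tag counts depend on the parser, but the strained tree contains only the `<a>` tags that matched `only_a_tags`:
```
full_soup = BeautifulSoup(html_doc, "html.parser")
a_soup = BeautifulSoup(html_doc, "html.parser", parse_only=only_a_tags)

# find_all(True) returns every tag in a tree; the strained tree is far
# smaller because it holds only the matching <a> tags.
print(len(full_soup.find_all(True)), len(a_soup.find_all(True)))

# Markup outside the strainer was never turned into tree objects at all.
print(a_soup.find("p"))
# None
```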
You can also pass a `SoupStrainer` into any of the methods covered in [Searching the tree](#id24). This probably isn't a common use case, but it's worth mentioning:
```
soup = BeautifulSoup(html_doc, "html.parser")
soup.find_all(only_short_strings)
# ['\n\n', '\n\n', 'Elsie', ',\n', 'Lacie', ' and\n', 'Tillie',
#  '\n\n', '...', '\n']
```
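A strainer built from tag criteria works the same way when passed to a search method. The lines below are a sketch continuing the session above, reusing `soup` and the strainers defined earlier:
```
soup.find_all(only_tags_with_id_link2)
# [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]

[a["href"] for a in soup.find_all(only_a_tags)]
# ['http://example.com/elsie', 'http://example.com/lacie',
#  'http://example.com/tillie']
```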