[Xunsearch PHP-SDK](http://www.xunsearch.com) v1.4.8 API Reference
# XSTokenizerScws
| Package | [XS.tokenizer](#) |
|-----|-----|
| Inheritance | class XSTokenizerScws |
| Implements | [XSTokenizer](#) |
| Since | 1.3.1 |
| Version | 1.0.0 |
| Source code | [sdk/php/lib/XSTokenizer.class.php](https://github.com/hightman/xunsearch/blob/master/sdk/php/lib/XSTokenizer.class.php) |
SCWS word segmenter (communicates with the search server)
### Public Methods
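| Name | Description | Defined in |
|-----|-----|-----|
| [__construct()](#) | Constructor | XSTokenizerScws |
| [addDict()](#) | Adds a segmentation dictionary; supports TXT/XDB formats | XSTokenizerScws |
| [getResult()](#) | Gets the segmentation result | XSTokenizerScws |
| [getTokens()](#) | XSTokenizer interface method | XSTokenizerScws |
| [getTops()](#) | Gets statistics for the most important words | XSTokenizerScws |
| [getVersion()](#) | Gets the scws version number | XSTokenizerScws |
| [hasWord()](#) | Checks whether the text contains a word with the given part of speech | XSTokenizerScws |
| [setCharset()](#) | Sets the character set | XSTokenizerScws |
| [setDict()](#) | Sets the segmentation dictionary; supports TXT/XDB formats | XSTokenizerScws |
| [setDuality()](#) | Sets two-character combination of isolated characters | XSTokenizerScws |
| [setIgnore()](#) | Sets whether punctuation is ignored | XSTokenizerScws |
| [setMulti()](#) | Sets the compound segmentation options | XSTokenizerScws |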
### Method Details
__construct() method
<table class="summaryTable"><tr><td colspan="3"><div class="signature2">public void <b>__construct</b>(string $arg=NULL)</div></td></tr><tr><td class="paramNameCol">$arg</td> <td class="paramTypeCol">string</td> <td class="paramDescCol">Compound segmentation level; not set by default</td></tr></table>
**Source:** [sdk/php/lib/XSTokenizer.class.php#L188](https://github.com/hightman/xunsearch/blob/master/sdk/php/lib/XSTokenizer.class.php#L188)
```php
public function __construct($arg = null)
{
    if (self::$_server === null) {
        $xs = XS::getLastXS();
        if ($xs === null) {
            throw new XSException('An XS instance should be created before using ' . __CLASS__);
        }
        self::$_server = $xs->getScwsServer();
        self::$_server->setTimeout(0);
        self::$_charset = $xs->getDefaultCharset();
        // constants
        if (!defined('SCWS_MULTI_NONE')) {
            define('SCWS_MULTI_NONE', 0);
            define('SCWS_MULTI_SHORT', 1);
            define('SCWS_MULTI_DUALITY', 2);
            define('SCWS_MULTI_ZMAIN', 4);
            define('SCWS_MULTI_ZALL', 8);
        }
        if (!defined('SCWS_XDICT_XDB')) {
            define('SCWS_XDICT_XDB', 1);
            define('SCWS_XDICT_MEM', 2);
            define('SCWS_XDICT_TXT', 4);
        }
    }
    if ($arg !== null && $arg !== '') {
        $this->setMulti($arg);
    }
}
```
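The constructor initializes the search server connection used for word segmentation. An XS instance must already exist; otherwise an XSException is thrown.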
addDict() method
<table class="summaryTable"><tr><td colspan="3"><div class="signature2">public XSTokenizerScws <b>addDict</b>(string $fpath, int $mode=NULL)</div></td></tr><tr><td class="paramNameCol">$fpath</td> <td class="paramTypeCol">string</td> <td class="paramDescCol">Dictionary path on the server</td></tr><tr><td class="paramNameCol">$mode</td> <td class="paramTypeCol">int</td> <td class="paramDescCol">Dictionary type, one of the constants: SCWS_XDICT_XDB|SCWS_XDICT_TXT|SCWS_XDICT_MEM</td></tr><tr><td class="paramNameCol">{return}</td> <td class="paramTypeCol">XSTokenizerScws</td> <td class="paramDescCol">Returns the object itself to allow method chaining</td></tr></table>
**Source:** [sdk/php/lib/XSTokenizer.class.php#L299](https://github.com/hightman/xunsearch/blob/master/sdk/php/lib/XSTokenizer.class.php#L299)
```php
public function addDict($fpath, $mode = null)
{
    if (!is_int($mode)) {
        $mode = stripos($fpath, '.txt') !== false ? SCWS_XDICT_TXT : SCWS_XDICT_XDB;
    }
    if (!isset($this->_setting['add_dict'])) {
        $this->_setting['add_dict'] = array();
    }
    $this->_setting['add_dict'][] = new XSCommand(CMD_SEARCH_SCWS_SET, CMD_SCWS_ADD_DICT, $mode, $fpath);
    return $this;
}
```
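Adds a segmentation dictionary; TXT and XDB formats are supported. When `$mode` is not an integer, the type is guessed from the path.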
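The fallback used when `$mode` is omitted can be tried in isolation. The following is a minimal sketch, not SDK code: `guessDictMode()`, `DICT_XDB` and `DICT_TXT` are hypothetical names, the values are copied from the constants defined in the constructor listing, and the file paths are made up.

```php
<?php
// Stand-in values copied from the constants defined in __construct().
const DICT_XDB = 1; // same value as SCWS_XDICT_XDB
const DICT_TXT = 4; // same value as SCWS_XDICT_TXT

// Hypothetical helper: mirrors the fallback applied when $mode is not an int.
function guessDictMode($fpath, $mode = null)
{
    if (!is_int($mode)) {
        // stripos() is case-insensitive and matches anywhere in the path,
        // not only at the end of the file name.
        $mode = stripos($fpath, '.txt') !== false ? DICT_TXT : DICT_XDB;
    }
    return $mode;
}

var_dump(guessDictMode('/data/dict/custom.TXT'));           // int(4)
var_dump(guessDictMode('/data/dict/custom.xdb'));           // int(1)
var_dump(guessDictMode('/data/my.txt.d/custom.xdb'));       // int(4)
var_dump(guessDictMode('/data/dict/custom.txt', DICT_XDB)); // int(1)
```

The third call shows why passing `$mode` explicitly may be safer when a directory name happens to contain `.txt`. A numeric string such as `'1'` is not an int, so it would also trigger the guess.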
getResult() method
<table class="summaryTable"><tr><td colspan="3"><div class="signature2">public array <b>getResult</b>(string $text)</div></td></tr><tr><td class="paramNameCol">$text</td> <td class="paramTypeCol">string</td> <td class="paramDescCol">Text to segment</td></tr><tr><td class="paramNameCol">{return}</td> <td class="paramTypeCol">array</td> <td class="paramDescCol">Array of words; each entry contains [off: position of the word in the text, attr: part of speech, word: the word]</td></tr></table>
**Source:** [sdk/php/lib/XSTokenizer.class.php#L339](https://github.com/hightman/xunsearch/blob/master/sdk/php/lib/XSTokenizer.class.php#L339)
```php
public function getResult($text)
{
    $words = array();
    $text = $this->applySetting($text);
    $cmd = new XSCommand(CMD_SEARCH_SCWS_GET, CMD_SCWS_GET_RESULT, 0, $text);
    $res = self::$_server->execCommand($cmd, CMD_OK_SCWS_RESULT);
    while ($res->buf !== '') {
        $tmp = unpack('Ioff/a4attr/a*word', $res->buf);
        $tmp['word'] = XS::convert($tmp['word'], self::$_charset, 'UTF-8');
        $words[] = $tmp;
        $res = self::$_server->getRespond();
    }
    return $words;
}
```
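Gets the segmentation result. Each server response carries one word, and reading stops at the first empty response.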
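The record layout implied by the `unpack()` format string can be explored without a server. This is a sketch under assumptions: the buffer is built locally with `pack()` using made-up values, so it only demonstrates how the format string splits a record, not what the server actually sends.

```php
<?php
// Build one made-up record in the layout implied by 'Ioff/a4attr/a*word':
// an unsigned int offset, a 4-byte NUL-padded attribute, then the word.
$buf = pack('Ia4a*', 6, 'n', 'search');

// Same format string as in getResult().
$rec = unpack('Ioff/a4attr/a*word', $buf);

// 'a4' is NUL-padded. PHP 5.5 and later keep that padding when unpacking,
// so trimming it is a precaution added in this sketch only.
$rec['attr'] = rtrim($rec['attr'], "\0");

print_r($rec);
```

If attribute values are compared against plain strings such as `'n'`, trimming the padding first avoids surprises on PHP versions that keep it.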
getTokens() method
<table class="summaryTable"><tr><td colspan="3"><div class="signature2">public array <b>getTokens</b>(string $value, XSDocument $doc=NULL)</div></td></tr><tr><td class="paramNameCol">$value</td> <td class="paramTypeCol">string</td> <td class="paramDescCol">Text to segment (handled as UTF-8)</td></tr><tr><td class="paramNameCol">$doc</td> <td class="paramTypeCol">XSDocument</td> <td class="paramDescCol">Optional document; not used by this implementation</td></tr><tr><td class="paramNameCol">{return}</td> <td class="paramTypeCol">array</td> <td class="paramDescCol">List of segmented words</td></tr></table>
**Source:** [sdk/php/lib/XSTokenizer.class.php#L220](https://github.com/hightman/xunsearch/blob/master/sdk/php/lib/XSTokenizer.class.php#L220)
```php
public function getTokens($value, XSDocument $doc = null)
{
    $tokens = array();
    $this->setIgnore(true);
    // save charset, force to use UTF-8
    $_charset = self::$_charset;
    self::$_charset = 'UTF-8';
    $words = $this->getResult($value);
    foreach ($words as $word) {
        $tokens[] = $word['word'];
    }
    // restore charset
    self::$_charset = $_charset;
    return $tokens;
}
```
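Implements the XSTokenizer interface. It turns on punctuation ignoring, temporarily forces the character set to UTF-8, and returns only the `word` value of each [getResult](#) entry.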
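The reduction from full result entries to a plain token list can be shown on its own. A small sketch with made-up entries in the shape that getResult() documents; `array_column()` is offered only as an equivalent shortcut for PHP 5.5 and later, not as what the SDK uses.

```php
<?php
// Made-up entries in the documented shape: off, attr, word.
$words = array(
    array('off' => 0, 'attr' => 'n', 'word' => 'full-text'),
    array('off' => 10, 'attr' => 'n', 'word' => 'search'),
);

// Loop form, as in getTokens().
$tokens = array();
foreach ($words as $word) {
    $tokens[] = $word['word'];
}

// array_column() yields the same list in one call.
$same = array_column($words, 'word');

var_dump($tokens === $same);
```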
getTops() method
<table class="summaryTable"><tr><td colspan="3"><div class="signature2">public array <b>getTops</b>(string $text, int $limit=10, string $xattr='')</div></td></tr><tr><td class="paramNameCol">$text</td> <td class="paramTypeCol">string</td> <td class="paramDescCol">Text to segment</td></tr><tr><td class="paramNameCol">$limit</td> <td class="paramTypeCol">int</td> <td class="paramDescCol">Maximum number of words to return, default 10</td></tr><tr><td class="paramNameCol">$xattr</td> <td class="paramTypeCol">string</td> <td class="paramDescCol">Part-of-speech filter applied to the result; separate multiple parts of speech with commas, and start with ~ to negate. For example, n,v returns only nouns and verbs; ~n,v returns everything except nouns and verbs</td></tr><tr><td class="paramNameCol">{return}</td> <td class="paramTypeCol">array</td> <td class="paramDescCol">Array of words; each entry contains [times: number of occurrences, attr: part of speech, word: the word]</td></tr></table>
**Source:** [sdk/php/lib/XSTokenizer.class.php#L361](https://github.com/hightman/xunsearch/blob/master/sdk/php/lib/XSTokenizer.class.php#L361)
```php
public function getTops($text, $limit = 10, $xattr = '')
{
    $words = array();
    $text = $this->applySetting($text);
    $cmd = new XSCommand(CMD_SEARCH_SCWS_GET, CMD_SCWS_GET_TOPS, $limit, $text, $xattr);
    $res = self::$_server->execCommand($cmd, CMD_OK_SCWS_TOPS);
    while ($res->buf !== '') {
        $tmp = unpack('Itimes/a4attr/a*word', $res->buf);
        $tmp['word'] = XS::convert($tmp['word'], self::$_charset, 'UTF-8');
        $words[] = $tmp;
        $res = self::$_server->getRespond();
    }
    return $words;
}
```
Gets statistics for the most important words in the text.
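The part-of-speech filter syntax (comma-separated list, leading `~` to negate) is applied on the server, but its intended effect can be illustrated locally. The sketch below is hypothetical: `filterByAttr()` is not part of the SDK, the sample entries are made up, and exact string matching on `attr` is an assumption, so the server's real matching rules may differ.

```php
<?php
// Hypothetical client-side filter; the real filtering happens on the server.
function filterByAttr(array $words, $xattr)
{
    $negate = strlen($xattr) > 0 && $xattr[0] === '~';
    $attrs = array_filter(array_map('trim', explode(',', ltrim($xattr, '~'))), 'strlen');
    if (!$attrs) {
        return $words; // an empty filter keeps everything
    }
    $out = array();
    foreach ($words as $w) {
        $hit = in_array($w['attr'], $attrs, true); // assumption: exact match
        if ($hit !== $negate) {
            $out[] = $w;
        }
    }
    return $out;
}

// Made-up entries in the shape getTops() documents: times, attr, word.
$words = array(
    array('times' => 3, 'attr' => 'n', 'word' => 'index'),
    array('times' => 2, 'attr' => 'v', 'word' => 'search'),
    array('times' => 1, 'attr' => 'a', 'word' => 'fast'),
);

print_r(array_column(filterByAttr($words, 'n,v'), 'word'));  // index, search
print_r(array_column(filterByAttr($words, '~n,v'), 'word')); // fast
```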
getVersion() method
<table class="summaryTable"><tr><td colspan="3"><div class="signature2">public string <b>getVersion</b>()</div></td></tr><tr><td class="paramNameCol">{return}</td> <td class="paramTypeCol">string</td> <td class="paramDescCol">Version number</td></tr></table>
**Source:** [sdk/php/lib/XSTokenizer.class.php#L327](https://github.com/hightman/xunsearch/blob/master/sdk/php/lib/XSTokenizer.class.php#L327)
```php
public function getVersion()
{
    $cmd = new XSCommand(CMD_SEARCH_SCWS_GET, CMD_SCWS_GET_VERSION);
    $res = self::$_server->execCommand($cmd, CMD_OK_INFO);
    return $res->buf;
}
```
Gets the scws version number.
hasWord() method
<table class="summaryTable"><tr><td colspan="3"><div class="signature2">public bool <b>hasWord</b>(string $text, string $xattr)</div></td></tr><tr><td class="paramNameCol">$text</td> <td class="paramTypeCol">string</td> <td class="paramDescCol">Text to check</td></tr><tr><td class="paramNameCol">$xattr</td> <td class="paramTypeCol">string</td> <td class="paramDescCol">Part of speech to check for; see the description in <a href="XSTokenizerScws.html#getTops">getTops</a></td></tr><tr><td class="paramNameCol">{return}</td> <td class="paramTypeCol">bool</td> <td class="paramDescCol">Whether the text contains a word with the given part of speech</td></tr></table>
**Source:** [sdk/php/lib/XSTokenizer.class.php#L382](https://github.com/hightman/xunsearch/blob/master/sdk/php/lib/XSTokenizer.class.php#L382)
```php
public function hasWord($text, $xattr)
{
    $text = $this->applySetting($text);
    $cmd = new XSCommand(CMD_SEARCH_SCWS_GET, CMD_SCWS_HAS_WORD, 0, $text, $xattr);
    $res = self::$_server->execCommand($cmd, CMD_OK_INFO);
    return $res->buf === 'OK';
}
```
Checks whether the text contains a word with the given part of speech.
setCharset() method
<table class="summaryTable"><tr><td colspan="3"><div class="signature2">public XSTokenizerScws <b>setCharset</b>(string $charset)</div></td></tr><tr><td class="paramNameCol">$charset</td> <td class="paramTypeCol">string</td> <td class="paramDescCol"></td></tr><tr><td class="paramNameCol">{return}</td> <td class="paramTypeCol">XSTokenizerScws</td> <td class="paramDescCol">Returns the object itself to allow method chaining</td></tr></table>
**Source:** [sdk/php/lib/XSTokenizer.class.php#L242](https://github.com/hightman/xunsearch/blob/master/sdk/php/lib/XSTokenizer.class.php#L242)
```php
public function setCharset($charset)
{
    self::$_charset = strtoupper($charset);
    if (self::$_charset == 'UTF8') {
        self::$_charset = 'UTF-8';
    }
    return $this;
}
```
Sets the character set. The default is UTF-8. This is the character set of the `$text` parameter passed to the [getResult](#) family of functions.
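The name normalization performed here is small enough to try standalone. A minimal sketch, assuming nothing beyond the two steps visible in the listing; `normalizeCharset()` is a hypothetical helper, not an SDK function.

```php
<?php
// Hypothetical helper with the same two steps as setCharset():
// upper-case the name, then map the alias 'UTF8' to 'UTF-8'.
function normalizeCharset($charset)
{
    $charset = strtoupper($charset);
    return $charset == 'UTF8' ? 'UTF-8' : $charset;
}

var_dump(normalizeCharset('utf8'));  // string(5) "UTF-8"
var_dump(normalizeCharset('utf-8')); // string(5) "UTF-8"
var_dump(normalizeCharset('gbk'));   // string(3) "GBK"
```

In the listing the value is stored in a static property, so a change made through one tokenizer object appears to apply to every XSTokenizerScws instance in the same process.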
setDict() method
<table class="summaryTable"><tr><td colspan="3"><div class="signature2">public XSTokenizerScws <b>setDict</b>(string $fpath, int $mode=NULL)</div></td></tr><tr><td class="paramNameCol">$fpath</td> <td class="paramTypeCol">string</td> <td class="paramDescCol">Dictionary path on the server</td></tr><tr><td class="paramNameCol">$mode</td> <td class="paramTypeCol">int</td> <td class="paramDescCol">Dictionary type, one of the constants: SCWS_XDICT_XDB|SCWS_XDICT_TXT|SCWS_XDICT_MEM</td></tr><tr><td class="paramNameCol">{return}</td> <td class="paramTypeCol">XSTokenizerScws</td> <td class="paramDescCol">Returns the object itself to allow method chaining</td></tr></table>
**Source:** [sdk/php/lib/XSTokenizer.class.php#L283](https://github.com/hightman/xunsearch/blob/master/sdk/php/lib/XSTokenizer.class.php#L283)
```php
public function setDict($fpath, $mode = null)
{
    if (!is_int($mode)) {
        $mode = stripos($fpath, '.txt') !== false ? SCWS_XDICT_TXT : SCWS_XDICT_XDB;
    }
    $this->_setting['set_dict'] = new XSCommand(CMD_SEARCH_SCWS_SET, CMD_SCWS_SET_DICT, $mode, $fpath);
    unset($this->_setting['add_dict']);
    return $this;
}
```
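Sets the segmentation dictionary; TXT and XDB formats are supported. Calling it discards any dictionaries previously queued with [addDict](#).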
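Because setDict() unsets the `add_dict` entry, call order matters when chaining it with addDict(). The sketch below uses a hypothetical stand-in class, `DictSettings`, that tracks only the two setting keys seen in the listings; it stores plain paths instead of XSCommand objects, and the file names are made up.

```php
<?php
// Hypothetical stand-in that tracks only the dictionary settings.
class DictSettings
{
    public $setting = array();

    public function setDict($fpath)
    {
        $this->setting['set_dict'] = $fpath;
        unset($this->setting['add_dict']); // drops earlier addDict() entries
        return $this;
    }

    public function addDict($fpath)
    {
        if (!isset($this->setting['add_dict'])) {
            $this->setting['add_dict'] = array();
        }
        $this->setting['add_dict'][] = $fpath;
        return $this;
    }
}

// addDict() first, setDict() second: the added dictionary is discarded.
$a = new DictSettings();
$a->addDict('extra.txt')->setDict('main.xdb');
var_dump(isset($a->setting['add_dict'])); // bool(false)

// setDict() first, addDict() second: both are kept.
$b = new DictSettings();
$b->setDict('main.xdb')->addDict('extra.txt');
var_dump($b->setting['add_dict']);
```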
setDuality() method
<table class="summaryTable"><tr><td colspan="3"><div class="signature2">public XSTokenizerScws <b>setDuality</b>(bool $yes=true)</div></td></tr><tr><td class="paramNameCol">$yes</td> <td class="paramTypeCol">bool</td> <td class="paramDescCol">Whether to automatically combine isolated single characters into two-character tokens</td></tr><tr><td class="paramNameCol">{return}</td> <td class="paramTypeCol">XSTokenizerScws</td> <td class="paramDescCol">Returns the object itself to allow method chaining</td></tr></table>
**Source:** [sdk/php/lib/XSTokenizer.class.php#L316](https://github.com/hightman/xunsearch/blob/master/sdk/php/lib/XSTokenizer.class.php#L316)
```php
public function setDuality($yes = true)
{
    $this->_setting['duality'] = new XSCommand(CMD_SEARCH_SCWS_SET, CMD_SCWS_SET_DUALITY, $yes === false
        ? 0 : 1);
    return $this;
}
```
Sets two-character combination of isolated characters.
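The switch value is computed with a strict comparison against `false`, which has a consequence worth seeing in isolation. A minimal sketch; `switchValue()` is a hypothetical helper that wraps the same expression found in setDuality() and setIgnore().

```php
<?php
// Hypothetical helper wrapping the expression used by setDuality() and setIgnore().
function switchValue($yes = true)
{
    // Strict comparison: only the boolean false maps to 0.
    return $yes === false ? 0 : 1;
}

var_dump(switchValue(false)); // int(0)
var_dump(switchValue(true));  // int(1)
var_dump(switchValue(0));     // int(1)
var_dump(switchValue(null));  // int(1)
```

Passing `0`, `null` or an empty string therefore leaves the option switched on; to turn it off, pass the boolean `false`.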
setIgnore() method
<table class="summaryTable"><tr><td colspan="3"><div class="signature2">public XSTokenizerScws <b>setIgnore</b>(bool $yes=true)</div></td></tr><tr><td class="paramNameCol">$yes</td> <td class="paramTypeCol">bool</td> <td class="paramDescCol">Whether to ignore punctuation</td></tr><tr><td class="paramNameCol">{return}</td> <td class="paramTypeCol">XSTokenizerScws</td> <td class="paramDescCol">Returns the object itself to allow method chaining</td></tr></table>
**Source:** [sdk/php/lib/XSTokenizer.class.php#L256](https://github.com/hightman/xunsearch/blob/master/sdk/php/lib/XSTokenizer.class.php#L256)
```php
public function setIgnore($yes = true)
{
    $this->_setting['ignore'] = new XSCommand(CMD_SEARCH_SCWS_SET, CMD_SCWS_SET_IGNORE, $yes === false
        ? 0 : 1);
    return $this;
}
```
Sets whether punctuation is ignored.
setMulti() method
<table class="summaryTable"><tr><td colspan="3"><div class="signature2">public XSTokenizerScws <b>setMulti</b>(int $mode=3)</div></td></tr><tr><td class="paramNameCol">$mode</td> <td class="paramTypeCol">int</td> <td class="paramDescCol">Compound options in the range 0~15, default 3; the constants can be combined: SCWS_MULTI_SHORT|SCWS_MULTI_DUALITY|SCWS_MULTI_ZMAIN|SCWS_MULTI_ZALL</td></tr><tr><td class="paramNameCol">{return}</td> <td class="paramTypeCol">XSTokenizerScws</td> <td class="paramDescCol">Returns the object itself to allow method chaining</td></tr></table>
**Source:** [sdk/php/lib/XSTokenizer.class.php#L270](https://github.com/hightman/xunsearch/blob/master/sdk/php/lib/XSTokenizer.class.php#L270)
```php
public function setMulti($mode = 3)
{
    $mode = intval($mode) & self::MULTI_MASK;
    $this->_setting['multi'] = new XSCommand(CMD_SEARCH_SCWS_SET, CMD_SCWS_SET_MULTI, $mode);
    return $this;
}
```
Sets the compound segmentation options.
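The bit flags and the masking step can be checked without the SDK. A sketch under assumptions: the flag values are copied from the constants defined in the constructor listing, `normalizeMulti()` is a hypothetical helper, and the mask value 15 is inferred from the documented 0~15 range because the value of `self::MULTI_MASK` is not shown on this page.

```php
<?php
// Values copied from the constants defined in __construct().
const MULTI_SHORT = 1;   // same value as SCWS_MULTI_SHORT
const MULTI_DUALITY = 2; // same value as SCWS_MULTI_DUALITY
const MULTI_ZMAIN = 4;   // same value as SCWS_MULTI_ZMAIN
const MULTI_ZALL = 8;    // same value as SCWS_MULTI_ZALL
// Assumption: the mask is 15, matching the documented 0~15 range.
const MULTI_MASK = 15;

// Hypothetical helper: the same normalization step as in setMulti().
function normalizeMulti($mode = 3)
{
    return intval($mode) & MULTI_MASK;
}

var_dump(normalizeMulti(MULTI_SHORT | MULTI_DUALITY)); // int(3), the default
var_dump(normalizeMulti('5'));                         // int(5), numeric strings are accepted
var_dump(normalizeMulti(19));                          // int(3), bits above the mask are dropped
```

The default value 3 is therefore the combination of the SHORT and DUALITY flags, and out-of-range input is silently truncated rather than rejected.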
Copyright © 2008-2011 by [杭州云圣網絡科技有限公司](http://www.xunsearch.com)
All Rights Reserved.