# 6-String Processing: Splitting, Joining, Padding
[Original article](http://code.google.com/p/guava-libraries/wiki/StringsExplained) [Chinese translation](http://ifeve.com/google-guava-strings) Translator: 沈義揚; proofreader: 丁一
## Joiner
Joining a sequence of strings with a separator can be unnecessarily tricky, and it gets harder when the sequence contains nulls. The fluent-style [`Joiner`](http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/base/Joiner.html) makes it simple.
```
Joiner joiner = Joiner.on("; ").skipNulls();
return joiner.join("Harry", null, "Ron", "Hermione");
```
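The code above returns "Harry; Ron; Hermione". Alternatively, `useForNull(String)` substitutes a given string for each null, whereas `skipNulls()` simply omits nulls.

Joiner can also join objects of any type, in which case it joins their `toString()` values.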
```
Joiner.on(",").join(Arrays.asList(1, 5, 7)); // returns "1,5,7"
```
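_Warning: joiner instances are always immutable. The configuration methods that define a joiner's behavior always return a new joiner instance, so you must use the returned instance to get the configured behavior. This makes every joiner thread-safe and suitable for use as a `static final` constant._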
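As a minimal sketch of the points above, the following class keeps two differently configured joiners as shared constants. It assumes Guava is on the classpath; the class name `JoinerSketch`, the method names, and the placeholder text `(unknown)` are made up for illustration.

```java
import com.google.common.base.Joiner;
import java.util.Arrays;
import java.util.List;

class JoinerSketch {
    // Joiners are immutable, so sharing them as constants is safe.
    private static final Joiner SKIPPING = Joiner.on("; ").skipNulls();
    // "(unknown)" is an arbitrary placeholder chosen for this example.
    private static final Joiner SUBSTITUTING = Joiner.on("; ").useForNull("(unknown)");

    static String skipNulls(List<String> names) {
        return SKIPPING.join(names);
    }

    static String substituteNulls(List<String> names) {
        return SUBSTITUTING.join(names);
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Harry", null, "Ron", "Hermione");
        System.out.println(skipNulls(names));       // Harry; Ron; Hermione
        System.out.println(substituteNulls(names)); // Harry; (unknown); Ron; Hermione
    }
}
```

Note that `Joiner.on("; ")` on its own, with neither `skipNulls()` nor `useForNull(...)`, throws a `NullPointerException` when it meets a null element.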
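## Splitter
The JDK's built-in string splitting utilities have some quirky behaviors. For example, `String.split` silently discards trailing separators.

Quiz: what does `",a,,b,".split(",")` return?
1. `"", "a", "", "b", ""`
2. `null, "a", null, "b", null`
3. `"a", null, "b"`
4. `"a", "b"`
5. None of the above

The correct answer is 5: `"", "a", "", "b"`. Only the trailing empty strings are dropped.

[`Splitter`](http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/Splitter.html) gives you complete control over all of this confusing behavior through a reassuringly straightforward fluent API.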
```
Splitter.on(',')
    .trimResults()
    .omitEmptyStrings()
    .split("foo,bar,,   qux");
```
The code above returns an `Iterable<String>` containing "foo", "bar", and "qux". A Splitter can be configured to split on any `Pattern`, `char`, `String`, or `CharMatcher`.
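Returning to the quiz string, the sketch below compares `String.split` with an unconfigured `Splitter` on the same input. It assumes Guava is on the classpath; the class and method names are illustrative only.

```java
import com.google.common.base.Splitter;
import com.google.common.collect.Lists;
import java.util.Arrays;
import java.util.List;

class SplitComparison {
    // JDK behavior: trailing empty strings are silently dropped.
    static List<String> jdkSplit(String input) {
        return Arrays.asList(input.split(","));
    }

    // Splitter behavior: every piece is kept unless a modifier says otherwise.
    static List<String> guavaSplit(String input) {
        return Lists.newArrayList(Splitter.on(',').split(input));
    }

    public static void main(String[] args) {
        String input = ",a,,b,";
        System.out.println(jdkSplit(input).size());   // 4 pieces: "", "a", "", "b"
        System.out.println(guavaSplit(input).size()); // 5 pieces: "", "a", "", "b", ""
    }
}
```

With `Splitter`, dropping empty pieces is an explicit choice made through `omitEmptyStrings()` rather than a hidden default.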
### Splitter Factories
| **Method** | **Description** | **Example** |
|:--- |:--- |:--- |
| [`Splitter.on(char)`](http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/Splitter.html#on%28char%29) | Splits on a single character. | `Splitter.on(';')` |
| [`Splitter.on(CharMatcher)`](http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/Splitter.html#on%28com.google.common.base.CharMatcher%29) | Splits on any character matched by a character matcher. | `Splitter.on(CharMatcher.BREAKING_WHITESPACE)` |
| [`Splitter.on(String)`](http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/Splitter.html#on%28java.lang.String%29) | Splits on a literal string. | `Splitter.on(", ")` |
| [`Splitter.on(Pattern)`](http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/Splitter.html#on%28java.util.regex.Pattern%29) [`Splitter.onPattern(String)`](http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/Splitter.html#onPattern%28java.lang.String%29) | Splits on a regular expression. | `Splitter.onPattern("\r?\n")` |
| [`Splitter.fixedLength(int)`](http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/Splitter.html#fixedLength%28int%29) | Splits into pieces of a fixed length; the last piece may be shorter than the given length but is never empty. | `Splitter.fixedLength(3)` |
### Splitter Modifiers
| **Method** | **Description** |
|:--- |:--- |
| [`omitEmptyStrings()`](http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/Splitter.html#omitEmptyStrings%28%29) | Automatically omits empty strings from the results. |
| [`trimResults()`](http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/Splitter.html#trimResults%28%29) | Trims leading and trailing whitespace from each result. |
| [`trimResults(CharMatcher)`](http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/…e/common/base/Splitter.html#trimResults%28com.google.common.base.CharMatcher%29) | Trims leading and trailing characters that match the given matcher from each result. |
| [`limit(int)`](http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/Splitter.html#limit%28int%29) | Limits the number of strings returned; once the limit is reached, the rest of the input is returned as the last item. |
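If you want a `List`, wrap the result with `Lists.newArrayList(splitter.split(string))` or a similar method.

_Warning: splitter instances are always immutable. The configuration methods that define a splitter's behavior always return a new splitter instance. This makes every splitter thread-safe and suitable for use as a `static final` constant._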
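The following sketch exercises a few of the factories and modifiers listed above and collects each result into a `List` as just described. It assumes Guava is on the classpath; the class name, method names, and sample inputs are invented for the example.

```java
import com.google.common.base.Splitter;
import com.google.common.collect.Lists;
import java.util.List;

class SplitterModifiers {
    // Immutable, so it can be a shared constant.
    private static final Splitter CLEAN_COMMA =
        Splitter.on(',').trimResults().omitEmptyStrings();

    // Trims each piece and drops the empty ones.
    static List<String> clean(String input) {
        return Lists.newArrayList(CLEAN_COMMA.split(input));
    }

    // Returns at most three pieces; the last one holds the unsplit remainder.
    static List<String> firstThree(String input) {
        return Lists.newArrayList(Splitter.on(',').limit(3).split(input));
    }

    // Cuts the input into pieces of three characters.
    static List<String> chunks(String input) {
        return Lists.newArrayList(Splitter.fixedLength(3).split(input));
    }

    public static void main(String[] args) {
        System.out.println(clean("foo,bar,,   qux")); // [foo, bar, qux]
        System.out.println(firstThree("a,b,c,d"));    // [a, b, c,d]
        System.out.println(chunks("abcdefgh"));       // [abc, def, gh]
    }
}
```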
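## CharMatcher
In the early days, the `StringUtil` class grew out of control and accumulated a large number of string-handling methods: allAscii, collapse, collapseControlChars, collapseWhitespace, indexOfChars, lastIndexNotOf, numSharedChars, removeChars, removeCrLf, replaceChars, retainAllChars, strip, stripAndCollapse, stripNonDigits.

All of these methods boil down to two conceptual questions:
1. What counts as a matching character?
2. What should be done with those matching characters?

To clean up this morass, we developed CharMatcher.

Intuitively, you can think of a CharMatcher instance as representing a particular class of characters, such as digits or whitespace. Practically speaking, a CharMatcher is just a boolean predicate on characters (CharMatcher does in fact implement [`Predicate<Character>`](http://code.google.com/p/guava-libraries/wiki/FunctionalExplained#Predicate)), but needs such as "all whitespace characters" or "all lowercase letters" are so common that Guava provides a dedicated API for them.

The real benefit of CharMatcher, however, lies in the operations it provides on the matching characters: trimming, collapsing, removing, retaining, and more. A CharMatcher instance first answers question 1 (what counts as a matching character?) and then offers many operations that answer question 2 (what should be done with those matching characters?). Because any matcher can be combined with any operation, a linear increase in API complexity buys a much larger increase in flexibility and power.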
```
String noControl = CharMatcher.JAVA_ISO_CONTROL.removeFrom(string); // remove control characters
String theDigits = CharMatcher.DIGIT.retainFrom(string); // keep only the digits
String spaced = CharMatcher.WHITESPACE.trimAndCollapseFrom(string, ' ');
// trim whitespace at both ends and collapse each inner run of whitespace into a single space
String noDigits = CharMatcher.JAVA_DIGIT.replaceFrom(string, "*"); // replace every digit with *
String lowerAndDigit = CharMatcher.JAVA_DIGIT.or(CharMatcher.JAVA_LOWER_CASE).retainFrom(string);
// keep only digits and lowercase letters
```
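Note: CharMatcher deals only with `char` values; it does not understand supplementary Unicode code points in the range 0x10000 to 0x10FFFF. Such logical characters are encoded into a String as surrogate pairs, and a CharMatcher treats each pair as two separate characters.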
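The surrogate-pair encoding mentioned in the note can be seen with plain JDK calls, without Guava. In this small sketch, the code point U+1F600 is an arbitrary supplementary character picked for illustration, and the class and method names are hypothetical.

```java
class SurrogateDemo {
    // Builds a String holding the single code point U+1F600.
    static final String SUPPLEMENTARY = new String(Character.toChars(0x1F600));

    // Number of char values, which is what char-based APIs iterate over.
    static int charCount(String s) {
        return s.length();
    }

    // Number of logical Unicode characters (code points).
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        System.out.println(charCount(SUPPLEMENTARY));      // 2
        System.out.println(codePointCount(SUPPLEMENTARY)); // 1
        System.out.println(Character.isHighSurrogate(SUPPLEMENTARY.charAt(0))); // true
    }
}
```

Any API that inspects one `char` at a time, CharMatcher included, sees the two halves of the pair rather than the single logical character.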
### Obtaining CharMatchers
The constants in CharMatcher cover most character-matching needs:
| [`ANY`](http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/CharMatcher.html#ANY) | [`NONE`](http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/CharMatcher.html#NONE) | [`WHITESPACE`](http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/CharMatcher.html#WHITESPACE) | [`BREAKING_WHITESPACE`](http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/CharMatcher.html#BREAKING_WHITESPACE) |
|:--- |:--- |:--- |:--- |
| [`INVISIBLE`](http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/CharMatcher.html#INVISIBLE) | [`DIGIT`](http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/CharMatcher.html#DIGIT) | [`JAVA_LETTER`](http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/CharMatcher.html#JAVA_LETTER) | [`JAVA_DIGIT`](http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/CharMatcher.html#JAVA_DIGIT) |
| [`JAVA_LETTER_OR_DIGIT`](http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/CharMatcher.html#JAVA_LETTER_OR_DIGIT) | [`JAVA_ISO_CONTROL`](http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/CharMatcher.html#JAVA_ISO_CONTROL) | [`JAVA_LOWER_CASE`](http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/CharMatcher.html#JAVA_LOWER_CASE) | [`JAVA_UPPER_CASE`](http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/CharMatcher.html#JAVA_UPPER_CASE) |
| [`ASCII`](http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/CharMatcher.html#ASCII) | [`SINGLE_WIDTH`](http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/CharMatcher.html#SINGLE_WIDTH) | | |
Other common ways to obtain a CharMatcher include:
| **Method** | **Description** |
|:--- |:--- |
| [`anyOf(CharSequence)`](http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/CharMatcher.html#anyOf%28java.lang.CharSequence%29) | Matches any of the listed characters. For example, `CharMatcher.anyOf("aeiou")` matches lowercase English vowels. |
| [`is(char)`](http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/CharMatcher.html#is%28char%29) | Matches exactly one specific character. |
| [`inRange(char, char)`](http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/CharMatcher.html#inRange%28char,%20char%29) | Matches any character in the given range, e.g. `CharMatcher.inRange('a', 'z')`. |

In addition, CharMatcher has the methods [`negate()`](http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/CharMatcher.html#negate%28%29), [`and(CharMatcher)`](http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/CharMatcher.html#and%28com.google.common.base.CharMatcher%29), and [`or(CharMatcher)`](http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/CharMatcher.html#or%28com.google.common.base.CharMatcher%29), which build new matchers from existing ones using boolean operations.
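A brief sketch of how the factory methods and boolean combinators compose, assuming Guava is on the classpath. The constant names (`VOWEL`, `CONSONANT`, and so on) and the sample strings are hypothetical, chosen only for this example.

```java
import com.google.common.base.CharMatcher;

class MatcherComposition {
    static final CharMatcher VOWEL = CharMatcher.anyOf("aeiou");
    static final CharMatcher LOWER = CharMatcher.inRange('a', 'z');
    // A lowercase letter that is not a vowel.
    static final CharMatcher CONSONANT = LOWER.and(VOWEL.negate());
    // A lowercase letter or a literal hyphen.
    static final CharMatcher LOWER_OR_DASH = LOWER.or(CharMatcher.is('-'));

    public static void main(String[] args) {
        String text = "guava-strings";
        System.out.println(VOWEL.retainFrom(text));            // uaai
        System.out.println(CONSONANT.retainFrom(text));        // gvstrngs
        System.out.println(LOWER_OR_DASH.matchesAllOf(text));  // true
        System.out.println(LOWER_OR_DASH.matchesAllOf("Guava")); // false
    }
}
```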
### Using CharMatchers
CharMatcher provides [a wide variety of methods](http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/CharMatcher.html#method_summary) for operating on the matching characters in any CharSequence. The most commonly used are listed below:
| **Method** | **Description** |
|:--- |:--- |
| [`collapseFrom(CharSequence, char)`](http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/CharMatcher.html#collapseFrom%28java.lang.CharSequence,%20char%29) | Replaces each group of consecutive matching characters with the specified character. For example, `WHITESPACE.collapseFrom(string, ' ')` collapses each run of whitespace into a single space. |
| [`matchesAllOf(CharSequence)`](http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/CharMatcher.html#matchesAllOf%28java.lang.CharSequence%29) | Tests whether every character in the sequence matches. |
| [`removeFrom(CharSequence)`](http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/CharMatcher.html#removeFrom%28java.lang.CharSequence%29) | Removes all matching characters from the sequence. |
| [`retainFrom(CharSequence)`](http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/CharMatcher.html#retainFrom%28java.lang.CharSequence%29) | Keeps the matching characters in the sequence and removes all others. |
| [`trimFrom(CharSequence)`](http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/CharMatcher.html#trimFrom%28java.lang.CharSequence%29) | Removes leading and trailing matching characters from the sequence. |
| [`replaceFrom(CharSequence, CharSequence)`](http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/CharMatcher.html#replaceFrom%28java.lang.CharSequence,%20java.lang.CharSequence%29) | Replaces each matching character with the given character sequence. |

All of these methods return a String, except `matchesAllOf`, which returns a boolean.
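The operations in the table can be tried on small inputs as follows. This is only a sketch assuming Guava is on the classpath. To stay independent of the Guava version, it builds its own matchers with `anyOf` and `inRange` instead of the predefined constants; the names `BLANK` and `ASCII_DIGIT` are made up for the example.

```java
import com.google.common.base.CharMatcher;

class MatcherOperations {
    // Hypothetical matchers defined just for this example.
    static final CharMatcher BLANK = CharMatcher.anyOf(" \t");
    static final CharMatcher ASCII_DIGIT = CharMatcher.inRange('0', '9');

    public static void main(String[] args) {
        // Each run of blanks becomes a single space.
        System.out.println(BLANK.collapseFrom("a  \t b   c", ' '));  // a b c
        // Blanks are stripped from both ends only.
        System.out.println(BLANK.trimFrom("  hello  "));             // hello
        System.out.println(ASCII_DIGIT.removeFrom("abc123def"));     // abcdef
        System.out.println(ASCII_DIGIT.retainFrom("tel: 555-0100")); // 5550100
        System.out.println(ASCII_DIGIT.replaceFrom("pin 1234", "*")); // pin ****
        System.out.println(ASCII_DIGIT.matchesAllOf("2024"));        // true
    }
}
```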
## Charsets
Don't handle character sets like this:
```
try {
  bytes = string.getBytes("UTF-8");
} catch (UnsupportedEncodingException e) {
  // how can this possibly happen?
  throw new AssertionError(e);
}
```
Do this instead:
```
bytes = string.getBytes(Charsets.UTF_8);
```
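[`Charsets`](http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/Charsets.html) provides constant references to the six standard character sets that every Java platform implementation is guaranteed to support. Use these constants instead of looking up a charset instance by name.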
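The same constant-based pattern can be sketched without Guava: on Java 7 and later, the JDK's own `java.nio.charset.StandardCharsets` exposes equivalent constants. The example below uses that JDK class rather than Guava's `Charsets`, and its class and method names are illustrative.

```java
import java.nio.charset.StandardCharsets;

class CharsetSketch {
    // No checked exception: the charset is passed as an object, not looked up by name.
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "h\u00e9llo"; // contains U+00E9, which takes two bytes in UTF-8
        byte[] bytes = encode(text);
        System.out.println(bytes.length);               // 6
        System.out.println(decode(bytes).equals(text)); // true
    }
}
```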
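## CaseFormat
CaseFormat is a handy utility for converting strings between ASCII case conventions, such as the naming conventions of programming languages. The supported formats are:
| **Format** | **Example** |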
|:--- |:--- |
| [`LOWER_CAMEL`](http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/CaseFormat.html#LOWER_CAMEL) | lowerCamel |
| [`LOWER_HYPHEN`](http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/CaseFormat.html#LOWER_HYPHEN) | lower-hyphen |
| [`LOWER_UNDERSCORE`](http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/CaseFormat.html#LOWER_UNDERSCORE) | lower_underscore |
| [`UPPER_CAMEL`](http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/CaseFormat.html#UPPER_CAMEL) | UpperCamel |
| [`UPPER_UNDERSCORE`](http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/CaseFormat.html#UPPER_UNDERSCORE) | UPPER_UNDERSCORE |
Using CaseFormat is straightforward:
```
CaseFormat.UPPER_UNDERSCORE.to(CaseFormat.LOWER_CAMEL, "CONSTANT_NAME"); // returns "constantName"
```
We find CaseFormat especially useful in certain situations, such as when writing code generators.
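As a sketch of the code-generator use case, the class below derives a constant name and a class name from other spellings of an identifier. It assumes Guava is on the classpath; the class name, method names, and sample identifiers are hypothetical.

```java
import com.google.common.base.CaseFormat;

class CaseFormatSketch {
    // e.g. a field name turned into a constant name
    static String toConstantName(String camelName) {
        return CaseFormat.LOWER_CAMEL.to(CaseFormat.UPPER_UNDERSCORE, camelName);
    }

    // e.g. a hyphenated resource name turned into a class name
    static String toClassName(String hyphenName) {
        return CaseFormat.LOWER_HYPHEN.to(CaseFormat.UPPER_CAMEL, hyphenName);
    }

    public static void main(String[] args) {
        System.out.println(toConstantName("maxRetryCount")); // MAX_RETRY_COUNT
        System.out.println(toClassName("my-widget"));        // MyWidget
    }
}
```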