## Chapter 12. Serialization
### Item 85: Prefer alternatives to Java serialization
When serialization was added to Java in 1997, it was known to be somewhat risky. The approach had been tried in a research language (Modula-3) but never in a production language. While the promise of distributed objects with little effort on the part of the programmer was appealing, the price was invisible constructors and blurred lines between API and implementation, with the potential for problems with correctness, performance, security, and maintenance. Proponents believed the benefits outweighed the risks, but history has shown otherwise.
The security issues described in previous editions of this book turned out to be every bit as serious as some had feared. The vulnerabilities discussed in the early 2000s were transformed into serious exploits over the next decade, famously including a ransomware attack on the San Francisco Municipal Transportation Agency’s Municipal Railway (SFMTA Muni) that shut down the entire fare collection system for two days in November 2016 [Gallagher16].
A fundamental problem with serialization is that its attack surface is too big to protect, and constantly growing: Object graphs are deserialized by invoking the readObject method on an ObjectInputStream. This method is essentially a magic constructor that can be made to instantiate objects of almost any type on the class path, so long as the type implements the Serializable interface. In the process of deserializing a byte stream, this method can execute code from any of these types, so the code for all of these types is part of the attack surface.
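To make the "magic constructor" point concrete, here is a minimal sketch. The classes `Noisy` and `MagicConstructorDemo` are hypothetical names invented for this illustration. The caller of `deserialize` never names a type, yet code belonging to whatever type the stream describes runs during `readObject`, and the class's real constructor is bypassed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical class, for illustration only: any Serializable type on the
// class path can run its own code while a stream is being deserialized.
final class Noisy implements Serializable {
    private static final long serialVersionUID = 1L;

    static int constructorCalls = 0;  // incremented by the real constructor
    static int readObjectCalls = 0;   // incremented during deserialization

    Noisy() {
        constructorCalls++;
    }

    // Invoked reflectively by ObjectInputStream; the constructor is not called.
    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        readObjectCalls++;  // stand-in for arbitrary side effects
    }
}

final class MagicConstructorDemo {
    static byte[] serialize(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // The caller names no type: the stream alone decides what gets created.
    static Object deserialize(byte[] bytes) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

After one round trip of a `Noisy` instance, `constructorCalls` is still 1 while `readObjectCalls` has become 1: the second instance came into existence without any constructor of `Noisy` running.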
The attack surface includes classes in the Java platform libraries, in third-party libraries such as Apache Commons Collections, and in the application itself. Even if you adhere to all of the relevant best practices and succeed in writing serializable classes that are invulnerable to attack, your application may still be vulnerable. To quote Robert Seacord, technical manager of the CERT Coordination Center:
> Java deserialization is a clear and present danger as it is widely used both directly by applications and indirectly by Java subsystems such as RMI (Remote Method Invocation), JMX (Java Management Extension), and JMS (Java Messaging System). Deserialization of untrusted streams can result in remote code execution (RCE), denial-of-service (DoS), and a range of other exploits. Applications can be vulnerable to these attacks even if they did nothing wrong. [Seacord17]
Attackers and security researchers study the serializable types in the Java libraries and in commonly used third-party libraries, looking for methods invoked during deserialization that perform potentially dangerous activities. Such methods are known as gadgets. Multiple gadgets can be used in concert, to form a gadget chain. From time to time, a gadget chain is discovered that is sufficiently powerful to allow an attacker to execute arbitrary native code on the underlying hardware, given only the opportunity to submit a carefully crafted byte stream for deserialization. This is exactly what happened in the SFMTA Muni attack. This attack was not isolated. There have been others, and there will be more.
Without using any gadgets, you can easily mount a denial-of-service attack by causing the deserialization of a short stream that requires a long time to deserialize. Such streams are known as deserialization bombs [Svoboda16]. Here’s an example by Wouter Coekaerts that uses only hash sets and a string [Coekaerts15]:
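```java
import java.io.*;
import java.util.*;

final class Bomb {
    // Deserialization bomb - deserializing this stream takes forever
    static byte[] bomb() {
        Set<Object> root = new HashSet<>();
        Set<Object> s1 = root;
        Set<Object> s2 = new HashSet<>();
        for (int i = 0; i < 100; i++) {
            Set<Object> t1 = new HashSet<>();
            Set<Object> t2 = new HashSet<>();
            t1.add("foo"); // Make t1 unequal to t2
            s1.add(t1); s1.add(t2);
            s2.add(t1); s2.add(t2);
            s1 = t1;
            s2 = t2;
        }
        return serialize(root);
    }

    // Omitted in the original; one straightforward way to write it
    static byte[] serialize(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```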
The object graph consists of 201 HashSet instances, each of which contains 3 or fewer object references. The entire stream is 5,744 bytes long, yet the sun would burn out long before you could deserialize it. The problem is that deserializing a HashSet instance requires computing the hash codes of its elements. The 2 elements of the root hash set are themselves hash sets containing 2 hash-set elements, each of which contains 2 hash-set elements, and so on, 100 levels deep. Therefore, deserializing the set causes the hashCode method to be invoked over 2^100 times. Other than the fact that the deserialization is taking forever, the deserializer has no indication that anything is amiss. Few objects are produced, and the stack depth is bounded.
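The exponential growth can be observed safely at small depths. The sketch below is an illustration, not part of the original example: it rebuilds the same graph shape with a configurable depth and replaces the "foo" string with a hypothetical `Probe` object that counts its own hashCode calls. It then measures a single hashCode call on the root of the in-memory graph; deserialization has to do comparable work because every HashSet rehashes its elements as it is rebuilt.

```java
import java.util.HashSet;
import java.util.Set;

final class HashCallCounter {
    static long probeCalls = 0;

    // Hypothetical stand-in for the "foo" string; counts hashCode calls.
    static final class Probe {
        @Override public int hashCode() {
            probeCalls++;
            return 1;
        }
    }

    // Builds the same shape as bomb(), but only `depth` levels deep.
    static Set<Object> build(int depth) {
        Set<Object> root = new HashSet<>();
        Set<Object> s1 = root;
        Set<Object> s2 = new HashSet<>();
        for (int i = 0; i < depth; i++) {
            Set<Object> t1 = new HashSet<>();
            Set<Object> t2 = new HashSet<>();
            t1.add(new Probe());  // plays the role of "foo"
            s1.add(t1); s1.add(t2);
            s2.add(t1); s2.add(t2);
            s1 = t1;
            s2 = t2;
        }
        return root;
    }

    // Number of Probe.hashCode calls caused by one hashCode call on the root.
    static long callsForDepth(int depth) {
        Set<Object> root = build(depth);
        probeCalls = 0;   // ignore the calls made while building the graph
        root.hashCode();
        return probeCalls;
    }
}
```

With this construction, `callsForDepth(n)` returns 2^n - 1: 1,023 for a depth of 10 and 2,047 for a depth of 11. Each extra level doubles the work, which is why 100 levels are out of reach.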
So what can you do to defend against these problems? You open yourself up to attack whenever you deserialize a byte stream that you don’t trust. **The best way to avoid serialization exploits is never to deserialize anything.** In the words of the computer named Joshua in the 1983 movie WarGames, “the only winning move is not to play.” **There is no reason to use Java serialization in any new system you write.** There are other mechanisms for translating between objects and byte sequences that avoid many of the dangers of Java serialization, while offering numerous advantages, such as cross-platform support, high performance, a large ecosystem of tools, and a broad community of expertise. In this book, we refer to these mechanisms as cross-platform structured-data representations. While others sometimes refer to them as serialization systems, this book avoids that usage to prevent confusion with Java serialization.
What these representations have in common is that they’re far simpler than Java serialization. They don’t support automatic serialization and deserialization of arbitrary object graphs. Instead, they support simple, structured data-objects consisting of a collection of attribute-value pairs. Only a few primitive and array data types are supported. This simple abstraction turns out to be sufficient for building extremely powerful distributed systems and simple enough to avoid the serious problems that have plagued Java serialization since its inception.
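As a rough sketch of the attribute-value idea, the example below encodes a hypothetical `GridPoint` class as two named text attributes and decodes it through its ordinary, validating constructor. It uses `java.util.Properties` purely as a standard-library stand-in; a real system would more likely use JSON or protobuf through a third-party library.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical value class, for illustration only.
final class GridPoint {
    final int x;
    final int y;

    // The only way to create an instance, so invariants are always checked.
    GridPoint(int x, int y) {
        if (x < 0 || y < 0) {
            throw new IllegalArgumentException("negative coordinate");
        }
        this.x = x;
        this.y = y;
    }

    // Writes the object as plain attribute-value pairs, not an object graph.
    String encode() {
        Properties p = new Properties();
        p.setProperty("x", Integer.toString(x));
        p.setProperty("y", Integer.toString(y));
        StringWriter out = new StringWriter();
        try {
            p.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    // Reads the pairs back and goes through the normal constructor.
    static GridPoint decode(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new GridPoint(Integer.parseInt(p.getProperty("x")),
                             Integer.parseInt(p.getProperty("y")));
    }
}
```

Because the input can only carry named values, the decoder decides which class to build and how to validate it; the data never chooses a type or triggers code on its own.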
The leading cross-platform structured-data representations are JSON [JSON] and Protocol Buffers, also known as protobuf [Protobuf]. JSON was designed by Douglas Crockford for browser-server communication, and protocol buffers were designed by Google for storing and interchanging structured data among its servers. Even though these representations are sometimes called language-neutral, JSON was originally developed for JavaScript and protobuf for C++; both representations retain vestiges of their origins.
The most significant differences between JSON and protobuf are that JSON is text-based and human-readable, whereas protobuf is binary and substantially more efficient; and that JSON is exclusively a data representation, whereas protobuf offers schemas (types) to document and enforce appropriate usage. Although protobuf is more efficient than JSON, JSON is extremely efficient for a text-based representation. And while protobuf is a binary representation, it does provide an alternative text representation for use where human-readability is desired (pbtxt).
If you can’t avoid Java serialization entirely, perhaps because you’re working in the context of a legacy system that requires it, your next best alternative is to **never deserialize untrusted data.** In particular, you should never accept RMI traffic from untrusted sources. The official secure coding guidelines for Java say “Deserialization of untrusted data is inherently dangerous and should be avoided.” This sentence is set in large, bold, italic, red type, and it is the only text in the entire document that gets this treatment [Java-secure].
If you can’t avoid serialization and you aren’t absolutely certain of the safety of the data you’re deserializing, use the object deserialization filtering added in Java 9 and backported to earlier releases (java.io.ObjectInputFilter). This facility lets you specify a filter that is applied to data streams before they’re deserialized. It operates at the class granularity, letting you accept or reject certain classes. Accepting classes by default and rejecting a list of potentially dangerous ones is known as blacklisting; rejecting classes by default and accepting a list of those that are presumed safe is known as whitelisting. **Prefer whitelisting to blacklisting,** as blacklisting only protects you against known threats. A tool called Serial Whitelist Application Trainer (SWAT) can be used to automatically prepare a whitelist for your application [Schneider16]. The filtering facility will also protect you against excessive memory usage, and excessively deep object graphs, but it will not protect you against deserialization bombs like the one shown above.
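A whitelist filter can be sketched with the pattern syntax accepted by `ObjectInputFilter.Config.createFilter` (Java 9 or later). The class name `FilteredReader`, the choice of allowed classes, and the two limits are assumptions made for this illustration, not recommendations. The trailing `!*` pattern is what turns the list into a whitelist, since classes that match no pattern would otherwise be left undecided and let through.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputFilter;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

final class FilteredReader {
    // Allow Integer (and Number, its serializable superclass), reject every
    // other class, and cap graph depth and stream size. Values are examples.
    static final ObjectInputFilter ALLOW_LIST =
        ObjectInputFilter.Config.createFilter(
            "maxdepth=10;maxbytes=65536;"
            + "java.lang.Integer;java.lang.Number;!*");

    static byte[] serialize(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Returns true if the stream deserializes without the filter rejecting it.
    static boolean accepts(byte[] bytes) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            in.setObjectInputFilter(ALLOW_LIST);  // must precede readObject
            in.readObject();
            return true;
        } catch (InvalidClassException rejected) {
            return false;  // thrown when the filter returns REJECTED
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Under these assumptions a serialized `Integer` is accepted, while a serialized `java.util.Date` is rejected before any of its deserialization code runs.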
Unfortunately, serialization is still pervasive in the Java ecosystem. If you are maintaining a system that is based on Java serialization, seriously consider migrating to a cross-platform structured-data representation, even though this may be a time-consuming endeavor. Realistically, you may still find yourself having to write or maintain a serializable class. It requires great care to write a serializable class that is correct, safe, and efficient. The remainder of this chapter provides advice on when and how to do this.
In summary, serialization is dangerous and should be avoided. If you are designing a system from scratch, use a cross-platform structured-data representation such as JSON or protobuf instead. Do not deserialize untrusted data. If you must do so, use object deserialization filtering, but be aware that it is not guaranteed to thwart all attacks. Avoid writing serializable classes. If you must do so, exercise great caution.
---
**[Back to contents of the chapter](/Chapter-12/Chapter-12-Introduction.md)**
- **Previous Item: [Item 84: Don’t depend on the thread scheduler](/Chapter-11/Chapter-11-Item-84-Don’t-depend-on-the-thread-scheduler.md)**
- **Next Item: [Item 86: Implement Serializable with great caution](/Chapter-12/Chapter-12-Item-86-Implement-Serializable-with-great-caution.md)**