Item 85: Prefer alternatives to Java serialization · Effective Java 3rd Edition

## Chapter 12. Serialization

### Item 85: Prefer alternatives to Java serialization

When serialization was added to Java in 1997, it was known to be somewhat risky. The approach had been tried in a research language (Modula-3) but never in a production language. While the promise of distributed objects with little effort on the part of the programmer was appealing, the price was invisible constructors and blurred lines between API and implementation, with the potential for problems with correctness, performance, security, and maintenance. Proponents believed the benefits outweighed the risks, but history has shown otherwise.

The security issues described in previous editions of this book turned out to be every bit as serious as some had feared. The vulnerabilities discussed in the early 2000s were transformed into serious exploits over the next decade, famously including a ransomware attack on the San Francisco Municipal Transportation Agency's Municipal Railway (SFMTA Muni) that shut down the entire fare collection system for two days in November 2016 [Gallagher16].

A fundamental problem with serialization is that its attack surface is too big to protect, and constantly growing: Object graphs are deserialized by invoking the readObject method on an ObjectInputStream. This method is essentially a magic constructor that can be made to instantiate objects of almost any type on the class path, so long as the type implements the Serializable interface. In the process of deserializing a byte stream, this method can execute code from any of these types, so the code for all of these types is part of the attack surface.

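
The "magic constructor" behavior can be observed with a small sketch. The class names here (ReadObjectDemo, Noisy) are hypothetical and stand in for any Serializable type on the class path. The caller expects a String, yet the stream alone decides which class is instantiated, and that class's readObject code has already run by the time the cast fails.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class ReadObjectDemo {
    // Flipped by Noisy.readObject so the demo can observe that it ran.
    static boolean sideEffectRan = false;

    // Hypothetical class standing in for any Serializable type on the class path.
    static class Noisy implements Serializable {
        private static final long serialVersionUID = 1L;

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            sideEffectRan = true; // code chosen by the class author runs here
        }
    }

    static byte[] serialize(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // The caller wants a String, but the stream decides which class is instantiated.
    // Returns true if Noisy.readObject had already run when the cast failed.
    static boolean readObjectRanBeforeCastFailed(byte[] stream) {
        sideEffectRan = false;
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(stream))) {
            String ignored = (String) in.readObject();
            return false; // reached only if the stream really held a String
        } catch (ClassCastException e) {
            return sideEffectRan;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] stream = serialize(new Noisy());
        System.out.println(readObjectRanBeforeCastFailed(stream));
    }
}
```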
The attack surface includes classes in the Java platform libraries, in third-party libraries such as Apache Commons Collections, and in the application itself. Even if you adhere to all of the relevant best practices and succeed in writing serializable classes that are invulnerable to attack, your application may still be vulnerable. To quote Robert Seacord, technical manager of the CERT Coordination Center:

> Java deserialization is a clear and present danger as it is widely used both directly by applications and indirectly by Java subsystems such as RMI (Remote Method Invocation), JMX (Java Management Extension), and JMS (Java Messaging System). Deserialization of untrusted streams can result in remote code execution (RCE), denial-of-service (DoS), and a range of other exploits. Applications can be vulnerable to these attacks even if they did nothing wrong. [Seacord17]

Attackers and security researchers study the serializable types in the Java libraries and in commonly used third-party libraries, looking for methods invoked during deserialization that perform potentially dangerous activities. Such methods are known as gadgets. Multiple gadgets can be used in concert, to form a gadget chain. From time to time, a gadget chain is discovered that is sufficiently powerful to allow an attacker to execute arbitrary native code on the underlying hardware, given only the opportunity to submit a carefully crafted byte stream for deserialization.
This is exactly what happened in the SFMTA Muni attack. This attack was not isolated. There have been others, and there will be more.

Without using any gadgets, you can easily mount a denial-of-service attack by causing the deserialization of a short stream that requires a long time to deserialize. Such streams are known as deserialization bombs [Svoboda16]. Here’s an example by Wouter Coekaerts that uses only hash sets and a string [Coekaerts15]:

```
// Deserialization bomb - deserializing this stream takes forever
static byte[] bomb() {
    Set<Object> root = new HashSet<>();
    Set<Object> s1 = root;
    Set<Object> s2 = new HashSet<>();
    for (int i = 0; i < 100; i++) {
        Set<Object> t1 = new HashSet<>();
        Set<Object> t2 = new HashSet<>();
        t1.add("foo"); // Make t1 unequal to t2
        s1.add(t1); s1.add(t2);
        s2.add(t1); s2.add(t2);
        s1 = t1;
        s2 = t2;
    }
    return serialize(root); // Method omitted for brevity
}
```

The object graph consists of 201 HashSet instances, each of which contains 3 or fewer object references. The entire stream is 5,744 bytes long, yet the sun would burn out long before you could deserialize it. The problem is that deserializing a HashSet instance requires computing the hash codes of its elements. The 2 elements of the root hash set are themselves hash sets containing 2 hash-set elements, each of which contains 2 hash-set elements, and so on, 100 levels deep. Therefore, deserializing the set causes the hashCode method to be invoked over 2^100 times. Other than the fact that the deserialization is taking forever, the deserializer has no indication that anything is amiss. Few objects are produced, and the stack depth is bounded.

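
The exponential growth can be made visible with a scaled-down variant of the bomb above. This is only a sketch under stated assumptions: CountingSet is a hypothetical HashSet subclass added purely to count hashCode calls, and the depths are kept small so that deserialization finishes quickly. The count should roughly double with each extra level.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Set;

public class BombGrowthDemo {
    // Incremented by every CountingSet.hashCode call.
    static long hashCalls = 0;

    // Hypothetical HashSet subclass whose only addition is call counting.
    static class CountingSet extends HashSet<Object> {
        private static final long serialVersionUID = 1L;

        @Override
        public int hashCode() {
            hashCalls++;
            return super.hashCode(); // sums the hash codes of all elements
        }
    }

    // Same shape as the bomb above, but with a small, configurable depth.
    static byte[] smallBomb(int depth) {
        Set<Object> root = new CountingSet();
        Set<Object> s1 = root;
        Set<Object> s2 = new CountingSet();
        for (int i = 0; i < depth; i++) {
            Set<Object> t1 = new CountingSet();
            Set<Object> t2 = new CountingSet();
            t1.add("foo"); // make t1 unequal to t2
            s1.add(t1); s1.add(t2);
            s2.add(t1); s2.add(t2);
            s1 = t1;
            s2 = t2;
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(root);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Counts only the hashCode calls made while the stream is deserialized.
    static long hashCallsWhileDeserializing(int depth) {
        byte[] stream = smallBomb(depth);
        hashCalls = 0; // ignore calls made while building the graph
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(stream))) {
            in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return hashCalls;
    }

    public static void main(String[] args) {
        for (int depth : new int[] {4, 8, 12, 16}) {
            System.out.println(depth + " levels: "
                    + hashCallsWhileDeserializing(depth) + " hashCode calls");
        }
    }
}
```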
So what can you do to defend against these problems? You open yourself up to attack whenever you deserialize a byte stream that you don’t trust. **The best way to avoid serialization exploits is never to deserialize anything.** In the words of the computer named Joshua in the 1983 movie WarGames, “the only winning move is not to play.” **There is no reason to use Java serialization in any new system you write.** There are other mechanisms for translating between objects and byte sequences that avoid many of the dangers of Java serialization, while offering numerous advantages, such as cross-platform support, high performance, a large ecosystem of tools, and a broad community of expertise. In this book, we refer to these mechanisms as cross-platform structured-data representations. While others sometimes refer to them as serialization systems, this book avoids that usage to prevent confusion with Java serialization.

What these representations have in common is that they’re far simpler than Java serialization. They don’t support automatic serialization and deserialization of arbitrary object graphs. Instead, they support simple, structured data-objects consisting of a collection of attribute-value pairs. Only a few primitive and array data types are supported.
This simple abstraction turns out to be sufficient for building extremely powerful distributed systems and simple enough to avoid the serious problems that have plagued Java serialization since its inception.

The leading cross-platform structured-data representations are JSON [JSON] and Protocol Buffers, also known as protobuf [Protobuf]. JSON was designed by Douglas Crockford for browser-server communication, and protocol buffers were designed by Google for storing and interchanging structured data among its servers. Even though these representations are sometimes called language-neutral, JSON was originally developed for JavaScript and protobuf for C++; both representations retain vestiges of their origins.

The most significant differences between JSON and protobuf are that JSON is text-based and human-readable, whereas protobuf is binary and substantially more efficient; and that JSON is exclusively a data representation, whereas protobuf offers schemas (types) to document and enforce appropriate usage. Although protobuf is more efficient than JSON, JSON is extremely efficient for a text-based representation. And while protobuf is a binary representation, it does provide an alternative text representation for use where human-readability is desired (pbtxt).

If you can’t avoid Java serialization entirely, perhaps because you’re working in the context of a legacy system that requires it, your next best alternative is to **never deserialize untrusted data.** In particular, you should never accept RMI traffic from untrusted sources. The official secure coding guidelines for Java say “Deserialization of untrusted data is inherently dangerous and should be avoided.” This sentence is set in large, bold, italic, red type, and it is the only text in the entire document that gets this treatment [Java-secure].

If you can’t avoid serialization and you aren’t absolutely certain of the safety of the data you’re deserializing, use the object deserialization filtering added in Java 9 and backported to earlier releases (java.io.ObjectInputFilter). This facility lets you specify a filter that is applied to data streams before they’re deserialized. It operates at the class granularity, letting you accept or reject certain classes. Accepting classes by default and rejecting a list of potentially dangerous ones is known as blacklisting; rejecting classes by default and accepting a list of those that are presumed safe is known as whitelisting. **Prefer whitelisting to blacklisting,** as blacklisting only protects you against known threats. A tool called Serial Whitelist Application Trainer (SWAT) can be used to automatically prepare a whitelist for your application [Schneider16].
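
The whitelisting approach described above might be set up roughly as follows. This is a minimal sketch, assuming Java 9 or later: ObjectInputFilter.Config.createFilter and ObjectInputStream.setObjectInputFilter are the standard APIs, while the classes Point and Unexpected and the limit values are hypothetical choices made only for illustration. The pattern accepts Point and rejects every other class with `!*`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputFilter;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class FilterDemo {
    // Hypothetical application type that the whitelist accepts.
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Hypothetical type that is Serializable but not on the whitelist.
    static class Unexpected implements Serializable {
        private static final long serialVersionUID = 1L;
    }

    // Whitelist: accept Point, reject all other classes ("!*").
    // The maxdepth and maxrefs limits are arbitrary illustrative values.
    static final ObjectInputFilter WHITELIST = ObjectInputFilter.Config.createFilter(
            "maxdepth=5;maxrefs=100;" + Point.class.getName() + ";!*");

    static byte[] serialize(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Returns true if the stream deserializes under the whitelist,
    // false if the filter rejects a class in the stream.
    static boolean accepts(byte[] stream) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(stream))) {
            in.setObjectInputFilter(WHITELIST); // set before the first read
            in.readObject();
            return true;
        } catch (InvalidClassException e) {
            return false; // thrown when the filter rejects a class
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Point accepted: " + accepts(serialize(new Point(1, 2))));
        System.out.println("Unexpected accepted: " + accepts(serialize(new Unexpected())));
    }
}
```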
The filtering facility will also protect you against excessive memory usage and excessively deep object graphs, but it will not protect you against deserialization bombs like the one shown above.

Unfortunately, serialization is still pervasive in the Java ecosystem. If you are maintaining a system that is based on Java serialization, seriously consider migrating to a cross-platform structured-data representation, even though this may be a time-consuming endeavor. Realistically, you may still find yourself having to write or maintain a serializable class. It requires great care to write a serializable class that is correct, safe, and efficient. The remainder of this chapter provides advice on when and how to do this.

In summary, serialization is dangerous and should be avoided. If you are designing a system from scratch, use a cross-platform structured-data representation such as JSON or protobuf instead. Do not deserialize untrusted data. If you must do so, use object deserialization filtering, but be aware that it is not guaranteed to thwart all attacks. Avoid writing serializable classes. If you must do so, exercise great caution.


---

**[Back to contents of the chapter](/Chapter-12/Chapter-12-Introduction.md)**

- **Previous Item: [Item 84: Don’t depend on the thread scheduler](/Chapter-11/Chapter-11-Item-84-Don’t-depend-on-the-thread-scheduler.md)**
- **Next Item: [Item 86: Implement Serializable with great caution](/Chapter-12/Chapter-12-Item-86-Implement-Serializable-with-great-caution.md)**