Reading and Writing JSON in Python · PythonGuru Chinese Tutorial Series

# Reading and Writing JSON in Python

> Original: [https://thepythonguru.com/reading-and-writing-json-in-python/](https://thepythonguru.com/reading-and-writing-json-in-python/)

* * *

Updated on January 7, 2020

* * *

JSON (JavaScript Object Notation) is a language-independent data-interchange format. It was created and popularized by Douglas Crockford. In its relatively short history, JSON has become the de facto standard for transferring data across the web.

JSON is a text-based format derived from JavaScript object syntax. However, it is completely independent of JavaScript, so you don't need to know any JavaScript to use JSON.

Web applications commonly use JSON to transfer data between the client and the server. If you are using a web service, chances are it returns data to you in JSON format by default.

Before JSON came along, XML was the main format for sending and receiving data between client and server. The problem with XML is that it is verbose, heavy, and not easy to parse. JSON does not have these problems, as you will soon see. Here is an example of an XML document describing a person:

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<root>
  <firstName>John</firstName>
  <lastName>Smith</lastName>
  <isAlive>true</isAlive>
  <age>27</age>
  <address>
    <streetAddress>21 2nd Street</streetAddress>
    <city>New York</city>
    <state>NY</state>
    <postalCode>10021-3100</postalCode>
  </address>
  <phoneNumbers>
    <type>home</type>
    <number>212 555-1234</number>
  </phoneNumbers>
  <phoneNumbers>
    <type>office</type>
    <number>646 555-4567</number>
  </phoneNumbers>
  <phoneNumbers>
    <type>mobile</type>
    <number>123 456-7890</number>
  </phoneNumbers>
  <spouse />
</root>
```

The same information can be represented in JSON as follows:

```json
{
  "firstName": "John",
  "lastName": "Smith",
  "isAlive": true,
  "age": 27,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021-3100"
  },
  "phoneNumbers": [
    {
      "type": "home",
      "number": "212 555-1234"
    },
    {
      "type": "office",
      "number": "646 555-4567"
    },
    {
      "type": "mobile",
      "number": "123 456-7890"
    }
  ],
  "children": [],
  "spouse": null
}
```

I'm sure you'll agree that the JSON version is easier to read and write. Also, notice how closely the JSON format resembles a Python dictionary.

## Serialization and Deserialization

* * *

**Serialization**: the process of converting an object into a special format suitable for transmitting over a network or storing in a file or database.

**Deserialization**: the reverse of serialization. It converts the special format produced by serialization back into a usable object.

In the case of JSON, serializing an object means converting a Python object into a JSON string, and deserializing means building a Python object from its JSON string representation.

Python provides a built-in module called `json` for serializing and deserializing objects. To use the `json` module, import it as follows:

```py
>>> import json
```

The `json` module provides the following main functions for serialization and deserialization:

1. `dump(obj, fileobj)`
2. `dumps(obj)`
3. `load(fileobj)`
4. `loads(s)`

Let's start with the `dump()` function.

## Serializing with `dump()`

* * *

The `dump()` function serializes data. It takes a Python object, serializes it, and writes the output (a JSON string) to a file-like object.

The syntax of `dump()` is as follows:

**Syntax**: `dump(obj, fp)`

| Parameter | Description |
| --- | --- |
| `obj` | The object to serialize. |
| `fp` | A file-like object to which the serialized data is written. |

Here is an example:

```py
>>> import json
>>>
>>> person = {
...     'first_name': "John",
...     "isAlive": True,
...     "age": 27,
...     "address": {
...         "streetAddress": "21 2nd Street",
...         "city": "New York",
...         "state": "NY",
...         "postalCode": "10021-3100"
...     },
...     "hasMortgage": None
... }
>>>
>>> with open('person.json', 'w') as f:  # writing JSON object
...     json.dump(person, f)
...
>>>
>>> open('person.json', 'r').read()  # reading JSON object as string
'{"hasMortgage": null, "isAlive": true, "age": 27, "address": {"state": "NY", "streetAddress": "21 2nd Street", "city": "New York", "postalCode": "10021-3100"}, "first_name": "John"}'
>>>
>>> type(open('person.json', 'r').read())
<class 'str'>
```

Notice that when an object is serialized, Python's `None` is converted to JSON's `null`. The following table lists how types are converted when serializing data:

| Python type | JSON type |
| --- | --- |
| `dict` | `object` |
| `list`, `tuple` | `array` |
| `int` | `number` |
| `float` | `number` |
| `str` | `string` |
| `True` | `true` |
| `False` | `false` |
| `None` | `null` |

When we deserialize an object, JSON types are converted back to their equivalent Python types, as described in the following table:

| JSON type | Python type |
| --- | --- |
| `object` | `dict` |
| `array` | `list` |
| `string` | `str` |
| `number (int)` | `int` |
| `number (real)` | `float` |
| `true` | `True` |
| `false` | `False` |
| `null` | `None` |

Note that the two mappings are not perfectly symmetric: a `tuple` is serialized as a JSON array, so it comes back as a `list`.

Here is another example that serializes a list of two people:

```py
>>> persons = [
...     {
...         'first_name': "John",
...         "isAlive": True,
...         "age": 27,
...         "address": {
...             "streetAddress": "21 2nd Street",
...             "city": "New York",
...             "state": "NY",
...             "postalCode": "10021-3100"
...         },
...         "hasMortgage": None,
...     },
...     {
...         'first_name': "Bob",
...         "isAlive": True,
...         "age": 32,
...         "address": {
...             "streetAddress": "2428 O Conner Street",
...             "city": " Ocean Springs",
...             "state": "Mississippi",
...             "postalCode": "20031-9110"
...         },
...         "hasMortgage": True,
...     }
... ]
>>>
>>> with open('person_list.json', 'w') as f:
...     json.dump(persons, f)
...
>>>
>>> open('person_list.json', 'r').read()
'[{"hasMortgage": null, "isAlive": true, "age": 27, "address": {"state": "NY", "streetAddress": "21 2nd Street", "city": "New York", "postalCode": "10021-3100"}, "first_name": "John"}, {"hasMortgage": true, "isAlive": true, "age": 32, "address": {"state": "Mississippi", "streetAddress": "2428 O Conner Street", "city": " Ocean Springs", "postalCode": "20031-9110"}, "first_name": "Bob"}]'
```

Our Python object has now been serialized to a file. To deserialize it back into a Python object, we use the `load()` function.

## Deserializing with `load()`

* * *

The `load()` function deserializes a JSON document from a file-like object and returns the resulting Python object. Its syntax is:

```py
load(fp) -> a Python object
```

| Parameter | Description |
| --- | --- |
| `fp` | A file-like object from which the JSON string is read. |

Here is an example:

```py
>>> with open('person.json', 'r') as f:
...     person = json.load(f)
...
>>>
>>> type(person)  # notice the type of data returned by load()
<class 'dict'>
>>>
>>> person
{'age': 27, 'isAlive': True, 'hasMortgage': None, 'address': {'state': 'NY', 'streetAddress': '21 2nd Street', 'city': 'New York', 'postalCode': '10021-3100'}, 'first_name': 'John'}
```

## Serializing and Deserializing with `dumps()` and `loads()`

* * *

The `dumps()` function works exactly like `dump()`, except that instead of writing its output to a file-like object, it returns the output as a string. Likewise, `loads()` is the same as `load()`, except that it deserializes a JSON string instead of reading from a file. Here are some examples:

```py
>>> person = {
...     'first_name': "John",
...     "isAlive": True,
...     "age": 27,
...     "address": {
...         "streetAddress": "21 2nd Street",
...         "city": "New York",
...         "state": "NY",
...         "postalCode": "10021-3100"
...     },
...     "hasMortgage": None
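... }
>>>
>>> data = json.dumps(person)  # serialize
>>>
>>> data
'{"hasMortgage": null, "isAlive": true, "age": 27, "address": {"state": "NY", "streetAddress": "21 2nd Street", "city": "New York", "postalCode": "10021-3100"}, "first_name": "John"}'
>>>
>>> person = json.loads(data)  # deserialize from string
>>>
>>> type(person)
<class 'dict'>
>>>
>>> person
{'age': 27, 'isAlive': True, 'hasMortgage': None, 'address': {'state': 'NY', 'streetAddress': '21 2nd Street', 'city': 'New York', 'postalCode': '10021-3100'}, 'first_name': 'John'}
```

**Note**: Before Python 3.7, dictionaries did not guarantee any particular order for their elements, so the key order you get may differ from the outputs shown here. From Python 3.7 onward, dictionaries preserve insertion order, and `json` writes keys in that order.

## Customizing the Serializer

* * *

The following optional keyword arguments can be passed to `dumps()` or `dump()` to customize the serializer.

| Parameter | Description |
| --- | --- |
| `indent` | A positive integer giving the number of spaces by which key-value pairs are indented at each nesting level. `indent` is handy for prettifying the output of deeply nested data structures. Its default value is `None`, which produces the most compact single-line output. |
| `sort_keys` | A Boolean flag. If set to `True`, the JSON string is returned with its keys sorted instead of in the dictionary's own order. Its default value is `False`. |
| `skipkeys` | JSON object keys must be strings. Keys of type `str`, `int`, `float`, `bool`, or `None` are accepted (non-string ones are converted to strings), but any other key type, such as a tuple, raises a `TypeError`. To skip such keys instead of raising an exception, set `skipkeys` to `True`. |
| `separators` | A tuple of the form `(item_separator, key_separator)`. `item_separator` is the string that separates items in a list (and key-value pairs in an object); `key_separator` is the string that separates a key from its value. By default, `separators` is `(', ', ': ')` when `indent` is `None` and `(',', ': ')` otherwise. |

The following examples show these arguments in action:

**Example 1**: Using `indent`

```py
>>> print(json.dumps(person))  # without indent
{"age": 27, "isAlive": true, "hasMortgage": null, "address": {"state": "NY", "streetAddress": "21 2nd Street", "city": "New York", "postalCode": "10021-3100"}, "first_name": "John"}
>>>
>>> print(json.dumps(person, indent=4))  # indent each level by 4 spaces
{
    "age": 27,
    "isAlive": true,
    "hasMortgage": null,
    "address": {
        "state": "NY",
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "postalCode": "10021-3100"
    },
    "first_name": "John"
}
```

Keep in mind that indentation also increases the size of the data, so avoid `indent` in production environments where payload size matters.

**Example 2**: Using `sort_keys`

```py
>>> print(json.dumps(person, indent=4))  # without sort_keys
{
    "address": {
        "state": "NY",
        "postalCode": "10021-3100",
        "city": "New York",
        "streetAddress": "21 2nd Street"
    },
    "hasMortgage": null,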
    "first_name": "John",
    "isAlive": true,
    "age": 27
}
>>>
>>> print(json.dumps(person, indent=4, sort_keys=True))  # keys sorted alphabetically
{
    "address": {
        "city": "New York",
        "postalCode": "10021-3100",
        "state": "NY",
        "streetAddress": "21 2nd Street"
    },
    "age": 27,
    "first_name": "John",
    "hasMortgage": null,
    "isAlive": true
}
```

**Example 3**: Using `skipkeys`

```py
>>> data = {'one': 1, 'two': 2, (1,2): 3}
>>>
>>> json.dumps(data, indent=4)
Traceback (most recent call last):
...
TypeError: key (1, 2) is not a string
```

Here, the key `(1,2)` cannot be converted to a string, so a `TypeError` is raised. To skip non-string keys instead of raising an exception, use the `skipkeys` argument.

```py
>>> print(json.dumps(data, indent=4, skipkeys=True))
{
    "two": 2,
    "one": 1
}
```

**Example 4**: Using `separators`

```py
>>> employee = {
...     'first_name': "Tom",
...     "designation": 'CEO',
...     "Salary": '2000000',
...     "age": 35,
...     "cars": ['chevy cavalier', 'ford taurus', 'tesla model x']
... }
>>>
>>> print(json.dumps(employee, indent=4, skipkeys=True))
{
    "designation": "CEO",
    "age": 35,
    "cars": [
        "chevy cavalier",
        "ford taurus",
        "tesla model x"
    ],
    "Salary": "2000000",
    "first_name": "Tom"
}
```

There are three things to note in the output above:

1. Each key-value pair is separated by a comma (`,`).
2. Items in an array (such as `cars`) are also separated by a comma (`,`).
3. The keys of the JSON object are separated from their values by `': '` (a colon followed by a space).

The separator in the first two cases is controlled by the `item_separator` string, and in the last case by `key_separator`. The following example changes `item_separator` and `key_separator` to a vertical bar (`|`) and a dash (`-`), respectively:

```py
>>> print(json.dumps(employee, indent=4, skipkeys=True, separators=('|', '-')))
{
    "designation"-"CEO"|
    "age"-35|
    "cars"-[
        "chevy cavalier"|
        "ford taurus"|
        "tesla model x"
    ]|
    "Salary"-"2000000"|
    "first_name"-"Tom"
}
```

Now that you know how `separators` works, we can make the output more compact by removing the space character from the `key_separator` string. For example:

```py
>>> print(json.dumps(employee, indent=4, skipkeys=True, separators=(',', ':')))
{
    "designation":"CEO",
    "age":35,
    "cars":[
        "chevy cavalier",
        "ford taurus",
        "tesla model x"
    ],
    "Salary":"2000000",
    "first_name":"Tom"
}
```

For the most compact output, omit `indent` as well: `json.dumps(employee, separators=(',', ':'))` puts everything on a single line with no extra whitespace.

## Serializing Custom Objects

* * *

By default, the `json` module can only serialize the following basic types:

* `int`
* `float`
* `str`
* `bool`
* `list`
* `tuple`
* `dict`
* `None`

If you try to serialize a custom object or any other built-in type, a `TypeError` is raised. For example:

```py
>>> from datetime import datetime
>>>
>>> now = datetime.now()
>>>
>>> now
datetime.datetime(2018, 9, 28, 22, 16, 46, 16944)
>>>
>>> d = {'name': 'bob', 'dob': now}
>>>
>>> json.dumps(d)
Traceback (most recent call last):
...
TypeError: datetime.datetime(2018, 9, 28, 22, 16, 46, 16944) is not JSON serializable
>>>
>>> class Employee:
...     def __init__(self, name):
...         self.name = name
...
>>>
>>> e = Employee('John')
>>>
>>> e
<__main__.Employee object at 0x7f20c82ee4e0>
>>>
>>> json.dumps(e)
Traceback (most recent call last):
...
TypeError: <__main__.Employee object at 0x7f20c82ee4e0> is not JSON serializable
```

To serialize custom objects or other built-in types, we have to write our own serialization function.

```py
from datetime import datetime


def serialize_objects(obj):
    # serialize datetime object
    if isinstance(obj, datetime):
        return {
            '__class__': datetime.__name__,
            '__value__': str(obj)
        }

    # serialize Employee object
    #
    # if isinstance(obj, Employee):
    #     return {
    #         '__class__': 'Employee',
    #         '__value__': obj.name
    #     }

    # reached only for objects that no branch above handles
    raise TypeError(str(obj) + ' is not JSON serializable')
```

Here are a few notes about this function:

1. The function takes a single argument named `obj`.
2. The `isinstance()` call checks the type of the object. Strictly speaking, the check is unnecessary if your function serializes only a single type, but it makes it easy to add serialization for other types later.
3. For a `datetime` object, the function returns a dictionary with two keys: `__class__` and `__value__`. The `__class__` key stores the original name of the class and will be used when deserializing the data. The `__value__` key stores the value of the object; in this case, we simply convert the `datetime.datetime` object to its string representation with the built-in `str()` function.
4. The last line raises a `TypeError` exception. This is necessary: without it, the function would implicitly return `None` for unsupported objects, and `json` would silently write them as `null` instead of reporting an error.

Our serialization function can now serialize `datetime.datetime` objects. The next question is: how do we pass our custom serialization function to `dumps()` or `dump()`? We use the `default` keyword argument. Here is an example:

```py
>>> def serialize_objects(obj):
...     if isinstance(obj, datetime):
...         return {
...             '__class__': datetime.__name__,
...             '__value__': str(obj)
...         }
...     raise TypeError(str(obj) + ' is not JSON serializable')
...
>>>
>>> employee = {
...     'first_name': "Mike",
...     "designation": 'Manager',
...     "doj": datetime(year=2016, month=5, day=2),  # date of joining
... }
>>>
>>> emp_json = json.dumps(employee, indent=4, default=serialize_objects)
>>>
>>> print(emp_json)
{
    "designation": "Manager",
    "doj": {
        "__value__": "2016-05-02 00:00:00",
        "__class__": "datetime"
    },
    "first_name": "Mike"
}
```

Notice how the `datetime.datetime` object has been serialized as a dictionary with two keys. It is important to note that `serialize_objects()` is called only for objects that the `json` module cannot serialize on its own, that is, objects that are not one of the basic Python types listed above. We have now successfully serialized the `datetime.datetime` object. Let's see what happens when we try to deserialize it.

```py
>>> emp_dict = json.loads(emp_json)
>>>
>>> type(emp_dict)
<class 'dict'>
>>>
>>> emp_dict
{'designation': 'Manager', 'doj': {'__value__': '2016-05-02 00:00:00', '__class__': 'datetime'}, 'first_name': 'Mike'}
>>>
>>> emp_dict['doj']
{'__value__': '2016-05-02 00:00:00', '__class__': 'datetime'}
```

Notice that the value of the `doj` key comes back as a dictionary rather than a `datetime.datetime` object. This happens because `loads()` knows nothing about the `serialize_objects()` function that serialized the `datetime.datetime` object in the first place. What we need is the inverse of `serialize_objects()`: a function that takes a dictionary, checks whether the `__class__` key is present, and builds a `datetime.datetime` object from the string representation stored under the `__value__` key.

```py
from datetime import datetime


def deserialize_objects(obj):
    if '__class__' in obj:
        if obj['__class__'] == 'datetime':
            return datetime.strptime(obj['__value__'], "%Y-%m-%d %H:%M:%S")

        # if obj['__class__'] == 'Employee':
        #     return Employee(obj['__value__'])
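    return obj
```

The only thing to note here is that we use the `datetime.strptime()` function to convert the datetime string into a `datetime.datetime` object. The format string `"%Y-%m-%d %H:%M:%S"` matches the output of `str()` only for datetimes without a microsecond part; for a value such as `datetime.now()`, which usually has one, `strptime()` raises a `ValueError`.

To pass a custom deserialization function to `loads()` (or `load()`), we use the `object_hook` keyword argument. `object_hook` is called with every JSON object that is decoded (as a `dict`), and its return value is used in place of that `dict`.

```py
>>> def deserialize_objects(obj):
...     if '__class__' in obj:
...         if obj['__class__'] == 'datetime':
...             return datetime.strptime(obj['__value__'], "%Y-%m-%d %H:%M:%S")
...         # if obj['__class__'] == 'Employee':
...         #     return Employee(obj['__value__'])
...     return obj
...
>>>
>>> emp_dict = json.loads(emp_json, object_hook=deserialize_objects)
>>>
>>> emp_dict
{'designation': 'Manager', 'doj': datetime.datetime(2016, 5, 2, 0, 0), 'first_name': 'Mike'}
>>>
>>> emp_dict['doj']
datetime.datetime(2016, 5, 2, 0, 0)
```

As expected, this time the value of the `doj` key is a `datetime.datetime` object rather than a dictionary.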
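The REPL sessions above define `serialize_objects()` and `deserialize_objects()` separately. The script below is a minimal sketch of how the pair fits together in a full round trip through `dumps(..., default=...)` and `loads(..., object_hook=...)`. It also fills in the `Employee` branches that the tutorial leaves commented out, under the assumption that an `Employee` is fully described by its `name`. One deliberate swap: it encodes datetimes with `isoformat()` and decodes them with `datetime.fromisoformat()` (Python 3.7+) instead of `str()` and `strptime()`, so datetimes that carry microseconds also survive the trip. The `record`, `text` and `restored` names are illustrative only.

```python
import json
from datetime import datetime


class Employee:
    def __init__(self, name):
        self.name = name


def serialize_objects(obj):
    """Called by json.dumps() for objects it cannot serialize natively."""
    if isinstance(obj, datetime):
        # Swap: isoformat() keeps microseconds, and fromisoformat() reverses it (3.7+)
        return {'__class__': 'datetime', '__value__': obj.isoformat()}
    if isinstance(obj, Employee):
        # Assumption: an Employee is fully described by its name
        return {'__class__': 'Employee', '__value__': obj.name}
    raise TypeError(f'{obj!r} is not JSON serializable')


def deserialize_objects(obj):
    """Called by json.loads() for every decoded JSON object (a dict)."""
    cls = obj.get('__class__')
    if cls == 'datetime':
        return datetime.fromisoformat(obj['__value__'])
    if cls == 'Employee':
        return Employee(obj['__value__'])
    return obj  # ordinary dicts pass through unchanged


record = {
    'manager': Employee('Mike'),
    'doj': datetime(2016, 5, 2, 9, 30, 15, 123456),  # date of joining, with microseconds
}

text = json.dumps(record, default=serialize_objects)
restored = json.loads(text, object_hook=deserialize_objects)

print(restored['doj'] == record['doj'])  # True
print(type(restored['manager']).__name__, restored['manager'].name)  # Employee Mike
```

Because `object_hook` sees the innermost objects first, the two tagged dictionaries are converted before the outer `record` dictionary reaches the hook, which then returns that outer dictionary unchanged.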