5.3.1 MongoDB Storage · Python 3 Crawler Notes

### 1. References & installation

Official documentation: [https://docs.mongodb.com/manual/reference/operator/query/](https://docs.mongodb.com/manual/reference/operator/query/)

[Installation guide](/1kai-fa-huan-jing-pei-zhi/15-cun-chu-ku-de-an-zhuang/152-pymongode-an-zhuang.md)

### 2. Connecting to the database

Use `MongoClient` from the pymongo library to connect to a MongoDB database. There are two ways to connect.

The first passes the host and port:

```
import pymongo

client = pymongo.MongoClient(host='localhost', port=27017)
```

The second passes a connection string:

```
import pymongo

client = pymongo.MongoClient('mongodb://localhost:27017')
```

### 3. Selecting a database

```
# MongoDB needs no explicit step to create a database: name the database,
# and it is created automatically once data is written to it.
db = client.test
# Equivalent to:
# db = client["test"]
```

### 4. Selecting a collection

Each MongoDB database contains a number of collections, which are similar to tables in a relational database. Selecting a collection works the same way as selecting a database.

```
collection = db.students
# Equivalent to:
# collection = db["students"]
```

### 5. Inserting data

```
student = {
    "name": "angle",
    "age": 20,
}
# Insert the document with the collection's insert() method
# (legacy API: deprecated in PyMongo 3.x, removed in PyMongo 4.0)
result = collection.insert(student)
print(result)
```

After it runs, `insert()` returns the `_id` value. `_id` uniquely identifies each document; if you do not set `_id` explicitly, one of type ObjectId is generated automatically.

Output:

```
5b684a54bd880b468471dccf
```

To insert several documents, pass them as a list:

```
import pymongo

client = pymongo.MongoClient(host='localhost', port=27017)
# The database is created automatically once data is written to it
db = client.test
# db = client["test"]
collection = db.students
# collection = db["students"]

student1 = {
    "name": "angle",
    "age": 20,
}
student2 = {
    "name": "angle",
    "age": 20,
}
# Legacy API: insert() also accepts a list of documents
result = collection.insert([student1, student2])
print(result)
```

Output:

```
[ObjectId('5b684e83bd880b4408713464'), ObjectId('5b684e83bd880b4408713465')]
```

Note that in PyMongo 3.x (not Python 3 itself) the `insert()` method is no longer recommended, and it was removed in PyMongo 4.0. The officially recommended methods are `insert_one` and `insert_many`:

* insert\_one: insert a single document
* insert\_many: insert multiple documents

```
# Insert a single document
result = collection.insert_one(student)
print(result)
```

Output:

```
<pymongo.results.InsertOneResult object at 0x000002744113BF48>
```

Unlike `insert()`, this returns an `InsertOneResult` object; read its `inserted_id` attribute to get the `_id`.

```
# Insert multiple documents
result = collection.insert_many([student1, student2])
print(result)
# inserted_ids holds the list of _id values of the inserted documents
print(result.inserted_ids)
```

Output:

```
<pymongo.results.InsertManyResult object at 0x000001BBFFE2AF88>
[ObjectId('5b684f75bd880b2228c1fd23'), ObjectId('5b684f75bd880b2228c1fd24')]
```

### 6. Querying {#6-查詢}

* find\_one: returns a single result
* find: returns a `Cursor` object, which can be iterated like a generator

```
result = collection.find_one({"name": "angle"})
print(type(result))
print(result)
```

Output:

```
<class 'dict'>
{'_id': ObjectId('5b684a54bd880b468471dccf'), 'name': 'angle', 'age': 20}
```

You can also query by ObjectId, but this requires importing `ObjectId` from the bson library:

```
from bson.objectid import ObjectId

result = collection.find_one({'_id': ObjectId('5b684a54bd880b468471dccf')})
print(type(result))
print(result)
```

Output:

```
<class 'dict'>
{'_id': ObjectId('5b684a54bd880b468471dccf'), 'name': 'angle', 'age': 20}
```

To query multiple documents:

```
results = collection.find({"name": "angle"})
print(results)
# Iterating the cursor yields one dict per matching document
for result in results:
    print(result)
```

Output:

```
<pymongo.cursor.Cursor object at 0x0000022AED824518>
{'_id': ObjectId('5b684a54bd880b468471dccf'), 'name': 'angle', 'age': 20}
{'_id': ObjectId('5b684e83bd880b4408713464'), 'name': 'angle', 'age': 20}
{'_id': ObjectId('5b684e83bd880b4408713465'), 'name': 'angle', 'age': 20}
{'_id': ObjectId('5b684f1dbd880b49a85d9e9e'), 'name': 'angle', 'age': 20}
{'_id': ObjectId('5b684f75bd880b2228c1fd23'), 'name': 'angle', 'age': 20}
{'_id': ObjectId('5b684f75bd880b2228c1fd24'), 'name': 'angle', 'age': 20}
```

Queries can carry conditions. For example, to find documents whose age is less than 20:

```
# Add a document that will match the condition
student = {
    "name": "miku",
    "age": 18,
}
result = collection.insert_one(student)
```

A condition is written as a dictionary:

```
{'$lt': 20}
```

```
results = collection.find({'age': {'$lt': 20}})
for result in results:
    print(result)
```

Output:

```
{'_id': ObjectId('5b68513cbd880b4dd0e128c7'), 'name': 'miku', 'age': 18}
```

Comparison operators

| Operator | Meaning | Example |
| :--- | :--- | :--- |
| $lt | Less than | `{'age': {'$lt': 20}}` |
| $gt | Greater than | `{'age': {'$gt': 20}}` |
| $lte | Less than or equal to | `{'age': {'$lte': 20}}` |
| $gte | Greater than or equal to | `{'age': {'$gte': 20}}` |
| $ne | Not equal to | `{'age': {'$ne': 20}}` |
| $in | Equal to any value in the list | `{'age': {'$in': [20, 23]}}` |
| $nin | Equal to no value in the list | `{'age': {'$nin': [20, 23]}}` |
| $regex | Regular-expression match | `{'name': {'$regex': '^a.*'}}` (matches strings starting with a) |
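Several comparison operators can be combined in one filter on the same field. The following is a minimal sketch, assuming the `students` collection with `name` and `age` fields used in this section; the helper names `age_between`, `age_in` and `find_names` are hypothetical and not part of PyMongo.

```python
def age_between(low, high):
    """Build a filter for low <= age < high by combining $gte and $lt."""
    return {'age': {'$gte': low, '$lt': high}}


def age_in(*ages):
    """Build a filter matching documents whose age equals any given value."""
    return {'age': {'$in': list(ages)}}


def find_names(collection, query):
    """Run the filter with collection.find() and collect the name fields."""
    return [doc['name'] for doc in collection.find(query)]
```

With the `collection` object from section 4, `find_names(collection, age_between(18, 20))` would list the names of students aged 18 or 19.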
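For more detail, see the official documentation: [https://docs.mongodb.com/manual/reference/operator/query/](https://docs.mongodb.com/manual/reference/operator/query/)

Functional operators

| Operator | Meaning | Example | What the example matches |
| :--- | :--- | :--- | :--- |
| $regex | Regular-expression match | `{'name': {'$regex': '^M.*'}}` | name starts with M |
| $exists | Whether a field exists | `{'name': {'$exists': True}}` | the name field exists |
| $type | Type check | `{'age': {'$type': 'int'}}` | age is of type int |
| $mod | Modulo operation | `{'age': {'$mod': [5, 0]}}` | age modulo 5 equals 0 |
| $text | Text search | `{'$text': {'$search': 'Mike'}}` | a text-indexed field contains the string Mike |
| $where | Advanced conditional query | `{'$where': 'obj.fans_count == obj.follows_count'}` | the document's fan count equals its follow count |
| $inc | Addition (an update operator, used in updates rather than in query filters) | `{'$inc': {'age': 1}}` | adds 1 to the document's age |

### 7. Counting

To count how many documents a query matches, call `count_documents()`. Older code used `find().count()`, which was deprecated in PyMongo 3.7 and removed in PyMongo 4.0.

```
# Count all documents in the collection
count = collection.count_documents({})
print(count)
```

You can also count the documents that satisfy a condition:

```
count = collection.count_documents({'age': {'$lt': 20}})
print(count)
```

### 8. Sorting

Call `sort()` to sort the results, passing one of the following to choose ascending or descending order:

* pymongo.ASCENDING: ascending order
* pymongo.DESCENDING: descending order

```
# Ascending order by name
results = collection.find().sort('name', pymongo.ASCENDING)
print([result['name'] for result in results])
```

Output:

```
['angle', 'angle', 'angle', 'angle', 'angle', 'angle', 'miku', 'miku']
```

### 9. Offset

`skip(n)`: skips the first n results and returns the (n+1)-th and everything after it.

```
results = collection.find().sort('name', pymongo.ASCENDING).skip(2)
print([result['name'] for result in results])
```

Output:

```
['angle', 'angle', 'angle', 'angle', 'miku', 'miku']
```

Use `limit(n)` to cap the number of results returned at n.

```
results = collection.find().sort('name', pymongo.ASCENDING).skip(2).limit(3)
print([result['name'] for result in results])
```

Output:

```
['angle', 'angle', 'angle']
```

Note: when the collection is very large, avoid large offsets, because the server still has to walk past every skipped document. Filter on `_id` instead:

```
from bson.objectid import ObjectId

# Fetch documents whose _id is greater than the last one already seen
collection.find({'_id': {'$gt': ObjectId('5b684a54bd880b468471dccf')}})
```

### 10. Updating

`update()`: takes an update condition and the new data, and updates the matching document.

```
# First select a document by condition
condition = {"age": {"$lt": 20}}
student = collection.find_one(condition)
# Modify the document
student['age'] = 100
# Pass the original condition and the modified document to apply the update
# (legacy API: removed in PyMongo 4.0)
result = collection.update(condition, student)
print(result)
```

Output:

```
{'n': 1, 'nModified': 1, 'ok': 1.0, 'updatedExisting': True}
```

The result is a dictionary: `ok` indicates that the operation succeeded, and `nModified` is the number of documents affected.

You can also update with the `$set` operator. `$set` updates only the fields present in the `student` dictionary; any other fields the stored document has are neither updated nor deleted. Without `$set`, the stored document is replaced entirely by the `student` dictionary, so any other fields it previously had are removed.

```
result = collection.update(condition, {'$set': student})
```

Note that `update()` is no longer recommended either (it was removed in PyMongo 4.0). Use `update_one` and `update_many` instead. They are stricter: the second argument must use `$` operators as its dictionary keys.

```
condition = {"name": "miku"}
student = collection.find_one(condition)
student['age'] = 50
result = collection.update_one(condition, {'$set': student})
print(result)
# matched_count: number of documents that matched the condition
# modified_count: number of documents actually modified
print(result.matched_count, result.modified_count)
```

Output:

```
<pymongo.results.UpdateResult object at 0x0000028E1A94AE88>
1 1
```

Updating multiple documents:

```
# Select every document with age greater than 20 and add 10 to its age field
condition = {"age": {"$gt": 20}}
result = collection.update_many(condition, {'$inc': {'age': 10}})
print(result)
print(result.matched_count, result.modified_count)
```

### 11. Deleting

`remove(condition)`: takes a delete condition; every document that matches is deleted.

```
condition = {"age": {"$gt": 20}}
# Legacy API: removed in PyMongo 4.0
result = collection.remove(condition)
print(result)
```

Output:

```
{'n': 2, 'ok': 1.0}
```

`n` is the number of documents deleted, and `ok` indicates that the delete succeeded.

As with the methods above, `remove()` is no longer recommended. Use `delete_one` and `delete_many` instead:

* delete\_one: delete a single document
* delete\_many: delete multiple documents

```
result = collection.delete_one({'name': 'angle'})
print(result)
print(result.deleted_count)
result = collection.delete_many({'name': 'angle'})
print(result.deleted_count)
```

Output:

```
<pymongo.results.DeleteResult object at 0x0000024241D3AE88>
1
4
```

### 12. More {#12-更多}

PyMongo also provides some combined methods, such as find\_one\_and\_delete, find\_one\_and\_replace and find\_one\_and\_update, which find a document and then delete, replace or update it. Their usage is largely the same as the methods above.

Indexes can be managed as well, with methods such as create\_index, create\_indexes and drop\_index. For details, see the official documentation: [http://api.mongodb.com/python/current/api/pymongo/collection.html](http://api.mongodb.com/python/current/api/pymongo/collection.html).

Operations on databases, on collections themselves, and other features are not covered here; see the official documentation: [http://api.mongodb.com/python/current/api/pymongo/](http://api.mongodb.com/python/current/api/pymongo/).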
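As a rough illustration of the combined methods and index helpers mentioned above, here is a minimal sketch, assuming the `students` collection with `name` and `age` fields from the earlier sections. The helper names `bump_age` and `index_by_name` are hypothetical; only `find_one_and_update` and `create_index` are PyMongo methods.

```python
def bump_age(collection, name, delta=1):
    """Find one document by name and add delta to its age in a single call.

    By default find_one_and_update returns the document as it was before
    the update, or None when nothing matched.
    """
    return collection.find_one_and_update(
        {'name': name},            # filter: which document to update
        {'$inc': {'age': delta}},  # update: increment the age field
    )


def index_by_name(collection):
    """Create an ascending index on name; create_index returns the index name."""
    return collection.create_index([('name', 1)])  # 1 equals pymongo.ASCENDING
```

With the `collection` object from section 4, `bump_age(collection, 'miku')` would return the pre-update document for miku while raising the stored age by 1.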