Python Study Notes · PHP Notes

## Python Study Notes

**last update: 2022-06-06 10:23:11**

----

[TOC=3,8]

----

### Installing Python (2 and 3) from source on Linux

Build OpenSSL first so that Python's `ssl` module can link against it:

~~~shell
wget https://www.openssl.org/source/openssl-1.1.1o.tar.gz
tar zxvf openssl-1.1.1o.tar.gz
cd openssl-1.1.1o
./config --prefix=/opt/openssl-1.1.1o --openssldir=/opt/openssl-1.1.1o/openssl no-zlib
make && make install
echo "/opt/openssl-1.1.1o/lib" >> /etc/ld.so.conf
ldconfig -v
~~~

Then build Python against that OpenSSL:

~~~shell
yum -y install gcc gcc-c++ gdb
wget https://www.python.org/ftp/python/3.10.4/Python-3.10.4.tgz
tar -zxvf Python-3.10.4.tgz
cd Python-3.10.4
./configure -h
make clean
./configure --prefix=/usr/local/python-3.10.4 --with-openssl=/opt/openssl-1.1.1o --with-openssl-rpath=auto --with-ssl
make && make install
~~~

Check the installation:

~~~
ls /usr/local/python-3.10.4/bin -Fl
/usr/local/python-3.10.4/bin/python3.10 --version
/usr/local/python-3.10.4/bin/pip3.10 --version
/usr/local/python-3.10.4/bin/python3.10 -m site
/usr/local/python-3.10.4/bin/pip3.10 list -vvv --format=columns
~~~

Create symlinks:

~~~
ln -s /usr/local/python-3.10.4/bin/python3.10 /usr/local/bin/python3
ln -s /usr/local/python-3.10.4/bin/pip3.10 /usr/local/bin/pip3
~~~

Optional:

~~~
ln -s /usr/local/bin/python3 /usr/local/bin/python
ln -s /usr/local/bin/pip3 /usr/local/bin/pip
pip install -U pip -i https://pypi.tuna.tsinghua.edu.cn/simple/
python --version
pip --version
~~~

[centos 安裝python3.8報錯_qq_36664203的博客-CSDN博客](https://blog.csdn.net/qq_36664203/article/details/106301856)

[linux 源碼編譯安裝python3.* - 明月知秋的博客](https://www.xcwmoon.com/post/135)

[python3 openssl問題（賊有用） - Captain_Li - 博客園](https://www.cnblogs.com/lemon-le/p/13419429.html)

[python3中pip3安裝出錯,找不到SSL的解決方式_python_腳本之家](https://www.jb51.net/article/176223.htm)

[解決安裝python3.7.4報錯Can''t connect to HTTPS URL because the SSL module is not available_python_腳本之家](https://www.jb51.net/article/166688.htm)

[編譯安裝numpy報 error: ‘for’ loop initial declarations are only allowed in C99 mode 
問題解決_小白_愚妄的博客-CSDN博客](https://blog.csdn.net/weixin_42883321/article/details/122458160?spm=1001.2101.3001.6661.1&utm_medium=distribute.pc_relevant_t0.none-task-blog-2~default~CTRLIST~Rate-1-122458160-blog-123213842.pc_relevant_antiscanv2&depth_1-utm_source=distribute.pc_relevant_t0.none-task-blog-2~default~CTRLIST~Rate-1-122458160-blog-123213842.pc_relevant_antiscanv2&utm_relevant_index=1)

> `$: export CFLAGS='-std=c99'`

----

### PyCharm usage notes

> Python's syntax is the closest to natural language, as natural as the way we normally speak. Even if you are not familiar with the syntax, when you don't know how to write something, just write it the way you think it should be. You will find that you are right most of the time, because it is exactly what you had in mind. That is Python: the way you imagine it, self-consistent, natural, and simple. It needs no deliberate effort, chases no tricks, and does not pretend to be profound. It is simply the natural expression of human thought, as if everyone were born familiar with it.

#### Fixing garbled Chinese console output

Reference: https://www.cnblogs.com/it-tsz/p/9823536.html

1. File -> Settings -> Editor -> File Encodings: Project Encoding - GBK
2. File -> Settings -> Editor -> File and Code Templates -> Python Script:

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

' a python script ... '

__author__ = 'xiasf'
```

----

#### Preferred settings

Theme: Monokai (color scheme)

Font: Droid Sans Mono, size: 13, line height: 1.1

- Editor > General > Appearance > Show whitespaces (Leading, Inner, Trailing)
- Editor > General > Editor Tabs > Appearance > Mark modified (*)
- System Settings > Confirm before exiting the IDE > Terminate process

Plugins:

- CodeGlance (a code minimap like the one in Sublime)
- Chinese (Chinese language pack)

----

#### OSError: Cannot load native module 'Crypto.Cipher._raw_ecb':

```
OSError: Cannot load native module 'Crypto.Cipher._raw_ecb': Trying '_raw_ecb.pyd': [Error 193] %1 不是有效的 Win32
```

https://github.com/Legrandin/pycryptodome/issues/155

The error above is not the problem described in that issue. The cause is that a 64-bit operating system needs a 64-bit Python. https://www.python.org/downloads/release/python-2718/

----

#### Turning off SQL and similar inspection warnings

We usually don't need PyCharm to check SQL, because that requires a database connection, the database may be remote, and during development we don't want it connecting just to run checks.

https://blog.csdn.net/weixin_42686768/article/details/97121081

https://www.cnblogs.com/lurenjia1994/p/9681637.html

https://www.cnblogs.com/zq8421/p/10356383.html

https://www.cnblogs.com/wisir/p/10898469.html

https://blog.csdn.net/windscloud/article/details/80208960

[Python基礎之PEP8規范（代碼寫作規范） - 知乎](https://zhuanlan.zhihu.com/p/88729367)

[Python PEP8 編碼規范中文版_基因記憶-CSDN博客](https://blog.csdn.net/ratsniper/article/details/78954852)

https://legacy.python.org/dev/peps/pep-0008/

----

#### 
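pyautogui.click() has no effect

[51模擬器使用python pyautogui點擊沒有效果的解決方法_york1996的博客-CSDN博客](https://blog.csdn.net/york1996/article/details/104154806)

> When launching PyCharm or another IDE, right-click it and choose to run as administrator.

----

#### Terminate vs. Disconnect when closing the IDE

In PyCharm, Terminate and Disconnect are not the same. Terminate shuts the running program down normally, so, for example, the destructor of a `db` class runs and closes the database connection. Disconnect does not end the program normally, so destructors are not executed.

----

### Python study notes

----

#### Seeing the big picture in small things

Small things reveal large ones: simple things always carry the most essential rules and philosophy.

- So is it true that the lower-level the language, the harder it is to learn, and the higher-level, the easier? On the surface, yes. **But for highly abstract computation, high-level Python programming is also very hard to learn**, so a high-level language is not the same as an easy one.
- Every computer program exists to carry out a specific task. **With input, the user can tell the program what it needs to know; with output, the program can tell the user the result after it runs.** Input and output together are called Input / Output, or IO for short.
- **Computers can automate so many tasks because they can make conditional decisions on their own.**
- **To have a computer repeat a calculation thousands of times, we need loop statements.**
- The same goes for writing programs: **a function is the most basic form of code abstraction.**
- In a coroutine you cannot call ordinary synchronous IO operations, because all users are served by a single thread. **A coroutine must execute very quickly in order to handle a large number of user requests.** Time-consuming IO must not be called synchronously inside a coroutine; otherwise, while waiting for one IO operation, the system cannot respond to any other user. This is a principle of asynchronous programming: **once you decide to go async, every layer of the system must be async; there is no turning back.**
- In Python, less code is better, not more, and simpler code is better, not more complex. Always remember: **the less code, the higher the development efficiency.**

[Python 教程 - 廖雪峰的官方網站](https://www.liaoxuefeng.com/wiki/1016959663602400)

[Python 3.11.3 Documentation](https://docs.python.org/zh-cn/3/index.html)

[PEP 8 – Style Guide for Python Code | peps.python.org](https://peps.python.org/pep-0008/)

----

#### Data types

- list: an indexed array, `[a, b, c, ...]`
- tuple: an immutable list, `(a, b, c)`
- dict: an associative array, `{'k': 1, ...}`
- set: like a dict without values, `{'k', ...}`; a set contains no duplicate elements

`type()` returns the type of a variable:

```python
>>> type(1)
<class 'int'>
```

`isinstance()` checks the type of a variable:

```python
def my_abs(x):
    if not isinstance(x, (int, float)):
        raise TypeError('bad operand type')
    if x >= 0:
        return x
    else:
        return -x
```

```python
>>> isinstance(1, str)
False
>>> isinstance(1, (int, float, bool, str))
True
```

----

#### Function parameters

Compared with PHP, Python is extremely flexible in how function parameters are defined and how arguments are passed at call time. This is often a great convenience for the caller, but it also makes defining functions considerably more complex.

1. **Positional parameters** (required; can be passed by position or by name)
2. **Default parameters** (can be passed by position or by name; when omitted, the parameter takes its default value. **A default must point to an immutable object!** Usually put parameters that vary a lot first and **those that vary little last; the latter are good candidates for defaults. Defaults simplify calls.**)
3. **Variadic parameters** `*args` (the number of arguments is variable: zero or more positional arguments are allowed and the function receives them as a `tuple`; at the call site you can also prefix a `list` or `tuple` with `*` to pass its elements as variadic arguments)
4. 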
**Keyword parameters** `**kwargs` (similar to variadic parameters: zero or more arguments passed by name are allowed and the function receives them as a `dict`; keyword parameters are often used to extend what a function can do)
5. **Named keyword parameters** (restrict the names of keyword arguments, for example to make one required; they may also have defaults. Everything after the special separator `*` is a named keyword parameter; if the signature already has a variadic `*args`, the named keyword parameters after it need no separate `*`)

> Combining parameters: a Python function can be defined with required, default, variadic, keyword, and named keyword parameters, and all five kinds can be combined. Note, however, that **the parameters must be defined in this order: required, default, variadic, named keyword, keyword.**

> Any function can be called in the form `func(*args, **kw)`, no matter how its parameters are defined (whether arguments are passed by position or by name).

[函數的參數 - 廖雪峰的官方網站](https://www.liaoxuefeng.com/wiki/1016959663602400/1017261630425888)

```python
def fun(a, b):
    print(a, b)

# TypeError: fun() got multiple values for argument 'a'
fun(1, a=1)

# SyntaxError: positional argument follows keyword argument
fun(b=2, 1)

# TypeError: fun() takes 2 positional arguments but 3 were given
fun(1, 1, 2)

# ----------

def fun(**kwargs):
    print(kwargs)

# TypeError: fun() takes 0 positional arguments but 3 were given
fun(1, 2, 3)
fun(*{"a": 1, "b": 2, "c": 3})  # same error: the three keys are unpacked as positional arguments

fun(a=1, b=2, c=3)  # ok

# ----------

def fun(*args):
    print(args)

# TypeError: fun() got an unexpected keyword argument 'a'
fun(a=1, b=2, c=3)

fun(*{"a": 1, "b": 2, "c": 3})  # ok

# --------------------------------------------

# Positional parameters
def fun(name):
    pass

fun('foo')

# ----------

# Positional and default parameters
def fun(name, value="def", value2="def2"):
    pass

# Passing an argument by name
fun('foo', value2="val2")

# ----------

# Variadic parameters: the number of arguments is not fixed
def fun(*args):
    print(args)

fun(1, 2, 3)  # (1, 2, 3)
fun(*(1, 2, 3))  # (1, 2, 3)
fun(*{"a": 1, "b": 2, "c": 3})  # ('a', 'b', 'c')

def fun(a, *args):
    print(a, args)

fun(1, 2, 3)  # 1 (2, 3)

def fun(a, *args, d):
    print(a, args, d)

fun(1, 2, 3, 4)  # TypeError: fun() missing 1 required keyword-only argument: 'd'
fun(1, 2, 3, d=4)  # 1 (2, 3) 4

# ----------

# Keyword parameters: arguments passed by name, number not fixed
def fun(**kwargs):
    print(kwargs)

fun(a=1, b=2, c=3)  # {'a': 1, 'b': 2, 'c': 3}
fun(**{"a": 1, "b": 2, "c": 3})  # {'a': 1, 'b': 2, 'c': 3}

def fun(a, **kwargs):
    print(a, kwargs)

fun(a=1, b=2, c=3)  # 1 {'b': 2, 'c': 3}

# Nothing may follow **kwargs in a signature:
# def fun(a, **kwargs, d=4):
#                      ^
# SyntaxError: arguments cannot follow var-keyword argument

# ----------

# Variadic and keyword parameters
def fun(*args, **kwargs):
    print(args, kwargs)

fun(1, 2, 3, a=1, b=2, c=3)  # (1, 2, 3) {'a': 1, 'b': 2, 'c': 3}
fun(1, 2, 3, **{"a": 1, "b": 2, "c": 3})  # (1, 2, 3) {'a': 1, 'b': 2, 'c': 3}

# ----------

# Named keyword parameters: b is required, c has a default
def fun(a, *, b, c=3):
    print(a, b, c)

# TypeError: fun() takes 1 positional argument but 3 were given
fun(1, 2, 3)

fun(1, b=2)  # 1 2 3

def fun(*, b, c=3, **kwargs):
    print(b, c, kwargs)

fun(a=1, d=4)  # TypeError: fun() missing 1 required keyword-only argument: 'b'
fun(a=1, b=2, d=4)  # 2 3 {'a': 1, 'd': 4}

def fun(*args, b=2, c, **kwargs):
    print(args, b, c, kwargs)

fun(1, 2, a=1, d=4)  # TypeError: fun() missing 1 required keyword-only argument: 'c'
fun(1, 2, a=1, c=3, d=4)  # (1, 2) 2 3 {'a': 1, 'd': 4}
```

----

#### Advanced features

If you can think of it, you can do it, so why not.

**Slicing:** `[a:b:t]` is the most natural and efficient way to work with a list or tuple.

```python
L = ['Michael', 'Sarah', 'Tracy', 'Bob', 'Jack']

# Take a range of indexes
>>> L[1:3]
['Sarah', 'Tracy']

# The index of the last element is -1
>>> L[-2:]
['Bob', 'Jack']
>>> L[-2:-1]
['Bob']
```

`L[0:3]` means: start at index `0` and go up to, but not including, index `3`. That is indexes `0`, `1`, `2`, exactly 3 elements.

```python
L = list(range(100))

# The first 10 numbers, taking every second one:
>>> L[:10:2]
[0, 2, 4, 6, 8]

# A plain copy
>>> L[:]
[0, 1, 2, 3, ..., 99]
```

With slicing, many loops are no longer needed. Python slices are very flexible: one line can do what would otherwise take a multi-line loop.

Many programming languages provide assorted substring functions (for example, substring) whose real purpose is to slice strings. Python has no dedicated substring function; the single slice operation covers it, which is very simple.

list, tuple, and string can all be sliced, and the result has the same type as the original.

----

**Iteration**: given a list or tuple, we can traverse it with a `for ... in` loop. **This kind of traversal is called iteration (`Iteration`)**. Python's `for` loop also works on other iterable objects. Index-based traversal, as in C-like languages, is not how iteration is done in Python:

```c
for (i=0; i<length; i++) {
    n = list[i];
}
```

```python
>>> d = {'a': 1, 'b': 2, 'c': 3}
>>> for key in d:
...     print(key)
...
a
b
c
```

By default, iterating a dict yields its keys (in insertion order since Python 3.7). To iterate over values, use `for value in d.values()`; to iterate over keys and values together, use `for k, v in d.items()`.

```python
>>> l = {'a':1}
>>> l
{'a': 1}
>>> l.items()
dict_items([('a', 1)])
>>> type(l.items())
<class 'dict_items'>
>>> type(l)
<class 'dict'>
```

As long as it is applied to an iterable object, a for loop just works, and we don't much care whether the object is a list or some other data type.

```python
>>> from collections.abc import Iterable
>>> isinstance('abc', Iterable)  # is str iterable?
True
>>> isinstance([1,2,3], Iterable)  # is list iterable?
True
>>> isinstance(123, Iterable)  # is an integer iterable?
False
```

A `for` loop can bind two variables at once:

```
# How do we get the index while iterating a list?
>>> for i, value in enumerate(['A', 'B', 'C']):
...     print(i, value)
...
0 A
1 B
2 C

# Tuples can be unpacked while iterating
>>> for x, y in [(1, 1), (2, 4), (3, 9)]:
...     print(x, y)
...
1 1
2 4
3 9
```

----

**List comprehensions**: a list comprehension is Python's built-in, very simple yet powerful expression **for creating a list.**

The simplest way to generate a list is the `range()` function:

```python
>>> list(range(1, 11))
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

To generate a more complex list, use the more expressive **list comprehension**:

```python
>>> [x * x for x in range(1, 11)]
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
```

An if clause after the for loop acts as a filter that controls whether an element is produced:

```python
>>> [x * x for x in range(1, 11) if x % 2 == 0]
[4, 16, 36, 64, 100]
```

In a list comprehension, an `if ... else` before `for` is an expression, while an `if` after `for` is a filter and cannot have an `else`:

```python
>>> [x if x % 2 == 0 else -x for x in range(1, 11)]
[-1, 2, -3, 4, -5, 6, -7, 8, -9, 10]
```

Two nested loops are also possible, producing every combination of the two sequences:

```python
>>> [m + n for m in 'ABC' for n in 'XYZ']
['AX', 'AY', 'AZ', 'BX', 'BY', 'BZ', 'CX', 'CY', 'CZ']
```

A list comprehension can also use two variables to build the list:

```python
>>> d = {'x': 'A', 'y': 'B', 'z': 'C' }
>>> [k + '=' + v for k, v in d.items()]
['x=A', 'y=B', 'z=C']
```

~~~
[ v for v in iterable ]
[ expression for item in iterable ]
[ ... (expression for item in iterable) for item in iterable ]
Nested expressions are evaluated starting from the rightmost one
~~~

----

**Generators**: a list comprehension creates a whole list directly, but memory is limited, so list capacity is necessarily limited too. A generator solves this by computing values as the loop runs. The only difference between creating a list and creating a generator is the outermost `[]` versus `()`:

```python
>>> g = (x * x for x in range(10))
>>> g
<generator object <genexpr> at 0x1022ef630>

# next() produces the next value
>>> next(g)
0
>>> next(g)
1
...
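>>> next(g)
81

# When there are no more elements, StopIteration is raised.
>>> next(g)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
```

Iterating with a for loop means you don't have to care about the StopIteration error.

```python
>>> g = (x * x for x in range(10))
>>> for n in g:
...     print(n)
0
1
...
81
```

A generator can also be written as a function. When every later element can be derived step by step from the first one (the function below yields Fibonacci numbers), the logic is already very close to a generator, and using `yield` turns the function into one:

```python
def fib(max):
    n, a, b = 0, 0, 1
    while n < max:
        yield b
        a, b = b, a + b
        n = n + 1
    return 'done'

>>> f = fib(6)
>>> f
<generator object fib at 0x104feaaa0>
```

A generator function does not execute like an ordinary function. An ordinary function runs top to bottom and returns at a `return` statement or after its last line. A generator function, when called, just creates a generator object; it runs on each call to `next()`, returns at a `yield` statement, and on the next `next()` resumes from the `yield` where it last returned.

```python
>>> for n in fib(6):
...     print(n)
...
1
1
2
3
5
8
```

When a for loop drives a generator, the generator's return value is not available. To get it, you must catch the StopIteration error; the return value is stored in the `value` of the StopIteration:

```python
>>> g = fib(6)
>>> while True:
...     try:
...         x = next(g)
...         print('g:', x)
...     except StopIteration as e:
...         print('Generator return value:', e.value)
...         break
...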
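g: 1
g: 1
g: 2
g: 3
g: 5
g: 8
Generator return value: done
```

To understand how a generator works: it keeps computing the next element while the for loop runs and ends the loop under the appropriate condition. For a generator made from a function, **a return statement, or reaching the last line of the function body, is the instruction that ends the generator**, and the for loop ends with it.

----

**Iterators**: list, tuple, dict, set, str, generator, and so on can be used directly in a for loop; such objects are collectively called **iterables**: **`Iterable`**.

Use `isinstance(x, Iterable)` to check whether an object is an `Iterable`:

```python
>>> from collections.abc import Iterable
>>> isinstance([], Iterable)
True
>>> isinstance({}, Iterable)
True
>>> isinstance('abc', Iterable)
True
>>> isinstance((x for x in range(10)), Iterable)
True
>>> isinstance(100, Iterable)
False
```

A generator can be used in a for loop, and it can also be called repeatedly with next() to return the next value until it finally raises StopIteration to signal that there are no more values. An object that can be passed to next() and keeps returning the next value is called an **iterator**: `Iterator` (it represents a lazily computed sequence).

Use `isinstance(x, Iterator)` to check whether an object is an `Iterator`:

```python
>>> from collections.abc import Iterator
>>> isinstance((x for x in range(10)), Iterator)
True
>>> isinstance([], Iterator)
False
>>> isinstance({}, Iterator)
False
>>> isinstance('abc', Iterator)
False
```

A generator is both an `Iterable` and an `Iterator`, whereas list, dict, and str are `Iterable` but not `Iterator`.

To turn an `Iterable` such as a list, dict, or str into an `Iterator`, use the iter() function:

```python
>>> isinstance(iter([]), Iterator)
True
>>> isinstance(iter('abc'), Iterator)
True
```

Why are list, dict, str and similar types not `Iterator` objects? Because a Python `Iterator` represents a stream of data: it can be passed to next() and keeps returning the next item until it raises StopIteration when the data runs out. You can think of this stream as an ordered sequence whose **length cannot be known in advance**; the next item is only **computed on demand** through next(). **So an Iterator is lazy: it computes a value only when the next item has to be returned.**

An Iterator can even represent an infinite data stream, such as all the natural numbers, which a list could never store.

----

#### Functional programming

By wrapping code in functions we break a complex task into simpler ones; this kind of decomposition is called procedural programming, and the function is its basic unit.

**Functional programming** (note the extra character 式 in the Chinese term 函數式編程), known in English as Functional 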
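Programming, can also be filed under procedural programming, but **its ideas are closer to mathematical computation**.

Computation here means computation in the mathematical sense: the more abstract the computation, the further it is from the hardware.

Functional programming is a highly abstract paradigm. **Functions written in a purely functional language have no variables**, so for any function, **as long as the input is fixed, the output is fixed**; such pure functions are said to have **no side effects**. In languages that allow variables, the state of the variables inside a function is not fixed, so the same input may produce different output, and such functions do have side effects.

One hallmark of functional programming is that a **function itself can be passed as an argument** to another function, and a function can also **return a function**!

**Higher-order function**: a function that takes another function as an argument is called a higher-order function.

**map/reduce**:

```python
>>> def f(x):
...     return x * x
...
>>> r = map(f, [1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> list(r)
[1, 4, 9, 16, 25, 36, 49, 64, 81]
```

```python
# reduce(f, [x1, x2, x3, x4]) = f(f(f(x1, x2), x3), x4)

>>> from functools import reduce
>>> def add(x, y):
...     return x + y
...
>>> reduce(add, [1, 3, 5, 7, 9])
25
```

**filter**: Python's built-in filter() function filters a sequence:

```python
>>> def is_odd(n):
...     return n % 2 == 1
...
>>> list(filter(is_odd, [1, 2, 4, 5, 6, 9, 10, 15]))
[1, 5, 9, 15]
```

**sorted**: Python's built-in sorted() function sorts a list:

```python
>>> sorted([36, 5, -12, 9, -21], key=abs)
[5, 9, -12, -21, 36]
```

----

**Functions as return values**: besides accepting functions as arguments, a higher-order function can also return a function as its result:

```python
def lazy_sum(*args):
    def sum():
        ax = 0
        for n in args:
            ax = ax + n
        return ax
    return sum

>>> f = lazy_sum(1, 3, 5, 7, 9)
>>> f
<function lazy_sum.<locals>.sum at 0x101c6ed90>
>>> f()
25
```

**Closures and nonlocal**:

```python
def inc():
    x = 0
    def fn():
        nonlocal x
        x = x + 1
        return x
    return fn

f = inc()
print(f())  # 1
print(f())  # 2
```

**Anonymous functions (lambda)**: `lambda x: x * x` is in fact:

```python
def f(x):
    return x * x
```

One benefit of an anonymous function is that, having no name, it cannot clash with another function's name. An anonymous function is also a function object, so it can be assigned to a variable and called through that variable:

```python
>>> f = lambda x: x * x
>>> f
<function <lambda> at 0x101c6ef28>
>>> f(5)
25

def build(x, y):
    return lambda: x * x + y * y
```

----

**Decorators**: a function object has a `__name__` attribute (note: two underscores on each side) that holds the function's name:

```python
>>> def now():
...     print('2015-3-25')
...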
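>>> f = now
>>> f()
2015-3-25
>>> now.__name__
'now'
>>> f.__name__
'now'
```

A way of adding behavior dynamically while the code is running is called a "decorator". The parameters of `wrapper()` are defined as `(*args, **kw)`, so `wrapper()` can be called with any arguments.

```python
def log(func):
    def wrapper(*args, **kw):
        print('call %s():' % func.__name__)
        return func(*args, **kw)
    return wrapper

@log
def now():
    print('2015-3-25')

>>> now()
call now():
2015-3-25
```

A decorator that takes arguments:

```python
def log(text):
    def decorator(func):
        def wrapper(*args, **kw):
            print('%s %s():' % (text, func.__name__))
            return func(*args, **kw)
        return wrapper
    return decorator

@log('execute')
def now():
    print('2015-3-25')

>>> now()
execute now():
2015-3-25
```

Preserving `__name__` and similar attributes (without `functools.wraps`, the decorated function's `__name__` would be `'wrapper'`):

```python
import functools

def log(func):
    @functools.wraps(func)
    def wrapper(*args, **kw):
        print('call %s():' % func.__name__)
        return func(*args, **kw)
    return wrapper
```

In object-oriented (OOP) design patterns, decorator refers to the decorator pattern. The OOP decorator pattern has to be implemented with inheritance and composition, whereas Python, besides supporting the OOP decorator, supports decorators directly at the syntax level. A Python decorator can be implemented as a function or as a class.

**Partial functions**: when a function has too many parameters and needs simplifying, **`functools.partial`** creates a new function that fixes some of the original function's arguments, making it simpler to call.

```python
>>> import functools
>>> int2 = functools.partial(int, base=2)
>>> int2('1000000')
64
>>> int2('1010101')
85

# A different value can still be passed at call time
>>> int2('1000000', base=10)
1000000
```

`functools.partial` fixes some of a function's arguments (that is, sets defaults for them) and returns a new function that is simpler to call. Arguments fixed positionally are added on the left:

```python
max2 = functools.partial(max, 10)
max2(5, 6, 7)

# is equivalent to:
args = (10, 5, 6, 7)
max(*args)
```

----

#### Modules

In Python, a `.py` file is called a module.

The biggest benefit of modules is that they **greatly improve maintainability**. Second, **you don't have to write code from scratch: once a module is written, it can be used elsewhere**. When writing programs we often use other modules, both Python's built-in ones and third-party ones.

Modules also **help avoid clashes between function names and variable names**. Still, take care not to clash with the names of built-in functions.

To avoid clashes between module names, Python organizes modules by directory, called a **package**.

~~~
mycompany
├─ __init__.py
├─ abc.py
└─ xyz.py
~~~

Note that every package directory contains an `__init__.py` file. For a regular package this file must exist; otherwise Python does not treat the directory as a regular package (since Python 3.3 a directory without it can only act as a namespace package). `__init__.py` may be empty or may contain Python code, because `__init__.py` is itself a module, and its module name is `mycompany`.

A multi-level directory structure:

~~~
mycompany
├─ web
│  ├─ __init__.py
│  ├─ utils.py
│  └─ www.py
├─ __init__.py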
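├─ abc.py
└─ utils.py
~~~

The module name of `www.py` is `mycompany.web.www`, and the two `utils.py` files have the module names `mycompany.utils` and `mycompany.web.utils`.

**Defining a module**: `hello.py`

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

' a test module '

__author__ = 'Michael Liao'

import sys

def test():
    args = sys.argv
    if len(args)==1:
        print('Hello, world!')
    elif len(args)==2:
        print('Hello, %s!' % args[1])
    else:
        print('Too many arguments!')

if __name__=='__main__':
    test()
```

When the hello module file is run from the command line, the Python interpreter sets the special variable `__name__` to `__main__`. When the hello module is imported from elsewhere, the if test fails. This `if` test therefore lets a module run extra code when executed from the command line, most commonly to run tests.

**Scope**: functions or variables named like `_xxx` and `__xxx` are non-public (private) and should not be referenced directly, for example `_abc`, `__abc`. We say private functions and variables "should not" be referenced directly rather than "cannot", because Python has no way to fully restrict access to them; it is simply a programming convention not to reference private functions or variables.

**Importing modules**:

```
from xxx import aaa
```

```python
from collections.abc import Iterable
```

A Python import is like `require_once` in PHP: **the module is loaded and executed only once.**

----

#### Object-oriented programming

[Python語法筆記：魔術方法 - 知乎](https://zhuanlan.zhihu.com/p/619847950?utm_id=0)

Object orientation is a higher level of abstraction than functions, because a class contains both data and the methods that operate on that data.

A class is a template for creating instances, and an instance is one concrete object. Each instance has its own data, independent of the others.

A method is a function bound to an instance. Unlike an ordinary function, a method can access the instance's data directly.

```python
class Student(object):

    def __init__(self, name, score):
        self.name = name
        self.score = score

    def print_score(self):
        print('%s: %s' % (self.name, self.score))
```

**Access restriction**: to keep an internal attribute from being accessed from outside, prefix its name with two underscores `__`. In Python, an instance variable whose name starts with `__` becomes a private variable that is accessible only from inside the class (the interpreter renames it to `_Student__name`, so this is a convention rather than strict protection). Let's change the Student class accordingly:

```python
class Student(object):

    def __init__(self, name, score):
        self.__name = name
        self.__score = score

    def print_score(self):
        print('%s: %s' % (self.__name, self.__score))
```

After the change, nothing looks different to outside code, but `instance.__name` and `instance.__score` can no longer be accessed from outside:

```python
>>> bart = Student('Bart Simpson', 59)
>>> bart.__name
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Student' object has no attribute '__name'
```

**Inheritance and polymorphism**: the biggest benefit of inheritance is that a subclass gets all of the parent class's functionality. With polymorphism, a subclass can override the parent's methods, so subclasses can implement different behavior behind the same interface.

```python
class Animal(object):
    def run(self):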
        print('Animal is running...')

class Dog(Animal):
    pass

class Cat(Animal):
    pass
```

Python is a dynamic language with "duck typing": it does not require a strict inheritance hierarchy. As long as an object "looks like a duck and walks like a duck", it can be treated as a duck.

**Getting information about an object**:

```python
>>> type(123)==type(456)
True
>>> type(123)==int
True
>>> type('abc')==type('123')
True
>>> type('abc')==str
True
>>> type('abc')==type(123)
False
```

```python
>>> import types
>>> def fn():
...     pass
...
>>> type(fn)==types.FunctionType
True
>>> type(abs)==types.BuiltinFunctionType
True
>>> type(lambda x: x)==types.LambdaType
True
>>> type((x for x in range(10)))==types.GeneratorType
True
```

```python
# object -> Animal -> Dog -> Husky
>>> a = Animal()
>>> d = Dog()
>>> h = Husky()

>>> isinstance(h, Husky)
True
>>> isinstance(h, Dog)
True
>>> isinstance(h, Animal)
True
>>> isinstance(d, Husky)
False
```

```python
>>> isinstance('a', str)
True
>>> isinstance(123, int)
True
>>> isinstance(b'a', bytes)
True
```

```python
# Check whether a variable is one of several types
>>> isinstance([1, 2, 3], (list, tuple))
True
>>> isinstance((1, 2, 3), (list, tuple))
True
```

> Always prefer isinstance() for type checks: it catches the given type and all of its subclasses in one go.

**Using dir()**: to get all the attributes and methods of an object, use the dir() function, which returns a list of strings:

```python
>>> dir('ABC')
['__add__', '__class__',..., '__subclasshook__', 'capitalize', 'casefold',..., 'zfill']
```

Attributes and methods named like `__xxx__` have special purposes in Python:

```python
>>> len('ABC')
3
>>> 'ABC'.__len__()
3
```

If we want len(myObj) to work on a class we wrote ourselves, we write a `__len__()` method:

```python
>>> class MyDog(object):
...     def __len__(self):
...         return 100
...
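>>> dog = MyDog()
>>> len(dog)
100
```

**getattr(), setattr(), hasattr()**:

```python
>>> getattr(obj, 'z')  # get attribute 'z'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'MyObject' object has no attribute 'z'

>>> getattr(obj, 'z', 404)  # get attribute 'z'; if it does not exist, return the default 404
404

>>> hasattr(obj, 'power')  # is there an attribute 'power'?
True
>>> getattr(obj, 'power')  # get attribute 'power'
<bound method MyObject.power of <__main__.MyObject object at 0x10077a6a0>>
>>> fn = getattr(obj, 'power')  # get attribute 'power' and assign it to fn
>>> fn  # fn points to obj.power
<bound method MyObject.power of <__main__.MyObject object at 0x10077a6a0>>
>>> fn()  # calling fn() is the same as calling obj.power()
81
```

**Instance attributes and class attributes**:

> Attributes are divided into class attributes and instance attributes, which differs somewhat from classes and objects in other languages such as PHP.

Once a class attribute is defined, it belongs to the class, yet every instance of the class can access it:

```python
>>> class Student(object):
...     name = 'Student'
...
>>> s = Student()  # create instance s
>>> print(s.name)  # the instance has no name attribute, so lookup continues to the class's name attribute
Student
>>> print(Student.name)  # print the class's name attribute
Student
>>> s.name = 'Michael'  # bind a name attribute to the instance
>>> print(s.name)  # instance attributes take priority over class attributes, so this shadows the class's name
Michael
>>> print(Student.name)  # the class attribute has not gone away; Student.name still reaches it
Student
>>> del s.name  # delete the instance's name attribute
>>> print(s.name)  # the instance's name is no longer found, so the class's name shows up again
Student
```

----

#### Advanced object-oriented programming

**`__slots__`: defining which attributes may be bound**

```python
class Student(object):
    __slots__ = ('name', 'age')  # a tuple of the attribute names that may be bound
```

```python
>>> s = Student()  # create a new instance
>>> s.name = 'Michael'  # bind attribute 'name'
>>> s.age = 25  # bind attribute 'age'
>>> s.score = 99  # bind attribute 'score'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Student' object has no attribute 'score'
```

Note that the attributes defined in `__slots__` apply only to instances of the current class, not to subclasses that inherit from it, unless the subclass also defines `__slots__`; in that case the attributes allowed on subclass instances are its own `__slots__` plus the parent's `__slots__`.

**The `@property` decorator turns a method into an attribute access:**

```python
class Student(object):

    @property
    def score(self):
        return self._score

    # @property in turn creates another decorator, @score.setter,
    # which turns a setter method into attribute assignment
    @score.setter
    def score(self, value):
        if not isinstance(value, int):
            raise ValueError('score must be an integer!')
        if value < 0 or value > 100:
            raise ValueError('score must between 0 ~ 100!')
        self._score = value

    # Defining only a getter makes the attribute read-only
    @property
    def age(self):
        return 2015 - self._birth
```

>[tip] Note that the property's method name must not be the same as the instance variable's name; otherwise it recurses without end and finally fails with a `RecursionError`.

----

**Multiple inheritance**: with multiple inheritance, a subclass gets all the functionality of several parent classes at once.

```python
# Mammals that run, such as dogs:
class Dog(Mammal, Runnable):
    pass

# Mammals that fly, such as bats:
class Bat(Mammal, Flyable):
    pass

# Birds that cannot fly, such as ostriches:
class Ostrich(Bird, Runnable):
    pass

# Birds that fly, such as parrots:
class Parrot(Bird, Flyable):
    pass
```

**MixIn**: this multiple-inheritance design is usually called MixIn. To make the inheritance relationships easier to see, we rename `Runnable` and `Flyable` to `RunnableMixIn` and `FlyableMixIn`. Likewise, you can define `CarnivorousMixIn` for carnivores and `HerbivoresMixIn` for herbivores, and give one animal several `MixIn`s at once:

```python
class Dog(Mammal, RunnableMixIn, CarnivorousMixIn):
    pass

# For example, a multi-process TCP server is defined as:
class MyTCPServer(TCPServer, ForkingMixIn):
    pass

# A multi-threaded UDP server is defined as:
class MyUDPServer(UDPServer, ThreadingMixIn):
    pass

# For a more advanced coroutine model, you could write a CoroutineMixIn:
class MyTCPServer(TCPServer, CoroutineMixIn):
    pass
```

A class inherits from one main class; everything else is a `MixIn`.

----

**Customizing classes**:

`__str__`: the magic method used when an instance is printed:

```python
class Student(object):
    def __init__(self, name):
        self.name = name
    def __str__(self):
        return 'Student object (name=%s)' % self.name
    __repr__ = __str__
```

`__iter__`: if a class is to be used in a for ... 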
in loop, the way a list or tuple is, it must implement an `__iter__()` method that returns an iterator object. Python's for loop then keeps calling that object's `__next__()` method to get the next value, and exits the loop when StopIteration is raised.

```python
class Fib(object):
    def __init__(self):
        self.a, self.b = 0, 1  # initialize the two counters a and b

    def __iter__(self):
        return self  # the instance is itself the iterator, so return self

    def __next__(self):
        self.a, self.b = self.b, self.a + self.b  # compute the next value
        if self.a > 100000:  # condition for leaving the loop
            raise StopIteration()
        return self.a  # return the next value
```

`__getitem__`: a Fib instance works in a for loop and looks a bit like a list, but it still cannot be used as one; taking the 5th element by index, for example, fails until `__getitem__()` is implemented:

```python
class Fib(object):
    def __getitem__(self, n):
        a, b = 1, 1
        for x in range(n):
            a, b = b, a + b
        return a

>>> f = Fib()
>>> f[0]
1
>>> f[1]
1
>>> f[2]
2
>>> f[3]
3
>>> f[10]
89
>>> f[100]
573147844013817084101
```

`__getattr__`: the magic method called when a nonexistent attribute is accessed:

```python
class Student(object):

    def __init__(self):
        self.name = 'Michael'

    def __getattr__(self, attr):
        if attr=='score':
            return 99
        raise AttributeError('\'Student\' object has no attribute \'%s\'' % attr)
```

`__call__`: calling the instance itself:

```python
class Student(object):
    def __init__(self, name):
        self.name = name

    def __call__(self):
        print('My name is %s.' % self.name)

>>> s = Student('Michael')
>>> s()  # do not pass the self argument
My name is Michael.
```

```python
>>> callable(Student())
True
>>> callable(max)
True
>>> callable([1, 2, 3])
False
>>> callable(None)
False
>>> callable('str')
False
```

The callable() function tells us whether an object is "callable".

There are many more customizable methods; see the official Python documentation [3. 
數據模型 - 特殊方法名稱 — Python 3.11.3 文檔](https://docs.python.org/zh-cn/3/reference/datamodel.html#special-method-names).

----

**Enum classes**:

```python
from enum import Enum, unique

@unique
class Weekday(Enum):
    Sun = 0  # the value of Sun is set to 0
    Mon = 1
    Tue = 2
    Wed = 3
    Thu = 4
    Fri = 5
    Sat = 6
```

**`metaclass`:** creating a class from a metaclass:

```python
# A metaclass is a template for classes, so it must derive from `type`:
class ListMetaclass(type):
    def __new__(cls, name, bases, attrs):
        attrs['add'] = lambda self, value: self.append(value)
        return type.__new__(cls, name, bases, attrs)

class MyList(list, metaclass=ListMetaclass):
    pass

>>> L = MyList()
>>> L.add(1)
>>> L
[1]
```

A metaclass is one of the most magical objects in Python: **it can change how a class behaves when it is created**. Use such a powerful feature with great care.

----

#### Error handling

**Errors and exceptions**

```python
try:
    print('try...')
    r = 10 / 0
    print('result:', r)
except ZeroDivisionError as e:
    print('except:', e)
finally:
    print('finally...')
print('END')

# Output:
# try...
# except: division by zero
# finally...
# END
```

**Raising errors:** errors do not appear out of nowhere; they are created and raised deliberately. Python's built-in functions raise many kinds of errors, and functions we write ourselves can raise errors too.

```python
class FooError(ValueError):
    pass

def foo(s):
    n = int(s)
    if n==0:
        raise FooError('invalid value: %s' % s)
    return 10 / n

foo('0')
```

Python's built-in `try...except...finally` makes error handling very convenient. When something goes wrong, what matters most is being able to read the error message and locate the code where the error occurred.

A program can also raise errors on purpose and let the caller handle them. In that case, **the documentation should state clearly which errors may be raised and why.**

[內置異常 — Python 3.11.3 文檔](https://docs.python.org/zh-cn/3/library/exceptions.html#exception-hierarchy)

[優雅地處理 Python 中的異常？Merry工具包做到了！ - 知乎](https://zhuanlan.zhihu.com/p/390111195)

**Logging and debugging**

The logging module: [Python中logging模塊的基本用法 | 靜覓](https://cuiqingcai.com/6080.html)

> Software without standard logging cannot be considered acceptable software. As developers we need to take logging seriously and do it well.

~~~
Logger -> Log Record -> Filter -> Formatter -> Handler

Levels, from lowest to highest severity:
DEBUG < INFO < WARNING < ERROR < CRITICAL
~~~

[python中logging日志模塊詳解 - 咸魚也是有夢想的 - 博客園](https://www.cnblogs.com/xianyulouie/p/11041777.html)

[python logging設置顏色-掘金](https://juejin.cn/s/python%20logging%E8%AE%BE%E7%BD%AE%E9%A2%9C%E8%89%B2)

[xolox/python-coloredlogs: Colored terminal output for Python's logging 
module](https://github.com/xolox/python-coloredlogs)

[python的logging日志模塊_logging.basicconfig(level=logging.debug)_FlyLikeButterfly的博客-CSDN博客](https://blog.csdn.net/FlyLikeButterfly/article/details/120223112)

> `[%(levelname)-8s]` aligns the level name to a fixed width

[Rich：Python開發者的完美終端工具！ - 知乎](https://zhuanlan.zhihu.com/p/394105084)

[rich/README.cn.md at master · Textualize/rich](https://github.com/textualize/rich/blob/master/README.cn.md)

> https://github.com/Textualize/rich/issues/988 The colors have pitfalls, so I'm not using it for now and sticking with standard logging.

----

**Unit testing**: unit tests check the correctness of a module, a function, or a class.

If the unit tests pass, the function under test works as expected. If they fail, either the function has a bug or the test input is wrong; either way, something needs fixing until the tests pass.

What is the point of passing unit tests? If we change the code of the abs() function, we only need to run the unit tests again. If they pass, our change has not affected the original behavior of abs(). If they fail, our change is inconsistent with the original behavior, and we must change either the code or the tests.

The biggest benefit of this test-driven style of development is that it **ensures a module behaves according to the test cases we designed**. When the module is modified in the future, it gives a high degree of assurance that the behavior is still correct.

Unit test cases should cover common input combinations, **boundary conditions, and exceptions**. Passing unit tests does not mean the program is bug-free, but failing them means it definitely has bugs.

```python
import unittest

from mydict import Dict

class TestDict(unittest.TestCase):

    def test_init(self):
        d = Dict(a=1, b='test')
        self.assertEqual(d.a, 1)
        self.assertEqual(d.b, 'test')
        self.assertTrue(isinstance(d, dict))

    def test_key(self):
        d = Dict()
        d['key'] = 'value'
        self.assertEqual(d.key, 'value')

    def test_attr(self):
        d = Dict()
        d.key = 'value'
        self.assertTrue('key' in d)
        self.assertEqual(d['key'], 'value')

    def test_keyerror(self):
        d = Dict()
        with self.assertRaises(KeyError):
            value = d['empty']

    def test_attrerror(self):
        d = Dict()
        with self.assertRaises(AttributeError):
            value = d.empty
```

**Doctest**: doctest automatically runs the code written in docstrings and judges the result strictly against the input and output of Python's interactive prompt. Only when testing exceptions can `...` stand in for the long, tedious output in the middle:

```python
# mydict2.py
class Dict(dict):
    '''
    Simple dict but also support access as x.y style.

    >>> d1 = Dict()
    >>> d1['x'] = 100
    >>> d1.x
    100
    >>> d1.y = 200
    >>> d1['y']
    200
    >>> d2 = Dict(a=1, b=2, c='3')
    >>> d2.c
    '3'
    >>> d2['empty']
    Traceback (most recent call last):
        ...
    KeyError: 'empty'
    >>> d2.empty
    Traceback (most recent call last):
        ...
    AttributeError: 'Dict' object has no attribute 'empty'
    '''
    def __init__(self, **kw):
        super(Dict, self).__init__(**kw)

    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError:
            raise AttributeError(r"'Dict' object has no attribute '%s'" % key)

    def __setattr__(self, key, value):
        self[key] = value

if __name__=='__main__':
    import doctest
    doctest.testmod()
```

----

#### Process management

**IO programming**: in IO programming, the Stream is an important concept. Think of a stream as a water pipe in which data is the water, but it can flow in only one direction. An Input Stream carries data from outside (disk, network) into memory; an Output Stream carries data from memory to the outside. When browsing a web page, the browser and the Sina server need at least two pipes between them in order to both send and receive data.

----

**Asynchronous IO**:

[Python 異步編程入門 - 阮一峰的網絡日志](https://www.ruanyifeng.com/blog/2019/11/python-asyncio.html)

**Network programming**:

**Processes and threads**:

----

#### SQLAlchemy

[SQLAlchemy文檔 — SQLAlchemy 1.4 Documentation](https://www.osgeo.cn/sqlalchemy/)

[用 peewee 代替 SQLAlchemy - Jiajun 的編程隨想](https://jiajunhuang.com/articles/2020_05_29-use_peewee.md.html)

[peewee — peewee 3.15.3 文檔](https://www.osgeo.cn/peewee/)

----

#### Scrapy

[Scrapy 教程 — Scrapy 2.5.0 文檔](https://www.osgeo.cn/scrapy/intro/tutorial.html)

```shell
pip3 install scrapy -i https://pypi.tuna.tsinghua.edu.cn/simple
```

```shell
(tutorial-env) PS D:\web\tutorial-env> python.exe -m pip install --upgrade pip
Requirement already satisfied: pip in d:\web\tutorial-env\lib\site-packages (22.3.1)
Collecting pip
  Using cached pip-23.1-py3-none-any.whl (2.1 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 22.3.1
    Uninstalling pip-22.3.1:
      Successfully uninstalled pip-22.3.1
Successfully installed pip-23.1
(tutorial-env) PS D:\web\tutorial-env> pip install scrapy
Collecting scrapy
  Downloading Scrapy-2.8.0-py2.py3-none-any.whl (272 kB)
     ---------------------------------------- 272.9/272.9 kB 884.5 kB/s eta 0:00:00
Collecting Twisted>=18.9.0 (from scrapy)
  Downloading Twisted-22.10.0-py3-none-any.whl (3.1 MB)
     ---------------------------------------- 3.1/3.1 MB 1.9 MB/s eta 0:00:00
Collecting cryptography>=3.4.6 (from scrapy)
  Downloading 
cryptography-40.0.2-cp36-abi3-win_amd64.whl (2.6 MB)
     ---------------------------------------- 2.6/2.6 MB 2.0 MB/s eta 0:00:00
Collecting cssselect>=0.9.1 (from scrapy)
  Downloading cssselect-1.2.0-py2.py3-none-any.whl (18 kB)
Collecting itemloaders>=1.0.1 (from scrapy)
  Downloading itemloaders-1.0.6-py3-none-any.whl (11 kB)
Collecting parsel>=1.5.0 (from scrapy)
  Downloading parsel-1.8.1-py2.py3-none-any.whl (17 kB)
Collecting pyOpenSSL>=21.0.0 (from scrapy)
  Downloading pyOpenSSL-23.1.1-py3-none-any.whl (57 kB)
     ---------------------------------------- 57.9/57.9 kB 610.9 kB/s eta 0:00:00
Collecting queuelib>=1.4.2 (from scrapy)
  Downloading queuelib-1.6.2-py2.py3-none-any.whl (13 kB)
Collecting service-identity>=18.1.0 (from scrapy)
  Downloading service_identity-21.1.0-py2.py3-none-any.whl (12 kB)
Collecting w3lib>=1.17.0 (from scrapy)
  Downloading w3lib-2.1.1-py3-none-any.whl (21 kB)
Collecting zope.interface>=5.1.0 (from scrapy)
  Downloading zope.interface-6.0-cp311-cp311-win_amd64.whl (204 kB)
     ---------------------------------------- 204.1/204.1 kB 1.6 MB/s eta 0:00:00
Collecting protego>=0.1.15 (from scrapy)
  Downloading Protego-0.2.1-py2.py3-none-any.whl (8.2 kB)
Collecting itemadapter>=0.1.0 (from scrapy)
  Downloading itemadapter-0.8.0-py3-none-any.whl (11 kB)
Requirement already satisfied: setuptools in d:\web\tutorial-env\lib\site-packages (from scrapy) (65.5.0)
Collecting packaging (from scrapy)
  Downloading packaging-23.1-py3-none-any.whl (48 kB)
     ---------------------------------------- 48.9/48.9 kB 1.2 MB/s eta 0:00:00
Collecting tldextract (from scrapy)
  Downloading tldextract-3.4.0-py3-none-any.whl (93 kB)
     ---------------------------------------- 93.9/93.9 kB 1.8 MB/s eta 0:00:00
Collecting lxml>=4.3.0 (from scrapy)
  Downloading lxml-4.9.2-cp311-cp311-win_amd64.whl (3.8 MB)
     ---------------------------------------- 3.8/3.8 MB 1.7 MB/s eta 0:00:00
Collecting PyDispatcher>=2.0.5 (from scrapy)
  Downloading PyDispatcher-2.0.7-py3-none-any.whl (12 kB)
Collecting
cffi>=1.12 (from cryptography>=3.4.6->scrapy)
  Downloading cffi-1.15.1-cp311-cp311-win_amd64.whl (179 kB)
     ---------------------------------------- 179.0/179.0 kB 1.5 MB/s eta 0:00:00
Collecting jmespath>=0.9.5 (from itemloaders>=1.0.1->scrapy)
  Downloading jmespath-1.0.1-py3-none-any.whl (20 kB)
Collecting six (from protego>=0.1.15->scrapy)
  Downloading six-1.16.0-py2.py3-none-any.whl (11 kB)
Collecting attrs>=19.1.0 (from service-identity>=18.1.0->scrapy)
  Downloading attrs-23.1.0-py3-none-any.whl (61 kB)
     ---------------------------------------- 61.2/61.2 kB 821.3 kB/s eta 0:00:00
Collecting pyasn1-modules (from service-identity>=18.1.0->scrapy)
  Downloading pyasn1_modules-0.3.0-py2.py3-none-any.whl (181 kB)
     ---------------------------------------- 181.3/181.3 kB 1.6 MB/s eta 0:00:00
Collecting pyasn1 (from service-identity>=18.1.0->scrapy)
  Downloading pyasn1-0.5.0-py2.py3-none-any.whl (83 kB)
     ---------------------------------------- 83.9/83.9 kB 1.6 MB/s eta 0:00:00
Collecting constantly>=15.1 (from Twisted>=18.9.0->scrapy)
  Downloading constantly-15.1.0-py2.py3-none-any.whl (7.9 kB)
Collecting incremental>=21.3.0 (from Twisted>=18.9.0->scrapy)
  Downloading incremental-22.10.0-py2.py3-none-any.whl (16 kB)
Collecting Automat>=0.8.0 (from Twisted>=18.9.0->scrapy)
  Downloading Automat-22.10.0-py2.py3-none-any.whl (26 kB)
Collecting hyperlink>=17.1.1 (from Twisted>=18.9.0->scrapy)
  Downloading hyperlink-21.0.0-py2.py3-none-any.whl (74 kB)
     ---------------------------------------- 74.6/74.6 kB 2.0 MB/s eta 0:00:00
Collecting typing-extensions>=3.6.5 (from Twisted>=18.9.0->scrapy)
  Downloading typing_extensions-4.5.0-py3-none-any.whl (27 kB)
Collecting twisted-iocpsupport<2,>=1.0.2 (from Twisted>=18.9.0->scrapy)
  Downloading twisted_iocpsupport-1.0.3-cp311-cp311-win_amd64.whl (39 kB)
Collecting idna (from tldextract->scrapy)
  Downloading idna-3.4-py3-none-any.whl (61 kB)
     ---------------------------------------- 61.5/61.5 kB 1.7 MB/s eta 0:00:00
Collecting requests>=2.1.0 (from
tldextract->scrapy)
  Downloading requests-2.28.2-py3-none-any.whl (62 kB)
     ---------------------------------------- 62.8/62.8 kB 1.6 MB/s eta 0:00:00
Collecting requests-file>=1.4 (from tldextract->scrapy)
  Downloading requests_file-1.5.1-py2.py3-none-any.whl (3.7 kB)
Collecting filelock>=3.0.8 (from tldextract->scrapy)
  Downloading filelock-3.12.0-py3-none-any.whl (10 kB)
Collecting pycparser (from cffi>=1.12->cryptography>=3.4.6->scrapy)
  Downloading pycparser-2.21-py2.py3-none-any.whl (118 kB)
     ---------------------------------------- 118.7/118.7 kB 2.4 MB/s eta 0:00:00
Collecting charset-normalizer<4,>=2 (from requests>=2.1.0->tldextract->scrapy)
  Downloading charset_normalizer-3.1.0-cp311-cp311-win_amd64.whl (96 kB)
     ---------------------------------------- 96.7/96.7 kB 1.8 MB/s eta 0:00:00
Collecting urllib3<1.27,>=1.21.1 (from requests>=2.1.0->tldextract->scrapy)
  Downloading urllib3-1.26.15-py2.py3-none-any.whl (140 kB)
     ---------------------------------------- 140.9/140.9 kB 1.4 MB/s eta 0:00:00
Collecting certifi>=2017.4.17 (from requests>=2.1.0->tldextract->scrapy)
  Downloading certifi-2022.12.7-py3-none-any.whl (155 kB)
     ---------------------------------------- 155.3/155.3 kB 2.3 MB/s eta 0:00:00
Installing collected packages: twisted-iocpsupport, PyDispatcher, incremental, constantly, zope.interface, w3lib, urllib3, typing-extensions, six, queuelib, pycparser, pyasn1, packaging, lxml, jmespath, itemadapter, idna, filelock, cssselect, charset-normalizer, certifi, attrs, requests, pyasn1-modules, protego, parsel, hyperlink, cffi, Automat, Twisted, requests-file, itemloaders, cryptography, tldextract, service-identity, pyOpenSSL, scrapy
Successfully installed Automat-22.10.0 PyDispatcher-2.0.7 Twisted-22.10.0 attrs-23.1.0 certifi-2022.12.7 cffi-1.15.1 charset-normalizer-3.1.0 constantly-15.1.0 cryptography-40.0.2 cssselect-1.2.0 filelock-3.12.0 hyperlink-21.0.0 idna-3.4 incremental-22.10.0 itemadapter-0.8.0 itemloaders-1.0.6 jmespath-1.0.1 lxml-4.9.2 packaging-23.1
parsel-1.8.1 protego-0.2.1 pyOpenSSL-23.1.1 pyasn1-0.5.0 pyasn1-modules-0.3.0 pycparser-2.21 queuelib-1.6.2 requests-2.28.2 requests-file-1.5.1 scrapy-2.8.0 service-identity-21.1.0 six-1.16.0 tldextract-3.4.0 twisted-iocpsupport-1.0.3 typing-extensions-4.5.0 urllib3-1.26.15 w3lib-2.1.1 zope.interface-6.0
```

----

**How do we crawl more links?** Although a spider **starts from a single entry URL**, that does not limit it to simple one-off crawls. Inside `parse()` we can, depending on the page, use `yield response.follow(next_page, self.parse)` or `yield scrapy.Request(next_page, callback=self.parse)` to **keep generating further requests until all the other pages have been crawled.** `response.follow` accepts relative URLs directly, while `scrapy.Request` needs an absolute URL.

----

**How do we process and store the scraped data?**

----

**How do we use proxies?**

----

**How do we run distributed, large-scale crawls?**

----

**How do we handle login?**

----

**How do we handle CAPTCHAs?**

----

**How do we handle sliders and other anti-bot human verification?**

----

**How do we handle encryption-based anti-scraping measures?**

----

**How do we use a headless browser?**

----

**How do we manage and control spiders?**

----

#### Common modules and third-party packages

**pip mirrors**

~~~
Tsinghua University: https://pypi.tuna.tsinghua.edu.cn/simple/
Alibaba Cloud: http://mirrors.aliyun.com/pypi/simple/
University of Science and Technology of China: http://pypi.mirrors.ustc.edu.cn/simple/
Huazhong University of Science and Technology: http://pypi.hustunique.com/
Douban: http://pypi.douban.com/simple/
Tencent: http://mirrors.cloud.tencent.com/pypi/simple
Huawei: https://repo.huaweicloud.com/repository/pypi/simple
~~~

```shell
pip3 install pymysql -i https://pypi.tuna.tsinghua.edu.cn/simple
```

[pip源_淘小欣的博客-CSDN博客](https://blog.csdn.net/weixin_44621343/article/details/116459859)

----

#### venv

`venv` creates an isolated Python runtime environment for an application. Giving each application its own virtual environment resolves dependency conflicts between applications.

```shell
# Create the virtual environment
python -m venv venv
# Activate the virtual environment
source venv/bin/activate
```

[12. 虛擬環境和包 — Python 3.11.3 文檔](https://docs.python.org/zh-cn/3/tutorial/venv.html#tut-venv)

**Windows**: run Windows PowerShell **as administrator**:

```shell
PS D:\web\tutorial-env> set-executionpolicy remotesigned
PS D:\web\tutorial-env> get-executionpolicy
RemoteSigned
PS D:\web\tutorial-env> .\Scripts\activate
(tutorial-env) PS D:\web\tutorial-env> python3
Python 3.11.2 (tags/v3.11.2:878ead1, Feb 7 2023, 16:38:35) [MSC v.1934 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path
['', 'D:\\Program Files\\Python311\\python311.zip', 'D:\\Program Files\\Python311\\DLLs', 'D:\\Program Files\\Python311\\Lib', 'D:\\Program Files\\Python311', 'D:\\web\\tutorial-env', 'D:\\web\\tutorial-env\\Lib\\site-packages']
```

[環境搭建 Python Windows中使用venv - 簡書](https://www.jianshu.com/p/eb08e9198387)

[Python3.10.4激活venv環境失敗解決方法_python_腳本之家](https://www.jb51.net/article/272801.htm)

[Python3.10.4激活venv環境失敗解決方法](https://it.cha138.com/android/show-73531.html)

----

#### Miscellaneous

**Regular expressions**: [re --- 正則表達式操作 — Python 3.11.3 文檔](https://docs.python.org/zh-cn/3/library/re.html)

----

#### pymysql parameter binding

```python
cursor.execute(' select ... %s ', [])
```

Use parameter binding rather than building SQL by string concatenation; it is the safest way to prevent SQL injection. (Note that with parameter binding, the `%s` placeholder must not be wrapped in quotes.)

https://www.cnpython.com/qa/194936

----

#### Caveats when using pyautogui on Windows

When running pyautogui on Windows over a remote session, the connection must not be closed; even keeping the remote session in a small window does not work. The method described at https://www.cnblogs.com/sophia201552/p/13344320.html did not work either.

> So we usually dedicate one machine as a "monitoring machine".

----

#### Beware of implicit imports inside libraries

```python
C:\Python27\python.exe D:/wamp64/www/xiak-DataValley/test_xiak/t.py
Traceback (most recent call last):
  File "D:/wamp64/www/xiak-DataValley/test_xiak/t.py", line 4, in <module>
    from selenium import webdriver
  File "C:\Python27\lib\site-packages\selenium\webdriver\__init__.py", line 27, in <module>
    from .safari.webdriver import WebDriver as Safari  # noqa
  File "C:\Python27\lib\site-packages\selenium\webdriver\safari\webdriver.py", line 20, in <module>
    import http.client as http_client
  File "D:\wamp64\www\xiak-DataValley\test_xiak\http.py", line 17
    str = json.dumps(cookies)
      ^
IndentationError: expected an indented block
```

Line 19 of `C:\Python27\lib\site-packages\selenium\webdriver\safari\webdriver.py` contains this code:

```python
try:
    import http.client as http_client
except ImportError:
    import httplib as http_client
```

The project directory happened to contain `D:\wamp64\www\xiak-DataValley\test_xiak\http.py`. Because the script's own directory comes first on `sys.path`, that file was imported as the `http` module. The lesson: when you do not know the internals of the libraries you use, do not name your own module files carelessly. This is somewhat like dependency injection in PHP, except that here it happens implicitly.

----
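The shadowing problem from the implicit-import note above can be caught before it bites. The following is a minimal sketch, not part of the original notes: the helper `find_shadowing_modules` and the file names used in `demo` are hypothetical, and it assumes Python 3.10 or later, where `sys.stdlib_module_names` exists (a tiny, incomplete fallback set is used on older interpreters purely for illustration). It lists the local `.py` files and packages in a directory whose names collide with standard-library modules.

```python
import sys
import tempfile
from pathlib import Path

# sys.stdlib_module_names exists on Python 3.10+. The fallback set is an
# illustrative assumption for older interpreters, not a complete list.
STDLIB_NAMES = getattr(
    sys, "stdlib_module_names", frozenset({"http", "json", "logging"})
)


def find_shadowing_modules(directory):
    """Return sorted names of local modules that collide with stdlib names."""
    found = set()
    for entry in Path(directory).iterdir():
        if entry.is_file() and entry.suffix == ".py":
            name = entry.stem          # e.g. http.py -> "http"
        elif entry.is_dir() and (entry / "__init__.py").exists():
            name = entry.name          # a package directory
        else:
            continue
        if name in STDLIB_NAMES:
            found.add(name)
    return sorted(found)


def demo():
    """Build a throwaway project with one risky and one harmless file."""
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "http.py").write_text("# local file\n", encoding="utf-8")
        (Path(tmp) / "my_spider.py").write_text("# harmless\n", encoding="utf-8")
        return find_shadowing_modules(tmp)


if __name__ == "__main__":
    print(demo())  # prints ['http']
```

On Python 3, `http` is a standard-library package, so a local `http.py` next to the script would shadow it for every library that imports `http.client`. Running a check like this over the project directory is one way to spot such names early; renaming the file is the actual fix.

----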
[非常詳細的字符編碼講解，ASCII、GB2312、GBK、Unicode、UTF-8等知識點都有](https://www.bilibili.com/video/BV1gZ4y1x7p7)

[非常生動的Python2和Python3的編解碼講解](https://www.bilibili.com/video/BV1XK4y1t7D4)

[HelloDjango - django REST framework 教程_追夢人物的博客](https://www.zmrenwu.com/courses/django-rest-framework-tutorial/)

----

last update: 2020-11-23 10:27:18