第?35?章?線程 · Linux C 一站式學習

# 第?35?章?線程 **目錄** + [1\. 線程的概念](ch35s01.html) + [2\. 線程控制](ch35s02.html) + [2.1\. 創建線程](ch35s02.html#id2895632) + [2.2\. 終止線程](ch35s02.html#id2896029) + [3\. 線程間同步](ch35s03.html) + [3.1\. mutex](ch35s03.html#id2896462) + [3.2\. Condition Variable](ch35s03.html#id2895424) + [3.3\. Semaphore](ch35s03.html#id2897332) + [3.4\. 其它線程間同步機制](ch35s03.html#id2897423) + [4\. 編程練習](ch35s04.html) ## 1.?線程的概念我們知道，進程在各自獨立的地址空間中運行，進程之間共享數據需要用`mmap`或者進程間通信機制，本節我們學習如何在一個進程的地址空間中執行多個線程。有些情況需要在一個進程中同時執行多個控制流程，這時候線程就派上了用場，比如實現一個圖形界面的下載軟件，一方面需要和用戶交互，等待和處理用戶的鼠標鍵盤事件，另一方面又需要同時下載多個文件，等待和處理從多個網絡主機發來的數據，這些任務都需要一個“等待-處理”的循環，可以用多線程實現，一個線程專門負責與用戶交互，另外幾個線程每個線程負責和一個網絡主機通信。以前我們講過，`main`函數和信號處理函數是同一個進程地址空間中的多個控制流程，多線程也是如此，但是比信號處理函數更加靈活，信號處理函數的控制流程只是在信號遞達時產生，在處理完信號之后就結束，而多線程的控制流程可以長期并存，操作系統會在各線程之間調度和切換，就像在多個進程之間調度和切換一樣。由于同一進程的多個線程共享同一地址空間，因此Text Segment、Data Segment都是共享的，如果定義一個函數，在各線程中都可以調用，如果定義一個全局變量，在各線程中都可以訪問到，除此之外，各線程還共享以下進程資源和環境： * 文件描述符表 * 每種信號的處理方式（`SIG_IGN`、`SIG_DFL`或者自定義的信號處理函數） * 當前工作目錄 * 用戶id和組id 但有些資源是每個線程各有一份的： * 線程id * 上下文，包括各種寄存器的值、程序計數器和棧指針 * 棧空間 * `errno`變量 * 信號屏蔽字 * 調度優先級我們將要學習的線程庫函數是由POSIX標準定義的，稱為POSIX thread或者pthread。在Linux上線程函數位于`libpthread`共享庫中，因此在編譯時要加上`-lpthread`選項。 ## 2.?線程控制 ### 2.1.?創建線程 ``` #include <pthread.h> int pthread_create(pthread_t *restrict thread, const pthread_attr_t *restrict attr, void *(*start_routine)(void*), void *restrict arg); ``` 返回值：成功返回0，失敗返回錯誤號。以前學過的系統函數都是成功返回0，失敗返回-1，而錯誤號保存在全局變量`errno`中，而pthread庫的函數都是通過返回值返回錯誤號，雖然每個線程也都有一個`errno`，但這是為了兼容其它函數接口而提供的，pthread庫本身并不使用它，通過返回值返回錯誤碼更加清晰。在一個線程中調用pthread_create()創建新的線程后，當前線程從pthread_create()返回繼續往下執行，而新的線程所執行的代碼由我們傳給`pthread_create`的函數指針`start_routine`決定。`start_routine`函數接收一個參數，是通過`pthread_create`的`arg`參數傳遞給它的，該參數的類型為`void *`，這個指針按什么類型解釋由調用者自己定義。`start_routine`的返回值類型也是`void *`，這個指針的含義同樣由調用者自己定義。`start_routine`返回時，這個線程就退出了，其它線程可以調用`pthread_join`得到`start_routine`的返回值，類似于父進程調用`wait(2)`得到子進程的退出狀態，稍后詳細介紹`pthread_join`。 `pthread_create`成功返回后，新創建的線程的id被填寫到`thread`參數所指向的內存單元。我們知道進程id的類型是`pid_t`，每個進程的id在整個系統中是唯一的，調用`getpid(2)`可以獲得當前進程的id，是一個正整數值。線程id的類型是`thread_t`，它只在當前進程中保證是唯一的，在不同的系統中`thread_t`這個類型有不同的實現，它可能是一個整數值，也可能是一個結構體，也可能是一個地址，所以不能簡單地當成整數用`printf`打印，調用`pthread_self(3)`可以獲得當前線程的id。 `attr`參數表示線程屬性，本章不深入討論線程屬性，所有代碼例子都傳`NULL`給`attr`參數，表示線程屬性取缺省值，感興趣的讀者可以參考[[APUE2e]](bi01.html#bibli.apue "Advanced Programming in the UNIX Environment")。首先看一個簡單的例子： ``` #include <stdio.h> #include <string.h> #include <stdlib.h> #include <pthread.h> #include <unistd.h> pthread_t ntid; void printids(const char *s) { pid_t pid; pthread_t tid; pid = getpid(); tid = pthread_self(); printf("%s pid %u tid %u (0x%x)\n", s, (unsigned int)pid, (unsigned int)tid, (unsigned int)tid); } void *thr_fn(void *arg) { printids(arg); return NULL; } int main(void) { int err; err = pthread_create(&ntid, NULL, thr_fn, "new thread: "); if (err != 0) { fprintf(stderr, "can't create thread: %s\n", strerror(err)); exit(1); } printids("main thread:"); sleep(1); return 0; } ``` 編譯運行結果如下： ``` $ gcc main.c -lpthread $ ./a.out main thread: pid 7398 tid 3084450496 (0xb7d8fac0) new thread: pid 7398 tid 3084446608 (0xb7d8eb90) ``` 可知在Linux上，`thread_t`類型是一個地址值，屬于同一進程的多個線程調用`getpid(2)`可以得到相同的進程號，而調用`pthread_self(3)`得到的線程號各不相同。由于`pthread_create`的錯誤碼不保存在`errno`中，因此不能直接用`perror(3)`打印錯誤信息，可以先用`strerror(3)`把錯誤碼轉換成錯誤信息再打印。如果任意一個線程調用了`exit`或`_exit`，則整個進程的所有線程都終止，由于從`main`函數`return`也相當于調用`exit`，為了防止新創建的線程還沒有得到執行就終止，我們在`main`函數`return`之前延時1秒，這只是一種權宜之計，即使主線程等待1秒，內核也不一定會調度新創建的線程執行，下一節我們會看到更好的辦法。思考題：主線程在一個全局變量`ntid`中保存了新創建的線程的id，如果新創建的線程不調用`pthread_self`而是直接打印這個`ntid`，能不能達到同樣的效果？ ### 2.2.?終止線程如果需要只終止某個線程而不終止整個進程，可以有三種方法： * 從線程函數`return`。這種方法對主線程不適用，從`main`函數`return`相當于調用`exit`。 * 一個線程可以調用`pthread_cancel`終止同一進程中的另一個線程。 * 線程可以調用`pthread_exit`終止自己。用`pthread_cancel`終止一個線程分同步和異步兩種情況，比較復雜，本章不打算詳細介紹，讀者可以參考[[APUE2e]](bi01.html#bibli.apue "Advanced Programming in the UNIX Environment")。下面介紹`pthread_exit`的和`pthread_join`的用法。 ``` #include <pthread.h> void pthread_exit(void *value_ptr); ``` `value_ptr`是`void *`類型，和線程函數返回值的用法一樣，其它線程可以調用`pthread_join`獲得這個指針。需要注意，`pthread_exit`或者`return`返回的指針所指向的內存單元必須是全局的或者是用`malloc`分配的，不能在線程函數的棧上分配，因為當其它線程得到這個返回指針時線程函數已經退出了。 ``` #include <pthread.h> int pthread_join(pthread_t thread, void **value_ptr); ``` 返回值：成功返回0，失敗返回錯誤號調用該函數的線程將掛起等待，直到id為`thread`的線程終止。`thread`線程以不同的方法終止，通過`pthread_join`得到的終止狀態是不同的，總結如下： * 如果`thread`線程通過`return`返回，`value_ptr`所指向的單元里存放的是`thread`線程函數的返回值。 * 如果`thread`線程被別的線程調用`pthread_cancel`異常終止掉，`value_ptr`所指向的單元里存放的是常數`PTHREAD_CANCELED`。 * 如果`thread`線程是自己調用`pthread_exit`終止的，`value_ptr`所指向的單元存放的是傳給`pthread_exit`的參數。如果對`thread`線程的終止狀態不感興趣，可以傳`NULL`給`value_ptr`參數。看下面的例子（省略了出錯處理）： ``` #include <stdio.h> #include <stdlib.h> #include <pthread.h> #include <unistd.h> void *thr_fn1(void *arg) { printf("thread 1 returning\n"); return (void *)1; } void *thr_fn2(void *arg) { printf("thread 2 exiting\n"); pthread_exit((void *)2); } void *thr_fn3(void *arg) { while(1) { printf("thread 3 writing\n"); sleep(1); } } int main(void) { pthread_t tid; void *tret; pthread_create(&tid, NULL, thr_fn1, NULL); pthread_join(tid, &tret); printf("thread 1 exit code %d\n", (int)tret); pthread_create(&tid, NULL, thr_fn2, NULL); pthread_join(tid, &tret); printf("thread 2 exit code %d\n", (int)tret); pthread_create(&tid, NULL, thr_fn3, NULL); sleep(3); pthread_cancel(tid); pthread_join(tid, &tret); printf("thread 3 exit code %d\n", (int)tret); return 0; } ``` 運行結果是： ``` $ ./a.out thread 1 returning thread 1 exit code 1 thread 2 exiting thread 2 exit code 2 thread 3 writing thread 3 writing thread 3 writing thread 3 exit code -1 ``` 可見在Linux的pthread庫中常數`PTHREAD_CANCELED`的值是-1。可以在頭文件`pthread.h`中找到它的定義： ``` #define PTHREAD_CANCELED ((void *) -1) ``` 一般情況下，線程終止后，其終止狀態一直保留到其它線程調用`pthread_join`獲取它的狀態為止。但是線程也可以被置為detach狀態，這樣的線程一旦終止就立刻回收它占用的所有資源，而不保留終止狀態。不能對一個已經處于detach狀態的線程調用`pthread_join`，這樣的調用將返回`EINVAL`。對一個尚未detach的線程調用`pthread_join`或`pthread_detach`都可以把該線程置為detach狀態，也就是說，不能對同一線程調用兩次`pthread_join`，或者如果已經對一個線程調用了`pthread_detach`就不能再調用`pthread_join`了。 ``` #include <pthread.h> int pthread_detach(pthread_t tid); ``` 返回值：成功返回0，失敗返回錯誤號。 ## 3.?線程間同步 ### 3.1.?mutex 多個線程同時訪問共享數據時可能會沖突，這跟前面講信號時所說的可重入性是同樣的問題。比如兩個線程都要把某個全局變量增加1，這個操作在某平臺需要三條指令完成： 1. 從內存讀變量值到寄存器 2. 寄存器的值加1 3. 將寄存器的值寫回內存假設兩個線程在多處理器平臺上同時執行這三條指令，則可能導致下圖所示的結果，最后變量只加了一次而非兩次。 **圖?35.1.?并行訪問沖突** ![并行訪問沖突](https://box.kancloud.cn/2016-04-02_56ff80da4c5be.png) 思考一下，如果這兩個線程在單處理器平臺上執行，能夠避免這樣的問題嗎？我們通過一個簡單的程序觀察這一現象。上圖所描述的現象從理論上是存在這種可能的，但實際運行程序時很難觀察到，為了使現象更容易觀察到，我們把上述三條指令做的事情用更多條指令來做： ``` val = counter; printf("%x: %d\n", (unsigned int)pthread_self(), val + 1); counter = val + 1; ``` 我們在“讀取變量的值”和“把變量的新值保存回去”這兩步操作之間插入一個`printf`調用，它會執行`write`系統調用進內核，為內核調度別的線程執行提供了一個很好的時機。我們在一個循環中重復上述操作幾千次，就會觀察到訪問沖突的現象。 ``` #include <stdio.h> #include <stdlib.h> #include <pthread.h> #define NLOOP 5000 int counter; /* incremented by threads */ void *doit(void *); int main(int argc, char **argv) { pthread_t tidA, tidB; pthread_create(&tidA, NULL, &doit, NULL); pthread_create(&tidB, NULL, &doit, NULL); /* wait for both threads to terminate */ pthread_join(tidA, NULL); pthread_join(tidB, NULL); return 0; } void *doit(void *vptr) { int i, val; /* * Each thread fetches, prints, and increments the counter NLOOP times. * The value of the counter should increase monotonically. */ for (i = 0; i < NLOOP; i++) { val = counter; printf("%x: %d\n", (unsigned int)pthread_self(), val + 1); counter = val + 1; } return NULL; } ``` 我們創建兩個線程，各自把`counter`增加5000次，正常情況下最后`counter`應該等于10000，但事實上每次運行該程序的結果都不一樣，有時候數到5000多，有時候數到6000多。 ``` $ ./a.out b76acb90: 1 b76acb90: 2 b76acb90: 3 b76acb90: 4 b76acb90: 5 b7eadb90: 1 b7eadb90: 2 b7eadb90: 3 b7eadb90: 4 b7eadb90: 5 b76acb90: 6 b76acb90: 7 b7eadb90: 6 b76acb90: 8 ... ``` 對于多線程的程序，訪問沖突的問題是很普遍的，解決的辦法是引入互斥鎖（Mutex，Mutual Exclusive Lock），獲得鎖的線程可以完成“讀-修改-寫”的操作，然后釋放鎖給其它線程，沒有獲得鎖的線程只能等待而不能訪問共享數據，這樣“讀-修改-寫”三步操作組成一個原子操作，要么都執行，要么都不執行，不會執行到中間被打斷，也不會在其它處理器上并行做這個操作。 Mutex用`pthread_mutex_t`類型的變量表示，可以這樣初始化和銷毀： ``` #include <pthread.h> int pthread_mutex_destroy(pthread_mutex_t *mutex); int pthread_mutex_init(pthread_mutex_t *restrict mutex, const pthread_mutexattr_t *restrict attr); pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER; ``` 返回值：成功返回0，失敗返回錯誤號。 `pthread_mutex_init`函數對Mutex做初始化，參數`attr`設定Mutex的屬性，如果`attr`為`NULL`則表示缺省屬性，本章不詳細介紹Mutex屬性，感興趣的讀者可以參考[[APUE2e]](bi01.html#bibli.apue "Advanced Programming in the UNIX Environment")。用`pthread_mutex_init`函數初始化的Mutex可以用`pthread_mutex_destroy`銷毀。如果Mutex變量是靜態分配的（全局變量或`static`變量），也可以用宏定義`PTHREAD_MUTEX_INITIALIZER`來初始化，相當于用`pthread_mutex_init`初始化并且`attr`參數為`NULL`。Mutex的加鎖和解鎖操作可以用下列函數： ``` #include <pthread.h> int pthread_mutex_lock(pthread_mutex_t *mutex); int pthread_mutex_trylock(pthread_mutex_t *mutex); int pthread_mutex_unlock(pthread_mutex_t *mutex); ``` 返回值：成功返回0，失敗返回錯誤號。一個線程可以調用pthread_mutex_lock獲得Mutex，如果這時另一個線程已經調用pthread_mutex_lock獲得了該Mutex，則當前線程需要掛起等待，直到另一個線程調用pthread_mutex_unlock釋放Mutex，當前線程被喚醒，才能獲得該Mutex并繼續執行。如果一個線程既想獲得鎖，又不想掛起等待，可以調用pthread_mutex_trylock，如果Mutex已經被另一個線程獲得，這個函數會失敗返回EBUSY，而不會使線程掛起等待。現在我們用Mutex解決先前的問題： ``` #include <stdio.h> #include <stdlib.h> #include <pthread.h> #define NLOOP 5000 int counter; /* incremented by threads */ pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER; void *doit(void *); int main(int argc, char **argv) { pthread_t tidA, tidB; pthread_create(&tidA, NULL, doit, NULL); pthread_create(&tidB, NULL, doit, NULL); /* wait for both threads to terminate */ pthread_join(tidA, NULL); pthread_join(tidB, NULL); return 0; } void *doit(void *vptr) { int i, val; /* * Each thread fetches, prints, and increments the counter NLOOP times. * The value of the counter should increase monotonically. */ for (i = 0; i < NLOOP; i++) { pthread_mutex_lock(&counter_mutex); val = counter; printf("%x: %d\n", (unsigned int)pthread_self(), val + 1); counter = val + 1; pthread_mutex_unlock(&counter_mutex); } return NULL; } ``` 這樣運行結果就正常了，每次運行都能數到10000。看到這里，讀者一定會好奇：Mutex的兩個基本操作lock和unlock是如何實現的呢？假設Mutex變量的值為1表示互斥鎖空閑，這時某個進程調用lock可以獲得鎖，而Mutex的值為0表示互斥鎖已經被某個線程獲得，其它線程再調用lock只能掛起等待。那么lock和unlock的偽代碼如下： ``` lock: if(mutex > 0){ mutex = 0; return 0; } else 掛起等待; goto lock; unlock: mutex = 1; 喚醒等待Mutex的線程; return 0; ``` unlock操作中喚醒等待線程的步驟可以有不同的實現，可以只喚醒一個等待線程，也可以喚醒所有等待該Mutex的線程，然后讓被喚醒的這些線程去競爭獲得這個Mutex，競爭失敗的線程繼續掛起等待。細心的讀者應該已經看出問題了：對Mutex變量的讀取、判斷和修改不是原子操作。如果兩個線程同時調用lock，這時Mutex是1，兩個線程都判斷mutex>0成立，然后其中一個線程置mutex=0，而另一個線程并不知道這一情況，也置mutex=0，于是兩個線程都以為自己獲得了鎖。為了實現互斥鎖操作，大多數體系結構都提供了swap或exchange指令，該指令的作用是把寄存器和內存單元的數據相交換，由于只有一條指令，保證了原子性，即使是多處理器平臺，訪問內存的總線周期也有先后，一個處理器上的交換指令執行時另一個處理器的交換指令只能等待總線周期。現在我們把lock和unlock的偽代碼改一下（以x86的xchg指令為例）： ``` lock: movb $0, %al xchgb %al, mutex if(al寄存器的內容 > 0){ return 0; } else 掛起等待; goto lock; unlock: movb $1, mutex 喚醒等待Mutex的線程; return 0; ``` unlock中的釋放鎖操作同樣只用一條指令實現，以保證它的原子性。也許還有讀者好奇，“掛起等待”和“喚醒等待線程”的操作如何實現？每個Mutex有一個等待隊列，一個線程要在Mutex上掛起等待，首先在把自己加入等待隊列中，然后置線程狀態為睡眠，然后調用調度器函數切換到別的線程。一個線程要喚醒等待隊列中的其它線程，只需從等待隊列中取出一項，把它的狀態從睡眠改為就緒，加入就緒隊列，那么下次調度器函數執行時就有可能切換到被喚醒的線程。一般情況下，如果同一個線程先后兩次調用lock，在第二次調用時，由于鎖已經被占用，該線程會掛起等待別的線程釋放鎖，然而鎖正是被自己占用著的，該線程又被掛起而沒有機會釋放鎖，因此就永遠處于掛起等待狀態了，這叫做死鎖（Deadlock）。另一種典型的死鎖情形是這樣：線程A獲得了鎖1，線程B獲得了鎖2，這時線程A調用lock試圖獲得鎖2，結果是需要掛起等待線程B釋放鎖2，而這時線程B也調用lock試圖獲得鎖1，結果是需要掛起等待線程A釋放鎖1，于是線程A和B都永遠處于掛起狀態了。不難想象，如果涉及到更多的線程和更多的鎖，有沒有可能死鎖的問題將會變得復雜和難以判斷。寫程序時應該盡量避免同時獲得多個鎖，如果一定有必要這么做，則有一個原則：如果所有線程在需要多個鎖時都按相同的先后順序（常見的是按Mutex變量的地址順序）獲得鎖，則不會出現死鎖。比如一個程序中用到鎖1、鎖2、鎖3，它們所對應的Mutex變量的地址是鎖1<鎖2<鎖3，那么所有線程在需要同時獲得2個或3個鎖時都應該按鎖1、鎖2、鎖3的順序獲得。如果要為所有的鎖確定一個先后順序比較困難，則應該盡量使用pthread_mutex_trylock調用代替pthread_mutex_lock調用，以免死鎖。 ### 3.2.?Condition Variable 線程間的同步還有這樣一種情況：線程A需要等某個條件成立才能繼續往下執行，現在這個條件不成立，線程A就阻塞等待，而線程B在執行過程中使這個條件成立了，就喚醒線程A繼續執行。在pthread庫中通過條件變量（Condition Variable）來阻塞等待一個條件，或者喚醒等待這個條件的線程。Condition Variable用`pthread_cond_t`類型的變量表示，可以這樣初始化和銷毀： ``` #include <pthread.h> int pthread_cond_destroy(pthread_cond_t *cond); int pthread_cond_init(pthread_cond_t *restrict cond, const pthread_condattr_t *restrict attr); pthread_cond_t cond = PTHREAD_COND_INITIALIZER; ``` 返回值：成功返回0，失敗返回錯誤號。和Mutex的初始化和銷毀類似，`pthread_cond_init`函數初始化一個Condition Variable，`attr`參數為`NULL`則表示缺省屬性，`pthread_cond_destroy`函數銷毀一個Condition Variable。如果Condition Variable是靜態分配的，也可以用宏定義`PTHEAD_COND_INITIALIZER`初始化，相當于用`pthread_cond_init`函數初始化并且`attr`參數為`NULL`。Condition Variable的操作可以用下列函數： ``` #include <pthread.h> int pthread_cond_timedwait(pthread_cond_t *restrict cond, pthread_mutex_t *restrict mutex, const struct timespec *restrict abstime); int pthread_cond_wait(pthread_cond_t *restrict cond, pthread_mutex_t *restrict mutex); int pthread_cond_broadcast(pthread_cond_t *cond); int pthread_cond_signal(pthread_cond_t *cond); ``` 返回值：成功返回0，失敗返回錯誤號。可見，一個Condition Variable總是和一個Mutex搭配使用的。一個線程可以調用`pthread_cond_wait`在一個Condition Variable上阻塞等待，這個函數做以下三步操作： 1. 釋放Mutex 2. 阻塞等待 3. 當被喚醒時，重新獲得Mutex并返回 `pthread_cond_timedwait`函數還有一個額外的參數可以設定等待超時，如果到達了`abstime`所指定的時刻仍然沒有別的線程來喚醒當前線程，就返回`ETIMEDOUT`。一個線程可以調用`pthread_cond_signal`喚醒在某個Condition Variable上等待的另一個線程，也可以調用`pthread_cond_broadcast`喚醒在這個Condition Variable上等待的所有線程。下面的程序演示了一個生產者-消費者的例子，生產者生產一個結構體串在鏈表的表頭上，消費者從表頭取走結構體。 ``` #include <stdlib.h> #include <pthread.h> #include <stdio.h> struct msg { struct msg *next; int num; }; struct msg *head; pthread_cond_t has_product = PTHREAD_COND_INITIALIZER; pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER; void *consumer(void *p) { struct msg *mp; for (;;) { pthread_mutex_lock(&lock); while (head == NULL) pthread_cond_wait(&has_product, &lock); mp = head; head = mp->next; pthread_mutex_unlock(&lock); printf("Consume %d\n", mp->num); free(mp); sleep(rand() % 5); } } void *producer(void *p) { struct msg *mp; for (;;) { mp = malloc(sizeof(struct msg)); mp->num = rand() % 1000 + 1; printf("Produce %d\n", mp->num); pthread_mutex_lock(&lock); mp->next = head; head = mp; pthread_mutex_unlock(&lock); pthread_cond_signal(&has_product); sleep(rand() % 5); } } int main(int argc, char *argv[]) { pthread_t pid, cid; srand(time(NULL)); pthread_create(&pid, NULL, producer, NULL); pthread_create(&cid, NULL, consumer, NULL); pthread_join(pid, NULL); pthread_join(cid, NULL); return 0; } ``` 執行結果如下： ``` $ ./a.out Produce 744 Consume 744 Produce 567 Produce 881 Consume 881 Produce 911 Consume 911 Consume 567 Produce 698 Consume 698 ``` #### 習題 1、在本節的例子中，生產者和消費者訪問鏈表的順序是LIFO的，請修改程序，把訪問順序改成FIFO。 ### 3.3.?Semaphore Mutex變量是非0即1的，可看作一種資源的可用數量，初始化時Mutex是1，表示有一個可用資源，加鎖時獲得該資源，將Mutex減到0，表示不再有可用資源，解鎖時釋放該資源，將Mutex重新加到1，表示又有了一個可用資源。信號量（Semaphore）和Mutex類似，表示可用資源的數量，和Mutex不同的是這個數量可以大于1。本節介紹的是POSIX semaphore庫函數，詳見sem_overview(7)，這種信號量不僅可用于同一進程的線程間同步，也可用于不同進程間的同步。 ``` #include <semaphore.h> int sem_init(sem_t *sem, int pshared, unsigned int value); int sem_wait(sem_t *sem); int sem_trywait(sem_t *sem); int sem_post(sem_t * sem); int sem_destroy(sem_t * sem); ``` semaphore變量的類型為sem_t，sem_init()初始化一個semaphore變量，value參數表示可用資源的數量，pshared參數為0表示信號量用于同一進程的線程間同步，本節只介紹這種情況。在用完semaphore變量之后應該調用sem_destroy()釋放與semaphore相關的資源。調用sem_wait()可以獲得資源，使semaphore的值減1，如果調用sem_wait()時semaphore的值已經是0，則掛起等待。如果不希望掛起等待，可以調用sem_trywait()。調用sem_post()可以釋放資源，使semaphore的值加1，同時喚醒掛起等待的線程。上一節生產者－消費者的例子是基于鏈表的，其空間可以動態分配，現在基于固定大小的環形隊列重寫這個程序： ``` #include <stdlib.h> #include <pthread.h> #include <stdio.h> #include <semaphore.h> #define NUM 5 int queue[NUM]; sem_t blank_number, product_number; void *producer(void *arg) { int p = 0; while (1) { sem_wait(&blank_number); queue[p] = rand() % 1000 + 1; printf("Produce %d\n", queue[p]); sem_post(&product_number); p = (p+1)%NUM; sleep(rand()%5); } } void *consumer(void *arg) { int c = 0; while (1) { sem_wait(&product_number); printf("Consume %d\n", queue[c]); queue[c] = 0; sem_post(&blank_number); c = (c+1)%NUM; sleep(rand()%5); } } int main(int argc, char *argv[]) { pthread_t pid, cid; sem_init(&blank_number, 0, NUM); sem_init(&product_number, 0, 0); pthread_create(&pid, NULL, producer, NULL); pthread_create(&cid, NULL, consumer, NULL); pthread_join(pid, NULL); pthread_join(cid, NULL); sem_destroy(&blank_number); sem_destroy(&product_number); return 0; } ``` #### 習題 1、本節和上一節的例子給出一個重要的提示：用Condition Variable可以實現Semaphore。請用Condition Variable實現Semaphore，然后用自己實現的Semaphore重寫本節的程序。 ### 3.4.?其它線程間同步機制如果共享數據是只讀的，那么各線程讀到的數據應該總是一致的，不會出現訪問沖突。只要有一個線程可以改寫數據，就必須考慮線程間同步的問題。由此引出了讀者寫者鎖（Reader-Writer Lock）的概念，Reader之間并不互斥，可以同時讀共享數據，而Writer是獨占的（exclusive），在Writer修改數據時其它Reader或Writer不能訪問數據，可見Reader-Writer Lock比Mutex具有更好的并發性。用掛起等待的方式解決訪問沖突不見得是最好的辦法，因為這樣畢竟會影響系統的并發性，在某些情況下解決訪問沖突的問題可以盡量避免掛起某個線程，例如Linux內核的Seqlock、RCU（read-copy-update）等機制。關于這些同步機制的細節，有興趣的讀者可以參考[[APUE2e]](bi01.html#bibli.apue "Advanced Programming in the UNIX Environment")和[[ULK]](bi01.html#bibli.ulk "Understanding the Linux Kernel")。 ## 4.?編程練習哲學家就餐問題。這是由計算機科學家Dijkstra提出的經典死鎖場景。原版的故事里有五個哲學家(不過我們寫的程序可以有N個哲學家)，這些哲學家們只做兩件事－－思考和吃飯，他們思考的時候不需要任何共享資源，但是吃飯的時候就必須使用餐具，而餐桌上的餐具是有限的，原版的故事里，餐具是叉子，吃飯的時候要用兩把叉子把面條從碗里撈出來。很顯然把叉子換成筷子會更合理，所以：一個哲學家需要兩根筷子才能吃飯。現在引入問題的關鍵：這些哲學家很窮，只買得起五根筷子。他們坐成一圈，兩個人的中間放一根筷子。哲學家吃飯的時候必須同時得到左手邊和右手邊的筷子。如果他身邊的任何一位正在使用筷子，那他只有等著。假設哲學家的編號是A、B、C、D、E，筷子編號是1、2、3、4、5，哲學家和筷子圍成一圈如下圖所示： **圖?35.2.?哲學家問題** ![哲學家問題](https://box.kancloud.cn/2016-04-02_56ff80da5feca.png) 每個哲學家都是一個單獨的線程，每個線程循環做以下動作：思考rand()%10秒，然后先拿左手邊的筷子再拿右手邊的筷子（筷子這種資源可以用mutex表示），有任何一邊拿不到就一直等著，全拿到就吃飯rand()%10秒，然后放下筷子。編寫程序仿真哲學家就餐的場景： ``` Philosopher A fetches chopstick 5 Philosopher B fetches chopstick 1 Philosopher B fetches chopstick 2 Philosopher D fetches chopstick 3 Philosopher B releases chopsticks 1 2 Philosopher A fetches chopstick 1 Philosopher C fetches chopstick 2 Philosopher A releases chopsticks 5 1 ... ``` 分析一下，這個過程有沒有可能產生死鎖？調用usleep(3)函數可以實現微秒級的延時，試著用usleep(3)加快仿真的速度，看能不能觀察到死鎖現象。然后修改上述算法避免產生死鎖。