10. Distributed tasks and configuration
=======================================

In this last chapter, we will go back to the `:kv` application and add a routing layer that will allow us to distribute requests between nodes based on the bucket name.

The routing layer will receive a routing table of the following format:

```elixir
[{?a..?m, :"foo@computer-name"},
 {?n..?z, :"bar@computer-name"}]
```

The router (the role that forwards requests, possibly a node itself) will check the first byte of the bucket name against the table and dispatch the request to the appropriate node based on that. For example, a bucket whose name starts with the letter "a" (`?a` represents the Unicode codepoint of the letter "a") will be dispatched to node `foo@computer-name`.

If the matching entry points to the node evaluating the request, then we have finished routing and this node will perform the requested operation. If the matching entry points to a different node, we pass the request on to that node, which will consult its own routing table (which may differ from the first node's) and act accordingly. If no node ends up accepting and handling the request, an error is raised.

You may wonder why we don't simply tell the node found in our routing table to perform the requested operation directly, instead of forwarding the routing request to it. With a routing table as simple as the one above, either approach works. But as the application grows, a large routing table may be broken into smaller pieces stored on different nodes, and in that case forwarding the routing request is simpler. At some point, `foo@computer-name` may be responsible only for routing requests and no longer serve bucket data itself, with the buckets it held dispatched to other nodes. Done this way, `bar@computer-name` does not need to know anything about such a change.

> Note: we will be using two nodes on the same machine throughout this chapter. You are free to use different machines on the same network, but that requires some preparation. First, you need to ensure all machines have a `~/.erlang.cookie` file with exactly the same value. Second, you need to guarantee [epmd](http://www.erlang.org/doc/man/epmd.html) is running on a port that is not blocked (you can run `epmd -d` for debug info). Third, if you want to learn more about distributed programming in general, we recommend [this article](http://learnyousomeerlang.com/distribunomicon).

## Our first distributed code

Elixir ships with facilities to connect nodes and exchange information between them. In fact, sending and receiving messages in a distributed environment is no different from the process messaging we studied earlier, because Elixir processes are *location transparent*. This means that when we send a message, it does not matter whether the recipient is on the current node or on another node: the VM will deliver the message in both cases.

In order to run distributed code, we need to start the <abbr title="Virtual Machine">VM</abbr> with a name. The name can be short (when in the same network) or long (requiring the full computer address). Let's start a new IEx session:

```bash
$ iex --sname foo
```

You can see now the prompt is slightly different and shows the node name followed by the computer name:

    Interactive Elixir - press Ctrl+C to exit (type h() ENTER for help)
    iex(foo@jv)1>

My computer is named `jv`, so I see `foo@jv` in the example above, but you will get a different result. We will use `foo@computer-name` in the following examples and you should update them accordingly when trying out the code.

Let's define a module named `Hello` in this shell:

```iex
iex> defmodule Hello do
...>   def world, do: IO.puts "hello world"
...> end
```

If you have another computer on the same network with both Erlang and Elixir installed, you can start another shell on it. If you don't, you can simply start another IEx session in another terminal.
In either case, give it the short name of `bar`:

```bash
$ iex --sname bar
```

Note that inside this new IEx session, we cannot access `Hello.world/0`:

```iex
iex> Hello.world
** (UndefinedFunctionError) undefined function: Hello.world/0
    Hello.world()
```

However, we can spawn a new process on `foo@computer-name` from `bar@computer-name`! Let's give it a try (where `@computer-name` is the one you see locally):

```iex
iex> Node.spawn_link :"foo@computer-name", fn -> Hello.world end
#PID<9014.59.0>
hello world
```

Elixir spawned a process on another node and returned its pid. The code then executed on the other node, where the `Hello.world/0` function exists, and invoked that function. Note that the result of "hello world" was printed on the current node `bar` and not on `foo`. In other words, the message to be printed was sent back from `foo` to `bar`. This happens because the process spawned on the other node (`foo`) still has the group leader of the current node (`bar`). We have briefly talked about group leaders in the [IO chapter](/getting-started/io-and-the-file-system.html#processes-and-group-leaders).

We can send and receive messages from the pid returned by `Node.spawn_link/2` as usual. Let's try a quick ping-pong example:

```iex
iex> pid = Node.spawn_link :"foo@computer-name", fn ->
...>   receive do
...>     {:ping, client} -> send client, :pong
...>   end
...> end
#PID<9014.59.0>
iex> send pid, {:ping, self}
{:ping, #PID<0.73.0>}
iex> flush
:pong
:ok
```

From our quick exploration, we could conclude that we should simply use `Node.spawn_link/2` to spawn processes on a remote node every time we need to do a distributed computation. However, we have learned throughout this guide that spawning processes outside of supervision trees should be avoided if possible, so we need to look for other options.

There are three better alternatives to `Node.spawn_link/2` that we could use in our implementation:

1.
We could use Erlang's [:rpc](http://www.erlang.org/doc/man/rpc.html) module to execute functions on a remote node. Inside the `bar@computer-name` shell above, you can call `:rpc.call(:"foo@computer-name", Hello, :world, [])` and it will print "hello world"

2. We could have a server running on the other node and send requests to that node via the [GenServer](/docs/stable/elixir/GenServer.html) API. For example, you can call a remote named server using `GenServer.call({name, node}, arg)` or simply pass the remote process PID as the first argument

3. We could use [tasks](/docs/stable/elixir/Task.html), which we have learned about in [a previous chapter](/getting-started/mix-otp/task-and-gen-tcp.html), as they can be spawned on both local and remote nodes

The options above have different properties. Both `:rpc` and using a GenServer would serialize your requests on a single server, while tasks effectively run asynchronously on the remote node, with the only serialization point being the spawning done by the supervisor.

For our routing layer, we are going to use tasks, but feel free to explore the other alternatives too.

## async/await

So far we have explored tasks that are started and run in isolation, with no regard for their return value. However, sometimes it is useful to run a task to compute a value and read its result later on. For this, tasks also provide the `async/await` pattern:

```elixir
task = Task.async(fn -> compute_something_expensive end)
res  = compute_something_else()
res + Task.await(task)
```

`async/await` provides a very simple mechanism to compute values concurrently. Not only that, `async/await` can also be used with the same [`Task.Supervisor`](/docs/stable/elixir/Task.Supervisor.html) we have used in previous chapters. We just need to call `Task.Supervisor.async/2` instead of `Task.Supervisor.start_child/2` and use `Task.await/2` to read the result later on.

## Distributed tasks

Distributed tasks are exactly the same as supervised tasks.
The only difference is that we pass the node name when spawning the task on the supervisor. Open up `lib/kv/supervisor.ex` from the `:kv` application. Let's add a task supervisor to the tree:

```elixir
supervisor(Task.Supervisor, [[name: KV.RouterTasks]]),
```

Now, let's start two named nodes again, but inside the `:kv` application:

```bash
$ iex --sname foo -S mix
$ iex --sname bar -S mix
```

From inside `bar@computer-name`, we can now spawn a task directly on the other node via the supervisor:

```iex
iex> task = Task.Supervisor.async {KV.RouterTasks, :"foo@computer-name"}, fn ->
...>   {:ok, node()}
...> end
%Task{pid: #PID<12467.88.0>, ref: #Reference<0.0.0.400>}
iex> Task.await(task)
{:ok, :"foo@computer-name"}
```

Our first distributed task is straightforward: it simply gets the name of the node the task is running on. With this knowledge in hand, let's finally write the routing code.

## The routing layer

Create a file at `lib/kv/router.ex` with the following contents:

```elixir
defmodule KV.Router do
  @doc """
  Dispatch the given `mod`, `fun`, `args` request
  to the appropriate node based on the `bucket`.
  """
  def route(bucket, mod, fun, args) do
    # Get the first byte of the binary
    first = :binary.first(bucket)

    # Try to find an entry in the table or raise
    entry = Enum.find(table, fn {enum, _node} ->
      first in enum
    end) || no_entry_error(bucket)

    # If the entry node is the current node
    if elem(entry, 1) == node() do
      apply(mod, fun, args)
    else
      sup = {KV.RouterTasks, elem(entry, 1)}
      Task.Supervisor.async(sup, fn ->
        KV.Router.route(bucket, mod, fun, args)
      end) |> Task.await()
    end
  end

  defp no_entry_error(bucket) do
    raise "could not find entry for #{inspect bucket} in table #{inspect table}"
  end

  @doc """
  The routing table.
  """
  def table do
    # Replace computer-name with your local machine name.
    [{?a..?m, :"foo@computer-name"},
     {?n..?z, :"bar@computer-name"}]
  end
end
```

Let's write a test to verify our router works.
Create a file named `test/kv/router_test.exs` containing:

```elixir
defmodule KV.RouterTest do
  use ExUnit.Case, async: true

  test "route requests across nodes" do
    assert KV.Router.route("hello", Kernel, :node, []) ==
           :"foo@computer-name"
    assert KV.Router.route("world", Kernel, :node, []) ==
           :"bar@computer-name"
  end

  test "raises on unknown entries" do
    assert_raise RuntimeError, ~r/could not find entry/, fn ->
      KV.Router.route(<<0>>, Kernel, :node, [])
    end
  end
end
```

The first test simply invokes `Kernel.node/0`, which returns the name of the current node, based on the bucket names "hello" and "world". According to our routing table so far, we should get `foo@computer-name` and `bar@computer-name` as responses, respectively. The second test just checks that the code raises for unknown entries.

In order to run the first test, we need to have two nodes running. Let's restart the node named `bar`, which is going to be used by tests. This time we'll need to run the node in the `test` environment, to ensure the compiled code being run is exactly the same as that used in the tests themselves:

```bash
$ MIX_ENV=test iex --sname bar -S mix
```

And now run tests with:

```bash
$ elixir --sname foo -S mix test
```

Our test should successfully pass. Excellent!

## Test filters and tags

Although our tests pass, our testing structure is getting more complex. In particular, running tests with only `mix test` causes failures in our suite, since our test requires a connection to another node.

Luckily, ExUnit ships with a facility to tag tests, allowing us to run specific callbacks or even filter tests altogether based on those tags. All we need to do to tag a test is simply call `@tag` before the test name.

Back to `test/kv/router_test.exs`, let's add a `:distributed` tag:

```elixir
@tag :distributed
test "route requests across nodes" do
```

Writing `@tag :distributed` is equivalent to writing `@tag distributed: true`.
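To see the tag machinery in isolation, here is a minimal, self-contained sketch that can be run as a plain script with `elixir` (the module name and test descriptions are illustrative, not part of the `:kv` code):

```elixir
# A tag is just metadata attached to the next test; excluding the tag at
# start-up prevents every test carrying it from running.
ExUnit.start(exclude: [distributed: true])

defmodule TagSketchTest do
  use ExUnit.Case

  @tag :distributed            # equivalent to @tag distributed: true
  test "skipped unless :distributed is included" do
    assert false               # never executes under the exclusion above
  end

  test "always runs" do
    assert 1 + 1 == 2
  end
end
```

Running this script should report one test excluded and the remaining test passing, mirroring what we are about to set up for the `:kv` suite.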
With the test properly tagged, we can now check if the node is alive on the network and, if not, exclude all distributed tests. Open up `test/test_helper.exs` inside the `:kv` application and add the following:

```elixir
exclude =
  if Node.alive?, do: [], else: [distributed: true]

ExUnit.start(exclude: exclude)
```

Now run tests with `mix test`:

```bash
$ mix test
Excluding tags: [distributed: true]

.......

Finished in 0.1 seconds (0.1s on load, 0.01s on tests)
7 tests, 0 failures
```

This time all tests passed and ExUnit warned us that distributed tests were being excluded. If you run tests with `$ elixir --sname foo -S mix test`, one extra test should run and successfully pass as long as the `bar@computer-name` node is available.

The `mix test` command also allows us to dynamically include and exclude tags. For example, we can run `$ mix test --include distributed` to run distributed tests regardless of the value set in `test/test_helper.exs`. We could also pass `--exclude` to exclude a particular tag from the command line. Finally, `--only` can be used to run only tests with a particular tag:

```bash
$ elixir --sname foo -S mix test --only distributed
```

You can read more about filters, tags and the default tags in the [`ExUnit.Case` module documentation](/docs/stable/ex_unit/ExUnit.Case.html).

## Application environment and configuration

So far we have hardcoded the routing table into the `KV.Router` module. However, we would like to make the table dynamic. This allows us not only to configure development/test/production, but also to allow different nodes to run with different entries in the routing table. There is a feature of <abbr title="Open Telecom Platform">OTP</abbr> that does exactly that: the application environment. Each application has an environment that stores the application's specific configuration by key.
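Before wiring it into `:kv`, that key-value environment can be tried in isolation with the `Application` functions for reading and writing it (`:demo_app`, `:greeting` and `:missing_key` below are made-up names for illustration):

```elixir
# Store a value in an application's environment, then read it back.
Application.put_env(:demo_app, :greeting, "hello")

IO.inspect(Application.get_env(:demo_app, :greeting))
# => "hello"

# Application.get_env/3 takes a default, returned when the key is not set.
IO.inspect(Application.get_env(:demo_app, :missing_key, :default))
# => :default
```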
For example, we could store the routing table in the `:kv` application environment, giving it a default value and allowing other applications to change the table as needed.

Open up `apps/kv/mix.exs` and change the `application/0` function to return the following:

```elixir
def application do
  [applications: [],
   env: [routing_table: []],
   mod: {KV, []}]
end
```

We have added a new `:env` key to the application. It returns the application default environment, which has an entry of key `:routing_table` and value of an empty list. It makes sense for the application environment to ship with an empty table, as the specific routing table depends on the testing/deployment structure.

In order to use the application environment in our code, we just need to replace `KV.Router.table/0` with the definition below:

```elixir
@doc """
The routing table.
"""
def table do
  Application.get_env(:kv, :routing_table)
end
```

We use `Application.get_env/2` to read the entry for `:routing_table` in `:kv`'s environment. You can find more information and other functions to manipulate the app environment in the [Application module](/docs/stable/elixir/Application.html).

Since our routing table is now empty, our distributed test should fail. Restart the apps and re-run tests to see the failure:

```bash
$ iex --sname bar -S mix
$ elixir --sname foo -S mix test --only distributed
```

The interesting thing about the application environment is that it can be configured not only for the current application, but for all applications. Such configuration is done by the `config/config.exs` file. For example, we can configure the IEx default prompt to another value. Just open `apps/kv/config/config.exs` and add the following to the end:

```elixir
config :iex, default_prompt: ">>>"
```

Start IEx with `iex -S mix` and you can see that the IEx prompt has changed.
This means we can configure our `:routing_table` directly in the `config/config.exs` file as well:

```elixir
# Replace computer-name with your local machine nodes.
config :kv, :routing_table,
       [{?a..?m, :"foo@computer-name"},
        {?n..?z, :"bar@computer-name"}]
```

Restart the nodes and run distributed tests again. Now they should all pass.

Each application has its own `config/config.exs` file and they are not shared in any way. Configuration can also be set per environment. Read the contents of the config file for the `:kv` application for more information on how to do so.

Since config files are not shared, if you run tests from the umbrella root, they will fail because the configuration we just added to `:kv` is not available there. However, if you open up `config/config.exs` in the umbrella, it has instructions on how to import config files from children applications. You just need to invoke:

```elixir
import_config "../apps/kv/config/config.exs"
```

The `mix run` command also accepts a `--config` flag, which allows configuration files to be given on demand. This could be used to start different nodes, each with its own specific configuration (for example, different routing tables).

Overall, the built-in ability to configure applications and the fact that we have built our software as an umbrella application give us plenty of options when deploying the software. We can:

* deploy the umbrella application to a node that will work as both TCP server and key-value storage

* deploy the `:kv_server` application to work only as a TCP server, as long as the routing table points only to other nodes

* deploy only the `:kv` application when we want a node to work only as storage (no TCP access)

As we add more applications in the future, we can continue controlling our deploy with the same level of granularity, cherry-picking which applications with which configuration are going to production.
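For instance, the `--config` flag mentioned above could be used to start each node with its own routing table. The file names below are hypothetical; each file would hold a node-specific `config :kv, :routing_table, ...` call:

```bash
# Hypothetical per-node configuration files
$ iex --sname foo -S mix run --config config/foo.exs
$ iex --sname bar -S mix run --config config/bar.exs
```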
We can also consider building multiple releases with a tool like [exrm](https://github.com/bitwalker/exrm), which will package the chosen applications and configuration, including the current Erlang and Elixir installations, so we can deploy the application even if the runtime is not pre-installed on the target system.

Finally, we have learned some new things in this chapter, and they could be applied to the `:kv_server` application as well. We are going to leave the next steps as an exercise:

* change the `:kv_server` application to read the port from its application environment instead of using the hardcoded value of 4040

* change and configure the `:kv_server` application to use the routing functionality instead of dispatching directly to the local `KV.Registry`. For the `:kv_server` tests, you can make the routing table simply point to the current node itself

## Summary

In this chapter we built a simple router and used it to explore the distributed features of Elixir and the Erlang <abbr title="Virtual Machine">VM</abbr>, and we learned how to configure its routing table. This is the last chapter of the Mix and <abbr title="Open Telecom Platform">OTP</abbr> guide.

Throughout the guide, we have built a very simple distributed key-value store and along the way explored many important concepts, such as generic servers, event managers, supervisors, tasks, agents, applications and more. Not only that, we have written tests for the whole application, got familiar with ExUnit, and learned how to use the Mix build tool to accomplish a wide range of tasks.

If you are looking for a distributed key-value store to use in production, you should definitely look into [Riak](http://basho.com/riak/), which also runs on the Erlang <abbr title="Virtual Machine">VM</abbr>. In Riak, buckets are replicated to protect against data loss, and instead of a routing mechanism it uses [consistent hashing](https://en.wikipedia.org/wiki/Consistent_hashing) to map buckets to nodes, which reduces the amount of bucket data that has to migrate when new nodes are added.