
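# Lesson 18: Billboards and Particles

A billboard is a 2D element placed in a 3D world. It is neither a 2D menu drawn on top of everything else nor a 3D plane that you can rotate freely, but something in between, such as the health bars found in many games.

What makes a billboard special is that it sits at a specific position while its orientation is computed automatically, so that it always faces the camera (the viewer).

## Solution #1: the 2D way

The 2D way is very simple. Compute the screen-space coordinates of your point, then display 2D text at that position (see Lesson 11).

~~~
// Everything here is explained in Tutorial 3 ! There's nothing new.
glm::vec4 BillboardPos_worldspace(x,y,z, 1.0f);
glm::vec4 BillboardPos_screenspace = ProjectionMatrix * ViewMatrix * BillboardPos_worldspace;

// Assuming a standard perspective projection (e.g. glm::perspective), clip-space w is
// negative for points behind the camera. Test it before the perspective division,
// because the sign information is lost afterwards.
if (BillboardPos_screenspace.w <= 0.0f){
    // Object is behind the camera, don't display it.
}else{
    BillboardPos_screenspace /= BillboardPos_screenspace.w;
}
~~~

And that's it!

The 2D way is easy to implement, and the billboard keeps the same size no matter how far the point is from the camera. On the other hand, the text is always drawn on top of everything else, so it can cover other objects and spoil the rendering.

## Solution #2: the 3D way

The 3D way usually looks better and is not much more complicated. The goal is to keep the billboard mesh facing the camera, however the camera moves:

![2a.gif](https://box.kancloud.cn/2015-11-02_5636f30b3cb20.gif)

You can view this as a simplified version of building a Model matrix. The idea is that each corner of the billboard is at the center position, displaced by the camera's up and right vectors:

![principle](https://box.kancloud.cn/2015-11-02_5636f30b719ca.png)

Of course, we only know the billboard's center position in world space, so we also need the camera's up and right vectors in world space.

In camera space, the camera's up vector is (0,1,0). To bring it into world space, multiply it by the inverse of the View matrix, that is, the matrix that goes from camera space to world space. Because the rotation part of the View matrix is orthonormal, its inverse is simply its transpose, so both vectors can be read directly from the View matrix:

`CameraRight_worldspace = {ViewMatrix[0][0], ViewMatrix[1][0], ViewMatrix[2][0]}`

`CameraUp_worldspace = {ViewMatrix[0][1], ViewMatrix[1][1], ViewMatrix[2][1]}`

Once we have these, computing the final vertex position is easy:

~~~
vec3 vertexPosition_worldspace =
    particleCenter_wordspace
    + CameraRight_worldspace * squareVertices.x * BillboardSize.x
    + CameraUp_worldspace * squareVertices.y * BillboardSize.y;
~~~

- `particleCenter_wordspace` (spelled this way in the shader) is, as its name suggests, the billboard's center position. It is passed as a vec3 uniform.
- `squareVertices` is the original mesh. `squareVertices.x` is -0.5 for the left vertices, which are thus moved towards the left of the camera (because of the `* CameraRight_worldspace`).
- `BillboardSize` is the size of the billboard in world units, passed as another uniform.

Here is the result. Wasn't that easy?

![2.gif](https://box.kancloud.cn/2015-11-02_5636f30b81ed3.gif)

For completeness, here is the data for `squareVertices`:

~~~
// The VBO containing the 4 vertices of the particles.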
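static const GLfloat g_vertex_buffer_data[] = {
    -0.5f, -0.5f, 0.0f,
     0.5f, -0.5f, 0.0f,
    -0.5f,  0.5f, 0.0f,
     0.5f,  0.5f, 0.0f,
};
~~~

## Solution #3: the fixed-size 3D way

As you saw above, the billboard's size on screen changes with its distance to the camera. Sometimes that is exactly what you want, but billboards such as health bars need to keep a constant size.

~~~
vertexPosition_worldspace = particleCenter_wordspace;
// Get the screen-space position of the particle's center
gl_Position = VP * vec4(vertexPosition_worldspace, 1.0f);
// Here we have to do the perspective division ourselves.
gl_Position /= gl_Position.w;

// Move the vertex directly in screen space. No need for CameraUp/Right_worldspace here.
gl_Position.xy += squareVertices.xy * vec2(0.2, 0.05);
~~~

![3.gif](https://box.kancloud.cn/2015-11-02_5636f30bb2797.gif)

## Solution #4: vertical rotation only

Some engines draw distant trees and lamps as billboards. Such trees must not tilt freely, though: they **must** stay vertical. So you need a hybrid system that rotates only around one axis.

This one is left as an exercise for the reader.

# Particles and instancing

Particles are very similar to 3D billboards, with four main differences:

- there are usually a lot of them
- they move
- they appear and die
- they are semi-transparent

These differences come with their own problems. This lesson presents **one** way to solve them; there are many others.

## Particles, lots of them!

The first idea that comes to mind is to reuse the billboard code and call `glDrawArrays` once per particle. This is a bad idea, because it means the hundreds of shader cores on your shiny GPU are all busy drawing **one** quad at a time (wasting something like 99% of the hardware), one billboard after another.

Clearly, we need to draw all particles in one go. There are many ways to do this; here are three:

- Generate a single VBO containing all the particles. Simple, effective, works on all platforms.
- Use geometry shaders. Not covered in this tutorial, mainly because about half of the machines in use when it was written did not support them.
- Use instancing. Supported by most machines.

This lesson uses the third option. It is a good balance between performance and availability and, more importantly, once it works, adding support for the first option is easy.

## Instancing

"Instancing" means that you have a base mesh (here, a simple quad made of two triangles) and draw many instances of it.

Technically, it is done with several buffers:

- some describe the base mesh
- some describe the particularities of each instance

What goes into these buffers is up to you. Our simple example has:

- One buffer for the vertices of the mesh. There is no index buffer: the 4 `vec3` are drawn as a triangle strip, which gives two triangles, which form one quad.
- One buffer for the particles' centers and sizes.
- One buffer for the particles' colors.

These are all ordinary buffers, created as follows:

~~~
// The VBO containing the 4 vertices of the particles.
// Thanks to instancing, they will be shared by all particles.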
static const GLfloat g_vertex_buffer_data[] = {
    -0.5f, -0.5f, 0.0f,
     0.5f, -0.5f, 0.0f,
    -0.5f,  0.5f, 0.0f,
     0.5f,  0.5f, 0.0f,
};
GLuint billboard_vertex_buffer;
glGenBuffers(1, &billboard_vertex_buffer);
glBindBuffer(GL_ARRAY_BUFFER, billboard_vertex_buffer);
glBufferData(GL_ARRAY_BUFFER, sizeof(g_vertex_buffer_data), g_vertex_buffer_data, GL_STATIC_DRAW);

// The VBO containing the positions and sizes of the particles
GLuint particles_position_buffer;
glGenBuffers(1, &particles_position_buffer);
glBindBuffer(GL_ARRAY_BUFFER, particles_position_buffer);
// Initialize with empty (NULL) buffer : it will be updated later, each frame.
glBufferData(GL_ARRAY_BUFFER, MaxParticles * 4 * sizeof(GLfloat), NULL, GL_STREAM_DRAW);

// The VBO containing the colors of the particles
GLuint particles_color_buffer;
glGenBuffers(1, &particles_color_buffer);
glBindBuffer(GL_ARRAY_BUFFER, particles_color_buffer);
// Initialize with empty (NULL) buffer : it will be updated later, each frame.
glBufferData(GL_ARRAY_BUFFER, MaxParticles * 4 * sizeof(GLubyte), NULL, GL_STREAM_DRAW);
~~~

The particles are then updated like this:

~~~
// Update the buffers that OpenGL uses for rendering.
// There are much more sophisticated means to stream data from the CPU to the GPU,
// but this is outside the scope of this tutorial.
// http://www.opengl.org/wiki/Buffer_Object_Streaming

glBindBuffer(GL_ARRAY_BUFFER, particles_position_buffer);
glBufferData(GL_ARRAY_BUFFER, MaxParticles * 4 * sizeof(GLfloat), NULL, GL_STREAM_DRAW); // Buffer orphaning, a common way to improve streaming perf. See above link for details.
glBufferSubData(GL_ARRAY_BUFFER, 0, ParticlesCount * sizeof(GLfloat) * 4, g_particule_position_size_data);

glBindBuffer(GL_ARRAY_BUFFER, particles_color_buffer);
glBufferData(GL_ARRAY_BUFFER, MaxParticles * 4 * sizeof(GLubyte), NULL, GL_STREAM_DRAW); // Buffer orphaning, a common way to improve streaming perf. See above link for details.
glBufferSubData(GL_ARRAY_BUFFER, 0, ParticlesCount * sizeof(GLubyte) * 4, g_particule_color_data);
~~~

The buffers must be bound before drawing, like this:

~~~
// 1st attribute buffer : vertices
glEnableVertexAttribArray(0);
glBindBuffer(GL_ARRAY_BUFFER, billboard_vertex_buffer);
glVertexAttribPointer(
    0,        // attribute. No particular reason for 0, but must match the layout in the shader.
    3,        // size
    GL_FLOAT, // type
    GL_FALSE, // normalized?
    0,        // stride
    (void*)0  // array buffer offset
);

// 2nd attribute buffer : positions of particles' centers
glEnableVertexAttribArray(1);
glBindBuffer(GL_ARRAY_BUFFER, particles_position_buffer);
glVertexAttribPointer(
    1,        // attribute. No particular reason for 1, but must match the layout in the shader.
    4,        // size : x + y + z + size => 4
    GL_FLOAT, // type
    GL_FALSE, // normalized?
    0,        // stride
    (void*)0  // array buffer offset
);

// 3rd attribute buffer : particles' colors
glEnableVertexAttribArray(2);
glBindBuffer(GL_ARRAY_BUFFER, particles_color_buffer);
glVertexAttribPointer(
    2,                // attribute. No particular reason for 2, but must match the layout in the shader.
    4,                // size : r + g + b + a => 4
    GL_UNSIGNED_BYTE, // type
    GL_TRUE,          // normalized? *** YES, this means that the unsigned char[4] will be accessible with a vec4 (floats) in the shader ***
    0,                // stride
    (void*)0          // array buffer offset
);
~~~

Drawing is different from before. Instead of `glDrawArrays` (or `glDrawElements` if the base mesh has an index buffer), use `glDrawArraysInstanced` (or `glDrawElementsInstanced`). This is equivalent to calling `glDrawArrays` N times, N being the last parameter, here `ParticlesCount`:

~~~
glDrawArraysInstanced(GL_TRIANGLE_STRIP, 0, 4, ParticlesCount);
~~~

One thing is still missing: we have not told OpenGL which buffer holds the base mesh and which ones hold per-instance data. This is done with `glVertexAttribDivisor`. Here is the fully commented code:

~~~
// These functions are specific to glDrawArrays*Instanced*.
// The first parameter is the attribute buffer we're talking about.
// The second parameter is the "rate at which generic vertex attributes advance when rendering multiple instances"
// http://www.opengl.org/sdk/docs/man/xhtml/glVertexAttribDivisor.xml
glVertexAttribDivisor(0, 0); // particles vertices : always reuse the same 4 vertices -> 0
glVertexAttribDivisor(1, 1); // positions : one per quad (its center) -> 1
glVertexAttribDivisor(2, 1); // color : one per quad -> 1

// Draw the particles !
// This draws many times a small triangle_strip (which looks like a quad).
// This is equivalent to :
// for(i in ParticlesCount) : glDrawArrays(GL_TRIANGLE_STRIP, 0, 4),
// but faster.
glDrawArraysInstanced(GL_TRIANGLE_STRIP, 0, 4, ParticlesCount);
~~~

As you can see, instancing is flexible: the divisor can be any non-negative integer. For example, `glVertexAttribDivisor(2, 10)` makes each group of 10 consecutive instances share the same color.

## What's the point?

The point is that we now only have to update a small buffer each frame (the particle centers) rather than the whole mesh. Since each quad needs one center instead of four vertices, this is a 4x gain in bandwidth.

## Life and death

Unlike most other objects in the scene, particles are born and die at a very high rate. We need a reasonably fast way to create new particles and discard old ones; something like `new Particle()` is not good enough.

## Creating new particles

First we need a big particle container:

~~~
// CPU representation of a particle
struct Particle{
    glm::vec3 pos, speed;
    unsigned char r,g,b,a; // Color
    float size, angle, weight;
    float life; // Remaining life of the particle. if < 0 : dead and unused.
    float cameradistance; // Squared distance to the camera (used by the simulation loop and sort below). if dead : -1.0f
};

const int MaxParticles = 100000;
Particle ParticlesContainer[MaxParticles];
~~~

Next we need a way to find a free slot for a new particle. The function below searches `ParticlesContainer` linearly, which sounds brutal, but it starts from the last known position, so it usually returns almost immediately.

~~~
int LastUsedParticle = 0;

// Finds a Particle in ParticlesContainer which isn't used yet.
// (i.e. 
// life < 0);
int FindUnusedParticle(){

    // Search from the last known position to the end of the container...
    for(int i=LastUsedParticle; i<MaxParticles; i++){
        if (ParticlesContainer[i].life < 0){
            LastUsedParticle = i;
            return i;
        }
    }

    // ...then wrap around and search the beginning.
    for(int i=0; i<LastUsedParticle; i++){
        if (ParticlesContainer[i].life < 0){
            LastUsedParticle = i;
            return i;
        }
    }

    return 0; // All particles are taken, override the first one
}
~~~

We can now fill `ParticlesContainer[particleIndex]` with interesting `life`, `color`, `speed` and `position` values. See the code for details; you can do pretty much anything here. What matters for us is how many particles to generate each frame. This depends on the application, so let's say 10000 new particles per second (yes, that is quite a lot):

~~~
int newparticles = (int)(deltaTime*10000.0);
~~~

Remember to clamp this count to a fixed maximum:

~~~
// Generate 10 new particles each millisecond,
// but limit this to 16 ms (60 fps), or if you have 1 long frame (1sec),
// newparticles will be huge and the next frame even longer.
int newparticles = (int)(deltaTime*10000.0);
if (newparticles > (int)(0.016f*10000.0))
    newparticles = (int)(0.016f*10000.0);
~~~

## Deleting old particles

There is a trick to this, see below =)

## The main simulation loop

`ParticlesContainer` holds both living and dead particles, but the buffers sent to the GPU contain only living ones.

So we iterate over every particle, check whether it is alive and whether it should die. If it stays alive, we apply gravity and finally copy it into the CPU-side arrays that are uploaded to the GPU buffers.

~~~
// Simulate all particles
int ParticlesCount = 0;
for(int i=0; i<MaxParticles; i++){

    Particle& p = ParticlesContainer[i]; // shortcut

    if(p.life > 0.0f){

        // Decrease life
        p.life -= delta;
        if (p.life > 0.0f){

            // Simulate simple physics : gravity only, no collisions
            p.speed += glm::vec3(0.0f,-9.81f, 0.0f) * (float)delta * 0.5f;
            p.pos += p.speed * (float)delta;
            // glm::length2 (squared length) comes from <glm/gtx/norm.hpp>
            p.cameradistance = glm::length2( p.pos - CameraPosition );
            //ParticlesContainer[i].pos += glm::vec3(0.0f,10.0f, 0.0f) * (float)delta;

            // Fill the GPU buffer
            g_particule_position_size_data[4*ParticlesCount+0] = p.pos.x;
            g_particule_position_size_data[4*ParticlesCount+1] = p.pos.y;
            g_particule_position_size_data[4*ParticlesCount+2] = p.pos.z;

            g_particule_position_size_data[4*ParticlesCount+3] = p.size;

            g_particule_color_data[4*ParticlesCount+0] = p.r;
            g_particule_color_data[4*ParticlesCount+1] = p.g;
            g_particule_color_data[4*ParticlesCount+2] = p.b;
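            g_particule_color_data[4*ParticlesCount+3] = p.a;

            // Count only particles that were actually written to the buffers,
            // so that a particle that just died does not leave a stale slot.
            ParticlesCount++;

        }else{
            // Particles that just died will be put at the end of the buffer in SortParticles();
            p.cameradistance = -1.0f;
        }

    }
}
~~~

This gives the result below. It is almost right, but one problem remains...

![particle_unsor](https://box.kancloud.cn/2015-11-02_5636f30be9653.png)

## Sorting

As explained in [Lesson 10][1], semi-transparent objects must be sorted from back to front for blending to be correct.

~~~
void SortParticles(){
    std::sort(ParticlesContainer, ParticlesContainer + MaxParticles);
}
~~~

`std::sort` needs a function that tells which of two particles comes first in the container. Overloading `Particle::operator<` does the job:

~~~
// CPU representation of a particle
struct Particle{

    ...

    bool operator<(const Particle& that) const {
        // Sort in reverse order : far particles drawn first.
        return this->cameradistance > that.cameradistance;
    }
};
~~~

The particles in `ParticlesContainer` are now sorted, and the result looks correct:

![particles_final](https://box.kancloud.cn/2015-11-02_5636f30c60d30.gif)

## Going further

## Animated particles

You can animate your particles with a texture atlas. Send the age of each particle to the GPU along with its position, and compute the UV coordinates in the shader as in the [2D font lesson][2]. A texture atlas looks like this:

![particleatlas](https://box.kancloud.cn/2015-11-02_5636f30cc5d3f.png)

## Handling several particle systems

If you need more than one particle system, you have two options: use a single particle container, or one container per system.

If you put **all** particles in a single container, you can sort them properly. The main drawback is that they all have to use the same texture. This can be worked around with a texture atlas: one big texture containing all your textures, addressed through different UV coordinates. It is not very convenient to use and edit, though.

With one container per particle system, particles are only sorted within each container. If two systems overlap, you will see artifacts. If your application never has overlapping systems, this is not a problem.

Of course, you can also use a hybrid: several particle systems, each with its own (small and manageable) atlas.

## Smooth particles

You will soon notice a common artifact: where a particle intersects geometry, its edge becomes clearly visible and ugly:

![kjkj](https://box.kancloud.cn/2015-11-02_5636f30d05cc9.jpg)

(image from [http://www.gamerendering.com/2009/09/16/soft-particles/](http://www.gamerendering.com/2009/09/16/soft-particles/) )

A common fix is to compare the depth of the fragment being drawn with the depth already stored in the Z-Buffer. If the two are close, the fragment is faded out.

However, this requires sampling the Z-Buffer, which is not possible with the "normal" Z-Buffer. You need to render your scene into a [render target][3]. Alternatively, you can copy the Z-Buffer from one framebuffer to another with `glBlitFramebuffer`.

[http://developer.download.nvidia.com/whitepapers/2007/SDK10/SoftParticles_hi.pdf](http://developer.download.nvidia.com/whitepapers/2007/SDK10/SoftParticles_hi.pdf)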
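## Improving the fill rate

One of the most limiting factors on today's GPUs is fill rate: the number of fragments (pixels) the GPU can write in the 16.6 ms available for a frame at 60 FPS.

This is a problem because particles typically need a **lot** of fill rate: the same pixel may be redrawn ten times or more, each time by a different particle, and if you skip that, you get the same artifacts as above.

Many of the fragments written are completely useless, in particular those near the border. Particle textures are usually fully transparent at the edges, but the particle's mesh still draws those fragments and updates the color buffer with exactly the value it already had.

This small utility computes a mesh that tightly fits your texture (this is the mesh you would then render with `glDrawArraysInstanced()`):

![trimmer](https://box.kancloud.cn/2015-11-02_5636f30d197cc.jpg)

[http://www.humus.name/index.php?page=Cool&ID=8](http://www.humus.name/index.php?page=Cool&ID=8). Emil Persson's site has plenty of other fascinating articles too.

## Particle physics

In some applications you may want particles to interact with the world, for instance by bouncing off the ground.

A simple approach is to cast a ray for each particle, from its current position to its future position. Raycasting is covered in the [picking tutorial][5]. This is far too expensive, though: you cannot do it for every particle on every frame.

Depending on your application, you can approximate the geometry with a set of planes (a k-DOP, for instance) and raycast against those planes only. You can also use real raycasts, cache the results, and use them to approximate nearby collisions. Or you can combine both.

A completely different technique uses the existing Z-Buffer as a rough approximation of the visible geometry and collides particles against it. It is "good enough" and fast, but since the Z-Buffer cannot be read from the CPU (at least not quickly), the whole simulation has to run on the GPU, which makes it more complicated.

Here are a few related articles:

[http://www.altdevblogaday.com/2012/06/19/hack-day-report/](http://www.altdevblogaday.com/2012/06/19/hack-day-report/)

[http://www.gdcvault.com/search.php#&category=free&firstfocus=&keyword=Chris+Tchou’s%2BHalo%2BReach%2BEffects&conference_id=](http://www.gdcvault.com/search.php#&category=free&firstfocus=&keyword=Chris+Tchou’s%2BHalo%2BReach%2BEffects&conference_id=)

## GPU simulation

As said above, particle movement can be simulated entirely on the GPU. You still have to manage the particles' life cycle on the CPU, at least when spawning them.

There are many ways to do this, none of them in the scope of this tutorial. Here are a few pointers:

- Use Transform Feedback. It lets you store the outputs of a vertex shader in a GPU-side VBO. Store the new positions in this VBO, use it as the source on the next frame, and write the updated positions back into the previous VBO.
- Same principle but without Transform Feedback: encode the particles' positions in a texture and update it with Render-To-Texture.
- Use a general-purpose GPU computing library: CUDA or OpenCL. Both have functions for interoperating with OpenGL.
- Use a compute shader. This is the cleanest solution, but it is only available on fairly recent GPUs.

> Note that, for simplicity, this implementation sorts `ParticlesContainer` after the GPU buffers have been updated. The particle order is therefore slightly wrong (one frame late), but this is barely noticeable. You can fix it by splitting the main loop into simulate, sort, and only then update the buffers.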
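
The split suggested in the closing note can be sketched as three separate passes. This is a minimal, self-contained sketch under stated assumptions, not the lesson's actual code: the `Particle` struct is a simplified stand-in that uses plain floats instead of `glm::vec3`, and the function names `SimulateParticles` and `FillGpuArrays`, the `std::vector` containers and the explicit camera parameters are all hypothetical choices made for illustration. The OpenGL upload itself is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-in for the lesson's Particle (assumption: plain floats, no glm).
struct Particle {
    float x = 0.0f, y = 0.0f, z = 0.0f;     // position
    float vx = 0.0f, vy = 0.0f, vz = 0.0f;  // speed
    unsigned char r = 255, g = 255, b = 255, a = 255;
    float size = 1.0f;
    float life = -1.0f;            // remaining life; <= 0 means dead
    float cameradistance = -1.0f;  // squared distance to the camera; -1 when dead

    // Far particles sort first; dead ones (-1) end up last.
    bool operator<(const Particle& that) const {
        return cameradistance > that.cameradistance;
    }
};

// Step 1: physics only. No GPU-side array is touched here.
void SimulateParticles(std::vector<Particle>& particles, float delta,
                       float camX, float camY, float camZ) {
    for (Particle& p : particles) {
        if (p.life > 0.0f) p.life -= delta;
        if (p.life <= 0.0f) { p.cameradistance = -1.0f; continue; }

        p.vy += -9.81f * delta * 0.5f;  // same gravity term as the lesson's loop
        p.x += p.vx * delta;
        p.y += p.vy * delta;
        p.z += p.vz * delta;

        const float dx = p.x - camX, dy = p.y - camY, dz = p.z - camZ;
        p.cameradistance = dx * dx + dy * dy + dz * dz;
    }
}

// Step 2: sort back to front, using this frame's distances.
void SortParticles(std::vector<Particle>& particles) {
    std::sort(particles.begin(), particles.end());
}

// Step 3: copy the living particles, already in draw order, into the
// CPU-side arrays. Returns the number of particles to draw.
std::size_t FillGpuArrays(const std::vector<Particle>& particles,
                          std::vector<float>& positionSize,
                          std::vector<unsigned char>& color) {
    assert(positionSize.size() >= 4 * particles.size());
    assert(color.size() >= 4 * particles.size());

    std::size_t count = 0;
    for (const Particle& p : particles) {
        if (p.life <= 0.0f) continue;  // skip dead particles
        positionSize[4 * count + 0] = p.x;
        positionSize[4 * count + 1] = p.y;
        positionSize[4 * count + 2] = p.z;
        positionSize[4 * count + 3] = p.size;
        color[4 * count + 0] = p.r;
        color[4 * count + 1] = p.g;
        color[4 * count + 2] = p.b;
        color[4 * count + 3] = p.a;
        ++count;
    }
    return count;
}
```

A frame would call the three functions in this order and then upload the first `4 * count` elements of each array, for example with the `glBufferData` / `glBufferSubData` calls shown earlier in the lesson. Because the arrays are filled after sorting, the draw order matches the current frame instead of lagging one frame behind.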