
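# Design Decisions for Scaling Your High Traffic Feeds

> Original: [http://highscalability.com/blog/2013/10/28/design-decisions-for-scaling-your-high-traffic-feeds.html](http://highscalability.com/blog/2013/10/28/design-decisions-for-scaling-your-high-traffic-feeds.html)

![](https://img.kancloud.cn/2c/1a/2c1a6d79a1d83458f3561c63fb2b068c_350x266.png)

*This is a guest post by Thierry Schellenbach, Founder/CTO of Fashiolista.com. Follow @tschellenbach on Twitter and Github.*

[Fashiolista](http://www.fashiolista.com/) started out as a hobby project that we built on the side. We had absolutely no idea it would grow into one of the largest online fashion communities. The entire first version took about two weeks to develop, and our feed implementation was dead simple. We have come a long way since then, and I would like to share our experience with scaling feed systems.

Feeds are a core component of many large startups such as Pinterest, Instagram, Wanelo and Fashiolista. At Fashiolista the feed system powers the [flat feed](http://www.fashiolista.com/feed/?feed_type=F), the [aggregated feed](http://www.fashiolista.com/feed/?feed_type=A) and the [notification](http://www.fashiolista.com/my_style/notification/) system. This article explains the troubles we ran into when scaling our feeds and the design decisions involved in building your own solution. Understanding the basics of how these feed systems work is essential as more and more applications rely on them.

Furthermore, we have open sourced [Feedly](https://github.com/tschellenbach/Feedly), the Python module that powers our feeds. Where applicable, I will reference how to use it to quickly build your own feed solution.

## Introduction to Feeds

The problem of scaling feed systems has been widely discussed, but let me start by clarifying the basics. The solution is aimed at making pages such as the Facebook news feed, the Twitter stream or the [Fashiolista](http://www.fashiolista.com/) feed work under high traffic conditions. What all these systems have in common is that they show activities by the people you follow. In our case we base the activity object on the [activity stream](http://activitystrea.ms/specs/atom/1.0/) standard. Example activities include "Thierry added an item to his list on Fashiolista" or "Tommaso tweeted a message".

There are two strategies for building feed systems:

1. **Pull**, where the feed is gathered during reads
2. **Push**, where all the feeds are precomputed during writes

Most real-world applications use a combination of these two approaches. The process of pushing an activity to all your followers is called a fan-out.

## History & Background

Fashiolista's feed system has gone through three major redesigns. The first version worked on our PostgreSQL database, the second used Redis, and the third and current version runs on Cassandra. To give you an understanding of when and why these solutions fail, I will briefly go through some history.

### Part one - The database

Our first setup simply queried a PostgreSQL database and looked something like this: `select * from love where user_id in (...)`

The most surprising thing was how robust this system was. We passed 1M loves and it kept working; after reaching 5M loves it still kept working. We were placing bets that it would break by 10M loves, but it just kept running smoothly. Some database tweaks were needed, but this simple system held up until around 100M loves and 1M users. Around that point the performance of this solution started fluctuating. Usually it would continue to work, but for some users latency would spike to multiple seconds. After reading many articles about feed design, we built the first version of Feedly using Redis.

### Part two - Redis & Feedly

Our second approach stored a feed for every user in Redis. When you loved an item, this activity got fanned out to all your followers. We used a few clever tricks to keep memory usage low, which are covered in the next section. Redis was really easy to set up and maintain. We sharded across a few Redis machines using Nydus and used [Sentinel](http://redis.io/topics/sentinel) for automatic failover. (Currently we recommend [Twemproxy](https://github.com/twitter/twemproxy) instead of [Nydus](https://github.com/disqus/nydus).)

Redis was a good solution, but several reasons made us look for an alternative. Firstly, we wanted to support multiple content types, which made falling back to the database harder and increased our storage requirements. In addition, the database fallback we were still relying on became slower as our community grew. Both these problems could be addressed by storing more data in Redis, but doing so was prohibitively expensive.

### Part three - Cassandra & Feedly

We briefly looked at [HBase](http://hbase.apache.org/), [DynamoDB](http://aws.amazon.com/dynamodb/) and [Cassandra 2.0](http://cassandra.apache.org/download/). Eventually we opted for Cassandra since it has few moving parts, is used by Instagram and is supported by [Datastax](http://www.datastax.com/). Fashiolista currently does a full push flow for the flat feed and a combination of push and pull for the aggregated feed. We store a maximum of 3600 activities in your feed, which currently takes up 2.12TB of storage. The fluctuations caused by high profile users are mitigated using priority queues, overcapacity and auto scaling.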
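
To make the pull and push strategies from the introduction concrete, here is a minimal in-memory sketch in Python. It is not Feedly's API: the class `InMemoryFeeds` and its method names are hypothetical, and plain dictionaries stand in for PostgreSQL, Redis or Cassandra. The only number taken from the post is the cap of 3600 activities per feed. The sketch also assumes activities arrive in timestamp order.

```python
from collections import defaultdict, deque

MAX_FEED_LENGTH = 3600  # per-feed cap mentioned in the post


class InMemoryFeeds:
    """Toy store (hypothetical, not Feedly) contrasting pull and push."""

    def __init__(self, max_length=MAX_FEED_LENGTH):
        self.followers = defaultdict(set)    # producer -> consumers following them
        self.following = defaultdict(set)    # consumer -> producers they follow
        self.activities = defaultdict(list)  # producer -> their own activities
        # Precomputed feed per consumer, newest first, capped at max_length.
        self.feeds = defaultdict(lambda: deque(maxlen=max_length))

    def follow(self, consumer, producer):
        self.followers[producer].add(consumer)
        self.following[consumer].add(producer)

    def add_activity(self, producer, timestamp, text):
        activity = (timestamp, producer, text)
        self.activities[producer].append(activity)
        # Push: fan out at write time, one copy per follower.
        # appendleft on a full deque drops the oldest entry on the right.
        for consumer in self.followers[producer]:
            self.feeds[consumer].appendleft(activity)

    def read_pull(self, consumer, limit=10):
        # Pull: gather and sort at read time, similar in spirit to
        # "select * from love where user_id in (...)".
        gathered = [
            activity
            for producer in self.following[consumer]
            for activity in self.activities[producer]
        ]
        return sorted(gathered, reverse=True)[:limit]

    def read_push(self, consumer, limit=10):
        # Push: the feed is already materialized, so a read is a cheap slice.
        return list(self.feeds[consumer])[:limit]
```

In this sketch a write costs one insert per follower under push, while a read under pull touches every followed producer. That asymmetry is the trade-off the design decisions in this article revolve around.
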
## Feed Design

I think our history is fairly representative of the process other companies go through. When you need to build your own feed system (hopefully using Feedly), there are a few important design decisions to consider.

### 1. Denormalize vs normalize

There are two approaches you can choose here. The feed with the activities by people you follow contains either the ids of the activities (normalized) or the full activities (denormalized).

Storing only the id greatly reduces your memory usage. However, it also means hitting your data store again every time you load the feed. One factor to consider is how often the data gets copied when you denormalize. It makes a big difference whether you are building a notification system or a news feed system. For notifications, you typically notify 1 or 2 users for each action that occurs. For follow-based feed systems, however, the action may be copied to thousands of followers.

Furthermore, the best choice really depends on your storage backend. With Redis you need to be careful about memory usage. Cassandra, on the other hand, has plenty of storage space, but is quite hard to use if you normalize your data.

For notification feeds and feeds built on top of Cassandra, we recommend denormalizing your data. For Redis-based feeds you want to minimize memory usage and keep your data normalized. [Feedly](https://github.com/tschellenbach/Feedly) allows you to choose the approach you prefer.

### 2. Selective fan-out based on producer

In their paper, [Yahoo's Adam Silberstein et al.](http://research.yahoo.com/files/sigmod278-silberstein.pdf) argue for a selective approach to pushing to users' feeds. A similar approach is currently used by [Twitter](http://highscalability.com/blog/2013/7/8/the-architecture-twitter-uses-to-deal-with-150m-active-users.html). The basic idea is that doing fan-outs for high profile users can cause a high and sudden load on your systems. This means you need a lot of spare capacity on standby to keep things real time (or be OK with waiting for autoscaling to kick in). The paper suggests reducing the load caused by these high profile users by selectively disabling fan-outs. Twitter has apparently seen great performance improvements by disabling fan-out for high profile users and instead loading their tweets during reads (pull).

### 3. Selective fan-out based on consumer

Another form of selective fan-out is to fan out only to your active users (say, users who logged in during the last week). At Fashiolista we used a modified version of this idea, storing the last 3600 activities for active users but only 180 activities for inactive ones. After those 180 items we would fall back to the database.
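
The 3600/180 split described above can be sketched as follows. This is an illustration under stated assumptions, not Fashiolista's or Feedly's code: the function names, the seven-day activity window (taken from "logged in during the last week") and the `load_from_database` callback are all hypothetical, and feeds are plain Python lists of activity ids (the normalized form).

```python
from datetime import datetime, timedelta

ACTIVE_FEED_LENGTH = 3600          # from the post
INACTIVE_FEED_LENGTH = 180         # from the post
ACTIVE_WINDOW = timedelta(days=7)  # assumption: "logged in during the last week"


def feed_length_for(last_login, now):
    """Return how many activities to keep cached for a user."""
    is_active = now - last_login <= ACTIVE_WINDOW
    return ACTIVE_FEED_LENGTH if is_active else INACTIVE_FEED_LENGTH


def fan_out(activity_id, follower_last_logins, feeds, now):
    """Prepend the activity id to each follower's feed, then trim the feed."""
    for follower, last_login in follower_last_logins.items():
        feed = feeds.setdefault(follower, [])
        feed.insert(0, activity_id)
        # Drop everything beyond the length this follower is entitled to.
        del feed[feed_length_for(last_login, now):]


def read_feed(user, feeds, offset, limit, load_from_database):
    """Read a page from the cached feed, falling back to the database.

    load_from_database(user, offset, limit) is a hypothetical callable that
    returns older activity ids once the cached feed is exhausted.
    """
    page = feeds.get(user, [])[offset:offset + limit]
    if len(page) < limit:
        page += load_from_database(user, offset + len(page), limit - len(page))
    return page
```

Only readers who page past their cached items pay for the database fallback, and inactive users hit that point after 180 items.
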
This setup slows down the experience for inactive users returning to your site, but it can really reduce your memory usage and costs.

Silberstein et al. make things even more interesting by looking at pairs of consumers and producers. The basic intuition is that whether the push approach makes the most sense depends on the particular consumer and producer pair.

Fortunately, such a complex solution has not been needed yet for Fashiolista. I am curious at which scale you need such solutions. Be sure to let us know in the comments.

### 4. Priorities

An alternative strategy is to use different priorities for the fan-out tasks. You simply mark fan-outs to active users as high priority and fan-outs to inactive users as low priority. At Fashiolista we keep a larger buffer of capacity for the high priority cluster, allowing us to cope with spikes. For the low priority cluster we rely on autoscaling and spot instances. In practice this means that less active users' feeds may occasionally lag a few minutes behind. Using priorities reduces the impact high profile users have on system load. It does not solve the problem, but it greatly reduces the magnitude of the spikes.

### 5. Redis vs Cassandra

Both Fashiolista and [Instagram](http://planetcassandra.org/blog/post/instagram-making-the-switch-to-cassandra-from-redis-75-instasavings) started out with Redis but eventually switched to Cassandra. I would recommend starting with Redis, as it is just so much easier to set up and maintain.

Redis has a few limitations, though. All of your data needs to be stored in RAM, which eventually becomes expensive. In addition, at the time of writing there is no support for sharding in Redis itself. This means you have to roll your own system for sharding across nodes. ([Twemproxy](https://github.com/twitter/twemproxy) is a good option.) Sharding across nodes is fairly easy, but moving data when you add or remove nodes is painful. You can work around these limitations by using Redis as a cache and falling back to your database. Once falling back to the database becomes hard, I would consider moving from Redis to Cassandra.

The Cassandra Python ecosystem is still changing rapidly. [CQLEngine](https://github.com/cqlengine/cqlengine) and [Python-Driver](https://github.com/datastax/python-driver) are both excellent projects, but they need some [forking](https://github.com/tbarbugli/cqlengine) to work together. If you choose Cassandra, you need to be prepared to spend some time learning Cassandra and contributing to the client libraries.

## Conclusion

There are many factors to take into account when building your own feed solution.
Which storage backend do you choose, how do you handle spikes in load caused by high profile users, and to what extent do you denormalize your data? I hope this blog post has provided you with some inspiration.

[Feedly](https://github.com/tschellenbach/Feedly) does not make any of these choices for you. It is a framework for building feed systems, and it is up to you to decide which approach works best for your use case. For an introduction to Feedly, read the [readme](https://github.com/tschellenbach/Feedly) or this tutorial for building a [Pinterest style application](http://www.mellowmorning.com/2013/10/18/scalable-pinterest-tutorial-feedly-redis/). If you do give it a try, be sure to let us know if you run into problems.

Do note that you only need to tackle this problem once you have millions of activities in your database. At Fashiolista the simple database solution carried us through our first 100M loves and 1M users.

To learn more about feed design, I highly recommend reading some of the articles on which we based Feedly:

* [Yahoo Research Paper](http://research.yahoo.com/files/sigmod278-silberstein.pdf)
* [Twitter 2013](http://highscalability.com/blog/2013/7/8/the-architecture-twitter-uses-to-deal-with-150m-active-users.html) (Redis based, with fallback)
* [Cassandra at Instagram](http://planetcassandra.org/blog/post/instagram-making-the-switch-to-cassandra-from-redis-75-instasavings)
* [Etsy feed scaling](http://www.slideshare.net/danmckinley/etsy-activity-feeds-architecture/)
* [Facebook history](http://www.infoq.com/presentations/Facebook-Software-Stack)
* [Django project with good naming conventions](http://justquick.github.com/django-activity-stream/) (but database only)
* [http://activitystrea.ms/specs/atom/1.0/](http://activitystrea.ms/specs/atom/1.0/) (actor, verb, object, target)
* [Quora post on best practices](http://www.quora.com/What-are-best-practices-for-building-something-like-a-News-Feed?q=news+feeds)
* [Quora on scaling a social network feed](http://www.quora.com/What-are-the-scaling-issues-to-keep-in-mind-while-developing-a-social-network-feed)
* [Redis Ruby example](http://web.archive.org/web/20130525202810/http://blog.waxman.me/how-to-build-a-fast-news-feed-in-redis)
* [FriendFeed approach](http://backchannel.org/blog/friendfeed-schemaless-mysql)
* [Thoonk setup](http://blog.thoonk.com/)
* [Twitter's approach](http://www.slideshare.net/nkallen/q-con-3770885)

Great article! I have often wondered why the decision is made to fan out immediately for large-scale streams. The [Collabinate](http://www.collabinate.com "Collabinate") activity feed API uses Rene Pickhardt's amazing [Graphity algorithm](http://www.rene-pickhardt.de/graphity-an-efficient-graph-model-for-retrieving-the-top-k-news-feeds-for-users-in-social-networks "Graphity"), a graph-database-backed feed algorithm with very high throughput and no duplication. It relies on the graph database to do everything through an n-way merge ("pull"). I would love to talk about the latency spikes you saw in your original implementation, the memory utilization problems you ran into with Redis, and where things stand now. It would really help our future customers who are transitioning to Collabinate. I will give you a shout to talk more about Feedly.

Did you consider using PostgreSQL replication? One write database and multiple read-only replicas.

The Feedly open source package has been growing fast. We are currently beta testing a hosted solution built by the team behind Feedly. You can find it at https://getstream.io

I have always wanted to learn how to do this! You are the best!

Thanks for this post. It is well worth recommending.