X open-sources its recommendation feed algorithm

xguru · 2026-01-21T12:33:02+09:00

X's "For You" feed is a machine-learning-based recommendation system built to improve the quality of personalized content recommendations
The feed is assembled from two sources: posts from followed accounts (Thunder) and content from accounts the user does not follow (Phoenix Retrieval)
All candidate posts are evaluated by Phoenix, a Grok-based Transformer model, and the final ranking is derived from its predictions
- The model estimates engagement probabilities for each post
All hand-engineered features and most heuristic algorithms have been removed from the system
Relevant content is identified by analyzing the user's activity history (likes, replies, shares)

System architecture

Home Mixer is the orchestration layer that coordinates the entire pipeline
- Includes a Query Hydration stage that gathers the user's behavior history and following information
- Manages the whole flow: candidate collection, data enrichment, filtering, scoring, and final selection
- Returns a per-user ranked list of posts through the gRPC-based ScoredPostsService
Thunder is an in-memory store that ingests posts in real time from a Kafka event stream
- Manages per-user storage for original posts, replies/reposts, and video posts
- Supplies "in-network" post candidates from the accounts the requesting user follows
- Serves the latest posts from followed accounts with very low latency
- Achieves sub-millisecond lookups without accessing an external database
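
As a rough illustration of the Thunder idea, the following minimal sketch keeps a bounded, per-author buffer of recent posts in memory and answers "in-network" lookups without touching a database. Everything here is an assumption made for illustration: the `Post` fields, the `InMemoryPostStore` class, and the `ingest` method (which stands in for a Kafka consumer callback) are hypothetical names, not the actual Thunder implementation.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    post_id: int
    author_id: int
    created_at: float  # Unix timestamp in seconds
    kind: str = "original"  # hypothetical values: "original", "reply", "repost", "video"


class InMemoryPostStore:
    """Toy stand-in for an in-memory, per-author store of recent posts."""

    def __init__(self, max_posts_per_author: int = 100):
        # One bounded deque per author; the oldest posts fall off automatically.
        self._by_author = defaultdict(lambda: deque(maxlen=max_posts_per_author))

    def ingest(self, post: Post) -> None:
        """Would be called once per event consumed from the stream (assumed)."""
        self._by_author[post.author_id].append(post)

    def in_network_candidates(self, following: list[int], limit: int) -> list[Post]:
        """Return the newest posts from followed accounts, newest first."""
        posts = [p for author in following for p in self._by_author.get(author, ())]
        posts.sort(key=lambda p: p.created_at, reverse=True)
        return posts[:limit]
```

A caller would ingest posts as they arrive and then call `in_network_candidates(following, limit)` with the requesting user's follow list. The bounded deque is one simple way to keep memory use predictable; the real store may use a different retention policy.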
Phoenix is the core ML component of the recommendation system and consists of two stages, retrieval and ranking
- Retrieval: a Two-Tower model computes the similarity between user embeddings (built from user features and engagement history) and post embeddings, then retrieves the Top-K posts
- Ranking: a Transformer with Candidate Isolation, an architecture designed so that each candidate is evaluated independently of the others
  - Takes the user context (engagement history) and the candidate posts as input
  - Predicts, for each post, the probability of multiple actions such as like, reply, repost, and click
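
The two Phoenix stages can be sketched in a few lines. The first function ranks posts by dot-product similarity to a user embedding, which is the usual way a two-tower retriever is queried. The second builds an attention mask in which every candidate can see the user context and itself but no other candidate, which is one plausible reading of "candidate isolation". Both functions, their names, and the token layout (context first, then one token per candidate) are assumptions for illustration; the real model's similarity function and masking details may differ.

```python
import heapq


def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def retrieve_top_k(user_emb: list[float], post_embs: dict[int, list[float]], k: int) -> list[int]:
    """Two-tower style retrieval: the k post ids most similar to the user embedding."""
    return heapq.nlargest(k, post_embs, key=lambda pid: dot(user_emb, post_embs[pid]))


def candidate_isolation_mask(n_context: int, n_candidates: int) -> list[list[bool]]:
    """mask[i][j] is True when position i may attend to position j.

    Assumed layout: n_context user-history tokens, then one token per candidate.
    """
    n = n_context + n_candidates
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if j < n_context:
                mask[i][j] = True  # every position may read the user context
            elif i == j:
                mask[i][j] = True  # a candidate may read itself, never its siblings
    return mask
```

Because no candidate attends to another, a post's predicted probabilities do not depend on which other posts share the batch, which is what makes scores consistent and cacheable.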
Candidate Pipeline is a reusable framework for building recommendation pipelines
- Defines traits such as Source, Hydrator, Filter, Scorer, and Selector
- Provides built-in parallel execution, error handling, and logging for scalability and stability
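
To make the trait-based design more concrete, here is a minimal sketch that uses Python `Protocol` classes as a stand-in for the traits named above. The method names (`fetch`, `hydrate`, `keep`, `score`, `select`), the dict-based candidate representation, and `run_pipeline` are all hypothetical. The sketch also runs stages sequentially, whereas the framework described above executes independent stages in parallel and adds error handling and logging.

```python
from typing import Protocol


class Source(Protocol):
    def fetch(self, query: dict) -> list[dict]: ...


class Hydrator(Protocol):
    def hydrate(self, query: dict, candidates: list[dict]) -> list[dict]: ...


class Filter(Protocol):
    def keep(self, query: dict, candidate: dict) -> bool: ...


class Scorer(Protocol):
    def score(self, query: dict, candidates: list[dict]) -> list[dict]: ...


class Selector(Protocol):
    def select(self, candidates: list[dict]) -> list[dict]: ...


def run_pipeline(query, sources, hydrators, filters, scorers, selector):
    """Source -> hydrate -> filter -> score -> select, in that order."""
    candidates = [c for source in sources for c in source.fetch(query)]
    for hydrator in hydrators:
        candidates = hydrator.hydrate(query, candidates)
    candidates = [c for c in candidates if all(f.keep(query, c) for f in filters)]
    for scorer in scorers:
        candidates = scorer.score(query, candidates)
    return selector.select(candidates)
```

Any object with the right method satisfies the corresponding protocol, so a new source, filter, or scorer can be added without changing `run_pipeline`.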

How it works

Pipeline stages
- 1. Query hydration: the user's recent activity history and metadata (such as the follow list) are fetched
- 2. Candidate sourcing: candidate posts are gathered from the following sources
  - Thunder: recent posts from followed accounts (in-network)
  - Phoenix Retrieval: posts discovered by machine learning from the global corpus (out-of-network)
- 3. Candidate hydration: each candidate is enriched with the following information
  - Core post data (text, media, etc.)
  - Author information (username, verification status)
  - Video length (for video posts)
  - Subscription status
- 4. Pre-scoring filters: posts matching any of the following conditions are removed
  - Duplicates
  - Posts that are too old
  - The viewer's own posts
  - Posts from blocked or muted accounts
  - Posts containing muted keywords
  - Posts already seen or recently served
  - Subscription-only content the viewer cannot access
- 5. Scoring: several scorers are applied in sequence
  - Phoenix Scorer: fetches the ML predictions from the Phoenix Transformer model
  - Weighted scorer: combines the predictions into a final relevance score
  - Author diversity scorer: dampens the scores of repeated authors to keep the feed diverse
  - OON scorer: adjusts the scores of out-of-network (OON) content
- 6. Selection: candidates are sorted by score and the top K are chosen
- 7. Post-selection filtering: a final validation pass is run over the selected candidates
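
The author-diversity and selection steps above can be illustrated with a small sketch. The source only says that repeated authors have their score impact reduced; the geometric decay used here (the n-th post from the same author is multiplied by `decay ** n`) is an assumed scheme, and the function names and dict fields are hypothetical.

```python
from collections import defaultdict


def apply_author_diversity(candidates: list[dict], decay: float = 0.5) -> list[dict]:
    """Walk candidates in score order and attenuate repeated authors.

    The first post from an author keeps its score; each later post from the
    same author is multiplied by a further factor of `decay` (assumed scheme).
    """
    seen = defaultdict(int)
    adjusted = []
    for c in sorted(candidates, key=lambda c: c["score"], reverse=True):
        factor = decay ** seen[c["author"]]
        seen[c["author"]] += 1
        adjusted.append({**c, "score": c["score"] * factor})
    return adjusted


def select_top_k(candidates: list[dict], k: int) -> list[dict]:
    """Sort by score, highest first, and keep the first k."""
    return sorted(candidates, key=lambda c: c["score"], reverse=True)[:k]
```

With this scheme, an author's second-best post has to beat other authors' posts at half its original score, so a single prolific account is less likely to fill the top of the feed.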
Scoring and ranking
- Phoenix's per-action predictions are combined into a single score using a weighted sum
- Positive actions (likes, shares, etc.) carry positive weights, while negative actions (blocks, reports, etc.) lower the score
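
A weighted sum of this kind takes the form score = Σ w_a · P(a) over the predicted actions a. The sketch below shows the shape of that computation. The weight values are invented purely for illustration, since the summary does not state concrete weights, and `ACTION_WEIGHTS` and `weighted_score` are hypothetical names.

```python
# Hypothetical weights: the summary does not state concrete values.
ACTION_WEIGHTS = {
    "like": 1.0,
    "reply": 2.0,
    "repost": 1.5,
    "click": 0.5,
    "block": -5.0,    # negative actions pull the score down
    "report": -10.0,
}


def weighted_score(predictions: dict[str, float], weights: dict[str, float] = ACTION_WEIGHTS) -> float:
    """Combine per-action probabilities into one score: sum of weight * probability.

    Actions without a configured weight contribute nothing.
    """
    return sum(weights.get(action, 0.0) * p for action, p in predictions.items())
```

Under these assumed weights, a post with a high like probability but a noticeable block probability can end up ranked below a post with a more modest like probability and no predicted negative reactions.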
Two-stage filtering
- Before scoring: removes duplicates, posts older than a threshold, the viewer's own posts, paid content the viewer cannot access, posts already seen or served, posts from blocked accounts, posts with muted keywords, and so on
- After selection: removes deleted posts, spam, violent content, graphic posts, and multiple duplicate branches of the same conversation thread
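
The pre-scoring stage can be pictured as a chain of cheap checks that each candidate must pass. The following sketch covers several of the conditions listed above; the dict fields (`created_at`, `blocked_or_muted`, `muted_keywords`, `already_served`) and the function name are assumptions made for the example, and muted keywords are assumed to be stored in lowercase.

```python
import time


def pre_score_filter(candidates, viewer, max_age_seconds, now=None):
    """Drop candidates that should never reach the scoring stage."""
    now = time.time() if now is None else now
    kept_ids = set()
    kept = []
    for c in candidates:
        if c["id"] in kept_ids:
            continue  # duplicate of a post we already kept
        if now - c["created_at"] > max_age_seconds:
            continue  # older than the age threshold
        if c["author"] == viewer["id"]:
            continue  # the viewer's own post
        if c["author"] in viewer["blocked_or_muted"]:
            continue  # blocked or muted account
        text = c["text"].lower()
        if any(keyword in text for keyword in viewer["muted_keywords"]):
            continue  # contains a muted keyword
        if c["id"] in viewer["already_served"]:
            continue  # already seen or recently served
        kept_ids.add(c["id"])
        kept.append(c)
    return kept
```

Running these checks before scoring means the model is never asked to score posts that could not be shown anyway; the post-selection checks (deleted posts, spam, and so on) would follow the same pattern on the much smaller selected set.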

Key design principles

No manual feature engineering, so the Transformer learns directly from the user's behavior sequence
Candidate isolation, which yields consistent scores and makes caching easier
Hash-based embeddings: both retrieval and ranking use multiple hash functions for embedding lookup
Multi-action prediction: the model predicts many actions instead of a single "relevance" score
Modular pipeline architecture
- Separates pipeline execution and monitoring from business logic
- Runs independent stages in parallel and handles errors gracefully
- Makes it easy to add new sources, hydrators, filters, and scorers

License

Apache License 2.0
