Solving ARC-AGI without pre-training

(iliao2345.github.io)

1 point by GN⁺ 2025-03-05 | 1 comment

For problems like ARC-AGI, where rules must be discovered from a handful of examples, CompressARC finds solutions using only puzzle-specific inference-time learning, with no pre-training, no external dataset, and no large-scale search.
The core idea is an experiment: if a lossless information compression objective is optimized to shrink the representation of a puzzle together with its correct answer, intelligent behavior may emerge.
Spending about 20 minutes per puzzle on an RTX 4070, it reached 34.75% on the training set and 20% on the evaluation set. Notably, it is a neural network approach that uses the target puzzle itself as its training data.
The architecture is designed around equivariance to example order, color permutation, rotation, and reflection, and it uses a multitensor representation that bundles tensors of several ranks.
It is strong at color-to-procedure mapping, filling, cropping, connecting points, and short-distance movement, but counting, long-range pattern extension, rotation/copying/resizing, and agent planning remain bottlenecks.

Can compression alone solve ARC-AGI? An experiment

The central question is whether lossless information compression alone can produce intelligent behavior.
CompressARC is a method for ARC-AGI puzzles that runs purely on a compression-based objective function.
It imposes three constraints:
- No pre-training: the model is randomly initialized and trained at inference time.
- No dataset: one model is trained on a single target ARC-AGI puzzle and outputs one answer.
- No search: in most senses of the word there is no search, only gradient descent.
Results are 34.75% on the training set and 20% on the evaluation set, at about 20 minutes per puzzle on an RTX 4070.
It is described as the first neural network method for ARC-AGI that uses the target puzzle itself as its training data.

ARC-AGI problem setting

ARC-AGI is an AI benchmark introduced in 2019 that tests the ability to infer abstract rules from a few examples and generalize them.
Each puzzle provides several input-output examples and one test input, and the system must predict the test output grid.
Two attempts are allowed per puzzle, and 1 point is awarded if either is correct.
The solver chooses both the size of the output grid and the color of every pixel.
Puzzles are designed so that humans can reason their way to a solution while machines find them much harder.
- An average human solves 76.2% of the training set.
- Human experts solve 98.5%.
The 400 training puzzles are easier than the rest and are meant to teach the following patterns:
- Objectness: objects do not appear or vanish without a reason.
- Goal-directedness: some objects behave like agents with intentions.
- Numbers and counting: basic math such as counting objects, sorting, comparison, and addition/subtraction.
- Geometry and topology: reflection, rotation, translation, deformation, combination, and repetition of shapes, differences in distance, and so on.
The recent ARC Prize competition on Kaggle offered a prize pool of over 1 million dollars; the grand prize was for a method that reaches 85% on 100 private problems within 12 hours of computation in a restricted environment.

How CompressARC works

In CompressARC, a representation that compresses into fewer bits is associated with a more accurate puzzle solution.
The system turns the incomplete puzzle into a completed one by searching for a compressed representation that, when decompressed, reconstructs the puzzle together with its answer.
A neural network plays the role of the decoder.
- There is no separate encoder network.
- Encoding is implemented by the gradient descent that trains the decoder at inference time.
- The optimized weights and input distribution settings serve as the compressed bit representation of the puzzle and its answer.
In standard machine learning terms, the process is:
- Take an ARC-AGI puzzle.
- Build a neural network f_θ sized to the puzzle's number of examples and number of observed colors.
- Feed it a random normal input z ~ N(μ, Σ) and output pixel-wise color logits for all grids.
- Minimize the summed cross-entropy over the known grids, ignoring the answer grid.
- Apply a KL divergence penalty to keep N(μ, Σ) close to N(0, I).
- Store the answer grids generated during training and choose the most frequent one as the final prediction.
f_θ is designed to be equivariant to common augmentations such as reordering input-output pairs, permuting colors, and spatial rotations/reflections.
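The loss and the final vote in the steps listed above can be sketched in plain Python. This is a minimal illustration rather than the project's code: the function names are hypothetical, the Gaussian is assumed to be diagonal, and the network itself is left out, so the logits are taken as given.

```python
import math
from collections import Counter

def kl_to_standard_normal(mu, sigma):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ) in nats, summed over dimensions."""
    return sum(0.5 * (s * s + m * m - 1.0) - math.log(s) for m, s in zip(mu, sigma))

def cross_entropy(logits, target):
    """Negative log-probability (nats) of the target color under softmax(logits)."""
    peak = max(logits)  # subtract the max for numerical stability
    log_norm = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_norm - logits[target]

def puzzle_loss(pixel_logits, pixel_targets, mu, sigma):
    """Summed cross-entropy over known pixels plus the KL penalty.

    Pixels of the unknown answer grid carry target None and are skipped.
    """
    recon = sum(cross_entropy(l, t)
                for l, t in zip(pixel_logits, pixel_targets) if t is not None)
    return recon + kl_to_standard_normal(mu, sigma)

def most_common_answer(sampled_answers):
    """Return the answer grid that was generated most often during training."""
    counts = Counter(tuple(map(tuple, grid)) for grid in sampled_answers)
    return [list(row) for row in counts.most_common(1)[0][0]]
```

With μ = 0 and σ = 1 the KL term vanishes, so the loss reduces to the cross-entropy over the known pixels.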

Derivation from the compression perspective

Lossless compression must represent information in as few bits as possible while still allowing the original to be restored exactly from that bit representation.
In ARC-AGI, one would ideally treat the puzzle and its answer together as the full symbol to compress, but in practice the answer is not available as encoder input, and the distribution that generates puzzles is unknown.
The derivation assumes that a practically implementable, bit-efficient compression system for the ARC-AGI dataset exists.
Even without knowing the distribution p, one can consider a universal compressor that minimizes len(f) + len(s), the combined length of a program f and its input s.
- The decoder restores the original by running f(s).
- By algorithmic information theory, this can be less efficient than the original compressor by at most the length of f.
- In practice, an encoder that searches program space is not feasible.
Instead of searching program space, CompressARC fixes the program to be a neural network forward pass.
- s consists of the weights θ, the input z, and an output correction ε.
- The code lengths of θ and z are computed from the Relative Entropy Coding (REC) perspective, and that of ε from the arithmetic coding perspective.
- The code length of the output correction equals the total cross-entropy over the known grids.
- The code length of z is KL(p_z || q_z), with q_z = N(0, I).
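To see why the correction's code length matches a cross-entropy, recall that an ideal arithmetic coder spends about -log2 p bits on a symbol of probability p, up to a small constant overhead. The sketch below sums that cost over pixels; the function name and the toy probabilities are illustrative assumptions, not values from the article.

```python
import math

def ideal_code_length_bits(probabilities, symbols):
    """Bits an ideal arithmetic coder needs: the sum of -log2 p(symbol) per pixel."""
    return sum(-math.log2(p[s]) for p, s in zip(probabilities, symbols))

# Two pixels with two possible colors each (toy numbers).
pixels = [0, 1]
uniform = [[0.5, 0.5], [0.5, 0.5]]      # decoder knows nothing about the pixels
confident = [[0.9, 0.1], [0.1, 0.9]]    # decoder predicts the right colors

bits_uniform = ideal_code_length_bits(uniform, pixels)
bits_confident = ideal_code_length_bits(confident, pixels)
```

The better the decoder predicts the known grids, the fewer bits the correction needs, which is exactly what minimizing cross-entropy rewards.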
The total code length has the same form as a VAE loss:
- Reconstruction error
- KL for z
- Decoder regularization
The CompressARC implementation departs from this derivation in parts of the regularization and in modifications related to equivariance and to independence between puzzles.
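Written out, the three terms above might be combined as follows. This is a sketch assembled from the summary, not the authors' exact formula: the symbols p_θ and q_θ (a distribution over weights and its prior) and x_i (the color of known pixel i) are notation assumed here, and the expectation is taken over the sampled z and θ.

```latex
\underbrace{\mathrm{len}(s)}_{\text{total code length}}
\;\approx\;
\underbrace{\mathbb{E}\!\left[\sum_{i \,\in\, \text{known pixels}} -\log P\!\left(x_i \mid f_\theta(z)\right)\right]}_{\text{reconstruction error}}
+ \underbrace{\mathrm{KL}\!\left(p_z \,\|\, q_z\right)}_{\text{KL for } z}
+ \underbrace{\mathrm{KL}\!\left(p_\theta \,\|\, q_\theta\right)}_{\text{decoder regularization}},
\qquad q_z = \mathcal{N}(0, I)
```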

Architecture: multitensor and equivariance

The most important property of the architecture is equivariance.
- If the input z is transformed, the output ARC-AGI puzzle should be transformed in the same way.
- Examples are reordering input-output pairs, shuffling colors, and flipping, rotating, or reflecting grids.
The design first builds a fully symmetric base architecture, then breaks unnecessary symmetries one at a time by adding asymmetric layers that supply the needed non-equivariant abilities.
Internal data flows in a format called a multitensor.
- It is a bundle of tensors of different ranks and shapes.
- Each tensor's dimensions are a subset of [n_examples, n_colors, n_directions, height, width, n_channels].
- The channel dimension is always present.
- Applying rules brings the number of legal tensors in a multitensor down to 18.
A puzzle can be represented as a tensor of shape [examples, colors, height, width, channel].
- The channel selects between the input grid and the output grid.
- Width and height index pixel positions.
- The color dimension holds a one-hot encoding of the pixel color.
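The [examples, colors, height, width, channel] layout can be made concrete with nested lists. The helper below is a hypothetical sketch: it handles only pairs whose output is known, and it pads smaller grids with all-zero vectors up to the largest height and width, which is an assumption of this example and not something the summary specifies.

```python
def one_hot_puzzle(pairs, n_colors):
    """Encode (input, output) grid pairs as tensor[example][color][row][col][channel].

    Channel 0 is the input grid and channel 1 the output grid. Positions
    outside a grid stay all-zero (padding is an assumption of this sketch).
    """
    height = max(len(grid) for pair in pairs for grid in pair)
    width = max(len(grid[0]) for pair in pairs for grid in pair)
    tensor = [[[[[0, 0] for _ in range(width)] for _ in range(height)]
               for _ in range(n_colors)] for _ in pairs]
    for e, pair in enumerate(pairs):
        for channel, grid in enumerate(pair):
            for r, row in enumerate(grid):
                for c, color in enumerate(row):
                    tensor[e][color][r][c][channel] = 1
    return tensor

# One example: a 1x2 input grid and a 2x2 output grid, two colors (0 and 1).
example_pairs = [([[0, 1]], [[1, 1], [0, 0]])]
encoded = one_hot_puzzle(example_pairs, n_colors=2)
```

Here `encoded[0][1][0][1]` is `[1, 1]`, because the pixel at row 0, column 1 has color 1 in both the input and the output.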
The full architecture follows this flow:
- Start from the parameters of the z distribution
- Decoding Layer
- Four repetitions of Multitensor Communication, Softmax, Directional Cummax, Directional Shift, Directional Communication, Nonlinear, Normalization
- Linear Heads that output the distribution over ARC-AGI puzzles
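Equivariance means that transforming the input and then applying a layer gives the same result as applying the layer and then transforming the output. The toy layers below are not the layers named above; they are hypothetical stand-ins on a [color][row][col] tensor, chosen only to show how such a property can be checked and how a direction-dependent operation breaks a symmetry.

```python
from itertools import accumulate

def symmetric_layer(grid):
    """Adds each row's sum to every value in it; treats colors and directions alike."""
    return [[[v + sum(row) for v in row] for row in plane] for plane in grid]

def running_sum_layer(grid):
    """Left-to-right running sum along each row; deliberately direction-dependent."""
    return [[list(accumulate(row)) for row in plane] for plane in grid]

def permute_colors(grid, perm):
    """Reorder the color planes."""
    return [grid[p] for p in perm]

def flip_horizontal(grid):
    """Mirror every row left to right."""
    return [[row[::-1] for row in plane] for plane in grid]

def is_equivariant(layer, transform, grid):
    """Check layer(transform(x)) == transform(layer(x)) for one input.

    Exact comparison is safe here because the example uses integers.
    """
    return layer(transform(grid)) == transform(layer(grid))

x = [[[1, 0, 2]], [[0, 3, 0]]]          # two colors, one row, three columns
swap = lambda g: permute_colors(g, [1, 0])
```

On this input, `symmetric_layer` passes the check for both the color swap and the horizontal flip, while `running_sum_layer` stays color-equivariant but fails the flip check.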

Performance results

Training used Adam for 2000 iterations.
- Learning rate 0.01
- β1 = 0.5, β2 = 0.9
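For readers unfamiliar with these hyperparameters, here is a textbook Adam update using the stated learning rate and betas. It is a sketch on a toy quadratic, not the project's training loop; `eps=1e-8` is an assumed default that the summary does not mention.

```python
import math

def adam_step(params, grads, m, v, t, lr=0.01, beta1=0.5, beta2=0.9, eps=1e-8):
    """One Adam update on flat lists; t is the 1-based step. Returns (params, m, v)."""
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1 - beta1) * g          # first-moment estimate
        v_i = beta2 * v_i + (1 - beta2) * g * g      # second-moment estimate
        m_hat = m_i / (1 - beta1 ** t)               # bias correction
        v_hat = v_i / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v

# Toy usage: minimize (x - 3)^2 for 2000 iterations.
x, m, v = [0.0], [0.0], [0.0]
for step in range(1, 2001):
    grad = [2 * (x[0] - 3.0)]
    x, m, v = adam_step(x, grad, m, v, step)
```

Values of β1 and β2 lower than the common 0.9 and 0.999 give the moment estimates a shorter memory, so the optimizer reacts faster to recent gradients.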
Training set results
- 100 iterations: Pass@2 2.25%
- 500 iterations: Pass@2 27.5%
- 1000 iterations: Pass@2 31.75%
- 2000 iterations: Pass@2 34.75%
- Pass@1000 at 2000 iterations: 52.75%
Evaluation set results
- 100 iterations: Pass@2 1.25%
- 500 iterations: Pass@2 15%
- 1000 iterations: Pass@2 19.25%
- 2000 iterations: Pass@2 20%
- Pass@1000 at 2000 iterations: 33.75%
ARC-AGI scoring allows two attempts, so the article's headline result is based on Pass@2.
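Pass@k here means a puzzle counts as solved if any of the first k guesses matches the solution exactly. A minimal sketch of that scoring rule, with hypothetical function names and toy 1x1 grids:

```python
def pass_at_k(ranked_guesses, solution, k=2):
    """True if any of the first k guesses equals the solution grid exactly."""
    return any(guess == solution for guess in ranked_guesses[:k])

def score(puzzles, k=2):
    """Fraction of puzzles solved within k guesses; each puzzle is (guesses, solution)."""
    return sum(pass_at_k(guesses, solution, k) for guesses, solution in puzzles) / len(puzzles)

# Toy data: the first puzzle is solved on the 2nd guess, the second on the 3rd.
toy_puzzles = [
    ([[[1]], [[2]]], [[2]]),
    ([[[0]], [[1]], [[3]]], [[3]]),
]
```

On this toy data the score is 0.5 at k=2 and 1.0 at k=3, which mirrors how the Pass@1000 figures above exceed the Pass@2 figures.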

Which puzzles can be solved and which are hard

CompressARC tries to work out the rule within the limits of its abilities, and gets stuck wherever the architecture lacks a needed ability.
Examples of tasks it can do:
- Assigning different colors to different procedures
- Filling
- Cropping
- Connecting points, including along 45-degree diagonals
- Detecting same colors
- Recognizing pixel adjacency
- Assigning colors per example
- Recognizing parts of shapes
- Short-distance movement
Examples of hard tasks are also clear:
- Mapping two colors to each other
- Repeating the same operation many times in a row
- Translation, rotation, reflection, resizing, and image duplication
- Detecting topological properties such as connectivity
- Planning and simulating agent behavior
- Long-range pattern extension
Training puzzle 28e73c20 requires extending a pattern from the edge to the center; CompressARC manages the short-distance extension but falls back on guesses near the center.

Case: Color the Boxes

The human solution recognizes that the input is divided into boxes and that the output colors those boxes.
- Corners are always black.
- The center is always magenta.
- Side boxes are colored by direction: red on top, blue on the bottom, green on the right, yellow on the left.
CompressARC's output changes step by step as training progresses:
- Step 50: it reflects that the sky-blue rows/columns of the input correspond to the same rows/columns in the output.
- Step 150: a pattern appears in which nearby output pixels have similar colors.
- Step 200: it imitates large color blobs cut by the sky-blue boundaries, and black blobs in the corners.
- Step 350: it gets most box colors right according to their direction relative to the center.
- Step 1500: the output is almost fully refined, though samples still occasionally contain rare mistakes.
Analysis of the learned z distribution shows that it encodes a color-direction correspondence table and the positions of the row/column dividers.
Only four tensors retain information:
- (examples, height, channel): holds the sky-blue row positions for each example.
- (examples, width, channel): holds the sky-blue column positions for each example.
- (direction, color, channel): holds the correspondence between directions and colors.
- (color, channel): separates out the special roles of magenta and sky-blue.

Additional cases and representation analysis

Bounding Box puzzle 6d75e8bb
- The human solution is to draw the smallest sky-blue box that encloses the red shape.
- CompressARC shows signs of grasping the general bounding box by step 100, has found the answer by step 150, and refines it with further training.
- The main surviving tensors are (examples, height, channel), (examples, width, channel), and (color, channel).
- The row/column tensors mark the rows and columns that contain many sky-blue pixels, but it is unclear how the boundary positions are determined.
Center Cross puzzle 41e4d17e
- Magenta rays must be drawn up, down, left, and right from the center of the blue bubble in the input, with the bubble's color drawn over the rays.
- After copying the input, CompressARC shows magenta rows/columns that gradually settle into the correct positions.
- It does not make the mistake a human solver might, of wrongly drawing the rays over the bubble.
- The surviving tensors are (examples, height, width, channel) and (color, channel).
- (examples, height, width, channel) encodes the bubble centers.
Ideas for improvement

Compressing the whole ARC-AGI dataset jointly, rather than each puzzle separately, could share computation across puzzles and yield a better inductive bias.
- One option considered was to use the same network weights for all puzzles and allow a limited perturbation per puzzle.
- Another proposal was a hypernetwork: learn a high-dimensional embedding for each puzzle, plus a linear mapping from that embedding to the network weights.
- This direction was not tried because it could have slowed research iteration.
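Since the hypernetwork idea was proposed but not tried, the following is purely an illustration of what "a linear mapping from a puzzle embedding to network weights" could look like. Every name, size, and initialization choice here is an assumption of this sketch.

```python
import random

def make_hypernetwork(embed_dim, n_weights, seed=0):
    """Shared linear map (matrix, bias) from a puzzle embedding to a flat weight vector."""
    rng = random.Random(seed)
    matrix = [[rng.gauss(0.0, 0.1) for _ in range(embed_dim)] for _ in range(n_weights)]
    bias = [0.0] * n_weights
    return matrix, bias

def weights_for_puzzle(hypernetwork, embedding):
    """Compute weights = matrix @ embedding + bias for one puzzle."""
    matrix, bias = hypernetwork
    return [sum(a * e for a, e in zip(row, embedding)) + b
            for row, b in zip(matrix, bias)]

# The map is shared by all puzzles; only the small embedding differs per puzzle.
shared = make_hypernetwork(embed_dim=4, n_weights=10)
puzzle_a = weights_for_puzzle(shared, [1.0, 0.0, 0.0, 0.0])
puzzle_b = weights_for_puzzle(shared, [0.0, 1.0, 0.0, 0.0])
```

In such a setup the matrix and bias would be learned across the whole dataset, which is where the shared inductive bias would come from.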
Convolution-family layers could help with shape-copying tasks.
- If one grid stores a shape and another marks the copy locations, a convolution can produce the copied result.
- Ordinary convolution had the problem of amplifying noise more than signal.
- Tropical convolution worked well on toy puzzles but was not enough for the ARC-AGI training puzzles.
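The copy-by-convolution idea can be sketched with one generic routine that is parameterized by its "add" and "multiply" operations. Taking "tropical" to mean the max-plus semiring is an assumption of this example, as are the function names and the toy grids; it is not the layer the authors experimented with.

```python
import operator

NEG_INF = float("-inf")

def semiring_conv(locations, shape, add, mul, zero):
    """Stamp `shape` at every position of `locations`, combining values with add/mul."""
    h, w = len(locations), len(locations[0])
    out = [[zero] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for a, row in enumerate(shape):
                for b, value in enumerate(row):
                    if i + a < h and j + b < w:
                        out[i + a][j + b] = add(out[i + a][j + b],
                                                mul(locations[i][j], value))
    return out

def to_tropical(grid):
    """Map 1 -> 0 (tropical 'one') and 0 -> -inf (tropical 'zero')."""
    return [[0.0 if v else NEG_INF for v in row] for row in grid]

locations = [[1, 0, 0, 0],
             [0, 0, 1, 0],
             [0, 0, 0, 0]]          # where to paste the shape (top-left corners)
shape = [[1, 1],
         [1, 0]]                    # the shape to copy

ordinary = semiring_conv(locations, shape, operator.add, operator.mul, 0)
tropical = semiring_conv(to_tropical(locations), to_tropical(shape),
                         max, operator.add, NEG_INF)
```

Both versions paste the shape at the two marked positions. One visible difference in this toy setting is that overlapping copies add up under ordinary convolution, whereas under max-plus they stay at the same value.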
A KL floor was also considered as a way to reduce posterior collapse.
- The observation was that once an important tensor's KL falls to 0, it never recovers.
- Holding the KL above 0 for a while might let the network learn to use that information.
- This was implemented, but no cases of a tensor recovering were seen, and the KL floor schedule needs a different design.
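One common way to express a KL floor is the "free bits" clamp from the VAE literature, in which each tensor's KL term is penalized only above a threshold. Whether the authors used this exact form is not stated, so treat the function, the tensor names, and the numbers below as assumptions.

```python
def kl_penalty_with_floor(kl_per_tensor, floor):
    """Sum of per-tensor KL terms, each clamped from below at `floor` (in nats).

    Below the floor the clamped term is constant, so the penalty no longer
    pushes that tensor's KL further toward zero.
    """
    return sum(max(kl, floor) for kl in kl_per_tensor.values())

# Hypothetical per-tensor KL values during training.
kl_values = {"(examples, height, channel)": 0.0, "(color, channel)": 2.0}
penalty = kl_penalty_with_floor(kl_values, floor=0.5)
```

A schedule would then lower `floor` over the course of training, which is the part the summary says still needs a different design.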
Regularization was not used in the implementation.
- In the problem formulation it is the term that measures the complexity of f, and it is part of the CompressARC derivation.
- Leaving it out of the implementation is acknowledged to be somewhat reckless.

Related work and research positioning

The idea that compression and intelligence are equivalent is inspired by the Hutter Prize.
- The Hutter Prize rewards the system that best compresses a Wikipedia text file, tying information compression ability to intelligence.
The theoretical background includes Solomonoff Induction, Kolmogorov Complexity, and Minimum Description Length.
On the information theory side, Relative Entropy Coding is key.
- It is assumed that if the KL divergence can be bounded, a compression algorithm can be constructed; the problem of implementing the actual binary code is abstracted away.
From the VAE perspective, the decoder plays the role of the decompression algorithm.
- A neural Turing machine, with more general capabilities, could also be considered, but it is poorly suited to gradient descent optimization, so the VAE route was taken.
- beta-VAE-style reweighting of the reconstruction loss works well in this case.
Existing ARC-AGI methods mainly rely on LLMs, data augmentation, alternative datasets, test-time learning, and program search over domain-specific languages.
CompressARC emphasizes that it is a deep learning method that uses neither external pre-training nor large-scale search.
The project code is public on GitHub.

1 comment

GN⁺ 2025-03-05

Hacker News comments

Large-scale pretraining seems to defeat the purpose of generality.
If you have built a general machine that can look at just 3 examples and synthesize a program that predicts the fourth, you have effectively solved oracle synthesis.
Conversely, if you train a network on the entire corpus of human knowledge, including how the puzzles were made, fine-tune it on 99% of the dataset, and then let it take several tries at the last 1%, you have built an expensive compressor of the exam setter's psychology.
- That reflects a rather naive view of knowledge and understanding.
  It assumes there is some domain of Platonic logic and reason that an AGI merely needs to connect to, but without context there can be no meaning, no inference, and no logic.
  Matching shape patterns requires the concept of a shape, which presupposes the concept of spatial relations, which in turn presupposes the concept of 2D or 3D space.
  These things seem obvious and implicit because they are deeply embedded in the environment that the human mind has evolved over tens of millions of years to interpret, and has spent decades consuming and processing.
  The real test of AGI is the ability to assimilate disparate information into a coherent worldview, which is exactly what pretraining does.
  Even an intelligence with that ability would very likely come "preloaded" with structural assumptions about the world it is placed in, much like the brain areas specialized for spatial relations, language, and sensory interpretation.
- When a machine meets an entirely new type of problem, if it can decide for itself how to learn, that is, how to adjust its weights, I don't think that defeats the purpose of general intelligence.
  Humans, too, when they want to get better at something, figure out how to practice the task and learn in ways that actually improve them.
- Right. Many problems with the current paradigm lie there too, and they prevent real generalization.
  That is why some people think AGI is not coming for now: https://www.lycee.ai/blog/why-no-agi-openai
- In my view, a large part of human learning comes from years of sensory input.
  Why should we expect a machine to generalize well without background knowledge?
- ARC amounts to a distribution over tuples of 4 images, and without a prior distribution the last image would still be uniformly distributed even given the first 3.
This reminds me of the Lex Fridman podcast with Marcus Hutter.
Joscha Bach also defined intelligence as the ability to model reality accurately, and I wonder whether lossless compression is itself intelligence or rather the optimally fitted model. Is there any difference between the two?
https://www.youtube.com/watch?v=E1AxVXt2Gv4
- For reference, François Chollet, the creator of ARC-AGI, said on the Lex Fridman podcast in 2020 that intelligence is not compression: https://youtu.be/-V-vOXLyKGw
- Intelligence is the ability to find a simple model that predicts a complex reality with high accuracy and low latency.
  So there are four axes to consider: simplicity, accuracy, latency, and the complexity of reality, and artificial intelligence will sit in some region of that space.
  There is actually a simple test that picks out intelligence: can you read the code of a C function and say how a change in the input affects the output?
  For complex algorithms you have to build an internal model. Otherwise, how would you execute in your head how qsort runs on a million items?
  The same method can tell whether a student truly understands or is only pretending to.
  The harder test is the reverse: producing the algorithm after seeing only a few input-output examples.
- Without having watched the whole podcast, I'll quickly add that Hutter's stance takes the form of the Hutter Prize[1], which in some ways closely resembles the goal of ARC-AGI but treats compression itself as the benchmark on the road to intelligence.
  [1] http://prize.hutter1.net/
I'm trying to distill the essence of this approach, but it seems buried under inessential details such as the choice of a particular compression method or prior distribution.
The main innovation seems to be that they built a "model" that can be optimized by gradient descent and whose optimum is the "simplest" model that memorizes the input-output relation.
Here "simplicity" specifically means "efficiently compressible", but more generally it is probably close to keeping model complexity as low as possible.
This is clearly different from standard machine learning. Usually you first fix a complexity budget by choosing the model structure and various complexity parameters, then train on data to find a solution that memorizes the input-output relation well.
This new approach turns machine learning around: it optimizes to make model complexity as low as possible while still memorizing the input-output pairs.
Being able to generalize from just 2 training examples is genuinely surprising, and I think it points strongly toward the right way to handle generalization.
The authors arrived at this structure via information theory, but I don't know whether that is the real essence.
The core insight seems to be that instead of searching for the best model within a fixed complexity budget, you can search for the model of minimum possible complexity.
- The idea of complexity minimization is not as new as it looks.
  A regularization term is often added to the loss objective in optimization, and such regularization can often be read as a penalty on complexity.
  By duality, the same objective can be viewed in several ways: minimizing a weighted sum of data error and complexity, minimizing complexity while keeping data error below a bound, or minimizing data error while keeping complexity below a bound.
  This kind of classic regularization seems to have fallen out of fashion lately.
  I don't think it plays a big role in most Transformer architectures, but it would be interesting if it came back in some form.
  Beyond that, this approach has so many new elements that it is hard to isolate what the performance actually comes from.
  For example, the neural network architecture itself looks carefully tuned to maximize performance on ARC-AGI-type tasks, and it is not clear how it would generalize beyond them.
- You seem right about the main ingredients, but this result looks quite ARC-AGI-specific to me.
  Every puzzle has a similar format, and the data that varies within a puzzle matches almost exactly the information needed to infer the rule.
  As you reduce the amount of information needed to describe the rule, the codec is forced to do almost what the rule itself does in order to minimize information loss.
  If each puzzle contained more noise or random data, I don't think this technique would work.
  Of course, past some point the puzzle shouldn't turn into "find where the puzzle is", but here it works because every example is pure information about the puzzle.
Interesting. I'm slowly coming to think that the future of machine learning may be less "machine learning" in the sense we have grown used to.
Less pretraining, data, and search; more direct representation, symbolic processing, constraint satisfaction, meta-learning, and the like.
The things that will matter less, namely pretraining, data, and so on, are messy, brute-force, and accidental.
Relying on them leaves you forever at the mercy of data quality; that is fine if the goal is data mining, but not if the goal is to model the root causes of the data.
As I understand it, these people are trying to expose the minimal representation of the solution/problem space.
By tracking the actual structure of the problem through equivariance, rather than hoping to catch it by chance across many solution examples, they extract something close to the puzzle's true underlying representation and solving method.
Great documentation and explanation. It also matches my own introspection, which is nice.
I believe that "intelligence is compressing information into an irreducible representation".
- That is a good way to express intelligence.
  https://en.wikipedia.org/wiki/Kolmogorov_complexity
  https://en.wikipedia.org/wiki/Solomonoff%27s_theory_of_induc...
  https://en.wikipedia.org/wiki/Minimum_description_length
  It seems related to these concepts, so I will look into it more deeply.
- If "intelligence is compressing information into an irreducible representation", I thought that was physics ;)
  https://en.wikipedia.org/wiki/Wigner%27s_classification
If ARC-AGI is a benchmark that tests the ability to infer abstract rules from minimal examples and generalize, then in the end it defines intelligence as the ability to compress information into a set of rules.
So it is fair to say that compression does the same job.
- That is not as circular or obvious as the claim might make it sound.
  I'm curious whether you have solved ARC-AGI problems yourself.
  The problems are quite subtle and test a wide range of abstract concepts.
  For reference, o1-preview scored 21% on the public evaluation, and the approach in the original post scores 34%.
A somewhat related Schmidhuber paper: https://arxiv.org/abs/0812.4360
"About 20 minutes per puzzle on an RTX 4070" seems to mean the 100-problem challenge would take 33.3 hours.
That exceeds the challenge's 12-hour target, but the approach itself is quite impressive.
Apart from the very carefully designed structure, this looks almost like a standard Bayesian deep learning approach.
