Toon3D: Seeing Cartoons from New Perspectives

(toon3d.studio)

1 point by GN⁺ 2024-05-18 | 1 comment

Toon3D is a method that recovers camera poses and dense 3D structure from comic and animation images depicting the same scene, so that views can be synthesized from viewpoints that were never actually drawn
Hand-drawn scenes lack explicit 3D consistency, so existing SfM is likely to fail; Toon3D fits the cameras and the scene geometry jointly while deforming the images
The pipeline combines Marigold depth prediction, SAM transient-mask candidates, and human labeling in the Toon3D Labeler, feeding correspondences and transient regions into the alignment
The recovered dense point cloud is used to initialize Gaussian Splatting, and Nerfstudio-based optimization with depth regularization produces fly-through renders of comic scenes
It focuses on obtaining more stable camera poses and scene geometry than COLMAP, Bundle Adjustment, and DUSt3R, and is also applied to examples such as Airbnb rooms and painting reconstruction

Why SfM is hard for hand-drawn scenes

Humans can recognize the underlying 3D scene even in images that are not fully 3D-consistent, but machines struggle under the same conditions
Comic and animation images are often drawn without explicit geometric consistency, for the sake of storytelling and creative expression
Existing Structure-from-Motion (SfM) methods assume 3D consistency, so they fail badly on such hand-drawn images
Even with perfect correspondences, COLMAP cannot reconstruct non-geometric hand-drawn images, and Bundle Adjustment and DUSt3R also perform very poorly

An alignment method that absorbs inconsistencies

Toon3D recovers camera poses and scene geometry jointly while deforming geometrically inconsistent images
The core idea is to absorb the geometric inconsistencies between images into the deformation, so that the scene can be fit with a more consistent 3D structure
Structure information from monocular depth prediction guides this alignment process
A piecewise-rigid deformation is optimized based on manually labeled keypoints, recovering camera poses and dense geometry
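
To make the role of depth concrete, here is a minimal sketch of how a per-image depth map can be lifted into 3D points with a pinhole camera model. It is an illustration, not Toon3D's code: the function name `backproject` and the intrinsics `fx`, `fy`, `cx`, `cy` are assumptions, and it ignores the fact that monocular depth is generally only defined up to an unknown scale and shift.

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Lift an HxW depth map to an (H*W, 3) array of camera-space points.

    Assumes a pinhole camera with hypothetical intrinsics (fx, fy, cx, cy)
    and treats the depth values as distances along the camera z axis.
    """
    depth = np.asarray(depth, dtype=float)
    h, w = depth.shape
    # Pixel coordinates: u runs along columns, v along rows.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

# Example: a flat 2x3 depth map two units in front of the camera.
points = backproject(np.full((2, 3), 2.0), fx=1.0, fy=1.0, cx=1.0, cy=0.5)
```

Labeled keypoints can be lifted the same way by reading the depth at each keypoint's pixel, which gives 3D correspondences for the alignment to work with.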

Processing pipeline

Depth for each image is predicted with Marigold
Transient mask candidates come from SAM
The user labels correspondences between images and marks transient regions in the Toon3D Labeler
In the optimization stage, camera poses are aligned and the images are warped to obtain corrected perspective cameras
Finally, Gaussians are initialized from the aligned dense point cloud and refinement is run
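
As a rough intuition for the pose-alignment step, the sketch below uses the classical Kabsch algorithm to find the rigid rotation and translation that best map one set of 3D correspondences onto another. This is a swapped-in textbook technique, not the optimization Toon3D runs (which also warps the images); the function name `kabsch` is an assumption for illustration.

```python
import numpy as np

def kabsch(src, dst):
    """Least-squares rigid transform (R, t) such that dst ~= R @ src + t.

    src, dst: (N, 3) arrays of corresponding 3D points.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

# Example: recover a 90-degree rotation about z plus a translation.
R_true = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
t_true = np.array([1.0, 2.0, 3.0])
src = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
dst = src @ R_true.T + t_true
R, t = kabsch(src, dst)
```

With hand-drawn inputs no single rigid transform fits all correspondences exactly, which is the gap the image deformation is meant to absorb.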

Simultaneous optimization of cameras and deformation

Toon3D's two main objectives are camera alignment and deformation alignment
The camera alignment objective recovers the camera parameters
The deformation alignment objective warps the mesh for a closer alignment
In the actual optimization, both objectives are fit jointly
The method visualization includes several layers, such as cameras, sparse correspondences, warping meshes, point clouds, and Gaussians
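
The idea of fitting a global alignment and a local deformation together can be shown on a toy problem. The sketch below is a deliberately simplified analogy, not Toon3D's objective: the "camera" is reduced to a single 2D translation `t`, the "deformation" to free per-point offsets `d`, and the weight `lam` (an assumed parameter) penalizes large offsets so that the translation explains as much of the motion as it can.

```python
import numpy as np

def fit_translation_and_offsets(src, dst, lam=1.0, lr=0.1, steps=500):
    """Jointly minimize sum ||src + t + d - dst||^2 + lam * sum ||d||^2.

    t is a global translation (stand-in for camera alignment) and d holds
    per-point offsets (stand-in for deformation). Both are updated in the
    same gradient step, i.e. the two terms are fit simultaneously.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    t = np.zeros(src.shape[1])
    d = np.zeros_like(src)
    for _ in range(steps):
        err = src + t + d - dst                # residual per correspondence
        t -= lr * 2.0 * err.mean(axis=0)       # averaged gradient w.r.t. t
        d -= lr * (2.0 * err + 2.0 * lam * d)  # gradient w.r.t. each offset
    return t, d

# Example: a shift of (2, -1) plus small per-point inconsistencies.
src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
wobble = np.array([[0.2, 0.0], [-0.2, 0.0], [0.0, 0.1], [0.0, -0.1]])
dst = src + np.array([2.0, -1.0]) + wobble
t, d = fit_translation_and_offsets(src, dst)
```

Without the penalty (`lam = 0`) the split between `t` and `d` would be ambiguous, since any translation could be traded for a constant offset; the regularizer is what keeps the deformation small and the global fit meaningful.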

Novel-view synthesis and fly-through renders

Toon3D first recovers camera poses and an aligned point cloud
It then initializes Gaussians from the dense point cloud and optimizes Gaussian Splatting with the recovered cameras
The implementation is based on Nerfstudio and includes depth regularization
The result can be viewed as a fly-through render of the comic scene
Example scenes include Bob's Burgers, Family Guy, SpongeBob SquarePants, Rick and Morty, Simpsons, Spirited Away, Futurama, Avatar, BoJack Horseman, Magic School Bus, and Scooby-Doo
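
For readers unfamiliar with what initializing Gaussians from a point cloud involves, here is a minimal sketch. It is not the Nerfstudio or Toon3D implementation: the function name `init_gaussians`, the dictionary layout, and the starting opacity of 0.1 are assumptions, and the scale rule (mean distance to the nearest neighbors) is a common heuristic rather than a confirmed detail of this project.

```python
import numpy as np

def init_gaussians(points, colors, k=3):
    """Build initial isotropic Gaussians from a colored point cloud.

    Each point becomes one Gaussian centered on it. The initial size is
    the mean distance to its k nearest neighbors, so dense regions start
    with small Gaussians and sparse regions with large ones.
    Brute-force O(N^2) distances: fine for a sketch, not for dense clouds.
    """
    points = np.asarray(points, dtype=float)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)             # ignore distance to self
    knn = np.sort(dist, axis=1)[:, :k]
    scale = knn.mean(axis=1)
    return {
        "means": points,
        "colors": np.asarray(colors, dtype=float),
        "scales": np.repeat(scale[:, None], 3, axis=1),
        "opacities": np.full(len(points), 0.1),  # assumed starting value
    }

# Example: four points on a unit square, all the same gray.
pts = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]])
gaussians = init_gaussians(pts, np.full((4, 3), 0.5), k=2)
```

Optimization then adjusts these positions, sizes, colors, and opacities so that rendering from each recovered camera reproduces the corresponding drawing.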

Toon3D Dataset and labeling tool

The Toon3D Dataset consists of multiview images from comics and animation
The dataset includes reliable sparse correspondence annotations
A user-friendly Toon3D annotation tool is used for the annotation work
The recovered point cloud is fed into novel-view synthesis methods, so cartoons can be viewed from viewpoints that were never actually drawn
The page visualizes the point clouds and recovered cameras of 12 comic scenes

Reconstructing the interior of the Rick and Morty house

The interior of the Rick and Morty house is reconstructed by labeling between walls and ceilings to connect the rooms
The first video shows the point cloud, the cameras, and the custom labeling interface
In the second video, a slider lets you walk through the interior of the house
The image from the nearest camera is shown at the bottom right of the screen

Sparse-view and other input examples

Toon3D can also reconstruct scenes with few images and large viewpoint changes
Where COLMAP may fail, human-labeled correspondences can be added with the Toon3D Labeler
Fly-through renders are presented for two rooms of an Airbnb listing, “Living room” and “Bedroom 2”
COLMAP could not recover all of the cameras, but the labels can make COLMAP succeed
In terms of scene completion, Toon3D gives the best result

Warping visualization and painting reconstruction

Comics are hand-drawn, so the images have to be warped to achieve 3D consistency
A video shows how the warping progresses during alignment optimization
There is also a visualization comparing the original drawing, the warped drawing, and an overlay of the two images
Blurry areas indicate where more warping occurred
Toon3D also applies to hand-drawn paintings; after depth is predicted for each image, the point clouds are aligned and warped, and a video is produced with Gaussian refinement

Public resources

arXiv: Toon3D paper
Code: implementation code
Toon3D Labeler: tool for labeling correspondences and transient regions
Demo: Hugging Face demo
Overview Video: video covering the problem setup and method overview

1 comment

GN⁺ 2024-05-18

Hacker News opinions

It's interesting that Futurama's Planet Express building is given as an example of 3D inconsistency
The exterior actually looks closer to something computer-generated from a 3D model. If you watch the series, there are frequent establishing shots that move smoothly and intricately around the building
- Agreed. Most or all of the Planet Express building and spaceship was 3D-rendered from the early seasons, and 3D rendering was also used in some scenes with Bender in space when complex, continuous perspective changes were needed
  Non-photorealistic 3D art (NPR) has been used in animation far longer than we tend to think. I recently rewatched Disney's 1988 animation "Oliver and Company" and was surprised that the cars and buildings were "cel-shaded" 3D models. At first I thought it might be a remastered version, but on looking it up I found that it was the first Disney film to use CGI extensively[0], and that what I saw was in the original
  The page I found says: "This was the first Disney movie to make heavy use of computer animation. CGI effects were used for making the skyscrapers, the cars, trains, Fagin's scooter-cart and the climactic Subway chase. It was also the first Disney film to have a department created specifically for computer animation."
  References
  0: https://disney.fandom.com/wiki/Oliver_%26_Company
- I think 3D in shows or games often uses tricks to look good to the viewer
  I remember seeing an article about what 3D animators do to make things look natural. For example, making a character 9 feet tall because in a true-to-scale setup they look too short as the camera passes; or making an arched door huge but having it look normal in a particular perspective shot; or standing a short character on a blue box off-screen so the height difference doesn't look too extreme and odd. Sometimes a corridor is actually 1,000 feet long but, because of the way the camera moves through it, looks like 100 feet in-world, and every door in that corridor is 18 feet tall
  If works like Futurama used such techniques too, then reverse-engineering them like this to reconstruct the 3D space the animator worked in could reveal giant doors, 9-foot-tall people, and non-Euclidean corridors. Just because it looks smooth as the camera passes doesn't mean the actual 3D model will make sense from other viewpoints
- These days, even for animation that doesn't look like 3D animation, it's common for a 3D model to be involved somewhere in the production pipeline
  Even when there's no digital 3D model, studios keep physical models of key locations for animators to reference
- Right. Futurama used composited 3D elements from its first episode in 1999, and vehicles were almost always 3D
- The exterior doesn't seem to have been generated from a single 3D model, but rather from several 3D models depicting the same object
  It may have changed over time or differed from scene to scene, and you can think of it like the Star Trek Enterprise model
It's impressive, but it's hard to see what the real-world application would be
2D drawings generally don't have a consistent 3D space, and the paper acknowledges this, but it doesn't seem to have overcome that problem in any useful sense. As soon as you move away from the originally drawn camera position, the scene's consistency degrades considerably
- Futurama and Family Guy, for example, use 3D rendering for vehicles, render them in a cartoon style, and then composite them with flat 2D animation
  Work like that could be an application for this
  Another use case could be game studios making 3D licensed games based on 2D cartoons: using it as a visualization tool during planning and development to iterate quickly and get a reference for how the original 2D would translate into 3D
- SpongeBob openly breaks the rules of 3D space. It's a series where fire appears underwater, after all
  Both the writers and the artists were heavily inspired by Looney Tunes, where such rules get broken because breaking them is funny
- A more refined version of this could be used to convert cartoons to stereoscopic video
  Though instead of this mapping process, it might be better to just use depth prediction and fill the empty space with image generation
- To me this looks more like a way to showcase and advance the technology
  3D modeling for environments like these doesn't take much effort, so I doubt there will be any real application in this context
- With further development, this could be used to produce video games for many series
  Rough as it is, it seems to carry over the original art better than some implementations of cartoon-based games
The idea of building a 3D space from inconsistent source images is really interesting
A few years ago I tried something similar in a very crude and poor way: not just on inconsistent spaces with no clear right answer, but on purely abstract, non-spatial images that never try to depict 3D space in the first place. It was an attempt to turn abstract paintings like those of Kandinsky or Pollock into explorable virtual reality spaces. Obviously there's no right answer to what "walking inside a Pollock painting" means; the goal was just to see what happens when you force it
The workflow was:
1. Start from a single abstract painting as the source image
2. Generate different "viewpoints" of the "scene" with SinGan
3. Apply projects like 3d-photo-inpainting or Ken Burns to the original and SinGan images, outputting zoom/rotate/pan video via monocular depth mapping
4. Feed the 3d-photo-inpainting frames into a photogrammetry app. NeRF didn't exist yet, and I cranked up all the settings to tolerate as many errors and inconsistencies as possible
5. Pray that the photogrammetry process doesn't blow up. 9 times out of 10 it crashed after 24 hours, which was brutal
I probably posted examples on Twitter, but I can't find the right search terms. Still, even 2019-level depth mapping produced some pretty fun videos from abstract paintings: https://x.com/jonathanfly/status/1174033265524690949 The closest thing is photogrammetry results from NVIDIA GauGAN videos with no frame-to-frame consistency: https://x.com/jonathanfly/status/1258127899401609217
I'm curious whether this project could improve on that idea. I might try it this weekend
- Is there a technique or library that can take an image of a 3D environment or a drawing of a room and detect a rough mesh highlighting the floor, walls, and obstacles?
After buying a Quest 2, I got into photogrammetry and looked at the whole pipeline for building a 3D model from photos of an object taken from different angles
I used MeshRoom and some software to clean up the mesh and bring it into Unity
As far as my shallow understanding goes, the core of bringing something like walking around an object in Unity into VR is producing a clean mesh. I haven't looked closely at 3D-model-producing tools like the ones in this article, but they're closer to a point cloud of the 3D space. They don't generate a 3D mesh
One tool I came across while researching was https://developer.nvidia.com/blog/getting-started-with-nvidi... but it doesn't produce a mesh either. It's more like a video, and doesn't seem like something you can directly walk around in VR
My hidden motivation was to clone something like Matterport, or to build models and sell them to real estate companies. As I understood it, the big gap, and the reason I lost interest, was that I wasn't confident about how to automate the step that generates a clean mesh from a series of camera photos. That felt like the most labor-intensive part. I later heard there are machine learning models that can do this step, but I don't know much about that side
- Using Unreal + Nanite + PCVR might be a better option
  Nanite can handle very complex meshes and simplify them algorithmically in real time. It's essentially an advanced LOD system. I don't know its limits, but it's worth a try. For photogrammetry I'd strongly recommend Reality Capture. It's very cheap and you pay per scan
- NeRF is somewhat last year's technology, and the hype these days is around Gaussian splats
  As I understand it, these technologies take a set of images as input and train a model, and that model in some sense learns the best way to render the images as a scene model. Gaussian splats represent the images as a kind of "lumps" in space, and every image should be renderable from that same set of lumps seen from a particular viewpoint. So if the splat positions are set so that every image renders correctly, the scene can be reproduced
  For now this training is very expensive and has to be redone for each model, but the output can be explored in real time
  The photogrammetry approach used by Matterport and the like is the older way and needs much higher-quality input data, whereas modern approaches seem to work with less and lower-quality data
- https://www.reddit.com/r/sdforall/comments/13lenfm/free_seam...
  https://github.com/3DTopia/OpenLRM
  It's said to be inspired by NeRF, but the underlying paper seems to have opted for a vision transformer. The open source version appears to use Meta's DINO as one of its core components
- Is this like Rhino's shrink wrap?
It's pretty amazing that you can take a drawing of a scene made from someone's imagination and build a 3D model from it, even a poor one
One can imagine a future where an artist makes just a few sketches of a scene and gets an accurate 3D model
Or a 2D artist sketches just a few poses and a well-structured 3D model and textures come out automatically
There's a lot of concern in the industry about the impact of AI and tools like this on artists, but instead of language-prompt-based rendering, a future where machine learning systems collaborate more directly with artists also seems conceivable
My feelings on the ethics debate around AI training aren't clear-cut. I'm less concerned with how the training was done than with what effect it will have on people. Even if a model trained in a completely "ethical" way produces perfect art and artists become a niche profession, that could still be a bad outcome for civilization as a whole, because I think there is value in humans making art, and value for society in that work remaining somewhat sustainable
On the other hand, the results people are producing with image models are amazing too, so I'm not sure. Ideally people could be supported in doing the work they want to do even when there's no market for it, but the world isn't ready for that yet
I'm not a graphic artist, but I think an illustrator's work involves a lot of creative expressive techniques for conveying complex meaning
That said, the clumsy 3D space reconstruction in the videos reminds me of the recent hype around large language models
That is, the output clearly bears some relation to the “truth” or “facts” of the source material, but it isn't accurate enough to be considered useful as source material for further work
- I've said this before: I hope an LLM will be able to write new episodes with the same feel as existing ones
  It would be really fun to watch “new” episodes of old cartoons. Of course, the copyright chaos that would follow is another matter
I was surprised at how poorly this reproduces the view from a given image's own viewpoint
Looking at the Magic School Bus example at the bottom, for instance, it seems the algorithm could be adjusted to trust the image more
- A large part of art lies in understanding the difference between what is correct in reality and what feels correct
  Even in the 3D animation and films I mostly work on, backgrounds or blurred foreground objects are often distorted and oddly placed to look right, even though they'd be absurd if mapped to a real-world construction. 2D art is even less bound to real-world representation
  Looking at applications like this makes you appreciate how remarkable our brain is at forming concepts from relatively abstract representations, and how impressive artists' ability to work in that loosely defined territory is. A scene may appear to have a consistent perspective to the viewer, yet the sofa and side table in the background may be drawn as if shot with a 120mm lens while the foreground is deliberately drawn like a tight 30mm lens. This can still look fine, because we don't need to infer the realistic 3D space the characters exist in; we only need to understand that they are in such a space. We know what it is like to be in a space, and how people interact with it
  Good art provides just enough to convey the core idea, makes that the focus of the message, and then lets the brain unconsciously make connections and add context to build the full “experience”. Everything from the type of sofa and side table to the often bent or exaggerated scale and relationships between objects can be a layer of communication serving the intended artistic effect, and in many cases it has no consistent real-world representation at all. And objects certainly get moved in any given shot to help composition or highlight an interaction. If you notice it, it's a continuity issue; if you don't, the job was done well. In most cases nobody notices, and they just feel they've seen a world whose composition is convincing from every angle
  An algorithm that looks at the lines and tries to find a real-world scenario matching that representation is probably trying to build something that could never have existed in a consistent form in the first place
I don't understand why a site with so many videos has autoplay and infinite loop turned on for all of them
I was watching a video on another screen, and it starts lagging every time I open the site
- Is this a Chrome issue? On Firefox on Windows the videos don't autoplay
- That's probably why my phone froze while loading it in Firefox on iPhone
  It only recovered after a hard restart
If you showed the Spirited Away example to Miyazaki, he would probably call it an insult to life itself
- For the curious, this is a reference to an old video: https://www.youtube.com/watch?v=ngZ0K3lWKRc
  So it's not an exaggeration
I'm surprised that apparently no 3D animator was consulted before this was written. The sentence below is simply wrong

The hand-drawn images are usually faithful representations of the world, but only in a qualitative sense, since it is difficult for humans to draw multiple perspectives of an object or scene 3D consistently. Nevertheless, people can easily perceive 3D scenes from inconsistent inputs!
It's true that it's hard for human artists to maintain perfect geometric consistency. But that isn't why 3D scenes in 2D animation are geometrically inconsistent. The reason is that the artist stylizes and emphasizes the 3D scene for a particular artistic intent. This is especially true of surreal works like SpongeBob, and even King of the Hill has stylizations like a "living room perspective" and a "kitchen perspective". The artist isn't trying to make things look realistic but to make them look good. And the aim isn't to have humans reconstruct a perfect 3D image either, but to spark our 3D imagination. That's a completely different task
Pixar and other high-quality 3D animation studios deliberately distort a scene's actual geometry for cinematic effect. A child seen from an adult's viewpoint may be rendered with an oddly long neck and a short, squat torso, because the animator deliberately exaggerates visual foreshortening to emphasize the emotional effect of a small child. Realistic perspective is just boring. Such techniques are everywhere in Pixar films, and that's why they look far better than the output of low-cost studios that merely move a virtual camera around a Euclidean 3D space
I don't want to get into technical details, but it seems the authors have missed the artistic core
- As someone working in this field, my palm and my face have never been closer
  There's nothing wrong with the project in itself. Research is research, and they aren't packaging it as a "solved problem". But in certain people on the tech side, AI image tools breed a completely baseless "we've solved art" swagger. As a result they throw around unfounded assumptions about basic art principles arrogantly, sometimes even in a commanding tone
  I've worked in software for a long time, and the arrogance of software development is nothing new to me; I also know it can sometimes be useful. But I've rarely seen such intense collective overconfidence about a single topic within the software world
- It's especially funny given that the same thing happens with real TV cameras
  As an easy example, many sitcom sets that look like square rooms are actually trapezoids, with walls meeting at obtuse angles. Almost nobody notices
- Even setting aside stylization done for specific artistic reasons, in works like these distortion is always unavoidable simply because of the ordinary needs of the camera, or "camera"
  This was even more so in pre-HD works. People or characters had to be fit quite tightly into the frame to get close enough for facial expressions and gestures to be readable. Dig through even the most "realistic" and sober shows of that era and you'll eventually find moments where furniture, or even walls, were quietly moved to make a particular shot workable
