Coin3D Advances 3D Generation with Precise Control and Interactivity
Table of Links
Abstract and 1 Introduction
2 Related Work
3 Method and 3.1 3D Conditioning for Diffusion
3.2 Interactive Workflow for Generation and 3.3 Conditioned Reconstruction
4 Experiments and 4.1 Comparisons on Text-Based and Image-Based 3D Generation
4.2 Comparisons on 3D Object Generation, 4.3 Interactive Generation with Part Editing, and 4.4 Ablation Studies
5 Conclusion, Acknowledgments, and References
Supplementary Material
A. Implementation Details
B. More Discussions
C. More Experiments
2.1 3D Object Generation
3D object generation is a popular task in computer vision and graphics. Early works [Achlioptas et al. 2018; Dubrovina et al. 2019; Kluger et al. 2021] mainly focus on generating 3D representations of shapes, such as polygon meshes [Gao et al. 2022; Groueix et al. 2018; Kanazawa et al. 2018; Nash et al. 2020; Wang et al. 2018], point clouds [Achlioptas et al. 2018; Fan et al. 2017; Nichol et al. 2022; Yu et al. 2023], boundary representations [Hong et al. 2022; Jiang et al. 2022], voxels [Choy et al. 2016; Sanghi et al. 2022; Wu et al. 2017; Xie et al. 2019], or implicit fields [Chan et al. 2022, 2021; Cheng et al. 2023a; Gu et al. 2021; Jun and Nichol 2023; Li et al. 2023b; Mescheder et al. 2019; Park et al. 2019; Skorokhodov et al. 2022], which are learned from specific CAD datasets [Chang et al. 2015] and are often limited to certain categories (e.g., chairs, cars, etc.) due to limited network capacity and data diversity.

Recently, with the rapid development of large generative models, especially the great success of diffusion models [Ramesh et al. 2022; Rombach et al. 2022; Saharia et al. 2022], methods like DreamFusion [Poole et al. 2022], SJC [Wang et al. 2023a], and their follow-up works [Chen et al. 2023a; Lin et al. 2023; Melas-Kyriazi et al. 2023; Raj et al. 2023; Seo et al. 2023; Tang et al. 2023a,b; Xu et al. 2023c] attempt to distill 2D priors from the denoising process with a score distillation sampling (SDS) loss or its variants, which guides a per-shape neural reconstruction following the user's text prompt. Although these lines of works generalize to unbounded categories and yield diverse results with prompt engineering, they often suffer from unstable convergence due to noisy and inconsistent gradient signals, which frequently leads to degraded results or the "multi-face Janus problem" [Chen et al. 2023a]. Later, Zero123 [Liu et al. 2023c] analyzes the viewpoint bias of the general-purpose 2D latent diffusion model (LDM), and proposes to train a view-conditioned LDM with relative camera poses as conditions on the Objaverse dataset [Deitke et al. 2023], which shows promising results on image-to-3D tasks and has been widely adopted by follow-up 3D works [Liu et al. 2023d; Qian et al. 2023]. While fine-tuned on multi-view images, Zero123 still suffers from cross-view inconsistency, as the generated images cannot fulfill the requirements of faithful reconstruction.
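For reference, the SDS gradient that these optimization-based methods rely on has a standard form, introduced by DreamFusion [Poole et al. 2022]; the statement below follows that paper's notation (not defined elsewhere in this article), where a differentiable renderer g(θ) produces an image x from 3D parameters θ, z_t is the noised latent at timestep t, y is the text prompt, and w(t) is a timestep weighting:

```latex
\nabla_{\theta}\,\mathcal{L}_{\mathrm{SDS}}\bigl(\phi,\ \mathbf{x} = g(\theta)\bigr)
  = \mathbb{E}_{t,\,\epsilon}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_{\phi}(\mathbf{z}_t;\, y, t) - \epsilon\bigr)\,
      \frac{\partial \mathbf{x}}{\partial \theta}
    \right]
```

The residual term (ε̂_φ − ε) is a single-sample Monte Carlo estimate over random timesteps and noise draws, which is precisely the noisy, view-inconsistent gradient signal blamed above for unstable convergence and Janus artifacts.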
Therefore, later works like MVDream [Shi et al. 2023b], SyncDreamer [Liu et al. 2023a], Zero123++ [Shi et al. 2023a], and Wonder3D [Long et al. 2023] propose to enhance multi-view consistency, either by training with stacked multi-view attention [Long et al. 2023; Shi et al. 2023a,b] or by building synchronized volumes online as the diffusion condition [Liu et al. 2023a], and can usually produce highly consistent multi-view images or deliver 3D reconstructions within a few seconds. Recently, LRM [Hong et al. 2023] and its variants [Wang et al. 2023b; Xu et al. 2023b] propose to train a transformer-based regression model that directly produces a neural reconstruction given one or a few observed images. However, existing 3D object generation methods mainly focus on using text prompts (text-to-3D) or images (image-to-3D) as input, which can neither faithfully convey the desired 3D shape nor precisely control the generation in a 3D-aware manner. In contrast, our method is the first to add 3D-aware control to the multiview diffusion process without compromising generation speed, which enables an interactive workflow with the 3D proxy as the condition.
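To make the view-conditioning idea behind Zero123 and its multi-view successors concrete, below is a minimal, hypothetical PyTorch sketch; all names (`relative_pose_embedding`, `ViewConditionedDenoiser`) are ours, not the released code, and the cross-attention UNet of a real latent diffusion model is collapsed into an MLP for brevity. The pose encoding follows the description in the Zero123 paper: the relative camera offset (polar, azimuth, radius), with the azimuth wrapped into sin/cos.

```python
import torch
import torch.nn as nn

def relative_pose_embedding(polar, azimuth, radius):
    # Encode the relative camera offset; azimuth enters as sin/cos to
    # avoid the 2*pi wrap-around discontinuity.
    return torch.stack(
        [polar, torch.sin(azimuth), torch.cos(azimuth), radius], dim=-1
    )

class ViewConditionedDenoiser(nn.Module):
    # Toy stand-in for a view-conditioned latent-diffusion denoiser: the
    # reference-image embedding and the 4-D pose code are fused into one
    # conditioning vector (a real model injects it via cross-attention).
    def __init__(self, img_dim=768, cond_dim=256, latent_dim=4 * 32 * 32):
        super().__init__()
        self.fuse = nn.Linear(img_dim + 4, cond_dim)
        self.denoise = nn.Sequential(
            nn.Linear(latent_dim + cond_dim, 1024),
            nn.SiLU(),
            nn.Linear(1024, latent_dim),
        )

    def forward(self, noisy_latent, img_embed, pose):
        cond = self.fuse(torch.cat([img_embed, pose], dim=-1))
        x = torch.cat([noisy_latent.flatten(1), cond], dim=-1)
        return self.denoise(x).view_as(noisy_latent)  # predicted noise

# Usage: predict noise for the view 0.52 rad (~30 degrees) to the right
# of the reference image.
pose = relative_pose_embedding(
    torch.zeros(1), torch.full((1,), 0.52), torch.zeros(1)
)
model = ViewConditionedDenoiser()
eps_hat = model(torch.randn(1, 4, 32, 32), torch.randn(1, 768), pose)
```

Methods like MVDream and SyncDreamer extend this single-view interface by denoising several target views jointly, so that per-view predictions can exchange information during sampling.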
2.2 Controllable and Interactive Generation
Adding precise control to generative methods is crucial for content creation [Bao et al. 2023; Epstein et al. 2022; Yang et al. 2022a, 2024, 2022b, 2021]. Previous generative works [Bao et al. 2024; Chen et al. 2022; Deng et al. 2023; Hao et al. 2021; Melnik et al. 2024] mostly learn to manipulate latent feature maps to add control to the generation, but are limited to specific categories (e.g., human faces or natural scenes). Recent advances in diffusion models, such as ControlNet [Zhang et al. 2023] and T2I-Adapter [Mou et al. 2023], incorporate many kinds of 2D image hints (e.g., depth, normals, scribbles, human poses, color grids, etc.) to interactively control the image denoising process. However, similar controllable capabilities [Bhat et al. 2023; Cohen-Bar et al. 2023; Pandey et al. 2023] are still far from established in 3D generation. For 3D editing, recent works [Cheng et al. 2023b; Li et al. 2023a] propose to confine text-driven 3D generation to a desired region, but cannot support fine-grained geometric shape control. For controllable 3D generation, Latent-NeRF [Metzer et al. 2023] and Fantasia3D [Chen et al. 2023a] introduce coarse 3D shapes to guide the generation. However, these two works cannot guarantee stable convergence, and their generated results usually deviate from the given 3D shape (see Sec. 4.2), because they naively add control on the 3D representation regardless of the varying supervision from the 2D priors (i.e., the SDS loss).
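As a concrete illustration of the hint-injection mechanism that ControlNet popularized, here is a minimal PyTorch sketch; the class and function names (`HintInjector`, `zero_conv`) are ours and purely illustrative. A small trainable encoder processes the 2D hint, and its features enter the frozen backbone through a zero-initialized convolution, so at the start of training the control branch has exactly no effect.

```python
import torch
import torch.nn as nn

def zero_conv(channels):
    # 1x1 convolution initialized to all zeros: the ControlNet trick for
    # attaching a control branch without perturbing the pretrained prior.
    conv = nn.Conv2d(channels, channels, kernel_size=1)
    nn.init.zeros_(conv.weight)
    nn.init.zeros_(conv.bias)
    return conv

class HintInjector(nn.Module):
    def __init__(self, hint_channels=3, feat_channels=64):
        super().__init__()
        self.hint_encoder = nn.Sequential(  # stand-in for the trainable copy
            nn.Conv2d(hint_channels, feat_channels, 3, padding=1), nn.SiLU(),
            nn.Conv2d(feat_channels, feat_channels, 3, padding=1),
        )
        self.out = zero_conv(feat_channels)

    def forward(self, backbone_feat, hint):
        # At initialization this returns backbone_feat unchanged; control
        # strength grows only as the zero conv learns nonzero weights.
        return backbone_feat + self.out(self.hint_encoder(hint))

feat = torch.randn(1, 64, 32, 32)       # frozen backbone feature map
depth_hint = torch.randn(1, 3, 32, 32)  # e.g. a rendered depth map
assert torch.equal(HintInjector()(feat, depth_hint), feat)  # identity at init
```

The zero initialization is the design choice that lets the control signal be learned gradually without disturbing the pretrained model, which is also why such adapters can be trained with comparatively little data.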
Other works like Control3D [Chen et al. 2023b] only add control from 2D sketches/silhouettes rather than in 3D space. Moreover, all of these methods require a lengthy reconstruction process (e.g., from tens of minutes to hours) before the effect of editing or control can be inspected, which cannot meet the demands of interactive modeling. In contrast, our method directly incorporates 3D-aware control into the diffusion process, which not only ensures faithful and adjustable control over the 3D generation but also allows users to preview the controlled/edited 3D object interactively within a few seconds.
Authors:
(1) Wenqi Dong, Zhejiang University, and this work was conducted during his internship at PICO, ByteDance;
(2) Bangbang Yang, ByteDance, who contributed equally to this work with Wenqi Dong;
(3) Lin Ma, ByteDance;
(4) Xiao Liu, ByteDance;
(5) Liyuan Cui, Zhejiang University;
(6) Hujun Bao, Zhejiang University;
(7) Yuewen Ma, ByteDance;
(8) Zhaopeng Cui, Zhejiang University, and corresponding author.