[Paper Reading] [Skim] Three Papers on Generative Autonomous-Driving Scene Generation: Bevstreet, DisCoScene, BerfScene

Published 2024-04-21 08:35. Tags: autonomous driving, NeRF, computer vision

Table of contents

1. Street-View Image Generation from a Bird’s-Eye View Layout
2. DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-aware Scene Synthesis
3. BerfScene: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation (follows DisCoScene)

1. Street-View Image Generation from a Bird’s-Eye View Layout

1.1 Problem introduction

As the title suggests, this paper builds a mapping from a BEV (Bird’s-Eye View) layout to street-view images.


Concretely, the input (BEV) is a two-dimensional, top-down representation of a three-dimensional environment. In the BEV diagram, squares of different colors represent different objects or road features, such as vehicles, pedestrians, and lane lines. The green square is the ego vehicle, which has three front-facing cameras.

The task is to generate three street-view images that are aligned with the BEV, i.e., that respect the relative positions of these square objects.
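To make the input concrete, here is a minimal sketch of how such a BEV layout could be stored as a grid of class ids. The class ids, the grid resolution, the axis convention, and the function name `rasterize_bev` are all my own assumptions for illustration; the paper's actual encoding may differ.

```python
import numpy as np

# Hypothetical class ids for this sketch (not taken from the paper).
BACKGROUND, ROAD, VEHICLE, EGO = 0, 1, 2, 3


def rasterize_bev(boxes, grid_size=64, extent_m=32.0):
    """Paint axis-aligned boxes onto a top-down grid centred on the ego vehicle.

    boxes: list of (class_id, x_center, y_center, length, width) in metres,
           with x pointing forward and y pointing to the left of the ego vehicle.
    Returns an integer array of shape (grid_size, grid_size). Row 0 is the
    farthest line ahead, column 0 is the farthest line to the left.
    Later boxes overwrite earlier ones.
    """
    bev = np.full((grid_size, grid_size), BACKGROUND, dtype=np.int64)
    metres_per_cell = 2.0 * extent_m / grid_size

    for class_id, xc, yc, length, width in boxes:
        # Convert the metric box edges to (clipped) grid indices.
        row_min = int(np.floor((extent_m - (xc + length / 2)) / metres_per_cell))
        row_max = int(np.ceil((extent_m - (xc - length / 2)) / metres_per_cell))
        col_min = int(np.floor((extent_m - (yc + width / 2)) / metres_per_cell))
        col_max = int(np.ceil((extent_m - (yc - width / 2)) / metres_per_cell))
        row_min, col_min = max(row_min, 0), max(col_min, 0)
        row_max, col_max = min(row_max, grid_size), min(col_max, grid_size)
        bev[row_min:row_max, col_min:col_max] = class_id
    return bev


layout = rasterize_bev([
    (ROAD, 0.0, 0.0, 64.0, 8.0),     # a straight road through the scene
    (VEHICLE, 10.0, 2.0, 4.0, 2.0),  # a car ahead, slightly to the left
    (EGO, 0.0, 0.0, 4.0, 2.0),       # the ego vehicle at the centre
])
```

Moving or deleting an entry in the list changes the grid, which is the kind of layout edit the generator is expected to follow.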

Generating images from a “layout” has to take the following factors into account:

Cameras with an overlapping field-of-view (FoV) must show the overlapping content consistently.
The visual styling of the scene also needs to be consistent, such that all virtual views appear to be created in the same geographical area (e.g., urban vs. rural), at the same time of day, with the same weather conditions, and so on.
In addition to this consistency, the images must correspond to the HD map, faithfully reproducing the specified road layout, lane lines, and vehicle locations.

1.2 Why

It is the first attempt to explore the generative side of BEV perception for driving scenes.

1.3 How

Methods

In the pipeline, the BEV layout and the source images are encoded and fed into an autoregressive transformer, together with direction and camera information that helps the model understand the spatial arrangement. The output is a new set of multi-view images.
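To remind myself what "autoregressive" means here, below is a toy sketch of the decoding loop: image tokens are sampled one at a time, each conditioned on the BEV tokens and on everything generated so far. It assumes the inputs are discrete tokens. `toy_next_token_logits` is a hypothetical stand-in for a trained transformer, not the paper's network, and the vocabulary size is arbitrary.

```python
import numpy as np


def toy_next_token_logits(bev_tokens, image_tokens, vocab_size):
    """Stand-in for a trained transformer: returns logits for the next token.

    A real model would attend over the BEV tokens and all previously
    generated image tokens. Here the context is only hashed into a seed so
    that the output depends on the prefix in a deterministic way.
    """
    context = np.concatenate([bev_tokens, image_tokens]).astype(np.int64)
    seed = (int(context.sum()) * 31 + len(context)) % (2**32)
    return np.random.default_rng(seed).normal(size=vocab_size)


def generate_image_tokens(bev_tokens, num_tokens, vocab_size=16,
                          temperature=1.0, seed=0):
    """Sample image tokens one at a time, each conditioned on the prefix."""
    rng = np.random.default_rng(seed)
    image_tokens = np.empty(0, dtype=np.int64)
    for _ in range(num_tokens):
        logits = toy_next_token_logits(bev_tokens, image_tokens, vocab_size)
        logits = logits / temperature
        probs = np.exp(logits - logits.max())  # numerically stable softmax
        probs /= probs.sum()
        next_token = rng.choice(vocab_size, p=probs)
        image_tokens = np.append(image_tokens, next_token)
    return image_tokens


tokens = generate_image_tokens(np.array([1, 2, 3]), num_tokens=8)
```

In a real system the sampled tokens would then be decoded back into pixels by a separate image decoder; that part is omitted here.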
Experiments
Three metrics are used.

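FID (Fréchet Inception Distance) measures the quality and diversity of the generated images; lower is better. Road mIoU and Vehicle mIoU measure how well the road and vehicle regions of the generated scene overlap with those in the BEV input, which verifies that the relative positions in the layout are respected.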
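As a reminder of what the IoU-based metrics compute, here is a minimal sketch for binary masks. How the paper obtains the predicted masks from the generated images, and over what it averages, is not covered in these notes, so the averaging over samples below is my assumption.

```python
import numpy as np


def binary_iou(pred_mask, target_mask):
    """Intersection over union of two boolean masks of the same shape."""
    pred = np.asarray(pred_mask, dtype=bool)
    target = np.asarray(target_mask, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return float("nan")  # the class is absent from both masks
    return float(np.logical_and(pred, target).sum() / union)


def mean_iou(pred_masks, target_masks):
    """Average the IoU over samples, ignoring samples where it is undefined."""
    scores = [binary_iou(p, t) for p, t in zip(pred_masks, target_masks)]
    return float(np.nanmean(scores))


# Toy example: the predicted road band is shifted down by one row.
target = np.zeros((4, 4), dtype=bool)
target[1:3, :] = True
pred = np.zeros((4, 4), dtype=bool)
pred[2:4, :] = True
road_iou = binary_iou(pred, target)  # 4 shared cells out of 12 in the union
```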
Scene editing is achieved by changing the BEV layout.

1.4 My takeaway

How can the ability of an autoregressive transformer be exploited? Why use it rather than other generative models?
I now know what BEV is.

2. DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-aware Scene Synthesis

2.1 What

An editable 3D generative model that uses object bounding boxes, without semantic annotation, as a layout prior. It allows high-quality scene synthesis and flexible user control of both the camera and the scene objects.

2.2 Why

Existing generative models focus on individual objects, lacking the ability to handle non-trivial scenes.
Some works, such as GSN, can only generate whole scenes and do not support object-level editing, because NeRF has no explicit definition of objects.
GIRAFFE explicitly composites object-centric radiance fields to support object-level control. Yet, it works poorly on mixed scenes due to the absence of proper spatial priors.
Interesting references (numbered as in the paper):

[17] LayoutTransformer: Layout generation and completion with self-attention.
[26] LayoutGAN: Generating graphic layouts with wireframe discriminators.
[58] BlockPlanner: City block generation with vectorized graph representation.

2.3 How


Bounding boxes serve as layout priors for generating the objects. The generated objects are combined with a generated background and passed through neural rendering. In addition, an extra object discriminator for local discrimination is added, which gives better object-level supervision.
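One way I picture "bounding boxes as layout priors": each object lives in its own canonical frame, and world points are mapped into that frame before the object's radiance field is queried. The sketch below only shows this coordinate change. The box parameterization (center, size, yaw about the vertical axis) and the function names are my assumptions, not necessarily the paper's.

```python
import numpy as np


def world_to_box(points, center, size, yaw):
    """Map world points (N, 3) into a box's canonical frame.

    In the canonical frame the box spans [-1, 1] along each axis.
    yaw is the rotation of the box about the vertical (z) axis, in radians.
    """
    c, s = np.cos(yaw), np.sin(yaw)
    # Box-to-world rotation about z.
    rot = np.array([[c, -s, 0.0],
                    [s, c, 0.0],
                    [0.0, 0.0, 1.0]])
    # Right-multiplying row vectors by rot applies the inverse rotation.
    local = (np.asarray(points, dtype=float) - np.asarray(center, dtype=float)) @ rot
    return local / (np.asarray(size, dtype=float) / 2.0)


def inside_box(points, center, size, yaw):
    """Boolean mask of the points that fall inside the box."""
    return np.all(np.abs(world_to_box(points, center, size, yaw)) <= 1.0, axis=1)


# A 4 m x 2 m x 2 m box at (10, 0, 0), rotated by 90 degrees.
query = np.array([[10.0, 1.5, 0.0],   # along the rotated long axis: inside
                  [11.5, 0.0, 0.0]])  # along the rotated short axis: outside
mask = inside_box(query, center=(10.0, 0.0, 0.0), size=(4.0, 2.0, 2.0), yaw=np.pi / 2)
```

Editing an object then amounts to changing its box (moving, rotating, or removing it) while the object's own field stays the same.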

2.4 My takeaway

Is it possible to drop the manually annotated bounding boxes and instead automatically identify and regenerate the corresponding regions in a Gaussian-based representation?

3. BerfScene: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation (follows DisCoScene)

3.1 What

Incorporating an equivariant radiance field with the guidance of a BEV map, this method allows us to produce large-scale, even infinite-scale, 3D scenes via synthesizing local scenes and then stitching them with smooth consistency.

This can be understood as stitching together overlapping patches of a BEV map.
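To convince myself why equivariance matters for stitching, here is a toy sketch. A 3x3 mean filter stands in for the local generator (it is translation-equivariant, but it is of course not the paper's network), and overlapping patches are merged by simple averaging, which is my own assumption about the blending. Because the stand-in generator is equivariant, the overlapping regions agree exactly and the stitched result matches what a single pass over the whole map would give.

```python
import numpy as np


def generate_patch(bev_patch):
    """Placeholder local generator: 3x3 mean filter, valid region only.

    An (h, w) patch yields an (h - 2, w - 2) feature patch.
    """
    h, w = bev_patch.shape
    out = np.zeros((h - 2, w - 2))
    for dr in range(3):
        for dc in range(3):
            out += bev_patch[dr:dr + h - 2, dc:dc + w - 2]
    return out / 9.0


def stitch(bev, patch=8, stride=4):
    """Generate overlapping local patches and average them where they overlap."""
    h, w = bev.shape
    # These conditions guarantee that the patches cover the whole output.
    assert stride <= patch - 2
    assert (h - patch) % stride == 0 and (w - patch) % stride == 0
    acc = np.zeros((h - 2, w - 2))
    count = np.zeros((h - 2, w - 2))
    for r in range(0, h - patch + 1, stride):
        for c in range(0, w - patch + 1, stride):
            feat = generate_patch(bev[r:r + patch, c:c + patch])
            acc[r:r + patch - 2, c:c + patch - 2] += feat
            count[r:r + patch - 2, c:c + patch - 2] += 1
    return acc / count


big_bev = np.random.default_rng(0).random((16, 24))
stitched = stitch(big_bev)
whole = generate_patch(big_bev)  # one pass over the full map
seamless = bool(np.allclose(stitched, whole))
```

With a generator that is not equivariant, the patches would disagree in the overlap and averaging would leave visible seams.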

3.2 Why

Generating large-scale 3D scenes cannot simply apply existing 3D object synthesis techniques since 3D scenes usually hold complex spatial configurations and consist of many objects at varying scales.
Previous approaches often relied on scene graphs, whose unstructured topology makes them hard to process.
DisCoScene makes the entire scene more complex to interpret and faces scalability challenges because it relies on bounding boxes.
BEV maps can clearly specify the composition and scales of objects but carry no information about their detailed visual appearance. Recent attempts such as InfiniCity and SceneDreamer try to avoid the ambiguity of BEV maps, but they are inefficient.

3.3 How


To integrate the prior information provided by the BEV map into the radiance field, the researchers introduce a generator $U$, which produces a 2D feature map conditioned on the BEV map. The generator $U$ adopts a network structure that combines a U-Net architecture with StyleGAN blocks.
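These notes do not say how the radiance field consumes the 2D feature map. One plausible reading, and purely my assumption, is that each 3D sample point looks up the feature under its ground-plane position and combines it with its height. The sketch below shows that lookup with bilinear interpolation; the function names and the pixel-unit coordinates are hypothetical.

```python
import numpy as np


def sample_bev_feature(feature_map, x, y):
    """Bilinearly interpolate feature_map (H, W, C) at a continuous position.

    x is the column coordinate and y the row coordinate, both in pixel units.
    Positions outside the map are clamped to the border.
    """
    h, w, _ = feature_map.shape
    x = float(np.clip(x, 0.0, w - 1.0))
    y = float(np.clip(y, 0.0, h - 1.0))
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0
    top = (1 - wx) * feature_map[y0, x0] + wx * feature_map[y0, x1]
    bottom = (1 - wx) * feature_map[y1, x0] + wx * feature_map[y1, x1]
    return (1 - wy) * top + wy * bottom


def point_condition(feature_map, point_xyz):
    """Concatenate the BEV feature under a 3D point with the point's height."""
    x, y, z = point_xyz
    return np.concatenate([sample_bev_feature(feature_map, x, y), [z]])


# A 2 x 2 map with one channel: [[0, 1], [2, 3]].
features = np.arange(4, dtype=float).reshape(2, 2, 1)
cond = point_condition(features, (0.5, 0.5, 2.0))  # centre of the map, height 2
```

A small network would then map this conditioning vector to color and density; that part is left out.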

3.4 My takeaway

I am still confused about how this U-Net is used; I need to fill in the background knowledge some other time. 🤡

Original article: https://blog.csdn.net/weixin_62012485/article/details/137821890
