3 - Robot Vision - Robotic Grasping and Manipulation

Published 2024-12-14 08:54 | Tags: robotics, machine learning, python

3 Robot Vision

1. Sensors and Calibration

Camera Model

Pinhole Camera Model
Three coordinate systems:
• World coordinates
• Camera coordinates
• Image coordinates (2D pixels)

Task:
• Given: a pixel (u, v) and its depth z
• Compute: the world coordinates $(x_w, y_w, z_w)$

Intrinsic Matrix

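T: transforms world coordinates into camera coordinates; the intrinsic matrix then maps camera coordinates to pixels.

Intrinsic parameters: $f_x, f_y, c_x, c_y$
Distortion (skew): $s$

$c_x, c_y$ are approximately half the image resolution in pixels, i.e. the image center.
• Suppose the sensor is 36 mm wide and 24 mm high, the image resolution is 6000×4000 pixels, and the 35 mm equivalent focal length is 50 mm. Then $f_x = 50 \times 6000 / 36 \approx 8333.3$ px, $f_y = 50 \times 4000 / 24 \approx 8333.3$ px, $c_x = 3000$, $c_y = 2000$.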

Extrinsic Matrix

T:
• Models the transformation between the camera frame and the world frame
• Handles a "variable" frame (i.e. the camera frame is moving)
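
The task stated earlier (pixel plus depth to world coordinates) can be sketched with the pinhole model. This is a minimal sketch, assuming an undistorted image, zero skew, depth measured along the optical axis, and a 4×4 camera-to-world transform; the function name `pixel_to_world`, the matrix `T_world_cam`, and all numeric values are illustrative and not taken from the original post.

```python
import numpy as np

def pixel_to_world(u, v, z, K, T_world_cam):
    """Back-project pixel (u, v) with depth z into the world frame."""
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    # Invert the pinhole projection: pixel -> homogeneous camera coordinates.
    p_cam = np.array([(u - cx) * z / fx, (v - cy) * z / fy, z, 1.0])
    # Apply the extrinsics: camera frame -> world frame.
    return (T_world_cam @ p_cam)[:3]

# Hypothetical intrinsics for a 640x480 image.
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])

# Hypothetical camera pose: axes aligned with the world, origin shifted.
T_world_cam = np.eye(4)
T_world_cam[:3, 3] = [0.5, 0.0, 1.0]

p_world = pixel_to_world(380, 240, 2.0, K, T_world_cam)
print(p_world)
```

With a moving camera, only `T_world_cam` changes from frame to frame; `K` stays fixed for a given camera and resolution.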

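Calibration

Intrinsic Calibration

• Intrinsic matrix $K$
• Distortion coefficients $k_1, k_2, p_1, p_2, k_3$

Tools:
• ROS, OpenCV (both provide built-in calibration tools)

Photograph a checkerboard from different angles and measure:
• Zhang's calibration method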

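Hand-Eye Calibration and Extrinsic Calibration

Hand-eye calibration
• Tools: ROS, OpenCV
• Procedure (eye-in-hand):
  • Move the arm to different poses and photograph the calibration board
  • Record the flange pose of the arm together with the corresponding image

Eye-in-hand: solve for the pose of the gripper in the camera frame.
Eye-to-hand: solve for the pose of base_link in the camera_link frame.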
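
The eye-in-hand procedure leads to the classic AX = XB constraint, where X is the fixed transform between gripper and camera. The sketch below only checks that constraint on synthetic data; it does not solve for X (OpenCV's `calibrateHandEye` is one tool for that). All poses, the helper `make_T`, and the frame naming (`T_a_b` maps points from frame b to frame a) are assumptions made for illustration.

```python
import numpy as np

def make_T(rotvec, t):
    """4x4 transform from an axis-angle vector (Rodrigues formula) and a translation."""
    rotvec = np.asarray(rotvec, dtype=float)
    theta = np.linalg.norm(rotvec)
    T = np.eye(4)
    if theta > 0:
        k = rotvec / theta
        Kx = np.array([[0.0, -k[2], k[1]],
                       [k[2], 0.0, -k[0]],
                       [-k[1], k[0], 0.0]])
        T[:3, :3] = np.eye(3) + np.sin(theta) * Kx + (1 - np.cos(theta)) * Kx @ Kx
    T[:3, 3] = t
    return T

rng = np.random.default_rng(0)

# Ground truth that calibration would normally have to recover.
X = make_T([0.1, -0.2, 0.3], [0.03, 0.0, 0.08])            # T_gripper_cam
T_base_target = make_T([0.0, 0.0, 0.5], [0.6, 0.1, 0.0])   # fixed calibration board

# Recorded flange poses, and what the camera would observe at each of them.
T_base_gripper = [make_T(rng.normal(size=3) * 0.5, rng.normal(size=3)) for _ in range(3)]
T_cam_target = [np.linalg.inv(Tbg @ X) @ T_base_target for Tbg in T_base_gripper]

# Relative gripper motion A and relative camera motion B between poses 0 and 1.
A = np.linalg.inv(T_base_gripper[1]) @ T_base_gripper[0]
B = T_cam_target[1] @ np.linalg.inv(T_cam_target[0])

residual = np.abs(A @ X - X @ B).max()
print(residual)
```

On real data the residual is not zero, and its size across pose pairs is a quick sanity check on a calibration result.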

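Robot kinematic (intrinsic) calibration:

• The robot's kinematic-parameter error is usually smaller than the error introduced by the camera extrinsics
• Products still need to verify the robot's kinematic error (as an engineering validation step)
• Calibration methods
  • Measurement: usually a laser tracker, or guiding the arm to specific known positions
  • Algorithm: model the kinematics with POE or DH parameters, then iteratively refine the parameters

See: https://www.universal-robots.com/articles/ur/robot-care-maintenance/kinematic-robot-calibration/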

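Practical issues with hand-eye systems (RGB-D based measurement)
• When used in a classical pipeline:
  • There are many error sources (robot kinematic parameters, camera intrinsics and extrinsics, tool-to-flange transform, camera depth and RGB measurement, etc.), and they are hard to separate
  • Calibration and verification take a long time, and accuracy degradation during long-term use is hard to trace
• When used for data production: the trained model may become hardware-dependent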

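Depth cameras:
• Structured light: sensitive to interference, works poorly outdoors
• Time of Flight (ToF): sensitive to interference
• Stereo vision: struggles on low-texture surfaces
Depth information:
• Point cloud and depth image
• Contains missing values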
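
Because depth images contain missing values, converting them to a point cloud usually starts by masking invalid pixels. A minimal sketch, assuming missing depth is encoded as 0 or NaN (the encoding varies by camera and driver) and that depth is in metres; the function name and the tiny test image are illustrative.

```python
import numpy as np

def depth_to_pointcloud(depth, fx, fy, cx, cy):
    """Convert an (H, W) depth image to an (N, 3) point cloud in the camera frame."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Assumption: 0 or NaN marks a pixel with no depth measurement.
    valid = np.isfinite(depth) & (depth > 0)
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)

depth = np.array([[1.0, 0.0, 1.0],
                  [1.0, 2.0, np.nan]])
cloud = depth_to_pointcloud(depth, fx=100.0, fy=100.0, cx=1.0, cy=0.5)
print(cloud.shape)
```

Two of the six pixels are dropped here, so the cloud has four points.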
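Depth-RGB relative pose and calibration
• Structured light: calibrate the relative pose of the IR and RGB sensors
Other:
• Point cloud data can be used directly for recognition, segmentation, and similar tasks
• Recognition on RGB images requires a 2D-to-3D projection

Depth Camera Issues
In practice: missing and inaccurate depth
• Materials, lighting, edges
• Depth on human hair
• Depth on glass
In mass production
• Incoming inspection
• Function and parameter tests
• System tests
• Supplier issues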

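Force Sensors and Other Sensors

End-effector force sensor
• End-effector force control
• Single-axis, 3-axis, or 6-axis
• Mounting and usage

Joint torque sensor
• Current-based estimation, electromagnetic, or strain-gauge
• Can be used for joint force control
• Can be used to estimate the 6D end-effector wrench and for end-effector force control (reliability and performance are relatively worse than direct measurement)

Force sensor issues in practice
• Zero drift, anomalous readings, etc.

Other Sensors

• Encoders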
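Motor-side encoder: 18-bit, $2^{18} = 262144$ counts
Output-side multi-turn absolute encoder: 19-bit, $2^{19} = 524288$ counts
Joint reduction ratio: 1:101
Question: if the joint rotates 90°, what is the output-side encoder reading, and how many degrees has the motor turned?

$\text{output-side encoder reading} = (90/360) \times 2^{19} = 131072$
The motor has turned $101 \times 90 = 9090$ degrees.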
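
The same arithmetic as a small helper, in case it is useful for other angles. The constant and function names are illustrative; the motor-side count is an extra quantity not asked for in the question, computed under the assumption that counts accumulate across motor turns.

```python
MOTOR_BITS = 18    # motor-side encoder resolution
OUTPUT_BITS = 19   # output-side encoder resolution
GEAR_RATIO = 101   # motor revolutions per joint revolution

def joint_angle_to_readings(joint_deg):
    """Return (output-side counts, motor angle in degrees, accumulated motor-side counts)."""
    output_counts = joint_deg / 360 * 2**OUTPUT_BITS
    motor_deg = joint_deg * GEAR_RATIO
    motor_counts = motor_deg / 360 * 2**MOTOR_BITS
    return output_counts, motor_deg, motor_counts

output_counts, motor_deg, motor_counts = joint_angle_to_readings(90)
print(output_counts, motor_deg, motor_counts)  # 131072.0 9090 6619136.0
```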

Encoders provide joint position information.
• Tactile sensors

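2. Neural Networks and Image Processing

Optimization view
• Find the combination of network parameters that minimizes the loss on the training data
Main elements
• Network architecture
• Feature processing and task heads
• Dataset and dataloader
• Loss function and optimizer
• Training and inference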

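2D Feature Processing

• Conv2d
How Conv2d changes the tensor dimensions comes up constantly in image convolution (the original post illustrated this with a diagram).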
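
The dimension change of a 2D convolution can be computed directly. The sketch below uses the output-size formula given in the PyTorch `Conv2d` documentation, assuming the same kernel size, stride, padding, and dilation along both axes; the function name is illustrative.

```python
def conv2d_output_size(h, w, kernel, stride=1, padding=0, dilation=1):
    """Spatial output size of a 2D convolution with square hyperparameters."""
    def one_dim(n):
        return (n + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1
    return one_dim(h), one_dim(w)

# A 7x7 stride-2 stem on a 224x224 image halves the resolution.
print(conv2d_output_size(224, 224, kernel=7, stride=2, padding=3))  # (112, 112)
# A 3x3 convolution without padding shrinks each side by 2.
print(conv2d_output_size(32, 32, kernel=3))                          # (30, 30)
```

The channel dimension is independent of this formula: it simply becomes the number of filters.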

• MLP (Linear)
• Other layers
  • Pooling (e.g. max pooling)
  • Activation
  • Normalization

Normalization

See the article "一文弄懂Batch Norm / Layer Norm / Instance Norm / Group Norm 归一化方法" (an overview of the Batch Norm, Layer Norm, Instance Norm, and Group Norm normalization methods).
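
The four normalization variants differ mainly in which axes the mean and variance are computed over. A minimal NumPy sketch on an (N, C, H, W) tensor, assuming no learnable scale or shift and no running statistics (both of which real layers usually have); the helper name and tensor sizes are illustrative.

```python
import numpy as np

def normalize(x, axes, eps=1e-5):
    """Subtract the mean and divide by the standard deviation over the given axes."""
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

x = np.random.default_rng(0).normal(size=(8, 4, 5, 5))  # (N, C, H, W)

batch_norm = normalize(x, axes=(0, 2, 3))     # per channel, over batch and space
layer_norm = normalize(x, axes=(1, 2, 3))     # per sample, over channels and space
instance_norm = normalize(x, axes=(2, 3))     # per sample and channel, over space

# Group norm with 2 groups: split the channels into groups, normalize within each.
groups = 2
grouped = x.reshape(8, groups, 4 // groups, 5, 5)
group_norm = normalize(grouped, axes=(2, 3, 4)).reshape(x.shape)

print(batch_norm.mean(axis=(0, 2, 3)))  # close to zero for every channel
```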

Common Architectures

CNN
• Residual block
• U-Net

Transformer:
• ViT

Training Flow

• 1 Prepare the dataset
• 2 Prepare the model
• 3 Prepare the loss function and optimizer
• 4 Training loop (with model evaluation)
  • 4.1 optimizer.zero_grad()
  • 4.2 outputs = model(images)
  • 4.3 loss = criterion(outputs, labels)
  • 4.4 loss.backward()
  • 4.5 optimizer.step()
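
The training steps above map onto a few lines of PyTorch. This is a toy sketch, assuming PyTorch is installed; a one-dimensional regression problem stands in for images and labels, and the model, learning rate, and epoch count are arbitrary choices for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)

# 1. Dataset: toy regression y = 2x + 1 (stand-in for images and labels).
x = torch.linspace(-1, 1, 64).unsqueeze(1)
y = 2 * x + 1

# 2. Model
model = nn.Linear(1, 1)

# 3. Loss function and optimizer
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# 4. Training loop
losses = []
for epoch in range(200):
    optimizer.zero_grad()          # 4.1 clear gradients from the previous step
    outputs = model(x)             # 4.2 forward pass
    loss = criterion(outputs, y)   # 4.3 compute the loss
    loss.backward()                # 4.4 backpropagate
    optimizer.step()               # 4.5 update the parameters
    losses.append(loss.item())

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.6f}")
```

A real pipeline would iterate over a `DataLoader` in mini-batches and evaluate on held-out data between epochs.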

Inference Flow

• 1 Read the image
• 2 Load the model parameters
• 3 Run the model's forward pass

[Figure: training flow]

Deployment Flow

[Figure: deployment flow]

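2D Image Tasks

• Common tasks
  • Classification
  • Detection
  • Segmentation
  • Others: generation, faces, OCR, matting, denoising, retrieval, etc.
• Robotics-related:
  • Pose estimation and tracking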

3D Point Cloud Feature

Note that PointNet uses a T-Net to generate a transformation matrix, which accounts for coordinate transformations of the point cloud in space.
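
The core PointNet idea (a shared per-point MLP followed by a symmetric max pool, with an optional input transform where the T-Net output would go) can be sketched in NumPy. Random matrices stand in for learned weights, the layer sizes are arbitrary, and the identity matrix stands in for the T-Net prediction; this is an illustration of the structure, not the published network.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 16))    # random stand-ins for learned weights
W2 = rng.normal(size=(16, 32))

def pointnet_global_feature(points, transform=np.eye(3)):
    """points: (N, 3). Returns a (32,) global feature."""
    aligned = points @ transform           # a T-Net would predict this 3x3 matrix
    h = np.maximum(aligned @ W1, 0.0)      # shared MLP layer 1 + ReLU, applied per point
    h = np.maximum(h @ W2, 0.0)            # shared MLP layer 2 + ReLU
    return h.max(axis=0)                   # symmetric max pool over all points

cloud = rng.normal(size=(100, 3))
shuffled = cloud[rng.permutation(100)]

f1 = pointnet_global_feature(cloud)
f2 = pointnet_global_feature(shuffled)
print(np.allclose(f1, f2))
```

Because the max pool ignores point order, shuffling the cloud leaves the global feature unchanged, which is the property that lets the network consume unordered point sets.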

PointNet example

PointNet++
Recommended reading: https://zhuanlan.zhihu.com/p/266324173
Focuses on points within local neighborhoods

PointNet Application

Autonomous Driving Prediction and ML Planning:
• PointNet for subgraph feature extraction

3. 3D Pose Estimation

Vision-based robotic grasping from object localization, object pose estimation to grasp estimation for parallel grippers: a review

Pose Estimation Datasets

BOP: Benchmark for 6D Object Pose Estimation

• BOP Toolkit
• BOP datasets:
  • https://bop.felk.cvut.cz/datasets/
  • The datasets can be prepared directly with the Hugging Face CLI and the toolkit
  • Some datasets have been slightly modified
  • Data: depth, rgb, model, camera_info, mask
• BOP format:
  • https://github.com/thodan/bop_toolkit/blob/master/docs/bop_datasets_format.md
• Leaderboard

YCB Dataset

• 21 YCB objects captured in 92 videos
• Everyday items that can be bought in a supermarket
• Extended datasets:
  • DexYCB, YCB Affordance
  • YCB-Sight: a visuo-tactile dataset (covering both the visual and tactile modalities)

Pose Estimation Metrics

ADD and ADD-S are the most commonly used.
• Visible Surface Discrepancy (VSD)
• Maximum Symmetry-Aware Surface Distance (MSSD)
• Maximum Symmetry-Aware Projection Distance (MSPD)
• Average point distance (ADD)
• Average closest point distance (ADD-S)
• Others:
  • 3D Intersection-over-Union (IoU)
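
ADD and ADD-S can be written in a few lines. A minimal sketch, assuming the object model is given as an (N, 3) array of points and each pose as a rotation matrix plus translation vector; the brute-force nearest-neighbour search in ADD-S is only suitable for small point sets, and the example values are made up.

```python
import numpy as np

def add_metric(model_pts, R_gt, t_gt, R_est, t_est):
    """ADD: mean distance between corresponding model points under the two poses."""
    gt = model_pts @ R_gt.T + t_gt
    est = model_pts @ R_est.T + t_est
    return np.linalg.norm(gt - est, axis=1).mean()

def add_s_metric(model_pts, R_gt, t_gt, R_est, t_est):
    """ADD-S: mean distance from each ground-truth point to its closest estimated point."""
    gt = model_pts @ R_gt.T + t_gt
    est = model_pts @ R_est.T + t_est
    dists = np.linalg.norm(gt[:, None, :] - est[None, :, :], axis=2)
    return dists.min(axis=1).mean()

rng = np.random.default_rng(0)
model_pts = rng.uniform(-0.05, 0.05, size=(200, 3))   # hypothetical 10 cm object
R = np.eye(3)
t_gt = np.array([0.0, 0.0, 0.5])
t_est = t_gt + np.array([0.01, 0.0, 0.0])             # 1 cm translation error

add = add_metric(model_pts, R, t_gt, R, t_est)
add_s = add_s_metric(model_pts, R, t_gt, R, t_est)
print(add, add_s)
```

ADD-S never exceeds ADD for the same pair of poses, which is why it is the variant used for symmetric objects.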

• BOP evaluation protocol
  • Apply thresholds to VSD, MSSD, and MSPD
  • Compute the average recall (AR) on each dataset, then take the mean
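
The "threshold, then average recall" step can be illustrated as follows. This is a simplified sketch: the per-instance errors are hypothetical and assumed to be already normalized so that one threshold list applies to all three error functions, whereas the actual BOP protocol uses error-specific thresholds (and an additional tolerance for VSD).

```python
def average_recall(errors, thresholds):
    """Fraction of instances with error below each threshold, averaged over thresholds."""
    recalls = [sum(e < th for e in errors) / len(errors) for th in thresholds]
    return sum(recalls) / len(recalls)

thresholds = [0.05 * k for k in range(1, 11)]   # 0.05, 0.10, ..., 0.50

# Hypothetical normalized errors for four object instances on one dataset.
errors = {
    "VSD":  [0.02, 0.12, 0.33, 0.90],
    "MSSD": [0.02, 0.02, 0.12, 0.33],
    "MSPD": [0.02, 0.02, 0.02, 0.12],
}

ar_per_error = {name: average_recall(e, thresholds) for name, e in errors.items()}
ar_dataset = sum(ar_per_error.values()) / len(ar_per_error)
print(ar_per_error, ar_dataset)
```

The overall score is then the mean of `ar_dataset` across the datasets being evaluated.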

Traditional Methods

SfM (Structure from Motion)

• 2D image
  • PnP (Perspective-n-Point)
  • PnP + RANSAC (random sample consensus), which copes with incorrect point matches
  • OpenCV: solvePnPRansac()

• 3D point cloud
  • ICP
  • pcl::IterativeClosestPoint

ICP Algorithm: Theory, Practice And Its SLAM-oriented Taxonomy
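
A bare-bones point-to-point ICP, to show the two alternating steps (nearest-neighbour matching, then a closed-form rigid fit via SVD). This is a teaching sketch with brute-force matching and a fixed iteration count, not a replacement for `pcl::IterativeClosestPoint`; the test data and function names are made up.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares R, t such that R @ src_i + t ≈ dst_i (Kabsch / SVD)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection in the SVD solution.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    return R, dst_c - R @ src_c

def mean_sq_nn_dist(a, b):
    """Mean squared distance from each point in a to its nearest point in b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).mean()

def icp(src, dst, iters=20):
    R_total, t_total = np.eye(3), np.zeros(3)
    cur = src.copy()
    for _ in range(iters):
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
        matches = dst[d2.argmin(axis=1)]            # step 1: closest-point matching
        R, t = best_rigid_transform(cur, matches)   # step 2: rigid fit to the matches
        cur = cur @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total

rng = np.random.default_rng(0)
dst = rng.normal(size=(60, 3))
angle = np.deg2rad(5.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle), np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.02, -0.01, 0.03])
src = (dst - t_true) @ R_true   # so that R_true @ src_i + t_true == dst_i

R_est, t_est = icp(src, dst)
before = mean_sq_nn_dist(src, dst)
after = mean_sq_nn_dist(src @ R_est.T + t_est, dst)
print(before, after)
```

ICP only converges to a local optimum, so in practice it is used to refine an initial pose rather than to find one from scratch.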

Instance-level Pose Estimation

• Correspondence-based methods
• Template-based methods
• Voting-based and regression-based methods

PoseCNN

Pose estimation with RGB input
Model:
• Feature extraction
• Segmentation
• Center point prediction
• Rotation and translation regression

Task and loss:
• Segmentation
• Center point prediction
  • Regress the direction to the center for each pixel
  • Hough voting
• Transformation prediction:
  • PLoss: pose loss
  • SLoss: shape match loss
Two losses are computed: one for pose and one for shape matching.

Center point prediction: the network predicts the x and y components of the direction toward the center, plus the distance Td.
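
The center-voting idea can be illustrated with a toy accumulator: each object pixel casts votes along its predicted direction, and the cell with the most votes is taken as the 2D center. This is a simplified sketch of the voting principle under ideal (noise-free) directions, not PoseCNN's actual voting layer; the grid size, pixel layout, and function name are invented for the example.

```python
import numpy as np

def hough_vote_center(pixels, directions, shape):
    """pixels: (N, 2) integer (x, y); directions: (N, 2) unit vectors toward the center."""
    h, w = shape
    acc = np.zeros((h, w), dtype=int)
    for (x, y), (dx, dy) in zip(pixels, directions):
        visited = set()
        for step in range(1, h + w):
            cx = int(round(x + step * dx))
            cy = int(round(y + step * dy))
            if not (0 <= cx < w and 0 <= cy < h):
                break                      # the ray left the image
            visited.add((cx, cy))
        for cx, cy in visited:             # one vote per cell per ray
            acc[cy, cx] += 1
    cy, cx = np.unravel_index(acc.argmax(), acc.shape)
    return int(cx), int(cy)

true_center = np.array([20.0, 15.0])
xs, ys = np.meshgrid(np.arange(10, 31, 4), np.arange(5, 26, 4))
pixels = np.stack([xs.ravel(), ys.ravel()], axis=1)

# Ideal "network output": unit vectors pointing from each pixel to the center.
offsets = true_center - pixels
directions = offsets / np.linalg.norm(offsets, axis=1, keepdims=True)

center = hough_vote_center(pixels, directions, shape=(30, 40))
print(center)
```

Voting makes the center estimate robust to occlusion, since the center can be recovered even when it is not itself visible.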

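DenseFusion

Pose estimation with RGB and depth images

Feature:
• On the segmented object, a CNN encoder and a PointNet encoder extract image and point cloud features respectively
• Fuse the features at each pixel coordinate (concat)
• Extract a global feature
• Fuse the global feature with the local features (concat)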
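
The fusion steps above are mostly tensor bookkeeping, which a shape-level sketch makes concrete. Random arrays stand in for the learned CNN and PointNet embeddings; the feature widths, the single random projection, and the mean pooling used for the global feature are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels = 500                                  # pixels sampled on the segmented object
color_feat = rng.normal(size=(n_pixels, 32))    # stand-in for per-pixel CNN embeddings
geo_feat = rng.normal(size=(n_pixels, 64))      # stand-in for per-point geometry embeddings

# 1. Pixel-wise fusion: concatenate color and geometry features of the same pixel.
local_feat = np.concatenate([color_feat, geo_feat], axis=1)            # (500, 96)

# 2. Global feature: per-point projection followed by a symmetric pooling.
W = rng.normal(size=(96, 128))                  # random stand-in for learned weights
global_feat = np.maximum(local_feat @ W, 0.0).mean(axis=0)             # (128,)

# 3. Global + local fusion: append the same global feature to every pixel.
dense_feat = np.concatenate(
    [local_feat, np.tile(global_feat, (n_pixels, 1))], axis=1)         # (500, 224)

print(local_feat.shape, global_feat.shape, dense_feat.shape)
```

Each row of `dense_feat` therefore carries both local appearance and geometry and a summary of the whole object.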

• Head
  • Translation, rotation, confidence
• Loss
• Pose refinement
  • Pose residual estimator

YOLO6D

• Simple and fast
• Feature extraction
  • CNN
• Detection architecture
  • Predicts 8 bounding-box corner points + 1 center point, plus the class
  • PnP for 3D estimation uses the 9 control-point correspondences
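
The nine control points and their image projections can be generated from an object model as below. A minimal sketch, assuming an axis-aligned bounding box in the object frame and taking the box center as the ninth point; the intrinsics, pose, and model points are made-up values. The resulting 2D-3D pairs are the kind of correspondences a PnP solver takes as input.

```python
import numpy as np
from itertools import product

def control_points(model_pts):
    """Bounding-box center followed by the 8 box corners -> (9, 3)."""
    lo, hi = model_pts.min(axis=0), model_pts.max(axis=0)
    corners = np.array(list(product(*zip(lo, hi))))   # all min/max combinations
    center = (lo + hi) / 2
    return np.vstack([center, corners])

def project(points_obj, K, R, t):
    """Project object-frame 3D points to pixels with a pinhole camera."""
    cam = points_obj @ R.T + t
    uv = cam @ K.T
    return uv[:, :2] / uv[:, 2:3]

model_pts = np.array([[-0.10, -0.05, -0.02],
                      [0.10, 0.05, 0.02],
                      [0.00, 0.00, 0.00]])
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.array([0.0, 0.0, 1.0])   # object 1 m in front of the camera

pts3d = control_points(model_pts)
pts2d = project(pts3d, K, R, t)
print(pts2d[0])   # the center projects onto the principal point
```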

Category-level

Example application of category-level perception:

Object detection in autonomous driving:
- Hierarchical categories:
- Car: truck, SUV, sedan, etc.

Category-level pose estimation
• Estimate poses for objects of the same category (e.g. cups)
• Generalizes to objects within established categories

NOCS:
https://github.com/hughw19/NOCS_CVPR2019
• Represents a category of objects
• Normalized Object Coordinate Space
• Predicts a NOCS map (x, y, z)
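
The normalization behind a NOCS map can be sketched as follows: object points are centered and scaled by the diagonal of their tight bounding box so that every instance of a category lands in the same unit cube. This sketch assumes the points are already in a canonical orientation and uses an axis-aligned box; the function name and the sample points are illustrative.

```python
import numpy as np

def to_nocs(points):
    """Map (N, 3) object-frame points into the unit cube, centered at 0.5."""
    lo, hi = points.min(axis=0), points.max(axis=0)
    center = (lo + hi) / 2
    diagonal = np.linalg.norm(hi - lo)     # tight bounding-box diagonal
    return (points - center) / diagonal + 0.5

rng = np.random.default_rng(0)
# Hypothetical object roughly 20 cm x 10 cm x 10 cm.
points = rng.uniform([-0.10, -0.05, -0.05], [0.10, 0.05, 0.05], size=(1000, 3))

nocs = to_nocs(points)
print(nocs.min(axis=0), nocs.max(axis=0))
```

A network that predicts these (x, y, z) values per pixel yields the NOCS map; comparing them with the observed depth points is what allows pose and size to be recovered for unseen instances of the category.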

Data generation:

Mixed Reality data generation
- Real background with sim object
- Rendering with different lighting


Unseen Object Pose Estimation

• Input: CAD model, reference image
  • No training on the novel object
  • Unlike category-level methods, which require training on the category, and alignment if using NOCS
• Traditional:
  • Template-based or feature-based methods
• Foundation model

Foundation Model

• FoundationPose
• SAM-6D
• FreeZe

FoundationPose

https://github.com/NVlabs/FoundationPose/tree/main/learning/models

• Input:
  • Model-based, where a textured 3D CAD model of the object is provided
  • Model-free, where a set of reference images of the object is provided
• Good performance in these tasks
  • Model-based and model-free
  • Pose estimation and pose tracking

• Pose generation data pipeline
  • Hierarchical LLM for data generation
    • LLM prompt for object description
    • LLM description for texture generation
  • Physics engine for rendering

Pose Estimation for Grasping

• 6D object pose estimation
• Grasping pose generation
• Pre-grasping pose
• Path planning and trajectory generation
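
The first three steps chain together as transform multiplications. A minimal sketch, assuming the grasp is defined in the object frame, the gripper approaches along the +z axis of the grasp frame, and a 10 cm standoff; all poses and names are hypothetical, and path planning is left out.

```python
import numpy as np

def pregrasp_from_grasp(T_world_grasp, standoff=0.10):
    """Back off from the grasp pose along the approach axis (assumed: grasp-frame +z)."""
    T_offset = np.eye(4)
    T_offset[2, 3] = -standoff
    return T_world_grasp @ T_offset

# Step 1: object pose from 6D pose estimation (hypothetical value).
T_world_obj = np.eye(4)
T_world_obj[:3, 3] = [0.5, 0.0, 0.1]

# Step 2: grasp defined in the object frame: gripper z-axis pointing down
# (180 degree rotation about x), 5 cm above the object origin.
T_obj_grasp = np.array([[1.0, 0.0, 0.0, 0.0],
                        [0.0, -1.0, 0.0, 0.0],
                        [0.0, 0.0, -1.0, 0.05],
                        [0.0, 0.0, 0.0, 1.0]])
T_world_grasp = T_world_obj @ T_obj_grasp

# Step 3: pre-grasp pose, 10 cm back along the approach direction.
T_world_pregrasp = pregrasp_from_grasp(T_world_grasp)

print(T_world_grasp[:3, 3], T_world_pregrasp[:3, 3])
```

The planner then moves to the pre-grasp pose first and approaches the grasp pose in a straight line, which keeps the fingers clear of the object until the final motion.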

REF

https://www.shenlanxueyuan.com/course/727/task/29418/show

• Deep Learning-Based Object Pose Estimation: A Comprehensive Survey; https://github.com/CNJianLiu/Awesome-Object-Pose-Estimation
• Vision-based Robotic Grasping From Object Localization, Object Pose Estimation to Grasp Estimation for Parallel Grippers: A Review
• Challenges for Monocular 6D Object Pose Estimation in Robotics
• BOP: Benchmark for 6D Object Pose Estimation
• FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects
• DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion
• Normalized Object Coordinate Space for Category-Level 6D Object Pose and Size Estimation
• PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes
• Real-Time Seamless Single Shot 6D Object Pose Prediction
• Computer Vision: A Modern Approach
• https://deeprob.org/w24/projects/project3/, Project 3 PoseCNN

Original article: https://blog.csdn.net/qq_37087723/article/details/143752398
