Robotic Vision and 3D Imaging Technology
Introduction
Robotic vision relies heavily on 3D imaging technology to enable industrial robots to perceive their environment accurately. This technology can be categorized into optical and non-optical methods, with optical methods being the most widely used. These include Time-of-Flight (TOF), structured light, laser scanning, moiré fringe, laser speckle, interferometry, photogrammetry, laser tracking, shape from motion, shape from shading, and other Shape from X techniques. This article introduces several typical solutions.
1. Time-of-Flight 3D Imaging
TOF cameras capture the depth of an object by measuring the time difference of light travel for each pixel. In classic TOF measurement, the detector system starts timing when the light pulse is emitted. When the detector receives the light echo from the target, it stores the round-trip time directly. The target distance Z can then be estimated from the simple relation Z = c·t/2, where c is the speed of light and t is the measured round-trip time.
This ranging method, known as Direct TOF (DTOF), is typically used in single-point ranging systems; area-array 3D imaging usually requires scanning. Scannerless TOF 3D imaging has only become practical recently because sub-nanosecond electronic timing is difficult to achieve at the pixel level. Indirect TOF (ITOF) instead derives the round-trip time from time-gated measurements of light intensity, and is the approach used in commercial TOF cameras, which rely on electronic or optical mixers. TOF imaging suits wide-field, long-distance, low-precision, low-cost 3D image acquisition: it offers fast detection, a large field of view, a long working distance, and low cost, but its precision is low and it is easily affected by ambient light.
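The two TOF variants can be sketched in a few lines. This is a minimal illustration of the ranging equations only (function and variable names are illustrative, not from any camera SDK): DTOF converts a measured round-trip time directly, while ITOF recovers range from the phase shift of amplitude-modulated light at modulation frequency f, Z = c·φ/(4π·f).

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s


def dtof_distance(round_trip_time_s: float) -> float:
    """Direct TOF: Z = c * t / 2, since the pulse travels out and back."""
    return C * round_trip_time_s / 2.0


def itof_distance(phase_rad: float, mod_freq_hz: float) -> float:
    """Indirect TOF: range from the phase shift of modulated light,
    Z = c * phi / (4 * pi * f). Unambiguous only within c / (2 * f)."""
    return C * phase_rad / (4.0 * math.pi * mod_freq_hz)


# A 10 ns round trip corresponds to roughly 1.5 m of range.
z_dtof = dtof_distance(10e-9)
# A half-cycle phase shift at 20 MHz modulation: roughly 3.75 m.
z_itof = itof_distance(math.pi, 20e6)
```

Note the ITOF ambiguity interval: at 20 MHz modulation, ranges repeat every ~7.5 m, which is one reason commercial cameras combine several modulation frequencies.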
2. Scanning 3D Imaging
Scanning 3D imaging methods include scanning ranging, active triangulation, and confocal dispersion. Confocal dispersion is a scanning ranging method widely used in the manufacture of mobile phone and tablet displays.
Scanning Ranging
Scanning ranging achieves 3D measurement by using a collimated light beam to scan the entire target surface. Typical methods include single-point TOF (e.g., Frequency Modulated Continuous Wave (FM-CW) ranging and pulsed ranging (LiDAR)), laser scattering interferometry (e.g., multi-wavelength interferometry, holographic interferometry, white light interferometry), and confocal methods (e.g., dispersion confocal and self-focusing). Single-point scanning methods like TOF are suitable for long-distance scanning but have low measurement accuracy, generally in the millimeter range.
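Of the single-point methods mentioned above, FM-CW ranging is easy to illustrate numerically. In FM-CW, the transmitted frequency is swept linearly over a bandwidth B during a period T; mixing the echo with the transmitted chirp yields a beat frequency proportional to range, f_b = 2·R·B/(c·T). A minimal sketch (names are illustrative):

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def fmcw_range(beat_freq_hz: float,
               sweep_bandwidth_hz: float,
               sweep_period_s: float) -> float:
    """FM-CW ranging: the beat between transmitted and received chirps is
    f_b = 2 * R * B / (c * T), so R = f_b * c * T / (2 * B)."""
    return beat_freq_hz * C * sweep_period_s / (2.0 * sweep_bandwidth_hz)


# With a 1 GHz sweep over 1 ms, a target at 5 m produces a ~33 kHz beat.
beat = 2.0 * 5.0 * 1e9 / (C * 1e-3)
r = fmcw_range(beat, 1e9, 1e-3)  # recovers 5.0 m
```

The wider the sweep bandwidth B, the finer the range resolution, which is why FM-CW LiDAR favors large optical bandwidths.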
Active Triangulation
Active triangulation is based on triangulation principles, using collimated light beams or plane light beams to scan the target surface for 3D measurement. Common methods to obtain the light beam include laser collimation, cylindrical lens beam expansion, and non-coherent light projection. It can be categorized into single-point scanning, single-line scanning, and multi-line scanning. Commercial products for robotic end-effectors mostly use single-point and single-line scanners. Multi-line scanning often uses two groups of perpendicular light planes for high-speed alternating imaging, enabling "Flying Triangulation" scanning to create a high-resolution, dense 3D surface model.
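The triangulation geometry behind these scanners reduces, in the simplest configuration, to a pinhole relation. As a minimal sketch, assume the laser beam is parallel to the camera's optical axis at a lateral offset b (the baseline); the spot at depth Z then projects to image coordinate x = f·b/Z, so depth follows from the measured spot position (this simplified geometry and all names are illustrative):

```python
def triangulate_depth(spot_offset_mm: float,
                      focal_mm: float,
                      baseline_mm: float) -> float:
    """Single-point laser triangulation, simplified geometry:
    beam parallel to the optical axis at offset `baseline_mm`.
    The spot projects to x = f * b / Z, hence Z = f * b / x."""
    if spot_offset_mm <= 0:
        raise ValueError("spot offset must be positive")
    return focal_mm * baseline_mm / spot_offset_mm


# 16 mm lens, 50 mm baseline, spot imaged 0.5 mm off-axis -> Z = 1600 mm.
z = triangulate_depth(0.5, 16.0, 50.0)
```

Because Z varies as 1/x, the depth resolution of a triangulation scanner degrades quadratically with distance, which is why these sensors excel at short range.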
Confocal Dispersion
Confocal dispersion can scan and measure rough and smooth opaque and transparent objects, such as reflective mirrors and transparent glass. It is widely used in 3D inspection of mobile phone cover plates. Types include single-point absolute distance scanning, multi-point array scanning, and continuous line scanning. Notable commercial products include the STIL MPLS180 and the FOCALSPEC UULA.
3. Structured Light Projection 3D Imaging
Structured light projection is currently the primary method for robotic 3D vision perception. The system consists of several projectors and cameras, with common configurations including single projector-single camera, single projector-dual camera, single projector-multi camera, single camera-dual projector, and single camera-multi projector setups. The projector illuminates the target object with specific structured light patterns, and the camera captures the modulated image. Image processing and visual models determine the 3D information of the target object. Common projectors include LCD projectors, Digital Light Processing (DLP) projectors, and laser LED pattern projectors.
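A common family of projected patterns is phase-shifted sinusoidal fringes. With four fringe images shifted by 90° each, the intensity at a pixel is I_k = A + B·cos(φ + k·π/2), and the wrapped phase φ (which encodes the projector coordinate, and hence depth via triangulation) can be recovered per pixel. A minimal single-pixel sketch of the standard four-step algorithm (names are illustrative):

```python
import math


def four_step_phase(i0: float, i1: float, i2: float, i3: float) -> float:
    """Wrapped phase from four fringe images shifted by pi/2 each:
    I_k = A + B * cos(phi + k * pi / 2)
    => i3 - i1 = 2B sin(phi), i0 - i2 = 2B cos(phi)
    => phi = atan2(i3 - i1, i0 - i2)."""
    return math.atan2(i3 - i1, i0 - i2)


# Synthetic pixel: ambient A = 100, fringe amplitude B = 50, true phase 0.7 rad.
A, B, phi = 100.0, 50.0, 0.7
imgs = [A + B * math.cos(phi + k * math.pi / 2) for k in range(4)]
recovered = four_step_phase(*imgs)  # ~0.7 rad
```

The recovered phase is wrapped to (-π, π], so a real system adds a phase-unwrapping step (for example, Gray-code patterns or multi-frequency fringes) before converting phase to depth.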
4. Stereo Vision 3D Imaging
Stereo vision perceives 3D structure with one or both eyes, typically reconstructing the 3D structure or depth information of the target scene from images taken from different viewpoints. Depth perception cues divide into monocular and binocular cues. Current methods include monocular, binocular, multi-view, and light field imaging.
Monocular Vision Imaging
Monocular vision depth perception cues include perspective, focal length difference, multi-view imaging, occlusion, shadow, and motion parallax. Techniques like mirror imaging and other Shape from X methods can be used.
Binocular Vision Imaging
Binocular vision depth perception cues include eye convergence position and binocular disparity. In machine vision, two cameras capture images of the same target scene from different viewpoints, and the disparity between corresponding points is calculated to obtain the 3D depth information. The typical process includes image distortion correction, stereo image rectification, image registration, and triangulation-based disparity map calculation.
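After rectification, corresponding points lie on the same image row, and the final triangulation step reduces to the standard relation Z = f·B/d for a rectified pair with focal length f (pixels), baseline B, and disparity d (pixels). A minimal sketch of that last step (parameter values are illustrative):

```python
def disparity_to_depth(disparity_px: float,
                       focal_px: float,
                       baseline_m: float) -> float:
    """Rectified stereo pair: Z = f * B / d.
    Larger disparity means the point is closer to the cameras."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    return focal_px * baseline_m / disparity_px


# 800 px focal length, 12 cm baseline, 64 px disparity -> 1.5 m depth.
z = disparity_to_depth(64.0, 800.0, 0.12)
```

As with active triangulation, depth error grows quadratically with distance for a fixed disparity uncertainty, so baseline and resolution must be chosen for the working range.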
Multi-View Vision Imaging
Multi-view stereo imaging uses one or more cameras to capture multiple images of the same target scene from different viewpoints and reconstructs its 3D information. It is mainly used in scenarios where multiple cameras capture multiple images, with feature-based stereo reconstruction algorithms deriving depth and spatial structure. Shape from Motion (SfM) instead uses a single moving camera to capture images from different viewpoints and reconstruct the 3D information of the target scene.
Light Field Imaging
Light field 3D imaging principles differ from traditional CCD and CMOS camera imaging. Light field cameras place a microlens array in front of the sensor plane, capturing the direction and position information of the light, enabling post-capture processing to adjust focus.
Comparison of 3D Imaging Methods for Robotic Vision
- TOF Cameras and Light Field Cameras: Compact and offer good real-time performance, making them suitable for Eye-in-Hand systems performing 3D measurement, positioning, and real-time guidance. However, their low spatial resolution and limited 3D accuracy restrict them to coarse measurement tasks.
- Structured Light Projection 3D Systems: Offer moderate precision and cost, with promising market applications. They consist of multiple cameras and projectors and can be considered binocular or multi-view 3D triangulation systems.
- Passive Stereo Vision 3D Imaging: Has found some industrial application, but its use cases remain limited. Monocular stereo vision is challenging, and binocular and multi-view stereo vision require clear texture or geometric features on the target object.
For Eye-in-Hand systems, the optimal solution is to develop a low-cost, moderately accurate, passive monocular 3D imaging system.