Visual SLAM with MATLAB
Visual simultaneous localization and mapping (SLAM) is a technological process that empowers robots, drones, and other autonomous systems to create maps of an unknown environment while simultaneously pinpointing their position within it. This technology is seen in many different applications, from steering autonomous vehicles through unknown areas, to enhancing robotic interaction, and even creating immersive augmented reality experiences.
Learn about features in Computer Vision Toolbox™ that leverage class objects to streamline the development and deployment of visual SLAM projects. These new class objects offer real-time performance, speeding up user workflows, and they are designed to support different hardware types, including monocular, stereo, and RGB-D cameras. With these new features and a new example, Computer Vision Toolbox gives its users more tools for building the future of visual SLAM.
Published: 18 Apr 2024
Visual SLAM technology enables autonomous systems to localize themselves and map their environment in real time. To enhance your experience with such applications, MATLAB R2024a focuses on improving visual SLAM performance.
Visual Simultaneous Localization And Mapping, commonly abbreviated as visual SLAM, is a technological process that enables a robot, drone, or other autonomous system to construct a map of an unknown environment while simultaneously tracking its own location within that space. It uses visual data, typically gathered from one or more cameras, to identify distinct features in the environment. By tracking the movement of these features between different frames, visual SLAM algorithms can infer the system's trajectory and build up a consistent map.
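To make the feature-tracking idea concrete, here is a minimal two-view sketch built from Computer Vision Toolbox functions; the image file names and intrinsic parameter values are placeholders rather than part of any shipped example.

% Two-view sketch: detect, match, and use features to recover camera motion.
% The image files and intrinsic values below are placeholders.
I1 = rgb2gray(imread("frame1.png"));
I2 = rgb2gray(imread("frame2.png"));
intrinsics = cameraIntrinsics([800 800],[320 240],[480 640]); % focal length, principal point, image size

% Detect and describe ORB features in both frames.
pts1 = detectORBFeatures(I1);
pts2 = detectORBFeatures(I2);
[f1,vpts1] = extractFeatures(I1,pts1);
[f2,vpts2] = extractFeatures(I2,pts2);

% Match features across the two frames.
idxPairs = matchFeatures(f1,f2);
matched1 = vpts1(idxPairs(:,1));
matched2 = vpts2(idxPairs(:,2));

% Estimate the relative camera motion from the inlier matches.
[E,inlierIdx] = estimateEssentialMatrix(matched1,matched2,intrinsics);
relPose = estrelpose(E,intrinsics,matched1(inlierIdx),matched2(inlierIdx));

Chaining this kind of relative-pose estimate across successive frames, together with map refinement and loop closure, is essentially what a full visual SLAM pipeline automates.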
Applications of visual SLAM can be found in many modern-day scenarios. Autonomous vehicles use cameras to safely navigate populated roads. Robotic mechanisms interact with and move through their surroundings. Even augmented reality requires a precise understanding of the surrounding environment to overlay digital information on the real world. Visual SLAM is valuable for localization when other means, like GPS, are unreliable or unavailable.
Within visual SLAM, one specific approach is monocular SLAM. Monocular refers to the use of a single camera to capture all the visual data needed for localization and mapping. Implementing visual SLAM with a monocular camera is particularly appealing due to its compact hardware and cost-effectiveness, making it well suited for applications with size or budget constraints.
An example of monocular visual SLAM was first introduced in R2020a. In R2024a, Computer Vision Toolbox introduced a new class designed specifically for running monocular visual SLAM workflows in real time. This class, called monovslam, is built to streamline your visual SLAM development and deployment with efficiency and ease.
A monovslam object can be set up using your camera's intrinsic parameters and then fed successive image frames, after which you can query the camera trajectory and the computed 3-D map points. All of this data can be visualized throughout the process, allowing for easy monitoring and data comprehension as successive camera frames are tracked in real time. This class substantially increases execution speed, making real-time visual SLAM workflows a reality.
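As a rough illustration of that workflow, the sketch below creates a monovslam object from assumed camera intrinsics and feeds it frames from an image datastore; the intrinsic values and image source are placeholders, and the monovslam documentation describes the full set of options.

% Minimal monovslam sketch; intrinsic values and the image source are placeholders.
intrinsics = cameraIntrinsics([800 800],[320 240],[480 640]);
vslam = monovslam(intrinsics);

imds = imageDatastore("imageFolder");      % folder of camera frames (placeholder path)
for i = 1:numel(imds.Files)
    I = readimage(imds,i);
    addFrame(vslam,I);                     % track features and add key frames as needed
    if hasNewKeyFrame(vslam)
        plot(vslam);                       % visualize the camera trajectory and 3-D map points
    end
end

% Query the estimated map points and camera poses.
xyzPoints = mapPoints(vslam);
camPoses  = poses(vslam);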
Observe the difference in processing rate between the previous and current implementations as the camera moves along its trajectory. Using monovslam, the points are plotted on the 3-D axes faster than before.
Building on the performance improvements of monovslam, R2024a also introduces a practical example that demonstrates the integration of visual SLAM with ROS in MATLAB. This example provides a detailed walkthrough for developing and deploying a visual SLAM system, showcasing the transition from simulation to real-world application. It offers insight into leveraging the new monovslam class object alongside the modular ROS framework to efficiently implement visual SLAM solutions using a monocular camera.
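A hypothetical sketch of that kind of integration is shown below: it subscribes to an assumed ROS image topic and passes each incoming frame to monovslam. The topic name, message settings, and intrinsic values are assumptions; the shipped example documents the actual workflow, including deployment.

% Hypothetical ROS integration sketch; the topic name and intrinsic values are assumptions.
rosinit                                      % connect to the ROS network
imgSub = rossubscriber("/camera/image_raw","sensor_msgs/Image","DataFormat","struct");

intrinsics = cameraIntrinsics([800 800],[320 240],[480 640]);   % placeholder values
vslam = monovslam(intrinsics);

for k = 1:500                                % process a fixed number of frames for illustration
    msg = receive(imgSub,10);                % wait up to 10 seconds for the next image
    I = rosReadImage(msg);                   % convert the ROS message to a MATLAB image
    addFrame(vslam,I);
    if hasNewKeyFrame(vslam)
        plot(vslam);
    end
end
rosshutdown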
In R2024a, Computer Vision Toolbox also supports stereo and RGB-D cameras, which offer the advantage of direct depth measurement over monocular SLAM for enhanced 3-D mapping. The stereo SLAM class object, stereovslam, processes feeds from dual cameras to mimic human depth perception, while the RGB-D SLAM class object, rgbdvslam, combines readings from a color camera with a depth sensor, improving mapping precision in dynamic environments.
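The sketch below suggests how the stereo and RGB-D variants might be set up; the constructor arguments shown (stereo baseline, depth scale factor) and the image files are assumptions, so consult the stereovslam and rgbdvslam documentation for the exact signatures.

% Assumed setup for the stereo and RGB-D class objects; argument choices are placeholders.
intrinsics = cameraIntrinsics([800 800],[320 240],[480 640]);

% Stereo: pass synchronized left/right frames from the dual-camera rig.
ILeft  = imread("left.png");                 % placeholder images
IRight = imread("right.png");
stereoSlam = stereovslam(intrinsics,0.12);   % assumed 0.12 m baseline
addFrame(stereoSlam,ILeft,IRight);

% RGB-D: pass a color frame together with its registered depth image.
colorImage = imread("color.png");
depthImage = imread("depth.png");
rgbdSlam = rgbdvslam(intrinsics,1000);       % assumed depth scale factor
addFrame(rgbdSlam,colorImage,depthImage);

% Both objects expose the same queries as monovslam.
xyzPoints = mapPoints(stereoSlam);
camPoses  = poses(rgbdSlam);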
The new visual SLAM class objects create new opportunities, handling higher frame rates and supporting a wider range of camera types in just a few lines of code. Computer Vision Toolbox continues to accommodate new workflows and improve real-time visual SLAM capabilities. We're excited to see what you do with these new capabilities in R2024a.