Abstract
During accidents involving hazardous chemicals, people in the surrounding area are put at risk of harm. (Semi-)autonomous robots can mitigate this threat by removing leaking containers. However, teleoperation requires extensive training and is difficult in practice. To overcome these limitations, we implemented a perception system on an autonomous excavator that locates individual barrels in chaotic scenes for extraction. Following the human-in-the-loop principle, operators can remotely select which barrel to remove. An efficient U-Net-style, DCAN-flavored neural network is trained on synthetic and real-world RGB data (5,000 synthetic and 593 real images) and compared to an inference-heavy Mask R-CNN model. In experiments on a held-out test set captured from the excavator, our model yielded an ODS mIoU of 85.14% and an mAP of 72.19%, while Mask R-CNN achieved an ODS mIoU of 86.6% and an mAP of 84.31%. With an inference time of roughly 0.00584 s on 800×576 32-bit tensors, our model is faster than Mask R-CNN, which requires roughly 0.0491 s. Using the robot calibration data, the point clouds of multiple LiDAR sensors are fused with the RGB segmentation to fit a local cylinder model for each barrel, yielding the exact poses for extraction; a motion planner then computes a collision-free motion plan. Force sensing was integrated into the gripper to avoid deforming the barrels. Field trials showed that barrels can be reliably extracted without damage.
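The LiDAR/RGB fusion and cylinder fitting are not part of this dataset, but a minimal sketch of the idea could look as follows: LiDAR points are projected into the segmented image, grouped by instance label, and a simple cylinder is fitted per barrel. The intrinsics, extrinsics, thresholds, and the PCA-based cylinder fit below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def project_points(points_lidar, T_cam_lidar, K):
    """Project Nx3 LiDAR points into the camera image (pinhole model)."""
    pts_h = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]           # points in camera frame
    in_front = pts_cam[:, 2] > 0.1                        # keep points in front of the camera
    uvw = (K @ pts_cam[in_front].T).T
    uv = (uvw[:, :2] / uvw[:, 2:3]).astype(int)           # pixel coordinates
    return uv, pts_cam[in_front]

def fit_cylinder(points):
    """Rough cylinder fit: axis from PCA, radius as mean distance to the axis."""
    center = points.mean(axis=0)
    centered = points - center
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axis = vt[0]                                           # dominant direction = cylinder axis
    radial = centered - np.outer(centered @ axis, axis)    # components orthogonal to the axis
    radius = np.linalg.norm(radial, axis=1).mean()
    return center, axis, radius

def barrels_from_segmentation(points_lidar, instance_map, T_cam_lidar, K):
    """Group projected LiDAR points by instance label and fit one cylinder per barrel."""
    uv, pts_cam = project_points(points_lidar, T_cam_lidar, K)
    h, w = instance_map.shape
    valid = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    uv, pts_cam = uv[valid], pts_cam[valid]
    labels = instance_map[uv[:, 1], uv[:, 0]]
    cylinders = {}
    for label in np.unique(labels):
        if label == 0:                                     # assume 0 = background
            continue
        cluster = pts_cam[labels == label]
        if len(cluster) >= 50:                             # skip sparsely observed instances
            cylinders[int(label)] = fit_cylinder(cluster)
    return cylinders
```

The returned center, axis, and radius per instance correspond to the kind of local cylinder model described above; in the actual system, these poses would then be handed to the motion planner.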
Technical Information
This dataset provides roughly 5,000 synthetically generated images, 593 real images collected with a DSLR camera, and 47 real images collected online from an autonomous excavator, all showing chaotic barrel piles together with the corresponding instance mappings. The dataset is split into synthetic, real, and online subsets; each subset contains an 'RGB' folder with the images and an 'Instance' folder with the instance mappings (a different color for each barrel instance) in PNG format.
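A minimal sketch of how an image and its instance mapping could be loaded, assuming the color-coded PNG instance maps described above and that pure black encodes background; the file names are placeholders, not the actual naming scheme of the dataset.

```python
import numpy as np
from PIL import Image

# Placeholder paths into one of the subsets (synthetic / real / online); adjust to the real layout.
rgb = np.array(Image.open("synthetic/RGB/0001.png"))
instance_png = np.array(Image.open("synthetic/Instance/0001.png").convert("RGB"))

# Each barrel instance is encoded as its own color; map every unique color to a binary mask.
colors = np.unique(instance_png.reshape(-1, 3), axis=0)
masks = {}
for color in colors:
    if np.all(color == 0):             # assumption: pure black is background
        continue
    masks[tuple(color)] = np.all(instance_png == color, axis=-1)

print(f"{len(masks)} barrel instances in a {rgb.shape[1]}x{rgb.shape[0]} image")
```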