IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022
Learning to autonomously assemble shapes is a crucial skill for many robotic applications. While the majority of existing part assembly methods focus on correctly posing semantic parts to recreate a whole object, we interpret assembly more literally: as mating geometric parts together to achieve a snug fit. By focusing on shape alignment rather than semantic cues, we can achieve cross-category generalization and scalability. In this paper, we introduce a novel task, pairwise 3D geometric shape mating, and propose Neural Shape Mating (NSM) to tackle it. Given point clouds of two object parts from an unknown category, NSM learns to reason about the fit of the two parts and predicts a pair of 3D poses that tightly mate them together. In addition, we couple the training of NSM with an implicit shape reconstruction task, making NSM more robust to imperfect point cloud observations. To train NSM, we present a self-supervised data collection pipeline that generates pairwise shape mating data with ground truth by randomly cutting an object mesh into two parts, resulting in a dataset of 200K shape mating pairs drawn from numerous object meshes with diverse cut types. We train NSM on the collected dataset and compare it with several point cloud registration methods and one part assembly baseline. Extensive experimental results and ablation studies under various settings demonstrate the effectiveness of the proposed algorithm.
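The self-supervised idea can be illustrated with a minimal sketch: cut an object into two parts, give each part a random SE(3) pose, and record the inverse transforms as the ground-truth mating poses. This is not the authors' pipeline. It assumes a point cloud instead of a mesh and a single random plane through the centroid as the cut, and the helper names (random_rotation, apply_pose, make_mating_pair) are hypothetical.

```python
import numpy as np


def random_rotation(rng: np.random.Generator) -> np.ndarray:
    """Sample a uniformly distributed 3x3 rotation matrix via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))  # fix column signs so q is Haar-distributed over O(3)
    if np.linalg.det(q) < 0:     # reflect one axis so det(q) = +1, i.e. q is in SO(3)
        q[:, 0] = -q[:, 0]
    return q


def apply_pose(points: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Apply x -> R x + t to every row of an (N, 3) point array."""
    return points @ R.T + t


def make_mating_pair(points: np.ndarray, rng: np.random.Generator):
    """Hypothetical pair generator: planar cut through the centroid, then random poses.

    Returns the two posed parts and, for each part, the pose (R, t) that moves it
    back into the original object frame (the ground-truth mating poses).
    """
    normal = rng.normal(size=3)
    normal /= np.linalg.norm(normal)
    above = (points - points.mean(axis=0)) @ normal > 0
    observed, gt_poses = [], []
    for part in (points[above], points[~above]):
        R = random_rotation(rng)
        t = rng.uniform(-1.0, 1.0, size=3)
        observed.append(apply_pose(part, R, t))
        # The inverse of x' = R x + t is x = R^T x' - R^T t.
        gt_poses.append((R.T, -R.T @ t))
    return observed, gt_poses


# Usage: applying the ground-truth poses to the posed parts reassembles the object.
rng = np.random.default_rng(0)
cloud = rng.uniform(-1.0, 1.0, size=(1000, 3))
parts, poses = make_mating_pair(cloud, rng)
restored = np.concatenate([apply_pose(p, R, t) for p, (R, t) in zip(parts, poses)])
```

Because the labels come for free from the cut itself, no manual annotation is needed; a model in this setting would receive only the posed parts and be trained to recover poses like those in gt_poses.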
Figure: Dataset overview. (Left) Our dataset is composed of object meshes from three categories. (Middle) We define five different types of cut functions. Each object mesh can then be cut in many different ways by varying the parameters of these cut functions. (Right) Finally, each pair of parts is assigned a random initial SE(3) pose. When cutting a mesh, we also generate solid and shell (hollow) variants of each shape, creating different mating interfaces for the same problem instance.
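The five cut types (planar, sine, square, pulse and parabolic, as named in the generalization experiments) can be pictured as height fields that separate points above the cut surface from points below it. The sketch below assumes one plausible parameterization per type, with a 1D profile along the local x-axis extruded along y; the formulas, the parameter names (amp, period, width, curv) and cut_point_cloud are all hypothetical. It also operates on point clouds, whereas the dataset cuts meshes and additionally builds solid and shell variants, which this sketch omits.

```python
import numpy as np

# Hypothetical height-field profiles z = f(x) in a local cut frame.
CUT_FUNCTIONS = {
    "planar": lambda x, p: np.zeros_like(x),
    "sine": lambda x, p: p["amp"] * np.sin(2 * np.pi * x / p["period"]),
    "square": lambda x, p: p["amp"] * np.sign(np.sin(2 * np.pi * x / p["period"])),
    "pulse": lambda x, p: np.where(np.abs(x) < p["width"] / 2, p["amp"], 0.0),
    "parabolic": lambda x, p: p["curv"] * x ** 2,
}


def cut_point_cloud(points, kind, params, frame=None):
    """Split an (N, 3) point cloud into two parts along a parametric cut surface.

    `frame` is a 3x3 rotation whose columns are the cut frame's axes; points whose
    local z-coordinate lies above f(local x) go to the first part.
    """
    if frame is None:
        frame = np.eye(3)
    local = points @ frame  # coordinates of each point along the cut-frame axes
    height = CUT_FUNCTIONS[kind](local[:, 0], params)
    above = local[:, 2] > height
    return points[above], points[~above]


# Usage: cut the same cloud with every cut type using one parameter set.
rng = np.random.default_rng(0)
cloud = rng.uniform(-1.0, 1.0, size=(2000, 3))
params = {"amp": 0.1, "period": 0.5, "width": 0.4, "curv": 0.5}
splits = {kind: cut_point_cloud(cloud, kind, params) for kind in CUT_FUNCTIONS}
```

Sampling a random frame (for example with a random rotation matrix) and random parameter values per call is one way to obtain many distinct cuts of the same object.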
Table: Experimental results of geometric shape mating. R and T denote rotation and translation, respectively. Lower is better on all metrics. Notably, many methods come reasonably close in translation but are completely off in orientation, as the rotation RMSE shows.
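To make the rotation versus translation distinction concrete, here is a minimal sketch of RMSE metrics for a batch of predicted poses. The exact metric definitions behind the table are not stated here, so this assumes one common choice: rotation error as the geodesic angle in degrees between predicted and ground-truth rotation matrices, and translation RMSE averaged over all samples and coordinates. The function name rmse_metrics is hypothetical.

```python
import numpy as np


def rmse_metrics(R_pred, t_pred, R_gt, t_gt):
    """RMSE of rotation (degrees) and translation for batches of poses.

    R_pred, R_gt: (N, 3, 3) rotation matrices; t_pred, t_gt: (N, 3) translations.
    """
    # trace(R_pred^T R_gt) equals the sum of elementwise products.
    cos = (np.sum(R_pred * R_gt, axis=(1, 2)) - 1.0) / 2.0
    angles = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))  # geodesic angle per sample
    rmse_r = np.sqrt(np.mean(angles ** 2))
    rmse_t = np.sqrt(np.mean((t_pred - t_gt) ** 2))  # averaged over samples and x, y, z
    return rmse_r, rmse_t


# Usage: a prediction with perfect translation but a 180-degree flip about z.
identity = np.stack([np.eye(3)] * 4)
flipped = np.stack([np.diag([-1.0, -1.0, 1.0])] * 4)
zeros = np.zeros((4, 3))
rmse_r, rmse_t = rmse_metrics(flipped, zeros, identity, zeros)
```

In this example the translation error is zero while the rotation error is maximal, which is the failure mode the caption describes: a method can place a part in the right location yet orient it completely wrong.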
Figure: Visual results of pairwise 3D geometric shape mating. NSM accurately predicts 3D poses that assemble the two input shapes.
Table: Generalization: geometric shape mating on unseen categories. R and T denote rotation and translation, respectively. Lower is better on all metrics. The training set contains shape pairs from 9 categories; the test set contains shape pairs from the remaining box and bag categories.
Table: Generalization: geometric shape mating on an unseen cut type. R and T denote rotation and translation, respectively. Lower is better on all metrics. The training set contains the planar, sine, square, and pulse cut types. The test set contains the parabolic cut type.
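Both generalization settings hold out every pair that shares a given attribute value, so no test category or test cut type is seen during training. A minimal sketch of such a split follows, assuming a hypothetical per-pair metadata record (MatingPair) and helper (holdout_split); the categories other than box and bag are made-up placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatingPair:
    """Hypothetical metadata record for one shape mating pair."""
    pair_id: int
    category: str
    cut_type: str


def holdout_split(pairs, attribute, held_out):
    """Send pairs whose `attribute` value is in `held_out` to test, the rest to train."""
    train = [p for p in pairs if getattr(p, attribute) not in held_out]
    test = [p for p in pairs if getattr(p, attribute) in held_out]
    return train, test


# Toy records; "mug" and "vase" are placeholder category names.
pairs = [
    MatingPair(0, "box", "planar"),
    MatingPair(1, "mug", "sine"),
    MatingPair(2, "bag", "parabolic"),
    MatingPair(3, "vase", "parabolic"),
    MatingPair(4, "mug", "pulse"),
]

# Unseen-category setting: box and bag are test-only.
cat_train, cat_test = holdout_split(pairs, "category", {"box", "bag"})
# Unseen-cut-type setting: parabolic cuts are test-only.
cut_train, cut_test = holdout_split(pairs, "cut_type", {"parabolic"})
```

Splitting by attribute rather than by random sampling is what makes these tests measure generalization instead of interpolation between near-duplicate shapes.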