Abstract
Gaining a mechanistic understanding of process-structure-property relationships in materials relies on correlative microscopy workflows. These workflows, in turn, fundamentally depend on image matching, i.e., a computer vision task with the objective of finding point correspondences in pairs of images. Matching models are difficult to evaluate quantitatively in the materials field due to a shortage of representative benchmark datasets. Nonetheless, a few small-scale studies indicate that traditional rule-based image matching techniques such as the scale-invariant feature transform (SIFT) currently fall short on such matching tasks. We present a dataset for cross-modal image matching and data fusion in the materials microscopy domain, which we coin AmalgaMatch, to support model benchmarking and fine-tuning efforts. All images are micrographs captured using the most widely applied imaging techniques in materials science, including light-optical, scanning electron, and transmission electron microscopy, as well as electron backscatter diffraction (EBSD). Therein, various detectors and imaging modes are employed to capture micrographs of diverse materials. While the majority of images are raw, some underwent typical processing routes such as digital image correlation or EBSD indexing. Common regions in image pairs are populated with hand-annotated keypoint correspondences. Because mutual information is limited in cross-modal, multi-scale image pairs, we relied on characteristic defects such as dislocations, grain boundaries, triple junctions, inclusions, and pores, as well as surface features, for annotation. Furthermore, the dataset is divided into groups and subsets, with distinct registration tasks, materials, and/or imaging configurations serving as splitting criteria. The dataset covers many typical use cases for image matching in materials science. In total, it comprises 19 subsets with 35 scenes and 187 annotated image pairs to support autonomous multi-modal materials data fusion. For each image, we provide metadata to facilitate training of hybrid matching models that process textual inputs alongside images to improve matching quality and robustness.
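To make the matching task concrete, the minimal sketch below illustrates the kind of traditional rule-based pipeline referred to above: SIFT keypoint detection followed by descriptor matching with a ratio test. It assumes OpenCV and placeholder file names, and is an illustration only, not part of the dataset or its tooling.

```python
import cv2

# Minimal sketch (assumes OpenCV; file names are placeholders) of a
# classical SIFT-based matching pipeline, i.e., the type of rule-based
# approach that tends to fall short on cross-modal micrograph pairs.
img_a = cv2.imread("micrograph_modality_a.png", cv2.IMREAD_GRAYSCALE)
img_b = cv2.imread("micrograph_modality_b.png", cv2.IMREAD_GRAYSCALE)

# Detect keypoints and compute descriptors in both images.
sift = cv2.SIFT_create()
kp_a, des_a = sift.detectAndCompute(img_a, None)
kp_b, des_b = sift.detectAndCompute(img_b, None)

# Brute-force matching with Lowe's ratio test to keep distinctive matches.
matcher = cv2.BFMatcher(cv2.NORM_L2)
knn_matches = matcher.knnMatch(des_a, des_b, k=2)
good = [m for m, n in knn_matches if m.distance < 0.75 * n.distance]

# Point correspondences (x, y) in both images -- the quantity that the
# hand-annotated keypoints in the dataset provide as ground truth.
correspondences = [(kp_a[m.queryIdx].pt, kp_b[m.trainIdx].pt) for m in good]
```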