Abstract
This paper introduces ForametCeTera, a pioneering dataset designed to address the challenges associated with automating the analysis of benthic foraminifera in sediment cores. Foraminifera are sensitive sentinels of environmental change and are a crucial component of carbonate-dominated ecosystems, such as coral reefs. Studying their prevalence and characteristics is imperative in understanding climate change. However, analysis of foraminifera contained in core samples currently requires washing, sieving and manual quantification. These methods are time-consuming and require trained experts. To overcome these limitations, we propose an alternative workflow using 3D X-ray computed tomography (CT) for fully automated analysis, saving time and resources. Despite recent advancements in automation, methods for segmenting and classifying individual foraminifera from 3D scans are still lacking. In response, we present ForametCeTera, a diverse dataset featuring 436 3D CT scans of individual foraminifera and non-foraminiferan material, acquired with a high-throughput scanning workflow. ForametCeTera serves as a foundational resource for generating synthetic digital core samples, facilitating the development of segmentation and classification methods for CT scans of entire core samples.
Background & Summary
Benthic foraminifera are unicellular organisms characterised by a calcium carbonate shell. They are responsible for about 20% of global carbonate production1. In carbonate-dominated environments, such as coral reefs, foraminifera are important contributors to the production of sediment2,3. Combined effects of global climate change impair carbonate production in these environments4,5,6, which is critical for low-lying tropical islands to withstand sea-level rise7,8. Foraminifera are also sensitive sentinels of environmental change9. As such, they act as proxies to infer data on both long and short temporal scales. For example, the species composition of an assemblage of foraminifera is related to environmental change10,11. Analysing the gradient of such a composition, by sampling over time12 or by observing a sediment core13, therefore shows temporal trends in habitat quality. Monitoring foraminifera is thus imperative in improving our understanding of the response of these sensitive systems to climate change.
Currently, collected sediment cores are washed and sieved to separate the foraminifera from the surrounding material. This is followed by manual partitioning, identification and quantification of the foraminifera (see Fig. 1). This is a time-consuming process that requires expert knowledge. As such, there has been active research in automating (parts of) this procedure. Such methods involve taking photographs of sieved and washed foraminifera and using machine learning to subsequently classify the imaged foraminifera. This is done by defining hand-crafted features14 or by learning the relevant features for classification based on the whole image of individual foraminifera15,16,17. Several efforts have incorporated 3D features by photographing specimens under different lighting conditions14, at different focal planes18 or both17. Despite these advances in automation, preparation of the sediment core sample (e.g. washing, sieving and sorting) remains a bottleneck. Moreover, foraminifera stuck in hardened consolidated sediment are difficult to obtain and classify in this manner16,18.
We envision an alternative workflow, in which 3D X-ray computed tomography (CT) is used to further automate the procedure. While CT scanning has been widely employed to analyse individual foraminifera19,20,21,22,23,24,25, our workflow is based on scanning the whole core before washing and sieving. The resulting 3D image can then be processed digitally to separate the individual foraminifera contained within the core sample and classify them using advanced 2D and 3D machine learning techniques. A schematic depiction of the envisioned procedure is shown in Fig. 2.
The methods needed to segment and classify individual foraminifera from such 3D scans do not exist yet. To enable the development of such methods, high-quality labelled training data are needed. While some datasets of CT scans of (individual) foraminifera species exist19,20,26, they are not suitable for developing and training methods for separating and classifying CT scans of whole core samples, as the data are lacking in magnitude, diversity and consistency. One recent, notable CT scan dataset25 improves in terms of (planktic) species diversity and dataset magnitude but is still too small for training machine learning algorithms. We aim to address these shortcomings with ForametCeTera: a dataset consisting of 436 3D CT scans of 288 individual benthic foraminifera and 148 bits of non-foraminiferan material. ForametCeTera’s data can be used as building blocks to generate synthetic digital core samples, which in turn can be used to develop methods for segmenting and classifying CT scans of core samples. We also demonstrate a high-throughput, specimen-agnostic scanning procedure suitable for the aforementioned task, which can be used to rapidly extend the dataset’s breadth and depth with additional microfossils. It is the first time that a dataset of foraminifera of this scope has been collected. We believe it will be helpful for method development and validation.
The remainder of the paper is organised as follows. A detailed description of how the dataset was acquired is given in the Methods section. This is followed by a detailed description of the dataset itself, how it was validated, and how it can be used.
Methods
To create the dataset, we carefully selected samples of individual foraminifera and bits of non-foraminiferan material. We will refer to either individual foraminifera or non-foraminiferan material as specimens. The specimens were sourced from core samples acquired by the Naturalis Biodiversity Center and split up into foraminifera species and a residual group of non-foraminiferan material. Each collection of specimens was put in a tube with a filling medium. Each tube was then CT scanned (creating a group-scan) with consistent scanning parameters deemed most suitable after performing trial scans of all specimens. A summary of this process is shown in Fig. 3. ForametCeTera contains 11 group-scans that were segmented into a total of 436 individual specimens: 288 foraminifera and 148 bits of non-foraminiferan material. This process took place over a period of 2 months. The dataset comprises the raw CT projections of the group-scans, reconstructed 2D cross-sectional data of the group-scans, reconstructed 3D group-scans, segmented 3D specimens and all scripts that were used throughout this process.
Sample selection
The samples from which the specimens were retrieved were collected in Makassar, Indonesia and Espiritu Santo, Vanuatu. A detailed overview of their metadata is shown in Table 1. These specimens have previously been used in academic and educational activities and have therefore been cleaned. However, several specimens had clumps of sand sticking to them. Taking care to keep sandy samples separate, all specimens were split and grouped by their respective species and region. Non-foraminiferan material was separated too. The resulting collection of specimens is described in Table 2. Several microscope images of these specimens are shown in Fig. 4.
Sample preparation
In our testing, it was found that an adequate CT scan takes at least half an hour. It would be very time-consuming to perform such a scan for each individual specimen. We therefore chose to scan several specimens at once, creating the previously mentioned group-scans. A group-scan can then be segmented to obtain individual specimens. The successful application of segmentation algorithms requires the specimens to be adequately spaced. To achieve this, a filling medium was needed that fulfilled several requirements: 1) having minimal overlap in density with the specimens, 2) being relatively homogeneous and 3) given the value and utility of foraminifera, being separable from the specimens when dismantling the samples. The candidate filling media were sugar and coffee creamer, as they both dissolve in water (requirement 3) and are readily available. After creating scans with both media, sugar was found to be both less homogeneous and to overlap in density with the specimens. Consequently, coffee creamer was selected as the filling medium.
For each group-scan, the selected specimens and filling medium were put in a plastic tube (11.6 mm inner diameter, 50.7 mm inner height, 1.5 mm thick) with a screw cap. Specimens at the bottom of the tube are at risk of being poorly captured as they are near the thicker bottom of the tube and the mounting equipment. As such, a 2 cm cylindrical piece of Styrofoam was first placed in the tube, raising the tube contents from the bottom. Subsequently, a teaspoon of filling medium was dispensed on a piece of paper with a crease through the middle. The specimens of choice were then deposited onto the filling medium and mixed with it. The mixture of medium and specimens was poured into the tube along the aforementioned crease. Upon X-ray inspection, it may turn out that the specimens are poorly distributed throughout the filling medium. In such cases, simply shaking the tube can effectively redistribute the specimens throughout the medium.
Scanner setup and parameters
To perform the CT scans, a Neoscan N80 FP micro-CT scanner was used, located at the Naturalis Biodiversity Center. The scanner contains a microfocus X-ray source (limited to 110 kV, 16 W), an active pixel CMOS flat-panel X-ray detector (7 Mp) and, in between these, a stage where the sample is mounted, capable of axial rotation and 3D translation27.
The CT scanner is operated using Neoscan’s accompanying software: Neoscan80, version 3.0.2. All scans were performed with a resolution of 15 μm, using a 0.5 mm aluminium (Al) filter. Testing revealed that a different filter may enhance the image contrast of some specimens as, from certain angles, X-rays were too attenuated by the filter and the specimens. However, as one of the downstream aims of ForametCeTera is to generate synthetic samples, scanning with different parameters would introduce per-specimen biases. This particular filter turned out to be suitable for most specimens.
Prior to each scan, the CT scanner flat-field was automatically calibrated using the Neoscan80 software. For each scan, the object made a full 360° rotation with images captured at 0.2° increments resulting in 1801 projections, including a final overlapping projection. The exposure time was 94 ms and 4 averaging frames were used. The X-ray source was set to 67 kV and 200 μA. Projections were captured at the highest possible resolution of 2400 × 2752 pixels. In case the region of interest exceeded the vertical field of view of the scanner, multiple scans were ‘stitched’ together using the oversize scanning feature of Neoscan80.
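The stated acquisition parameters determine the number of projections and a lower bound on the pure exposure time of a single scan. The short calculation below reproduces the projection count from the text. The exposure total is only an illustrative lower bound: it assumes that each projection costs exactly the exposure time multiplied by the number of averaging frames, and it ignores stage movement, detector readout and oversize stitching, none of which are quantified in the text.

```python
# Acquisition parameters as stated in the text.
rotation_deg = 360.0
step_deg = 0.2
exposure_ms = 94
averaging_frames = 4

# 360 / 0.2 = 1800 angular steps, plus one final projection that
# overlaps the first one.
n_projections = round(rotation_deg / step_deg) + 1

# Lower bound on exposure time per scan (ASSUMPTION: no overhead for
# stage movement or readout is included, as it is not reported).
min_exposure_s = n_projections * averaging_frames * exposure_ms / 1000.0

print(n_projections)             # 1801
print(round(min_exposure_s, 1))  # 677.2
```

Under these assumptions the exposure alone accounts for roughly 11 minutes, which is consistent with the observation that an adequate scan takes at least half an hour once overhead is included.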
Data acquisition
The projection data from the group-scans were reconstructed into 2D cross-sectional images using the Neoscan80 software. Only intensity values between 0.09 and 0.9 were retained. These values are a unitless quantity proportional to material density. The 2D cross-sections were exported as a stack of lossless-compressed 8-bit .png files. These stacks were converted into lossless 3D group-scan data by means of a Python script (stack.py, see Code availability), producing .nrrd files, a file format for n-dimensional raster data28. In the 3D data, the foraminifera were separated from the filling medium based on a voxel intensity threshold. This threshold was found empirically and tuned to balance preserving foraminiferan material against completely removing the filling medium. After checking the resulting connected components for segmentation errors, the individual specimens were exported as .nrrd files. This segmentation procedure is implemented in a Python notebook (segment.ipynb, see Code availability).
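The threshold-and-label step described above can be sketched as follows. This is a minimal illustration and not the authors' segment.ipynb: the function name `segment_specimens`, the `min_voxels` filter and the toy volume are assumptions made for the example, and the threshold value is arbitrary, since the text states only that it was tuned empirically.

```python
import numpy as np
from scipy import ndimage


def segment_specimens(volume, threshold, min_voxels=1):
    """Split a group-scan volume into per-specimen sub-volumes.

    `threshold` and `min_voxels` are illustrative parameters, not
    values taken from the paper.
    """
    # Voxels brighter than the threshold are treated as specimen material.
    mask = volume > threshold
    # Label face-connected components (SciPy's default 3D connectivity).
    labels, n_components = ndimage.label(mask)
    specimens = []
    for index, bbox in enumerate(ndimage.find_objects(labels), start=1):
        component = labels[bbox] == index
        if component.sum() < min_voxels:
            continue  # discard specks, e.g. residual filling medium
        # Crop to the bounding box and zero all voxels outside the component.
        specimens.append(np.where(component, volume[bbox], 0))
    return specimens


# Toy group-scan: two bright blocks inside a dim "filling medium".
volume = np.full((20, 20, 20), 30, dtype=np.uint8)
volume[2:6, 2:6, 2:6] = 200
volume[10:17, 10:15, 10:13] = 180

specimens = segment_specimens(volume, threshold=100, min_voxels=10)
print([s.shape for s in specimens])  # [(4, 4, 4), (7, 5, 3)]
```

Two specimens that touch end up in the same connected component with this approach, which is why a manual check of the components remains necessary before export.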
Further expansion
ForametCeTera is a diverse 3D dataset, yet further expansion would enhance the robustness and generalisability of (machine learning-based) classifiers. Diversity can be expanded by scanning specimens from different species, regions and depths. Additionally, scans of unprocessed samples would enable the testing of trained classifiers on real-world data that were minimally pre-processed, as envisioned in the proposed procedure (see Fig. 2).
Data Records
ForametCeTera is publicly available on Zenodo29 as a .zip file of about 4.4 GB. This file contains the 2D reconstructed group-scan cross-sections, the 3D group-scan data and the individual, segmented 3D specimens. The raw projection images of the group-scans are available upon request, as they are relatively large (332 GB uncompressed). An overview of the data and its metadata is given in Tables 1 and 2. The structure of the dataset is shown in Fig. 5. The Group_scans folder contains the 3D reconstructions of the group-scans; the Specimens folder contains the segmented 3D specimens, i.e. the segmented group-scans; and the Reconstructions folder contains the reconstructed 2D cross-sectional data and a file output by Neoscan80 listing the scanning parameters per scan. Specimen indices in the Specimens folder may skip values, because touching specimens that were erroneously segmented as one were not exported. For the sake of reproducibility, the indices were not renumbered. <ID a> placeholders refer to the IDs as shown in the Dataset ID column in Table 2.
Technical Validation
The Neoscan CT scanner undergoes regular maintenance and calibration. Prior to each scan, the flat-field reference is updated by the Neoscan software. The reconstructed 3D data have been examined to identify any oddities in the captured intensity values. The segmented specimens have been checked for intensity oddities and for segmentation errors.
Usage Notes
The 3D data can be viewed using, for example, the open-source program 3D Slicer (https://www.slicer.org/). To perform analysis, Python was used in conjunction with several packages: pynrrd to load the data, and numpy, scipy.ndimage and scikit-image to perform further analysis. For machine learning endeavours, the TorchIO or rising (exclusive to PyTorch) libraries may be used.
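As an illustration of how the segmented specimens might serve as building blocks for a synthetic digital core sample, the sketch below pastes specimen volumes at random, non-overlapping positions in an empty volume and keeps a matching label volume. It is a hypothetical example, not part of the published scripts: the function names `load_specimen` and `place_specimens`, the rejection-sampling placement and the empty (zero) background are all assumptions. A realistic synthetic core would also need a sediment matrix, random rotations and denser packing.

```python
import numpy as np


def load_specimen(path):
    """Read one .nrrd volume with pynrrd (imported lazily)."""
    import nrrd  # provided by the pynrrd package
    data, header = nrrd.read(str(path))
    return data


def place_specimens(specimens, core_shape, rng, max_tries=100):
    """Paste specimen volumes at random, non-overlapping positions.

    Returns the synthetic volume and a label volume (0 = background,
    i = i-th specimen). Each specimen must fit inside `core_shape`.
    The placement strategy is a hypothetical choice for illustration.
    """
    core = np.zeros(core_shape, dtype=np.uint8)
    labels = np.zeros(core_shape, dtype=np.uint16)
    for index, specimen in enumerate(specimens, start=1):
        occupied = specimen > 0
        for _ in range(max_tries):
            # Random corner such that the specimen fits inside the core.
            corner = [int(rng.integers(0, c - s + 1))
                      for c, s in zip(core_shape, specimen.shape)]
            region = tuple(slice(o, o + s)
                           for o, s in zip(corner, specimen.shape))
            # Accept the position only if no specimen voxel would overlap.
            if not np.any(labels[region][occupied]):
                core[region][occupied] = specimen[occupied]
                labels[region][occupied] = index
                break
    return core, labels


# Toy specimens stand in for volumes returned by load_specimen().
rng = np.random.default_rng(0)
toy = [np.full((4, 4, 4), 200, dtype=np.uint8),
       np.full((3, 5, 2), 150, dtype=np.uint8)]
core, labels = place_specimens(toy, (32, 32, 32), rng)
print(np.unique(labels))
```

The label volume provides per-voxel ground truth for free, which is what makes synthetic samples attractive for training segmentation and classification methods.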
Code availability
All code used in this paper is available at https://github.com/JLuij/ForametCeTera_scripts. This includes the script to convert a stack of .png cross-section images to an .nrrd file (stack.py), the script for segmenting the group-scan volumes (segment.ipynb) and the script for performing the technical validation (technical_validation.ipynb). As these scripts make use of several packages, an environment.yml file is included to reproduce the conda environment, together with more instructions in the repository’s Readme.md file.
References
1. Langer, M. Assessing the Contribution of Foraminiferan Protists to Global Ocean Carbonate Production. Journal of Eukaryotic Microbiology 55, 163–169, https://doi.org/10.1111/j.1550-7408.2008.00321.x (2008).
2. Dawson, J., Smithers, S. & Hua, Q. The importance of large benthic foraminifera to reef island sediment budget and dynamics at Raine Island, northern Great Barrier Reef. Geomorphology 222, 68–71, https://doi.org/10.1016/j.geomorph.2014.03.023 (2014).
3. Doo, S., Hamylton, S., Finfer, J. & Byrne, M. Spatial and temporal variation in reef-scale carbonate storage of large benthic foraminifera: a case study on One Tree Reef. Faculty of Science, Medicine and Health - Papers: part A 293–303, https://doi.org/10.1007/s00338-016-1506-0 (2017).
4. Perry, C. T., Spencer, T. & Kench, P. S. Carbonate budgets and reef production states: a geomorphic perspective on the ecological phase-shift concept. Coral Reefs 27, 853–866, https://doi.org/10.1007/s00338-008-0418-z (2008).
5. Browne, N. K. et al. Predicting Responses of Geo-ecological Carbonate Reef Systems to Climate Change: A Conceptual Model and Review. In Hawkins, S. J. et al. (1st edn.) Oceanography and Marine Biology, 229–370, https://doi.org/10.1201/9781003138846-4 (CRC Press, Boca Raton, 1st edn, 2021).
6. Cornwall, C. E. et al. Global declines in coral reef calcium carbonate production under ocean acidification and warming. Proceedings of the National Academy of Sciences 118, e2015265118, https://doi.org/10.1073/pnas.2015265118 (2021).
7. Patel, F., Pinto, W., Dey, M., Alcoverro, T. & Arthur, R. Carbonate budgets in Lakshadweep Archipelago bear the signature of local impacts and global climate disturbances. Coral Reefs 42, 1–14, https://doi.org/10.1007/s00338-023-02374-8 (2023).
8. Courtney, T. A. et al. Rapid assessments of Pacific Ocean net coral reef carbonate budgets and net calcification following the 2014-2017 global coral bleaching event. Limnology and Oceanography 67, 1687–1700, https://doi.org/10.1002/lno.12159 (2022).
9. Hallock, P., Lidz, B. H., Cockey-Burkhard, E. M. & Donnelly, K. B. Foraminifera as bioindicators in coral reef assessment and monitoring: the foram index. Environmental monitoring and assessment 81, 221–238, https://doi.org/10.1023/A:1021337310386 (2003).
10. Norström, A., Nyström, M., Lokrantz, J. & Folke, C. Alternative states on coral reefs: beyond coral-macroalgal phase shifts. Marine Ecology Progress Series 376, 295–306, https://doi.org/10.3354/meps07815 (2009).
11. Roff, G. & Mumby, P. J. Global disparity in the resilience of coral reefs. Trends in Ecology & Evolution 27, 404–413, https://doi.org/10.1016/j.tree.2012.04.007 (2012).
12. Girard, E. B. et al. Dynamics of large benthic foraminiferal assemblages: A tool to foreshadow reef degradation? Science of The Total Environment 811, 151396, https://doi.org/10.1016/j.scitotenv.2021.151396 (2022).
13. Johnson, J. A., Perry, C. T., Smithers, S. G., Morgan, K. M. & Woodroffe, S. A. Reef shallowing is a critical control on benthic foraminiferal assemblage composition on nearshore turbid coral reefs. Palaeogeography, Palaeoclimatology, Palaeoecology 533, 109240, https://doi.org/10.1016/j.palaeo.2019.109240 (2019).
14. Ge, Q. et al. Coarse-to-fine foraminifera image segmentation through 3D and deep features. In 2017 IEEE Symposium Series on Computational Intelligence (SSCI), 1–8, https://doi.org/10.1109/SSCI.2017.8280982 (2017).
15. Hsiang, A. Y. et al. Endless Forams: >34,000 Modern Planktonic Foraminiferal Images for Taxonomic Training and Automated Species Recognition Using Convolutional Neural Networks. Paleoceanography and Paleoclimatology 34, 1157–1177, https://doi.org/10.1029/2019PA003612 (2019).
16. Gorur, K. et al. Species-Level Microfossil Prediction for Globotruncana genus Using Machine Learning Models. Arabian Journal for Science and Engineering 48, 1315–1332, https://doi.org/10.1007/s13369-022-06822-5 (2023).
17. Richmond, T. et al. Forabot: Automated Planktic Foraminifera Isolation and Imaging. Geochemistry, Geophysics, Geosystems 23, e2022GC010689, https://doi.org/10.1029/2022GC010689 (2022).
18. Elder, L. E. et al. Sixty-one thousand recent planktonic foraminifera from the Atlantic Ocean. Scientific Data 5, 180109, https://doi.org/10.1038/sdata.2018.109 (2018).
19. Choquel, C. Dataset of 3D foraminifera to unravel environmental changes in the Baltic Sea entrance over the last 200 years, https://doi.org/10.5878/285V-PT74 (2023).
20. Zarkogiannis, S. D. et al. X-ray tomographic data of planktonic foraminifera species Globigerina bulloides from the Eastern Tropical Atlantic across Termination II. GigaByte 2020, gigabyte5, https://doi.org/10.46471/gigabyte.5 (2020).
21. Brombacher, A., Searle-Barnes, A., Zhang, W. & Ezard, T. H. G. Analysing planktonic foraminiferal growth in three dimensions with foram3D: an R package for automated trait measurements from CT scans. Journal of Micropalaeontology 41, 149–164, https://doi.org/10.5194/jm-41-149-2022 (2022).
22. Fox, L., Stukins, S., Hill, T. & Miller, C. G. Quantifying the Effect of Anthropogenic Climate Change on Calcifying Plankton. Scientific Reports 10, 1620, https://doi.org/10.1038/s41598-020-58501-w (2020).
23. Johnstone, H. J. H., Schulz, M., Barker, S. & Elderfield, H. Inside story: An X-ray computed tomography method for assessing dissolution in the tests of planktonic foraminifera. Marine Micropaleontology 77, 58–70, https://doi.org/10.1016/j.marmicro.2010.07.004 (2010).
24. Speijer, R. P. et al. Quantifying foraminiferal growth with high-resolution X-ray computed tomography: New opportunities in foraminiferal ontogeny, phylogeny, and paleoceanographic applications. Geosphere 4, 760–763, https://doi.org/10.1130/GES00176.1 (2008).
25. Siccha, M. et al. Collection of X-ray micro computed tomography images of shells of planktic foraminifera with curated taxonomy. Scientific Data 10, 679, https://doi.org/10.1038/s41597-023-02498-0 (2023).
26. Theresa, Fritz-Enders. Foraminarium 3D project, http://www.foraminarium.com/about-this-project.html.
27. Neoscan. Neoscan N80 User Manual. Neoscan (2022). Version 1.5.
28. Teem. nrrd: Definition of NRRD File Format, https://teem.sourceforge.net/nrrd/format.html.
29. Luijmes, J., van Leeuwen, T. & Renema, W. ForametCeTera, a novel CT scan dataset to accelerate classification research of foraminifera and non-foraminiferan material. Zenodo https://doi.org/10.5281/zenodo.8344213 (2023).
Acknowledgements
We sincerely thank Bertie Joan van Heuven for her assistance with the preparation of the specimens and for providing instructions on using the Neoscan CT scanner and its software. We also thank Daan Pelt for his advice in the design of the experiments.
Author information
Contributions
W.R. collected and pre-processed the samples. J.L. prepared and scanned the samples. J.L. wrote the scripts and segmented the specimens. T.v.L. advised on the experiment design. J.L. wrote the first draft of the manuscript and created its figures. All authors made contributions to sections of the manuscript and have reviewed it.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Luijmes, J., van Leeuwen, T. & Renema, W. ForametCeTera, a novel CT scan dataset to expedite classification research of (non-)foraminifera. Sci Data 11, 642 (2024). https://doi.org/10.1038/s41597-024-03476-w
DOI: https://doi.org/10.1038/s41597-024-03476-w