A New Study Proposes Automatic Taxonomic Identification Based On The Fossil Image Dataset (>415,000 images) And Deep Convolutional Neural Networks

Paleontology is a fascinating field that helps us understand the history of life on Earth by studying ancient life forms and their evolution. However, one of the major challenges in paleontological research is taxonomic identification, a labor-intensive and time-consuming process that requires extensive knowledge of, and experience with, a particular taxonomic group. Moreover, identification results are often inconsistent across researchers and communities.

Deep learning techniques have emerged as a promising solution for supporting the taxonomic identification of fossils. In this context, a Chinese research team recently published an article exploring the potential of deep learning for improving taxonomic identification accuracy.

The main contribution of this paper is the creation and validation of a large, comprehensive Fossil Image Dataset (FID), assembled with web crawlers and manual curation. The dataset includes 415,339 images from 50 fossil clades, covering invertebrates, vertebrates, plants, microfossils, and trace fossils. A convolutional neural network (CNN) trained on the FID classified fossil images with high accuracy, demonstrating the dataset's potential for automated fossil identification and classification. The authors have also made the FID publicly available for future use and development.

This study experimentally investigates transfer learning from models pretrained on ImageNet to identify and classify the fossils in the Fossil Image Dataset (FID). The authors found that freezing half of the network layers as a fixed feature extractor and training the remaining layers gave the best performance. Data augmentation and dropout effectively reduced overfitting, while frequent learning rate decay and large training batches led to faster convergence and higher accuracy. The study also examined how class imbalance affects the algorithm and applied sampling methods for imbalanced learning.

Dataset quality proved important for accurate identification. Microfossils performed well because high-quality images of them are plentiful, whereas some poorly preserved fossil groups with few samples performed poorly. The authors also found that the large intraclass morphological diversity of certain clades lowered accuracy, because the deep convolutional neural network (DCNN) architecture struggled to extract discriminative features for them.
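The article does not say which sampling method or learning-rate schedule the authors used. As a minimal, framework-free sketch of two of the training ingredients described above, the block below shows plain random oversampling (one common option for imbalanced learning, assumed here rather than taken from the paper) and a step-decay learning-rate schedule. The function names `random_oversample` and `step_decay`, the decay factor of 0.5, the two-epoch interval, and the toy file names are hypothetical illustrations, not values from the study.

```python
import random
from collections import Counter, defaultdict


# Hypothetical helper: the paper's actual sampling method is not specified in the article.
def random_oversample(samples, labels, seed=0):
    """Plain random oversampling: duplicate randomly chosen examples of each
    minority class until every class matches the size of the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Keep every original example, then draw extra copies with replacement.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([y] * target)
    return out_x, out_y


# Hypothetical schedule: drop=0.5 and every=2 are illustrative, not the paper's values.
def step_decay(base_lr, epoch, drop=0.5, every=2):
    """Step schedule: multiply the learning rate by `drop` every `every` epochs."""
    return base_lr * drop ** (epoch // every)


# Made-up toy data: 6 "microfossil" images versus 2 "sponge" images.
xs = [f"img{i}.jpg" for i in range(8)]
ys = ["microfossil"] * 6 + ["sponge"] * 2
balanced_x, balanced_y = random_oversample(xs, ys)
print(Counter(balanced_y))
print([step_decay(0.01, e) for e in range(6)])
```

In a real pipeline, oversampling would be applied to the training split only, so that duplicated images never leak into the test set; pairing it with data augmentation also keeps the duplicated copies from being pixel-identical.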

Using transfer learning, the Inception-ResNet-v2 architecture achieved an average accuracy of 0.90 on the test set. Microfossils and vertebrate fossils had the highest identification accuracies, 0.95 and 0.90, respectively. In contrast, clades such as sponges, bryozoans, and trace fossils, which are morphologically diverse or have few samples in the dataset, had identification accuracies below 0.80.
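The article does not state whether "average accuracy" means overall accuracy across all test images or the unweighted mean of per-clade accuracies; on an imbalanced dataset such as the FID, these two numbers can differ noticeably. The hypothetical helper `accuracy_report` below computes both from lists of true and predicted labels, using made-up labels and predictions purely for illustration.

```python
from collections import defaultdict


# Hypothetical helper for illustration; not taken from the paper.
def accuracy_report(y_true, y_pred):
    """Return overall accuracy, per-class accuracy, and their unweighted mean."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two equal-length, non-empty label lists")
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    # Per-class accuracy: fraction of each true class that was predicted correctly.
    per_class = {c: correct[c] / total[c] for c in total}
    overall = sum(correct.values()) / len(y_true)
    macro = sum(per_class.values()) / len(per_class)
    return overall, per_class, macro


# Made-up predictions for two clades of very different size.
y_true = ["microfossil"] * 8 + ["sponge"] * 2
y_pred = ["microfossil"] * 8 + ["sponge", "bryozoan"]
overall, per_class, macro = accuracy_report(y_true, y_pred)
print(overall)    # 0.9
print(per_class)  # {'microfossil': 1.0, 'sponge': 0.5}
print(macro)      # 0.75
```

Here a single error on the small "sponge" class barely moves overall accuracy but halves that clade's accuracy, which is why per-clade figures like those reported above are more informative than one aggregate number.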

In conclusion, deep learning techniques, particularly transfer learning, have shown promise for making the taxonomic identification of fossils more accurate and efficient. Large, comprehensive, validated image collections such as the Fossil Image Dataset (FID) are crucial for achieving high identification accuracy, and the FID's public availability benefits further work in paleontology. However, model accuracy depends on the quality and diversity of the data, and some clades remain challenging because of high intraclass morphological diversity or poor preservation. Further research on deep learning methods and large-scale fossil image datasets is needed to overcome these challenges and make paleontological research more accurate and efficient.

Moreover, deep learning could transform paleontology beyond taxonomic identification. These techniques can extract more information from fossil data, for example by segmenting and reconstructing fossils, integrating fossil data with other types of data, and detecting patterns and anomalies in large-scale fossil datasets. Such applications could broaden our understanding of the history of life on Earth and pave the way for new discoveries.


Mahmoud is a PhD researcher in machine learning. He also holds a bachelor's degree in physical science and a master's degree in telecommunications and networking systems. His current research areas are computer vision, stock market prediction, and deep learning. He has authored several scientific articles on person re-identification and the study of the robustness and stability of deep