Herbarium Species Identification

Student Name: Burhan Rashid Hussein
Supervisor: Dr. Owais Ahmed Malik
Co-Supervisor: Dr. Ong Wee Hong


HERBARIUM SPECIES IDENTIFICATION USING COMPUTER VISION AND MACHINE LEARNING

Herbaria worldwide hold millions of plant specimens preserved by previous generations, and these collections are crucial for current biodiversity research. Massive efforts to digitize herbarium specimens are under way around the world. However, a huge number of herbarium specimens have yet to be identified at the species level, while the identifications of others need to be updated following recent taxonomic knowledge. The main goal of this project is to develop a plant species identification system for herbarium specimens using computer vision and machine learning techniques.

Reconstruction of damaged herbarium leaves using deep learning techniques for improving classification accuracy

The leaf is one of the most commonly used organs for species identification. The traditional identification process involves botanists manually analyzing the features of individual dried or fresh leaves. Recent advances in computer vision have helped automate the identification of plant families and species from digital images of leaves. However, most existing studies have used datasets of fresh, intact leaves. The large amount of data on preserved plants in the form of digitized herbarium specimens has not been used effectively for automated identification, because the specimens contain damaged leaves. In this study, deep learning techniques are proposed as a tool for reconstructing damaged herbarium leaves, which increases the number of usable individual leaf samples and thereby maximizes the usefulness of digitized specimens for automated plant identification. The reconstruction results of two different families of convolutional neural networks (CNNs) were compared on data from ten plant families: Anacardiaceae, Annonaceae, Dipterocarpaceae, Ebenaceae, Euphorbiaceae, Malvaceae, Phyllanthaceae, Polygalaceae, Rubiaceae and Sapotaceae. Identification performance improved by more than 20% when the reconstructed leaf images were used instead of the original data (i.e. images of specimens with damaged leaves). These results suggest that deep learning techniques can be used to reconstruct damaged leaves even on a challenging herbarium leaf dataset.

Figure: Framework for training and evaluating the damaged leaf reconstruction model.
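The page does not describe how the reconstruction models were trained. One common way to obtain training pairs for this kind of reconstruction (inpainting) model is to take intact leaf images and erase regions of them, so that the network learns to map the damaged version back to the original. The sketch below illustrates only that data-preparation idea, under stated assumptions: leaves are grayscale images stored as 2D lists, damage is approximated by circular holes filled with a background value of 0, and `simulate_damage`, `make_training_pairs` and `masked_mae` are hypothetical helpers, not the authors' code.

```python
import random


def simulate_damage(leaf, num_holes=3, max_radius=4, fill=0, seed=None):
    """Return (damaged copy of `leaf`, binary mask of erased pixels).

    Assumption: `leaf` is a 2D list of grayscale values and damage is
    approximated by circular holes filled with `fill` (the background value).
    """
    rng = random.Random(seed)
    height, width = len(leaf), len(leaf[0])
    damaged = [row[:] for row in leaf]  # copy so the intact leaf is untouched
    mask = [[0] * width for _ in range(height)]
    for _ in range(num_holes):
        cy, cx = rng.randrange(height), rng.randrange(width)
        r = rng.randint(1, max_radius)
        for y in range(max(0, cy - r), min(height, cy + r + 1)):
            for x in range(max(0, cx - r), min(width, cx + r + 1)):
                if (y - cy) ** 2 + (x - cx) ** 2 <= r * r:
                    damaged[y][x] = fill
                    mask[y][x] = 1
    return damaged, mask


def make_training_pairs(intact_leaves, seed=0):
    """Pair each intact leaf (target) with a simulated damaged copy (input)."""
    rng = random.Random(seed)
    pairs = []
    for leaf in intact_leaves:
        damaged, _ = simulate_damage(leaf, seed=rng.randrange(2**32))
        pairs.append((damaged, leaf))  # (model input, reconstruction target)
    return pairs


def masked_mae(pred, target, mask):
    """Mean absolute error computed over the erased pixels only."""
    diffs = [
        abs(p - t)
        for p_row, t_row, m_row in zip(pred, target, mask)
        for p, t, m in zip(p_row, t_row, m_row)
        if m
    ]
    return sum(diffs) / len(diffs) if diffs else 0.0
```

Real leaf damage is irregular (tears, insect holes, missing lobes), so circular holes are only a crude stand-in; the returned mask lets an evaluation such as `masked_mae` focus on the reconstructed regions rather than on pixels the model simply copies through.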

Semantic Segmentation of Herbarium Specimens

One challenging task in the automated identification of these species is the presence of visual noise, such as plant information labels, color codes and other scientific annotations, which are usually placed at different locations on the herbarium mounting sheet. This noise needs to be removed before species identification models are applied, as it can significantly affect their performance. In this work we therefore proposed deep learning semantic segmentation as a method for removing background noise from herbarium images. Two semantic segmentation models, DeepLab version 3+ (DeepLabv3+) and Full-Resolution Residual Networks (FRRN-A), were applied and evaluated. FRRN-A performed slightly better, with a mean Intersection over Union (IoU) of 99.2% on the test set compared to 98.1% for DeepLabv3+.

The pixel-wise accuracy for the two classes (herbarium specimen and background) was 99.5% and 99.7%, respectively, with the FRRN-A model, while DeepLabv3+ segmented the herbarium specimen and the background with pixel-wise accuracies of 98.4% and 99.6%, respectively. These results suggest that deep learning semantic segmentation can be applied successfully as a pre-processing step to remove visual noise from herbarium images before classification models are applied.
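Mean IoU and per-class pixel-wise accuracy can both be derived from per-class true positive (TP), false positive (FP) and false negative (FN) counts. For a class c, IoU = TP / (TP + FP + FN), and mean IoU averages this over the two classes. The sketch below assumes binary masks flattened into sequences, with 1 for herbarium specimen and 0 for background, and assumes that "pixel-wise accuracy" for a class means the fraction of that class's ground-truth pixels predicted correctly, TP / (TP + FN); the function names are hypothetical.

```python
def confusion_counts(true_mask, pred_mask, cls):
    """Count (TP, FP, FN) for class `cls` over flattened label sequences."""
    tp = fp = fn = 0
    for t, p in zip(true_mask, pred_mask):
        if p == cls and t == cls:
            tp += 1
        elif p == cls:  # predicted cls, but ground truth is another class
            fp += 1
        elif t == cls:  # ground truth is cls, but predicted another class
            fn += 1
    return tp, fp, fn


def iou(true_mask, pred_mask, cls):
    """Intersection over Union for one class: TP / (TP + FP + FN)."""
    tp, fp, fn = confusion_counts(true_mask, pred_mask, cls)
    denom = tp + fp + fn
    return tp / denom if denom else 1.0  # class absent in both masks


def mean_iou(true_mask, pred_mask, classes=(0, 1)):
    """Average IoU over the given classes (background 0, specimen 1)."""
    return sum(iou(true_mask, pred_mask, c) for c in classes) / len(classes)


def class_pixel_accuracy(true_mask, pred_mask, cls):
    """Fraction of ground-truth pixels of class `cls` labelled correctly."""
    tp, _, fn = confusion_counts(true_mask, pred_mask, cls)
    total = tp + fn
    return tp / total if total else 1.0


if __name__ == "__main__":
    truth = [1, 1, 1, 0, 0, 0, 0, 0]
    pred = [1, 1, 0, 0, 0, 0, 0, 1]
    print(round(mean_iou(truth, pred), 4))  # → 0.5833
    print(round(class_pixel_accuracy(truth, pred, 0), 2))  # → 0.8
```

A 2D mask can be flattened row by row before calling these functions; because IoU penalizes both false positives and false negatives, it is usually a stricter measure than per-class pixel accuracy.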

Leaf Trait Measurements for Herbarium Species Identification

Our first study investigated a set of manually extracted leaf trait measurements for identifying herbarium species from the families Annonaceae, Euphorbiaceae and Dipterocarpaceae. Because the data samples for some species were imbalanced, the study also investigated whether the Synthetic Minority Over-sampling Technique (SMOTE) improves classifier performance. For Annonaceae species, the best accuracy was 56%, obtained by linear discriminant analysis (LDA) after applying SMOTE. For Euphorbiaceae, the best accuracy was 79%, obtained by a support vector machine (SVM) without SMOTE. For inter-species classification over the combined Annonaceae and Euphorbiaceae species, the best accuracy of 63% was achieved by LDA without SMOTE.

LDA achieved an accuracy of 85% for Dipterocarpaceae species, while both random forest (RF) and SVM reached 91% accuracy for inter-family classification between the two balanced datasets of Annonaceae and Euphorbiaceae. These results show that extracted traits can be used to build accurate species identification models for the families Dipterocarpaceae and Euphorbiaceae; however, the features used did not yield good results for the Annonaceae family. Furthermore, applying SMOTE did not improve the results significantly.
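SMOTE balances a dataset by creating synthetic minority-class samples: it picks a minority sample, chooses one of its k nearest minority-class neighbours, and places a new point at a random position on the line segment between them. The sketch below is a minimal pure-Python re-implementation of that idea, not the code used in the study; feature vectors are assumed to be lists of numeric leaf traits, and the `smote` function and the example trait values are hypothetical.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate `n_synthetic` synthetic samples from minority-class vectors.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` within the minority class (Euclidean)
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


if __name__ == "__main__":
    # Hypothetical trait vectors: [leaf length (cm), leaf width (cm)]
    minority = [[4.0, 1.5], [4.5, 1.8], [5.0, 2.0], [6.0, 2.4]]
    for point in smote(minority, n_synthetic=3, k=2, seed=42):
        print([round(v, 2) for v in point])
```

In practice, the imbalanced-learn library provides `imblearn.over_sampling.SMOTE` with a `fit_resample(X, y)` method. Oversampling should be applied only to the training split (or the training folds in cross-validation), so that synthetic points derived from test samples do not inflate the reported accuracy.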

Scopus Publications

Reconstruction of damaged herbarium leaves using deep learning techniques for improving classification accuracy
Burhan Rashid Hussein, Owais Ahmed Malik, Wee-Hong Ong, and Johan Willem Frederik Slik
Ecological Informatics, ISSN: 1574-9541, Volume: 61, Article: 101243, Published: 2021

Semantic Segmentation of Herbarium Specimens Using Deep Learning Techniques
Burhan Rashid Hussein, Owais Ahmed Malik, Wee-Hong Ong, and Johan Willem Frederik Slik
Lecture Notes in Electrical Engineering, ISSN: 1876-1100, eISSN: 1876-1119, Volume: 603, Pages: 321-330, Published: 2020

Automated Classification of Tropical Plant Species Data Based on Machine Learning Techniques and Leaf Trait Measurements
Burhan Rashid Hussein, Owais Ahmed Malik, Wee-Hong Ong, and Johan Willem Frederik Slik
Lecture Notes in Electrical Engineering, ISSN: 1876-1100, eISSN: 1876-1119, Volume: 603, Pages: 85-94, Published: 2020