
ACOUSTIC SPECIES IDENTIFICATION

BIRDS


Birds can serve as effective indicators of changes in biodiversity because they are mobile and have diverse habitat requirements, so shifts in species composition and population size can reflect the progress or shortcomings of a restoration project. However, conventional observer-based biodiversity surveys are difficult and expensive to conduct over large areas. Audio recording devices, combined with modern machine learning techniques, let conservationists study the relationship between restoration interventions and biodiversity in greater depth by sampling larger spatial scales at higher temporal resolution. The ultimate objective is a pipeline that can accurately detect a wide range of species vocalizations at any location where audio equipment is installed.

Our team's MVP was to design and implement a solution that analyzes audio data and identifies bird species by their distinct calls. Using machine learning techniques, we aimed to build a model that accurately recognizes and predicts the bird species present in a recording, and we successfully accomplished this.

The proposed end product of this project is a robust and efficient multi-species classifier that seamlessly processes audio data and identifies the bird species present in a recording. Our solution uses mel spectrograms as the feature representation of the audio data, which are then used to train the machine learning model. Our work is available in this repository.

Data Set:

The training data contains 264 bird species found in Kenya, and the test set consists of 191 ten-minute soundscapes. Xeno-canto provided 16,900 audio recordings that can be used to train a classifier. This data is part of the BirdCLEF 2023 Kaggle competition.

PyHa: A tool designed to convert audio-based "weak" labels into "strong" moment-to-moment labels. We use the TweetyNet model variant to identify bird calls in an audio clip. The tool is available here: PyHa.

This package is being developed and maintained by the Engineers for Exploration Acoustic Species Identification Team in collaboration with the San Diego Zoo Wildlife Alliance.

Step 1: Audio data is sent to PyHa, a platform capable of processing audio signals, to obtain a set of 5-second audio segments. These segments are analyzed to determine the presence or absence of bird sounds, and data frames are generated.
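The exact call depends on the PyHa version, but a minimal sketch of this step, based on PyHa's documented entry point, might look like the following (the audio path and parameter values are illustrative assumptions, not our exact configuration):

```python
from PyHa.IsoAutio import generate_automated_labels

# Illustrative isolation parameters; "tweetynet" selects the TweetyNet
# model variant mentioned above. These values are assumptions.
isolation_parameters = {
    "model": "tweetynet",
    "tweety_output": True,
    "verbose": True,
}

# Hypothetical path to a folder of Xeno-canto training clips.
# Returns a data frame with one row per detected bird-call segment.
automated_df = generate_automated_labels("./train_audio/", isolation_parameters)
print(automated_df.head())
```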

Step 2: To visualize and analyze these data, we utilized the librosa library, which offers a range of audio processing functionality. In particular, we employed librosa to generate mel spectrograms for the data frames.
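For a single 5-second segment, the mel spectrogram generation looks roughly like this (the file name, sample rate, and FFT parameters here are illustrative assumptions):

```python
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical 5-second segment produced in Step 1.
y, sr = librosa.load("segment.ogg", sr=32000)

# Compute a 128-band mel spectrogram and convert power to decibels.
S = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048, hop_length=512, n_mels=128)
S_db = librosa.power_to_db(S, ref=np.max)

# Render the spectrogram as an image for the downstream classifier.
fig, ax = plt.subplots()
img = librosa.display.specshow(S_db, sr=sr, hop_length=512,
                               x_axis="time", y_axis="mel", ax=ax)
fig.colorbar(img, ax=ax, format="%+2.0f dB")
fig.savefig("segment_mel.png")
```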

Step 3: To initiate our analysis, we randomly selected 30 classes from the available data and generated mel spectrogram images for the corresponding data frames derived from the PyHa output. Selecting a diverse set of 30 classes gave us a representative sample of the dataset, encompassing a variety of bird species and vocalizations.
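A minimal sketch of that sampling step, assuming the PyHa output has been saved to a CSV with one row per segment (the file name, column name, and seed are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical CSV of 5-second segments from Step 1.
df = pd.read_csv("pyha_segments.csv")

# Randomly choose 30 of the available species; the seed is arbitrary
# but fixed for reproducibility.
rng = np.random.default_rng(seed=0)
chosen = rng.choice(df["species"].unique(), size=30, replace=False)

# Keep only segments belonging to the 30 chosen classes.
subset = df[df["species"].isin(chosen)]
```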

The models we chose were VGG16; EfficientNet B0, B3, and B7; and ResNet50.
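The write-up does not specify a framework, but as an illustration, adapting a pretrained VGG16 to the 30 spectrogram classes might look like this in PyTorch (a sketch under that assumption; the other models can be swapped in the same way):

```python
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 30  # the 30 randomly selected bird classes

# Load an ImageNet-pretrained VGG16 and replace its final layer
# with a 30-way classifier over the mel spectrogram images.
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
model.classifier[6] = nn.Linear(model.classifier[6].in_features, NUM_CLASSES)
```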

VGG16: The validation accuracy achieved on the 30 classes was 89.06%, and the macro average precision was 78.20%.

EfficientNet: The validation accuracy achieved on the 30 classes was 74.35% for B0, 64.36% for B3, and 38.01% for B7. The macro average precision was 78.20% for B0, 74.60% for B3, and 44.06% for B7.

ResNet50: The validation accuracy achieved was only 36%.
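For reference, the two metrics reported above can be computed with scikit-learn as follows (the label arrays here are placeholders, not our actual validation results):

```python
from sklearn.metrics import accuracy_score, precision_score

# Placeholder validation labels and model predictions (integer class IDs).
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 1, 2, 1, 1, 0]

# Validation accuracy: fraction of segments classified correctly.
acc = accuracy_score(y_true, y_pred)

# Macro average precision: per-class precision averaged with equal
# weight per class, regardless of how common each class is.
macro_prec = precision_score(y_true, y_pred, average="macro", zero_division=0)

print(f"accuracy={acc:.4f}, macro precision={macro_prec:.4f}")
```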

Data Augmentation Techniques
Team Members