Emotion Classifier from Audio Patterns (CNN)

  • Tech/ML Approach: Python (Google Colab, TensorFlow, Keras, scikit-learn, librosa, Matplotlib, seaborn)
  • GitHub URL: Project Link

Built a convolutional neural network (CNN) trained on raw audio files from the RAVDESS dataset to classify spoken emotions.
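A minimal Keras sketch of the kind of 1-D CNN this project describes, operating on a stacked per-clip feature vector. The layer sizes, input length, and hyperparameters here are illustrative assumptions, not the project's final architecture.

```python
from tensorflow.keras import layers, models

def build_cnn(input_len=162, n_classes=8):
    """Illustrative 1-D CNN over a flattened audio-feature vector.

    input_len and the layer widths are placeholder choices; RAVDESS has
    8 emotion classes, hence the default n_classes.
    """
    model = models.Sequential([
        layers.Input(shape=(input_len, 1)),
        layers.Conv1D(64, kernel_size=5, padding="same", activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.Conv1D(128, kernel_size=5, padding="same", activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.Dropout(0.3),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_cnn()
```

A convolutional stack like this slides filters along the time axis, which is what lets it pick up local spectral patterns the way a 2-D CNN picks up edges in images.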

Performed exploratory comparisons across candidate models (RNNs, an MLP, and a non-neural SVM baseline) and chose a CNN for its ability to find patterns in time-frequency representations such as spectrograms, much as CNNs do with images. Explored PCA and other dimensionality reduction, which proved unfeasible because it destroyed the temporal structure of the data. Achieved good results with data augmentation (noise addition and time shifting) and with feature extraction using chroma, MFCCs, and spectral contrast.

The final CNN achieved an accuracy of 0.71 and an F1-score of 0.70. Future work could improve on this with explicit modeling of gender distinctions, further architecture search, and more diverse training data for better generalization.
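Metrics of this kind are typically computed with scikit-learn's standard functions; the labels below are toy placeholders, not the project's actual predictions.

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical true vs. predicted emotion class IDs for illustration only.
y_true = [0, 1, 1, 2]
y_pred = [0, 1, 0, 2]

acc = accuracy_score(y_true, y_pred)
# Weighted F1 averages per-class F1 scores by class support, which is the
# usual choice when emotion classes are imbalanced.
f1 = f1_score(y_true, y_pred, average="weighted")
```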