Generate data set

Usage

Before extracting the features for a data set and saving the resulting tensors, make sure that the ground truth metadata has already been created with the create_encoding.py script.

Because feature extraction is computationally heavy, this class uses the multiprocessing package, and running it on a machine with a GPU is recommended. Use the following command:

python3 generate_dataset.py

Documentation

class generate_dataset.GenerateDataset(duration, num_classes, val_split)

GenerateDataset generates training and testing tensors for a given dataset. The training inputs to the neural network are melspectrograms with 128 mel bands and a configurable segment duration. This class also performs a balanced train-validation split based on the number of samples in each class. All of the parameters are taken from settings.py.

Parameters:
  • duration (int) – number of time bins to create melspectrogram.
  • num_classes (int) – number of classes to be evaluated.
  • val_split (float) – training-validation split from 0 to 1.
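
As a rough illustration of how duration could relate to the segment length, the sketch below cuts a [mel bands x time frames] array into consecutive segments of duration frames. The helper name split_into_segments, the non-overlapping layout, and dropping the incomplete trailing segment are all assumptions made for this example, not the documented behavior of GenerateDataset.

```python
import numpy as np

def split_into_segments(spectrogram, duration):
    """Hypothetical helper: cut [mel bands x frames] into segments of `duration` frames.

    Assumption: segments do not overlap and leftover trailing frames are dropped.
    """
    num_mel = spectrogram.shape[0]
    num_segments = spectrogram.shape[1] // duration
    trimmed = spectrogram[:, :num_segments * duration]
    # Reshape to [mel bands, segments, duration], then put the segment axis first.
    return trimmed.reshape(num_mel, num_segments, duration).transpose(1, 0, 2)

# Toy spectrogram with 128 mel bands and 10 time frames.
spec = np.arange(128 * 10, dtype=np.float32).reshape(128, 10)
segments = split_into_segments(spec, duration=4)
print(segments.shape)  # (2, 128, 4)
```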

create_multi_spectrogram(filename, sr=22050, win_length=1024, hop_length=512, num_mel=128)

This method creates melspectrograms from an audio file using the librosa audio processing library. The default parameter values follow Han et al. It extracts three spectrograms with different window sizes (multiples of the original window size) and stacks them into a three-dimensional representation of the audio.

Parameters:
  • filename (str) – wav filename to process.
  • sr (int) – sampling rate in Hz (default: 22050).
  • win_length (int) – window length for STFT (default: 1024).
  • hop_length (int) – hop length for STFT (default: 512).
  • num_mel (int) – number of mel bands (default: 128).
Returns:

ln_S (np.array) - log-compressed melspectrogram of the complete audio file, with dimensionality [mel bands x time frames x 3].

create_spectrogram(filename, sr=22050, win_length=1024, hop_length=512, num_mel=128)

This method creates a melspectrogram from an audio file using the librosa audio processing library. The default parameter values follow Han et al.

Parameters:
  • filename (str) – wav filename to process.
  • sr (int) – sampling rate in Hz (default: 22050).
  • win_length (int) – window length for STFT (default: 1024).
  • hop_length (int) – hop length for STFT (default: 512).
  • num_mel (int) – number of mel bands (default: 128).
Returns:

ln_S (np.array) - log-compressed melspectrogram of the complete audio file, with dimensionality [mel bands x time frames].

load_metadata(path_metadata)

This method loads the dataset metadata previously generated by create_encoding.py.

Parameters: path_metadata (str) – path to the CSV file with filenames and labels.
Returns:
  • filenames (list) - list of filenames that exist in the metadata.
  • labels (list) - list of labels per filename that exist in the metadata.
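
The exact layout of the CSV written by create_encoding.py is not described here, so the following is only a sketch under stated assumptions: no header row, the filename in the first column, and the label in the second. The function name read_metadata and the sample rows are hypothetical.

```python
import csv
import io

def read_metadata(csv_file):
    """Hypothetical sketch: read (filename, label) rows from an open CSV file."""
    filenames, labels = [], []
    for row in csv.reader(csv_file):
        if not row:
            continue  # skip empty lines
        filenames.append(row[0])
        labels.append(row[1])
    return filenames, labels

# In-memory stand-in for the metadata file; real code would use open(path_metadata, newline="").
example_csv = io.StringIO("audio_001.wav,class_a\naudio_002.wav,class_b\n")
filenames, labels = read_metadata(example_csv)
print(filenames)  # ['audio_001.wav', 'audio_002.wav']
print(labels)     # ['class_a', 'class_b']
```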

pure_development_split(dataset, file_id)

This method creates a balanced 50-50 split of the test data set into pure testing data and development testing data, based on the total number of examples from each class. It is only available for testing data sets and can output either a single tensor (for the normal model, multi_input = False) or two tensors (for the model with two branches, multi_input = True).

Parameters:
  • dataset (list) – list of extracted spectrograms for test data set.
  • file_id (list) – file identifier from which the spectrogram was extracted.
Returns:

  • pure_test_set (np.array) - array(s) with spectrograms of pure test data set.
  • fid_pure_set (list) - list(s) of file ids for the pure test data set.
  • dev_test_set (np.array) - array(s) with spectrograms of development test data set.
  • fid_dev_set (list) - list(s) of file ids for the development test data set.
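
The idea of a class-balanced 50-50 split can be illustrated with index arrays. This is a sketch, not the method's implementation: the function name balanced_half_split, the explicit labels argument (the documented signature does not take one), and the per-class shuffling with a fixed seed are assumptions for the example.

```python
import numpy as np

def balanced_half_split(labels, seed=0):
    """Hypothetical sketch: split indices in half within each class."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    pure_idx, dev_idx = [], []
    for cls in np.unique(labels):
        # Shuffle the indices of this class, then cut them in half.
        idx = rng.permutation(np.flatnonzero(labels == cls))
        half = len(idx) // 2
        pure_idx.extend(idx[:half])
        dev_idx.extend(idx[half:])
    return np.array(sorted(pure_idx), dtype=int), np.array(sorted(dev_idx), dtype=int)

# Toy test set: 4 examples of class 0 and 6 examples of class 1.
test_labels = np.array([0] * 4 + [1] * 6)
pure_idx, dev_idx = balanced_half_split(test_labels)
print(np.bincount(test_labels[pure_idx]))  # [2 3]
print(np.bincount(test_labels[dev_idx]))   # [2 3]
```

The returned indices can then be used to select the matching spectrograms and file ids for each half.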

run(type_run)

This method extracts the features for the training, validation, development test, and pure test data sets and saves the resulting tensors used to train and evaluate the convolutional neural network. It uses the multiprocessing package, and the number of processes created depends on the value returned by multiprocessing.cpu_count(). It is intended for the normal model (multi_input = False).

Parameters: type_run (str) – generates ‘train’ or ‘test’ data sets.

run_multi(type_run)

This method extracts the features for the training, validation, development test, and pure test data sets and saves the resulting tensors used to train and evaluate the convolutional neural network. It uses the multiprocessing package, and the number of processes created depends on the value returned by multiprocessing.cpu_count(). It is intended for the two-branch model (multi_input = True).

Parameters: type_run (str) – generates ‘train’ or ‘test’ data sets.

validation_split(dataset, class_id, file_id, num_classes, validation)

This method shuffles the samples randomly and creates a balanced train-validation split based on the total number of examples from each class. It is only available for training data sets and can output either a single tensor (for the normal model, multi_input = False) or two tensors (for the model with two branches, multi_input = True).

Parameters:
  • dataset (list) – list of extracted melspectrograms for the dataset.
  • class_id (list) – list of labels from the extracted spectrograms.
  • file_id (list) – file identifier from which the spectrogram was extracted.
  • num_classes (int) – number of classes in the dataset.
  • validation (float) – train-validation split from 0 to 1.
Returns:

  • X_train (np.array) - array(s) with the training data set of spectrograms.
  • y_train (np.array) - array(s) with the labels of the training data set.
  • fid_train (list) - list(s) of file ids for the training data set.
  • X_val (np.array) - array(s) with validation data set of spectrograms.
  • Y_val (np.array) - array(s) with the labels of the validation data set.
  • fid_val (list) - list(s) of file ids for the validation data set.
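
A balanced train-validation split for the single-tensor case could look roughly like this. It is a sketch under assumptions rather than the method's code: the function name balanced_validation_split is hypothetical, class_id is assumed to hold integer labels (not one-hot vectors), the per-class validation count is rounded to the nearest integer, and a fixed seed is used so the example is repeatable.

```python
import numpy as np

def balanced_validation_split(dataset, class_id, file_id, validation, seed=0):
    """Hypothetical sketch: hold out the same fraction of every class for validation."""
    X = np.asarray(dataset)
    y = np.asarray(class_id)
    fid = np.asarray(file_id)
    rng = np.random.default_rng(seed)
    train_idx, val_idx = [], []
    for cls in np.unique(y):
        # Shuffle this class and move the first `validation` fraction to the validation set.
        idx = rng.permutation(np.flatnonzero(y == cls))
        n_val = int(round(len(idx) * validation))
        val_idx.extend(idx[:n_val])
        train_idx.extend(idx[n_val:])
    # Shuffle again so the classes are interleaved in the output.
    train_idx = rng.permutation(np.array(train_idx, dtype=int))
    val_idx = rng.permutation(np.array(val_idx, dtype=int))
    return (X[train_idx], y[train_idx], fid[train_idx].tolist(),
            X[val_idx], y[val_idx], fid[val_idx].tolist())

# Toy data: 10 segments of class 0 and 20 of class 1, each [128 mel bands x 4 frames].
X = np.zeros((30, 128, 4), dtype=np.float32)
y = np.array([0] * 10 + [1] * 20)
fid = ["file_%02d" % i for i in range(30)]
X_train, y_train, fid_train, X_val, y_val, fid_val = balanced_validation_split(
    X, y, fid, validation=0.2)
print(X_train.shape, X_val.shape)  # (24, 128, 4) (6, 128, 4)
print(np.bincount(y_val))          # [2 4]
```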