ducho.multimodal.audio package

ducho.multimodal.audio.AudioFeatureExtractor module

class ducho.multimodal.audio.AudioFeatureExtractor.AudioFeatureExtractor(gpu='-1')

This class implements the feature extractor for audio data.

extract_feature(sample_input)

Extracts features from the input data. Before calling this method, configure the framework, model, and layer with their respective set methods.

Parameters:

sample_input – The preprocessed data.

Returns:

A NumPy array of the extracted features. The Dataset class stores it in a .npy file through its own output method (see create_output_file).
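The required call order can be sketched with a small stand-in class. This is a hypothetical illustration, not Ducho's implementation: the class name `SketchExtractor`, the setter name `set_output_layer`, the dictionary passed to `set_model`, and the choice to raise `RuntimeError` when something is unset are all assumptions made for the example.

```python
class SketchExtractor:
    """Hypothetical stand-in that mirrors the documented call order only."""

    def __init__(self):
        # Nothing is configured until the set methods are called.
        self._settings = {"framework": None, "model": None, "layer": None}

    def set_framework(self, backend_libraries_list):
        self._settings["framework"] = list(backend_libraries_list)

    def set_model(self, model):
        self._settings["model"] = model

    def set_output_layer(self, layer):
        # Hypothetical setter name; this page does not document the layer setter.
        self._settings["layer"] = layer

    def extract_feature(self, sample_input):
        missing = [name for name, value in self._settings.items() if value is None]
        if missing:
            # Assumed behavior: fail loudly when configuration is incomplete.
            raise RuntimeError("configure before extracting: " + ", ".join(missing))
        # Placeholder "features"; a real extractor would run the model here.
        return [float(x) for x in sample_input]


extractor = SketchExtractor()
extractor.set_framework(["torch"])
extractor.set_model({"model_name": "some_model"})  # hypothetical YAML row
extractor.set_output_layer("some_layer")
features = extractor.extract_feature([1, 2, 3])
```

Calling `extract_feature` on a fresh `SketchExtractor` fails in this sketch, which is the point of the ordering requirement stated above.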

set_framework(backend_libraries_list)

Set the framework(s) to use (e.g. tensorflow, pytorch).

Parameters:

backend_libraries_list (list of str) – A list of strings naming the framework(s) to use. A single-item list is acceptable.

Returns:

None

set_model(model)

Configures the Audio Feature Extractor model from the user’s YAML specifications.

Parameters:

model – The row of the YAML file containing the user’s specifications.

Returns:

None

ducho.multimodal.audio.AudioDataset module

class ducho.multimodal.audio.AudioDataset.AudioDataset(input_directory_path, output_directory_path)

This class implements the audio dataset used for data loading.

create_output_file(input_batch, extracted_data, model_layer, fusion=None)

Create an output .npy file containing the extracted data, e.g. datasetFolder/framework/modelName/modelLayer/fileName.npy.

Parameters:
  • input_batch (tensor) – The batch just processed by the extractor; it also contains the filenames.

  • extracted_data (Any) – The data to be stored in the .npy file.

  • model_layer (str) – The name of the layer.

  • fusion (str, optional) – The type of fusion for multimodal models.

Returns:

None
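The output layout shown in the example path can be sketched with `pathlib`. This is a minimal illustration under stated assumptions, not Ducho's code: the helper name `build_output_path` and its parameters are hypothetical, and it assumes the .npy file takes the input file's name with the extension replaced.

```python
from pathlib import Path


def build_output_path(dataset_folder, framework, model_name, model_layer, input_file_name):
    """Hypothetical helper: compose datasetFolder/framework/modelName/modelLayer/fileName.npy."""
    # Assumption: the output keeps the input file's stem and swaps the extension.
    stem = Path(input_file_name).stem
    return Path(dataset_folder) / framework / model_name / model_layer / f"{stem}.npy"


path = build_output_path("datasetFolder", "torch", "modelName", "modelLayer", "song01.wav")
print(path.as_posix())  # datasetFolder/torch/modelName/modelLayer/song01.npy
```

Grouping by framework, model, and layer keeps features from different extraction runs from overwriting each other.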

set_framework(backend_libraries_list)

Set the framework(s) to use.

Parameters:

backend_libraries_list (list of str) – A list of strings representing the framework(s) to use. It’s acceptable to have only one item in the list.

Returns:

None

set_model(model)

Sets the model, given by name as a string, used for preprocessing. Note on models: torchaudio and transformers (Hugging Face) models are accepted. For a transformers model, the string must also include the repository, in the form ‘repo/model_name’.

Parameters:

model – The model name, as a string.

Returns:

None
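The ‘repo/model_name’ convention can be illustrated with a small parser. This is a hypothetical sketch: the helper `split_model_id` is not part of Ducho, and it only shows the shape of the string, not how Ducho selects a backend. The torchaudio-style name used below is a placeholder.

```python
def split_model_id(model):
    """Hypothetical helper: split 'repo/model_name' into (repo, model_name).

    A name without a slash is returned as (None, name).
    """
    repo, sep, name = model.partition("/")
    if not sep:
        return None, model
    if not repo or not name:
        raise ValueError(f"expected 'repo/model_name', got {model!r}")
    return repo, name


print(split_model_id("facebook/wav2vec2-base-960h"))  # ('facebook', 'wav2vec2-base-960h')
print(split_model_id("some_torchaudio_model"))        # (None, 'some_torchaudio_model')
```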