ducho.multimodal.visual package

ducho.multimodal.visual.VisualFeatureExtractor module

ducho.multimodal.visual.VisualDataset module

class ducho.multimodal.visual.VisualDataset.VisualDataset(input_directory_path, output_directory_path, model_name='VGG19', reshape=(224, 224))

This class represents the Visual Dataset used for the data loading process.

create_output_file(input_batch, extracted_data, model_layer, fusion=None)

Create an output NumPy file containing the extracted data (e.g., datasetFolder/framework/modelName/modelLayer/fileName.npy).

Parameters:

input_batch (tensor) – The batch just processed by the extractor. It contains the filenames too.
extracted_data (Any) – The data to be stored in the .npy file.
model_layer (str) – The name of the layer.
fusion (str, optional) – The type of fusion for multimodal models.

Returns:

None
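The output layout described above can be sketched as follows. This is a minimal illustration rather than the library's actual implementation: `build_output_path` and its argument names are hypothetical, the framework and layer names are placeholder values, and replacing the image extension with `.npy` is an assumption based on the example path.

```python
from pathlib import Path


def build_output_path(dataset_folder, framework, model_name, model_layer, file_name):
    """Hypothetical helper: compose a .npy path following the documented layout."""
    # Assumption: the original extension (e.g. ".jpg") is dropped in favour of ".npy".
    stem = Path(file_name).stem
    return Path(dataset_folder) / framework / model_name / model_layer / f"{stem}.npy"


# Placeholder values chosen only for illustration.
path = build_output_path("datasetFolder", "torch", "VGG19", "avgpool", "image_001.jpg")
print(path.as_posix())
```

With a path of this shape, the extracted features could then be written with `numpy.save(path, extracted_data)` after creating the parent directories.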

set_framework(backend_libraries_list)

Set the framework(s) to use.

Parameters:

backend_libraries_list (list of str) – A list of strings representing the framework(s) to use. It’s acceptable to have only one item in the list.

Returns:

None

set_image_processor(image_processor)

Set the image processor function used with the transformers library.

Parameters:

image_processor – The image processor function.

Returns:

None

set_mean_std(mean: Tensor, std: Tensor) → None

Set custom values of mean and std for z-score normalization.

Parameters:

mean – torch.Tensor containing the desired mean along the three channels.
std – torch.Tensor containing the desired standard deviation along the three channels.

Returns:

None
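As a rough illustration of what z-score normalization with per-channel statistics does, the sketch below applies (value - mean) / std to a single RGB pixel. It uses plain Python lists instead of torch.Tensor objects to stay dependency-free, and `z_score_normalize` is a hypothetical name, not part of the library. The mean and std values are the commonly used ImageNet statistics for inputs scaled to [0, 1], shown only as an example of custom values.

```python
def z_score_normalize(pixel, mean, std):
    """Normalize one RGB pixel channel by channel: (value - mean) / std."""
    return [(v - m) / s for v, m, s in zip(pixel, mean, std)]


# Commonly used ImageNet statistics (illustrative choice, not a library default).
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

# A pixel whose channels sit at the mean, one std above, and one std below.
normalized = z_score_normalize([0.485, 0.680, 0.181], mean, std)
print([round(v, 3) for v in normalized])
```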

set_preprocessing_type(preprocessing_type: str) → None

Set the desired pre-processing type. It must be either minmax or z-score.

Parameters:

preprocessing_type – The desired pre-processing type.

Returns:

None
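To make the difference between the two options concrete, here is a small sketch of a dispatcher over the strings minmax and z-score. The `preprocess` function is hypothetical, and the min-max formula shown (rescaling to [0, 1] using the observed minimum and maximum) is the generic textbook definition, which may differ from what the library does internally.

```python
def preprocess(values, preprocessing_type, mean=0.0, std=1.0):
    """Hypothetical dispatcher over the two documented pre-processing types."""
    if preprocessing_type == "minmax":
        lo, hi = min(values), max(values)
        if hi == lo:
            # Constant input: avoid division by zero and map everything to 0.
            return [0.0 for _ in values]
        return [(v - lo) / (hi - lo) for v in values]
    if preprocessing_type == "z-score":
        return [(v - mean) / std for v in values]
    raise ValueError(f"Unsupported pre-processing type: {preprocessing_type!r}")


pixels = [0, 51, 255]
scaled = preprocess(pixels, "minmax")
standardized = preprocess(pixels, "z-score", mean=102.0, std=51.0)
print(scaled, standardized)
```

Rejecting any other string early, as the sketch does, keeps a misspelled option from silently skipping normalization.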

set_reshape(reshape)

Set the reshape variable according to the desired value.

Parameters:

reshape – Tuple (int, int) representing the width and height for resizing the input.

Returns:

None