ducho.multimodal.multiple.visual_textual package

ducho.multimodal.multiple.visual_textual.VisualTextualFeatureExtractor module

ducho.multimodal.multiple.visual_textual.VisualTextualDataset module

class ducho.multimodal.multiple.visual_textual.VisualTextualDataset.VisualTextualDataset(input_directory_path, output_directory_path, columns=None, model_name='openai/clip-vit-base-patch32', reshape=(224, 224))

This class represents the visual-textual dataset used during data loading.

create_output_file(input_batch, extracted_data, model_layer, fusion=None)

This procedure generates the output files for the extracted features.

Parameters:

input_batch – The batch just processed by the extractor; it also contains the filenames.
extracted_data – A tuple containing the extracted features.
model_layer – The name of the output layer for the selected model.
fusion – A string indicating the type of fusion to perform. If None, the procedure generates two separate output files. Otherwise, it creates a single output file based on the specified fusion type.

Returns:

None
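To make the two behaviours of fusion concrete, here is a minimal pure-Python sketch. It is not Ducho's implementation: the fusion names concat, sum, mul and mean, the helpers fuse and plan_outputs, and the visual/<stem> and textual/<stem> output keys are all hypothetical, and a dictionary stands in for the files that would actually be written.

```python
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def fuse(visual: Sequence[float], textual: Sequence[float], fusion: str) -> Vector:
    """Combine one visual and one textual feature vector (hypothetical fusion names)."""
    if fusion == "concat":
        # Concatenation keeps both vectors, so their lengths may differ.
        return list(visual) + list(textual)
    if len(visual) != len(textual):
        raise ValueError("element-wise fusion needs vectors of equal length")
    if fusion == "sum":
        return [v + t for v, t in zip(visual, textual)]
    if fusion == "mul":
        return [v * t for v, t in zip(visual, textual)]
    if fusion == "mean":
        return [(v + t) / 2 for v, t in zip(visual, textual)]
    raise ValueError(f"unknown fusion type: {fusion!r}")


def plan_outputs(
    filename: str,
    visual: Sequence[float],
    textual: Sequence[float],
    fusion: Optional[str] = None,
) -> Dict[str, Vector]:
    """Map hypothetical output names to the features each output would hold."""
    stem = filename.rsplit(".", 1)[0]
    if fusion is None:
        # No fusion: one separate output per modality.
        return {f"visual/{stem}": list(visual), f"textual/{stem}": list(textual)}
    # Fusion: a single output holding the combined vector.
    return {f"{fusion}/{stem}": fuse(visual, textual, fusion)}
```

For example, plan_outputs("item_1.jpg", [1.0, 2.0], [3.0, 4.0]) yields two entries, while passing fusion="concat" yields the single entry {"concat/item_1": [1.0, 2.0, 3.0, 4.0]}. Element-wise fusions such as sum assume both extractors produce vectors of the same size, which is why the sketch checks the lengths first.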

set_model_name(model_name)

Set the model name used for the serialization directory.

Parameters:

model_name – The name of the multimodal model.

Returns:

None
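As an illustration only, the sketch below shows one way a model name such as the default 'openai/clip-vit-base-patch32' could determine a serialization directory. SerializationDirSketch is a hypothetical stand-in rather than Ducho's class, and treating the org/name identifier as a nested sub-directory of output_directory_path is an assumption, not documented behaviour.

```python
from pathlib import PurePosixPath


class SerializationDirSketch:
    """Hypothetical stand-in showing how a model name could pick an output directory."""

    def __init__(self, output_directory_path: str,
                 model_name: str = "openai/clip-vit-base-patch32") -> None:
        self.output_directory_path = output_directory_path
        self.set_model_name(model_name)

    def set_model_name(self, model_name: str) -> None:
        self.model_name = model_name
        # Assumption: the Hugging Face "org/name" id becomes a nested sub-directory.
        self.serialization_dir = PurePosixPath(self.output_directory_path) / model_name
```

With SerializationDirSketch("features"), serialization_dir is features/openai/clip-vit-base-patch32; calling set_model_name again redirects later outputs without rebuilding the object.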

set_reshape(reshape)

Set the reshape variable to the desired value.

Parameters:

reshape – Tuple (int, int) representing the width and height for resizing the input.

Returns:

None
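The order of the tuple matters: reshape is documented as (width, height), while images held as nested lists or arrays are usually indexed row first, that is (height, width). The pure-Python sketch below uses a nearest-neighbour resize, which is not necessarily the interpolation Ducho applies, to show how the two orders line up; check_reshape and resize_nearest are hypothetical helpers.

```python
from typing import List, Sequence, Tuple

# A grayscale image as a list of rows, indexed image[y][x] (height first).
Image = List[List[int]]


def check_reshape(reshape: Tuple[int, int]) -> Tuple[int, int]:
    """Validate a (width, height) pair before it is used."""
    width, height = reshape
    if width <= 0 or height <= 0:
        raise ValueError(f"reshape must be positive, got {reshape!r}")
    return width, height


def resize_nearest(image: Sequence[Sequence[int]], reshape: Tuple[int, int]) -> Image:
    """Nearest-neighbour resize of `image` to the given (width, height)."""
    width, height = check_reshape(reshape)
    src_height, src_width = len(image), len(image[0])
    return [
        # Map each target pixel (x, y) back to the source pixel it falls on.
        [image[y * src_height // height][x * src_width // width] for x in range(width)]
        for y in range(height)
    ]
```

Resizing [[1, 2], [3, 4]] with reshape=(4, 2) gives two rows of four pixels, [[1, 1, 2, 2], [3, 3, 4, 4]]. For real images, Pillow's Image.resize likewise takes its size as (width, height).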