ducho.multimodal.multiple.visual_textual package

ducho.multimodal.multiple.visual_textual.VisualTextualFeatureExtractor module

class ducho.multimodal.multiple.visual_textual.VisualTextualFeatureExtractor.VisualTextualFeatureExtractor(gpu='-1')

This class implements the Visual-Textual Feature Extractor, which extracts features from image and textual data.

extract_feature(sample_input)

Extracts features from the input image and textual data. The framework, model, and layer must be configured with their respective set methods before this function is called.

Parameters:

sample_input – The preprocessed data.

Returns:

Two NumPy arrays containing the extracted features, which will be stored in two .npy files by the appropriate method of the Dataset class.

set_framework(backend_libraries_list)

Sets the framework(s) to use (e.g. tensorflow, pytorch).

Parameters:

backend_libraries_list (List[str]) – A list of strings representing the framework(s) to utilize. It is acceptable to have only one item in the list.

Returns:

None

set_model(model)

Configures the Visual-Textual Feature Extractor model from the YAML specifications.

Parameters:

model – The row of the YAML file containing the user’s specifications.

Returns:

None

ducho.multimodal.multiple.visual_textual.VisualTextualDataset module

class ducho.multimodal.multiple.visual_textual.VisualTextualDataset.VisualTextualDataset(input_directory_path, output_directory_path, columns=None, model_name='openai/clip-vit-base-patch32', reshape=(224, 224))

This class represents the Visual-Textual Dataset used for the data loading process.

create_output_file(index, extracted_data, model_layer, fusion=None)

Generates the output files for the extracted features.

Parameters:
  • index – The index of the file to be processed.

  • extracted_data – A tuple containing the extracted features.

  • model_layer – The name of the output layer for the selected model.

  • fusion – A string indicating the type of fusion to perform. If None, the procedure generates two separate output files. Otherwise, it creates a single output file based on the specified fusion type.

Returns:

None
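The effect of the fusion argument (two files when it is None, one file otherwise) can be sketched as below. This is an illustration under stated assumptions rather than the library's implementation: plan_outputs is a hypothetical helper, the fusion names "concat", "sum", "mul", and "mean" and the file names are assumed, and plain lists stand in for the NumPy arrays the real procedure receives.

```python
def plan_outputs(index, extracted_data, model_layer, fusion=None):
    """Return the (file_name, vector) pairs that would be written for one sample."""
    visual, textual = extracted_data
    if fusion is None:
        # No fusion: one output per modality.
        return [
            (f"visual_{model_layer}_{index}.npy", list(visual)),
            (f"textual_{model_layer}_{index}.npy", list(textual)),
        ]
    if fusion == "concat":
        # Concatenation works for vectors of different lengths.
        fused = list(visual) + list(textual)
    elif fusion in ("sum", "mul", "mean"):
        if len(visual) != len(textual):
            raise ValueError("element-wise fusion needs equal-length vectors")
        ops = {
            "sum": lambda v, t: v + t,
            "mul": lambda v, t: v * t,
            "mean": lambda v, t: (v + t) / 2,
        }
        fused = [ops[fusion](v, t) for v, t in zip(visual, textual)]
    else:
        raise ValueError(f"unknown fusion type: {fusion!r}")
    # Fusion requested: a single output holding the fused vector.
    return [(f"fused_{fusion}_{model_layer}_{index}.npy", fused)]


features = ([1.0, 2.0], [3.0, 4.0])
separate = plan_outputs(0, features, model_layer="1")
fused = plan_outputs(0, features, model_layer="1", fusion="concat")
print(len(separate), len(fused))
```

Returning a plan instead of writing files keeps the sketch free of side effects; the real procedure returns None and writes the files itself.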

set_model_name(model_name)

Sets the model name used for the serialization directory.

Parameters:

model_name – The name of the multimodal model.

Returns:

None

set_reshape(reshape)

Sets the target size used to resize the input.

Parameters:

reshape – Tuple (int, int) representing the width and height for resizing the input.

Returns:

None
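Because reshape is given as (width, height), while image arrays are commonly laid out as (height, width, channels), the two orders are easy to confuse for non-square sizes. The sketch below makes the conversion explicit. array_shape_for is a hypothetical helper and not part of Ducho, and this section does not state which layout Ducho uses internally.

```python
def array_shape_for(reshape, channels=3):
    """Map a (width, height) tuple to a (height, width, channels) array shape."""
    width, height = reshape
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return (height, width, channels)


print(array_shape_for((320, 240)))  # (240, 320, 3)
```

For the default reshape of (224, 224) the order makes no visible difference, which is why the mix-up tends to surface only with non-square sizes.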