Use Data

This guide explains, step by step, how to use a training dataset in PyTorch for AI/ML model training, with a focus on HDF5 (.h5) files. It covers how to preprocess and load such datasets and how to train a model on them efficiently. HDF5 files are widely used for storing large-scale datasets because they offer structured storage, fast access to data, and efficient handling of large files. PyTorch provides tools to load and process these datasets for model training.

Before training an AI/ML model, it is essential to understand the structure and format of the dataset. 

  • A training dataset is a collection of data used to teach an AI model to recognize patterns. 
  • It contains input features and labels. 
  • In this dataset, images are the inputs, and the labels, also known as masks, indicate regions of interest.

The dataset includes four main types of files: 

Images:

  • The raw images used for training.
  • The images are typically in standard formats such as JPEG.
  • Each image represents solar data captured at a specific wavelength.

Masks:

  • The segmentation masks for each image.
  • These masks identify specific regions, such as coronal holes, in the images.
  • Each mask is aligned with its corresponding image, meaning they share the same dimensions.

Visuals:

  • Processed versions of the original images.
  • These may be cropped, enhanced, or otherwise preprocessed to help visualize labeled regions.

JSON metadata file:

  • File paths for images, masks, and visuals.
  • Label information, such as object categories.
  • Scientific details, including latitude, longitude, exposure time, and wavelength of the captured images.

The JSON file helps automate the data loading process and ensures that images and masks are matched correctly.
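As a rough illustration of how such a metadata file can drive loading, the sketch below reads an index and returns image/mask path pairs. The guide does not specify the JSON layout, so the structure used here (a list of records with the hypothetical keys `image` and `mask`, and the made-up file names) is an assumption for illustration only.

```python
import json
import tempfile
from pathlib import Path


def load_index(json_path):
    """Return a list of (image_path, mask_path) pairs read from the index file."""
    with open(json_path, "r", encoding="utf-8") as f:
        records = json.load(f)
    pairs = []
    for rec in records:
        # "image" and "mask" are hypothetical keys; adapt them to the real file.
        pairs.append((Path(rec["image"]), Path(rec["mask"])))
    return pairs


# Made-up example index, written to a temporary file so the sketch is runnable.
example_records = [
    {"image": "images/0001.jpg", "mask": "masks/0001.png"},
    {"image": "images/0002.jpg", "mask": "masks/0002.png"},
]
with tempfile.TemporaryDirectory() as tmp:
    index_file = Path(tmp) / "metadata.json"
    index_file.write_text(json.dumps(example_records), encoding="utf-8")
    pairs = load_index(index_file)

print(pairs[0])
```

Keeping the pairing in one place means the rest of the pipeline never has to guess which mask belongs to which image.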

In PyTorch, datasets are handled using the Dataset and DataLoader classes, which enable efficient data loading and batch processing. 

  • The JSON file acts as an index for the dataset, linking images to their corresponding masks. 
  • In PyTorch, the file paths are extracted and stored so that they can be accessed during training. 
  • PyTorch models require input data to be in the form of tensors. 
  • The images and masks are converted into PyTorch tensors before being used in training. 
  • Normalization is applied to ensure the pixel values are within a suitable range. 
  • Images and masks may need to be resized to a fixed size (e.g., 1024×1024 pixels). 
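  • Data augmentation techniques (such as flipping, rotation, or brightness adjustments) can improve how well the model generalizes by increasing variability in the training data. Geometric changes such as flips and rotations must be applied identically to an image and its mask so that the two stay aligned.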
  • PyTorch’s torchvision.transforms provides built-in functions for resizing, normalizing, and augmenting datasets. 
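The preprocessing steps above can be sketched roughly as follows. To stay dependency-light, this example uses plain `torch` operations rather than `torchvision.transforms`. The function name `preprocess`, the 256×256 target size, and the mean/std of 0.5 are assumptions chosen for illustration, not values from this guide.

```python
import torch
import torch.nn.functional as F


def preprocess(image, mask, size=(256, 256), augment=False):
    """image: uint8 tensor [C, H, W]; mask: uint8 tensor [H, W] with values 0 or 1."""
    # Tensor conversion: float images scaled from [0, 255] to [0, 1].
    image = image.float() / 255.0
    mask = mask.float().unsqueeze(0)  # [1, H, W]

    # Resizing: bilinear for the image, nearest for the mask so labels stay 0 or 1.
    image = F.interpolate(image.unsqueeze(0), size=size, mode="bilinear",
                          align_corners=False).squeeze(0)
    mask = F.interpolate(mask.unsqueeze(0), size=size, mode="nearest").squeeze(0)

    # Augmentation: a random horizontal flip applied to image and mask together.
    if augment and torch.rand(1).item() < 0.5:
        image = torch.flip(image, dims=[-1])
        mask = torch.flip(mask, dims=[-1])

    # Normalization of the image only (assumed mean and std of 0.5).
    image = (image - 0.5) / 0.5
    return image, mask


# Random stand-in data in place of a real image and mask.
img = torch.randint(0, 256, (3, 300, 400), dtype=torch.uint8)
msk = torch.randint(0, 2, (300, 400), dtype=torch.uint8)
x, y = preprocess(img, msk, augment=True)
print(x.shape, y.shape)
```

Nearest-neighbour resizing is used for the mask because interpolating between labels would produce fractional values that are no longer valid class labels.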

PyTorch allows users to define a custom dataset class that reads images and masks from their respective folders. 
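A minimal sketch of such a dataset class is shown here. It assumes that an image and its mask share the same file stem (for example `0001.jpg` and `0001.png`), which this guide does not confirm. The class name `PairedSegmentationDataset`, the `load_fn` parameter, and the `fake_load` stand-in decoder are all hypothetical; in real use `load_fn` would decode the file into a tensor.

```python
import tempfile
from pathlib import Path

import torch
from torch.utils.data import Dataset


class PairedSegmentationDataset(Dataset):
    """Pairs each image with the mask that has the same file stem."""

    def __init__(self, image_dir, mask_dir, load_fn, transform=None):
        self.image_paths = sorted(p for p in Path(image_dir).iterdir() if p.is_file())
        masks_by_stem = {p.stem: p for p in Path(mask_dir).iterdir() if p.is_file()}
        missing = [p.name for p in self.image_paths if p.stem not in masks_by_stem]
        if missing:
            raise FileNotFoundError(f"No mask found for: {missing}")
        self.mask_paths = [masks_by_stem[p.stem] for p in self.image_paths]
        self.load_fn = load_fn
        self.transform = transform

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image = self.load_fn(self.image_paths[idx])
        mask = self.load_fn(self.mask_paths[idx])
        if self.transform is not None:
            image, mask = self.transform(image, mask)
        return image, mask


def fake_load(path):
    # Stand-in decoder so the sketch runs without real image files.
    return torch.zeros(1, 8, 8)


with tempfile.TemporaryDirectory() as tmp:
    img_dir, msk_dir = Path(tmp) / "images", Path(tmp) / "masks"
    img_dir.mkdir()
    msk_dir.mkdir()
    for stem in ("0001", "0002"):
        (img_dir / f"{stem}.jpg").touch()
        (msk_dir / f"{stem}.png").touch()
    dataset = PairedSegmentationDataset(img_dir, msk_dir, load_fn=fake_load)
    n_items = len(dataset)
    first_image, first_mask = dataset[0]
    paired_stems = [(i.stem, m.stem)
                    for i, m in zip(dataset.image_paths, dataset.mask_paths)]

print(n_items, paired_stems)
```

Failing early when a mask is missing is a deliberate choice: a silent mismatch between images and masks is much harder to detect once training has started.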

  • This dataset class ensures that each image is correctly paired with its corresponding mask. 
  • The DataLoader class in PyTorch:
      • Loads the dataset in batches instead of all at once.
      • Can shuffle the samples each epoch so that batches do not follow a fixed order, which helps the model generalize.
      • Supports parallel data loading to speed up training.
  • The dataset is split into:
      • A training set, which is used to teach the model.
      • A validation set, which is used to check performance on unseen data.
  • The model is trained using mini-batches of images and masks. 
  • The images are passed through the model, and predictions are generated. 
  • The predictions are compared with the actual masks using a loss function. 
  • The optimizer adjusts the model parameters based on the loss to improve accuracy. 
  • In PyTorch, different loss functions are used depending on the task:
      • Binary Cross-Entropy Loss: used for binary segmentation tasks.
  • The loss function calculates the error between the predicted mask and the ground truth mask.
  • The optimizer updates the model’s weights to minimize the loss.
  • Adam and SGD are commonly used optimizers. Adam adapts the step size for each parameter automatically, while SGD uses a fixed learning rate unless a scheduler changes it.
  • Learning rate schedulers can also be applied to adjust the learning rate dynamically during training.
  • The dataset is fed into the model in batches. 
  • The model predicts segmentation masks for each batch. 
  • The loss function computes the error, and the optimizer updates the model’s parameters. 
  • This process is repeated for multiple epochs, typically until the validation loss stops improving.
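The splitting, batching, loss, optimizer, and training-loop steps above can be sketched together as follows. The synthetic data, the tiny placeholder network, the split sizes, and all hyperparameters are assumptions made so the example runs quickly on a CPU. A real run would use the paired image/mask dataset and a proper segmentation model.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset, random_split

torch.manual_seed(0)

# Synthetic stand-in data: 32 single-channel 32x32 images with binary masks.
images = torch.rand(32, 1, 32, 32)
masks = (images > 0.5).float()
dataset = TensorDataset(images, masks)

# Split into training and validation subsets (roughly 80/20, an arbitrary choice).
train_set, val_set = random_split(dataset, [26, 6])
# num_workers > 0 would enable parallel loading; 0 keeps the sketch simple.
train_loader = DataLoader(train_set, batch_size=8, shuffle=True, num_workers=0)
val_loader = DataLoader(val_set, batch_size=8, shuffle=False)

# Tiny placeholder network that outputs one logit per pixel.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 1, kernel_size=1),
)

criterion = nn.BCEWithLogitsLoss()  # binary cross-entropy computed on raw logits
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)


def run_epoch(loader, train):
    """Run one pass over the loader and return the mean loss per sample."""
    model.train(train)
    total, count = 0.0, 0
    with torch.set_grad_enabled(train):
        for x, y in loader:
            loss = criterion(model(x), y)
            if train:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            total += loss.item() * x.size(0)
            count += x.size(0)
    return total / count


history = []
for epoch in range(10):
    train_loss = run_epoch(train_loader, train=True)
    val_loss = run_epoch(val_loader, train=False)
    scheduler.step()  # adjust the learning rate once per epoch
    history.append((train_loss, val_loss))
    print(f"epoch {epoch + 1}: train={train_loss:.4f} val={val_loss:.4f}")
```

Tracking the validation loss next to the training loss makes it easier to notice when the two start to diverge, which is an early sign of overfitting.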

After training, the model’s performance is tested on new, unseen images. 

  • The trained model is used to generate predictions on test images. 
  • The predicted segmentation masks are compared with the actual masks. 
  • If the model performs well on training data but poorly on test data, it is overfitting. Overfitting can be reduced by:
      • Adding more training data.
      • Applying data augmentation.
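This guide does not name a specific metric for comparing predicted and actual masks. One common choice is intersection over union (IoU), sketched below as an assumption rather than as the method used here. The function name `iou_score` and the 0.5 threshold are illustrative.

```python
import torch


def iou_score(pred_prob, target, threshold=0.5, eps=1e-7):
    """Intersection over union between a thresholded prediction and a binary mask."""
    pred = (pred_prob >= threshold).float()
    target = (target >= 0.5).float()
    intersection = (pred * target).sum()
    union = pred.sum() + target.sum() - intersection
    # eps avoids division by zero when both masks are empty.
    return ((intersection + eps) / (union + eps)).item()


# Toy example: the ground truth covers 8 pixels, the prediction covers 4 of them.
target = torch.zeros(4, 4)
target[:2, :] = 1.0
pred = torch.zeros(4, 4)
pred[:2, :2] = 0.9
score = iou_score(pred, target)
print(f"IoU = {score:.2f}")
```

Computing the same score on the training set and on the test set gives a simple check for overfitting: a large gap between the two suggests the model has memorized the training data.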

After training, the model is saved so it can be reused.
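A minimal sketch of saving and restoring a model, assuming the common pattern of storing only the weights (the `state_dict`) rather than the whole model object. The placeholder network, the `build_model` helper, and the file name `model_weights.pt` are hypothetical.

```python
import tempfile
from pathlib import Path

import torch
from torch import nn


def build_model():
    # Placeholder architecture; the same definition is needed to restore weights.
    return nn.Sequential(
        nn.Conv2d(1, 4, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.Conv2d(4, 1, kernel_size=1),
    )


model = build_model()

with tempfile.TemporaryDirectory() as tmp:
    checkpoint_path = str(Path(tmp) / "model_weights.pt")  # hypothetical file name

    # Save only the learned parameters.
    torch.save(model.state_dict(), checkpoint_path)

    # Later: rebuild the architecture, load the weights, switch to evaluation mode.
    restored = build_model()
    restored.load_state_dict(torch.load(checkpoint_path, map_location="cpu"))
    restored.eval()

print(restored.training)
```

Saving the `state_dict` keeps the checkpoint independent of how the model class is pickled, at the cost of needing the model definition available when loading.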

  • The trained Mask R-CNN model is loaded using torch.load() and then switched to evaluation mode by calling model.eval(). Loading alone does not change the mode.
  • The model is transferred to the appropriate device (CPU/GPU) for inference. 
  • Images need to be preprocessed using the same transformation steps as during training (resizing, normalization, tensor conversion). 
  • The image is passed through the model inside a torch.no_grad() block to disable gradient calculations, improving efficiency. 
  • The model outputs bounding boxes, labels, and segmentation masks for detected objects. 
  • The predicted masks contain pixel-wise probability values, where higher values indicate stronger predictions. 
  • The predicted segmentation masks can be overlaid on the original images to evaluate how well the model identifies regions of interest.
  • The detected region in the middle and right images suggests that the model has identified a coronal hole.  
  • The segmentation mask highlights the area of interest in bright colors, indicating the detected region.
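The inference and overlay steps above can be sketched as follows. So that the example runs without trained weights, it uses a hypothetical `StandInDetector` that mimics the output format of torchvision's Mask R-CNN in evaluation mode: a list with one dict per image containing `boxes`, `labels`, `scores`, and `masks` of shape [N, 1, H, W] with values between 0 and 1. The helper names, the thresholds, and the red overlay colour are assumptions.

```python
import torch
from torch import nn


class StandInDetector(nn.Module):
    """Hypothetical stand-in that returns Mask R-CNN style outputs."""

    def forward(self, images):
        outputs = []
        for img in images:
            _, h, w = img.shape
            masks = torch.zeros(1, 1, h, w, device=img.device)
            masks[..., h // 4 : h // 2, w // 4 : w // 2] = 0.9
            outputs.append({
                "boxes": torch.tensor([[w / 4, h / 4, w / 2, h / 2]], device=img.device),
                "labels": torch.tensor([1], device=img.device),
                "scores": torch.tensor([0.95], device=img.device),
                "masks": masks,
            })
        return outputs


def predict_masks(model, image, device, score_threshold=0.5, mask_threshold=0.5):
    """Return boolean masks [N, H, W] for detections above the score threshold."""
    model = model.to(device)
    model.eval()
    with torch.no_grad():  # no gradients are needed for inference
        output = model([image.to(device)])[0]
    keep = output["scores"] >= score_threshold
    probs = output["masks"][keep].squeeze(1)
    return (probs >= mask_threshold).cpu()


def overlay(image, binary_masks, color=(1.0, 0.0, 0.0), alpha=0.5):
    """Blend a colour into a float image [3, H, W] wherever any mask is set."""
    region = binary_masks.any(dim=0).unsqueeze(0)  # [1, H, W]
    tint = torch.tensor(color).view(3, 1, 1).expand_as(image)
    blended = (1 - alpha) * image + alpha * tint
    return torch.where(region, blended, image)


device = "cuda" if torch.cuda.is_available() else "cpu"
image = torch.rand(3, 64, 64)  # random stand-in for a preprocessed image
pred_masks = predict_masks(StandInDetector(), image, device)
overlaid = overlay(image, pred_masks)
print(pred_masks.shape, overlaid.shape)
```

With a real trained model, only the `StandInDetector()` argument would change. The thresholding and overlay steps work on the output dictionary in the same way.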