Speeding Up Your Model Training: Extras
Introduction
Although I have already described some methods for accelerating model training, over the past year I have come across a few more techniques that can improve training speed. In this article, I continue to add tips for accelerating model training.
There are primarily three stages during model training that consume significant time:
- Loading data from disk
- Processing data in datasets and dataloaders
- Actual model training
Data Loading
To reduce time consumption without upgrading hardware or changing the training strategy, it is essential to minimize the first two stages. Time-consuming operations in the dataset or dataloader should be avoided, because the model training stage has to wait for a batch of data to be loaded before it can process it. Hence, it is advisable to pre-process the data offline using multiprocessing, and then load it at training time without any heavy processing.
One method is to use a dataset format like HDF5
Even with offline pre-processing, loading large files still takes time. A PyTorch dataset loads files from disk, which inevitably costs something; this can be mitigated by using an HDF5 dataset. HDF5 offers a way to store your pre-processed features in its own binary format. Here is a simple example of how to store your data in an HDF5 file using Python:
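This is only a sketch: the arrays, the file name train_data.h5, and the dataset keys features and labels are placeholders for your own data.

```python
import h5py
import numpy as np

# Placeholder pre-processed features and labels.
features = np.random.rand(1000, 128).astype(np.float32)
labels = np.random.randint(0, 10, size=(1000,)).astype(np.int64)

with h5py.File("train_data.h5", "w") as f:
    # One HDF5 dataset per array; gzip compression is optional.
    f.create_dataset("features", data=features, compression="gzip")
    f.create_dataset("labels", data=labels)
```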
However, if you need to process hundreds of gigabytes of data, this method can be slow. In that case, multiprocessing can be used to split the data into several parts and convert each part into its own h5 file. The code examples below demonstrate how to split, process, and merge the data:
Splitting File
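A sketch of the splitting step, assuming each raw sample lives in its own file; numpy.array_split divides the full list of sample paths into one chunk per worker process (the paths below are hypothetical):

```python
import numpy as np

def split_into_chunks(sample_paths, num_chunks):
    """Split the full list of sample paths into `num_chunks` roughly equal parts."""
    return [list(chunk) for chunk in np.array_split(sample_paths, num_chunks)]

# Hypothetical sample files; each chunk will become one h5 file later.
sample_paths = [f"data/sample_{i:06d}.npy" for i in range(100000)]
chunks = split_into_chunks(sample_paths, num_chunks=8)
```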
Multiprocess
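A sketch of running the conversion in parallel with multiprocessing.Pool; cell_process is the per-chunk worker sketched in the next snippet, and each call writes one part_<i>.h5 file:

```python
from multiprocessing import Pool

def convert_all(chunks):
    # One (chunk, output path) job per worker process.
    jobs = [(chunk, f"part_{i}.h5") for i, chunk in enumerate(chunks)]
    with Pool(processes=len(jobs)) as pool:
        pool.starmap(cell_process, jobs)

if __name__ == "__main__":
    convert_all(chunks)  # `chunks` comes from the splitting step above
```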
Cell_Process
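A hypothetical version of the worker: preprocess and get_label stand in for whatever feature extraction and labelling you actually do, and the dataset keys match the earlier snippet:

```python
import h5py
import numpy as np

def cell_process(chunk, out_path):
    """Pre-process every sample in `chunk` and write the results to one h5 file."""
    feats, labels = [], []
    for path in chunk:
        sample = np.load(path)            # placeholder: load one raw sample
        feats.append(preprocess(sample))  # placeholder: your feature extraction
        labels.append(get_label(path))    # placeholder: however you obtain labels
    with h5py.File(out_path, "w") as f:
        f.create_dataset("features", data=np.stack(feats), compression="gzip")
        f.create_dataset("labels", data=np.array(labels, dtype=np.int64))
```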
Merge
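A straightforward (if memory-hungry) sketch: read every part file and concatenate its datasets into one big h5 file. Note that all parts end up in memory at once, which is what the next section works around:

```python
import h5py
import numpy as np

def merge_h5(part_paths, out_path):
    """Concatenate the datasets of several part files into one h5 file."""
    feats, labels = [], []
    for p in part_paths:
        with h5py.File(p, "r") as f:
            feats.append(f["features"][:])  # loads the whole part into memory
            labels.append(f["labels"][:])
    with h5py.File(out_path, "w") as f:
        f.create_dataset("features", data=np.concatenate(feats), compression="gzip")
        f.create_dataset("labels", data=np.concatenate(labels))

merge_h5([f"part_{i}.h5" for i in range(8)], "train_data.h5")
```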
Virtual Dataset
Unfortunately, merging multiple h5 files this way can cause problems when memory is scarce, because every part has to be read into memory. In such cases, you can use h5py's virtual dataset to merge all the files without copying their contents.
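A sketch using h5py's VirtualLayout and VirtualSource; the part file names and dtypes are assumptions carried over from the earlier snippets:

```python
import h5py

part_paths = [f"part_{i}.h5" for i in range(8)]  # files written by cell_process

def build_virtual(out_file, key, dtype):
    """Stack the `key` dataset of every part file into one virtual dataset."""
    shapes = []
    for p in part_paths:
        with h5py.File(p, "r") as f:
            shapes.append(f[key].shape)
    total = sum(s[0] for s in shapes)
    layout = h5py.VirtualLayout(shape=(total,) + shapes[0][1:], dtype=dtype)
    offset = 0
    for p, s in zip(part_paths, shapes):
        # Map the rows of this part file into the right slice of the layout.
        layout[offset:offset + s[0]] = h5py.VirtualSource(p, key, shape=s)
        offset += s[0]
    out_file.create_virtual_dataset(key, layout)

with h5py.File("merged_virtual.h5", "w") as f:
    build_virtual(f, "features", "float32")
    build_virtual(f, "labels", "int64")
```

A virtual dataset stores only references, so the part files must remain on disk at the same paths for merged_virtual.h5 to stay readable.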
Dataset
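A minimal PyTorch dataset over the merged file; the file is opened lazily in __getitem__ so that each DataLoader worker gets its own handle, which is the usual way to keep h5py happy with multiprocessing:

```python
import h5py
import torch
from torch.utils.data import Dataset

class H5Dataset(Dataset):
    """Reads pre-processed features and labels straight from an HDF5 file."""

    def __init__(self, h5_path):
        self.h5_path = h5_path
        self.file = None
        with h5py.File(h5_path, "r") as f:
            self.length = len(f["labels"])

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        # Open lazily so each DataLoader worker holds its own file handle.
        if self.file is None:
            self.file = h5py.File(self.h5_path, "r")
        x = torch.from_numpy(self.file["features"][idx])
        y = int(self.file["labels"][idx])
        return x, y
```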
A Proper Number of DataLoader Workers When Data Processing Is Necessary
If you must pre-process input data online, there are still ways to alleviate loading congestion. Setting the parameters num_workers and batch_size correctly allows data to be loaded with multiple processes. Note that a very high num_workers requires substantial memory. Generally, num_workers should be set to the number of GPUs times a small factor; increasing it allows more data to be prefetched at a time.
However, increasing it too far can slow things down. As num_workers grows, more batches are loaded at once, and if a time-consuming operation on one file blocks a worker, the batch it is preparing stalls the whole pipeline. Setting the number to two or three times the number of GPUs is therefore usually a better choice.
It’s important to note that this number should be determined experimentally.
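As a rough starting point, the sketch below uses two workers per GPU with the H5Dataset from above; batch_size and the other flags are illustrative values to tune:

```python
import torch
from torch.utils.data import DataLoader

num_gpus = max(torch.cuda.device_count(), 1)

loader = DataLoader(
    H5Dataset("merged_virtual.h5"),
    batch_size=64,
    num_workers=2 * num_gpus,   # start at 2-3x the number of GPUs
    pin_memory=True,            # faster host-to-GPU copies
    persistent_workers=True,    # keep workers alive between epochs
)
```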
Model Training
For the model training phase, just use PyTorch's official Distributed Data Parallel (DDP), as described in the previous article.
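For reference, the basic pattern looks roughly like this, assuming one process per GPU launched with torchrun; build_model is a placeholder for your own model constructor:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Launch with: torchrun --nproc_per_node=<num_gpus> train.py
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = build_model().cuda(local_rank)  # placeholder model constructor
model = DDP(model, device_ids=[local_rank])
```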