In this section, we will guide you through running a method in FedPruning with multi-GPU, multi-process execution. As an example, we will use a simple Federated Dynamic Pruning method, FedTiny-clean.
Step 1: Check Available GPUs
First, determine how many GPUs are available on your machine and their memory status. You can use either gpustat or nvidia-smi to check. In this example, we have four NVIDIA RTX 5880 GPUs:
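As a sketch, either tool works for this check; nvidia-smi ships with the NVIDIA driver, while gpustat must be installed separately (e.g. with pip install gpustat). The exact output depends on your machine:

```shell
# Show per-GPU utilization and memory with the NVIDIA driver tool:
nvidia-smi

# Or use the more compact gpustat (install first: pip install gpustat):
gpustat
```

Pick GPUs with enough free memory for your model and batch size before assigning processes to them in the next steps.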
Step 2: Download the Dataset
To download the dataset for training, such as CIFAR-10, follow these steps:
cd data/cifar10
sh download_cifar10.sh
cd ../../
This will download and prepare the CIFAR-10 dataset in the data/cifar10 directory for training.
Step 3: Set the Processes
Navigate to the directory containing the method you want to run. For example, for FedAVG this would be:
cd experiments/distributed/fedavg
First, decide how many processes you plan to use. The number of processes is client_num_per_round + 1, where client_num_per_round is the number of clients selected per round and the additional process is for the server. You then need to set this number of processes in gpu_mapping.yaml.
For example, if client_num_per_round is set to 10, you should configure 11 processes in gpu_mapping.yaml like this:
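A hedged sketch of such an entry is shown below. The top-level key and host names here are assumptions (they follow the FedML-style gpu_mapping.yaml convention), so check the existing entries in your repo's gpu_mapping.yaml for the exact key format:

```yaml
# Hypothetical mapping for 11 processes (10 clients + 1 server) on 4 GPUs.
# Key names are illustrative; match them to your gpu_mapping.yaml.
mapping_config_11_processes:
  host1: [3, 3, 3, 2]   # 3 + 3 + 3 + 2 = 11 processes
```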
In this list:
The elements represent the number of processes assigned to each GPU.
The length of the list indicates the number of GPUs available.
In the above example, three processes will run on each of the first three GPUs, and two will run on the fourth GPU.
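The bookkeeping above can be sketched in a few lines of Python. This is only an illustration of the constraint (not code from the repo): the mapping list and the validate_mapping helper are hypothetical, but the rule they encode, that the list must sum to client_num_per_round + 1, is the one described above.

```python
def validate_mapping(mapping, client_num_per_round):
    """Each element of `mapping` is the number of processes pinned to one GPU.
    The total must equal client_num_per_round + 1 (the +1 is the server)."""
    total = sum(mapping)
    expected = client_num_per_round + 1
    assert total == expected, f"mapping provides {total} processes, need {expected}"
    return total

# Four GPUs: three processes on each of the first three, two on the fourth.
print(validate_mapping([3, 3, 3, 2], client_num_per_round=10))  # prints 11
```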
Step 4: Set the Arguments and Run the Code
To run the code, we will use a bash script. You can find details about the arguments and their meanings in run_fedtinyclean_distributed_pytorch.sh and main_fedtinyclean.py. You can set these arguments directly from the terminal when invoking the bash script:
[] denotes mandatory arguments and {} denotes optional arguments.

For example, to evaluate the ResNet18 model on the CIFAR-10 dataset, run the bash script with the corresponding arguments. If you are logged into WandB and its status is set to online, you can view the evaluation results on your WandB workspace website.

The arguments are as follows:
[gpus] specifies which GPUs to use.
[model] is the name of the model.
[dataset] is the name of the dataset.
[client_num_in_total] is the total number of clients.
[client_num_per_round] is the number of clients selected per round.
[comm_round] is the number of communication rounds.
[epochs] is the number of local epochs.
[target_density] is the target density for the sparse model.
[initial_lr] is the initial learning rate.
{--delta_T} is the interval, in rounds, between two adjustment rounds.
{--T_end} is the last round in which adjustment is performed.
{--partition_alpha} is the data partition alpha; a higher alpha yields a lower degree of data heterogeneity.
{--num_eval} is the number of data samples used for validation; -1 means using the whole test dataset.
{--frequency_of_the_test} is how often (in rounds) to test/validate performance during training, using num_eval data samples.
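Putting the arguments together, an invocation might look like the following. This is only a sketch: the positional order and the concrete values here are illustrative assumptions, so check run_fedtinyclean_distributed_pytorch.sh for the exact order your copy of the script expects:

```shell
# Hypothetical invocation: ResNet18 on CIFAR-10, 100 clients in total,
# 10 sampled per round (so 11 processes), 500 communication rounds,
# 5 local epochs, target density 0.1, initial learning rate 0.01.
sh run_fedtinyclean_distributed_pytorch.sh "0,1,2,3" resnet18 cifar10 \
    100 10 500 5 0.1 0.01 --partition_alpha 0.5
```

Make sure the number of processes implied by client_num_per_round matches the gpu_mapping.yaml entry you configured in Step 3.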