Why is Pruning at Initialization Immune to Reinitializing and Shuffling?

Sahib Singh, Rosanne Liu

TL;DR

Pruning-at-initialization methods still work under randomization treatments, perhaps because the distribution of unpruned weights is largely preserved.

Abstract

Recent studies assessing the efficacy of neural network pruning methods uncovered a surprising finding: in ablation studies on existing pruning-at-initialization methods, namely SNIP, GraSP, SynFlow, and magnitude pruning, the performance of these methods remains unchanged, and sometimes even improves, when the mask positions are randomly shuffled within each layer (Layerwise Shuffling) or when new initial weight values are sampled while the pruning masks are kept the same (Reinit). We attempt to understand the reason behind this immunity to weight and mask modifications by studying layer-wise statistics before and after the randomization operations. We find that, under each of the pruning-at-initialization methods, the distribution of unpruned weights changes minimally under the randomization operations.
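The two randomization operations named in the abstract can be illustrated with a minimal NumPy sketch for a single layer. This is not the authors' code. The function names (layerwise_shuffle, reinit, kept_stats), the Gaussian initializer, and the random stand-in mask are assumptions made for illustration; a real experiment would take the mask from SNIP, GraSP, SynFlow, or magnitude pruning. The sketch shows only the operations and the kind of layer-wise statistics that can be compared, not the paper's results.

```python
import numpy as np


def layerwise_shuffle(mask, rng):
    """Layerwise Shuffling: permute mask positions within one layer.

    The number of kept weights (the layer's sparsity) is preserved.
    """
    return rng.permutation(mask.ravel()).reshape(mask.shape)


def reinit(weights, rng, std=0.1):
    """Reinit: sample new initial weights of the same shape.

    A zero-mean Gaussian with a fixed std is an assumed initializer here;
    the pruning mask is left untouched by this operation.
    """
    return rng.normal(0.0, std, size=weights.shape)


def kept_stats(weights, mask):
    """Mean and standard deviation of the unpruned (kept) weights."""
    kept = weights[mask]
    return float(kept.mean()), float(kept.std())


rng = np.random.default_rng(0)

# One hypothetical layer: initial weights plus a boolean pruning mask.
weights = rng.normal(0.0, 0.1, size=(64, 32))
# Stand-in mask keeping roughly 10% of weights (NOT a real pruning method).
mask = rng.random(weights.shape) < 0.1

shuffled_mask = layerwise_shuffle(mask, rng)
reinit_weights = reinit(weights, rng)

# Layer-wise statistics before and after each randomization operation.
for name, w, m in [
    ("original", weights, mask),
    ("layerwise shuffling", weights, shuffled_mask),
    ("reinit", reinit_weights, mask),
]:
    mean, std = kept_stats(w, m)
    print(f"{name}: kept={int(m.sum())} mean={mean:.4f} std={std:.4f}")
```

Layerwise Shuffling changes which positions are kept but not how many, while Reinit changes the values under a fixed mask; comparing kept_stats across the three cases mirrors the before/after comparison the abstract describes.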

Venue

In Sparsity in Neural Networks Workshop 2021.

BibTeX

@article{singh2021pruning,
title={Why is Pruning at Initialization Immune to Reinitializing and Shuffling?},
author={Sahib Singh and Rosanne Liu},
year={2021},
eprint={2107.01808},
archivePrefix={arXiv},
primaryClass={cs.LG}
}

Date

July 2021

Links

SNN Workshop 2021 arXiv