lottery ticket hypothesis

How Much Pre-training Is Enough to Discover a Good Subnetwork?