Frank-Wolfe Algorithm

How Much Pre-training Is Enough to Discover a Good Subnetwork?