Frank-Wolfe algorithm

How Much Pre-training Is Enough to Discover a Good Subnetwork?