Weight initialization plays a significant role in training deep feedforward neural networks. Xavier Glorot and Yoshua Bengio showed that initializing weights from a standard normal distribution (mean 0, variance 1) contributes to unstable gradients: as signals propagate through many layers, activations saturate and gradients can vanish or explode. New initialization schemes were introduced to address this. This video discusses these methods, how they differ, and the activation functions each works best with.
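As a concrete illustration, here is a minimal NumPy sketch of two of the schemes the video covers: Glorot (Xavier) uniform initialization, commonly paired with tanh or sigmoid activations, and He (Kaiming) normal initialization, commonly paired with ReLU. The function names and the `rng` parameter are my own choices for this sketch, not part of any particular library's API.

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng):
    # Glorot/Xavier uniform: sample from U(-limit, limit) with
    # limit = sqrt(6 / (fan_in + fan_out)), so the variance of
    # activations and gradients stays roughly constant across layers.
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def he_normal(fan_in, fan_out, rng):
    # He/Kaiming normal: std = sqrt(2 / fan_in), which compensates
    # for ReLU zeroing out roughly half of the activations.
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

rng = np.random.default_rng(0)
W1 = glorot_uniform(256, 128, rng)
W2 = he_normal(256, 128, rng)
print(W1.shape)                 # (256, 128)
print(abs(W1).max() <= np.sqrt(6.0 / (256 + 128)))  # True
```

Compare the resulting weight scales: both schemes shrink the spread of the initial weights as the layer gets wider, which is exactly what keeps the gradient magnitudes stable in deep networks.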