Under the hood of Hard Margin SVM

The author explains how Hard Margin SVM works under the hood (as partially seen in the image), but doesn't explain the idea behind minimizing $$\frac{1}{2}w^Tw$$.
Why do we have to minimize it? What is the idea?

  • Is it because we have to keep the vector $$w$$ as small as possible, so that the "street" will be as wide as possible?
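
For reference, here is a minimal sketch of the geometry behind this question. It assumes the standard hard-margin constraints $$y_i(w^Tx_i + b) \ge 1$$, which are not shown in the excerpt above. Under that assumption, the edges of the "street" are the hyperplanes $$w^Tx + b = \pm 1$$, and the distance between them is $$\frac{2}{\|w\|}$$. A smaller $$\|w\|$$ therefore gives a wider street, and minimizing $$\frac{1}{2}w^Tw = \frac{1}{2}\|w\|^2$$ has the same minimizer as minimizing $$\|w\|$$. The function names and the two example vectors are illustrative only.

```python
import math

def margin_width(w):
    """Distance between the hyperplanes w.x + b = +1 and w.x + b = -1."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return 2.0 / norm

def objective(w):
    """Hard-margin objective (1/2) * w^T w."""
    return 0.5 * sum(wi * wi for wi in w)

# Two hypothetical weight vectors pointing in the same direction.
w_large = [3.0, 4.0]  # norm 5
w_small = [0.6, 0.8]  # norm 1

for w in (w_large, w_small):
    print(f"w={w}  objective={objective(w):.3f}  street width={margin_width(w):.3f}")
```

The vector with the smaller objective value gives the wider street. In the real problem, $$w$$ cannot shrink freely, because the constraints must still hold for every training point.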


Erdos Erdos
  • Erdos Erdos

    Let me know if you have any further questions.

  • Are there any other reasons to choose the squared L2 norm besides its derivative? Could I choose other norms?

    • Erdos Erdos

      I added a note at the end of my solution.

  • OK, thanks, now it's clear.

The answer is accepted.
