The HOG + SVM method is quite slow and not as accurate as a deep learning approach. Before jumping into semantic segmentation, I recommend re-implementing this project, or more generally solving this problem, with a CNN-based object detector, for instance Faster R-CNN[1] (a region-based convolutional neural network) or YOLO[2] (a single-stage detector).
Totally agree with your point on HOG + SVM; I think it has been made obsolete by convolutional neural networks.
I wrote a real-time human detection library [1] for a robotics project that used HOG + a simple neural net for classification. While it worked okay, I wasn't happy with the precision (around 90%) and decided to try out a simple convnet from Torch (doing the classification on depth images instead of HOG descriptors). The Torch version was slightly slower on a CPU, but both the precision and recall jumped up drastically.
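For anyone unsure how precision and recall differ for a detector, here is a minimal sketch of the two ratios. The function name and the counts are made up for illustration; they are not measurements from the library or the convnet mentioned above.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) computed from raw detection counts."""
    predicted = true_pos + false_pos   # everything the detector flagged as a person
    actual = true_pos + false_neg      # every person actually present
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# Hypothetical counts, chosen only to show the arithmetic.
p, r = precision_recall(true_pos=90, false_pos=10, false_neg=30)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.90 recall=0.75
```

With these made-up numbers the detector is right 90% of the time when it fires, yet still misses a quarter of the people in the scene, which is why both figures matter.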
[1]: https://arxiv.org/abs/1506.01497
[2]: https://arxiv.org/abs/1506.02640