This is one of the term 1 projects in the Udacity Self-Driving Car Nanodegree. I've done it. It's simple and doesn't need deep learning. But if you want to get fancy, you can use image segmentation: http://blog.qure.ai/notes/semantic-segmentation-deep-learnin...
The HOG + SVM method is quite slow and not as accurate as a deep learning approach. Before jumping into semantic segmentation, I recommend re-implementing this project, or more generally solving this problem, with a CNN-based object detector: for instance a region-based CNN (R-CNN) such as Faster R-CNN[1], or a single-shot detector such as YOLO[2].
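For anyone unfamiliar with the HOG half of HOG + SVM, here is a rough sketch of the core idea: a histogram of gradient orientations over one image cell, weighted by gradient magnitude. This is a simplification, not the project's code. The function name `hog_cell_histogram`, the 9-bin default, and the hard (non-interpolated) binning are my own assumptions, and a real pipeline would also normalize over blocks of cells and feed the concatenated histograms to an SVM.

```python
import math

def hog_cell_histogram(cell, n_bins=9):
    """Unsigned gradient-orientation histogram for one grayscale cell.

    `cell` is a list of rows of pixel intensities (hypothetical input format).
    Border pixels are skipped because central differences need both neighbours.
    """
    height, width = len(cell), len(cell[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins  # unsigned orientations cover 0..180 degrees
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]  # horizontal central difference
            gy = cell[y + 1][x] - cell[y - 1][x]  # vertical central difference
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # Hard assignment to a single bin; the modulo guards the 180.0 edge case.
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist

# An 8x8 cell with a vertical edge: dark left half, bright right half.
vertical_edge = [[0, 0, 0, 0, 10, 10, 10, 10] for _ in range(8)]
# The same edge rotated: dark top half, bright bottom half.
horizontal_edge = [[0] * 8 for _ in range(4)] + [[10] * 8 for _ in range(4)]

print(hog_cell_histogram(vertical_edge))
print(hog_cell_histogram(horizontal_edge))
```

With hard binning, the vertical edge puts all of its gradient energy in the 0 degree bin and the horizontal edge puts it in the bin containing 90 degrees, which is the kind of shape cue the classifier then learns from.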
Totally agree with your point on HOG + SVM; I think it has been made obsolete by convolutional neural networks.
I wrote a real-time human detection library [1] for a robotics project that used HOG + a simple neural net for classification. While it worked okay, I wasn't happy with the precision (around 90%) and decided to try out a simple convnet from Torch (doing the classification on depth images instead of HOG descriptors). The Torch version was slightly slower on a CPU, but both the precision and recall improved dramatically.
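In case the two metrics are unclear: precision is the fraction of reported detections that were actually people, and recall is the fraction of actual people that were detected. A minimal sketch of how they can be computed from binary labels follows; the function name `precision_recall` and the sample labels are made up for illustration and are not results from the library.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for two equal-length sequences of 0/1 labels."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if not p and a)
    # Fall back to 0.0 when a denominator is empty (no detections or no positives).
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall

# Illustrative labels only: 1 = "person", 0 = "no person".
predicted = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
actual    = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

precision, recall = precision_recall(predicted, actual)
print(precision, recall)  # 0.75 0.6
```

A detector can score well on one metric and poorly on the other, which is why it matters that both went up.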