There are a lot of 'visual' people who learn very well with pictures and not so much with text. Me too. A screenshot helps me get into the setting. Later on I need some detailed documentation.
That's why tutorial videos are so popular for getting started with something new.
Honestly, I think you're getting closer to the core of the problem than most. Appealing to multiple learning styles is a fantastic idea. I see tons of people online talking about learning by watching YouTube videos... this doesn't work for me, but it's obviously helpful for a lot of them. Some work well with text. I personally have a great measure of success when I'm presented with playgrounds for learning, like the GraphQL docs, which let you live-edit the queries in the documentation to play with them and see how the concepts work. None of these is any better than the others, and all should be considered for the highest quality documentation.