You need to structure it in the form of "if the user says X, you say Y."
For example: if the user asks "where do I find red pants," say "we don't sell red pants, but paint can be found here."
The OP gave a quick example. You can take raw docs, generate a Q/A data set from them, and train on that. Generating the Q/A data set could be as simple as: taking the raw PDF, asking the LLM "what questions can I ask about this doc," and then feeding that into the fine tuning. BUT, and this is important, you need a human to look at the generated Q/A and make sure it is correct.
This part is key. Don't forget: you can't beat having a human decide which facts and responses are the "right" ones you want your LLM to produce.
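A minimal sketch of that pipeline, under some assumptions: `ask_llm` here is a placeholder for whatever completion API you actually use (it's not a real library call), and the `human_reviewed` flag is one way to enforce that a person signs off on each pair before it reaches fine tuning:

```python
import json

def generate_qa_pairs(doc_text, ask_llm):
    """Generate candidate Q/A pairs from a raw document.

    `ask_llm` is a stand-in for your completion call: it takes a
    prompt string and returns either a list of questions or an
    answer string, depending on the prompt.
    """
    questions = ask_llm(
        f"List questions a user could ask about this document:\n{doc_text}"
    )
    pairs = []
    for q in questions:
        answer = ask_llm(
            f"Answer using only this document:\n{doc_text}\n\nQ: {q}"
        )
        # Every generated pair starts unreviewed; a human must flip
        # this flag after checking the answer is actually correct.
        pairs.append({"question": q, "answer": answer, "human_reviewed": False})
    return pairs

def export_reviewed(pairs, path):
    """Write only human-approved pairs to a JSONL training file."""
    reviewed = [p for p in pairs if p["human_reviewed"]]
    with open(path, "w") as f:
        for p in reviewed:
            f.write(json.dumps(
                {"prompt": p["question"], "completion": p["answer"]}
            ) + "\n")
    return len(reviewed)
```

The point of splitting generation and export is that nothing unreviewed can leak into the training set: `export_reviewed` returns 0 pairs until a human has marked them, which is exactly the "human in the loop" step above.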