As others have mentioned, converting html to markdown works pretty well.
That said, for sites with nested lists or tables, we've gotten better results by reducing those elements to a simplified html instead of converting them to markdown. The explicit open and close tags make it clear where each structure starts and stops.
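To illustrate what "simplified html" could look like, here is a minimal sketch using only Python's standard library. It is not our actual pipeline: the `KEEP` whitelist and the `simplify_html` name are assumptions made for the example. It keeps list and table tags, drops every attribute, and unwraps all other tags down to their text. It is meant to be run on list/table fragments, so it does nothing special for things like `<script>` content.

```python
import re
from html import escape
from html.parser import HTMLParser

# Assumed whitelist of structural tags to keep; all other tags are unwrapped.
KEEP = ("ul", "ol", "li", "table", "thead", "tbody", "tr", "th", "td")
_AROUND_TAG = re.compile(r"\s*(</?(?:%s)>)\s*" % "|".join(KEEP))


class StructureSimplifier(HTMLParser):
    """Keeps bare list/table tags and text; drops attributes and other tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in KEEP:
            self.parts.append(f"<{tag}>")  # attributes dropped on purpose

    def handle_endtag(self, tag):
        if tag in KEEP:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Collapse whitespace runs and re-escape so text stays valid html.
        self.parts.append(escape(re.sub(r"\s+", " ", data), quote=False))


def simplify_html(fragment: str) -> str:
    parser = StructureSimplifier()
    parser.feed(fragment)
    parser.close()
    # Remove whitespace that sits directly next to a kept structural tag.
    return _AROUND_TAG.sub(r"\1", "".join(parser.parts)).strip()


if __name__ == "__main__":
    raw = '<ul class="nav">\n  <li><a href="/x">One</a>\n    <ul><li>Nested <b>item</b></li></ul>\n  </li>\n</ul>'
    print(simplify_html(raw))
```

The nesting survives as plain `<ul>`/`<li>` pairs, which is the part markdown indentation tends to lose.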
The simplified html has also been helpful for chunking docs: the tags make it easy to ensure that lists and tables aren't split across chunks.
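As a rough sketch of the chunking side, the snippet below treats each top-level `<ul>`, `<ol>`, or `<table>` as one indivisible block and packs blocks greedily into chunks. The function names, the blank-line paragraph split, and the character-based `max_chars` budget are all assumptions for illustration (a real setup would more likely count tokens), and it expects the bare, attribute-free tags described above.

```python
import re

# Matches the bare opening/closing tags of the structures we keep whole.
_BLOCK_TAG = re.compile(r"<(/?)(ul|ol|table)>")


def _paragraphs(text):
    """Split prose on blank lines, dropping empty pieces."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_blocks(text):
    """Split text into blocks, keeping each top-level list/table in one piece."""
    blocks, depth, pos = [], 0, 0
    for m in _BLOCK_TAG.finditer(text):
        if not m.group(1):  # opening tag
            if depth == 0:
                blocks += _paragraphs(text[pos:m.start()])
                pos = m.start()  # structure starts here
            depth += 1
        elif depth > 0:  # closing tag of an open structure
            depth -= 1
            if depth == 0:
                blocks.append(text[pos:m.end()])
                pos = m.end()
    # Trailing prose (or an unclosed structure) falls back to paragraph splits.
    return blocks + _paragraphs(text[pos:])


def chunk(text, max_chars=1000):
    """Greedily pack blocks into chunks of at most max_chars characters.

    A single block larger than max_chars is emitted as its own oversized
    chunk rather than being split.
    """
    chunks, current = [], ""
    for block in split_blocks(text):
        candidate = f"{current}\n\n{block}" if current else block
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = block
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    doc = "Intro paragraph.\n\n<ul><li>a<ul><li>b</li></ul></li></ul>\n\nOutro."
    for c in chunk(doc, max_chars=30):
        print(repr(c))
```

Tracking nesting depth is what keeps a nested list together: only the close tag that returns the depth to zero ends the block.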