Hacker News
Fine-Tuning VLMs for Data Extraction (nanonets.com)
6 points by OceanBreez 79 days ago | 2 comments



Is it possible to host multiple fine-tuned VLMs on a single machine, i.e., multiple models sharing the GPU(s) for inference?


Yeah. If you have a large enough GPU, you can use vanilla PyTorch to load as many models as required. Docker is a good option if you want isolated services. Triton, TensorRT, TorchServe, and Ray are also worth checking out, especially when you want to load multiple adapters on the same LLM/VLM backbone (see the sketch below). Is there anything specific you are looking to serve?
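To illustrate the multi-adapter pattern: a minimal sketch using Hugging Face PEFT, assuming each fine-tune was trained as a LoRA adapter on a shared backbone. The model ID and adapter paths here are hypothetical placeholders; a VLM backbone would use its own model class, but the adapter-switching calls are the same.

```python
# Minimal sketch: one backbone held in GPU memory, multiple LoRA
# adapters registered on it and switched per request.
# BASE_ID and the adapter paths are hypothetical placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "your-org/your-backbone"  # hypothetical checkpoint

base = AutoModelForCausalLM.from_pretrained(
    BASE_ID, torch_dtype=torch.float16, device_map="cuda"
)
tok = AutoTokenizer.from_pretrained(BASE_ID)

# Attach the first fine-tuned adapter, then register a second one
# on the same backbone without duplicating the base weights.
model = PeftModel.from_pretrained(base, "adapters/invoices", adapter_name="invoices")
model.load_adapter("adapters/receipts", adapter_name="receipts")

def extract(prompt: str, adapter: str) -> str:
    """Route a request to one fine-tune by activating its adapter."""
    model.set_adapter(adapter)
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=64)
    return tok.decode(out[0], skip_special_tokens=True)

print(extract("Extract the invoice total: ...", adapter="invoices"))
```

Serving frameworks expose the same idea natively; vLLM, for example, can serve multiple LoRA adapters over one backbone with per-request adapter selection, which avoids keeping N full model copies resident.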



