
onnx (1)

Optimized Models for AI Model Inference: Triton Server & TensorRT

Once an AI model has finished training, what it takes to serve that model in a real production environment differs from what training required. The simplest approach is to call .predict()/.forward(). But once you start worrying about speed and TPS and wondering whether there is a better way, questions like the following come to mind:

Is there something more we can do with our model now that we don’t need to train anymore?

Is there something better we can do than calling a high level .predict()/.forward() function?

TRT and TRTIS, when only running inference on a trained model, ..

2024. 6. 18.

