Nutanix NCP-AI Question Answer
An administrator is monitoring the performance of a Large Language Model deployed on the Nutanix Enterprise AI (NAI) platform. After the initial deployment, users report slow inference response times and occasional timeouts when calling the model's API endpoint.
The administrator reviews the performance metrics available in the NAI Dashboard and notes the following:
- CPU usage is consistently high across all inference-serving containers.
- Memory utilization is approaching the allocated limits for the model service.
- The request-latency graph shows average inference time increasing during peak usage.
Which action should the administrator take to improve performance and reduce latency?
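For context, NAI serves model endpoints from containers on Kubernetes, so the symptoms above (CPU saturation, memory near its limits, rising latency under peak load) typically point to scaling the inference service: adding replicas to spread load and/or raising the CPU and memory requests and limits. A minimal sketch of what such a change might look like, assuming the endpoint is backed by a standard Kubernetes Deployment; the name `llm-inference` and all resource figures here are illustrative assumptions, not NAI defaults:

```yaml
# Hypothetical Deployment fragment for an NAI model endpoint.
# Names and values are assumptions for illustration, not NAI defaults.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-inference
spec:
  replicas: 3              # scale out to spread inference load across pods
  template:
    spec:
      containers:
      - name: model-server
        resources:
          requests:
            cpu: "4"
            memory: 16Gi
          limits:
            cpu: "8"       # raise the CPU ceiling the containers were hitting
            memory: 32Gi   # headroom above the observed memory utilization
```

In practice, moving the service onto GPU-backed nodes can also relieve CPU-bound inference latency; the right remediation depends on the model and the hardware available to the cluster.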

