An administrator is monitoring the performance of a deployed Large Language Model within the Nutanix Enterprise AI platform. After initial deployment, users report slow inference response times and occasional timeouts when accessing the model through its API endpoint.

The administrator reviews the performance metrics available in the NAI Dashboard and notes the following:

    CPU usage is consistently high across all inference-serving containers.

    Memory utilization is nearing the allocated limits for the model service.

    The request latency graph shows increasing average inference times during peak usage.

Which action should the administrator take to improve performance and reduce latency?

A. Restart the model container to clear memory cache and allow the system to rebalance performance.

B. Scale out the number of instances and allocate additional CPU and memory resources.

C. Disable logging temporarily to reduce resource consumption during peak load periods.

D. Increase the number of API keys assigned to the endpoint to allow more concurrent access.
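The scenario describes three co-occurring symptoms: sustained high CPU, memory near its allocated limit, and rising latency, which together indicate a capacity bottleneck rather than an access-control problem. As a hypothetical sketch of that reasoning (the function name and thresholds are illustrative and not part of the NAI platform), the decision to scale out could be modeled as:

```python
def should_scale_out(cpu_util, mem_util, latencies,
                     cpu_threshold=0.85, mem_threshold=0.90):
    """Return True when all three symptoms from the scenario co-occur:
    sustained high CPU, memory near its limit, and rising latency.
    Thresholds are illustrative examples, not NAI defaults."""
    latency_rising = all(later >= earlier
                         for earlier, later in zip(latencies, latencies[1:]))
    return (cpu_util >= cpu_threshold
            and mem_util >= mem_threshold
            and latency_rising)

# Metrics matching the scenario: high CPU, memory near limit, rising latency
print(should_scale_out(0.95, 0.93, [120, 180, 260]))   # scale out
# A healthy service with falling latency would not trigger a scale-out
print(should_scale_out(0.40, 0.50, [120, 110, 100]))
```

In practice, adding inference-serving instances and raising the CPU/memory allocation addresses all three symptoms at once, whereas restarting containers, disabling logging, or issuing more API keys does not increase serving capacity.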
