
You developed an ML model with AI Platform, and you want to move it to production. You serve a few thousand queries per second and are experiencing latency issues. Incoming requests are served by a load balancer that distributes them across multiple Kubeflow CPU-only pods running on Google Kubernetes Engine (GKE). Your goal is to improve the serving latency without changing the underlying infrastructure. What should you do?

A. Significantly increase the max_batch_size TensorFlow Serving parameter

B. Switch to the tensorflow-model-server-universal version of TensorFlow Serving

C. Significantly increase the max_enqueued_batches TensorFlow Serving parameter

D. Recompile TensorFlow Serving from source to support CPU-specific optimizations. Instruct GKE to choose an appropriate baseline minimum CPU platform for serving nodes.
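
Options A and C refer to TensorFlow Serving's server-side request batching, which is enabled with the --enable_batching flag and tuned through a batching parameters file. A minimal sketch of that file follows, assuming a standard TensorFlow Serving build; the field names come from TF Serving's batching configuration, but every value is an illustrative assumption, not part of the question.

    # Sketch of a TensorFlow Serving batching parameters file, passed to the
    # model server via --batching_parameters_file (requires --enable_batching).
    # All values below are illustrative assumptions.
    max_batch_size { value: 128 }         # max requests merged into one batch
    batch_timeout_micros { value: 1000 }  # how long to wait to fill a batch
    max_enqueued_batches { value: 100 }   # queue depth before requests are rejected
    num_batch_threads { value: 8 }        # threads that process batches

Note that on CPU-only pods, raising max_batch_size mainly trades per-request latency for throughput, and max_enqueued_batches only bounds the queue; neither reduces the compute cost of an individual request.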
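
Option D combines two steps: building TensorFlow Serving from source with CPU-specific instruction-set optimizations (for example, AVX2 and FMA), and ensuring GKE schedules the serving workload onto nodes whose CPUs actually support those instructions. A minimal sketch of the second step follows; the cluster name, node pool name, machine type, and CPU platform are illustrative assumptions.

    # Sketch: pin a baseline minimum CPU platform for a GKE node pool so a
    # TF Serving binary built with AVX2/FMA never lands on older CPUs.
    # Names and values are illustrative assumptions.
    gcloud container node-pools create serving-pool \
        --cluster my-cluster \
        --machine-type n1-standard-8 \
        --min-cpu-platform "Intel Skylake"

The --min-cpu-platform flag is the GKE-level control the option alludes to: it guarantees a floor on the CPU microarchitecture of serving nodes without otherwise changing the cluster's infrastructure.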
