Free Google Professional-Machine-Learning-Engineer Exam Actual Questions & Explanations

Last updated on: Jun 28, 2026
Author: Sara Foster (Google Cloud Certification Specialist)

The Google Professional Machine Learning Engineer certification validates your ability to design, build, and deploy machine learning solutions on Google Cloud. This exam is for practitioners who have hands-on experience with ML workflows and want to demonstrate expertise in production-grade systems. This page outlines the exam syllabus, question formats, and a practical study roadmap to help you prepare effectively for the Google Cloud Certified, Cloud Engineer path.

Professional Machine Learning Engineer Exam Syllabus & Core Topics

Use this topic map to guide your study for the Google Professional Machine Learning Engineer certification within the Google Cloud Certified, Cloud Engineer path.

  • Framing ML Problems: Define business objectives, identify success metrics, and determine whether a machine learning solution is appropriate for a given use case. You must evaluate trade-offs between model complexity, data requirements, and business constraints.
  • Architecting ML Solutions: Design end-to-end ML systems that integrate with Google Cloud services. Select appropriate tools, services, and frameworks; plan for scalability, security, and cost optimization across development and production environments.
  • Designing Data Preparation and Processing Systems: Build robust data pipelines that clean, transform, and validate data at scale. Handle missing values, outliers, and feature engineering; ensure data quality and reproducibility across training and serving.
  • Developing ML Models: Train, evaluate, and tune models using appropriate algorithms and techniques. Interpret model performance, manage hyperparameters, and select validation strategies that prevent overfitting and ensure generalization.
  • Automating and Orchestrating ML Pipelines: Build automated workflows that coordinate data ingestion, training, evaluation, and deployment. Implement CI/CD practices and manage model versioning to enable repeatable, reliable production systems.
  • Monitoring, Optimizing, and Maintaining ML Solutions: Track model performance in production, detect data drift and model degradation, and implement retraining strategies. Optimize costs and latency while maintaining accuracy and reliability over time.

Question Formats & What They Test

The exam uses multiple question types to assess both conceptual knowledge and applied reasoning in real-world ML scenarios. Questions progress in difficulty and require you to think beyond memorization to solve practical challenges.

  • Multiple Choice: Test core definitions, feature behavior, Google Cloud service capabilities, and key ML terminology. These items validate foundational understanding and help identify knowledge gaps.
  • Scenario-Based Items: Present realistic project situations and ask you to choose the best architectural, design, or optimization decision. You analyze constraints, trade-offs, and requirements to recommend solutions.
  • Case Studies: Describe a business problem and ML workflow, then ask multi-part questions about problem framing, solution design, data handling, and production considerations. These test your ability to connect concepts across the full ML lifecycle.

Questions emphasize practical judgment and integration of knowledge rather than isolated facts, reflecting how ML engineers work in production environments.

Preparation Guidance

An effective study plan maps the six core topic areas to weekly goals, incorporates hands-on practice, and includes mock exams to build confidence. Structure your preparation around real workflows rather than isolated topics to deepen understanding and retention.

  • Assign each topic area (Framing ML Problems, Architecting ML Solutions, Designing Data Preparation and Processing Systems, Developing ML Models, Automating and Orchestrating ML Pipelines, Monitoring, Optimizing, and Maintaining ML Solutions) to one or two weeks; track your progress weekly.
  • Work through practice question sets after each topic block; review explanations carefully to understand why answers are correct and where your reasoning differed.
  • Connect concepts across the ML lifecycle: observe how data preparation decisions affect model training, how architecture choices impact monitoring, and how automation enables reliability.
  • Complete a full-length, timed practice test in your final week to simulate exam conditions, identify remaining weak areas, and build pacing confidence.
  • Review Google Cloud documentation and hands-on labs for services mentioned in questions (Vertex AI, BigQuery, Dataflow, etc.) to reinforce practical context.

Explore other Google certifications: view all Google exams.

Get the PDF & Practice Test

Strengthen your preparation with up-to-date resources from validexamdumps.com. These materials align to the Professional Machine Learning Engineer exam and cover practical scenarios with clear explanations.

  • Q&A PDF with explanations: Topic-mapped questions that clarify why correct options are right and others aren't, helping you build deeper understanding.
  • Practice Test: Realistic items in timed and untimed modes, progress tracking, and detailed review to identify improvement areas.
  • Focused coverage: Aligned to Framing ML Problems, Architecting ML Solutions, Designing Data Preparation and Processing Systems, Developing ML Models, Automating and Orchestrating ML Pipelines, and Monitoring, Optimizing, and Maintaining ML Solutions, so you study what matters most.
  • Regular updates: Content refreshes that reflect syllabus and Google Cloud product changes.

Visit the exam page to download the PDF, Online Practice Test, or get a Bundle Discount for both formats: Google Professional Machine Learning Engineer.

Frequently Asked Questions

Which exam topics carry the most weight on the Professional Machine Learning Engineer exam?

Architecting ML Solutions and Developing ML Models typically account for a larger portion of the exam. However, all six topic areas are tested, and success requires balanced preparation across the full syllabus. Focus on understanding how each area connects to real production workflows rather than trying to predict question distribution.

How do data preparation, model training, and pipeline automation relate in a real project?

In practice, these topics form a continuous cycle. Data preparation ensures clean, consistent input for training; model training produces an artifact; and pipeline automation orchestrates the entire flow so it runs reliably and repeatably. Understanding this integration helps you answer scenario questions that ask you to choose the right tool or approach at each stage.

How much hands-on experience with Google Cloud is needed, and which labs should I prioritize?

Hands-on experience is valuable but not strictly required if you study the concepts thoroughly. Prioritize labs that cover Vertex AI (training and deployment), BigQuery (data storage and analysis), and Dataflow (batch and streaming data processing), as these services appear frequently in questions. Practical familiarity with how services work together strengthens your ability to make architectural decisions.

What are common mistakes that cause candidates to lose points?

Common pitfalls include choosing the lowest-cost solution without considering reliability or accuracy trade-offs, overlooking data quality and preprocessing steps, and misunderstanding when to use different model types or validation strategies. Read scenario questions carefully to identify all constraints, and avoid selecting answers based on one factor alone.

How should I approach the final week before the exam?

Use your final week for review and practice testing rather than learning new material. Take a full-length timed practice test mid-week, review weak areas, and do a lighter second practice test near exam day to maintain confidence. Get adequate sleep the night before, and on exam day, read each question carefully and manage your time to avoid rushing through scenario-based items.

Question No. 1

You received a training-serving skew alert from a Vertex AI Model Monitoring job running in production. You retrained the model with more recent training data and deployed it back to the Vertex AI endpoint, but you are still receiving the same alert. What should you do?

Correct Answer: B

The best option is to update the model monitoring job to use the more recent training data that was used to retrain the model. This aligns the job's baseline distribution with the data the deployed model was actually trained on and eliminates the false positive alerts.

Vertex AI Model Monitoring monitors a deployed model's prediction input data for feature skew and drift. Training-serving skew occurs when the feature data distribution in production deviates from the feature data distribution used to train the model. If the original training data is available, you can enable skew detection to monitor your models for training-serving skew. Model Monitoring uses TensorFlow Data Validation (TFDV) to calculate the distribution of each feature and a distance score against a baseline distribution, which is the statistical distribution of the feature's values in the training data. If the distance score for a feature exceeds an alerting threshold that you set, Model Monitoring sends you an email alert.

Retraining and redeploying the model does not change the monitoring job's baseline. The job keeps comparing production data against the old training data, so it can keep raising the same alert even though the model performance has not deteriorated. Updating the monitoring job to use the newer training data makes it recalculate the baseline distribution and the distance scores. The job can then surface true positives, such as a sudden change in the production data that causes the model performance to degrade [1].
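The baseline comparison described above can be sketched in a few lines. This is a minimal illustration, not the service's implementation: it assumes a single categorical feature, uses the L-infinity distance between the baseline and serving distributions as the distance score, and all data values, function names, and the 0.1 threshold are hypothetical.

```python
from collections import Counter


def categorical_distribution(values, categories):
    """Return the relative frequency of each category in `values`."""
    counts = Counter(values)
    total = len(values)
    return [counts.get(c, 0) / total for c in categories]


def l_infinity_distance(p, q):
    """Largest absolute difference between two aligned distributions."""
    return max(abs(a - b) for a, b in zip(p, q))


def skew_alert(baseline_values, serving_values, threshold):
    """Return (distance, alert) for one categorical feature."""
    categories = sorted(set(baseline_values) | set(serving_values))
    baseline = categorical_distribution(baseline_values, categories)
    serving = categorical_distribution(serving_values, categories)
    distance = l_infinity_distance(baseline, serving)
    return distance, distance > threshold


# Hypothetical feature values (assumption: a "channel" feature).
old_training = ["web"] * 80 + ["app"] * 20   # data behind the original model
new_training = ["web"] * 50 + ["app"] * 50   # data used for retraining
serving = ["web"] * 52 + ["app"] * 48        # current production traffic

# Stale baseline: the monitoring job still points at the old training data.
stale_distance, stale_alert = skew_alert(old_training, serving, threshold=0.1)
# Updated baseline: the monitoring job uses the retraining data.
fresh_distance, fresh_alert = skew_alert(new_training, serving, threshold=0.1)
```

With the stale baseline the distance stays above the threshold no matter how well the retrained model fits production; only changing the baseline clears the alert.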

The other options are not as good as option B, for the following reasons:

Option A: Updating the model monitoring job to use a lower sampling rate would not resolve the alert and could make monitoring less reliable. The sampling rate determines the percentage of prediction requests that are logged and analyzed by the model monitoring job. A lower rate reduces storage and computation costs, but it also gives the job less data to work with, so its distribution estimates become noisier and it may miss important patterns. It also leaves the root cause untouched: the mismatch between the baseline distribution and the current distribution of the production data [2].

Option C: Temporarily disabling the alert, and enabling it again after a sufficient amount of new production traffic has passed through the Vertex AI endpoint, would not resolve the alert. Disabling the alert would stop the model monitoring job from sending email notifications when the distance score for a feature exceeds the alerting threshold, but it would not stop the job from calculating and comparing the distributions and distance scores. The baseline would still be the old training data, so the root cause remains. In the meantime, you would also miss true positives, such as a sudden change in the production data that causes the model performance to degrade, which puts prediction quality and user trust at risk [1].

Option D: Temporarily disabling the alert until the model can be retrained again on newer training data, and retraining after a sufficient amount of new production traffic has passed through the Vertex AI endpoint, would not resolve the alert and adds unnecessary cost and effort. As with option C, disabling the alert only suppresses notifications and hides true positives in the meantime. Retraining again creates a new model version, but it does not update the model monitoring job to use the newer training data as its baseline, so the mismatch remains [1].


Preparing for Google Cloud Certification: Machine Learning Engineer, Course 3: Production ML Systems, Week 4: Evaluation

Google Cloud Professional Machine Learning Engineer Exam Guide, Section 3: Scaling ML models in production, 3.3 Monitoring ML models in production

Official Google Cloud Certified Professional Machine Learning Engineer Study Guide, Chapter 6: Production ML Systems, Section 6.3: Monitoring ML Models

Using Model Monitoring

Understanding the score threshold slider

Sampling rate

Question No. 2

You are experimenting with a built-in distributed XGBoost model in Vertex AI Workbench user-managed notebooks. You use BigQuery to split your data into training and validation sets using the following queries:

CREATE OR REPLACE TABLE `myproject.mydataset.training` AS
(SELECT * FROM `myproject.mydataset.mytable` WHERE RAND() < 0.8);

CREATE OR REPLACE TABLE `myproject.mydataset.validation` AS
(SELECT * FROM `myproject.mydataset.mytable` WHERE RAND() < 0.2);

After training the model, you achieve an area under the receiver operating characteristic curve (AUC ROC) value of 0.8, but after deploying the model to production, you notice that your model performance has dropped to an AUC ROC value of 0.65. What problem is most likely occurring?

Correct Answer: C

The most likely problem is that the tables that you created to hold your training and validation records share some records, and you may not be using all the data in your initial table. The RAND() function generates a new random number between 0 and 1 for each row in each query, so the two filters are independent. The probability of a row being in both the training and validation tables is 0.8 * 0.2 = 0.16, which is not negligible. This means that some of the records that you use to validate your model are also used to train it, which inflates the validation AUC ROC and hides poor generalization. Moreover, the probability of a row being in neither table is (1 - 0.8) * (1 - 0.2) = 0.2 * 0.8 = 0.16, which means that you are wasting some of the data in your initial table and reducing the size of your datasets. A better way to split your data into training and validation sets is to use a hash function on a unique identifier column, such as the following queries:

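-- Assumes `id` is a unique identifier column; the cast is a no-op if it is already a STRING.
CREATE OR REPLACE TABLE `myproject.mydataset.training` AS
(SELECT * FROM `myproject.mydataset.mytable`
 WHERE ABS(MOD(FARM_FINGERPRINT(CAST(id AS STRING)), 10)) < 8);

CREATE OR REPLACE TABLE `myproject.mydataset.validation` AS
(SELECT * FROM `myproject.mydataset.mytable`
 WHERE ABS(MOD(FARM_FINGERPRINT(CAST(id AS STRING)), 10)) >= 8);

This way, each row is assigned deterministically by its hash: about 80% of rows land in the training table and the remaining 20% in the validation table, without any overlap or omission. The validation filter uses >= 8 so that bucket 8 is not dropped, and ABS is needed because FARM_FINGERPRINT returns a signed INT64, so MOD can return a negative value.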
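The difference between the two splitting strategies can be checked with a small simulation. This is a sketch under stated assumptions: `random_split` mimics two independent RAND() filters, and `hash_bucket` uses SHA-256 from the standard library as a stand-in for FARM_FINGERPRINT (the two hash functions produce different values, but the bucketing idea is the same). The row IDs are hypothetical.

```python
import hashlib
import random


def random_split(row_ids, seed=0):
    """Mimic two independent RAND() filters, one per table."""
    rng = random.Random(seed)
    training = {r for r in row_ids if rng.random() < 0.8}
    validation = {r for r in row_ids if rng.random() < 0.2}
    return training, validation


def hash_bucket(row_id, buckets=10):
    """Deterministic bucket in [0, buckets) derived from a stable hash."""
    digest = hashlib.sha256(str(row_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def hash_split(row_ids):
    """Buckets 0-7 go to training, buckets 8-9 go to validation."""
    training = {r for r in row_ids if hash_bucket(r) < 8}
    validation = {r for r in row_ids if hash_bucket(r) >= 8}
    return training, validation


row_ids = range(100_000)  # hypothetical unique identifiers
rand_train, rand_val = random_split(row_ids)
hash_train, hash_val = hash_split(row_ids)

# Fraction of rows that appear in both tables, and in neither table.
overlap_fraction = len(rand_train & rand_val) / len(row_ids)
unused_fraction = 1 - len(rand_train | rand_val) / len(row_ids)
```

Both fractions for the random split come out near 0.16, while the hash-based split is disjoint and covers every row.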


Professional ML Engineer Exam Guide

Preparing for Google Cloud Certification: Machine Learning Engineer Professional Certificate

Google Cloud launches machine learning engineer certification

BigQuery ML: Splitting data for training and testing

BigQuery: FARM_FINGERPRINT function

Question No. 3

You work for a global footwear retailer and need to predict when an item will be out of stock based on historical inventory data. Customer behavior is highly dynamic since footwear demand is influenced by many different factors. You want to serve models that are trained on all available data, but track your performance on specific subsets of data before pushing to production. What is the most streamlined and reliable way to perform this validation?

Correct Answer: A

TFX ModelValidator is a tool that allows you to compare new models against a baseline model and evaluate their performance on different metrics and data slices [1]. You can use this tool to validate your models before deploying them to production and ensure that they meet your expectations and requirements.

k-fold cross-validation is a technique that splits the data into k subsets and trains the model on k-1 subsets while testing it on the remaining subset. This is repeated k times and the average performance is reported [2]. This technique is useful for estimating the generalization error of a model, but it does not account for the dynamic nature of customer behavior or the potential changes in data distribution over time.

Using the last relevant week of data as a validation set is a simple way to check the model's performance on recent data, but it may not be representative of the entire data or capture the long-term trends and patterns. It also does not allow you to compare the model with a baseline or evaluate it on different data slices.

Using the entire dataset and treating the AUC ROC as the main metric is not a good practice because it does not leave any data for validation or testing. It also assumes that the AUC ROC is the only metric that matters, which may not be true for your business problem. You may want to consider other metrics such as precision, recall, or revenue.
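The idea of validating a candidate against a baseline on specific data slices can be sketched as follows. This is plain Python for illustration only and does not use the TFX API; the slice keys, the stock-level feature, and both threshold "models" are hypothetical.

```python
def accuracy(labels, predictions):
    """Fraction of predictions that match the labels."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def accuracy_by_slice(examples, predict):
    """examples: list of (slice_key, features, label). Returns {slice_key: accuracy}."""
    grouped = {}
    for slice_key, features, label in examples:
        grouped.setdefault(slice_key, []).append((features, label))
    return {
        key: accuracy([lbl for _, lbl in rows], [predict(f) for f, _ in rows])
        for key, rows in grouped.items()
    }


def passes_validation(examples, candidate, baseline, tolerance=0.0):
    """Candidate passes only if it is no worse than the baseline on every slice."""
    cand = accuracy_by_slice(examples, candidate)
    base = accuracy_by_slice(examples, baseline)
    return all(cand[k] >= base[k] - tolerance for k in base)


# Hypothetical data: (region, units in stock, 1 if the item went out of stock).
examples = [
    ("EU", 3, 1), ("EU", 8, 1), ("EU", 20, 0),
    ("US", 2, 1), ("US", 7, 0), ("US", 30, 0),
]


def baseline(stock):
    return 1 if stock < 5 else 0


def candidate(stock):
    return 1 if stock < 10 else 0


candidate_slices = accuracy_by_slice(examples, candidate)
blessed = passes_validation(examples, candidate, baseline)
```

In this toy data both models have the same overall accuracy, but the candidate regresses on the US slice, so it is not approved. A single aggregate metric would have missed that.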


Question No. 4

You are developing a model to detect fraudulent credit card transactions. You need to prioritize detection because missing even one fraudulent transaction could severely impact the credit card holder. You used AutoML to train a model on users' profile information and credit card transaction data. After training the initial model, you notice that the model is failing to detect many fraudulent transactions. How should you adjust the training parameters in AutoML to improve model performance?

Choose 2 answers

Correct Answer: B, C

The best options are to decrease the score threshold and to add more positive examples to the training set. Both increase the detection rate of fraudulent transactions, which is the priority for this use case.

The score threshold is the minimum probability score that a prediction must have to be classified as positive. Decreasing it increases recall, the proportion of actual positive cases that are correctly identified, and so reduces false negatives, which are fraudulent transactions that the model misses. The cost is lower precision, the proportion of positive predictions that are actually correct, which means more false positives: legitimate transactions flagged as fraudulent. The optimal score threshold therefore depends on the business objective and the cost of each type of error [1].

Adding more positive examples to the training set helps balance the data distribution. Positive examples are instances of the target class, which in this case are fraudulent transactions; negative examples are legitimate transactions. Fraudulent transactions are usually rare compared to legitimate ones, which can bias the model toward the majority class so that it fails to learn the characteristics of the minority class. More positive examples give the model more fraud patterns to learn from and increase the detection rate [2].

The other options are not as good as options B and C, for the following reasons:

Option A: Increasing the score threshold would decrease the detection rate of fraudulent transactions, which is the opposite of the desired outcome. A higher threshold lowers recall and so increases false negatives, the fraudulent transactions that the model misses. It does raise precision and reduce false positives, but in this use case the cost of a false negative is much higher than the cost of a false positive [1].

Option D: Adding more negative examples to the training set would not improve the model performance and could worsen the data imbalance. Legitimate transactions already dominate the data, which biases the model toward the majority class. Adding more of them would exacerbate this problem and decrease the detection rate of fraudulent transactions [2].

Option E: Reducing the maximum number of node hours for training would not improve the model performance and could limit model optimization. Node hours are the units of computation used to train an AutoML model, and the maximum number of node hours caps how many can be used. A lower cap reduces training time and cost, but it also limits the number of iterations, trials, and evaluations and can prevent AutoML from finding the best hyperparameters and architecture [3].
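The precision and recall trade-off behind the score threshold can be made concrete with a small calculation. This is a minimal sketch with made-up labels and scores; it is not how AutoML computes its evaluation metrics internally.

```python
def precision_recall(labels, scores, threshold):
    """Classify score >= threshold as positive (fraud); return (precision, recall)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical data: 1 = fraudulent, 0 = legitimate, with model scores.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.3, 0.6, 0.35, 0.3, 0.1, 0.1, 0.05]

high_threshold = precision_recall(labels, scores, 0.5)
low_threshold = precision_recall(labels, scores, 0.25)
```

Lowering the threshold from 0.5 to 0.25 catches every fraudulent transaction in this toy set (recall rises from 0.5 to 1.0) while precision falls, because more legitimate transactions are flagged.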


Preparing for Google Cloud Certification: Machine Learning Engineer, Course 5: Responsible AI, Week 4: Evaluation

Google Cloud Professional Machine Learning Engineer Exam Guide, Section 2: Developing high-quality ML models, 2.2 Handling imbalanced data

Official Google Cloud Certified Professional Machine Learning Engineer Study Guide, Chapter 4: Low-code ML Solutions, Section 4.3: AutoML

Understanding the score threshold slider

Handling imbalanced data sets in machine learning

AutoML Vision pricing

Question No. 5

You have been asked to build a model using a dataset that is stored in a medium-sized (~10GB) BigQuery table. You need to quickly determine whether this data is suitable for model development. You want to create a one-time report that includes both informative visualizations of data distributions and more sophisticated statistical analyses to share with other ML engineers on your team. You require maximum flexibility to create your report. What should you do?

Correct Answer: A

Option A is correct because Vertex AI Workbench user-managed notebooks give you the most flexibility for a one-time report that combines informative visualizations of data distributions with more sophisticated statistical analyses. Vertex AI Workbench is a service that allows you to create and use notebooks for ML development and experimentation. From a notebook you can connect to your BigQuery table, query and analyze the data using SQL or Python, and create charts and plots using libraries such as pandas, matplotlib, or seaborn. You can also perform more advanced data analysis, such as outlier detection, feature engineering, or hypothesis testing, using libraries such as TensorFlow Data Validation, TensorFlow Transform, or SciPy. You can then export the notebook as a PDF or HTML file and share it with your team. Because you can use any code or library that you want, you can customize the report as you wish.
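As an illustration of the kind of ad hoc check a notebook makes easy, the sketch below computes a missing-value rate and flags outliers with the interquartile range rule. It uses only the Python standard library so that it runs anywhere; in a real notebook you would more likely load the BigQuery table into a pandas DataFrame. The column values and function names are hypothetical.

```python
import statistics


def missing_rate(column):
    """Fraction of entries in the column that are None."""
    return sum(v is None for v in column) / len(column)


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical numeric column with one missing entry and one suspicious value.
raw_column = [10, 12, 11, None, 13, 12, 11, 10, 95]
present = [v for v in raw_column if v is not None]

rate = missing_rate(raw_column)
outliers = iqr_outliers(present)
summary = {
    "mean": statistics.mean(present),
    "median": statistics.median(present),
    "stdev": statistics.stdev(present),
}
```

Checks like these are a few lines each in a notebook, which is why the notebook option fits a one-time, highly customized report.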

Option B is incorrect because Google Data Studio is not the most flexible way to build this report. Google Data Studio is a service that allows you to create and share interactive dashboards and reports using data from various sources, such as BigQuery, Google Sheets, or Google Analytics. You can use it to connect to your BigQuery table, explore and visualize the data using charts, tables, or maps, and apply filters, calculations, or aggregations to the data. However, it does not support more sophisticated statistical analyses, such as outlier detection, feature engineering, or hypothesis testing, which may be useful for model development. It is also better suited to recurring reports that need to be updated frequently than to static one-time reports.

Option C is incorrect because using the output from TensorFlow Data Validation on Dataflow is neither the quickest nor the most flexible way to build this report. TensorFlow Data Validation is a library that allows you to explore, validate, and monitor the quality of your data for ML. You can use it to compute descriptive statistics, detect anomalies, infer schemas, and generate data visualizations. Dataflow is a service that allows you to create and run scalable data processing pipelines using Apache Beam, and you can use it to run TensorFlow Data Validation on large datasets, such as those stored in BigQuery. However, this approach involves moving the data from BigQuery to Dataflow, creating and running the pipeline, and exporting the results, which is a lot of overhead for a ~10GB table. It also limits the report to the functionality of TensorFlow Data Validation, so you may not be able to customize the report as you wish.

Option D is incorrect because Dataprep is not the most flexible way to build this report. Dataprep is a service that allows you to explore, clean, and transform your data for analysis or ML. You can use it to connect to your BigQuery table, inspect and profile the data using histograms, charts, or summary statistics, and apply transformations, such as filtering, joining, splitting, or aggregating, to the data. However, Dataprep does not support more sophisticated statistical analyses, such as outlier detection, feature engineering, or hypothesis testing, which may be useful for model development. It is also better suited to data preparation workflows that need to be executed repeatedly than to static one-time reports.


Vertex AI Workbench documentation

Google Data Studio documentation

TensorFlow Data Validation documentation

Dataflow documentation

Dataprep documentation

BigQuery documentation

pandas documentation

matplotlib documentation

seaborn documentation

TensorFlow Transform documentation

SciPy documentation

Apache Beam documentation