At ValidExamDumps, we consistently monitor updates to the Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 exam questions by Databricks. Whenever our team identifies changes in the exam questions, exam objectives, exam focus areas, or exam requirements, we immediately update our exam questions for both the PDF and online practice exams. This commitment ensures our customers always have access to the most current and accurate questions. By preparing with these actual questions, our customers can successfully pass the Databricks Certified Associate Developer for Apache Spark 3.0 exam on their first attempt without needing additional materials or study guides.
Other certification materials providers often include outdated or retired Databricks questions in their Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 materials. These outdated questions lead to customers failing their Databricks Certified Associate Developer for Apache Spark 3.0 exam. In contrast, we ensure our question bank includes only precise and up-to-date questions, guaranteeing their presence in your actual exam. Our main priority is your success in the Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 exam, not profiting from selling obsolete exam questions in PDF or online practice tests.
Which of the following code blocks adds a column predErrorSqrt to DataFrame transactionsDf that is the square root of column predError?
transactionsDf.withColumn('predErrorSqrt', sqrt(col('predError')))
Correct. The DataFrame.withColumn() operator is used to add a new column to a DataFrame. It takes two arguments: the name of the new column (here: predErrorSqrt) and a Column expression
for the new column. In PySpark, a Column expression can refer to a column via the col('predError') function, via attribute access such as transactionsDf.predError, or even just
via the column name as a string, 'predError'.
The question asks for the square root. sqrt() is a function in pyspark.sql.functions that calculates the square root. It takes a value or a Column as input. Here it is applied to the predError column of
DataFrame transactionsDf, expressed through col('predError').
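For illustration, here is a minimal, runnable sketch of the correct code block, assuming an active SparkSession named spark and a small hypothetical transactionsDf whose values are made up:
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, sqrt

spark = SparkSession.builder.getOrCreate()
# Hypothetical data standing in for transactionsDf.
transactionsDf = spark.createDataFrame([(3,), (6,), (None,)], ['predError'])
# Add predErrorSqrt as the square root of predError.
transactionsDf.withColumn('predErrorSqrt', sqrt(col('predError'))).show()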
transactionsDf.withColumn('predErrorSqrt', sqrt(predError))
Incorrect. In this expression, sqrt(predError) is invalid syntax: predError is not a defined Python variable, so Python raises a NameError before Spark ever sees the expression.
You could pass transactionsDf.predError, col('predError') (as in the correct solution), or even just 'predError' instead.
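As a quick sketch (continuing with the hypothetical transactionsDf from above), all three references produce the same result:
from pyspark.sql.functions import col, sqrt

transactionsDf.withColumn('predErrorSqrt', sqrt(col('predError')))          # Column expression via col()
transactionsDf.withColumn('predErrorSqrt', sqrt(transactionsDf.predError))  # attribute access
transactionsDf.withColumn('predErrorSqrt', sqrt('predError'))               # column name as a string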
transactionsDf.select(sqrt(predError))
Wrong. Here, the explanation just above this one about how to refer to predError applies.
transactionsDf.select(sqrt('predError'))
No. While this is correct syntax, it returns a single-column DataFrame containing only the square root of column predError. However, the question asks for a column to
be added to the original DataFrame transactionsDf.
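To see the difference, compare the resulting columns in this small sketch (again using the hypothetical transactionsDf from above):
from pyspark.sql.functions import sqrt

transactionsDf.select(sqrt('predError')).columns
# -> only the derived square-root column; predError and any other original columns are gone
transactionsDf.withColumn('predErrorSqrt', sqrt('predError')).columns
# -> all original columns plus the new predErrorSqrt column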
transactionsDf.withColumn('predErrorSqrt', col('predError').sqrt())
No. The issue with this statement is that column col('predError') has no sqrt() method. sqrt() is a member of pyspark.sql.functions, but not of pyspark.sql.Column.
More info: pyspark.sql.DataFrame.withColumn --- PySpark 3.1.2 documentation and pyspark.sql.functions.sqrt --- PySpark 3.1.2 documentation
Static notebook | Dynamic notebook: See test 2, Question 31 (Databricks import instructions)
Which of the following describes characteristics of the Spark UI?
There is a place in the Spark UI that shows the property spark.executor.memory.
Correct, you can see Spark properties such as spark.executor.memory in the Environment tab.
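As a side note, the same property can also be read programmatically; a minimal sketch, assuming an active SparkSession named spark:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Same value that the Environment tab displays; '1g' is used here as a fallback if the property is unset.
print(spark.sparkContext.getConf().get('spark.executor.memory', '1g'))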
Some of the tabs in the Spark UI are named Jobs, Stages, Storage, DAGs, Executors, and SQL.
Wrong -- Jobs, Stages, Storage, Executors, and SQL are all tabs in the Spark UI. DAGs can be inspected in the 'Jobs' tab in the job details or in the Stages or SQL tab, but are not a separate tab.
Via the Spark UI, workloads can be manually distributed across distributors.
No, the Spark UI is meant for inspecting the inner workings of Spark, which ultimately helps you understand, debug, and optimize Spark applications.
Via the Spark UI, stage execution speed can be modified.
No, see above.
The Scheduler tab shows how jobs that are run in parallel by multiple users are distributed across the cluster.
No, there is no Scheduler tab.
Which of the elements in the labeled panels represent the operation performed for broadcast variables?
2,3
Correct! Both panels 2 and 3 represent the operation performed for broadcast variables. While a broadcast operation may look like panel 3, with the driver being the bottleneck, it most probably
looks like panel 2.
This is because the torrent protocol sits behind Spark's broadcast implementation. In the torrent protocol, each executor will try to fetch missing broadcast variables from the driver or other nodes,
preventing the driver from being the bottleneck.
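For reference, a minimal sketch of creating and reading a broadcast variable, assuming an active SparkSession named spark; the lookup table is made up for illustration:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# The broadcast value is shipped to each executor once and cached there.
lookup = sc.broadcast({3: 'high', 6: 'very high'})

# Tasks read lookup.value locally instead of repeatedly pulling it from the driver.
print(sc.parallelize([3, 6, 9]).map(lambda x: lookup.value.get(x, 'unknown')).collect())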
1,2
Wrong. While panel 2 may represent broadcasting, panel 1 shows bi-directional communication which does not occur in broadcast operations.
3
No. While broadcasting may materialize as shown in panel 3, its use of the torrent protocol also enables communication as shown in panel 2 (see the first explanation).
1,3,4
No. While panel 3 may represent broadcasting, panel 1 shows bi-directional communication -- not a characteristic of broadcasting. Panel 4 shows uni-directional communication, but in the wrong direction.
Panel 4 resembles an accumulator variable more than a broadcast variable.
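For contrast, a minimal sketch of an accumulator, which flows from the executors back to the driver (the direction hinted at for panel 4); assumes an active SparkSession named spark:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

negatives = sc.accumulator(0)                      # written by tasks, read on the driver
sc.parallelize([1, -2, 3, -4]).foreach(lambda x: negatives.add(1) if x < 0 else None)
print(negatives.value)                             # -> 2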
2,5
Incorrect. While panel 2 shows broadcasting, panel 5 includes bi-directional communication -- not a characteristic of broadcasting.
More info: Broadcast Join with Spark -- henning.kropponline.de
The code block displayed below contains an error. The code block should count the number of rows that have a predError of either 3 or 6. Find the error.
Code block:
transactionsDf.filter(col('predError').in([3, 6])).count()
Correct code block:
transactionsDf.filter(col('predError').isin([3, 6])).count()
The isin method is the correct one to use here -- the in method does not exist for the Column object. In fact, because in is a reserved keyword in Python, col('predError').in([3, 6]) is not even valid syntax.
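A minimal sketch of the corrected code block, assuming a small hypothetical transactionsDf whose values are made up:
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()
transactionsDf = spark.createDataFrame([(3,), (6,), (7,)], ['predError'])
# Count the rows whose predError is either 3 or 6.
print(transactionsDf.filter(col('predError').isin([3, 6])).count())   # -> 2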
More info: pyspark.sql.Column.isin --- PySpark 3.1.2 documentation
Which of the following code blocks shows the structure of a DataFrame in a tree-like way, containing both column names and types?
itemsDf.printSchema()
Correct! Here is an example of what itemsDf.printSchema() shows; you can see the tree-like structure containing both column names and types:
root
|-- itemId: integer (nullable = true)
|-- attributes: array (nullable = true)
| |-- element: string (containsNull = true)
|-- supplier: string (nullable = true)
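A minimal sketch that reproduces this schema with a hypothetical itemsDf (row contents are made up for illustration):
from pyspark.sql import SparkSession
from pyspark.sql.types import ArrayType, IntegerType, StringType, StructField, StructType

spark = SparkSession.builder.getOrCreate()
schema = StructType([
    StructField('itemId', IntegerType(), True),
    StructField('attributes', ArrayType(StringType(), True), True),
    StructField('supplier', StringType(), True),
])
itemsDf = spark.createDataFrame([(1, ['blue', 'winter'], 'Sports Company Inc.')], schema)
itemsDf.printSchema()   # prints the tree shown above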
itemsDf.rdd.printSchema()
No, the DataFrame's underlying RDD does not have a printSchema() method.
spark.schema(itemsDf)
Incorrect, there is no spark.schema command.
print(itemsDf.columns)
print(itemsDf.dtypes)
Wrong. While the output of this code block contains both column names and column types, the information is not arranged in a tree-like way.
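For comparison, here is roughly what the flat output looks like for the hypothetical itemsDf from the sketch above:
print(itemsDf.columns)   # ['itemId', 'attributes', 'supplier']
print(itemsDf.dtypes)    # [('itemId', 'int'), ('attributes', 'array<string>'), ('supplier', 'string')]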
itemsDf.print.schema()
No, DataFrame does not have a print method.
Static notebook | Dynamic notebook: See test 3, Question 36 (Databricks import instructions)