Name: CCA Spark and Hadoop Developer

Free Cloudera CCA175 Exam Actual Questions

The questions for CCA175 were last updated on Apr 30, 2025.

At ValidExamDumps, we consistently monitor Cloudera's updates to the CCA175 exam questions. Whenever our team identifies changes in the exam questions, exam objectives, exam focus areas, or exam requirements, we immediately update our exam questions for both the PDF and online practice exams. This commitment ensures our customers always have access to the most current and accurate questions. By preparing with these actual questions, our customers can successfully pass the Cloudera CCA Spark and Hadoop Developer exam on their first attempt without needing additional materials or study guides.

Other certification material providers often include questions that Cloudera has outdated or removed in their Cloudera CCA175 exam sets. These outdated questions lead to customers failing their Cloudera CCA Spark and Hadoop Developer exam. In contrast, we ensure our question bank includes only precise and up-to-date questions, guaranteeing their presence in your actual exam. Our main priority is your success in the Cloudera CCA175 exam, not profiting from selling obsolete exam questions in PDF or online practice test form.

Question No. 1

Problem Scenario 91 : You have been given data in JSON format as below.

{"first_name":"Ankit", "last_name":"Jain"}

{"first_name":"Amir", "last_name":"Khan"}

{"first_name":"Rajesh", "last_name":"Khanna"}

{"first_name":"Priynka", "last_name":"Chopra"}

{"first_name":"Kareena", "last_name":"Kapoor"}

{"first_name":"Lokesh", "last_name":"Yadav"}

Do the following activities:

1. Create the employee.json file locally.

2. Load this file on HDFS.

3. Register this data as a temp table in Spark using Python.

4. Write a select query and print this data.

5. Now save this selected data back in JSON format.

A. Solution :
Step 1 : Create the employee.json file locally.
vi employee.json (press Insert and paste the content).
Step 2 : Upload this file to HDFS, default location.
hadoop fs -put employee.json
val employee = sqlContext.read.json("/user/cloudera/employee.json")
employee.write.parquet("employee.parquet")
val parq_data = sqlContext.read.parquet("employee.parquet")
import org.apache.spark.sql.SaveMode
// prdDF is not defined in this snippet; the line appears to be carried over from another scenario.
prdDF.write.format("orc").saveAsTable("product_orc_table")
// Change the codec.
sqlContext.setConf("spark.sql.parquet.compression.codec", "snappy")
employee.write.mode(SaveMode.Overwrite).parquet("employee.parquet")

B. Solution :
Step 1 : Create the employee.json file locally.
vi employee.json (press Insert and paste the content).
Step 2 : Upload this file to HDFS, default location.
hadoop fs -put employee.json
val employee = sqlContext.read.json("/user/cloudera/employee.json")
employee.write.parquet("employee.parquet")
val parq_data = sqlContext.read.parquet("employee.parquet")
parq_data.registerTempTable("employee")
val allemployee = sqlContext.sql("SELECT * FROM employee")
allemployee.show()
import org.apache.spark.sql.SaveMode
// prdDF is not defined in this snippet; the line appears to be carried over from another scenario.
prdDF.write.format("orc").saveAsTable("product_orc_table")
// Change the codec.
sqlContext.setConf("spark.sql.parquet.compression.codec", "snappy")
employee.write.mode(SaveMode.Overwrite).parquet("employee.parquet")

Correct Answer: B
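
Step 1 can also be scripted rather than typed into vi. The following is a minimal sketch in plain Python (no Spark required), assuming the file should hold one JSON object per line, which is the layout sqlContext.read.json reads by default. The helper name write_employee_json and the temporary directory are illustrative choices, not part of the original solution.

```python
import json
import os
import tempfile

# The six records from the problem statement.
EMPLOYEES = [
    {"first_name": "Ankit", "last_name": "Jain"},
    {"first_name": "Amir", "last_name": "Khan"},
    {"first_name": "Rajesh", "last_name": "Khanna"},
    {"first_name": "Priynka", "last_name": "Chopra"},
    {"first_name": "Kareena", "last_name": "Kapoor"},
    {"first_name": "Lokesh", "last_name": "Yadav"},
]


def write_employee_json(path):
    """Write one JSON object per line (JSON Lines layout)."""
    with open(path, "w") as handle:
        for record in EMPLOYEES:
            handle.write(json.dumps(record) + "\n")


# Hypothetical location: a temp directory stands in for the working directory.
employee_path = os.path.join(tempfile.mkdtemp(), "employee.json")
write_employee_json(employee_path)

# Read the file back to confirm that every line parses on its own.
with open(employee_path) as handle:
    loaded = [json.loads(line) for line in handle]
print(len(loaded), "records written to", employee_path)
```

The resulting file can then be uploaded with hadoop fs -put employee.json, as in step 2.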

Question No. 2

Problem Scenario 86 : In continuation of the previous question, please accomplish the following activities.

1. Select maximum, minimum, average, standard deviation, and total quantity.

2. Select minimum and maximum price for each product code.

3. Select maximum, minimum, average, standard deviation, and total quantity for each product code; however, make sure average and standard deviation have at most two decimal places.

4. Select all the product codes and average price only where the product count is more than or equal to 3.

5. Select maximum, minimum, average, and total of all the products for each code. Also produce the same across all the products.

A. Solution :
Step 1 : Select maximum, minimum, average, standard deviation, and total quantity.
val results = sqlContext.sql("""SELECT MAX(price) AS MAX, MIN(price) AS MIN, AVG(price) AS Average, STD(price) AS STD, SUM(quantity) AS total_products FROM products""")
results.show()
Step 2 : Select minimum and maximum price for each product code.
val results = sqlContext.sql("""SELECT code, MAX(price) AS `Highest Price`, MIN(price) AS `Lowest Price`
FROM products GROUP BY code""")
results.show()
Step 3 : Select maximum, minimum, average, standard deviation, and total quantity for each product code; however, make sure average and standard deviation have at most two decimal places.
val results = sqlContext.sql("""SELECT code, MAX(price), MIN(price),
CAST(AVG(price) AS DECIMAL(7,2)) AS `Average`, CAST(STD(price) AS DECIMAL(7,2)) AS `Std Dev`, SUM(quantity) FROM products
GROUP BY code""")
results.show()
Step 4 : Select all the product codes and average price only where the product count is more than or equal to 3.
val results = sqlContext.sql("""SELECT code AS `Product Code`,
COUNT(*) AS `Count`,
CAST(AVG(price) AS DECIMAL(7,2)) AS `Average` FROM products GROUP BY code
HAVING COUNT(*) >= 3""")
results.show()
Step 5 : Select maximum, minimum, average, and total of all the products for each code. Also produce the same across all the products.
val results = sqlContext.sql("""SELECT
code,
MAX(price),
MIN(price),
CAST(AVG(price) AS DECIMAL(7,2)) AS `Average`,
SUM(quantity)
FROM products
GROUP BY code
WITH ROLLUP""")
results.show()

B. Solution :
Step 1 : Select maximum, minimum, average, standard deviation, and total quantity.
val results = sqlContext.sql("""SELECT MAX(price) AS MAX, MIN(price) AS MIN, AVG(price) AS Average, STD(price) AS STD, SUM(quantity) AS total_products FROM products""")
results.show()
Step 2 : Select minimum and maximum price for each product code.
val results = sqlContext.sql("""SELECT code, MAX(price) AS `Highest Price`, MIN(price) AS `Lowest Price`
FROM products GROUP BY code""")
results.show()
Step 3 : Select maximum, minimum, average, standard deviation, and total quantity for each product code; however, make sure average and standard deviation have at most two decimal places.
val results = sqlContext.sql("""SELECT code, MAX(price), MIN(price),
GROUP BY code""")
results.show()
Step 4 : Select all the product codes and average price only where the product count is more than or equal to 3.
val results = sqlContext.sql("""SELECT code AS `Product Code`,
COUNT(*) AS `Count`,
CAST(AVG(price) AS DECIMAL(7,2)) AS `Average` FROM products GROUP BY code
HAVING COUNT(*) >= 3""")
results.show()
Step 5 : Select maximum, minimum, average, and total of all the products for each code. Also produce the same across all the products.
val results = sqlContext.sql("""SELECT
code,
MAX(price),
WITH ROLLUP""")
results.show()

Correct Answer: A
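
The GROUP BY / HAVING pattern from step 4 can be tried without a cluster. The sketch below uses Python's built-in sqlite3 module as a stand-in for Spark SQL, so it is not the exam environment. The products rows are made up for illustration (the real table comes from the previous scenario), and ROUND(..., 2) is swapped in for CAST(... AS DECIMAL(7,2)) because SQLite does not apply a fixed-scale DECIMAL type.

```python
import sqlite3

# Hypothetical sample rows: (code, price, quantity). Not the exam's data.
ROWS = [
    ("PEN", 0.48, 10000),
    ("PEN", 1.23, 5000),
    ("PEN", 1.25, 8000),
    ("PEC", 0.48, 2000),
    ("PEC", 0.49, 3000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (code TEXT, price REAL, quantity INTEGER)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?)", ROWS)

# Keep only codes with at least 3 products; round the average to 2 decimals.
query = """
SELECT code, COUNT(*) AS product_count, ROUND(AVG(price), 2) AS average
FROM products
GROUP BY code
HAVING COUNT(*) >= 3
"""
having_rows = conn.execute(query).fetchall()
print(having_rows)
conn.close()
```

With these sample rows only the PEN code survives the HAVING filter, since PEC has just two products.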

Question No. 3

Problem Scenario 85 : In continuation of the previous question, please accomplish the following activities.

1. Select all the columns from the product table with output headers as below.

productID AS ID
code AS Code
name AS Description
price AS 'Unit Price'

2. Select code and name, both separated by ' -', with the header name 'Product Description'.

3. Select all distinct prices.

4. Select distinct price and name combinations.

5. Select all price data sorted by both code and productID combination.

6. Count the number of products.

7. Count the number of products for each code.

A. Solution :
Step 1 : Select all the columns from the product table with output headers as below. productID AS ID, code AS Code, name AS Description, price AS 'Unit Price'
val results = sqlContext.sql("""SELECT productID AS ID, code AS Code, name AS Description, price AS `Unit Price` FROM products ORDER BY ID""")
results.show()
Step 2 : Select code and name, both separated by ' -', with the header name 'Product Description'.
val results = sqlContext.sql("""SELECT CONCAT(code, ' -', name) AS `Product Description`, price FROM products""")
results.show()
Step 3 : Select all distinct prices.
val results = sqlContext.sql("""SELECT DISTINCT price AS `Distinct Price` FROM products""")
results.show()
Step 4 : Select distinct price and name combinations.
val results = sqlContext.sql("""SELECT DISTINCT price, name FROM products""")
results.show()
Step 5 : Select all price data sorted by both code and productID combination.
val results = sqlContext.sql("""SELECT * FROM products ORDER BY code, productID""")
results.show()
Step 6 : Count the number of products.
val results = sqlContext.sql("""SELECT COUNT(*) AS `Count` FROM products""")
results.show()
Step 7 : Count the number of products for each code.
val results = sqlContext.sql("""SELECT code, COUNT(*) FROM products GROUP BY code""")
results.show()
val results = sqlContext.sql("""SELECT code, COUNT(*) AS count FROM products GROUP BY code ORDER BY count DESC""")
results.show()

B. Solution :
Step 1 : Select all the columns from the product table with output headers as below. productID AS ID, code AS Code, name AS Description, price AS 'Unit Price'
val results = sqlContext.sql("""SELECT productID AS ID, code AS Code, name AS Description, price AS `Unit Price` FROM products ORDER BY ID""")
results.show()
Step 2 : Select code and name, both separated by ' -', with the header name 'Product Description'.
val results = sqlContext.sql("""SELECT CONCAT(code, ' -', name) AS `Product Description`, price FROM products""")
results.show()
Step 3 : Select all distinct prices.
val results = sqlContext.sql("""SELECT DISTINCT price AS `Distinct Price` FROM products""")
results.show()
Step 4 : Count the number of products.
val results = sqlContext.sql("""SELECT COUNT(*) AS `Count` FROM products""")
results.show()
Step 5 : Count the number of products for each code.
val results = sqlContext.sql("""SELECT code, COUNT(*) FROM products GROUP BY code""")
results.show()
val results = sqlContext.sql("""SELECT code, COUNT(*) AS count FROM products GROUP BY code ORDER BY count DESC""")
results.show()

Correct Answer: A
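
The DISTINCT, concatenation, and per-code count queries can be checked locally in the same way. This is a rough sketch using Python's built-in sqlite3 module instead of Spark SQL. The five product rows are hypothetical, and SQLite's || operator is swapped in for CONCAT, which is the function the Spark queries use.

```python
import sqlite3

# Hypothetical sample rows: (productID, code, name, price). Not the exam's data.
ROWS = [
    (1001, "PEN", "Pen Red", 1.23),
    (1002, "PEN", "Pen Blue", 1.25),
    (1003, "PEN", "Pen Black", 1.25),
    (1004, "PEC", "Pencil 2B", 0.48),
    (1005, "PEC", "Pencil 2H", 0.49),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (productID INTEGER, code TEXT, name TEXT, price REAL)"
)
conn.executemany("INSERT INTO products VALUES (?, ?, ?, ?)", ROWS)

# Step 2: code and name separated by ' -' under one header.
descriptions = [
    row[0]
    for row in conn.execute(
        "SELECT code || ' -' || name AS \"Product Description\" "
        "FROM products ORDER BY productID"
    )
]

# Step 3: each price appears once, however many products share it.
distinct_prices = [
    row[0]
    for row in conn.execute("SELECT DISTINCT price FROM products ORDER BY price")
]

# Step 7: number of products per code, largest group first.
counts_by_code = conn.execute(
    "SELECT code, COUNT(*) AS product_count FROM products "
    "GROUP BY code ORDER BY product_count DESC"
).fetchall()

print(descriptions[0], distinct_prices, counts_by_code)
conn.close()
```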

Question No. 4

Problem Scenario 69 : Write a Spark application using Python that reads a file "Content.txt" (on HDFS) with the content shown below.

Filter out the words that are less than 2 characters long and ignore all empty lines.

Once done, store the filtered data in a directory called "problem84" (on HDFS).

Content.txt

Hello this is ABCTECH.com

This is ABYTECH.com

Apache Spark Training

This is Spark Learning Session

Spark is faster than MapReduce

A. Solution :
Step 1 : Create an application with the following code and store it in problem84.py
# Import SparkContext and SparkConf
from pyspark import SparkContext, SparkConf
# Create configuration object and set App name
conf = SparkConf().setAppName('CCA 175 Problem 84')
sc = SparkContext(conf=conf)
# Load data from hdfs
contentRDD = sc.textFile('Content.txt')
# Keep only non-empty lines
nonempty_lines = contentRDD.filter(lambda x: len(x) > 0)
# Split each line on spaces
words = nonempty_lines.flatMap(lambda x: x.split(' '))
# Keep only words longer than 2 characters
finalRDD = words.filter(lambda x: len(x) > 2)
for word in finalRDD.collect():
    print(word)
# Save final data
finalRDD.saveAsTextFile('problem84')
Step 2 : Submit this application
spark-submit --master yarn problem84.py

B. Solution :
Step 1 : Create an application with the following code and store it in problem84.py
# Import SparkContext and SparkConf
from pyspark import SparkContext, SparkConf
# Create configuration object and set App name
conf = SparkConf().setAppName('CCA 175 Problem 84')
sc = SparkContext(conf=conf)
# Load data from hdfs
print(word)
# Save final data
finalRDD.saveAsTextFile('problem84')
Step 2 : Submit this application
spark-submit --master yarn problem84.py

Correct Answer: A
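
The filter, flatMap, filter chain in option A can be checked without Spark. The sketch below applies the same three steps to the sample Content.txt text with ordinary Python lists. The function name filter_words and its min_length parameter are illustrative, and the threshold mirrors the solution's len(x) > 2 test, which also drops two-letter words such as "is".

```python
# The sample Content.txt from the problem statement, blank lines included.
CONTENT = """Hello this is ABCTECH.com

This is ABYTECH.com

Apache Spark Training

This is Spark Learning Session

Spark is faster than MapReduce
"""


def filter_words(text, min_length=3):
    """Mirror the RDD filter -> flatMap -> filter steps on an in-memory string."""
    # Same role as contentRDD.filter(lambda x: len(x) > 0)
    non_empty_lines = [line for line in text.splitlines() if len(line) > 0]
    # Same role as flatMap(lambda x: x.split(' '))
    words = [word for line in non_empty_lines for word in line.split(" ")]
    # Same role as filter(lambda x: len(x) > 2)
    return [word for word in words if len(word) >= min_length]


final_words = filter_words(CONTENT)
for word in final_words:
    print(word)
```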

Question No. 5

Problem Scenario 29 : Please accomplish the following exercises using HDFS command line options.

1. Create a directory in hdfs named hdfs_commands.

2. Create a file in hdfs named data.txt in hdfs_commands.

3. Now copy this data.txt file to the local filesystem; however, while copying the file, please make sure file properties (e.g. file permissions) are not changed.

4. Now create a file in local directory named data_local.txt and move this file to hdfs in hdfs_commands directory.

5. Create a file data_hdfs.txt in hdfs_commands directory and copy it to local file system.

6. Create a file in the local filesystem named file1.txt and put it to HDFS.

A. Solution :
Step 1 : Create directory
hdfs dfs -mkdir hdfs_commands
Step 2 : Create a file in hdfs named data.txt in hdfs_commands.
hdfs dfs -touchz hdfs_commands/data.txt
Step 3 : Now copy this data.txt file on local filesystem, however while copying file please make sure file properties are not changed e.g. file permissions.
hdfs dfs -copyToLocal -p hdfs_commands/data.txt /home/cloudera/Desktop/HadoopExam
Step 4 : Create a file in local filesystem named file1.txt and put it to hdfs
touch file1.txt
hdfs dfs -put /home/cloudera/Desktop/HadoopExam/file1.txt hdfs_commands/

B. Solution :
Step 1 : Create directory
hdfs dfs -mkdir hdfs_commands
Step 2 : Create a file in hdfs named data.txt in hdfs_commands.
hdfs dfs -touchz hdfs_commands/data.txt
Step 3 : Now copy this data.txt file on local filesystem, however while copying file please make sure file properties are not changed e.g. file permissions.
hdfs dfs -copyToLocal -p hdfs_commands/data.txt /home/cloudera/Desktop/HadoopExam
Step 4 : Now create a file in local directory named data_local.txt and move this file to hdfs in hdfs_commands directory.
touch data_local.txt
hdfs dfs -moveFromLocal /home/cloudera/Desktop/HadoopExam/data_local.txt hdfs_commands/
Step 5 : Create a file data_hdfs.txt in hdfs_commands directory and copy it to local file system.
hdfs dfs -touchz hdfs_commands/data_hdfs.txt
hdfs dfs -get hdfs_commands/data_hdfs.txt /home/cloudera/Desktop/HadoopExam/
Step 6 : Create a file in local filesystem named file1.txt and put it to hdfs
touch file1.txt
hdfs dfs -put /home/cloudera/Desktop/HadoopExam/file1.txt hdfs_commands/

Correct Answer: B
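
To rehearse the order of the HDFS-side commands in option B, they can be collected in a small script. The following is a dry-run sketch only: it builds the argument lists and prints them without calling hdfs, so it runs on a machine with no Hadoop installed. The function name build_commands is illustrative, and the local path is the one used in the solution text.

```python
import shlex

# Paths taken from the solution text above.
LOCAL_DIR = "/home/cloudera/Desktop/HadoopExam"
HDFS_DIR = "hdfs_commands"


def build_commands(local_dir=LOCAL_DIR, hdfs_dir=HDFS_DIR):
    """Return the hdfs commands from option B as argv lists; nothing is executed."""
    return [
        ["hdfs", "dfs", "-mkdir", hdfs_dir],
        ["hdfs", "dfs", "-touchz", f"{hdfs_dir}/data.txt"],
        # -p preserves file properties such as permissions during the copy.
        ["hdfs", "dfs", "-copyToLocal", "-p", f"{hdfs_dir}/data.txt", local_dir],
        # -moveFromLocal removes the local copy after the upload.
        ["hdfs", "dfs", "-moveFromLocal", f"{local_dir}/data_local.txt", f"{hdfs_dir}/"],
        ["hdfs", "dfs", "-touchz", f"{hdfs_dir}/data_hdfs.txt"],
        ["hdfs", "dfs", "-get", f"{hdfs_dir}/data_hdfs.txt", f"{local_dir}/"],
        ["hdfs", "dfs", "-put", f"{local_dir}/file1.txt", f"{hdfs_dir}/"],
    ]


commands = build_commands()
for argv in commands:
    # shlex.join renders each list as a copy-pasteable shell line.
    print(shlex.join(argv))
```

On a machine where the hdfs client is available, each list could be passed to subprocess.run(argv, check=True) instead of being printed; the local touch commands from steps 4 and 6 would still need to run first.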