ホーム > Cloudera > Cloudera Certified > CCA175

Cloudera CCA175 問題集

試験コード：CCA175

試験名称：CCA Spark and Hadoop Developer Exam

最近更新時間：2024-04-23

問題と解答：全96問

CCA175 無料でデモをダウンロード：

PDF版 Demo ソフト版 Demo オンライン版 Demo

無料問題集CCA175 資格取得

質問 1：
CORRECT TEXT
Problem Scenario 77 : You have been given MySQL DB with following details.
user=retail_dba
password=cloudera
database=retail_db
table=retail_db.orders
table=retail_db.order_items
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Columns of order table : (orderid , order_date , order_customer_id, order_status)
Columns of ordeMtems table : (order_item_id , order_item_order_ld ,
order_item_product_id, order_item_quantity,order_item_subtotal,order_
item_product_price)
Please accomplish following activities.
1. Copy "retail_db.orders" and "retail_db.order_items" table to hdfs in respective directory p92_orders and p92 order items .
2 . Join these data using orderid in Spark and Python
3 . Calculate total revenue perday and per order
4. Calculate total and average revenue for each date. - combineByKey
-aggregateByKey
正解：
See the explanation for Step by Step Solution and configuration.
Explanation:
Solution :
Step 1 : Import Single table .
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db -username=retail_dba - password=cloudera -table=orders --target-dir=p92_orders -m 1 sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username=retail_dba - password=cloudera -table=order_items --target-dir=p92_order_items -m1
Note : Please check you dont have space between before or after '=' sign. Sqoop uses the
MapReduce framework to copy data from RDBMS to hdfs
Step 2 : Read the data from one of the partition, created using above command, hadoop fs
-cat p92_orders/part-m-00000 hadoop fs -cat p92_order_items/part-m-00000
Step 3 : Load these above two directory as RDD using Spark and Python (Open pyspark terminal and do following). orders = sc.textFile("p92_orders") orderltems = sc.textFile("p92_order_items")
Step 4 : Convert RDD into key value as (orderjd as a key and rest of the values as a value)
# First value is orderjd
ordersKeyValue = orders.map(lambda line: (int(line.split(",")[0]), line))
# Second value as an Orderjd
orderltemsKeyValue = orderltems.map(lambda line: (int(line.split(",")[1]), line))
Step 5 : Join both the RDD using orderjd
joinedData = orderltemsKeyValue.join(ordersKeyValue)
#print the joined data
for line in joinedData.collect():
print(line)
Format of joinedData as below.
[Orderld, 'All columns from orderltemsKeyValue', 'All columns from orders Key Value']
Step 6 : Now fetch selected values Orderld, Order date and amount collected on this order.
//Retruned row will contain ((order_date,order_id),amout_collected)
revenuePerDayPerOrder = joinedData.map(lambda row: ((row[1][1].split(M,M)[1],row[0]}, float(row[1][0].split(",")[4])))
#print the result
for line in revenuePerDayPerOrder.collect():
print(line)
Step 7 : Now calculate total revenue perday and per order
A. Using reduceByKey
totalRevenuePerDayPerOrder = revenuePerDayPerOrder.reduceByKey(lambda
runningSum, value: runningSum + value)
for line in totalRevenuePerDayPerOrder.sortByKey().collect(): print(line)
#Generate data as (date, amount_collected) (Ignore ordeMd)
dateAndRevenueTuple = totalRevenuePerDayPerOrder.map(lambda line: (line[0][0], line[1])) for line in dateAndRevenueTuple.sortByKey().collect(): print(line)
Step 8 : Calculate total amount collected for each day. And also calculate number of days.
# Generate output as (Date, Total Revenue for date, total_number_of_dates)
# Line 1 : it will generate tuple (revenue, 1)
# Line 2 : Here, we will do summation for all revenues at the same time another counter to maintain number of records.
#Line 3 : Final function to merge all the combiner
totalRevenueAndTotalCount = dateAndRevenueTuple.combineByKey( \
lambda revenue: (revenue, 1), \
lambda revenueSumTuple, amount: (revenueSumTuple[0] + amount, revenueSumTuple[1]
+ 1), \
lambda tuplel, tuple2: (round(tuple1[0] + tuple2[0], 2}, tuple1[1] + tuple2[1]) \ for line in totalRevenueAndTotalCount.collect(): print(line)
Step 9 : Now calculate average for each date
averageRevenuePerDate = totalRevenueAndTotalCount.map(lambda threeElements:
(threeElements[0], threeElements[1][0]/threeElements[1][1]}}
for line in averageRevenuePerDate.collect(): print(line)
Step 10 : Using aggregateByKey
#line 1 : (Initialize both the value, revenue and count)
#line 2 : runningRevenueSumTuple (Its a tuple for total revenue and total record count for each date)
# line 3 : Summing all partitions revenue and count
totalRevenueAndTotalCount = dateAndRevenueTuple.aggregateByKey( \
(0,0), \
lambda runningRevenueSumTuple, revenue: (runningRevenueSumTuple[0] + revenue, runningRevenueSumTuple[1] + 1), \ lambda tupleOneRevenueAndCount, tupleTwoRevenueAndCount:
(tupleOneRevenueAndCount[0] + tupleTwoRevenueAndCount[0],
tupleOneRevenueAndCount[1] + tupleTwoRevenueAndCount[1]) \
)
for line in totalRevenueAndTotalCount.collect(): print(line)
Step 11 : Calculate the average revenue per date
averageRevenuePerDate = totalRevenueAndTotalCount.map(lambda threeElements:
(threeElements[0], threeElements[1][0]/threeElements[1][1]))
for line in averageRevenuePerDate.collect(): print(line)

質問 2：
CORRECT TEXT
Problem Scenario 6 : You have been given following mysql database details as well as other info.
user=retail_dba
password=cloudera
database=retail_db
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Compression Codec : org.apache.hadoop.io.compress.SnappyCodec
Please accomplish following.
1. Import entire database such that it can be used as a hive tables, it must be created in default schema.
2. Also make sure each tables file is partitioned in 3 files e.g. part-00000, part-00002, part-
00003
3. Store all the Java files in a directory called java_output to evalute the further
正解：
See the explanation for Step by Step Solution and configuration.
Explanation:
Solution :
Step 1 : Drop all the tables, which we have created in previous problems. Before implementing the solution.
Login to hive and execute following command.
show tables;
drop table categories;
drop table customers;
drop table departments;
drop table employee;
drop table ordeMtems;
drop table orders;
drop table products;
show tables;
Check warehouse directory. hdfs dfs -Is /user/hive/warehouse
Step 2 : Now we have cleaned database. Import entire retail db with all the required parameters as problem statement is asking.
sqoop import-all-tables \
-m3\
-connect jdbc:mysql://quickstart:3306/retail_db \
--username=retail_dba \
-password=cloudera \
-hive-import \
--hive-overwrite \
-create-hive-table \
--compress \
--compression-codec org.apache.hadoop.io.compress.SnappyCodec \
--outdir java_output
Step 3 : Verify the work is accomplished or not.
a. Go to hive and check all the tables hive
show tables;
select count(1) from customers;
b. Check the-warehouse directory and number of partitions,
hdfs dfs -Is /user/hive/warehouse
hdfs dfs -Is /user/hive/warehouse/categories
c. Check the output Java directory.
Is -Itr java_output/

質問 3：
CORRECT TEXT
Problem Scenario 27 : You need to implement near real time solutions for collecting information when submitted in file with below information.
Data
echo "IBM,100,20160104" >> /tmp/spooldir/bb/.bb.txt
echo "IBM,103,20160105" >> /tmp/spooldir/bb/.bb.txt
mv /tmp/spooldir/bb/.bb.txt /tmp/spooldir/bb/bb.txt
After few mins
echo "IBM,100.2,20160104" >> /tmp/spooldir/dr/.dr.txt
echo "IBM,103.1,20160105" >> /tmp/spooldir/dr/.dr.txt
mv /tmp/spooldir/dr/.dr.txt /tmp/spooldir/dr/dr.txt
Requirements:
You have been given below directory location (if not available than create it) /tmp/spooldir .
You have a finacial subscription for getting stock prices from BloomBerg as well as
Reuters and using ftp you download every hour new files from their respective ftp site in directories /tmp/spooldir/bb and /tmp/spooldir/dr respectively.
As soon as file committed in this directory that needs to be available in hdfs in
/tmp/flume/finance location in a single directory.
Write a flume configuration file named flume7.conf and use it to load data in hdfs with following additional properties .
1 . Spool /tmp/spooldir/bb and /tmp/spooldir/dr
2 . File prefix in hdfs sholuld be events
3 . File suffix should be .log
4 . If file is not commited and in use than it should have _ as prefix.
5 . Data should be written as text to hdfs
正解：
See the explanation for Step by Step Solution and configuration.
Explanation:
Solution :
Step 1 : Create directory mkdir /tmp/spooldir/bb mkdir /tmp/spooldir/dr
Step 2 : Create flume configuration file, with below configuration for
agent1.sources = source1 source2
agent1 .sinks = sink1
agent1.channels = channel1
agent1 .sources.source1.channels = channel1
agentl .sources.source2.channels = channell agent1 .sinks.sinkl.channel = channell agent1 .sources.source1.type = spooldir agent1 .sources.sourcel.spoolDir = /tmp/spooldir/bb agent1 .sources.source2.type = spooldir
agent1 .sources.source2.spoolDir = /tmp/spooldir/dr
agent1 .sinks.sink1.type = hdfs
agent1 .sinks.sink1.hdfs.path = /tmp/flume/finance
agent1-sinks.sink1.hdfs.filePrefix = events
agent1.sinks.sink1.hdfs.fileSuffix = .log
agent1 .sinks.sink1.hdfs.inUsePrefix = _
agent1 .sinks.sink1.hdfs.fileType = Data Stream
agent1.channels.channel1.type = file
Step 4 : Run below command which will use this configuration file and append data in hdfs.
Start flume service:
flume-ng agent -conf /home/cloudera/flumeconf -conf-file
/home/cloudera/fIumeconf/fIume7.conf --name agent1
Step 5 : Open another terminal and create a file in /tmp/spooldir/
echo "IBM,100,20160104" > /tmp/spooldir/bb/.bb.txt
echo "IBM,103,20160105" > /tmp/spooldir/bb/.bb.txt mv /tmp/spooldir/bb/.bb.txt
/tmp/spooldir/bb/bb.txt
After few mins
echo "IBM,100.2,20160104" > /tmp/spooldir/dr/.dr.txt
echo "IBM,103.1,20160105" >/tmp/spooldir/dr/.dr.txt mv /tmp/spooldir/dr/.dr.txt
/tmp/spooldir/dr/dr.txt

質問 4：
CORRECT TEXT
Problem Scenario 73 : You have been given data in json format as below.
{"first_name":"Ankit", "last_name":"Jain"}
{"first_name":"Amir", "last_name":"Khan"}
{"first_name":"Rajesh", "last_name":"Khanna"}
{"first_name":"Priynka", "last_name":"Chopra"}
{"first_name":"Kareena", "last_name":"Kapoor"}
{"first_name":"Lokesh", "last_name":"Yadav"}
Do the following activity
1 . create employee.json file locally.
2 . Load this file on hdfs
3 . Register this data as a temp table in Spark using Python.
4 . Write select query and print this data.
5 . Now save back this selected data in json format.
正解：
See the explanation for Step by Step Solution and configuration.
Explanation:
Solution :
Step 1 : create employee.json tile locally.
vi employee.json (press insert) past the content.
Step 2 : Upload this tile to hdfs, default location hadoop fs -put employee.json
Step 3 : Write spark script
#lmport SQLContext
from pyspark import SQLContext
# Create instance of SQLContext sqIContext = SQLContext(sc)
# Load json file
employee = sqlContext.jsonFile("employee.json")
# Register RDD as a temp table employee.registerTempTablef'EmployeeTab"}
# Select data from Employee table
employeelnfo = sqlContext.sql("select * from EmployeeTab"}
#lterate data and print
for row in employeelnfo.collect():
print(row)
Step 4 : Write dataas a Text file employeelnfo.toJSON().saveAsTextFile("employeeJson1")
Step 5: Check whether data has been created or not hadoop fs -cat employeeJsonl/part"

質問 5：
CORRECT TEXT
Problem Scenario 18 : You have been given following mysql database details as well as other info.
user=retail_dba
password=cloudera
database=retail_db
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Now accomplish following activities.
1. Create mysql table as below.
mysql --user=retail_dba -password=cloudera
use retail_db
CREATE TABLE IF NOT EXISTS departments_hive02(id int, department_name
varchar(45), avg_salary int);
show tables;
2. Now export data from hive table departments_hive01 in departments_hive02. While exporting, please note following. wherever there is a empty string it should be loaded as a null value in mysql.
wherever there is -999 value for int field, it should be created as null value.
正解：
See the explanation for Step by Step Solution and configuration.
Explanation:
Solution :
Step 1 : Create table in mysql db as well.
mysql ~user=retail_dba -password=cloudera
use retail_db
CREATE TABLE IF NOT EXISTS departments_hive02(id int, department_name
varchar(45), avg_salary int);
show tables;
Step 2 : Now export data from hive table to mysql table as per the requirement.
sqoop export --connect jdbc:mysql://quickstart:3306/retail_db \
-username retaildba \
-password cloudera \
--table departments_hive02 \
-export-dir /user/hive/warehouse/departments_hive01 \
-input-fields-terminated-by '\001' \
--input-Iines-terminated-by '\n' \
--num-mappers 1 \
-batch \
-Input-null-string "" \
-input-null-non-string -999
step 3 : Now validate the data,select * from departments_hive02;

質問 6：
CORRECT TEXT
Problem Scenario 22 : You have been given below comma separated employee information.
name,salary,sex,age
alok,100000,male,29
jatin,105000,male,32
yogesh,134000,male,39
ragini,112000,female,35
jyotsana,129000,female,39
valmiki,123000,male,29
Use the netcat service on port 44444, and nc above data line by line. Please do the following activities.
1. Create a flume conf file using fastest channel, which write data in hive warehouse directory, in a table called flumeemployee (Create hive table as well tor given data).
2. Write a hive query to read average salary of all employees.
正解：
See the explanation for Step by Step Solution and configuration.
Explanation:
Solution :
Step 1 : Create hive table forflumeemployee.'
CREATE TABLE flumeemployee
(
name string, salary int, sex string,
age int
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
Step 2 : Create flume configuration file, with below configuration for source, sink and channel and save it in flume2.conf.
#Define source , sink , channel and agent,
agent1 .sources = source1
agent1 .sinks = sink1
agent1.channels = channel1
# Describe/configure source1
agent1.sources.source1.type = netcat
agent1.sources.source1.bind = 127.0.0.1
agent1.sources.source1.port = 44444
## Describe sink1
agent1 .sinks.sink1.channel = memory-channel
agent1.sinks.sink1.type = hdfs
agent1 .sinks.sink1.hdfs.path = /user/hive/warehouse/flumeemployee
hdfs-agent.sinks.hdfs-write.hdfs.writeFormat=Text
agent1 .sinks.sink1.hdfs.tileType = Data Stream
# Now we need to define channel1 property.
agent1.channels.channel1.type = memory
agent1.channels.channel1.capacity = 1000
agent1.channels.channel1.transactionCapacity = 100
# Bind the source and sink to the channel
Agent1 .sources.sourcel.channels = channell agent1 .sinks.sinkl.channel = channel1
Step 3 : Run below command which will use this configuration file and append data in hdfs.
Start flume service:
flume-ng agent -conf /home/cloudera/flumeconf -conf-file
/home/cloudera/flumeconf/flume2.conf --name agent1
Step 4 : Open another terminal and use the netcat service.
nc localhost 44444
Step 5 : Enter data line by line.
alok,100000.male,29
jatin,105000,male,32
yogesh,134000,male,39
ragini,112000,female,35
jyotsana,129000,female,39
valmiki,123000,male,29
Step 6 : Open hue and check the data is available in hive table or not.
step 7 : Stop flume service by pressing ctrl+c
Step 8 : Calculate average salary on hive table using below query. You can use either hive command line tool or hue. select avg(salary) from flumeemployee;

弊社は失敗したら全額で返金することを承諾します

我々は弊社のCCA175問題集に自信を持っていますから、試験に失敗したら返金する承諾をします。我々のCloudera CCA175を利用して君は試験に合格できると信じています。もし試験に失敗したら、我々は君の支払ったお金を君に全額で返して、君の試験の失敗する経済損失を減少します。

Cloudera CCA175 認定試験の出題範囲：

トピック	出題範囲
トピック 1	Generate reports by using queries against loaded data Produce ranked or sorted data
トピック 2	Perform standard extract, transform, load (ETL) processes on data using the Spark API Join disparate datasets using Spark
トピック 3	Understand the fundamentals of querying datasets in Spark Write the results back into HDFS using Spark
トピック 4	Write queries that calculate aggregate statistics Load data from HDFS for use in Spark applications
トピック 5	Use Spark SQL to interact with the meta store programmatically in your applications Read and write files in a variety of file formats

参照：https://www.cloudera.com/about/training/certification/cdhhdp-certification/cca-spark.html

安全的な支払方式を利用しています

Credit Cardは今まで全世界の一番安全の支払方式です。少数の手続きの費用かかる必要がありますとはいえ、保障があります。お客様の利益を保障するために、弊社のCCA175問題集は全部Credit Cardで支払われることができます。

領収書について：社名入りの領収書が必要な場合、メールで社名に記入していただき送信してください。弊社はPDF版の領収書を提供いたします。

TopExamは君にCCA175の問題集を提供して、あなたの試験への復習にヘルプを提供して、君に難しい専門知識を楽に勉強させます。TopExamは君の試験への合格を期待しています。

弊社のCloudera CCA175を利用すれば試験に合格できます

弊社のCloudera CCA175は専門家たちが長年の経験を通して最新のシラバスに従って研究し出した勉強資料です。弊社はCCA175問題集の質問と答えが間違いないのを保証いたします。

この問題集は過去のデータから分析して作成されて、カバー率が高くて、受験者としてのあなたを助けて時間とお金を節約して試験に合格する通過率を高めます。我々の問題集は的中率が高くて、100％の合格率を保証します。我々の高質量のCloudera CCA175を利用すれば、君は一回で試験に合格できます。

一年間の無料更新サービスを提供します

君が弊社のCloudera CCA175をご購入になってから、我々の承諾する一年間の更新サービスが無料で得られています。弊社の専門家たちは毎日更新状態を検査していますから、この一年間、更新されたら、弊社は更新されたCloudera CCA175をお客様のメールアドレスにお送りいたします。だから、お客様はいつもタイムリーに更新の通知を受けることができます。我々は購入した一年間でお客様がずっと最新版のCloudera CCA175を持っていることを保証します。

弊社は無料Cloudera CCA175サンプルを提供します

お客様は問題集を購入する時、問題集の質量を心配するかもしれませんが、我々はこのことを解決するために、お客様に無料CCA175サンプルを提供いたします。そうすると、お客様は購入する前にサンプルをダウンロードしてやってみることができます。君はこのCCA175問題集は自分に適するかどうか判断して購入を決めることができます。

CCA175試験ツール：あなたの訓練に便利をもたらすために、あなたは自分のペースによって複数のパソコンで設置できます。

連絡方法: [email protected] サポート

人気のベンダー: Apple; Avaya; CIW; FileMaker; Lotus; Lpi; OMG; SNIA; Symantec; XML Master; Zend-Technologies; The Open Group; H3C; 3COM; ACI

TopExam問題集を選ぶ理由は何でしょうか？: 品質保証TopExamは我々の専門家たちの努力によって、過去の試験のデータが分析されて、数年以来の研究を通して開発されて、多年の研究への整理で、的中率が高くて99％の通過率を保証することができます。; 一年間の無料アップデートTopExamは弊社の商品をご購入になったお客様に一年間の無料更新サービスを提供することができ、行き届いたアフターサービスを提供します。弊社は毎日更新の情況を検査していて、もし商品が更新されたら、お客様に最新版をお送りいたします。お客様はその一年でずっと最新版を持っているのを保証します。; 全額返金弊社の商品に自信を持っているから、失敗したら全額で返金することを保証します。弊社の商品でお客様は試験に合格できると信じていますとはいえ、不幸で試験に失敗する場合には、弊社はお客様の支払ったお金を全額で返金するのを承諾します。(全額返金); ご購入の前の試用TopExamは無料なサンプルを提供します。弊社の商品に疑問を持っているなら、無料サンプルを体験することができます。このサンプルの利用を通して、お客様は弊社の商品に自信を持って、安心で試験を準備することができます。