Hands-On Experiment 3-2: Frequent Pattern Mining with Spark - Part II
1.3 Create DataFrames 
You can create your DataFrames using 
Assignment 1 
1. Write Spark code to read the following data. 
(a) Only read the following four tables that will be used for this exercise 
i. orders 
ii. products 
iii. departments 
iv. order_products_train 
(b) Make sure that you read the “headers” as well 
i. Each CSV file of the dataset has a header line. 
ii. You can achieve this behavior by 
Assignment ...
- Exam (elaborations)
- • 4 pages •
Hands-On Experiment 3-1: Frequent Pattern Mining with Spark
2.4 Let’s try to practice answering some exercise questions 
Q1: List 3 most frequent itemsets of size 1. 
Q2: Given support >= 30%, show the candidate itemsets of size 2 and their counts. 
Q3: Colby is purchased most frequently with what other product? 
Q4: What is the confidence for the rule: American → Cheddar 
3 Submission: Find frequent patterns using FPGrowth from a 
real-world grocery store dataset 
Please read the related news article “Kroger Knows Your Shopping Patterns B...
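The support/confidence arithmetic behind Q1–Q4 can be worked by hand on a toy set of baskets. The baskets below are hypothetical (the exercise uses its own cheese dataset); only the item names Colby, American, and Cheddar are taken from the questions.

```python
from collections import Counter

# toy baskets (hypothetical; the exercise uses its own cheese dataset)
baskets = [
    {"Colby", "Cheddar"},
    {"Colby", "Cheddar", "American"},
    {"American", "Cheddar"},
    {"Colby", "American"},
    {"American", "Cheddar"},
]

def support(itemset):
    """Fraction of baskets containing every item in `itemset`."""
    return sum(itemset <= b for b in baskets) / len(baskets)

# Q1-style: counts for itemsets of size 1
size1 = Counter(item for b in baskets for item in b)

# Q4-style: confidence(American -> Cheddar)
#   = support({American, Cheddar}) / support({American})
confidence = support({"American", "Cheddar"}) / support({"American"})
```

On these toy baskets, support({American}) = 4/5 and support({American, Cheddar}) = 3/5, so the confidence is 0.75; the same two supports answer the rule question on the real data.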
- Exam (elaborations)
- • 6 pages •
Hands-On Experiment 2-2: Data Warehousing with Hive
Objectives 
In this hands-on exercise, you will: 
1. Practice PySpark SQL for data analytics. 
2. Use enhanced aggregation to emulate SQL concepts like GROUPING SETS, ROLLUP, and CUBE 
in PySpark. 
3. Analyze the driver risk factor. 
4. Analyze data using data warehousing/OLAP functions in Hive. 
 
Q1. (35pts) Modify/rewrite the grouping-set-query in the example with ROLLUP (Let’s call it 
rollup-query). Run it, check the results, and explain the differences. 
– Replace the GROUPING SETS ...
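The equivalence behind Q1 is that ROLLUP(a, b) is shorthand for GROUPING SETS ((a, b), (a), ()). A minimal pure-Python emulation, using hypothetical driver rows (the exercise uses its own tables), makes the three aggregation levels explicit:

```python
from collections import defaultdict

# hypothetical rows: (city, driverId, hours); stand-in for the exercise tables
rows = [("Peoria", "A1", 10), ("Peoria", "A2", 5), ("Aurora", "B1", 7)]
COLS = {"city": 0, "driverId": 1}

def aggregate(group_cols):
    """SUM(hours) grouped by the given columns; an empty tuple of columns
    yields the grand total, like the empty grouping set ()."""
    out = defaultdict(int)
    for row in rows:
        key = tuple(row[COLS[c]] for c in group_cols)
        out[key] += row[2]
    return dict(out)

# ROLLUP(city, driverId) == GROUPING SETS ((city, driverId), (city), ())
rollup = {lvl: aggregate(lvl) for lvl in [("city", "driverId"), ("city",), ()]}
```

Comparing the grouping-set query with the rollup query then amounts to checking which of these levels each one produces.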
- Exam (elaborations)
- • 78 pages •
Hands-on Exercise Ex5-3: Detecting Fake News with Apache Spark and Spark NLP
Assignment 1 – 4 (10pts each, 40pts in total) 
Do the exercises in Section 1.4 – 1.7 
Assignment 5 (30pts) 
Rewrite the code for detecting fake/real news in the Trump and Biden tweet datasets. Note: do not 
combine those datasets. 
• Read the article [21] 
• (10pts) Write the code for downloading the two files: 
o Use the two links in the article 
o Use the links from the raw data by clicking the raw button on th...
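The download step can be sketched with the standard library. The demo below uses a local `file://` URL so it runs offline; for the assignment you would pass the two raw-data links from the article instead (the helper name `download` is our own).

```python
import pathlib
import tempfile
import urllib.request

def download(url, dest):
    """Save the resource at `url` to the local path `dest`."""
    urllib.request.urlretrieve(url, str(dest))
    return dest

# offline demo with a file:// URL; substitute the article's raw links here
src = pathlib.Path(tempfile.mkdtemp()) / "sample.csv"
src.write_text("id,text,label\n1,example tweet,FAKE\n")
dest = download(src.as_uri(), src.with_name("copy.csv"))
```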
- Exam (elaborations)
- • 13 pages •
Hands-on Exercise Ex5-2: Topic modeling with Apache Spark and Spark NLP
Assignments 1 – 4 (10pts each) 
Do the exercises in Sections 3.6 – 3.9 
Assignment 5 (20pts) 
Try different values of k and maxIter to see which combination best suits your data in Section 
3.8. Show at least five combinations, show their results, and explain why the best one suits the data. 
Assignment 6 (40pts) 
(30pts) Rewrite the code for finding topics in the coronavirus tweets dataset. (10pts) Also, try 
different values of k an...
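The search over k and maxIter follows a simple grid-search pattern. In this sketch, `score` is a hypothetical stand-in for training an LDA model with one (k, maxIter) pair and reading a quality metric such as logPerplexity (lower is better); in the exercise you would call your Spark training code there instead.

```python
from itertools import product

def score(k, max_iter):
    """Hypothetical stand-in for training LDA with (k, maxIter) and
    returning a quality metric where lower is better."""
    return abs(k - 6) + 1.0 / max_iter

# at least five (k, maxIter) combinations, as the assignment asks
grid = list(product([4, 6, 8, 10, 12], [10, 50]))
results = {(k, m): score(k, m) for k, m in grid}
best = min(results, key=results.get)
```

Reporting `results` for every combination, then justifying `best`, covers both parts of the assignment.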
- Exam (elaborations)
- • 16 pages •
Hands-on Exercise Ex5-1: Natural Language Processing (NLP) with Named Entity Recognition (NER)
Assignment 10 (10pts) 
Annotate (NER) a text using a PretrainedPipeline (recognize_entities_dl) in SparkNLP [12][13] 
• Input Text from Wikipedia 
The University of Illinois Springfield (UIS) is a public university in Springfield, Illinois, United 
States. The university was established in 1969 as Sangamon State University by the Illinois 
General Assembly and became a part of the University of Ill...
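A minimal sketch of annotating the text with the pretrained pipeline, assuming the spark-nlp package is installed and a session is started via sparknlp.start() (the first run downloads the recognize_entities_dl model, so this needs network access):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

# start a Spark session configured for Spark NLP
spark = sparknlp.start()

# load the pretrained NER pipeline named in the assignment
pipeline = PretrainedPipeline("recognize_entities_dl", lang="en")

text = ("The University of Illinois Springfield (UIS) is a public university "
        "in Springfield, Illinois, United States.")

# annotate() returns a dict of annotator outputs; the recognized
# named entities (e.g. ORG and LOC mentions) appear under "entities"
result = pipeline.annotate(text)
print(result["entities"])
```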
- Exam (elaborations)
- • 8 pages •
Learn Models using ML Pipeline in Spark.
2.2.1.2 Specify parameters 
The next step is setting up the parameters for the ML algorithm, LogisticRegression. We give 10 for 
maxIter (maximum iterations) and 0.01 for regParam (regularization parameter). 
For details, see reference [7]. 
After running the above code in the Spark shell, you will see the parameters you specified, 
e.g., maxIter and regParam, and you can specify or change others, such as aggregationDepth. 
2.2.1.3 Learn model 
Now it’s time to learn the model wi...
- Exam (elaborations)
- • 3 pages •
Data Analytics using Spark SQL
Assignment 1 (20pts) Related: Section 3 
Write and run a Spark command (not a SQL query) to show the dates when the number of deaths was 
severe (more than 800 deaths), along with the number of confirmed cases, number of deaths, and 
country, using the filter function. The output should look like the one below. 
+--------+-----+------+-----------------------+ 
| dateRep|cases|deaths|countriesAndTerritories| 
+--------+-----+------+-----------------------+ 
Note: Write commands/queries for all ...
- Exam (elaborations)
- • 2 pages •
Data Analytics with DW/OLAP using Hive
Create Hive Tables 
Apache Hive is a data warehouse software project built on top of Apache Hadoop for providing 
data queries and analysis. This exercise will use Hive as a data warehouse/OLAP tool for 
analyzing data. 
2.1.3 Create Hive Tables 
2.1.3.1 Check Schema 
To check the schema of the tables, look at the first 5 rows. To see them, use the ‘head’ Linux 
command. You can see the schema (at least the field/column names) in the first line: driverId, 
name, ssn, location, certified, and wage-plan....
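The `head -n 5` check can be sketched in Python as well. The miniature drivers.csv written here is hypothetical, matching only the column names described above:

```python
import os
import tempfile

# hypothetical miniature drivers.csv with the schema described above
path = os.path.join(tempfile.mkdtemp(), "drivers.csv")
with open(path, "w") as f:
    f.write("driverId,name,ssn,location,certified,wage-plan\n")
    f.write("10,George,621,Peoria,N,miles\n")

# equivalent of `head -n 5 drivers.csv`: print the first five lines;
# the first line is the header carrying the column names
with open(path) as f:
    first5 = [line.rstrip("\n") for line in f][:5]
print("\n".join(first5))
```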
- Exam (elaborations)
- • 6 pages •
NoSQL Database HBase
Assignments 
1. Write and run 11 HBase commands to insert a new row into the table. 
a. Table name: <your-namespace>:truck_event 
b. Rowkey: 20000 
c. Column family name: events 
d. Columns and values: 
i. driverId: <your-login or UIS NetID> 
ii. truckId: 999 
iii. eventTime: 01:01.1 
iv. eventType: <Pick one from Normal, Overspeed, and Lane Departure> 
v. longitude: -94.58 
vi. latitude: 37.03 
vii. eventKey (This is a RowKey) 
viii. CorrelationId: 1000 
ix. ...
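In the HBase shell, each column of the row becomes one `put` against the same table and rowkey, so the insert is a sequence of commands of this shape (a sketch only; the placeholders stay as given in the assignment, the eventType shown is one of the allowed choices, and the remaining columns follow the same pattern):

```
put '<your-namespace>:truck_event', '20000', 'events:driverId', '<your-login or UIS NetID>'
put '<your-namespace>:truck_event', '20000', 'events:truckId', '999'
put '<your-namespace>:truck_event', '20000', 'events:eventTime', '01:01.1'
put '<your-namespace>:truck_event', '20000', 'events:eventType', 'Normal'
put '<your-namespace>:truck_event', '20000', 'events:longitude', '-94.58'
put '<your-namespace>:truck_event', '20000', 'events:latitude', '37.03'
...
```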
- Exam (elaborations)
- • 2 pages •
Santander_Bank_Case_Study_ML_Week6_NEC
Drawing_Maps_VisualAnalytics_Week13_NEC_Solved
MNIST _Fashion_MNIST_image_data_ML_Wk12_NEC_Solved
Fundamentals_of_ensemble_modeling_Week5_NEC