Introduction
- All quiz questions are from the Coursera Data Science Capstone course.
- All .json files are provided by Yelp.
- The data sources are hidden for privacy reasons.
- Feel free to copy.
Question 1
After untarring the dataset, how many files are there (including the documentation PDFs)?
length(dir("../yelp_dataset_challenge_academic_dataset"))
[1] 7
Your Answer Score Explanation
2
3
5
7 Correct 1.00
Total 1.00 / 1.00
Question 2
The data files are in what format?
Your Answer Score Explanation
json Correct 1.00
.RData
csv
.xlsx
Total 1.00 / 1.00
Question 3
How many lines of text are there in the reviews file (in orders of magnitude)?
nrow(review)
[1] 1569264
Your Answer Score Explanation
One million Correct 1.00
Ten thousand
Ten million
One hundred thousand
Total 1.00 / 1.00
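The answer above relies on `nrow(review)`, which matches the line count only because each line of the file holds one JSON record. As a rough sketch, the line count and its order of magnitude can also be obtained without parsing any JSON. The toy file and the names `toy_path`, `n_lines` and `magnitude` are made up for illustration and are not part of the original answer.

```r
# Toy stand-in for the reviews file: one JSON object per line (hypothetical data).
toy_path <- tempfile(fileext = ".json")
writeLines(c(
  '{"stars": 5, "text": "Great"}',
  '{"stars": 3, "text": "Fine"}',
  '{"stars": 1, "text": "Bad"}'
), toy_path)

# Count the lines directly, without parsing the JSON.
n_lines <- length(readLines(toy_path))

# Order of magnitude as a power of ten, rounded down.
magnitude <- 10^floor(log10(n_lines))
```

Applied to the count reported above, `10^floor(log10(1569264))` gives one million.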
Question 4
Consider line 100 of the reviews file. “I’ve been going to the Grab n Eat for almost XXX years”
review[100, ]$text
[1] "I have been coming to Gab n Eat for almost 20 years and They have never let me down. I get a typical breakfast if eggs, ham, toast, and home fries. Delicious as usual. The ambience however is usually lacking. The walls are dark, with writing and signatures of semi famous people all over the place. Pictures of local people hang on the walls(i secretly want mine up there) along with posters galore. While its fun to look at the first 10 times, it gets a little boring after awhile. So today when I arrived I expected the same old experience. Wow was I wrong! As soon as I looked at the door I knew something was different. The place seemed lighter and brighter. To my pleasant surprise, they painted and got new counter tops!! They're not quite done yet but the place has a new Happy vibe to it. The awesome breakfast, the new decor and the 5 guys sitting at the counter making me laugh are why I will be back( maybe for lunch)."
Your Answer Score Explanation
20 Correct 1.00
10
2
5
Total 1.00 / 1.00
Question 5
What percentage of the reviews are five star reviews (rounded to the nearest percentage point)?
nrow(review[review$stars == 5, ])/nrow(review)
[1] 0.3692986
Your Answer Score Explanation
37% Correct 1.00
30%
10%
14%
Total 1.00 / 1.00
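A share like this can also be written as the mean of a logical vector, which avoids building a subset of the data frame first. The sketch below uses a made-up `stars` vector, not the Yelp data, and assumes there are no missing ratings.

```r
# Hypothetical star ratings, standing in for review$stars.
stars <- c(5, 4, 5, 1, 3)

# The mean of a logical vector is the proportion of TRUE values.
share_five <- mean(stars == 5)

# Express as a whole percentage.
pct_five <- round(100 * share_five)
```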
Question 6
How many lines are there in the businesses file?
nrow(business)
[1] 61184
Your Answer Score Explanation
Around 15 million
Around 60 thousand Correct 1.00
Around 1.5 million
Around 55 million
Total 1.00 / 1.00
Question 7
Conditional on having a response for the attribute “Wi-Fi”, what percentage of businesses are reported as having free Wi-Fi (rounded to the nearest percentage point)?
x <- business$attributes$`Wi-Fi`
x <- x[!is.na(x)]
length(x[x == "free"])/length(x)
[1] 0.4091519
Your Answer Score Explanation
2%
57%
1%
40% Correct 1.00
Total 1.00 / 1.00
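The key step here is dropping the missing responses before taking the proportion, so the denominator counts only businesses that answered. A minimal sketch on a made-up `wifi` vector (not the Yelp data) shows the same idea, along with an equivalent one-liner using `na.rm = TRUE`.

```r
# Hypothetical responses, standing in for business$attributes$`Wi-Fi`.
wifi <- c("free", "no", NA, "free", "paid", NA, "no")

# Keep only businesses that gave a response.
answered <- wifi[!is.na(wifi)]

# Share of "free" among those that responded.
share_free <- mean(answered == "free")

# Same result in one step: NA comparisons are dropped by na.rm = TRUE.
share_free_alt <- mean(wifi == "free", na.rm = TRUE)
```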
Question 8
How many lines are in the tip file?
nrow(tip)
[1] 495107
Your Answer Score Explanation
About 55 million
About 60 thousand
About 500 thousand Correct 1.00
About 1.5 million
Total 1.00 / 1.00
Question 9
In the tips file on the 1,000th line, fill in the blank: “Consistently terrible ______”
tip[1000, ]$text
[1] "Consistently terrible service. What's with the attitudes?"
Your Answer Score Explanation
service Correct 1.00
food
desserts
atmosphere
Total 1.00 / 1.00
Question 10
What is the name of the user with over 10,000 compliment votes of type “funny”?
x <- user[user$compliments$funny >= 10000, ]$name
x[!is.na(x)]
[1] "Brian"
Your Answer Score Explanation
Jeff
Roger
Ira
Brian Correct 1.00
Total 1.00 / 1.00
Question 11
Create a 2 by 2 cross tabulation table of whether a user has more than 1 fan against whether the user has more than 1 compliment vote of type “funny”. Treat missing values as 0 (fans or votes of that type). Pass the 2 by 2 table to fisher.test in R. What is the P-value for the test of independence?
# NA >= 1 is NA; combined with !is.na() it becomes FALSE, so missing counts are treated as 0.
# Note: the comparison below is >= 1 ("at least 1"), while the question says "more than 1".
condition1 <- user$fans >= 1 & !is.na(user$fans)
condition2 <- user$compliments$funny >= 1 & !is.na(user$compliments$funny)
ta <- table(condition1, condition2)
fisher.test(ta)
Fisher's Exact Test for Count Data
data: ta
p-value = 0.00146
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
17.03876 17.85392
sample estimates:
odds ratio
17.43834
Your Answer Score Explanation
around 0.05
around 0.01 Correct 1.00
around 0.20
less than .001
Total 1.00 / 1.00
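To see the mechanics on something small, here is a rough sketch of the same steps on made-up vectors: replace missing values with 0, build the two logical conditions, cross-tabulate, and pass the table to `fisher.test`. The data and the names `fans`, `funny`, `tab` and `result` are hypothetical. The `>= 1` threshold mirrors the code used in the answer above, not necessarily the wording of the question.

```r
# Hypothetical per-user counts with missing values.
fans  <- c(0, 3, NA, 1, 0, 12, NA, 0, 2, 5)
funny <- c(NA, 2, 0, 0, NA, 7, 1, 0, 0, 4)

# Treat missing values as 0.
fans[is.na(fans)] <- 0
funny[is.na(funny)] <- 0

# Logical conditions (threshold assumed to be >= 1, as in the answer above).
has_fans  <- fans >= 1
has_funny <- funny >= 1

# 2 by 2 cross tabulation and Fisher's exact test of independence.
tab <- table(has_fans, has_funny)
result <- fisher.test(tab)
result$p.value
```

Replacing the missing values up front gives the same table as adding `& !is.na(...)` to each condition, since a missing count ends up as FALSE either way.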