1 Introduction
1.1 Problem Background
Companies often need to analyze reviews of competing products in order to choose an online sales strategy and to identify potentially important design features that will help them introduce and sell new products in the online marketplace. Sunshine Company is planning to introduce and sell three new products in the online marketplace: a microwave oven, a baby pacifier, and a hair dryer. It has hired our team as consultants. By analyzing the Amazon marketplace data provided by Sunshine's Data Center, we are required to inform the company's online sales strategy and to identify potentially important design relationships between product desirability and reviews. Time-based patterns and interactions in these data are of particular interest to Sunshine Company.
Several prior studies have shed light on the sentiment analysis of review data. Moghaddam et al. [1] found that the matrix factorization model is more effective in the recommendation of online reviews. Lappas et al. [2] developed a greedy algorithm, an integer-regression algorithm and an iterative-random algorithm to sample a characteristic set of reviews. However, these works focus only on screening large volumes of online customer reviews and helpful opinions, so they cannot be applied directly to competitor analysis.
2 General Assumptions
To keep our model reasonable, we make the following assumptions:
• Online reputation is the only goal Sunshine Company pursues. A product with a perfect online reputation is regarded as successful, regardless of its sales volume.
• The data provided by Sunshine Company are complete, or at least randomly distributed, so they represent customers' true evaluations of the products.
• The review body contains all the information in the review headline. Once the review body is taken into account, the review headline need not be considered.
3 Nomenclature
The variables used in our analysis are listed below:
Team # 2021634 Page 2 of 30
Symbol               Definition
Score_rev            Score of customer review
Score*               Comprehensive score based on ratings and reviews
Rate_favor           The praise rate of review
Score_motion         Emotional level of review
eWOM                 Internet word of mouth
BCVS                 The best cross validation score
Vine                 Dummy variable indicating Amazon Vine membership
Verified_Purchase    Dummy variable indicating discounted purchasing
Helpful_votes        Number of helpful votes the review received
Total_vote           Number of total_votes the review received
help_pct             Helpful_votes/total_votes
4 Step 1: Data Preprocessing
Research on human decision-making suggests that customers usually comment on products with a mixture of subjective preference and objective opinion, which biases the feedback and reduces its usefulness. According to whether a measure can be influenced by endogenous and exogenous factors, we divide the 7 data measures into social factor-driven measures and natural factor-driven measures. Social factor-driven measures can be represented as discrete numbers (we treat "Y" as 1 and "N" as 0) and are more easily influenced by one another. For example, a customer tends to give a lower rating after seeing a series of low star ratings, or tends to cast a vote after reading a Vine member's review. The natural factor-driven measure is the text-based review: writing a comment in text form requires more careful thought, which helps reduce external interference.
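The numeric encoding of the social factor-driven measures can be illustrated with a minimal Python sketch. It is not the code used in this paper. The lowercase column names (star_rating, helpful_votes, total_votes, vine, verified_purchase), the helper names encode_flag and preprocess, and the sample rows are assumptions made for illustration. Treating help_pct as missing when a review has no votes is also an assumption, since the ratio Helpful_votes/total_votes is undefined in that case.

```python
def encode_flag(value):
    """Map a "Y"/"N" flag to 1/0; any other value is rejected."""
    flag = value.strip().upper()
    if flag not in ("Y", "N"):
        raise ValueError(f"unexpected flag: {value!r}")
    return 1 if flag == "Y" else 0


def preprocess(row):
    """Convert one review row (a dict of strings) to numeric measures."""
    helpful = int(row["helpful_votes"])
    total = int(row["total_votes"])
    return {
        "star_rating": int(row["star_rating"]),
        "total_votes": total,
        "vine": encode_flag(row["vine"]),
        "verified_purchase": encode_flag(row["verified_purchase"]),
        # Assumption: help_pct is marked missing (None) when nobody voted.
        "help_pct": helpful / total if total > 0 else None,
    }


# Hypothetical sample rows, for illustration only.
sample = [
    {"star_rating": "5", "helpful_votes": "3", "total_votes": "4",
     "vine": "N", "verified_purchase": "Y"},
    {"star_rating": "2", "helpful_votes": "0", "total_votes": "0",
     "vine": "Y", "verified_purchase": "N"},
]
encoded = [preprocess(row) for row in sample]
```

Rows with a missing help_pct would have to be dropped or imputed before any correlation or regression step; which option is appropriate depends on how many reviews received no votes.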
(a) different factors (b) steps
Figure 1: analysis process
In the real world, the values of social factor-driven measures are often distorted by emotions. Moreover, a given rating may be affected by other numerical ratings because of the simplicity of the rating system and the complexity of human behavior, which biases the information we analyze. To address this problem, we need to capture the possible influence relationships among the measures and express them in an intuitive, quantitative form.
[Figure: correlation coefficient heatmaps of star_rating, total_votes, vine, verified_purchase and helpful_pct; three panels]

Panel 1
                   star_rating  total_votes  vine     verified_purchase  helpful_pct
star_rating        1            -0.059       0.031    0.13               -0.15
total_votes        -0.059       1            0.01     -0.096             0.22
vine               0.031        0.01         1        -0.3               0.022
verified_purchase  0.13         -0.096       -0.3     1                  -0.16
helpful_pct        -0.15        0.22         0.022    -0.16              1

Panel 2
                   star_rating  total_votes  vine     verified_purchase  helpful_pct
star_rating        1            -0.00041     0.061    0.51               -0.16
total_votes        -0.00041     1            0.24     -0.11              0.18
vine               0.061        0.24         1        -0.16              0.057
verified_purchase  0.51         -0.11        -0.16    1                  -0.21
helpful_pct        -0.16        0.18         0.057    -0.21              1

Panel 3
                   star_rating  total_votes  vine     verified_purchase  helpful_pct
star_rating        1            -0.11        0.0019   0.074              -0.14
total_votes        -0.11        1            0.016    -0.11              0.27
vine               0.0019       0.016        1        -0.21              0.042
verified_purchase  0.074        -0.11        -0.21    1                  -0.15
helpful_pct        -0.14        0.27         0.042    -0.15              1
We first use a correlation coefficient matrix and grouped descriptive statistical graphs to help Sunshine Company capture the basic characteristics of the data. We then establish a regression model to confirm the implicit relationships.
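As a rough illustration of the first step, the sketch below computes a Pearson correlation coefficient matrix and a grouped mean in plain Python. It assumes the measures have already been encoded as numbers. The function names pearson, correlation_matrix and group_mean and the toy data are hypothetical, and the code is not the implementation behind the reported figures.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined for a constant column
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return a nested dict holding the correlation of every column pair."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}


def group_mean(keys, values):
    """Mean of values within each group defined by keys."""
    sums = {}
    for key, value in zip(keys, values):
        total, count = sums.get(key, (0.0, 0))
        sums[key] = (total + value, count + 1)
    return {key: total / count for key, (total, count) in sums.items()}


# Hypothetical toy data, for illustration only.
columns = {
    "star_rating": [5, 4, 1, 2, 5, 3],
    "total_votes": [2, 0, 9, 7, 1, 3],
    "verified_purchase": [1, 1, 0, 0, 1, 1],
}
corr = correlation_matrix(columns)
mean_rating_by_verified = group_mean(columns["verified_purchase"],
                                     columns["star_rating"])
```

Here corr["star_rating"]["total_votes"] gives one off-diagonal entry of the matrix, and mean_rating_by_verified gives the average star rating for verified and non-verified purchases, the kind of grouped statistic that a descriptive graph would display.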