www.HadoopExam.com

HadoopExam Learning Resources


Basics of Data Science

 

Introduction

Data Science: mining hidden information from data. For example, learning which kinds of books or movies a user likes and recommending books or movies accordingly.

Predictive Analytics: the field of deriving information from current and historical data. Broadly, it can be divided into three categories:

  1. Recommender

  2. Classification

  3. Clustering 

Recommender: A system that recommends information based on your past behavior or interests. For example, Amazon and Flipkart suggest books based on your past purchases, and eBay likewise suggests products based on your past purchases.

Classification: Also known as supervised learning, it assigns a category to previously unseen data based on prior observations about similar data. In everyday life you see it in, for instance, email spam filtering and the detection of fraudulent credit card or stock market transactions.

 

 

Classification, also known as supervised learning, is a fancy term for a system that makes predictions on data based on previously known data. When you see an email titled “PAYMENT RECEIVED IN YOUR BANK ACCOUNT,” do you eagerly open it? The answer is no. Prior experience has taught you that this combination of words, together with the fact that they’re uppercase, means that the email is most likely spam. This is an example of human supervised learning: your current behavior is the result of previous observations you made on similar data. You may never have seen an email subject with exactly the same sequence of words, but you’ve seen enough similar subjects that turned out to be spam to make you immediately suspicious.

Supervised learning works in exactly the same way. In the case of email spam detection, you train a system using data that has already been labeled (or marked) as either spam or ham (legitimate email) to build a model, and then use that model to make predictions about emails the system hasn’t seen before.

 

Naïve Bayes is a supervised learning algorithm that you can use in conjunction with Hadoop to build a scalable spam training and classification system.
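To make the train-then-predict idea concrete, here is a minimal single-machine sketch of a multinomial Naïve Bayes spam filter in plain Python. It does not use Hadoop; distributing the word counting is left out. The class name `NaiveBayesSpamFilter`, the whitespace tokenizer, add-one (Laplace) smoothing, and the tiny training set are all illustrative assumptions, not part of the original tutorial.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


class NaiveBayesSpamFilter:
    """Hypothetical multinomial Naive Bayes classifier with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()            # number of training emails per label
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.vocabulary = set()

    def train(self, labeled_emails):
        """labeled_emails: iterable of (text, label) pairs, label being 'spam' or 'ham'."""
        for text, label in labeled_emails:
            self.class_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)

    def predict(self, text):
        """Return the label with the highest log-posterior for a previously unseen email."""
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, float("-inf")
        for label, doc_count in self.class_counts.items():
            score = math.log(doc_count / total_docs)  # log P(label)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                # log P(word | label); the +1 keeps unseen words from zeroing the product
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up labeled data, for illustration only
training_data = [
    ("PAYMENT RECEIVED IN YOUR BANK ACCOUNT", "spam"),
    ("claim your free prize now", "spam"),
    ("you won a free lottery payment", "spam"),
    ("meeting notes for tomorrow", "ham"),
    ("lunch tomorrow with the team", "ham"),
    ("project status update for the team", "ham"),
]

model = NaiveBayesSpamFilter()
model.train(training_data)
print(model.predict("FREE PAYMENT waiting in your account"))  # spam
print(model.predict("team meeting tomorrow"))                 # ham
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together.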

 

Clustering: Also known as unsupervised learning, it groups data into clusters of similar items. Unlike classification, it does not rely on previously labeled examples of related data. Clustering is useful when you are trying to discover hidden structures in your data, such as user habits.
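The tutorial does not name a specific clustering algorithm, so as an illustration here is a minimal sketch of k-means (Lloyd's algorithm), one common choice. The `kmeans` function, its parameters, and the two-feature "user habit" data are hypothetical and used only to show how points are grouped without any labels.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster points (tuples of numbers) into k groups with Lloyd's k-means.

    Returns (centroids, assignments), where assignments[i] is the cluster of points[i].
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise centroids with k of the input points
    assignments = None
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, assignments


# Hypothetical user-habit data: (hours of video watched per week, books bought per month)
points = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
centroids, assignments = kmeans(points, k=2)
for cluster in range(2):
    print(cluster, [p for p, a in zip(points, assignments) if a == cluster])
```

No point carries a label here; the two groups of users emerge purely from how close their habits are to each other. The number of clusters `k` must be chosen up front, which is a known limitation of k-means.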

Use Recommenders to make product suggestions:

Recommender systems of this kind are known as collaborative filtering (CF) systems. Collaborative filtering is like asking your friends to recommend books, restaurants, movies, and so on: the more people recommend www.HadoopExam.com for Big Data and Hadoop learning, the higher the probability that you will visit this site.

Collaborative recommenders can be further divided into two types:

  1. User-based

  2. Item-based

User-based: User-based collaborative filtering systems suggest interesting items to a user by relying on like-minded people, called neighbors. While standard strategies select neighbors based on user similarity, trust-aware recommendation algorithms rely on other indicators of user trust and reliability.

In other words, there is a target user to whom we need to recommend a product. We look for users who are similar to the target user and use their ratings to choose suitable products to recommend to the target user.
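The user-based steps can be sketched as follows: find the target user's nearest neighbors, then score the items those neighbors rated but the target has not. The tutorial does not specify a similarity measure or scoring rule, so cosine similarity (treating unrated items as 0) and a similarity-weighted average rating are assumptions here, as are all user names, item names, and ratings. Trust-aware neighbor selection is not shown.

```python
import math

# Hypothetical ratings on a 1-5 scale: user -> {item: rating}
ratings = {
    "alice": {"hadoop_guide": 5, "spark_book": 4, "hbase_book": 1},
    "bob": {"hadoop_guide": 5, "spark_book": 5, "ml_book": 4},
    "carol": {"hadoop_guide": 1, "hbase_book": 5, "cooking_book": 5},
    "dave": {"hadoop_guide": 4, "spark_book": 4},  # the target user
}


def cosine_similarity(a, b):
    """Cosine similarity of two rating dicts, treating unrated items as 0."""
    dot = sum(a[i] * b[i] for i in set(a) & set(b))
    norm_a = math.sqrt(sum(r * r for r in a.values()))
    norm_b = math.sqrt(sum(r * r for r in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend_user_based(target, ratings, n_neighbors=2):
    """Predict the target's ratings for unseen items from its most similar users."""
    # 1. Neighbor selection: rank the other users by similarity to the target.
    similarities = {
        user: cosine_similarity(ratings[target], items)
        for user, items in ratings.items()
        if user != target
    }
    neighbors = sorted(similarities, key=similarities.get, reverse=True)[:n_neighbors]

    # 2. Score each item the neighbors rated but the target has not,
    #    using a similarity-weighted average of the neighbors' ratings.
    candidates = {item for n in neighbors for item in ratings[n]} - set(ratings[target])
    scores = {}
    for item in candidates:
        raters = [n for n in neighbors if item in ratings[n]]
        weight = sum(similarities[n] for n in raters)
        if weight > 0:
            scores[item] = sum(similarities[n] * ratings[n][item] for n in raters) / weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


print(recommend_user_based("dave", ratings))  # [('ml_book', 4.0), ('hbase_book', 1.0)]
```

Here alice and bob rate the Hadoop and Spark books much like dave does, so they become his neighbors, while carol's very different taste keeps her out of the neighborhood.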

Item-based: Recommend products that are similar to the ones the target user has previously used.
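For comparison, here is a minimal item-based sketch. Instead of comparing users, it compares items by the ratings they received from all users, then ranks the items the target has not rated by how similar they are to the items the target did rate. Item-item cosine similarity and the similarity-weighted sum used for scoring are assumptions (one simple option among several), and the data are made up.

```python
import math

# Hypothetical ratings (1-5 scale): user -> {item: rating}
ratings = {
    "alice": {"hadoop_guide": 5, "spark_book": 4, "cooking_book": 1},
    "bob": {"hadoop_guide": 4, "spark_book": 5, "hbase_book": 4},
    "carol": {"cooking_book": 5, "baking_book": 4},
    "dave": {"hadoop_guide": 5},  # the target user
}


def item_vectors(ratings):
    """Invert user -> {item: rating} into item -> {user: rating}."""
    vectors = {}
    for user, items in ratings.items():
        for item, rating in items.items():
            vectors.setdefault(item, {})[user] = rating
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors (dicts), treating missing keys as 0."""
    dot = sum(a[k] * b[k] for k in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend_item_based(target, ratings, top_n=3):
    """Rank unrated items by their similarity to the items the target has rated."""
    vectors = item_vectors(ratings)
    rated = ratings[target]
    scores = {}
    for candidate, vector in vectors.items():
        if candidate in rated:
            continue
        # Similarity-weighted sum: items close to highly rated items score higher.
        score = sum(
            cosine_similarity(vector, vectors[item]) * rating for item, rating in rated.items()
        )
        if score > 0:
            scores[candidate] = score
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend_item_based("dave", ratings))  # ['spark_book', 'hbase_book', 'cooking_book']
```

Because item similarities depend only on the rating matrix, they can be precomputed offline and reused for every user, which is one reason item-based filtering is popular for large catalogs.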
