Presentazione tutorial (Tutorial presentation)
Uploaded by dariospin93. Category: Data & Analytics.
AWS Machine Learning
Engineering in Computer Science - Data Mining Class
Who are we?
Lukas Hermann
Milad Kiwan
Dario Molinari
Lorenzo Vitali
Daniele De Cillis
Matteo Pallotta
Where to find the material
Slideshare repository: http://www.slideshare.net/dariospin93/presentazione-tutorial-70026708
Github repository: https://github.com/dariospin93/TutorialDataMining
Here you’ll find the files needed for this tutorial
What is Machine Learning?
“Machine learning is the subfield of computer science that gives computers the ability to learn without being explicitly programmed” (Wikipedia)
“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.” (Tom M. Mitchell, Chair of the Machine Learning Department at Carnegie Mellon University)
What is Machine Learning?
Some ML tasks:
• Classification: inputs are divided into two or more classes. The goal is to produce a model that assigns unseen inputs to one of these classes (e.g. spam filtering: input = emails, output = “spam” or “not spam”).
• Regression: related to classification, but the outputs are continuous rather than discrete (e.g. input = “size of a house”, output = “price”).
What is Machine Learning?
Some ML tasks:
• Clustering: divide inputs into groups. The main difference from classification problems is that the groups are not known beforehand.
• Dimensionality reduction: map inputs into a lower-dimensional space (e.g. input = “set of documents in human language”, output = “which documents cover similar topics”).
• ...
Why Machine Learning?
• Growing flood of data
• Growing availability of computational power
• Progress in algorithms
Amazon Machine Learning
“Service that makes it easy for developers of all skill levels to use machine learning technology. Amazon Machine Learning provides visualization tools and wizards that guide you through the process of creating machine learning (ML) models without having to learn complex ML algorithms and technology.”
What is Amazon ML?
• Robust cloud-based service that makes it easy for developers of all skill levels to use ML technology.
• Create ML models by finding patterns in your existing data.
• Provides visualization tools and wizards that guide you through the process.
When to use Amazon ML?
• No need to learn complex ML algorithms and technology.
• Makes it easy to obtain predictions for your application using simple APIs.
• ML is not a solution for every type of problem: you do not need it if you can determine a target value by using simple rules, computations, or predetermined steps that can be programmed directly.
When to use Amazon ML?
• Many human tasks cannot be adequately solved with a simple rule-based solution, e.g. recognizing whether an email is spam or not spam.
• ML helps when the rules depend on too many factors, and many of these rules overlap or need to be tuned very finely.
When to use Amazon ML?
You can use ML approaches for these specific ML tasks:
• binary classification (predicting one of two possible outcomes).
• multiclass classification (predicting one of more than two outcomes).
• regression (predicting a numeric value).
Formulating The Problem
• The first step in machine learning is to decide what you want to predict, which is known as the label or target answer.
– Predict the number of purchases your customers will make for each product. (regression problem)
– Predict which products will get more than 10 purchases. (binary classification problem)
– Predict which category of products is most interesting to a given customer. (multiclass classification problem)
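The three formulations differ only in how the target is derived from the same raw data. A minimal Python sketch, assuming hypothetical purchase records (all names and numbers below are made up for illustration and are not part of the tutorial files):

```python
# Hypothetical purchase counts per product (illustrative data only).
purchases = {"book": 14, "lamp": 3, "mug": 10, "desk": 27}

# Regression target: the number of purchases itself.
regression_targets = dict(purchases)

# Binary classification target: 1 if the product got more than 10 purchases, else 0.
binary_targets = {name: int(count > 10) for name, count in purchases.items()}

# Hypothetical purchases per customer, grouped by product category.
customer_purchases = {
    "alice": {"books": 5, "games": 2},
    "bob": {"books": 1, "games": 4, "music": 3},
}

# Multiclass target: the category each customer buys most often.
multiclass_targets = {
    customer: max(categories, key=categories.get)
    for customer, categories in customer_purchases.items()
}

print(regression_targets)
print(binary_targets)
print(multiclass_targets)
```

The same records yield a numeric target, a 0/1 target, or a category target, which is what decides the type of ML problem.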
Collecting Labeled Data
• Labeled data: data for which you already know the target answer.
• The target: the answer that you want to predict.
Collecting Labeled Data
• Data is often not readily available in labeled form. Collecting and preparing the variables and the target are often the most important steps in solving an ML problem.
• You provide data that is labeled with the target to the ML algorithm to learn from. Then, you will use the trained ML model to predict this answer on data for which you do not know the target answer.
What is Amazon S3 (Simple Storage Service)?
• Amazon S3 has a simple web services interface that you can use to store and retrieve any amount of data.
• It is designed to make web-scale computing easier for developers.
Training and Evaluation Data
• The fundamental goal of ML is to generalize beyond the data instances used to train models.
• When you train a model through the Amazon ML console, Amazon ML splits the input data: the first 70 percent is used for training and the remaining 30 percent becomes the evaluation datasource.
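The idea of a sequential 70/30 split can be illustrated in a few lines of Python. This is only a sketch of the concept, with a hypothetical helper name; it is not Amazon ML's actual implementation.

```python
def sequential_split(rows, train_percent=70):
    """Split rows in their original order: first part for training, rest for evaluation."""
    # Integer arithmetic avoids floating-point rounding surprises.
    cut = len(rows) * train_percent // 100
    return rows[:cut], rows[cut:]


rows = list(range(10))  # stand-in for 10 observations
train, evaluation = sequential_split(rows)
print(len(train), len(evaluation))
```

Because the split follows the order of the file, a dataset sorted by the target (or by time) may give unrepresentative training and evaluation sets; shuffling the rows beforehand is one way to avoid that.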
Training and Evaluation Data
• The ML system uses the training data to train models to see patterns, and uses the evaluation data to evaluate the predictive quality of the trained model.
• The ML system evaluates predictive performance by comparing predictions on the evaluation data set with true values.
Evaluation
• The score threshold used for prediction can be adjusted
• Adjusting it lets you control the trade-off between precision and recall
Precision and Recall
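The figure from this slide is not reproduced in the transcript. As a reminder, precision is the fraction of predicted positives that are truly positive, TP / (TP + FP), and recall is the fraction of actual positives that the model finds, TP / (TP + FN). A minimal sketch with made-up labels and scores (the helper name is hypothetical):

```python
def precision_recall(y_true, scores, threshold):
    """Compute precision and recall when scores above the threshold count as positive."""
    predicted = [1 if score > threshold else 0 for score in scores]
    tp = sum(1 for t, p in zip(y_true, predicted) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, predicted) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, predicted) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up true labels and model scores, for illustration only.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.6, 0.55, 0.3, 0.2, 0.1]

for threshold in (0.25, 0.5, 0.8):
    print(threshold, precision_recall(y_true, scores, threshold))
```

On this toy data, lowering the threshold raises recall and raising it raises precision, which is the trade-off the threshold controls.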
AWS ML Techniques
• For regression, AWS uses linear regression
• For classification, AWS uses logistic regression
– Despite its name, it is a classification method
– It uses an ML model similar to regression, with a logistic (sigmoid) function
– It can be binomial or multinomial
Logistic Regression: Example
• Labeled data with labels y ∈ {0,1}
• e.g.:
– x: hours of study
– y: pass (1) or fail (0)
• What’s the probability of success given a certain time spent studying?
Logistic Regression: Example
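The figure from this slide is not reproduced in the transcript. A logistic regression model passes a linear function of x through the logistic (sigmoid) function, so its output can be read as a probability between 0 and 1. A minimal sketch for the hours-of-study example, where the intercept and slope are hypothetical values chosen only for illustration (a real model would learn them from labeled data):

```python
import math


def logistic(z):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, NOT fitted to any real data.
INTERCEPT = -4.0
SLOPE = 1.5


def pass_probability(hours):
    """Estimated probability of passing after the given hours of study."""
    return logistic(INTERCEPT + SLOPE * hours)


for hours in (1, 2, 3, 4, 5):
    print(hours, round(pass_probability(hours), 3))
```

Applying a threshold to this probability turns it into a pass/fail prediction.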
Limitations of Amazon ML
• Only supervised learning (no clustering etc.)
• No selection of the ML method possible
• Preprocessing of the data is a black box
AWS Machine Learning – hands-on
A cloud application
With Amazon ML, we can build and train a predictive model in a scalable cloud solution. There is no need to install any application to run this tool, because it runs in the cloud (we only need a web browser to access it). In this tutorial we’ll show you the basic functionality of Amazon ML: creating a datasource, building a model and using this model to generate predictions.
To do this, we need a dataset, as large as possible. Our dataset is taken from the University of California at Irvine (UCI) Machine Learning Repository, which hosts many such datasets.
What we will see in this tutorial
In this tutorial we’ll see how machine learning can be used for marketing purposes. To do this, we’ll show you how to build and train a model that helps you make decisions based on the data you have.
We’ll focus on selecting people based on their earnings, which may be useful for finding who is more suitable for certain marketing offers.
Tutorial Plan
1. Preparing the data
2. Creating a training datasource
3. Creating a model
4. Reviewing the model’s predictive performance and setting a score threshold
5. Using the model to generate predictions
6. Cleaning up (to avoid incurring unwanted charges)
Step 1: Preparing the data
First, we must be sure that the tool understands the data we pass to it. To do this, we should ensure that our dataset follows Amazon’s guidelines:
• Data must be saved in .csv format
• Each row must be a single observation
• Each column must contain a single attribute of the observation
• The first row should contain the attribute names (you can also provide them in a separate file, but this is not recommended)
• Attributes must be separated by commas
• If you use Excel on macOS, do not save in the “comma separated value (.csv)” format; use “windows comma separated (.csv)” instead.
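Some of these guidelines can be checked automatically before uploading. A minimal sketch using Python's standard csv module; the function name and the sample data are hypothetical, and the checks cover only the header row and the number of fields per row:

```python
import csv
import io


def check_csv(text):
    """Return a list of problems found in CSV text (an empty list means none found)."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    problems = []
    header = rows[0]
    if len(set(header)) != len(header):
        problems.append("duplicate column names in the header row")
    # Every observation must have exactly one value per attribute.
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"row {number} has {len(row)} fields, expected {len(header)}"
            )
    return problems


# Made-up sample: the last row is missing one field.
sample = "age,education,class\n39,Bachelors,0\n50,Masters\n"
print(check_csv(sample))
```

To check a real file, read its contents first, for example with `open("census.csv", newline="").read()`.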
Step 1: Preparing the data
Consider our dataset: open the “census.csv” file.
Our target is the attribute “class”: how much a person earns per year (binary: 1 if > 50,000, 0 if ≤ 50,000).
Step 1: Preparing the data
In practice, the machine will learn the characteristics of the people who earn more than the threshold and of those who earn less. With this knowledge, we will ask it to predict which class other people belong to.
Step 1: Preparing the data
Open the census-batch.csv file: there is no “class” attribute there. The tool’s job now is to show us what it has learnt by working on this dataset, for which we know the right “class” values even though they are not included in the file.
Step 2: Creating the training datasource
In order to use all our files, we have to upload them to Amazon S3
• Open https://console.aws.amazon.com/s3/
• Create a new bucket
• Choose upload in the navigation bar
• Add the files mentioned before
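The upload can also be scripted instead of done through the console. A minimal sketch using the boto3 S3 client, assuming the bucket already exists, AWS credentials are configured on your machine, and the two CSV files are in the current directory; both function names are hypothetical.

```python
def s3_location(bucket, key):
    """Build the 'bucket/key' string used to point Amazon ML at a file."""
    return f"{bucket}/{key}"


def upload_tutorial_files(bucket, filenames=("census.csv", "census-batch.csv")):
    """Upload the tutorial files to an existing bucket and return their locations."""
    # Imported here so that s3_location works even without boto3 installed.
    import boto3

    s3 = boto3.client("s3")
    for filename in filenames:
        # upload_file(Filename, Bucket, Key): copy a local file into the bucket.
        s3.upload_file(filename, bucket, filename)
    return [s3_location(bucket, name) for name in filenames]


# Example (needs real credentials and an existing bucket, so it is left commented out):
# print(upload_tutorial_files("your-bucket"))
print(s3_location("your-bucket", "census.csv"))
```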
Step 2: Creating the training datasource
Now to create the datasource (it will contain only the location of the data):
• Open https://console.aws.amazon.com/machinelearning/
• Choose Get Started (or Create New) and launch
• Select S3 from “Where your data is located?”
• Type <name of your bucket>/census.csv
• Enter the name “Census data”
• Choose verify and grant permission
• Review and choose continue
Step 2: Creating the training datasource
A schema contains the information needed to interpret the input data for the model. The simplest and fastest option is to let Amazon infer it, but we have to check that it is correct. Review the schema and make sure that:
• Attributes with only 2 possible states are marked as binary
• Attributes that are numbers or strings used to denote a category are marked as categorical
• Attributes that are numbers where order matters are marked as numeric
• Attributes that are plain strings are marked as text
Then choose Continue.
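To see why the inferred schema needs a manual review, consider a naive heuristic that follows the rules listed above. This is a hypothetical sketch, not Amazon's actual inference logic; the function name and the `max_categories` cut-off are assumptions made for illustration.

```python
def guess_attribute_type(values, max_categories=20):
    """Guess a schema type for one column from its string values."""
    distinct = set(values)
    if len(distinct) == 2:
        return "BINARY"
    try:
        for value in distinct:
            float(value)  # raises ValueError for non-numeric strings
        return "NUMERIC"
    except ValueError:
        pass
    # Few distinct strings look like category labels, many look like free text.
    if len(distinct) <= max_categories:
        return "CATEGORICAL"
    return "TEXT"


print(guess_attribute_type(["0", "1", "1", "0"]))
print(guess_attribute_type(["39", "50", "38"]))
print(guess_attribute_type(["Bachelors", "Masters", "HS-grad"]))
```

A heuristic like this cannot tell that a numeric code (a postal code, for instance) denotes a category rather than a quantity, which is exactly the kind of mistake the review step is meant to catch.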
Step 2: Creating the training datasource
Finally we can choose the target attribute to predict; in this case it is “class”. We don’t have a row identifier, so we skip that option and choose Continue, and the datasource will be created.
Step 3: Creating an ML model
Amazon should redirect us to the page of model creation. If not:
• From the console, click on “create a new model”
• Choose “I already created a datasource pointing to my S3 data”
• Pick the datasource we created previously and click Continue
• Be sure the model name is “ML model: Census data” and select Default
• The evaluation name must be “Evaluation ML model: Census data”, review and
finish
Step 3: Creating an ML model
Amazon is now processing our data; this may take a few minutes.
Step 3: Creating an ML model
Amazon performs the following operations:
• Splitting the training datasource into 2 parts: one containing 70% of the data and one containing the remaining 30%
• Training the model with the 70% part
• Testing the resulting model with the 30% part
The status is initially Pending; it will change to In Progress and then to Completed.
Step 4: Reviewing the model’s predictive performance and setting a score threshold
It’s important to check whether the model is good enough for future predictions. This can be done by looking at the model evaluation.
Take a look at the AUC (Area Under the Curve) metric: it is an industry-standard metric that expresses the quality of the model’s predictions.
• Choose evaluation in the model summary
• Click on our model
• Click on summary
Step 4: Reviewing the model’s predictive performance and setting a score threshold
In short, the ML model generates a numeric prediction score for each record and then, based on a threshold, converts these scores into binary labels.
Step 4: Reviewing the model’s predictive performance and setting a score threshold
We can interact with this evaluation: by changing the threshold, we change how the model assigns the labels.
• On the evaluation summary page, choose “Adjust score threshold”
• Move the vertical line on the graph and watch the number of correct predictions and errors change:
– Moving it to the right reduces the number of false positives
– Moving it to the left reduces the number of false negatives
• Move it until the score threshold becomes 0.37 (this decreases the false negatives)
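The effect of moving the threshold can be reproduced on a toy example. A minimal sketch, assuming made-up labels and scores (not the census data) and a hypothetical helper that counts both kinds of error:

```python
def count_errors(y_true, scores, threshold):
    """Return (false positives, false negatives) for a given score threshold."""
    false_positives = sum(
        1 for t, s in zip(y_true, scores) if t == 0 and s > threshold
    )
    false_negatives = sum(
        1 for t, s in zip(y_true, scores) if t == 1 and s <= threshold
    )
    return false_positives, false_negatives


# Made-up true labels and prediction scores, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.92, 0.61, 0.55, 0.41, 0.39, 0.30, 0.20, 0.05]

for threshold in (0.37, 0.50, 0.70):
    print(threshold, count_errors(y_true, scores, threshold))
```

On this data, raising the threshold lowers the false positives and raises the false negatives, and lowering it does the opposite.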
Step 4: Reviewing the model’s predictive performance and setting a score threshold
From now on, every time the model predicts a label, it will use this new threshold.
Step 5: Using the ML model to generate predictions
Two types of prediction are available:
• Real-time predictions: a prediction for a single observation that Amazon generates on demand
• Batch predictions: a set of predictions for a group of observations (N.B.: Amazon will charge you 0.10€ per 1000 predictions, rounding up to the next thousand)
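The rounding rule is easy to get wrong, so here is the arithmetic as a small sketch. The price is the one quoted above and may differ from current pricing; the function name is hypothetical.

```python
def batch_prediction_cost(num_predictions, price_per_thousand=0.10):
    """Cost of a batch, billing every started block of 1000 predictions."""
    thousands = -(-num_predictions // 1000)  # ceiling division with integers
    return round(thousands * price_per_thousand, 2)


for n in (1, 1000, 1500, 12001):
    print(n, batch_prediction_cost(n))
```

For example, 1500 predictions are billed as 2 blocks of 1000.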
Step 5: Using the ML model to generate predictions
We’ll now try batch predictions; for this we need the census-batch.csv file that we uploaded at the beginning.
• Click on Amazon Machine Learning
• Click on Batch prediction
• Choose the model we created and click Continue
• In “Locate the input data”, choose “My data is in s3, and I need to create a
datasource”
• For the name of the datasource, type “Census data 2” and for the location of the
file type “your-bucket/census-batch.csv”
• For “Does the first line in your CSV contain the column names?”, choose Yes, then Verify and Continue
Step 5: Using the ML model to generate predictions
• For the destination, type the location where you uploaded the file at the
beginning
• Accept the default name
• Choose Review
• Grant permission to Amazon S3
• On the review page choose Finish
As we saw with the training, now Amazon will process our file and give us the
results.
Step 5: Using the ML model to generate predictions
To view the results:
• Go to https://console.aws.amazon.com/s3/
• Navigate to the output location given before
• You will find a compressed file containing the result: download it and open it
Step 5: Using the ML model to generate predictions
This file has 2 columns, best answer and score, with one row for each row of the datasource.
If the score is greater than the threshold, the best answer will be “> 50,000”.
If the score is smaller than the threshold, the best answer will be “≤ 50,000”.
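Once downloaded, the compressed file can be read directly from Python without unpacking it first. A minimal sketch, assuming the file is a gzip-compressed CSV with a header row and that the columns are named `bestAnswer` and `score` (check the header of your own file); the function names and the demo data are hypothetical.

```python
import csv
import gzip
import os
import tempfile


def read_batch_results(path):
    """Read a gzip-compressed CSV of predictions into a list of dicts."""
    with gzip.open(path, mode="rt", newline="") as handle:
        return list(csv.DictReader(handle))


def count_labels(rows, label_column="bestAnswer"):
    """Count how many rows were assigned to each label."""
    counts = {}
    for row in rows:
        label = row[label_column]
        counts[label] = counts.get(label, 0) + 1
    return counts


# Demo on a small hand-made file, since the real output lives in your S3 bucket.
demo_path = os.path.join(tempfile.mkdtemp(), "predictions.csv.gz")
with gzip.open(demo_path, mode="wt", newline="") as handle:
    handle.write("bestAnswer,score\n1,0.81\n0,0.12\n1,0.44\n")

rows = read_batch_results(demo_path)
print(count_labels(rows))
```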
Step 6: Cleaning up
It’s safe to delete all the models and predictions we created so far, in order to avoid additional charges and to keep our console clean.
AWS Machine Learning - Homework Assignment
Homework Assignment
The tutorial introduced the Amazon ML service through its graphical interface; in practice, however, it can be useful to integrate the service into your own application.
Amazon ML addresses this need by offering a large, complete and easy-to-use set of APIs.
http://docs.aws.amazon.com/machine-learning/latest/APIReference
Homework Assignment
Assignment:
You are asked to repeat the steps presented in the tutorial, with the exception of the 5th step (Using the model to generate predictions). Instead, you are asked to complete that step by writing a Python script that makes use of the APIs.
Write the code needed to:
1) Generate real-time predictions
2) Generate batch predictions
Homework Assignment – Before starting
DASHBOARD LINK: https://eu-west-1.console.aws.amazon.com/machinelearning
DATASOURCE_ID: once in the dashboard, click on the datasource (created at step 2), then copy the ID
MODEL_ID: once in the dashboard, click on the model (created at step 3), then copy the ID
ID and KEY: once in the dashboard, click on your username at the top right of the screen → "My Security Credentials" → expand the item "Access Keys" → "Create new access key" → copy the ID and KEY generated
Homework Assignment – Before starting
GIVE PERMISSIONS TO FILES IN S3: it is mandatory to grant usage permissions on the files uploaded to S3. To do so: right-click on the files → Properties → Permissions → Add more permissions → select 'Any authenticated AWS user' → tick all the permissions
ENABLE MODEL FOR REAL TIME PREDICTIONS: click on the model → create endpoint
Homework Assignment – Exercise 1
Generate real-time predictions: in a new file, store 10 records of the “census-batch.csv” file. Generate one real-time prediction per record and print the results.
You can make use of the following function:

    from boto3.session import Session  # install the boto3 library first

    MODEL_ID = 'the id of the model you have created'
    ID = 'your id'
    KEY = 'your key'

    session = Session(aws_access_key_id=ID, aws_secret_access_key=KEY)
    client = session.client('machinelearning', region_name='eu-west-1')
    prediction_endpoint = "https://realtime.machinelearning.eu-west-1.amazonaws.com"
    fields = ["age", "work class", "fnlwgt", "education", "education-num",
              "marital-status", "occupation", "relationship", "race", "sex",
              "capital-gain", "capital-loss", "hours-per-week", "native-country"]

    def real_time_prediction(line):
        # line = one line of the csv file
        record = dict()
        # strip() drops the trailing newline so it does not end up in the last value
        for index, val in enumerate(line.strip().split(',')):
            record[fields[index]] = val
        response = client.predict(MLModelId=MODEL_ID, Record=record,
                                  PredictEndpoint=prediction_endpoint)
        return response.get('Prediction')
Homework Assignment – Exercise 2
Generate batch predictions: use the “census-batch.csv” file that you’ve uploaded and then check the results on S3.
You can make use of the following function:

    import time  # needed for time.sleep below
    from boto3.session import Session  # install the boto3 library first

    ID = 'your id'
    KEY = 'your key'
    MODEL_ID = 'the id of the model you have created'
    DATASOURCE_ID = 'the id of the data source you have created (the one related to census-batch.csv)'
    PREDICTION_ID = "batch_prediction_0001"  # must be unique
    PREDICTION_NAME = "bp_0001"
    OUTPUT_URI = "s3://your_bucket/dir_batch_0001"

    session = Session(aws_access_key_id=ID, aws_secret_access_key=KEY)
    client = session.client('machinelearning', region_name='eu-west-1')
    client.create_batch_prediction(BatchPredictionId=PREDICTION_ID,
                                   BatchPredictionName=PREDICTION_NAME,
                                   MLModelId=MODEL_ID,
                                   BatchPredictionDataSourceId=DATASOURCE_ID,
                                   OutputUri=OUTPUT_URI)

    # Poll every 3 seconds until the batch prediction completes or fails
    status = "PENDING"
    while status != "COMPLETED" and status != "FAILED":
        print(status)
        response = client.get_batch_prediction(BatchPredictionId=PREDICTION_ID)
        status = response['Status']
        time.sleep(3)
    print(status)
    print(response)
    print("Your results are in s3!")
Homework Assignment
For this homework, you have one week to deliver the results.
The deadline is 20/12/2016, 23:59.
Please send the code, together with the instructions to run it, in a .zip file to one of our email addresses.
Homework Assignment
For any kind of problem or information, please contact us!
Contacts:
• Dario Molinari: [email protected]
• Daniele De Cillis: [email protected]
• Lorenzo Vitali: [email protected]
• Lukas Hermann: [email protected]
• Milad Kiwan: [email protected]
• Matteo Pallotta: [email protected]