import matplotlib.pyplot as plt
import numpy as np
ML Notes
Machine Learning (supervised learning) notes
Machine learning is the use of algorithms and statistical models to perform tasks without explicit instructions, relying instead on patterns and inference.
Examples:
- House price predictor
- Netflix recommendations
- Marketing
Matplotlib:
- xlabel
- ylabel
- title
- plot
- show
Practical Example - Plotting
# Effect of time spent walking (hours) on the distance travelled (miles)
time_spent_walking = [1, 2, 3, 4, 5]  # (independent variable)
distance = [2, 4, 6, 8, 10]  # (dependent variable)
plt.plot(time_spent_walking, distance)
plt.xlabel("Time Spent Walking (hours)")
plt.ylabel("Distance (miles)")
plt.title("Effect of time spent walking on distance travelled")
plt.show()
# Effect of car age (years) on price ($)
car_age = [1, 2, 5, 10, 30]
price = [30000, 25000, 18000, 10000, 4000]
plt.plot(car_age, price)
plt.xlabel("Car Age (years)")
plt.ylabel("Price ($)")
plt.title("Effect of car age on price")
plt.show()
# Effect of amount of time spent studying (hours) on test score results (%)
study_time = [1, 5, 10, 20, 50, 100]
test_score_results = [40, 60, 70, 75, 88, 93]
plt.plot(study_time, test_score_results)
plt.xlabel("Study Time (hours)")
plt.ylabel("Test Score (%)")
plt.title("Effect of study time on test scores")
plt.show()
Linear Regression: y = mx + b
- x: value on the x axis
- y: value on the y axis
- m: gradient of the line
- b: value of y when x = 0
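As a quick illustration of these four terms, the sketch below evaluates y = mx + b and recovers m and b from two points on a straight line. The helper names `predict` and `line_through` are hypothetical and only used here for illustration.

```python
def predict(x, m, b):
    """Return y for a given x on the line y = mx + b."""
    return m * x + b


def line_through(p1, p2):
    """Return (m, b) for the straight line through two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    m = (y2 - y1) / (x2 - x1)  # gradient: change in y over change in x
    b = y1 - m * x1            # value of y when x = 0
    return m, b


# Two points taken from the line y = 3x + 6
m, b = line_through((0, 6), (1, 9))
print(m, b)
print(predict(4, m, b))
```

This only works when the data sits exactly on a straight line; noisy data needs a best fit line instead.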
Example 1
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
plt.plot(x, y)
plt.show()
# y = 2x + 0
for i in x:
    y = 2 * i
    print(y)
2
4
6
8
10
Example 2
x = [1, 2, 3, 4, 5]
y = [6, 9, 12, 15, 18]
plt.plot(x, y)
plt.show()
# y = mx + b
for i in x:
    y = (3 * i) + 3
    print(y)
6
9
12
15
18
Example 3
x = [0, 1, 2, 3, 4]
y = [6, 9, 12, 15, 18]
plt.plot(x, y)
plt.show()
# y = mx + b
# y = 3x + 6
for i in x:
    y = (3 * i) + 6
    print(i, y)
0 6
1 9
2 12
3 15
4 18
Practical Examples - Linear Regression (y = mx + b)
Question 1
# Effect of amount of water provided (L) per day on size of trees (m)
water = [0, 1, 2, 3, 4, 5, 6, 7, 8]
tree_size = [4, 5, 6, 7, 8, 9, 10, 11, 12]
plt.plot(water, tree_size)
plt.show()
Solution
# y = mx + b
for x in water:
    y = (1 * x) + 4
    print(x, y)
# y = x + 4
0 4
1 5
2 6
3 7
4 8
5 9
6 10
7 11
8 12
Question 2
# Effect of wingspan (cm) on flying speed (km/h)
wingspan = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
flying_speed = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
plt.plot(wingspan, flying_speed)
plt.show()
Solution
# y = mx + b
for x in wingspan:
    y = 0.5 * x
    print(x, y)
# y = 0.5x + 0
0 0.0
10 5.0
20 10.0
30 15.0
40 20.0
50 25.0
60 30.0
70 35.0
80 40.0
90 45.0
100 50.0
Question 3
# Effect of number of gifts given to employees each year on staff satisfaction levels (%)
num_of_gifts = [0, 1, 2, 3, 4, 5]
satisfaction = [50, 55, 60, 65, 70, 75]
plt.plot(num_of_gifts, satisfaction)
plt.show()
Solution
# y = mx + b
for x in num_of_gifts:
    y = (5 * x) + 50
    print(x, y)
# y = 5x + 50
0 50
1 55
2 60
3 65
4 70
5 75
Line of Best Fit
Cost: distance of each point from best fit line
Loss: Sum of all distances between best fit line and data points (sum of costs)
Mean Squared Error (MSE)
Mean Squared Error = sum((y[hat] - y)^2) / n, where y[hat] is the prediction from the best fit line, y is the value at the data point, and n is the number of data points
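The formula can be sketched as a small function. The name `mean_squared_error` and the sample values are assumptions made for illustration; they are not taken from a library or from the data in these notes.

```python
def mean_squared_error(predictions, actuals):
    """Average of the squared differences between predictions and data."""
    if len(predictions) != len(actuals):
        raise ValueError("predictions and actuals must be the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predictions, actuals)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical values for illustration only
predictions = [3, 5, 7]
actuals = [2, 5, 9]
print(mean_squared_error(predictions, actuals))  # (1 + 0 + 4) / 3
```

Squaring each difference stops positive and negative errors from cancelling out and penalises large misses more than small ones.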
Example
# Data
x_data = [1, 5, 8, 10]
y_data = [4, 8, 9, 7]
# Data plot
plt.plot(x_data, y_data)
plt.show()
Best fit line (y = mx + b)
# Calculate m, b
m, b = np.polyfit(x_data, y_data, 1)
# Best fit line equation:
for x in x_data:
    y = (m * x) + b
    print(x, y)
1 5.04347826086957
5 6.608695652173917
8 7.7826086956521765
10 8.56521739130435
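With a degree of 1, np.polyfit returns the slope and intercept of the least-squares line. As a rough sketch of what that fit works out, the same two numbers can be obtained from the standard closed-form formulas. The function name `fit_line` is hypothetical, and this is not a description of how NumPy computes the result internally.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b for y = mx + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how x varies on its own
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x  # the line passes through (mean_x, mean_y)
    return m, b


x_data = [1, 5, 8, 10]
y_data = [4, 8, 9, 7]
m, b = fit_line(x_data, y_data)
print(round(m, 4), round(b, 4))  # 0.3913 4.6522
```

Plugging x = 1 into these values gives about 5.043, matching the first line of output above.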
Cost & Loss - Mean Squared Error
variable | point 1 | point 2 | point 3 | point 4 |
---|---|---|---|---|
x | 1 | 5 | 8 | 10 |
y | 4 | 8 | 9 | 7 |
y[hat] | 5 | 6 | 7 | 8 |
cost | 1 | 4 | 4 | 1 |
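# y[hat] is the best fit prediction truncated to a whole number
# cost = (y[hat] - y) ^ 2 for each data point
# mse = sum of costs / number of data points = (1 + 4 + 4 + 1) / 4
mse = 10 / 4
mse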
2.5
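The table uses predictions truncated to whole numbers, which changes the result. As a hedged check, the sketch below repeats the calculation with the unrounded predictions printed by the best fit line above; the variable names `y_hat_truncated`, `mse_truncated` and `mse_unrounded` are introduced here for illustration.

```python
y_data = [4, 8, 9, 7]

# Predictions copied from the best fit line output
y_hat = [5.04347826086957, 6.608695652173917, 7.7826086956521765, 8.56521739130435]
# The same predictions truncated to whole numbers, as in the table
y_hat_truncated = [int(p) for p in y_hat]


def mse(predictions, actuals):
    """Mean of the squared differences between predictions and data."""
    return sum((p - a) ** 2 for p, a in zip(predictions, actuals)) / len(actuals)


mse_truncated = mse(y_hat_truncated, y_data)
mse_unrounded = mse(y_hat, y_data)
print(mse_truncated)            # 2.5
print(round(mse_unrounded, 3))  # 1.739
```

The unrounded value is lower because the least-squares line is the straight line with the smallest possible squared error on this data.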
Logistic Regression
Overfitting
The best fit describes the training data too accurately and minimizes the loss, but that's only the training data: the model performs worse on new data, which keeps it humble.
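A minimal sketch of the idea, using made-up data where the underlying trend is y = 2x and the training points carry some noise. The `memorizer` model is a hypothetical stand-in for an overfit model: it simply returns the y value of the closest training point.

```python
def mse(predictions, actuals):
    return sum((p - a) ** 2 for p, a in zip(predictions, actuals)) / len(actuals)


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = mx + b."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    m = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    return m, mean_y - m * mean_x


# Made-up data: noisy training points, noise-free test points on y = 2x
train_x, train_y = [1, 2, 3, 4, 5], [2.5, 3.5, 6.5, 7.5, 10.5]
test_x, test_y = [1.5, 2.5, 3.5, 4.5], [3, 5, 7, 9]


def memorizer(x):
    """Overfit model: return the y of the closest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


m, b = fit_line(train_x, train_y)


def line(x):
    """Simple model: the best fit line through the training data."""
    return m * x + b


for name, model in [("memorizer", memorizer), ("line", line)]:
    train_error = mse([model(x) for x in train_x], train_y)
    test_error = mse([model(x) for x in test_x], test_y)
    print(name, "train:", round(train_error, 3), "test:", round(test_error, 3))
```

The memorizer has zero error on the training data but a larger error than the line on the unseen test points, which is the pattern overfitting produces.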
Random Forests
A random forest takes the average (for regression) or the majority decision (for classification) from numerous decision trees and uses it as its output.
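The combining step can be sketched on its own, without building any trees. The tree outputs below are hypothetical values, and the function names `majority_vote` and `average` are chosen for illustration rather than taken from a library.

```python
from collections import Counter
from statistics import mean


def majority_vote(tree_outputs):
    """Classification: return the label predicted by the most trees."""
    return Counter(tree_outputs).most_common(1)[0][0]


def average(tree_outputs):
    """Regression: return the mean of the trees' numeric predictions."""
    return mean(tree_outputs)


# Hypothetical outputs from five decision trees for a single input
class_votes = ["spam", "not spam", "spam", "spam", "not spam"]
price_estimates = [200, 220, 210, 190, 230]

print(majority_vote(class_votes))
print(average(price_estimates))
```

Combining many trees tends to smooth out the mistakes of any single tree, which is one reason random forests are less prone to overfitting than one deep decision tree.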