The recent Twitter spat between Elon Musk and Mark Zuckerberg shows that clever people are thinking about Artificial Intelligence (AI). The problem is that these two people are on opposite sides of a debate about the possible risks and rewards of AI. Elon Musk is not the only influential person to warn about the potential negative impact of AI; Stephen Hawking and Bill Gates have also issued warnings. We are still a long way from the AI of science fiction stories, which rises up and builds an army of human-hating killing machines. The two most widespread forms of AI we have are machine learning (ML) and deep learning (DL). Both are examples of narrow AI: they may be able to outperform humans in one narrow task but, unlike the multifunctional human brain, they are unable to carry out unrelated tasks, to reason or to be self-aware.

Our world is increasingly controlled by AI, and it is not just the self-driving cars and chess-playing computers that the media often report on. Google, Amazon, Facebook, your bank, the government, security and intelligence organizations, the police and others all use AI. Often this is ML, but increasingly DL is being used to solve a diverse range of problems, including medical issues, Amazon’s Echo and business applications. But what is it? Is it just another form of machine learning? Though they have some things in common, DL and ML are not the same.

To illustrate how DL works I will use a basic DL script written in Python; the script is available below. The problem we are going to attempt to solve involves creating a model which divides data into categories. This type of problem is common in the real world: for example, customers who renew their subscription or contract versus those who don’t, computer activity which is suspicious or benign, or a fuzzy patch on an MRI scan that might be benign or might be cancer. The script can generate three types of dataset.
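As a rough sketch of what those three datasets look like, the snippet below generates each one with scikit-learn and prints its shape and class counts. The helper name make_dataset, the use of random_state for seeding, and the choice of 200 points for every dataset are assumptions made for this illustration; the script itself seeds NumPy directly and leaves the blobs dataset at its default size.

```python
import numpy as np
from sklearn import datasets


def make_dataset(kind, n_samples=200, seed=0):
    """Return (X, y) for one of three toy two-class datasets.

    make_dataset is a hypothetical helper written for this sketch only.
    """
    if kind == "moons":
        return datasets.make_moons(n_samples, noise=0.20, random_state=seed)
    if kind == "circles":
        return datasets.make_circles(n_samples, noise=0.20, random_state=seed)
    if kind == "blobs":
        return datasets.make_blobs(n_samples=n_samples, centers=2, random_state=seed)
    raise ValueError("unknown dataset kind: %r" % kind)


for kind in ("moons", "circles", "blobs"):
    X, y = make_dataset(kind)
    # Each row of X is one point (two coordinates); y holds its class label, 0 or 1
    print(kind, X.shape, np.bincount(y))
```

In every case X is a table of two-dimensional points and y is the category each point belongs to, which is exactly the kind of data the model has to learn to divide.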
The first, called moons, can be plotted on a two-dimensional graph, with the two classes coloured red and blue. Our model will be represented by a decision boundary, a line on the graph which separates the blue and red dots. For any new data point we will then be able to use our model to determine whether it belongs to the reds or the blues. A simple straight line is not sufficient to solve the problem. Feeding the data into the DL script generates a solution that is not perfect, but is pretty good. Note that we are not trying to get every single point on the correct side of the line. That situation is called overfitting, and it is not a good thing: real world data often contains outliers and noise, and we don’t necessarily want to include these in our model.

The Script

import numpy as np
import sklearn
from sklearn import datasets, linear_model
import matplotlib.pyplot as plt


def initialise(a, b, c, d):
    nn_input_dim = a   # number of input nodes
    nn_output_dim = b  # number of output nodes
    epsilon = c        # learning rate for gradient descent
    reg_lambda = d     # regularization strength
    return nn_input_dim, nn_output_dim, epsilon, reg_lambda


def generate_data(data_type):
    np.random.seed(0)
    if data_type == 'moons':
        X, y = datasets.make_moons(200, noise=0.20)
    elif data_type == 'circles':
        X, y = sklearn.datasets.make_circles(200, noise=0.20)
    elif data_type == 'blobs':
        X, y = sklearn.datasets.make_blobs(centers=2, random_state=0)
    else:
        raise ValueError("data_type must be 'moons', 'circles' or 'blobs'")
    return X, y


def visualize(X, y, model):
    plot_decision_boundary(lambda x: predict(model, x), X, y)


def plot_decision_boundary(pred_func, X, y):
    # Set min and max values and give it some padding
    x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
    y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
    h = 0.01
    # Generate a grid of points with distance h between them
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    # Predict the function value for the whole grid
    Z = pred_func(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    # Plot the contour and training examples
    plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral)
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Spectral)
    plt.show()


# Helper function to evaluate the total loss on the dataset
def calculate_loss(model, X, y):
    num_examples = len(X)  # training set size
    W1, b1, W2, b2 = model['W1'], model['b1'], model['W2'], model['b2']
    # Forward propagation to calculate our predictions
    z1 = X.dot(W1) + b1
    a1 = np.tanh(z1)
    z2 = a1.dot(W2) + b2
    exp_scores = np.exp(z2)
    probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
    # Calculating the loss (negative log probability of the correct class)
    correct_logprobs = -np.log(probs[range(num_examples), y])
    data_loss = np.sum(correct_logprobs)
    # Add regularization term to loss (optional)
    data_loss += reg_lambda / 2 * (np.sum(np.square(W1)) + np.sum(np.square(W2)))
    return 1. / num_examples * data_loss


def predict(model, x):
    W1, b1, W2, b2 = model['W1'], model['b1'], model['W2'], model['b2']
    # Forward propagation
    z1 = x.dot(W1) + b1
    a1 = np.tanh(z1)
    z2 = a1.dot(W2) + b2
    exp_scores = np.exp(z2)
    probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
    return np.argmax(probs, axis=1)


# This function learns parameters for the neural network and returns the model.
#  nn_hdim: Number of nodes in the hidden layer
#  num_passes: Number of passes through the training data for gradient descent
#  print_loss: If True, print the loss every 1000 iterations
def build_model(X, y, nn_hdim, num_passes=20000, print_loss=False):
    # Initialize the parameters to random values. We need to learn these.
    num_examples = len(X)
    np.random.seed(0)
    W1 = np.random.randn(nn_input_dim, nn_hdim) / np.sqrt(nn_input_dim)
    b1 = np.zeros((1, nn_hdim))
    W2 = np.random.randn(nn_hdim, nn_output_dim) / np.sqrt(nn_hdim)
    b2 = np.zeros((1, nn_output_dim))

    # This is what we return at the end
    model = {}

    # Gradient descent. For each batch...
    for i in range(0, num_passes):
        # Forward propagation
        z1 = X.dot(W1) + b1
        a1 = np.tanh(z1)
        z2 = a1.dot(W2) + b2
        exp_scores = np.exp(z2)
        probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)

        # Backpropagation
        delta3 = probs
        delta3[range(num_examples), y] -= 1
        dW2 = (a1.T).dot(delta3)
        db2 = np.sum(delta3, axis=0, keepdims=True)
        delta2 = delta3.dot(W2.T) * (1 - np.power(a1, 2))
        dW1 = np.dot(X.T, delta2)
        db1 = np.sum(delta2, axis=0)

        # Add regularization terms (b1 and b2 don't have regularization terms)
        dW2 += reg_lambda * W2
        dW1 += reg_lambda * W1

        # Gradient descent parameter update (step against the gradient)
        W1 += -epsilon * dW1
        b1 += -epsilon * db1
        W2 += -epsilon * dW2
        b2 += -epsilon * db2

        # Assign new parameters to the model
        model = {'W1': W1, 'b1': b1, 'W2': W2, 'b2': b2}

        # Optionally print the loss.
        # This is expensive because it uses the whole dataset, so we don't want to do it too often.
        if print_loss and i % 1000 == 0:
            print("Loss after iteration %i: %f" % (i, calculate_loss(model, X, y)))

    return model


nn_input_dim, nn_output_dim, epsilon, reg_lambda = initialise(2, 2, 0.01, 0.01)
X, y = generate_data('moons')
model = build_model(X, y, 4, 10000, print_loss=True)
visualize(X, y, model)

Note: this script is derived from: https://github.com/dennybritz/nnfromscratch/blob/master/ann_classification.py
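The earlier point that a straight line cannot separate the moons data can be checked with a linear classifier. The sketch below is not part of the script: it assumes scikit-learn's LogisticRegression as the straight-line model (the script imports linear_model but never uses it) and seeds the data with random_state rather than np.random.seed.

```python
import numpy as np
from sklearn import datasets, linear_model

# Same style of data as the 'moons' option in the script
X, y = datasets.make_moons(200, noise=0.20, random_state=0)

# A linear classifier can only draw a straight decision boundary
clf = linear_model.LogisticRegression()
clf.fit(X, y)

# Fraction of training points that fall on the correct side of the line
accuracy = clf.score(X, y)
print("Training accuracy of a straight-line boundary: %.2f" % accuracy)

# The boundary itself is the line w0*x0 + w1*x1 + b = 0
w0, w1 = clf.coef_[0]
b = clf.intercept_[0]
print("Boundary: %.2f*x0 + %.2f*x1 + %.2f = 0" % (w0, w1, b))
```

Because the two moons curl around each other, some points always end up on the wrong side of any straight line. The hidden layer in the script above is what allows the boundary to bend.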