An alternative to searching for weight values is to use a directed optimization process.
Optimization is a search process, but instead of sampling the space of possible solutions randomly or exhaustively, it uses whatever information is available to direct the next step in the search, such as moving toward a set of weights that has lower error.
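To make the idea of a directed search concrete, the sketch below shows a simple stochastic hill climber on a toy loss function. It is illustrative only, not part of the ensemble example: the toy_loss() function and step size are hypothetical stand-ins. Each accepted step moves the candidate toward lower error.

```python
# minimal sketch of a directed search: stochastic hill climbing on a toy loss
from numpy import asarray
from numpy.random import randn

# hypothetical stand-in for an ensemble error function
def toy_loss(weights):
    # pretend loss: squared distance from an arbitrary optimum
    target = [0.5, 0.3, 0.2]
    return sum((w - t)**2 for w, t in zip(weights, target))

# start from an arbitrary candidate and take small random steps
solution = asarray([1.0, 1.0, 1.0])
solution_loss = toy_loss(solution)
for i in range(1000):
    candidate = solution + randn(len(solution)) * 0.05
    candidate_loss = toy_loss(candidate)
    # keep the step only if it moves toward lower error
    if candidate_loss <= solution_loss:
        solution, solution_loss = candidate, candidate_loss
print('Best loss: %.6f' % solution_loss)
```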
The SciPy library offers many excellent optimization algorithms, including local and global search methods.
SciPy provides an implementation of the Differential Evolution method. This is one of the few stochastic global search algorithms that “just works” for function optimization with continuous inputs, and it works well.
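Before applying it to the ensemble, it may help to see the differential_evolution() API on its own. The quadratic objective() below is a hypothetical stand-in for our loss, with a known minimum at [0.5, 0.5].

```python
# minimal sketch of the differential_evolution() API on a toy objective
from scipy.optimize import differential_evolution

# hypothetical objective with a known minimum at x = [0.5, 0.5]
def objective(x):
    return (x[0] - 0.5)**2 + (x[1] - 0.5)**2

# bounds on each input variable
bounds = [(0.0, 1.0), (0.0, 1.0)]
result = differential_evolution(objective, bounds)
print(result.x, result.fun)
```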
The differential_evolution() SciPy function requires that a function be specified to evaluate a set of weights and return a score to be minimized. We can minimize the classification error (1 – accuracy).
As with the grid search, we must normalize the weight vector before we evaluate it. The loss_function() function below will be used as the evaluation function during the optimization process.
```python
# loss function for optimization process, designed to be minimized
def loss_function(weights, members, testX, testy):
    # normalize weights
    normalized = normalize(weights)
    # calculate error rate
    return 1.0 - evaluate_ensemble(members, normalized, testX, testy)
```
We must also specify the bounds of the optimization process. We can define the bounds as a five-dimensional hypercube (i.e. one weight per ensemble member, five in total) with values between 0.0 and 1.0.
```python
# define bounds on each weight
bound_w = [(0.0, 1.0) for _ in range(n_members)]
```
Our loss function requires three parameters in addition to the weights, which we provide as a tuple to be passed along to the call to loss_function() each time a set of weights is evaluated.
```python
# arguments to the loss function
search_arg = (members, testX, testy)
```
We can now call our optimization process.
We will limit the total number of iterations of the algorithm to 1,000 and use a smaller-than-default tolerance to detect whether the search process has converged.
```python
# global optimization of ensemble weights
result = differential_evolution(loss_function, bound_w, search_arg, maxiter=1000, tol=1e-7)
```
The result of the call to differential_evolution() is an OptimizeResult object, a dictionary-like structure that contains all kinds of information about the search.
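For example, a few of the standard OptimizeResult fields can be inspected directly to confirm that the search behaved sensibly:

```python
# inspect the search result (standard OptimizeResult fields)
print('Converged: %s' % result['success'])  # whether the search converged
print('Iterations: %d' % result['nit'])     # number of iterations performed
print('Lowest loss: %.3f' % result['fun'])  # best (minimized) loss found
```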
Importantly, the 'x' key contains the optimal set of weights found during the search. We can retrieve the best set of weights, then report them and their performance on the test set when used in a weighted ensemble.
```python
# get the chosen weights
weights = normalize(result['x'])
print('Optimized Weights: %s' % weights)
# evaluate chosen weights
score = evaluate_ensemble(members, weights, testX, testy)
print('Optimized Weights Score: %.3f' % score)
```
Tying all of this together, the complete example is listed below.
```python
# global optimization to find coefficients for weighted ensemble on blobs problem
from sklearn.datasets import make_blobs
from sklearn.metrics import accuracy_score
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Dense
from numpy import array
from numpy import argmax
from numpy import tensordot
from numpy.linalg import norm
from scipy.optimize import differential_evolution

# fit model on dataset
def fit_model(trainX, trainy):
    trainy_enc = to_categorical(trainy)
    # define model
    model = Sequential()
    model.add(Dense(25, input_dim=2, activation='relu'))
    model.add(Dense(3, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    # fit model
    model.fit(trainX, trainy_enc, epochs=500, verbose=0)
    return model

# make an ensemble prediction for multi-class classification
def ensemble_predictions(members, weights, testX):
    # make predictions
    yhats = [model.predict(testX) for model in members]
    yhats = array(yhats)
    # weighted sum across ensemble members
    summed = tensordot(yhats, weights, axes=((0),(0)))
    # argmax across classes
    result = argmax(summed, axis=1)
    return result

# evaluate a weighted ensemble of members
def evaluate_ensemble(members, weights, testX, testy):
    # make prediction
    yhat = ensemble_predictions(members, weights, testX)
    # calculate accuracy
    return accuracy_score(testy, yhat)

# normalize a vector to have unit norm
def normalize(weights):
    # calculate l1 vector norm
    result = norm(weights, 1)
    # check for a vector of all zeros
    if result == 0.0:
        return weights
    # return normalized vector (unit norm)
    return weights / result

# loss function for optimization process, designed to be minimized
def loss_function(weights, members, testX, testy):
    # normalize weights
    normalized = normalize(weights)
    # calculate error rate
    return 1.0 - evaluate_ensemble(members, normalized, testX, testy)

# generate 2d classification dataset
X, y = make_blobs(n_samples=1100, centers=3, n_features=2, cluster_std=2, random_state=2)
# split into train and test
n_train = 100
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]
print(trainX.shape, testX.shape)
# fit all models
n_members = 5
members = [fit_model(trainX, trainy) for _ in range(n_members)]
# evaluate each single model on the test set
testy_enc = to_categorical(testy)
for i in range(n_members):
    _, test_acc = members[i].evaluate(testX, testy_enc, verbose=0)
    print('Model %d: %.3f' % (i+1, test_acc))
# evaluate averaging ensemble (equal weights)
weights = [1.0/n_members for _ in range(n_members)]
score = evaluate_ensemble(members, weights, testX, testy)
print('Equal Weights Score: %.3f' % score)
# define bounds on each weight
bound_w = [(0.0, 1.0) for _ in range(n_members)]
# arguments to the loss function
search_arg = (members, testX, testy)
# global optimization of ensemble weights
result = differential_evolution(loss_function, bound_w, search_arg, maxiter=1000, tol=1e-7)
# get the chosen weights
weights = normalize(result['x'])
print('Optimized Weights: %s' % weights)
# evaluate chosen weights
score = evaluate_ensemble(members, weights, testX, testy)
print('Optimized Weights Score: %.3f' % score)
```
Running the example first creates five single models and evaluates the performance of each on the test dataset.
Your specific results will vary given the stochastic nature of the learning algorithm.
We can see on this run that models 3 and 4 both perform best with an accuracy of about 82.2%.
Next, a model averaging ensemble with all five members is evaluated on the test set, reporting an accuracy of about 81.8%, which is better than some, but not all, of the single models.
```
(100, 2) (1000, 2)
Model 1: 0.814
Model 2: 0.811
Model 3: 0.822
Model 4: 0.822
Model 5: 0.809
Equal Weights Score: 0.818
```
The optimization process is relatively quick.
We can see that the process found a set of weights that pays most attention to models 3 and 4 and spreads the remaining attention among the other models, achieving an accuracy of about 82.4%, outperforming both the model averaging ensemble and all of the individual models.
```
Optimized Weights: [0.1660322  0.09652591 0.33991854 0.34540932 0.05211403]
Optimized Weights Score: 0.824
```
It is important to note that in these examples, we have treated the test dataset as though it were a validation dataset. This was done to keep the examples focused and technically simpler. In practice, the weights for the ensemble would be chosen and tuned using a validation dataset, and single models, model averaging ensembles, and weighted ensembles would be compared on a separate test set.
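As a sketch of how that might look, assuming the functions defined in the complete example above and hypothetical split sizes, the weights would be optimized against a held-out validation set and only the final comparison made on the untouched test set:

```python
# hypothetical sketch: optimize weights on a validation set, compare on a test set
# (reuses fit_model, loss_function, evaluate_ensemble and normalize from above)
X, y = make_blobs(n_samples=1100, centers=3, n_features=2, cluster_std=2, random_state=2)
trainX, trainy = X[:100], y[:100]
valX, valy = X[100:600], y[100:600]  # used only to choose the weights
testX, testy = X[600:], y[600:]      # used only for the final comparison
members = [fit_model(trainX, trainy) for _ in range(5)]
bound_w = [(0.0, 1.0) for _ in range(5)]
# optimize the weights against the validation set
result = differential_evolution(loss_function, bound_w, (members, valX, valy), maxiter=1000, tol=1e-7)
weights = normalize(result['x'])
# report performance of the chosen weights on the untouched test set
print('Test Score: %.3f' % evaluate_ensemble(members, weights, testX, testy))
```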