Implementation Example - Bike Sharing

Let’s take a Kaggle dataset on bike sharing as an example. Say we are a bike sharing company that wants to forecast the number of bike rentals each day in order to better manage bike maintenance, logistics, and other aspects of the business.


Rentals depend mainly on the weather, so with a weather forecast, the company can get a better idea of when rentals will peak and avoid scheduling maintenance on those days.

First, we train a model and save it as a pickle object, as shown in the Jupyter notebook.

Model training and performance are not covered here; this is just an example to illustrate the full process.
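Since the notebook is not reproduced here, the following is a minimal sketch of the save/load round trip only. It assumes a hypothetical `MeanRentalsModel` (it just predicts the mean of some toy training targets) in place of the real trained model; any object with a `predict` method, such as a scikit-learn estimator, is pickled the same way.

```python
import os
import pickle
import tempfile

import numpy as np


class MeanRentalsModel:
    """Hypothetical stand-in for the trained model: always predicts the
    mean rental count seen during training."""

    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)


# Training side: fit once on toy data (12 features, like the real inputs)
model = MeanRentalsModel().fit(np.zeros((3, 12)), np.array([1000.0, 1200.0, 1400.0]))

with tempfile.TemporaryDirectory() as tmpdir:
    model_path = os.path.join(tmpdir, 'modelfile.pickle')
    # Serialize the fitted object to disk
    with open(model_path, 'wb') as f:
        pickle.dump(model, f)

    # Serving side: load the same object back, without retraining
    with open(model_path, 'rb') as f:
        loaded = pickle.load(f)

predictions = loaded.predict(np.zeros((2, 12)))
print(predictions.tolist())  # [1200.0, 1200.0]
```

Note that pickle stores a reference to the model's class, not its code, so the serving process must be able to import the same class (for a scikit-learn model, the same library, ideally in the same version).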

Then we write the data transformation that will be applied at each API call:

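```python
# features_calculation.py (imported by the Flask app)
import numpy as np
import pandas as pd


def doTheCalculation(data: pd.DataFrame) -> np.ndarray:
    """Build the feature matrix expected by the model from the raw request data."""
    # Day of year counted from 0 on January 1st: the same values as subtracting
    # January 1st of the same year from each date. Unlike that subtraction,
    # .dt.dayofyear also works when 'dteday' is a timezone-aware (UTC) timestamp.
    data['dayofyear'] = data['dteday'].dt.dayofyear - 1

    # Select the feature columns in the exact order used during training
    X = data[['instant', 'season', 'yr', 'holiday', 'weekday', 'workingday',
              'weathersit', 'temp', 'atemp', 'hum', 'windspeed',
              'dayofyear']].to_numpy()
    return X
```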

This simply computes one extra variable, the day of the year, which captures both the month and the exact day. It also selects the feature columns and keeps them in the same order as during training.
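To check that "days elapsed since January 1st" and pandas' built-in day of year (which starts at 1) agree once shifted by one, here is a small sketch on a few arbitrary dates, including a leap year and a UTC timestamp like the ones sent to the API.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(['2011-01-01', '2011-01-02', '2011-03-01', '2012-12-31']))

# Approach 1: subtract January 1st of the same year, keep the number of days
jan_first = pd.to_datetime(dates.dt.year.astype(str) + '-01-01')
by_subtraction = (dates - jan_first).dt.days

# Approach 2: built-in 1-based day of year, shifted to start at 0
by_accessor = dates.dt.dayofyear - 1

print(by_subtraction.tolist())  # [0, 1, 59, 365]
print(by_accessor.tolist())     # [0, 1, 59, 365]

# The accessor also works on timezone-aware timestamps such as "...Z" strings
utc_dates = pd.to_datetime(pd.Series(['2011-01-03T00:00:00.000Z']))
print((utc_dates.dt.dayofyear - 1).tolist())  # [2]
```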

Next, we write the REST API with Flask:

```python
# app.py: serves the model through a REST API
import pickle

import pandas as pd
from flask import Flask, request, jsonify

from features_calculation import doTheCalculation

app = Flask(__name__)

# Load the trained model once at startup, not at every request.
# Loading at module level also lets the app run under `flask run`.
modelfile = 'modelfile.pickle'
with open(modelfile, 'rb') as f:
    model = pickle.load(f)
print("loaded OK")


@app.route('/api/makecalc/', methods=['POST'])
def makecalc():
    """
    Function run at each API call
    """
    jsonfile = request.get_json()
    # Each top-level key of the JSON body ("0", "1", ...) becomes one row
    data = pd.DataFrame.from_dict(jsonfile, orient='index')
    data['dteday'] = pd.to_datetime(data['dteday'])
    app.logger.debug(data)

    ypred = model.predict(doTheCalculation(data))
    # .tolist() converts NumPy scalars into plain Python numbers for jsonify
    res = {str(i): y for i, y in enumerate(ypred.tolist())}
    return jsonify(res)


if __name__ == '__main__':
    app.run(debug=True)
```

Run this program; it serves the API on port 5000 by default.
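The endpoint can also be exercised without starting a server, using Flask's built-in test client. This is a minimal sketch: `StubModel` is a hypothetical stand-in for the pickled model, and the trimmed-down endpoint only uses the `instant` column, so it runs without the model file or `features_calculation`.

```python
import numpy as np
import pandas as pd
from flask import Flask, jsonify, request


class StubModel:
    """Hypothetical stand-in for the pickled model: 10 rentals per unit of 'instant'."""

    def predict(self, X):
        return np.asarray(X)[:, 0].astype(np.int64) * 10


app = Flask(__name__)
model = StubModel()


@app.route('/api/makecalc/', methods=['POST'])
def makecalc():
    rows = request.get_json()
    data = pd.DataFrame.from_dict(rows, orient='index')
    ypred = model.predict(data[['instant']].to_numpy())
    # int64 predictions would not be JSON serializable without .tolist()
    return jsonify({str(i): y for i, y in enumerate(ypred.tolist())})


client = app.test_client()
resp = client.post('/api/makecalc/', json={"0": {"instant": 1}, "1": {"instant": 2}})
print(resp.status_code, resp.get_json())  # 200 {'0': 10, '1': 20}
```

Swapping in a stub model like this keeps the test independent of training, which is the same separation the rest of this example relies on.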

We can test a request locally, still with Python:

```python
import json

import requests

url = 'http://127.0.0.1:5000/api/makecalc/'

text = json.dumps({"0": {"instant":1,"dteday":"2011-01-01T00:00:00.000Z","season":1,"yr":0,"mnth":1,"holiday":0,"weekday":6,"workingday":0,"weathersit":2,"temp":0.344167,"atemp":0.363625,"hum":0.805833,"windspeed":0.160446},
                   "1": {"instant":2,"dteday":"2011-01-02T00:00:00.000Z","season":1,"yr":0,"mnth":1,"holiday":0,"weekday":3,"workingday":0,"weathersit":2,"temp":0.363478,"atemp":0.353739,"hum":0.696087,"windspeed":0.248539},
                   "2": {"instant":3,"dteday":"2011-01-03T00:00:00.000Z","season":1,"yr":0,"mnth":1,"holiday":0,"weekday":1,"workingday":1,"weathersit":1,"temp":0.196364,"atemp":0.189405,"hum":0.437273,"windspeed":0.248309}})
```

The request contains all the information the model needs, so it responds with a forecast of bike rentals for each of the specified dates (three of them here).
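Rather than typing such a payload by hand, it can be generated from a DataFrame. A minimal sketch, assuming the days to forecast already sit in a DataFrame (named `days` here, hypothetical, with the values of the first two rows of the request): `to_json(orient='index')` produces the same `{"0": {...}, "1": {...}}` shape, keyed by the row index.

```python
import json

import pandas as pd

# Hypothetical DataFrame holding the days to forecast
days = pd.DataFrame({
    'instant': [1, 2],
    'dteday': pd.to_datetime(['2011-01-01', '2011-01-02']),
    'season': [1, 1],
    'yr': [0, 0],
    'mnth': [1, 1],
    'holiday': [0, 0],
    'weekday': [6, 3],
    'workingday': [0, 0],
    'weathersit': [2, 2],
    'temp': [0.344167, 0.363478],
    'atemp': [0.363625, 0.353739],
    'hum': [0.805833, 0.696087],
    'windspeed': [0.160446, 0.248539],
})

# orient='index': one JSON object per row, keyed by the index label ("0", "1")
# date_format='iso': dates written as ISO 8601 strings instead of epoch milliseconds
text = days.to_json(orient='index', date_format='iso')
payload = json.loads(text)
print(payload['1']['dteday'])
```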

```python
headers = {'content-type': 'application/json', 'Accept-Charset': 'UTF-8'}

r = requests.post(url, data=text, headers=headers)

print(r, r.text)
```

```
<Response [200]> {
  "0": 1063,
  "1": 1028,
  "2": 1399
}
```

That’s it! This service can easily be integrated into any of the company’s applications, whether for maintenance planning or to let users see bike traffic, demand, and the availability of rental bikes.
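As an illustration of the maintenance-planning use, a client could map the response back to the requested dates and pick the quietest and busiest days. A small sketch using the response body shown above; the `request_dates` mapping is an assumption about how the client remembers which key corresponds to which date.

```python
import json

# Dates sent in the request, keyed like the "0", "1", "2" rows (assumed bookkeeping)
request_dates = {"0": "2011-01-01", "1": "2011-01-02", "2": "2011-01-03"}

# Body returned by the API
response_body = '{"0": 1063, "1": 1028, "2": 1399}'
forecast = json.loads(response_body)

by_date = {request_dates[key]: rentals for key, rentals in forecast.items()}
quietest = min(by_date, key=by_date.get)  # best candidate day for maintenance
busiest = max(by_date, key=by_date.get)   # day to keep the fleet fully available
print(quietest, busiest)  # 2011-01-02 2011-01-03
```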

Putting it all Together

The major flaw of many machine learning systems, and especially PoCs, is mixing training and prediction.

If the two are carefully separated, real-time predictions can be served quite easily for an MVP, at low development cost and effort, with Python and Flask, especially since many PoCs are originally developed with scikit-learn, TensorFlow, or another Python machine learning library.

However, this might not be feasible for every application, especially those with heavy feature engineering, or those that retrieve the closest match and therefore need the latest data available at each call.

In any case, do you need to watch movies over and over to answer questions about them? The same rule applies to machine learning!