Data Preprocessing

We will use music21, a Python toolkit for computer-aided musicology developed at MIT, to extract data from these MIDI files.

A. Notes Extraction

The function get_notes returns a list of the notes and chords present in a .mid file. We use music21's converter.parse function to convert the MIDI file into a stream object, from which the notes and chords in the file are extracted. The list returned by get_notes() looks as follows:

We can see that the list consists of single pitches and chords, with each chord represented as a sequence of integers separated by dots. We treat each distinct chord as one new entry in the list, just like a single pitch. Just as letters are the building blocks of the words in a sentence, the unique pitches and chords in the notes list form the vocabulary from which music is generated.

B. Generating Input and Output Sequences

A neural network accepts only numeric input, and since the pitches in the notes list are strings, we need to map each pitch to an integer. We can do so as follows:
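A minimal sketch of that mapping follows, applied to a hypothetical token list in the format get_notes produces. The vocabulary is the sorted set of unique tokens, so the same notes list always yields the same mapping. The function name `build_mappings` is ours. It also builds the inverse table `int_to_note`, which would be needed to turn predicted integers back into notes.

```python
def build_mappings(notes):
    """Build token <-> integer lookup tables from the notes list."""
    # Sorting the unique tokens makes the mapping deterministic across runs.
    pitchnames = sorted(set(notes))
    note_to_int = {name: index for index, name in enumerate(pitchnames)}
    int_to_note = {index: name for name, index in note_to_int.items()}
    return note_to_int, int_to_note


# Hypothetical token list in the format produced by get_notes().
notes = ["C4", "E4", "0.4.7", "C4"]
note_to_int, int_to_note = build_mappings(notes)
encoded = [note_to_int[n] for n in notes]  # [1, 2, 0, 1]
```

Here `len(note_to_int)` is the vocabulary size, conventionally called `n_vocab`.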

Next, we create arrays of input and output sequences to train our model. Each input sequence consists of 100 consecutive notes, and the corresponding entry in the output array is the note that follows them, the 101st note. The model's objective is therefore to predict the next note given the previous 100.
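A minimal sliding-window sketch of this step follows. It assumes the window advances one note at a time. The function name `make_sequences` is ours. With that stride, a list of N encoded notes yields N − 100 input/output pairs.

```python
def make_sequences(encoded, sequence_length=100):
    """Pair each window of `sequence_length` encoded notes with the note that follows it."""
    network_input = []
    network_output = []
    # Slide the window forward one note at a time.
    for i in range(len(encoded) - sequence_length):
        network_input.append(encoded[i:i + sequence_length])  # the window starting at note i
        network_output.append(encoded[i + sequence_length])  # the note right after the window
    return network_input, network_output
```

For example, `make_sequences(list(range(6)), sequence_length=3)` returns `([[0, 1, 2], [1, 2, 3], [2, 3, 4]], [3, 4, 5])`.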

Next, we reshape and normalize the input sequences before feeding them to the model. Finally, we one-hot encode the output vector.
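A NumPy sketch of these three steps follows, under stated assumptions. It assumes the model expects input shaped (samples, time steps, features) with one feature per time step, as Keras LSTM layers do. It also assumes normalization means dividing each integer by the vocabulary size `n_vocab`. The one-hot step uses identity-matrix indexing, which gives the same result as Keras's `to_categorical(y, num_classes=n_vocab)`. The function name `prepare_arrays` is ours.

```python
import numpy as np


def prepare_arrays(network_input, network_output, n_vocab):
    """Reshape and scale the inputs, and one-hot encode the outputs."""
    n_patterns = len(network_input)
    sequence_length = len(network_input[0])
    # Reshape to (samples, time steps, features) with a single feature per step.
    X = np.reshape(np.asarray(network_input, dtype=np.float32),
                   (n_patterns, sequence_length, 1))
    # Scale integer indices from [0, n_vocab) into [0, 1).
    X = X / float(n_vocab)
    # One-hot encode targets: row i has a 1 in column network_output[i].
    y = np.eye(n_vocab, dtype=np.float32)[np.asarray(network_output)]
    return X, y
```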