This week, our group focused on the final part of the program: normalizing the chunk data in the frequency domain and designing the matching algorithm.
To normalize the chunk data, we first ran some tests to find its characteristics. We recorded a 10-second clip from one song, divided the data into chunks, and then picked the first chunk, the second chunk, the tenth chunk, and a copy of the first chunk scaled to a lower amplitude. The result can be seen in Figure 1.
Figure 1. Test: a) the first chunk, b) the second chunk, c) the tenth chunk, d) the first chunk with lower amplitude.
In Figure 1, it can be seen that the envelopes of the first and second chunks are quite similar, while the shape of the tenth chunk is quite different. That is, over a very short period of time the frequency content of the chunks does not change much. In practice, when the system records a music clip, it cannot match the data of the original song chunk-to-chunk perfectly; there is a high probability that the recording is offset by some data points. The property above ensures that the recorded clip can still align with the original soundtrack (the one we want to recognize). Now compare a) and d): the oscillation in d) is weaker than in a), but the location (frequency) of the maximum amplitude is almost the same. As a result, the normalized value of each chunk can be the frequency index of the point with maximum amplitude within each of three bands in the frequency domain: 1 to 49, 50 to 99, and 100 to 150 (because of the symmetry of the spectrum, points above frequency index 150 are neglected). These values then form the fingerprint of the audio clip, for example:
43 58 108
49 58 108
49 58 108
44 64 128
43 65 115
28 65 115
44 86 128
49 55 105
49 55 105
44 55 105
29 69 119
29 68 118
29 68 118
29 64 114
43 86 134
43 87 137
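The normalization step above can be sketched roughly as follows. The 300-sample chunk size and the exact half-open band edges are assumptions on our part (the report only names the three ranges), and numpy stands in for whatever FFT routine the actual program uses:

```python
import numpy as np

# Assumed parameters: a ~300-sample chunk makes the useful half of the
# spectrum span bins 1-149; the bands follow the ranges in the text.
CHUNK_SIZE = 300
BANDS = [(1, 50), (50, 100), (100, 150)]  # half-open bin ranges

def chunk_fingerprint(chunk):
    """Bin index of the peak amplitude within each frequency band."""
    spectrum = np.abs(np.fft.fft(chunk))
    return [lo + int(np.argmax(spectrum[lo:hi])) for lo, hi in BANDS]

def clip_fingerprint(samples, chunk_size=CHUNK_SIZE):
    """Split a recording into chunks and fingerprint each chunk."""
    n_chunks = len(samples) // chunk_size
    return [chunk_fingerprint(samples[i * chunk_size:(i + 1) * chunk_size])
            for i in range(n_chunks)]

# Three pure tones landing in bins 40, 70 and 120 give the same
# fingerprint row, [40, 70, 120], for every chunk.
t = np.arange(3 * CHUNK_SIZE)
tone = (np.sin(2 * np.pi * 40 * t / CHUNK_SIZE)
        + 0.8 * np.sin(2 * np.pi * 70 * t / CHUNK_SIZE)
        + 0.6 * np.sin(2 * np.pi * 120 * t / CHUNK_SIZE))
print(clip_fingerprint(tone))
```

Scaling the signal down, as in panel d) of Figure 1, leaves the argmax positions, and hence the fingerprint, unchanged, which is exactly why this normalization is robust to recording volume.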
Then, in the matching part, we first recorded 20 seconds of music and stored the data in fingerprint format in a text file. For ease of demonstration, we call it the mothersoundtrack. Then we recorded a 10-second clip contained within the mothersoundtrack's time span, transformed it, and stored it in another text file; we call it the littleclip. The program then compares the first row of data (the first chunk) of the littleclip to all rows in the mothersoundtrack. If the data match within 5, the remaining chunks of the littleclip are matched against the mothersoundtrack chunks in sequence. Finally, the program returns the number of chunks that match between the mothersoundtrack and the littleclip. The mothersoundtrack with the maximum score is the piece of music we want to recognize.
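A minimal sketch of this matching step, reusing rows from the fingerprint example above. We assume "matched within 5" means every value in a row must agree within 5 (the report does not say per-value or total), and the length guard shows one way to avoid running past the end of either vector:

```python
TOLERANCE = 5  # assumed: each of the three values must agree within 5

def rows_match(a, b, tol=TOLERANCE):
    """Two fingerprint rows match if every value agrees within tol."""
    return all(abs(x - y) <= tol for x, y in zip(a, b))

def match_score(mothersoundtrack, littleclip):
    """Try every alignment whose first chunk matches, then count how
    many chunks of littleclip match mothersoundtrack in sequence."""
    best = 0
    for start in range(len(mothersoundtrack)):
        if not rows_match(mothersoundtrack[start], littleclip[0]):
            continue
        # Guard the compared length so neither list is indexed
        # beyond its end.
        n = min(len(littleclip), len(mothersoundtrack) - start)
        score = sum(rows_match(mothersoundtrack[start + i], littleclip[i])
                    for i in range(n))
        best = max(best, score)
    return best

# The database entry with the highest score is the recognized song.
database = {
    "song_a": [[43, 58, 108], [49, 58, 108], [44, 64, 128], [43, 65, 115]],
    "song_b": [[29, 69, 119], [29, 68, 118], [29, 64, 114], [43, 86, 134]],
}
clip = [[49, 58, 108], [44, 64, 128]]
print(max(database, key=lambda name: match_score(database[name], clip)))
```

Here the clip matches song_a at offset 1 for both of its chunks, while song_b scores zero, so song_a is recognized.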
In the testing part, we stored four 20-second songs as the database. When we played Animals for 10 seconds, the program output:
When we played Fur Elise instead, the output was:
Consequently, the program can recognize the music quite accurately. However, it may crash because vectors in the program can be indexed beyond their length. We need to improve the program next week.
Summary: By the end of this week, we had almost finished the whole project, though the program still needs to be optimized next week.