Friday, February 24, 2017

Week 3

The equipment for this project has now been finalized and modified. Our goal this week is to write the audio recognition program. Audio recognition has two main advanced areas: speech recognition and music recognition. Speech recognition is much more complicated. Although some very powerful speech recognition libraries are available online, they rely on a high sampling rate (44.1 kHz), and, as mentioned before, our system cannot sample that fast, so it cannot record sound precisely enough for them. We would therefore have to modify a speech recognition library to make it work, which is an enormous amount of work for a group of three. Instead, we decided to develop a program that performs simple music recognition with our devices. Because of the low sampling frequency, our system may only recognize music in roughly the 1000 to 3000 Hz range, and the reference music stored in the program must also be recorded through our system so that it has the same sampling frequency. These are the basic limitations of our system; possible upgrades will be discussed in Week 5.

For the music recognition algorithm, our supervisor suggested that we program in MATLAB. After some discussion, we decided it was better to program directly in C++. One reason is that our system does not need powerful mathematical tools, since the audio data it captures is not of high quality; MATLAB would provide far more capability than the system needs and would limit the system's compatibility. Another reason is that we did not have enough time to write a MATLAB program and integrate it with our existing data transmission program, so that approach carried a high risk of failure.
Back to the algorithm: it can be divided into two main parts, a fast Fourier transform (FFT) to produce the data hash, and matching of the corresponding frequency fingerprints. This week we focused on the first part. For music recognition, both frequency-domain and time-domain information are needed to characterize an audio clip. To keep both, the program introduces data chunks: it divides the original data sequence into many small chunks, and the data in each chunk is converted with a discrete Fourier transform to represent its frequency features. Since the chunks stay in time order, the serial number of each chunk represents the time information. In this project, the program stores 300 data points in each chunk, which is an appropriate number for representing frequency features for matching. Because of the system's low sampling frequency, putting more points into each chunk would leave the system with too few feature points (chunks) to match the music, which would reduce accuracy.
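A minimal C++ sketch of the chunking step, assuming the recorded samples are already available as a `std::vector<double>`. The function name `splitIntoChunks` is hypothetical, and discarding the incomplete trailing chunk is an assumption rather than something the original program is known to do.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: split a recorded sample sequence into consecutive,
// non-overlapping chunks of chunkSize points (300 in this project).
// The index of a chunk in the result is its serial number, i.e. its time position.
// Assumption: leftover samples that cannot fill a whole chunk are discarded.
std::vector<std::vector<double>> splitIntoChunks(const std::vector<double>& samples,
                                                 std::size_t chunkSize = 300) {
    std::vector<std::vector<double>> chunks;
    assert(chunkSize > 0 && "chunk size must be positive");
    if (chunkSize == 0) return chunks;  // also guards release builds against an endless loop
    for (std::size_t start = 0; start + chunkSize <= samples.size(); start += chunkSize) {
        auto first = samples.begin() + static_cast<std::ptrdiff_t>(start);
        chunks.emplace_back(first, first + static_cast<std::ptrdiff_t>(chunkSize));
    }
    return chunks;
}
```

For example, 950 samples yield three 300-point chunks with serial numbers 0, 1 and 2, and the last 50 samples are dropped under this assumption.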
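Typically, the discrete Fourier transform of an $N$-point chunk $x_0, \ldots, x_{N-1}$ (here $N = 300$) can be written as:

$$X_k = \sum_{n=0}^{N-1} x_n \, e^{-i 2\pi k n / N}, \qquad k = 0, 1, \ldots, N-1$$

According to Euler's formula, it can also be written as:

$$X_k = \sum_{n=0}^{N-1} x_n \left[ \cos\left(\frac{2\pi k n}{N}\right) - i \sin\left(\frac{2\pi k n}{N}\right) \right]$$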
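The Euler form maps directly onto code. The sketch below evaluates the sum term by term for one chunk; it is a plain O(N²) DFT rather than an FFT (an FFT returns the same values, only faster), and the name `dft` is hypothetical, not taken from the project's code.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper: direct (O(N^2)) discrete Fourier transform of one chunk,
// following the Euler form X_k = sum_n x_n [cos(2*pi*k*n/N) - i*sin(2*pi*k*n/N)].
// An FFT produces the same values with less work; this version favors clarity.
std::vector<std::complex<double>> dft(const std::vector<double>& x) {
    assert(!x.empty() && "a chunk must contain at least one sample");
    const std::size_t N = x.size();
    const double pi = std::acos(-1.0);
    std::vector<std::complex<double>> X(N);
    for (std::size_t k = 0; k < N; ++k) {
        double re = 0.0;
        double im = 0.0;
        for (std::size_t n = 0; n < N; ++n) {
            // The exponential is periodic in k*n with period N, so k*n is reduced
            // mod N to keep the angle small and cos/sin accurate.
            const double theta =
                2.0 * pi * static_cast<double>((k * n) % N) / static_cast<double>(N);
            re += x[n] * std::cos(theta);
            im -= x[n] * std::sin(theta);
        }
        X[k] = std::complex<double>(re, im);
    }
    return X;
}
```

For a 300-sample chunk, `dft(chunk)` returns 300 complex values. Because the input is real, the output satisfies $|X_k| = |X_{N-k}|$, which is the symmetry seen in the spectrum of each chunk.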
As a result, the program needs a complex-number array to store the Fourier transform result of each chunk. It then takes the absolute value (modulus) of each complex number, $|X_k| = \sqrt{\operatorname{Re}(X_k)^2 + \operatorname{Im}(X_k)^2}$, to represent the amplitude. Because these magnitudes are often very large, the usual practice is to take the logarithm of each magnitude as the final result. The result for one chunk is shown in the following figure:
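A sketch of the magnitude-and-logarithm step, assuming the chunk's DFT is held in a `std::vector<std::complex<double>>`. The function name `logMagnitude`, the base-10 logarithm and the small floor `minMagnitude` (which avoids taking log(0) for an empty bin) are illustrative choices, not details confirmed by the original program.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

// Hypothetical helper: turn one chunk's DFT output into log-magnitudes.
// std::abs on a complex value gives |X_k| = sqrt(Re^2 + Im^2), the amplitude;
// log10 then compresses the large range of amplitudes.
// Assumption: magnitudes below minMagnitude are clamped so log10(0) never occurs.
std::vector<double> logMagnitude(const std::vector<std::complex<double>>& X,
                                 double minMagnitude = 1e-12) {
    assert(minMagnitude > 0.0 && "the floor must be positive for log10");
    std::vector<double> result;
    result.reserve(X.size());
    for (const std::complex<double>& v : X) {
        result.push_back(std::log10(std::max(std::abs(v), minMagnitude)));
    }
    return result;
}
```

For instance, a bin with value 3 + 4i has magnitude 5, so its entry is log10(5), while an all-zero bin maps to log10(1e-12) = -12 instead of minus infinity.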
[Figure: log-magnitude spectrum of one 300-point chunk; x axis: frequency bin index, y axis: amplitude]
In this figure, the y axis represents the amplitude (on a logarithmic scale), while the x axis represents the frequency bin index k, i.e. multiples of the frequency resolution f_s/N, where f_s is the sampling frequency. The envelope of the pattern is periodic with a period of N = 300 points, so one period can be taken from -150 to 149. Because the spectrum is periodic and symmetric, the program instead indexes one period from 0 to 299. The amplitude at zero frequency (k = 0) represents the bias (DC offset); it depends on the recording system rather than on the audio clip itself, so this point can be ignored.
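A sketch of building a per-chunk feature vector from the (log-)magnitude spectrum, skipping bin 0 as described above. The function name `chunkFeatures` and the optional `uniqueHalfOnly` flag are hypothetical: for a real-valued chunk $|X_k| = |X_{N-k}|$, so keeping only bins 1 to N/2 is one possible way to avoid storing the mirrored half twice, not necessarily what the program does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: build one chunk's feature vector from its magnitude
// (or log-magnitude) spectrum. Bin 0 (the DC/bias term) is always skipped,
// since it reflects the recording system rather than the audio clip.
// Assumption: if uniqueHalfOnly is true, only bins 1..N/2 are kept, because for
// a real-valued chunk |X_k| == |X_{N-k}| and the upper half mirrors the lower half.
std::vector<double> chunkFeatures(const std::vector<double>& spectrum,
                                  bool uniqueHalfOnly = false) {
    assert(spectrum.size() >= 2 && "need at least one non-DC bin");
    const std::size_t last = uniqueHalfOnly ? spectrum.size() / 2 : spectrum.size() - 1;
    return std::vector<double>(spectrum.begin() + 1,
                               spectrum.begin() + static_cast<std::ptrdiff_t>(last) + 1);
}
```

With N = 300, the default keeps bins 1 to 299 (299 values), while `uniqueHalfOnly = true` keeps bins 1 to 150 (150 values).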
Summary: This week, we successfully designed the discrete Fourier transform program. Next week, we will focus on the matching part of the system.
