This week, our group focused on the final part of the program: normalizing the chunk data in the frequency domain and designing the matching algorithm.
To normalize the chunk data, we first ran some tests to find its characteristics. We recorded a 10-second clip from one song, divided the data into chunks, and then picked the first chunk, the second chunk, the tenth chunk, and a copy of the first chunk scaled to a lower amplitude. The result can be seen in Figure 1.
Figure 1. Test: a) the first chunk, b) the second chunk, c) the tenth chunk, d) the first chunk with lower amplitude.
In Figure 1, it can be seen that the envelopes of the first and second chunks are quite similar, while the shape of the tenth chunk is quite different. That is, over a very short period of time the frequency content of the chunks does not change much. In practice, when the system records a music clip, it cannot match the data of the original song chunk-to-chunk perfectly; there is a high probability that the recording is offset by some data points. The property above ensures that the recorded clip can still align with the original soundtrack (the one we want to recognize). Now compare a) and d): the oscillation in d) is weaker than in a), but the location (frequency) of the maximum amplitude is almost the same. As a result, the normalized value of each chunk can be the frequency index of the point with maximum amplitude within each of three bands in the frequency domain: 1 to 49, 50 to 99, and 100 to 150 (because of the symmetry of the spectrum, points above frequency index 150 are neglected). These values then form the fingerprint of the audio clip, for example:
43 58 108
49 58 108
49 58 108
44 64 128
43 65 115
28 65 115
44 86 128
49 55 105
49 55 105
44 55 105
29 69 119
29 68 118
29 68 118
29 64 114
43 86 134
43 87 137
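The normalization step above can be sketched roughly as follows. The 300-sample chunk size and the exact half-open band edges are assumptions on our part (the report only names the three ranges), and numpy stands in for whatever FFT routine the actual program uses:

```python
import numpy as np

# Assumed parameters: a ~300-sample chunk makes the useful half of the
# spectrum span bins 1-149; the bands follow the ranges in the text.
CHUNK_SIZE = 300
BANDS = [(1, 50), (50, 100), (100, 150)]  # half-open bin ranges

def chunk_fingerprint(chunk):
    """Bin index of the peak amplitude within each frequency band."""
    spectrum = np.abs(np.fft.fft(chunk))
    return [lo + int(np.argmax(spectrum[lo:hi])) for lo, hi in BANDS]

def clip_fingerprint(samples, chunk_size=CHUNK_SIZE):
    """Split a recording into chunks and fingerprint each chunk."""
    n_chunks = len(samples) // chunk_size
    return [chunk_fingerprint(samples[i * chunk_size:(i + 1) * chunk_size])
            for i in range(n_chunks)]

# Three pure tones landing in bins 40, 70 and 120 give the same
# fingerprint row, [40, 70, 120], for every chunk.
t = np.arange(3 * CHUNK_SIZE)
tone = (np.sin(2 * np.pi * 40 * t / CHUNK_SIZE)
        + 0.8 * np.sin(2 * np.pi * 70 * t / CHUNK_SIZE)
        + 0.6 * np.sin(2 * np.pi * 120 * t / CHUNK_SIZE))
print(clip_fingerprint(tone))
```

Scaling the signal down, as in panel d) of Figure 1, leaves the argmax positions, and hence the fingerprint, unchanged, which is exactly why this normalization is robust to recording volume.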
Then, in the matching part, we first recorded 20 seconds of music and stored the data in fingerprint format in a text file. For ease of demonstration, we call it the mothersoundtrack. Then we recorded a 10-second clip contained within the mothersoundtrack's time span, transformed it, and stored it in another text file; we call it the littleclip. The program then compares the first row of data (the first chunk) of the littleclip to all rows in the mothersoundtrack. If the data match within 5, the remaining chunks of the littleclip are matched against the mothersoundtrack chunks in sequence. Finally, the program returns the number of chunks that match between the mothersoundtrack and the littleclip. The mothersoundtrack with the maximum score is the piece of music we want to recognize.
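A minimal sketch of this matching step, reusing rows from the fingerprint example above. We assume "matched within 5" means every value in a row must agree within 5 (the report does not say per-value or total), and the length guard shows one way to avoid running past the end of either vector:

```python
TOLERANCE = 5  # assumed: each of the three values must agree within 5

def rows_match(a, b, tol=TOLERANCE):
    """Two fingerprint rows match if every value agrees within tol."""
    return all(abs(x - y) <= tol for x, y in zip(a, b))

def match_score(mothersoundtrack, littleclip):
    """Try every alignment whose first chunk matches, then count how
    many chunks of littleclip match mothersoundtrack in sequence."""
    best = 0
    for start in range(len(mothersoundtrack)):
        if not rows_match(mothersoundtrack[start], littleclip[0]):
            continue
        # Guard the compared length so neither list is indexed
        # beyond its end.
        n = min(len(littleclip), len(mothersoundtrack) - start)
        score = sum(rows_match(mothersoundtrack[start + i], littleclip[i])
                    for i in range(n))
        best = max(best, score)
    return best

# The database entry with the highest score is the recognized song.
database = {
    "song_a": [[43, 58, 108], [49, 58, 108], [44, 64, 128], [43, 65, 115]],
    "song_b": [[29, 69, 119], [29, 68, 118], [29, 64, 114], [43, 86, 134]],
}
clip = [[49, 58, 108], [44, 64, 128]]
print(max(database, key=lambda name: match_score(database[name], clip)))
```

Here the clip matches song_a at offset 1 for both of its chunks, while song_b scores zero, so song_a is recognized.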
In the testing part, we stored four 20-second songs as the database. When we played Animals for 10 seconds, the program output:
When we played Fur Elise instead, the output was:
Consequently, the program can recognize the music quite accurately. However, it may crash because vectors in the program can be indexed beyond their length. We need to improve the program next week.
Summary: By the end of this week, we had almost finished the whole project, though the program still needs to be optimized next week.