First, we take the input sound signal and convert it into a discrete Fourier transform (DFT) spectrum, and then we convert the spectrum into a pitch class profile (PCP). The PCP is then pattern-matched against prepared chord templates, which finally yields the root note and the chord type.
The algorithm first converts the input sound signal stream into a DFT spectrum. Let f_s be the sampling frequency and x(n) the nth sample of an input fragment of N samples. The DFT spectrum is then given by the following formula, where k = 0, 1, 2, ..., N-1, and X(0), X(1), ..., X(N/2-1) represent the whole spectrum:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-2\pi i k n / N}$$
X(k) is the complex coefficient of the wave with frequency f_s * (k/N). Note that the frequencies it can express depend on the number of sampling points: adjacent values of k are f_s/N apart. This is reasonable in theory: the more sampling points we have, the finer the frequency detail we can describe.
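To make the relation between k and frequency concrete, here is a minimal sketch of a direct (slow) DFT using only the Python standard library. The helper names `dft` and `bin_frequency` and the test signal (a 100 Hz cosine sampled at 800 Hz) are illustrative assumptions, not something taken from the paper.

```python
import cmath
import math

def dft(x):
    """Direct O(N^2) DFT: X(k) = sum over n of x(n) * exp(-2*pi*i*k*n/N)."""
    N = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]

def bin_frequency(k, N, fs):
    """Frequency in Hz represented by X(k): fs * k / N."""
    return fs * k / N

# Assumed example values: fs = 800 Hz, N = 16, so the resolution is 50 Hz.
fs, N = 800, 16
x = [math.cos(2 * math.pi * 100 * n / fs) for n in range(N)]
X = dft(x)

# Look only at the first half of the spectrum, k = 0 ... N/2 - 1.
magnitudes = [abs(X[k]) for k in range(N // 2)]
peak = max(range(N // 2), key=lambda k: magnitudes[k])
print(peak, bin_frequency(peak, N, fs))  # 2 100.0
```

With N = 16 the bins are 50 Hz apart, so the 100 Hz tone lands exactly on k = 2. Doubling N would halve the spacing, which is the "more sampling points, finer frequency detail" effect described above.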
When we first see the principle of the Fourier transform, the method that comes to mind may be to construct a linear sum of sine and cosine functions with many parameters, and then determine those parameters by some means.
In fact, there is nothing essentially wrong with this idea. Our goal is to decompose the function into a combination of sine and cosine functions. Fourier's result is that periodic functions (under mild regularity conditions) can be written as sums of sine and cosine functions.
The next question is: if the period of our original function is t, how can we ensure that the period of the combined function is also t?
The period of sin(x) is 2π, and sin(2x) also has period 2π (although its minimum period is π). More generally, if f(x) has period t, then f(nx) also has period t for every positive integer n. Therefore, adding and subtracting such functions guarantees that the combined function has period t.
Next, we scale the amplitude of each trigonometric function, and then add and subtract them.
Finally, the approximating sum we construct looks like this. It is a linear combination of sine and cosine waves at infinitely many frequencies.
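One common way to write such a sum, assuming the period is t and writing ω = 2π/t for the base frequency (the coefficient names a_n and b_n are notation chosen here for illustration), is:

```latex
f(x) \approx \frac{a_0}{2}
  + \sum_{n=1}^{\infty} \bigl[ a_n \cos(n \omega x) + b_n \sin(n \omega x) \bigr],
\qquad \omega = \frac{2\pi}{t}
```

Every term has period t, so the sum does too, and the amplitudes a_n and b_n are the parameters that remain to be determined.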
Suppose our function can be decomposed as follows
Leaving the complex plane aside, this should be the imaginary part of the expression constructed with Euler's formula.
Obviously, as just described, e^{it} + e^{i2t} represents the sum of two vectors.
More generally, we can write expressions of function vectors.
The dot product of two function vectors f and g over one period t is defined like this: $\langle f, g \rangle = \int_0^t f(x)\, g(x)\, dx$.
According to the definition of function dot product, we can calculate the following formula ourselves.
So how do you get the coordinates of the corresponding function vector?
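Suppose w = a u + b v (w, u, v are all vectors, where u and v are orthogonal).
Because u · v = 0, taking the dot product of both sides with u gives w · u = a (u · u), so the coordinate a along the basis vector u can be obtained by the following formula: a = (w · u) / (u · u).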
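As a numerical check of this projection idea, the sketch below approximates the function dot product by a sum over one period and recovers coordinates with a = (w · u) / (u · u). The helper names `dot` and `coordinate`, the period 2π, and the example function w are assumptions made for illustration.

```python
import math

def dot(f, g, period=2 * math.pi, steps=10000):
    """Approximate the integral of f(x) * g(x) over one period by a Riemann sum."""
    dx = period / steps
    return sum(f(i * dx) * g(i * dx) for i in range(steps)) * dx

def coordinate(w, u):
    """Coordinate of w along the basis function u: (w . u) / (u . u)."""
    return dot(w, u) / dot(u, u)

def u(x):
    return math.sin(x)

def v(x):
    return math.sin(2 * x)

def w(x):
    # Assumed example: w = 3u + 0.5v
    return 3 * math.sin(x) + 0.5 * math.sin(2 * x)

print(abs(dot(u, v)) < 1e-9)        # True: u and v are orthogonal
print(round(coordinate(w, u), 6))   # 3.0
print(round(coordinate(w, v), 6))   # 0.5
```

Because sin(x) and sin(2x) are orthogonal over the period, each coordinate can be read off independently of the other.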
The coordinates of the vector function sin(x) should be
Now, let's review the previous assumptions.
We can rewrite it like this (sine and cosine function vectors of different frequencies are mutually orthogonal over a common period).
We can get another form of f(x)
In ...
We can get the discrete form of f(x), which represents the superposition of p waves of different frequencies, where ω is the unit frequency shared by all the waves and determines the effect of the superposition, ω = 2π/N.
For n = 1, 2, 3, ..., p, let
Because the unit frequency is 2π/N, there is such a special condition:
Here C_n for n = 0 is the DC component. All independent coefficients lie in the range n = 1, ..., p (i.e. 1, ..., (N-1)/2), and the coefficients for n = p+1, ..., N-1 are only their mirror images (for a real input signal, C_{N-n} is the complex conjugate of C_n).
So, generally speaking, we only need to find the coefficients of the first half.
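The symmetry is easy to confirm numerically. The following is a small sketch, assuming a real-valued input of odd length N = 7 with arbitrary sample values; `dft` is a hypothetical helper implementing the direct transform.

```python
import cmath
import math

def dft(x):
    """Direct O(N^2) DFT: X(k) = sum over n of x(n) * exp(-2*pi*i*k*n/N)."""
    N = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]

x = [0.5, 2.0, -1.0, 3.0, 0.0, -2.5, 1.0]  # arbitrary real samples
N = len(x)
p = (N - 1) // 2                            # last independent index
X = dft(x)

# For real input, X(N - k) is the complex conjugate of X(k).
symmetric = all(abs(X[N - k] - X[k].conjugate()) < 1e-9 for k in range(1, p + 1))
print(p, symmetric)  # 3 True
```

The coefficients at k = 4, 5, 6 carry no new information here, which is why only the first half is computed in practice.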
Now let's review our formula, which I believe is easy to understand.
This is consistent with the description in the paper: we only need the values of X(k) for k = 0, ..., N/2-1. Here X(k) is equivalent to C_n.
In fact, C_n is the coordinate, on the complex plane, of the wave with the corresponding frequency, and we can use it to calculate the amplitude of that wave.
A point C_n or X(k) of the spectrum after the DFT can be represented by a complex number a + bi. The modulus of this complex number is A_k = √(a * a + b * b), and the amplitude of the corresponding wave is 2 * A_k / N (for 0 < k < N/2).
The component with n = 0 has frequency 0. It usually appears as an overall offset, called the DC component, and its amplitude is A_0 / N.
Finally, because the DFT result is symmetric, we usually use only the first half of the result, that is, the components below half the sampling frequency. (According to the Nyquist theorem, only signal frequencies below half the sampling frequency can be recovered.)
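A short sketch of the amplitude calculation, using only the first half of the spectrum. It assumes the usual scaling for a real signal whose components complete a whole number of cycles in the frame: |X(0)| / N for the DC component and 2 * |X(k)| / N for 0 < k < N/2. The helper names and the test signal are illustrative.

```python
import cmath
import math

def dft(x):
    """Direct O(N^2) DFT: X(k) = sum over n of x(n) * exp(-2*pi*i*k*n/N)."""
    N = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]

def amplitudes(x):
    """Amplitude of each component for k = 0 ... N/2 - 1."""
    N = len(x)
    X = dft(x)
    amps = [abs(X[0]) / N]                                  # DC component
    amps += [2 * abs(X[k]) / N for k in range(1, N // 2)]   # other components
    return amps

# Assumed signal: offset 1.5 plus a sine of amplitude 2 with 4 cycles per frame.
N = 32
x = [1.5 + 2.0 * math.sin(2 * math.pi * 4 * n / N) for n in range(N)]
amps = amplitudes(x)
print(round(amps[0], 6), round(amps[4], 6))  # 1.5 2.0
```

If a component does not complete a whole number of cycles in the frame, its energy spreads over neighbouring values of k and this simple reading of the amplitude no longer holds exactly.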
From X(k) we can go on to derive the PCP, a 12-dimensional vector whose entries represent the 12 semitones of an octave on the piano.
Let p = 0, 1, 2, ..., 11; then we define PCP(p), formula (1), as follows:
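A minimal sketch of one widely used way to compute a PCP, assuming the definition in which each spectrum index l is mapped to a pitch class M(l) = round(12 * log2((f_s * l / N) / f_ref)) mod 12, with M(0) = -1, and PCP(p) sums |X(l)|^2 over all l with M(l) = p. The reference frequency f_ref (here 261.63 Hz, so that p = 0 is C), the helper names, and the test tone are assumptions for illustration.

```python
import cmath
import math

def dft(x):
    """Direct O(N^2) DFT: X(k) = sum over n of x(n) * exp(-2*pi*i*k*n/N)."""
    N = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]

def pitch_class(l, N, fs, f_ref):
    """M(l): pitch class of spectrum index l, or -1 for the DC component."""
    if l == 0:
        return -1
    return round(12 * math.log2((fs * l / N) / f_ref)) % 12

def pcp(x, fs, f_ref=261.63):
    """Sum |X(l)|^2 into 12 pitch classes, using l = 1 ... N/2 - 1."""
    N = len(x)
    X = dft(x)
    profile = [0.0] * 12
    for l in range(1, N // 2):
        profile[pitch_class(l, N, fs, f_ref)] += abs(X[l]) ** 2
    return profile

# Assumed test tone: 440 Hz (the note A) sampled at 4400 Hz, N = 100.
fs, N = 4400, 100
x = [math.sin(2 * math.pi * 440 * n / fs) for n in range(N)]
profile = pcp(x, fs)
strongest = max(range(12), key=lambda p: profile[p])
print(strongest)  # 9, i.e. A when p = 0 is C
```

The frame here is far too short for real music (44 Hz between neighbouring indices cannot separate low semitones); it is chosen only so that the direct DFT runs quickly.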
The paper selects 27 chord types, and the template PCP for each is obtained through the above process with manual adjustment.
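The matching step can be sketched as follows. This is not the paper's template set: it assumes two hypothetical binary templates (major and minor triads) and uses Euclidean distance between normalized vectors as the score, which is one simple choice among several.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Hypothetical chord types, given as semitone intervals above the root.
CHORD_TYPES = {"maj": [0, 4, 7], "min": [0, 3, 7]}

def template(root, intervals):
    """Binary 12-dimensional template for a chord type on a given root."""
    t = [0.0] * 12
    for interval in intervals:
        t[(root + interval) % 12] = 1.0
    return t

def normalize(v):
    """Scale v to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm > 0 else list(v)

def match_chord(pcp):
    """Return (root name, chord type) of the template closest to the PCP."""
    p = normalize(pcp)
    best = None
    for root in range(12):
        for name, intervals in CHORD_TYPES.items():
            t = normalize(template(root, intervals))
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(p, t)))
            if best is None or dist < best[0]:
                best = (dist, NOTE_NAMES[root], name)
    return best[1], best[2]

# Assumed PCP with energy on A, C and E.
example = [0.0] * 12
for pitch, energy in [(9, 1.0), (0, 0.8), (4, 0.9)]:
    example[pitch] = energy
print(match_chord(example))  # ('A', 'min')
```

Shifting one interval pattern through all 12 roots means only one template per chord type has to be designed; hand-tuned templates such as the paper's would replace the binary vectors here.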
The paper describes several techniques for improving the matching performance:
1. PCP smoothing
2. Chord change detection
3. PCP preprocessing
4. Eliminate the irrelevant areas in M(l)
5. Use a DFT window
6. Silence detection
7. Noise detection
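Two of these items are simple enough to sketch. The code below is only one possible reading, since the exact procedures are not given here: smoothing is assumed to be an average over the most recent PCP frames, and a frame is assumed to be silent when its total energy falls below a threshold. The function names, window width, and threshold are all hypothetical.

```python
def smooth_pcps(frames, width=3):
    """Average each PCP frame with up to width - 1 preceding frames."""
    smoothed = []
    for i in range(len(frames)):
        window = frames[max(0, i - width + 1): i + 1]
        smoothed.append([sum(column) / len(window) for column in zip(*window)])
    return smoothed

def is_silent(frame, threshold=1e-3):
    """Treat a frame as silent when its total energy is below the threshold."""
    return sum(frame) < threshold

# Assumed example: energy on pitch class 0 drops out for a single frame.
on = [1.0] + [0.0] * 11
off = [0.0] * 12
frames = [on, off, on]

smoothed = smooth_pcps(frames, width=2)
print([frame[0] for frame in smoothed])  # [1.0, 0.5, 0.5]
print(is_silent(off), is_silent(on))     # True False
```

Averaging keeps a brief dropout from emptying the profile, so a single bad frame is less likely to flip the matched chord.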