Blind Single-Channel Music Source Separation by Non-Negative Matrix Factorization Using Perceptual Filtering
Abstract
We propose a new approach that improves the perceptual quality of the
separated sources in blind single-channel musical source separation. It
exploits the advantages of subspace learning based on Non-negative Matrix
Factorization (NMF), in which the bases represent notes. Weighted
Kullback-Leibler (KL) and Itakura-Saito (IS) divergence cost functions are
formulated by incorporating the PEAQ auditory model defined in ITU-R BS.1387
into the source separation. The proposed perceptually weighted factorization
scheme is integrated into Non-negative Matrix Factor 2-D Deconvolution (NMF2D)
and Clustered Non-negative Matrix Factorization (CNMF) to overcome the source
clustering problem encountered in under-determined source separation. The
resulting perceptually weighted NMF schemes, named PW-NMF2D and PW-CNMF, are
shown to learn the bases efficiently, which enables a simple resynthesis of the
musical sources based on the temporal model stored in the encoding matrix.
Source separation performance is reported on musical mixtures, where an
improvement of 1-2 dB is achieved in terms of SDR, SIR and SAR. Performance is
also evaluated with perceptual measures, yielding an improvement of 2-5 in the
OPS, TPS, IPS and APS values. Comparison with state-of-the-art methods
illustrates that PW-NMF2D and PW-CNMF constitute promising alternatives for
single-channel blind source separation.
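To make the idea of a weighted KL cost concrete, the following is a minimal sketch of plain NMF with per-bin weights and multiplicative updates. It is not the PW-NMF2D or PW-CNMF algorithm from the paper: the weight matrix `W` here is a hypothetical stand-in (a simple frequency-dependent ramp) for weights that would be derived from the PEAQ auditory model, the function names `weighted_kl` and `weighted_kl_nmf` are illustrative, and the input is random data rather than a music spectrogram.

```python
import numpy as np


def weighted_kl(V, L, W, eps=1e-12):
    """Weighted generalized KL divergence: sum of W * (V*log(V/L) - V + L)."""
    return float(np.sum(W * (V * np.log((V + eps) / (L + eps)) - V + L)))


def weighted_kl_nmf(V, W, rank, n_iter=200, seed=0, eps=1e-12):
    """Factorize V ~= B @ G under a weighted KL cost (illustrative sketch).

    V : non-negative magnitude spectrogram, shape (n_freq, n_frames)
    W : non-negative weights, same shape as V (assumed, not PEAQ-derived)
    B : bases (columns play the role of note spectra)
    G : encoding matrix (rows hold the temporal activations)
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    B = rng.random((n_freq, rank)) + eps
    G = rng.random((rank, n_frames)) + eps
    costs = []
    for _ in range(n_iter):
        # Multiplicative update of the bases with the encodings fixed.
        L = B @ G + eps
        B *= ((W * V / L) @ G.T) / (W @ G.T + eps)
        # Multiplicative update of the encodings with the bases fixed.
        L = B @ G + eps
        G *= (B.T @ (W * V / L)) / (B.T @ W + eps)
        costs.append(weighted_kl(V, B @ G, W))
    return B, G, costs


# Toy usage on random non-negative data (not real audio).
rng = np.random.default_rng(1)
V = rng.random((64, 40)) + 0.1
# Hypothetical weights: emphasize low-frequency bins over high-frequency ones.
W = np.tile(np.linspace(1.0, 0.2, 64)[:, None], (1, 40))
B, G, costs = weighted_kl_nmf(V, W, rank=4)
print("cost after first iteration:", costs[0])
print("cost after last iteration: ", costs[-1])
```

With all weights set to one, the updates reduce to the standard KL-divergence NMF rules. Larger weights make reconstruction errors in the corresponding time-frequency bins more costly, which is the mechanism a perceptual model would use to steer the factorization.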
Separation of Two Sources from a Single Observation
Test Files (Dataset 1)
Source Signals | Mixture | Separated Signals (PW-NMF2D) | Separated Signals (PW-CNMF) |
Performance (Dataset 1)
Methods | OPS | TPS | IPS | APS | SDR (dB) | SIR (dB) | SAR (dB) |
PW-NMF2D | 29.69 | 51.52 | 47.37 | 48.82 | 10.67 | 15.20 | 15.11 |
PW-CNMF | 27.90 | 51.24 | 51.75 | 43.67 | 10.61 | 17.06 | 14.68 |
Test Files (Dataset 2)
Source Signals | Mixture | Separated Signals (PW-NMF2D) | Separated Signals (PW-CNMF) |
Performance (Dataset 2)
Methods | OPS | TPS | IPS | APS | SDR (dB) | SIR (dB) | SAR (dB) |
PW-NMF2D | 26.75 | 43.73 | 42.33 | 50.56 | 12.38 | 15.88 | 18.45 |
PW-CNMF | 32.96 | 42.25 | 58.14 | 41.19 | 13.27 | 17.25 | 20.73 |
Separation of Three Sources from a Single Observation
Test Files (Dataset 1)
Source Signals | Mixture | Separated Signals (PW-NMF2D) | Separated Signals (PW-CNMF) |
Performance (Dataset 1)
Methods | OPS | TPS | IPS | APS | SDR (dB) | SIR (dB) | SAR (dB) |
PW-NMF2D | 26.09 | 27.67 | 37.91 | 31.35 | 4.33 | 6.67 | 11.16 |
PW-CNMF | 25.04 | 30.40 | 44.19 | 27.16 | 4.51 | 7.46 | 11.91 |