Blind Single-Channel Music Source Separation by Non-Negative Matrix Factorization Using Perceptual Filtering

Abstract

We propose a new approach that improves the perceptual quality of the separated sources in blind single-channel musical source separation. It exploits the advantages of subspace learning based on Non-negative Matrix Factorization (NMF), in which the bases represent notes. Weighted Kullback-Leibler (KL) and Itakura-Saito (IS) divergence cost functions are formulated by incorporating the PEAQ auditory model defined in ITU-R BS.1387 into the source separation. The proposed perceptually weighted factorization scheme is integrated into Non-negative Matrix Factor 2-D Deconvolution (NMF2D) and Clustered Non-negative Matrix Factorization (CNMF) to overcome the source clustering problem encountered in under-determined source separation. It is shown that the introduced perceptually weighted NMF schemes, named PW-NMF2D and PW-CNMF, efficiently learn bases that enable a simple resynthesis of the musical sources based on the temporal model stored in the encoding matrix. Source separation performance is reported on musical mixtures, where a 1-2 dB improvement is achieved in terms of SDR, SIR and SAR. Performance is also evaluated with perceptual measures, resulting in an improvement of 2-5 in the OPS, TPS, IPS and APS values. Comparison with state-of-the-art methods illustrates that PW-NMF2D and PW-CNMF constitute promising alternatives for single-channel blind source separation.
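The idea of a weighted KL cost can be illustrated with a minimal sketch. It is not the PW-NMF2D or PW-CNMF algorithm: it applies the standard multiplicative updates for weighted KL divergence to plain NMF, and it assumes a generic non-negative weight matrix `W` in place of the PEAQ-derived perceptual weights, which are not reproduced here. The function names, the random stand-in for the magnitude spectrogram `V`, and the frequency-dependent weights are all hypothetical.

```python
import numpy as np

def weighted_kl(V, L, W, eps=1e-12):
    """Weighted generalized KL divergence: sum_ij W_ij * d_KL(V_ij | L_ij)."""
    return float(np.sum(W * (V * np.log((V + eps) / (L + eps)) - V + L)))

def weighted_kl_nmf(V, W, rank, n_iter=200, seed=0, eps=1e-12):
    """Factorize V ~ B @ G under a weighted KL cost (multiplicative updates)."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    B = rng.random((n_freq, rank)) + eps    # bases (spectral templates)
    G = rng.random((rank, n_frames)) + eps  # encodings (temporal activations)
    costs = []
    for _ in range(n_iter):
        L = B @ G + eps
        B *= ((W * V / L) @ G.T) / (W @ G.T + eps)
        L = B @ G + eps
        G *= (B.T @ (W * V / L)) / (B.T @ W + eps)
        costs.append(weighted_kl(V, B @ G, W))
    return B, G, costs

# Hypothetical data: random non-negative matrix standing in for a spectrogram,
# and weights that emphasize low-frequency bins (an assumption, not PEAQ).
rng = np.random.default_rng(1)
V = rng.random((64, 40)) + 1e-3
W = np.linspace(1.0, 0.2, 64)[:, None] * np.ones((1, 40))

B, G, costs = weighted_kl_nmf(V, W, rank=4)
print("weighted KL cost, first and last iteration:", costs[0], costs[-1])
```

Setting `W` to all ones reduces the updates to ordinary KL-NMF, so the weights only change how strongly each time-frequency bin influences the learned bases.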

 

Separation of Two Sources from a Single Observation

Test Files (Dataset 1)

Source Signals      Mixture   Separated Signals (PW-NMF2D)   Separated Signals (PW-CNMF)
s1      s2          mix       es1      es2                   es1      es2
s1      s2          mix       es1      es2                   es1      es2
s1      s2          mix       es1      es2                   es1      es2
s1      s2          mix       es1      es2                   es1      es2
s1      s2          mix       es1      es2                   es1      es2
s1      s2          mix       es1      es2                   es1      es2
s1      s2          mix       es1      es2                   es1      es2
s1      s2          mix       es1      es2                   es1      es2
s1      s2          mix       es1      es2                   es1      es2
s1      s2          mix       es1      es2                   es1      es2
s1      s2          mix       es1      es2                   es1      es2
s1      s2          mix       es1      es2                   es1      es2
s1      s2          mix       es1      es2                   es1      es2

 

Performance (Dataset 1)

Methods     OPS     TPS     IPS     APS     SDR (dB)   SIR (dB)   SAR (dB)
PW-NMF2D    29.69   51.52   47.37   48.82   10.67      15.20      15.11
PW-CNMF     27.90   51.24   51.75   43.67   10.61      17.06      14.68

 

Test Files (Dataset 2)

Source Signals      Mixture   Separated Signals (PW-NMF2D)   Separated Signals (PW-CNMF)
s1      s2          mix       es1      es2                   es1      es2
s1      s2          mix       es1      es2                   es1      es2
s1      s2          mix       es1      es2                   es1      es2
s1      s2          mix       es1      es2                   es1      es2

 

Performance (Dataset 2)

Methods     OPS     TPS     IPS     APS     SDR (dB)   SIR (dB)   SAR (dB)
PW-NMF2D    26.75   43.73   42.33   50.56   12.38      15.88      18.45
PW-CNMF     32.96   42.25   58.14   41.19   13.27      17.25      20.73

 

 

Separation of Three Sources from a Single Observation

Test Files (Dataset 1)

Source Signals          Mixture   Separated Signals (PW-NMF2D)   Separated Signals (PW-CNMF)
s1      s2      s3      mix       es1      es2      es3          es1      es2      es3
s1      s2      s3      mix       es1      es2      es3          es1      es2      es3
s1      s2      s3      mix       es1      es2      es3          es1      es2      es3
s1      s2      s3      mix       es1      es2      es3          es1      es2      es3
s1      s2      s3      mix       es1      es2      es3          es1      es2      es3

 

Performance (Dataset 1)

Methods     OPS     TPS     IPS     APS     SDR (dB)   SIR (dB)   SAR (dB)
PW-NMF2D    26.09   27.67   37.91   31.35   4.33       6.67       11.16
PW-CNMF     25.04   30.40   44.19   27.16   4.51       7.46       11.91