UFLDL-PCA

This is the programming exercise for UFLDL (PCA and whitening). See the official tutorial site for the detailed instructions.

PCA

Whitening

Whitening is an important preprocessing step. Its goal is to reduce redundancy in the input data, so that the whitened data has the following properties: (i) the features are only weakly correlated with each other; (ii) all features have the same variance.

Whitening comes in two flavors, PCA whitening and ZCA whitening. Both decorrelate the features, and they differ as follows (a short sketch follows the list):

  1. PCA whitening makes the variance of every dimension equal to 1, while ZCA whitening only requires the variances to be equal across dimensions.
  2. PCA whitening can be used for dimensionality reduction as well as decorrelation, whereas ZCA whitening is mainly used for decorrelation.
  3. Compared with PCA whitening, ZCA whitening keeps the processed data closer to the original input.
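
A minimal MATLAB sketch of the two transforms, assuming x is a zero-mean data matrix with one sample per column and epsilon is a small regularization constant (the full exercise code below computes the same quantities step by step):

sigma = x * x' / size(x, 2);   % covariance estimate of the zero-mean data
[U, S, V] = svd(sigma);        % eigenvectors in U, eigenvalues on diag(S)
xPCAWhite = diag(1 ./ sqrt(diag(S) + epsilon)) * U' * x;  % PCA whitening
xZCAWhite = U * xPCAWhite;     % ZCA whitening: rotate back to pixel space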

Regularization

Exercise code (intended as a reference; try to complete it on your own first):

%% Step 0a: Load data
% Here we provide the code to load natural image data into x.
% x will be a 144 * 10000 matrix, where the kth column x(:, k) corresponds to
% the raw image data from the kth 12x12 image patch sampled.
% You do not need to change the code below.

x = sampleIMAGESRAW();
figure('name','Raw images');
randsel = randi(size(x,2),200,1); % A random selection of samples for visualization
display_network(x(:,randsel));

%%================================================================
%% Step 0b: Zero-mean the data (by row)
% You can make use of the mean and repmat/bsxfun functions.

% -------------------- YOUR CODE HERE --------------------
avg = mean(x, 1);                    % mean intensity of each patch (column)
x = x - repmat(avg, size(x, 1), 1);  % remove each patch's DC component
%%================================================================
%% Step 1a: Implement PCA to obtain xRot
% Implement PCA to obtain xRot, the matrix in which the data is expressed
% with respect to the eigenbasis of sigma, which is the matrix U.


% -------------------- YOUR CODE HERE --------------------
xRot = zeros(size(x)); % You need to compute this
U = zeros(size(x,1));
sigma = x*x'/size(x,2);
[U,S,V] = svd(sigma);
xRot = U'*x;
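% Note: sigma is symmetric positive semi-definite, so svd(sigma) returns its
% eigenvectors in U and its eigenvalues on the diagonal of S; each column of
% xRot is the corresponding patch expressed in that eigenbasis.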

%%================================================================
%% Step 1b: Check your implementation of PCA
% The covariance matrix for the data expressed with respect to the basis U
% should be a diagonal matrix with non-zero entries only along the main
% diagonal. We will verify this here.
% Write code to compute the covariance matrix, covar.
% When visualised as an image, you should see a straight line across the
% diagonal (non-zero entries) against a blue background (zero entries).

% -------------------- YOUR CODE HERE --------------------
covar = zeros(size(x, 1)); % You need to compute this
covar = xRot * xRot' / size(xRot, 2);
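% Since xRot = U'*x and U diagonalizes sigma, covar should equal S up to
% numerical error: eigenvalues on the diagonal, off-diagonal entries near zero.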
% Visualise the covariance matrix. You should see a line across the
% diagonal against a blue background.
figure('name','Visualisation of covariance matrix');
imagesc(covar);

%%================================================================
%% Step 2: Find k, the number of components to retain
% Write code to determine k, the number of components to retain in order
% to retain at least 99% of the variance.

% -------------------- YOUR CODE HERE --------------------
k = 0; % Set k accordingly
for k = 1:size(x,1)
    % proportion of variance retained by the first k components
    pov = sum(diag(S(1:k,1:k))) / sum(diag(S));
    if pov >= 0.99
        break
    end
end
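% An equivalent vectorized form (a sketch reusing the same S as above):
%   pov = cumsum(diag(S)) / sum(diag(S));
%   k = find(pov >= 0.99, 1);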

%%================================================================
%% Step 3: Implement PCA with dimension reduction
% Now that you have found k, you can reduce the dimension of the data by
% discarding the remaining dimensions. In this way, you can represent the
% data in k dimensions instead of the original 144, which will save you
% computational time when running learning algorithms on the reduced
% representation.
%
% Following the dimension reduction, invert the PCA transformation to produce
% the matrix xHat, the dimension-reduced data with respect to the original basis.
% Visualise the data and compare it to the raw data. You will observe that
% there is little loss due to throwing away the principal components that
% correspond to dimensions with low variation.

% -------------------- YOUR CODE HERE --------------------
xHat = zeros(size(x)); % You need to compute this
xTilde = U(:,1:k)' * x;    % reduced k-dimensional representation
xHat = U(:,1:k) * xTilde;  % map back to the original 144-dimensional basis
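% xHat is the projection of x onto the span of the top k eigenvectors,
% expressed back in the original pixel basis; by the choice of k above, the
% discarded directions carry at most 1% of the total variance.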

% Visualise the data, and compare it to the raw data
% You should observe that the raw and processed data are of comparable quality.
% For comparison, you may wish to generate a PCA reduced image which
% retains only 90% of the variance.

figure('name',['PCA processed images ',sprintf('(%d / %d dimensions)', k, size(x, 1)),'']);
display_network(xHat(:,randsel));
figure('name','Raw images');
display_network(x(:,randsel));

%%================================================================
%% Step 4a: Implement PCA with whitening and regularisation
% Implement PCA with whitening and regularisation to produce the matrix
% xPCAWhite.

epsilon = 0.1;
xPCAWhite = zeros(size(x));

% -------------------- YOUR CODE HERE --------------------
xPCAWhite = diag(1 ./ sqrt(diag(S) + epsilon)) * U' * x;
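% Each rotated coordinate is divided by sqrt(lambda_i + epsilon), giving the
% whitened features roughly unit variance; epsilon keeps near-zero eigenvalues
% from blowing up the corresponding (noise-dominated) directions.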

%%================================================================
%% Step 4b: Check your implementation of PCA whitening
% Check your implementation of PCA whitening with and without regularisation.
% PCA whitening without regularisation results a covariance matrix
% that is equal to the identity matrix. PCA whitening with regularisation
% results in a covariance matrix with diagonal entries starting close to
% 1 and gradually becoming smaller. We will verify these properties here.
% Write code to compute the covariance matrix, covar.
%
% Without regularisation (set epsilon to 0 or close to 0),
% when visualised as an image, you should see a red line across the
% diagonal (one entries) against a blue background (zero entries).
% With regularisation, you should see a red line that slowly turns
% blue across the diagonal, corresponding to the one entries slowly
% becoming smaller.
% -------------------- YOUR CODE HERE --------------------
covar = xPCAWhite*xPCAWhite'/size(xPCAWhite,2);
% Visualise the covariance matrix. You should see a red line across the
% diagonal against a blue background.
figure('name','Visualisation of covariance matrix');
imagesc(covar);

%%================================================================
%% Step 5: Implement ZCA whitening
% Now implement ZCA whitening to produce the matrix xZCAWhite.
% Visualise the data and compare it to the raw data. You should observe
% that whitening results in, among other things, enhanced edges.
epsilon = 0.1;
xZCAWhite = zeros(size(x));

% -------------------- YOUR CODE HERE --------------------
xZCAWhite = U * diag(1 ./ sqrt(diag(S) + epsilon)) * U' * x;
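% ZCA whitening is PCA whitening followed by a rotation back with U, which
% keeps the whitened data as close as possible to the original input while
% still decorrelating the features.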

% Visualise the data, and compare it to the raw data.
% You should observe that the whitened images have enhanced edges.
figure('name','ZCA whitened images');
display_network(xZCAWhite(:,randsel));
figure('name','Raw images');
display_network(x(:,randsel));