Visualising augmented images in Keras

Wed, 28 Dec 2022 00:00:00 +0000

Data augmentation

Data augmentation is used in deep learning for many reasons. One of them is to reduce overfitting and make the model more robust. Data augmentation is relatively easy to do with the keras package in R. However, I have not found any resources on how to visualise the augmented images in R, only in Python. Visualising the augmented images can be quite useful to get an idea of what they look like. So, this post covers a simple way to do this in R.

R code

Let’s load the keras library.

library(keras)

## Warning: package 'keras' was built under R version 4.2.2

Next, we load the image from the internet.

r_logo <- 
  get_file("img", "https://ih1.redbubble.net/image.522493300.6771/st,small,507x507-pad,600x600,f8f8f8.jpg") %>% 
  image_load()

Our image right now is 600 x 600 x 3. The 3 at the end is there because the image is coloured (RGB channels). Note that the size attribute below only reports the width and height.

r_logo$size

## [[1]]
## [1] 600
## 
## [[2]]
## [1] 600

So, we need to change the image into an array with dimensions 1 x 600 x 600 x 3. The leading 1 indicates that we have only one image.

r_logo <- 
  r_logo %>% 
  image_to_array() %>% 
  array_reshape(c(1, dim(.)))
dim(r_logo)

## [1]   1 600 600   3
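
To see what this reshape does without loading an image, here is a minimal base R sketch on a small made-up array (`toy` is hypothetical and stands in for the 600 x 600 x 3 image). Adding a leading dimension of length 1 leaves every pixel value where it was, which should match what array_reshape() returns in this particular case:

```r
# Hypothetical 2 x 2 x 3 "image": height x width x channels
toy <- array(1:12, dim = c(2, 2, 3))

# Prepend a batch dimension of length 1, giving 1 x 2 x 2 x 3
toy_batch <- toy
dim(toy_batch) <- c(1, dim(toy))

dim(toy_batch)

# The single image in the batch holds the same values as the original array
all(toy_batch[1, , , ] == toy)
```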

Once the array has the correct dimensions, we can specify the parameters for the data augmentation.

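augment_params <- image_data_generator(horizontal_flip = TRUE,  # random left-right flips
                                       vertical_flip = TRUE,    # random up-down flips
                                       rotation_range = 0.5,    # maximum rotation, in degrees
                                       zoom_range = 0.5,        # zoom factor drawn from [0.5, 1.5]
                                       fill_mode = "reflect")   # how new border pixels are filled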

I am not going to go into the details of the parameters. For those interested, the TensorFlow for R website explains them very well.

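Next, we create a generator that produces batches of randomly augmented data. The generator is lazy: no image is augmented until a batch is requested, for example during model fitting or when we ask for the next batch ourselves.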

img_gen <- flow_images_from_data(r_logo,
                                 generator = augment_params, 
                                 batch_size = 1)

Finally, we can plot the image. Firstly, this is our original image.

# img_gen$x holds the original, unaugmented array
img_gen$x[1,,,] %>% 
  as.raster(max = 255) %>% 
  as.array() %>% 
  plot()
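
The `max = 255` argument matters because as.raster() assumes values between 0 and 1 by default, whereas image_to_array() returns pixel intensities on a 0 to 255 scale. A small self-contained sketch with a made-up array (`fake_img` is hypothetical, not the R logo):

```r
# Hypothetical 4 x 4 RGB image with intensities on a 0-255 scale
fake_img <- array(0, dim = c(4, 4, 3))
fake_img[, , 1] <- 255          # red channel fully on everywhere
fake_img[1:2, , 3] <- 255       # blue channel on for the top two rows

# Tell as.raster() that 255 is the maximum intensity
r <- as.raster(fake_img, max = 255)

dim(r)     # one colour string per pixel
r[1, 1]    # top-left pixel: red plus blue
r[4, 1]    # bottom-left pixel: red only
plot(r)
```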

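Now, we are going to loop over the augmentation process to generate six augmented images. set.seed() is included for reproducibility, although the random transforms are drawn on the Python side, so the `seed` argument of flow_images_from_data() may also be needed to get identical images on every run.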

set.seed(123)
par(mfrow = c(3, 2), mar = c(1, 0, 1, 0))

for (i in 1:6) {
  IMG <- img_gen$`next`()
  IMG[1,,,] %>% as.raster(max = 255) %>% as.array() %>% plot()
}

Conclusion

I believe this is quite useful to get a sense of how your data is augmented. Consequently, this may help in selecting the parameters for the data augmentation.

Exponentially Weighted Average in Deep Learning

Sun, 09 May 2021 00:00:00 +0000

I have been reading about loss functions and optimisers in deep learning for the last couple of days, when I stumbled upon the term Exponentially Weighted Average (EWA). So, in this post I aim to explain my understanding of EWA.

Overview of EWA

EWA is an important concept in deep learning and has been used in several optimisers to smooth out noise in the data.

Let’s see the formula for EWA:
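
In its usual form, written with the symbols used in this post (β stands for B, and the starting value V_0 = 0 is an assumption made here for concreteness), the update is:

```latex
V_t = \beta V_{t-1} + (1 - \beta) S_t
```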

V_t is the smoothed value at point t, while S_t is the data point at point t. B is a hyperparameter that we need to tune in our network. The choice of B determines how many data points are effectively averaged to produce V_t, as shown below:
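
A minimal R sketch of this smoothing, assuming the common recursive form V_t = B * V_(t-1) + (1 - B) * S_t with V_0 = 0. The function name `ewa` and the toy series are made up for illustration. A widely quoted rule of thumb is that V_t averages over roughly 1 / (1 - B) recent points:

```r
# Exponentially weighted average of a numeric vector s (assumes V_0 = 0)
ewa <- function(s, beta) {
  v <- numeric(length(s))
  prev <- 0
  for (t in seq_along(s)) {
    prev <- beta * prev + (1 - beta) * s[t]
    v[t] <- prev
  }
  v
}

# Toy noisy series around a slow trend
set.seed(1)
s <- sin(seq(0, 3, length.out = 50)) + rnorm(50, sd = 0.2)

smooth_small_beta <- ewa(s, beta = 0.5)  # follows the data closely
smooth_large_beta <- ewa(s, beta = 0.9)  # smoother, but lags behind

# Rule-of-thumb number of points being averaged for each beta
beta <- c(0.5, 0.9, 0.98)
data.frame(beta = beta, approx_points = 1 / (1 - beta))
```

Because V_0 is set to 0 here, the first few smoothed values are pulled towards zero; optimisers such as ADAM correct for this with a bias-correction term.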

EWA in deep learning optimisers

Some of the optimisers that adopt EWA are listed below (a red box indicates the EWA part in each formula):

Stochastic gradient descent (SGD) with momentum

The issue with SGD is the presence of noise while searching for the global minimum. SGD with momentum integrates EWA, which reduces this noise and helps the network converge faster.
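
A rough sketch of the idea on a toy problem, assuming the EWA form of momentum (v = B * v + (1 - B) * gradient); some libraries use a slightly different scaling. The objective f(w) = w^2, the learning rate and all variable names are illustrative choices, not taken from the references:

```r
# Toy objective f(w) = w^2, whose gradient is 2 * w
grad <- function(w) 2 * w

w    <- 5      # starting point
v    <- 0      # EWA of past gradients (the "momentum" term)
beta <- 0.9    # EWA hyperparameter
lr   <- 0.1    # learning rate

for (step in 1:300) {
  g <- grad(w)
  v <- beta * v + (1 - beta) * g   # smooth the gradient with an EWA
  w <- w - lr * v                  # step along the smoothed gradient
}

w  # should end up very close to the minimum at 0
```

The gradient here is noise-free to keep the example deterministic; with mini-batch gradients, the same EWA line is what averages out the step-to-step noise.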

Adaptive delta (Adadelta) and Root Mean Square Propagation (RMSprop)

Adadelta and RMSprop were proposed in an attempt to solve the diminishing learning rate of the adaptive gradient (Adagrad) optimiser. The use of EWA in both optimisers is what helps to achieve this. Both optimisers have quite similar formulas, but attached below is the formula for Adadelta.
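
To illustrate the diminishing learning rate, the sketch below compares the two kinds of accumulator on a constant gradient of 1. It assumes the usual textbook forms: Adagrad sums the squared gradients, while RMSprop (and the first part of Adadelta) keeps an EWA of them. The values of `lr`, `beta` and `eps` are arbitrary:

```r
g    <- rep(1, 100)  # pretend the gradient is 1 at every step
lr   <- 0.1
beta <- 0.9
eps  <- 1e-8

# Adagrad: running sum of squared gradients, which only ever grows
adagrad_acc <- cumsum(g^2)

# RMSprop-style: EWA of squared gradients, which levels off
ewa_acc <- numeric(length(g))
s <- 0
for (t in seq_along(g)) {
  s <- beta * s + (1 - beta) * g[t]^2
  ewa_acc[t] <- s
}

# Effective step size applied to the gradient at each iteration
adagrad_step <- lr / (sqrt(adagrad_acc) + eps)
ewa_step     <- lr / (sqrt(ewa_acc) + eps)

# Adagrad's step keeps shrinking; the EWA version settles near lr
round(cbind(adagrad_step, ewa_step)[c(1, 10, 100), ], 4)
```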

Adaptive moment estimation (ADAM)

ADAM essentially combines SGD with momentum and Adadelta (it is more commonly described as combining momentum with RMSprop, which is closely related). As shown earlier, both optimisers use EWA.
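
A compact sketch of a single-parameter ADAM loop, following the commonly published update rule with bias correction. The toy objective f(w) = w^2 and the hyperparameter values are illustrative assumptions:

```r
grad <- function(w) 2 * w   # gradient of the toy objective f(w) = w^2

w     <- 5
m     <- 0       # EWA of gradients (the momentum part)
v     <- 0       # EWA of squared gradients (the RMSprop/Adadelta part)
beta1 <- 0.9
beta2 <- 0.999
lr    <- 0.1
eps   <- 1e-8

for (t in 1:300) {
  g <- grad(w)
  m <- beta1 * m + (1 - beta1) * g
  v <- beta2 * v + (1 - beta2) * g^2
  m_hat <- m / (1 - beta1^t)   # bias correction, since m and v start at 0
  v_hat <- v / (1 - beta2^t)
  w <- w - lr * m_hat / (sqrt(v_hat) + eps)
}

w  # should have moved from 5 towards the minimum at 0
```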

More details on EWA

Now, let’s go back to EWA. Here is an example of calculating EWA:

Keep in mind that t₃ is the latest time point, followed by t₂ and t₁, respectively. So, if we want to calculate V₃:
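
Assuming the recursive form V_t = B V_(t-1) + (1 - B) S_t with V_0 = 0, and writing β for B and S_1, S_2, S_3 for the three data points (S_3 being the latest), unrolling the recursion gives:

```latex
\begin{aligned}
V_1 &= (1 - \beta) S_1 \\
V_2 &= \beta V_1 + (1 - \beta) S_2 = (1 - \beta) S_2 + \beta (1 - \beta) S_1 \\
V_3 &= \beta V_2 + (1 - \beta) S_3 = (1 - \beta) S_3 + \beta (1 - \beta) S_2 + \beta^2 (1 - \beta) S_1
\end{aligned}
```

Each step back in time multiplies the weight by another factor of β, which is the pattern that the expression (1 - b) * b^k in the R code of this section reproduces.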

If we vary the value of B in the equation (while the values of a₁…a_n remain constant), we can see how the weights change, which is easy to do in R.

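library(tidyverse)

# Weight given to each of 20 time points (position 20 is the most recent)
func <- function(b) (1 - b) * b^((20:1) - 1)
beta <- seq(0.1, 0.9, by=0.2)

# One row per beta value, one column per time point
dat <- t(sapply(beta, func)) %>% 
  as.data.frame()
colnames(dat)[1:20] <- 1:20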

dat %>%  
  mutate(beta = as_factor(beta)) %>%
  pivot_longer(cols = 1:20, names_to = "data_point", values_to = "weight") %>% 
  ggplot(aes(x=as.numeric(data_point), y=weight, color=beta)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = 1:20) +
  labs(title = "Change of Exponentially Weighted Average function", 
       subtitle = "Time at t20 is the recent time, and t1 is the initial time") +
  scale_colour_discrete("Beta:") +
  xlab("Time(t)") +
  ylab("Weights/Coefficients") +
  theme_bw()

Note that t₂₀ is the most recent time point, and t₁ is the initial one. Thus, the two main points from the above plot are:

The EWA weights decay exponentially as we move back in time.
As beta (B) increases, the weights decay more slowly: less emphasis is placed on the most recent data point and more past points contribute to the average.
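
A quick numeric check of these weights, using the same expression as `func` above for two values of beta. This is only a sketch, and the helper name `ewa_weights` is made up here:

```r
# Weights for 20 time points; position 20 is the most recent
ewa_weights <- function(b, n = 20) (1 - b) * b^((n:1) - 1)

w_low  <- ewa_weights(0.1)
w_high <- ewa_weights(0.9)

# Weight on the most recent point equals 1 - beta
c(beta_0.1 = w_low[20], beta_0.9 = w_high[20])

# Weight on a point ten steps back: negligible for 0.1, still visible for 0.9
c(beta_0.1 = w_low[10], beta_0.9 = w_high[10])

# The weights sum to 1 - beta^n, i.e. close to 1 once n is large enough
c(beta_0.1 = sum(w_low), beta_0.9 = sum(w_high))
```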

Side note: I have tried to do the plot in plotly, not sure why it did not work 😕

References:
1) https://towardsdatascience.com/deep-learning-optimizers-436171c9e23f (all the equations are from this reference)
2) https://youtu.be/NxTFlzBjS-4
3) https://medium.com/@dhartidhami/exponentially-weighted-averages-5de212b5be46

Deep Learning | Tengku Hanis