Wednesday, November 18, 2015

Graphical models


In this post, we are going to deal with the problem of image completion using millions of photographs from a huge database.

Given a target image to complete: 
  1. Find the most similar images to the target in a huge database.
  2. In every retrieved image, find the region most similar to the hole of the target.
  3. Using graphical models, automatically fine-tune the editing mask.
  4. Copy the region.
  5. Perform the Poisson editing correction.
  6. Choose the best result.

In particular, we are going to focus on the 3rd step. 

Once we have an image similar to the target, we place it into the hole of the incomplete image at its best placement, using seam finding and standard Poisson blending to smooth the transition between the patch and the hole.

The seam finding operation adapts the seam in such a way that some pixels of the existing image may not be preserved. The algorithm uses a relaxed constraint that allows the seam finding operation to remove some pixels from the existing image, but discourages cutting away too many pixels by penalizing them with a cost that increases with the distance from the hole.

The seam finding operation minimizes the gradient of the image difference along the seam:

C(L) = Σp Cd(p, L(p)) + Σ(p,q) Ci(p, q, L(p), L(q)),

where Cd(p, L(p)) is the unary cost of assigning a pixel p to a specific label L(p), and Ci(p, q, L(p), L(q)) is the cost of two adjacent pixels taking labels L(p) and L(q) respectively. The label of each pixel, L(p), has two values, patch and exist, depending on whether the pixel is taken from the match image or from the incomplete image. Thus, the value of Cd(p, exist) is a very large number for the missing pixels of the existing image. Likewise, Cd(p, patch) is a very large number for those pixels that do not belong to the hole region. 


The seam finding operation assigns a cost to each pixel taken from the incomplete image which increases with a pixel's distance from the hole. 

Ci(p, q, L(p), L(q)) assigns a cost based on the labels of immediately adjacent pixels. When the connected pixels p and q share the same label, L(p) = L(q), the cost is zero. Along the seam, when L(p) ≠ L(q), Ci(p, q, L(p), L(q)) is equal to the gradient of the SSD between the incomplete image and the match image at pixels p and q.


Then, this cost functional is minimized using Loopy Belief Propagation. The image is represented as a graph where each pixel is a node that can take two state values (patch, exist). The unary potential (or factor) of each node is given by Cd, while the edge potentials encode the relation between neighboring pixels (adjacent nodes) and are therefore given by Ci.
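To make the construction concrete, here is a minimal sketch (not the exact implementation used for these results) of min-sum loopy belief propagation on a 4-connected pixel grid with two labels. For simplicity it assumes a single pairwise cost table shared by every edge, whereas the real Ci varies per edge with the local SSD gradient; the function and variable names are ours.

import numpy as np

def loopy_bp_two_labels(unary, pairwise, n_iters=30):
    # Min-sum loopy belief propagation on a 4-connected pixel grid.
    #   unary    : (H, W, 2) array, unary[y, x, l] = Cd(p, l) for the labels (exist, patch)
    #   pairwise : (2, 2) array of Ci costs, assumed identical for every edge here
    #              (the real Ci varies per edge with the local SSD gradient)
    # Returns an (H, W) label map that approximately minimizes the total cost.
    H, W, L = unary.shape
    # msg[d] = message each pixel currently receives from its neighbour in direction d
    # d = 0: from above, 1: from below, 2: from the left, 3: from the right
    msg = np.zeros((4, H, W, L))
    shifts = [(1, 0), (-1, 0), (0, 1), (0, -1)]   # roll that moves a sender's message
                                                  # onto the pixel that receives it
    for _ in range(n_iters):
        belief = unary + msg.sum(axis=0)
        new_msg = np.empty_like(msg)
        for d, (dy, dx) in enumerate(shifts):
            # the sender excludes the message that came back from the receiver (index d ^ 1)
            out = belief - msg[d ^ 1]
            # minimize over the sender's label for every possible receiver label
            m = (out[..., :, None] + pairwise[None, None]).min(axis=2)
            m -= m.min(axis=-1, keepdims=True)    # normalize to keep values bounded
            new_msg[d] = np.roll(m, shift=(dy, dx), axis=(0, 1))   # wraps at the border (sketch)
        msg = new_msg
    return (unary + msg.sum(axis=0)).argmin(axis=-1)

With Cd set to a very large number for the invalid assignments described above, the returned label map directly gives the seam between the exist and patch regions.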






Tuesday, November 10, 2015

Poisson

Let S, a closed subset of R², be the image definition domain, and let Ω be a closed subset of S with boundary ∂Ω. Let f* be a known scalar function defined over S minus the interior of Ω, and let f be an unknown scalar function defined over the interior of Ω. Finally, let v be a vector field defined over Ω.
To interpolate f from f* over Ω, the simplest choice is the membrane interpolant, obtained by solving

min_f ∫∫_Ω |∇f|²  with f = f* on ∂Ω,

whose solution satisfies the Laplace equation Δf = 0 over Ω with the Dirichlet boundary condition f = f* on ∂Ω.

To avoid a blurred interpolation, we can modify the problem by introducing further constraints in the form of a guidance field v:

min_f ∫∫_Ω |∇f − v|²  with f = f* on ∂Ω.

We choose the guidance field directly as the gradient of the source image g:

v = ∇g.

So now, the solution to the problem is the Poisson equation with Dirichlet boundary conditions:

Δf = Δg over Ω,  with f = f* on ∂Ω.

We can discretize the Laplacian operator as

Δf(i, j) ≈ f(i+1, j) + f(i−1, j) + f(i, j+1) + f(i, j−1) − 4·f(i, j).



So, the solution can be discretized as follows: for every pixel p in Ω with 4-neighborhood Np,

4·f(p) − Σ_{q ∈ Np ∩ Ω} f(q) = Σ_{q ∈ Np ∩ ∂Ω} f*(q) + Σ_{q ∈ Np} ( g(p) − g(q) ).



Results

Given a source and a destination image, we want to copy the eyes and the mouth of the girl (source) into the Lena image (destination).

 

Gauss-Seidel scheme
The discretized model can be solved with the Gauss-Seidel iterative method. The result is shown below.
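As an illustration, here is a minimal sketch of such a Gauss-Seidel sweep for grayscale images, assuming a boolean mask that marks Ω and does not touch the image border; names are ours, not the original implementation.

import numpy as np

def poisson_gauss_seidel(dest, src, mask, n_iters=500):
    # Seamless cloning by Gauss-Seidel sweeps over the clone region.
    #   dest : destination image f* (2D array)
    #   src  : source image g, already aligned with dest (same shape)
    #   mask : boolean array, True inside the clone region Omega
    #          (assumed not to touch the image border)
    # Solves  Laplacian(f) = Laplacian(g)  inside Omega, with f = f* on the boundary.
    src = src.astype(float)
    f = dest.astype(float)
    ys, xs = np.nonzero(mask)
    for _ in range(n_iters):
        for y, x in zip(ys, xs):
            # discrete Laplacian of the source (the guidance term)
            lap_g = (src[y + 1, x] + src[y - 1, x] + src[y, x + 1] + src[y, x - 1]
                     - 4.0 * src[y, x])
            # neighbours: current estimate inside Omega, known destination values outside
            neigh = f[y + 1, x] + f[y - 1, x] + f[y, x + 1] + f[y, x - 1]
            f[y, x] = (neigh - lap_g) / 4.0
    return f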



Conjugate gradient method
The model problem described above can be expressed as a linear system of equations

A·f = b,

where f is the vector of unknown pixel values in the clone region. A is a matrix of size N × N, where N is the number of pixels in the clone region, defined as follows: A(i, i) = 4 for every pixel i of the clone region, A(i, j) = −1 if pixels i and j are 4-neighbors that both belong to the clone region, and A(i, j) = 0 otherwise.

The vector b contains the finite approximation of the source Laplacian for every pixel in the clone region. Moreover, for the pixels on the border of the clone region, the fixed values of their neighbors outside the region are added to the corresponding entries of b.

Once the matrix A and the vector b are defined, the conjugate gradient method is applied to solve the linear system of equations.
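A possible way to assemble the sparse system and hand it to an off-the-shelf conjugate gradient solver (here SciPy's), again assuming the clone region does not touch the image border; this is a sketch, not the original code.

import numpy as np
from scipy.sparse import lil_matrix, csr_matrix
from scipy.sparse.linalg import cg

def poisson_cg(dest, src, mask):
    # Solve A f = b over the clone region with the conjugate gradient method.
    # mask is a boolean array marking the clone region (not touching the border).
    src = src.astype(float)
    ys, xs = np.nonzero(mask)
    index = {(y, x): k for k, (y, x) in enumerate(zip(ys, xs))}
    n = len(index)

    A = lil_matrix((n, n))
    b = np.zeros(n)
    for (y, x), k in index.items():
        A[k, k] = 4.0
        # b starts as the discrete Laplacian of the source (the guidance term)
        b[k] = 4.0 * src[y, x] - (src[y + 1, x] + src[y - 1, x]
                                  + src[y, x + 1] + src[y, x - 1])
        for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
            if (ny, nx) in index:
                A[k, index[(ny, nx)]] = -1.0     # unknown neighbour inside the region
            else:
                b[k] += dest[ny, nx]             # known boundary value of the destination

    f, info = cg(csr_matrix(A), b)
    out = dest.astype(float)
    out[ys, xs] = f
    return out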


Conclusions

The results present a bit of blurring where the destination image has edges. This happens because seamless cloning blends the source and destination regions. The effect of this blending is more noticeable when the inserted object is close to another object. However, the use of mixed gradients prevents this undesirable effect.

Segmentation

The Chan-Vese model is defined as follows:

F(c1, c2, C) = μ·Length(C) + ν·Area(inside(C)) + λ1 ∫_{inside(C)} |f(x) − c1|² dx + λ2 ∫_{outside(C)} |f(x) − c2|² dx.


The first term controls the regularity by penalizing the length. The second term penalizes the enclosed area of C to control its size. The third and fourth terms penalize the discrepancy between f and the piecewise constant approximation u, which takes the value c1 inside C and c2 outside.

Let H be the Heaviside function and δ the Dirac mass, its distributional derivative:

H(t) = 1 if t ≥ 0, and 0 if t < 0,    δ(t) = (d/dt) H(t).

To be able to obtain δ, H is regularized as

Hε(t) = 1/2 · ( 1 + (2/π)·arctan(t/ε) ),

so δ is

δε(t) = ε / ( π·(ε² + t²) ),

where t = φ.

In the function H(φ) we can see the set enclosed by C. So, the length of C is obtained as the total variation of H(φ):

Length(C) = ∫ |∇H(φ(x))| dx = ∫ δ(φ(x)) |∇φ(x)| dx.

For fixed φ, the optimal values of c1 and c2 are the region averages:

c1 = ∫ f(x) H(φ(x)) dx / ∫ H(φ(x)) dx,    c2 = ∫ f(x) (1 − H(φ(x))) dx / ∫ (1 − H(φ(x))) dx.

Taking into account where H is defined, we can say that c1 is the average value of f in the set enclosed by C, whilst c2 is the average value of f over the outer points.
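In code, the two averages are one line each (assuming the regularized Heaviside has already been evaluated on φ):

import numpy as np

def region_averages(f, H_phi):
    # c1 = average of f inside C (where H(phi) is close to 1), c2 = average outside
    c1 = (f * H_phi).sum() / H_phi.sum()
    c2 = (f * (1.0 - H_phi)).sum() / (1.0 - H_phi).sum()
    return c1, c2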

The evolution of φ can be discretized in space as

∂φ_{i,j}/∂t = δε(φ_{i,j}) [ ∇x⁻( A_{i,j} ∇x⁺ φ_{i,j} ) + ∇y⁻( B_{i,j} ∇y⁺ φ_{i,j} ) − ν − λ1 (f_{i,j} − c1)² + λ2 (f_{i,j} − c2)² ],

where

A_{i,j} = μ / sqrt( η² + (∇x⁺ φ_{i,j})² + (∇y⁰ φ_{i,j})² )

and 

B_{i,j} = μ / sqrt( η² + (∇x⁰ φ_{i,j})² + (∇y⁺ φ_{i,j})² ),

with

∇x⁺ φ_{i,j} = φ_{i+1,j} − φ_{i,j},   the forward difference in the x dimension,

∇x⁻ φ_{i,j} = φ_{i,j} − φ_{i−1,j},   the backward difference in the x dimension and,

∇x⁰ φ_{i,j} = (φ_{i+1,j} − φ_{i−1,j}) / 2,   the central difference in the x dimension

(and analogously ∇y⁺, ∇y⁻ and ∇y⁰ in the y dimension). Here η is a small constant that prevents division by zero.



Knowing that φ^n is the function in our current iteration and φ^{n+1} is the function in the next one, we can discretize the time derivative as

∂φ/∂t ≈ (φ^{n+1} − φ^n) / dt.

It means that we can isolate the function in the next iteration as follows:

φ^{n+1}_{i,j} = [ φ^n_{i,j} + dt δε(φ^n_{i,j}) ( A_{i,j} φ^n_{i+1,j} + A_{i−1,j} φ^n_{i−1,j} + B_{i,j} φ^n_{i,j+1} + B_{i,j−1} φ^n_{i,j−1} − ν − λ1 (f_{i,j} − c1)² + λ2 (f_{i,j} − c2)² ) ] / [ 1 + dt δε(φ^n_{i,j}) ( A_{i,j} + A_{i−1,j} + B_{i,j} + B_{i,j−1} ) ].
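A compact sketch of one such update, with all neighbour values taken from the current iterate (a Jacobi-style variant of the scheme above) and replicated borders; parameter names follow the ones used below, but the code is ours.

import numpy as np

def chan_vese_step(phi, f, mu=1.0, nu=0.0, lam1=1.0, lam2=1.0,
                   eps=1.0, eta=0.01, dt=1e-2):
    # One update of the level set phi for the Chan-Vese model.
    H = 0.5 * (1.0 + (2.0 / np.pi) * np.arctan(phi / eps))   # regularized Heaviside
    delta = eps / (np.pi * (eps ** 2 + phi ** 2))            # regularized Dirac delta

    # region averages c1 (inside C) and c2 (outside C) for the current phi
    c1 = (f * H).sum() / H.sum()
    c2 = (f * (1.0 - H)).sum() / (1.0 - H).sum()

    # shifted copies of phi with replicated borders
    up    = np.pad(phi, ((1, 0), (0, 0)), mode='edge')[:-1, :]
    down  = np.pad(phi, ((0, 1), (0, 0)), mode='edge')[1:, :]
    left  = np.pad(phi, ((0, 0), (1, 0)), mode='edge')[:, :-1]
    right = np.pad(phi, ((0, 0), (0, 1)), mode='edge')[:, 1:]

    # forward and central differences (x = columns, y = rows)
    fwd_x, fwd_y = right - phi, down - phi
    cen_x, cen_y = (right - left) / 2.0, (down - up) / 2.0

    A = mu / np.sqrt(eta ** 2 + fwd_x ** 2 + cen_y ** 2)
    B = mu / np.sqrt(eta ** 2 + cen_x ** 2 + fwd_y ** 2)
    # the same coefficients evaluated at the previous column / row (A_{i-1,j}, B_{i,j-1})
    A_prev = np.pad(A, ((0, 0), (1, 0)), mode='edge')[:, :-1]
    B_prev = np.pad(B, ((1, 0), (0, 0)), mode='edge')[:-1, :]

    data = -nu - lam1 * (f - c1) ** 2 + lam2 * (f - c2) ** 2
    num = phi + dt * delta * (A * right + A_prev * left + B * down + B_prev * up + data)
    den = 1.0 + dt * delta * (A + A_prev + B + B_prev)
    return num / den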


Results

The parameters used in these results are the same unless something different is specified: λ1 = λ2 = 1, ε = 1, η = 0.01 and dt = 10^-2.
These first examples are obtained using μ = 1 and ν = 0. The videos show on the left how φ evolves, and on the right the curve C that it describes on the image f.

 

Effects of μ

The parameter μ is used to penalize the length of C. If μ is small, Length(C) is penalized less, so the curve can be longer and can fit the shapes of the input image more accurately. On the other hand, if μ is larger, the boundaries of the curve will be smoother.

In the videos below we can observe the results obtained using different values of μ.

The first case uses μ = 1 and ν = 0.5, the second case uses μ = 0.6 and ν = 0.5, and finally the third case uses μ = 0.3 and ν = 0.5. In the first case, as we penalize the length more, the curve cannot fit the shapes as well and, as a consequence, it is smoother. In the third case, as we penalize the length very little, the curve can expand more, and for this reason more shapes than the ones we want are selected. With an intermediate value, as in the second case, the result is better and the curve fits the shapes that we want.



Effects of ν

The parameter ν is used to control the size of C by penalizing the area inside it. When ν is too small (it can even take negative values), the boundary expands to fill the full domain; if it is too large, the boundary shrinks until it vanishes.

In the videos below we can observe the results obtained using different values of ν. In the first case, where μ = 0.5 and ν = 0, as the area is penalized less, the curve expands without fitting any shape. In the third case, as the area of C is penalized more, the curve shrinks without fitting any shape and finally vanishes. In the second case, with an intermediate value, the curve fits the shapes well.


When it doesn't work

When the mean value inside the curve and the mean value outside are the same (c1 = c2), φ doesn't change. As a consequence, the curve never fits the shapes.

In the video below we can see an example with an image whose mean value is the same everywhere. We can observe that, due to the length penalty, φ goes down, so the length of the curve decreases. Even so, the constants remain the same inside and outside, so the curve can't evolve towards any shape. It continues this way until the curve finally vanishes.


Tuesday, October 20, 2015

Image Denoising


Denoising Modelling

The goal is to apply functional optimization to denoise an image.

Image denoising is modelled with the following energy functionals, which have to be minimized:

Tikhonov:

E(u) = 1/2 ∫ |∇u(x)|² dx + 1/(2λ) ∫ (u(x) − f(x))² dx

Rudin, Osher and Fatemi (ROF):

E(u) = ∫ |∇u(x)| dx + 1/(2λ) ∫ (u(x) − f(x))² dx



For each denoising model (Tikhonov and ROF), the first integral computes the "smoothness" of the resulting image, and the second integral computes the "similarity" of the resulting image with the input image.

The gradient is computed following the Euler-Lagrange equations, which give us:

Tikhonov:  ∇E(u) = −Δu + (1/λ)(u − f)

ROF:  ∇E(u) = −div( ∇u / |∇u| ) + (1/λ)(u − f)





We will use this gradient to iteratively descend towards the value of u that minimizes the energy functional.

Discretization

The gradients are defined in continuous space. To compute them we have to discretize them: the spatial derivatives are approximated with finite differences and the time derivative with a forward difference of step dt,

∂u/∂t ≈ (u^{n+1} − u^n) / dt.

From this we can deduce an explicit iterative method for each model:

Tikhonov:

u^{n+1} = u^n + dt [ Δu^n − (1/λ)(u^n − f) ]

ROF:

u^{n+1} = u^n + dt [ div( ∇u^n / sqrt(|∇u^n|² + ε²) ) − (1/λ)(u^n − f) ]



This iterative method is the one implemented to find the u that minimizes the energy functional.
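A sketch of this explicit descent for both models, assuming (as in the functionals above) that the fidelity term is weighted by 1/λ so that a larger λ smooths more, and using replicated borders for the Neumann boundary condition; names and details are ours.

import numpy as np

def denoise(f, model='rof', lam=15.0, eps=1.0, dt=0.1, n_iters=500):
    # Explicit gradient descent for the Tikhonov and ROF denoising models.
    # The fidelity term is weighted by 1/lam, so a larger lam smooths more.
    f = f.astype(float)
    u = f.copy()
    for _ in range(n_iters):
        p = np.pad(u, 1, mode='edge')                   # Neumann boundary: replicate edges
        if model == 'tikhonov':
            smooth = (p[1:-1, 2:] + p[1:-1, :-2] +
                      p[2:, 1:-1] + p[:-2, 1:-1] - 4.0 * u)      # discrete Laplacian
        else:                                           # ROF: div( grad u / |grad u|_eps )
            ux = (p[1:-1, 2:] - p[1:-1, :-2]) / 2.0     # central differences
            uy = (p[2:, 1:-1] - p[:-2, 1:-1]) / 2.0
            norm = np.sqrt(ux ** 2 + uy ** 2 + eps ** 2)
            qx = np.pad(ux / norm, 1, mode='edge')
            qy = np.pad(uy / norm, 1, mode='edge')
            smooth = ((qx[1:-1, 2:] - qx[1:-1, :-2]) / 2.0 +
                      (qy[2:, 1:-1] - qy[:-2, 1:-1]) / 2.0)
        u = u + dt * (smooth - (u - f) / lam)
    return u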

Parameters

λ : Similarity weight


Lambda weighs the importance of the similarity term against the smoothness term. The greater the lambda, the smoother the image will be, as the similarity term is weighted less.

The value has to be chosen depending on the input image. For an input image with little noise, one is interested in retrieving a similar image and there is no need for heavy smoothing, so the chosen lambda should be low.

Some examples of how λ affects the end result:


As we can see, the greater the λ, the less noise there is in the image, but contours are also removed.

ε: Zero offset

Epsilon is a simple offset for the ROF model, as |∇u| is not differentiable at zero. The ε parameter restricts the minimum value that the smoothness term can reach. If ε is too small, the resulting image will remain noisy. If ε is too big, the resulting image will be too smooth and blurred, and the information of the contours will be completely lost.



dt: Discretization time

The discretization time is the step taken along the gradient between iterations. This step has to be small enough to ensure that the iterative algorithm reaches the minimum of the functional, but not too small or it will require many iterations to get there. If the step is too big, the iterative algorithm may diverge and never reach the minimal solution. This can be seen in the following image, which shows 3 results obtained for different values of dt.


Seeds

The minimum is independent of the initial seed: the functional is convex and there are no constraints, so the solution is unique. Therefore, the starting point only affects the number of iterations the method needs to reach the result.

In these examples we can see the effect of the seed, using Tikhonov with λ = 15 and dt = 0.1:


Energy

The energy decreases at every iteration because the iterative process follows the energy gradient. Each step reduces the energy of the functional until we reach the minimizing u or hit the iteration limit.

Here we can see an example of the energy evolution for Tikhonov, using λ = 15, dt = 0.1 and Neumann boundary conditions:




Boundary Conditions

With Dirichlet boundary conditions, where the boundary pixels have value 0, the edge pixels are greatly affected in the Tikhonov model, as the resulting image is highly smoothed. This smoothness makes the zero boundary values "smudge" into the image. The lambda factor can be used to lower the amount of smoothing and give more weight to the similarity to the image, reducing the black border that is generated.

Here we can see an example of how Dirichlet boundary conditions affect an image, using the Tikhonov model with λ = 150 and dt = 0.1:


Here we can see an example of how Dirichlet boundary conditions affect an image, using the ROF model with λ = 150 and dt = 0.1:





It is important to note that with the ROF model we do not see a black border. This is because the ROF model penalizes edges less, so the black contour created at the boundary is not smudged into the image.


With Neumann boundary conditions, where the boundary pixels take the same value as the adjacent edge pixels (mirror-like), the edge pixels are not affected by the boundary condition. This is expected, as the gradient at the borders is zero (and the gradient is the only term computed at the borders).
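In practice the two boundary conditions amount to two different padding choices before computing the finite differences; a tiny sketch:

import numpy as np

def pad_dirichlet(u):
    # Dirichlet: surround the image with zeros
    return np.pad(u, 1, mode='constant', constant_values=0.0)

def pad_neumann(u):
    # Neumann: replicate the edge pixels, so the normal derivative at the border is zero
    return np.pad(u, 1, mode='edge')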

Here we can see an example of how Neumann boundary conditions affect an image, using the Tikhonov model with λ = 150 and dt = 0.1:




Here we can see an example of how Neumann boundary conditions affect an image, using the ROF model with λ = 0.1 and dt = 10^-5:


Data Fidelity term

One could ask: "Why is the data fidelity term needed? If we want to denoise, don't we just need a smooth image?"


The data fidelity term makes sure that the computed image is "similar" to the noisy image. Together with the smoothing term, the resulting image is both similar and smooth. If the similarity term is not used, the result will only be smooth: a plain image with the average value of the seed image.

Conclusions

We can conclude that both methods, with either boundary condition, are not good enough for real-world problems. 
Tikhonov denoising generates images that are too smooth, greatly removing the high frequencies of the image and making it barely recognizable. 
ROF denoising correctly removes the noise, but at the cost of losing the intermediate grayscales, resulting in a staircase effect where segments of the image are represented by a single gray value, with large value differences between patches.






Image Inpainting

ROF Dual Formulation

We saw in the last entry the formulation of the Total Variation (TV):

TV(u) = ∫ |∇u(x)| dx


This formulation has two problems:
  1. It is not differentiable at the origin.
  2. It is not differentiable if u has discontinuities.
Therefore we need another formulation for |∇u|. One definition suggests:

|∇u| = max_{|p| ≤ 1} p · ∇u,

where p is a vector of 2 components (for 2D images) which has the same direction as ∇u when |∇u| ≠ 0.


With this new definition, we can now use it in a new formulation of the ROF model, a dual formulation.

We define the ROF as follows:


Where


This means that we have two variables to optimize; therefore we fix the first and minimize with respect to the second, then fix the second and minimize with respect to the first.

From the Euler-Lagrange equations we see that, fixing p, the optimal u is:


Substituting this value into the original model and minimizing with respect to p, we obtain the gradient descent update for p:


In the discrete form:



This is the explicit algorithm we will use to find the minimum of the ROF model.
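For reference, here is a sketch of Chambolle's projection algorithm, which is the standard explicit scheme of this kind; the step size τ ≤ 1/8 and the sign conventions are the usual ones for that algorithm and may differ in detail from the equations of this post.

import numpy as np

def grad(u):
    # forward differences; the last row/column gets a zero gradient (Neumann on u)
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:, :-1] = u[:, 1:] - u[:, :-1]
    gy[:-1, :] = u[1:, :] - u[:-1, :]
    return gx, gy

def div(px, py):
    # discrete divergence, the negative adjoint of grad above
    d = np.zeros_like(px)
    d[:, 0] += px[:, 0]
    d[:, 1:-1] += px[:, 1:-1] - px[:, :-2]
    d[:, -1] -= px[:, -2]
    d[0, :] += py[0, :]
    d[1:-1, :] += py[1:-1, :] - py[:-2, :]
    d[-1, :] -= py[-2, :]
    return d

def rof_dual(f, lam=0.1, tau=0.125, n_iters=200):
    # Chambolle's projection algorithm; tau <= 1/8 guarantees convergence.
    # The denoised image is u = f - lam * div(p).
    f = np.asarray(f, dtype=float)
    px = np.zeros_like(f)
    py = np.zeros_like(f)
    for _ in range(n_iters):
        u = f - lam * div(px, py)
        gx, gy = grad(u)
        denom = 1.0 + (tau / lam) * np.sqrt(gx ** 2 + gy ** 2)
        px = (px - (tau / lam) * gx) / denom
        py = (py - (tau / lam) * gy) / denom
    return f - lam * div(px, py)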


Inpainting Modelling

The goal is to apply functional optimization to inpaint an image.


The inpainting model can be defined as:

min_u ∫ |∇u(x)| dx   subject to   u(x) = f(x) for every known pixel x,

where u is the image to be computed, and f is the input image with some missing parts to be inpainted. This model states that the resulting image equals f at the known pixels (in particular at the borders of the missing region), and takes the TV-minimizing values at the missing pixels.

We can reformulate it, again, as a dual model:


where we compute the image u as the solution of the ROF dual model using v as input (and vice versa), and v is then used to recompose the image where the information is known, taking as source the input image f to be inpainted.

This again leads to a three-step iterative approach.

1. Fixing v, and minimizing


2. Then fixing u and minimizing


Which, as u is constant, becomes:


And by simple inspection we can see that the minimum is when u = v.


3. The last step is to restore the computed image v with the known information from f: we fill the known pixels in v with the corresponding pixels from f.
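Putting the three steps together, here is a sketch of the outer loop (reusing the rof_dual sketch from the previous section; known_mask marks the pixels whose value in f is valid, and the names are ours):

import numpy as np

def tv_inpaint(f, known_mask, lam=0.1, n_outer=200, n_inner=5):
    # Alternating TV inpainting: ROF-solve, copy, restore the known pixels.
    #   f          : input image whose missing pixels contain arbitrary values
    #   known_mask : boolean array, True where the value in f is valid
    v = np.asarray(f, dtype=float).copy()
    for _ in range(n_outer):
        u = rof_dual(v, lam=lam, n_iters=n_inner)   # 1. fix v, solve the ROF dual step
        v = u                                       # 2. the coupling term is minimal at v = u
        v[known_mask] = f[known_mask]               # 3. restore the known information from f
    return v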


Parameters

λ : Similarity weight

Lambda weighs the importance of the similarity term against the smoothness term. The greater the lambda, the smoother the image will be, as the similarity term is weighted less.

Trying out values for lambda, we conclude that it has to be less than 0.1 to get a reasonably good, non-blurred image.




Boundary Conditions

The boundary conditions in this model are derived from the internal models. We know that the best boundary condition for the ROF model is Neumann; from this we can derive that the best condition for p is Dirichlet, because p is the normalized ∇u, and at the borders the gradient should be zero under Neumann conditions on u.


Seed

The seed that we are using in the inpainting model is the input image f.

We can deduce this from the equation 

where u in the initial iteration is equal to f, since p is (0, 0) (Dirichlet conditions) and div(0, 0) = 0.

Differences between primal and dual






Minimizing with respect to v

As commented in the introduction:

Fixing u and minimizing


Which, as u is constant, becomes:


And by simple inspection we can see that the minimum is when u = v.

Multichannel extension of Total Variation

The vectorial TV norm for multichannel images, such as RGB images, can be defined as:

TV(u) = Σ_c ∫ |∇u_c(x)| dx,   c ∈ {R, G, B},

where u and p are vector-valued functions in which each component corresponds to one of the three RGB channels. There are many ways to define the vectorial TV norm; in this post, we define it as the sum of the scalar TV of each channel taken independently.

Given this formulation, the optimization process for the multichannel extension is equivalent to optimizing each component separately as a scalar function. Thus, the equations for solving this problem are the same ones mentioned earlier.
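A sketch of the channel-wise extension, reusing the tv_inpaint sketch above:

import numpy as np

def tv_inpaint_rgb(f, known_mask, lam=0.1):
    # Channel-wise TV inpainting: each RGB channel is an independent scalar problem.
    out = np.empty(f.shape, dtype=float)
    for c in range(f.shape[2]):
        out[..., c] = tv_inpaint(f[..., c], known_mask, lam=lam)
    return out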

Since the three channels are considered independent, the results of the inpainting process are not as good as they could be.

Here is the result obtained from the optimization process. As we can see, the minimization needs more iterations to completely remove the red text, although the text is partially removed and the red holes have been blurred a little.


Examples with the given images

Here we can see some examples of the algorithm applied to the given images. These results are obtained using λ = 0.1 and 200 iterations.


Conclusions

We can conclude that the more damaged the original image is, the less detail the resulting image has and the more iterations are needed to reach the minimum. For instance, pictures 3 and 4 in the image above need more iterations because they have lost more information from the original image.

Moreover, the quality of the restored image depends on how the lost information is spread across the image. If the image has big holes, it will be more difficult to restore it completely, and the inpainting result will have big homogeneous areas where there was no information.