Confidence score #242

Open
JonasFrey96 opened this issue Feb 4, 2023 · 4 comments

@JonasFrey96
Collaborator

[attached image]
The current confidence computation makes a lot of sense when we have a bimodal loss distribution, but makes little sense when we don't have one.

@JonasFrey96
Collaborator Author

Maybe we can improve this. Visualizing this plot would be nice.

@JonasFrey96
Collaborator Author

JonasFrey96 commented Feb 17, 2024

I checked the confidence generation:
In update_running_mean:

# The confidence used to be computed as the distance to the center of the Gaussian given factor*sigma
# This is certainly wrong, given that the Gaussian is simply the wrong function for the job
confidence = torch.exp(-(((x - self.mean) / (self.std * self.std_factor)) ** 2) * 0.5)
confidence[x < self.mean] = 1.0
# My suggestion is the following - we define the 0.5 confidence value to be at self.mean + self.std*self.std_factor
# And then we use two other points e.g. plus and minus two std to define the 1 confidence and 0 confidence points


# shifted_mean = self.mean + self.std*self.std_factor
# interval_min = shifted_mean - 2 * self.std
# interval_max = shifted_mean + 2 * self.std
# x = torch.clip(x, interval_min, interval_max)
# confidence = 1 - ((x - interval_min) / (interval_max - interval_min))
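
A runnable sketch of that suggestion (the function name linear_confidence is hypothetical; x, mean, std and std_factor are the same quantities used in update_running_mean):

import torch

def linear_confidence(x, mean, std, std_factor):
    # 0.5 confidence at shifted_mean = mean + std * std_factor,
    # 1.0 confidence two std below that point, 0.0 confidence two std above it
    shifted_mean = mean + std * std_factor
    interval_min = shifted_mean - 2 * std
    interval_max = shifted_mean + 2 * std
    x = torch.clip(x, interval_min, interval_max)
    return 1 - (x - interval_min) / (interval_max - interval_min)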

In inference_without_update, which is used for visualization, we used to do something completely different.

@JonasFrey96
Collaborator Author

Still, the problem would remain that when we start training, the confidence is very high everywhere and only gets smaller for some regions over time; therefore the traversability estimate is initially over-optimistic.

@mmattamala
Collaborator

mmattamala commented Feb 17, 2024

I'm not sure I understood the comment above. Was it mainly about the fact that we compute the confidence in different ways in the confidence generator used in the training loop, versus the one that generates the published messages?

I believe we should rethink this to make it more principled. I think that many things we tried out for the paper were mostly driven by wanting to make the system work (subject to the deadline constraint).

Some general agreements we have discussed:

I'm thinking that maybe we should go back to basics before getting crazy with the formulation. I'll use loss for the loss, c for the confidence, and t for time. Let's use the reference image we used in the paper:
[figure from the paper: histograms of the reconstruction loss for positive and unknown samples]

What we know:

  • We are bootstrapping the reconstruction error to get a confidence
  • That means that low reconstruction error should imply more confidence. This should be true independently of the fact that the loss distribution changes over time.

Simple approach

I would propose that:

  • We "hardcode" the confidence c=1.0 at loss=0.0.
  • We define the "cutoff point" c=0.0 at loss=(mean of the loss distribution at t=0.0). In the figure, we should set it at loss=5.0.
  • Then, we define a function that interpolates between these values; it can be linear (as you did at the beginning), half Gaussian (as we have now), or a sigmoid. A linear version is sketched after this list.
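
A minimal sketch of this simple approach with a linear interpolation (the function name simple_confidence and the argument loss_cutoff are hypothetical; loss_cutoff would be the mean of the loss distribution at t=0.0, roughly 5.0 in the figure):

import torch

def simple_confidence(loss, loss_cutoff):
    # c = 1.0 at loss = 0.0, c = 0.0 at loss >= loss_cutoff, linear in between
    return torch.clip(1.0 - loss / loss_cutoff, 0.0, 1.0)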

Pros: This definition should ensure that at t=0.0 we will get low confidence, and it does not need to explicitly label the positive and unknown samples, because the confidence model gets fixed at the beginning. No need for running means or Kalman filters. We don't need to set an adaptive threshold either; we just rely on the initial condition.

Cons: The initial loss distribution (positive + unknown) could change. The plot from the paper shows it doesn't change that much (the mean of the grey histogram stayed centered at loss=5.0). But if we implemented this plot as a debug visualization, we could confirm whether this is the case.

Adaptive approach

A next iteration would be to make the threshold adaptive, as we did. The main trend we should expect is that as we collect more data, we will become more and more conservative about what we feel confident about, but our assessment of the unknown things will not change.

  • Probably the best way would be to fit 2 models as you tried. One for the positive samples and another for the unknown ones. The loss histogram should be normalized to express a true distribution.
  • We could use a GMM, but ideally it should be two Poisson or Gamma distributions, because they are always positive, unlike Gaussians, which will do weird things around loss=0 (like underestimating the distribution, and then the sigma value).
  • Then, we could define the cutoff point as the midpoint between the peaks of both distributions p_positive, and p_unknown: c=0.0 at loss=0.5*(p_positive + p_unknown)
    • At the beginning, both peaks should coincide, so the cutoff stays far from zero. In the example, it should be around loss=5.0, same as the simple case. Then, the confidence for all the samples should be low as intended.
    • As we start training and the positive samples' distribution shifts towards zero, the cutoff point should slowly move, ideally converging at loss=2.5 if the unknown samples' distribution stays the same.
  • If we want to have more control over the cutoff point, we could instead define the cutoff at loss=alpha*p_positive + (1-alpha)*p_unknown. But perhaps this adds unnecessary complexity. (A sketch of the midpoint version, alpha=0.5, follows after this list.)
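
A rough sketch of this adaptive cutoff, assuming we fit Gamma distributions with scipy (the function name adaptive_cutoff and the pos_losses / unk_losses split are hypothetical; it uses alpha=0.5):

from scipy.stats import gamma

def adaptive_cutoff(pos_losses, unk_losses):
    # fit a Gamma distribution to each set of losses; the strictly positive
    # support avoids the issues Gaussians have around loss = 0
    a_p, _, scale_p = gamma.fit(pos_losses, floc=0.0)
    a_u, _, scale_u = gamma.fit(unk_losses, floc=0.0)
    # the peak (mode) of Gamma(a, scale) is (a - 1) * scale for a > 1, else 0
    peak_positive = max(a_p - 1.0, 0.0) * scale_p
    peak_unknown = max(a_u - 1.0, 0.0) * scale_u
    # cutoff at the midpoint between both peaks (alpha = 0.5)
    return 0.5 * (peak_positive + peak_unknown)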

Crazy approach

Finally, we could get crazy about the confidence estimate using anomaly detection stuff. Right now we are learning the distribution of samples through the autoencoder, but we are not enforcing any structure on that distribution, which we could do.
Some brainstorming:

  • We could use a normalizing flow to explicitly enforce a Gaussian distribution of positive samples. This does not solve the cutoff point though.
  • We could make the reconstruction task "contrastive" to explicitly bring positive samples close to zero and push the unknown ones away. Here we would invert the logic: we could keep a fixed cutoff and hope for training to do the job of separating the distributions (a hinge-style sketch follows below).
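
One possible shape for that contrastive idea, as a hinge-style sketch (all names and the margin value are hypothetical, nothing we have implemented):

import torch

def contrastive_reconstruction_loss(rec_err_pos, rec_err_unk, margin=1.0):
    # pull the reconstruction error of positive samples towards zero,
    # push the error of unknown samples beyond a fixed margin
    pull = rec_err_pos.mean()
    push = torch.clamp(margin - rec_err_unk, min=0.0).mean()
    return pull + push

With something like this, the cutoff could stay fixed (e.g. somewhere below the margin) and we would rely on training to separate the two error distributions.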