I had the privilege of attending ICML 2019 with some colleagues last month, and I’ve started working through some of the papers that stood out to me. First on the docket: Combating Label Noise in Deep Learning Using Abstention. Key idea: when training a k-class classifier, give the network an extra (k+1)st "abstain" output so it can learn to opt out of confusing (often mislabeled) examples, with a penalty weight α controlling how eager it is to abstain.
I was hitting some issues with the very first formula in the paper (I’m new to research if that wasn’t obvious), and I wanted to see how the authors coded it up.
The formula

$$\mathcal{L}(x_j) = (1 - p_{k+1})\left(-\sum_{i=1}^{k} t_i \log \frac{p_i}{1 - p_{k+1}}\right) + \alpha \log \frac{1}{1 - p_{k+1}}$$

(where $p_i$ is the softmax output for class $i$, $p_{k+1}$ is the abstention probability, and $t_i$ is the one-hot label) gets translated to
```python
loss = (1. - p_out_abstain)*h_c - \
    self.alpha_var*torch.log(1. - p_out_abstain)
```
This seems okay (the rightmost term is written as $-\alpha \log(1 - p_{k+1})$ rather than $\alpha \log \frac{1}{1 - p_{k+1}}$, but those are equal), but what is `h_c`?
```python
h_c = F.cross_entropy(input_batch[:,0:-1], target_batch, reduce=False)
```
Huh? What happened to the $\frac{p_i}{1 - p_{k+1}}$ renormalization inside the cross entropy? The code just takes an ordinary cross entropy over the first $k$ logits.
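Before touching any algebra, I wanted to check numerically that the code's shortcut and the paper's formula give the same number. Here's a minimal pure-Python sketch of that check; all of the function and variable names here are mine, not from the authors' repo:

```python
import math

def softmax(zs):
    # numerically-stable softmax over a list of logits
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def dac_loss_paper(logits, target, alpha):
    """Loss exactly as the paper writes it: renormalize the first k
    probabilities by (1 - p_abstain) before taking cross entropy."""
    p = softmax(logits)          # k+1 probabilities, abstention last
    p_abstain = p[-1]
    h = -math.log(p[target] / (1.0 - p_abstain))
    return (1.0 - p_abstain) * h - alpha * math.log(1.0 - p_abstain)

def dac_loss_code(logits, target, alpha):
    """Loss as the released code computes it: plain cross entropy over
    the first k logits only (the h_c trick)."""
    p = softmax(logits)
    p_abstain = p[-1]
    # mirrors F.cross_entropy(input_batch[:,0:-1], target_batch)
    h_c = -math.log(softmax(logits[:-1])[target])
    return (1.0 - p_abstain) * h_c - alpha * math.log(1.0 - p_abstain)

logits = [2.0, -1.0, 0.5, 1.2]   # three class logits + the abstention logit
for t in range(3):
    a = dac_loss_paper(logits, t, alpha=0.3)
    b = dac_loss_code(logits, t, alpha=0.3)
    assert abs(a - b) < 1e-12
```

The two losses agree to floating-point precision for every target class, so the shortcut isn't a bug.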
Even after I was convinced, I had to chew on the proof for a bit. I revisited the problem after a long weekend off, and it’s pretty slick.
This means that

$$\frac{e^{z_i}}{\sum_{j=1}^{k} e^{z_j}} = \frac{e^{z_i} \big/ \sum_{j=1}^{k+1} e^{z_j}}{\sum_{j=1}^{k} e^{z_j} \big/ \sum_{j=1}^{k+1} e^{z_j}} = \frac{p_i}{1 - p_{k+1}},$$

so a plain softmax over the first $k$ logits already produces the renormalized probabilities, and `F.cross_entropy` on `input_batch[:,0:-1]` computes exactly $-\sum_i t_i \log \frac{p_i}{1 - p_{k+1}}$.
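The identity is easy to sanity-check in plain Python (again, the names here are my own):

```python
import math

def full_softmax(zs):
    exps = [math.exp(z) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

logits = [0.7, -0.2, 1.5, 0.4]      # three class logits + abstention logit last
p = full_softmax(logits)            # softmax over all k+1 outputs
p_sub = full_softmax(logits[:-1])   # softmax over just the first k outputs

# dividing by (1 - p_abstain) recovers the k-class softmax exactly
for i in range(len(p_sub)):
    assert abs(p[i] / (1.0 - p[-1]) - p_sub[i]) < 1e-12
```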
Pretty cool stuff, but definitely deserved a comment!
As mentioned above, it’s been a few years since I tried a proof (and to be honest, I don’t think I’ve ever worked with series this much - apologies to my Calc II teacher). Here are some tips for future me the next time I try this; maybe someone else can find them useful, too.
When I first saw `h_c`, I assumed it was a typo or a mistake in the calculation. It wasn’t until I’d shown for myself that the trick works in practice that I could set about demonstrating why it worked.