LookThem V8

I was curious: what if a token could look at other tokens without QKV? Instead, each interaction between two tokens (the current token and another token) transforms both tokens and divides one result by the other. Call the current token A and the other token B. The forward ratio is "transformA(A) / transformB(B)", where each transform is a linear layer, and tanh is applied for normalization (to keep the values from exploding). The reverse ratio is "transformA(B) / transformB(A)".

The forward ratio is multiplied with A and the reverse ratio with B. The two products are added and divided by 2, which gives the value for that one interaction. This value is added to a running sum, and the same is repeated for every other token (in the code this loop is vectorized). Finally, the sum is averaged over all interactions, and the result is the new A.
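The mechanism above can be sketched in plain Python with explicit loops, mirroring the description rather than the vectorized code. This is a minimal sketch under assumptions, not the actual LookThemLayer: tanh is applied to each element-wise ratio, both transforms map a d-dimensional token to d dimensions so the ratio can be multiplied element-wise with the token, the average includes the token's interaction with itself, and the denominator's magnitude is clamped to at least `eps=1e-3` (a guess at what the V8 clamp in the division part does). The helper names `linear`, `safe_div`, and `look_them` are hypothetical.

```python
import math
import random


def linear(weight, bias, x):
    """Plain linear layer y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def safe_div(num, den, eps=1e-3):
    """Division with |den| clamped to at least eps (sign kept).

    Assumption: a guess at the V8 "clamp in division part", not the real code.
    """
    if abs(den) < eps:
        den = eps if den >= 0 else -eps
    return num / den


def look_them(tokens, wa, ba, wb, bb, eps=1e-3):
    """One LookThem-style update over a list of d-dimensional token vectors.

    Hypothetical helper: transformA = (wa, ba), transformB = (wb, bb), both d -> d.
    """
    n = len(tokens)
    ta = [linear(wa, ba, t) for t in tokens]  # transformA(token) for every token
    tb = [linear(wb, bb, t) for t in tokens]  # transformB(token) for every token
    new_tokens = []
    for i, a in enumerate(tokens):
        acc = [0.0] * len(a)  # running sum of all interactions for token A
        for j, b in enumerate(tokens):  # assumption: includes j == i (A with itself)
            for k in range(len(a)):
                # forward ratio: tanh(transformA(A) / transformB(B))
                fwd = math.tanh(safe_div(ta[i][k], tb[j][k], eps))
                # reverse ratio: tanh(transformA(B) / transformB(A))
                rev = math.tanh(safe_div(ta[j][k], tb[i][k], eps))
                # multiply each ratio with its token, add, then divide by 2
                acc[k] += (fwd * a[k] + rev * b[k]) / 2
        new_tokens.append([v / n for v in acc])  # average over all interactions
    return new_tokens


if __name__ == "__main__":
    random.seed(0)
    d, n = 4, 3

    def rand_matrix(rows, cols):
        return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

    tokens = rand_matrix(n, d)
    wa, wb = rand_matrix(d, d), rand_matrix(d, d)
    ba, bb = [0.0] * d, [0.0] * d
    out = look_them(tokens, wa, ba, wb, bb)
    print(len(out), len(out[0]))  # 3 4
```

Because |tanh| < 1 and each interaction averages two tanh-scaled tokens, every output element in this sketch stays within the largest magnitude of that element across the input tokens, which is one way the tanh keeps values from exploding.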

Result:

  • Trained for ~14.5 epochs with the scheduler's T_MAX=20; train accuracy: to be added soon, hopefully
  • Test accuracy: 49.72%

Architectural changes compared to V7.6 include:

  • Fewer layers in streams A and B
  • Earlier striding in stream B
  • Added a Transformer-like stack
  • Added a clamp to the division step in LookThemLayer
  • Reduced file size
  • Improved training stability (maybe)

Dataset used to train ASomeoneWhoInterestedWithAI/LookThem_V8-ImageNet100
