Yaseen

YaseenHQ

AI & ML interests

None yet

Recent Activity

liked a model about 21 hours ago

yuxinlu1/gemma-4-12B-coder-fable5-composer2.5-v1-GGUF

reacted to eabdullin's post with 👍 about 2 hours ago

Folks, let me tell you, nobody — and I mean NOBODY — knew transformers before me. People said attention is all you need. I said, "Attention? I INVENTED attention." Everybody's looking at me. Tremendous attention. The best attention scores. My softmax? Perfectly normalized. Other people, sad, their probabilities don't even sum to one. Disaster.

I'm doing a PhD now. A PhD! In Large Language Models. Very large. The largest, believe me. My advisor said, "Sir, your model is overfitting." I said, "Wrong. It's fitting EXACTLY right. It memorized the training set because the training set is fantastic." We don't talk about validation loss in my lab. Validation loss is fake news.

And the internship — oh, the internship. Big tech. I won't say which. Starts with a letter. They BEGGED me. They said, "Please, we need someone who understands gradient descent." I said, "Descent? I only go UP. I'm gradient ASCENT. Loss goes up, that means it's learning to be a winner."

But the GPU cluster — this is the best part. Thousands of H100s. Maybe millions. Who's counting? I'm counting. It's a lot. Other PhD students, they get one little GPU, they're crying, they're training overnight like losers. Me? I burn through compute like nobody's ever seen. The electric company called. They said, "Sir, you've consumed a small country." I said, "Make it a big country. I only do big."

People ask, "Did your model converge?" Folks, it converged so hard. It converged BIGLY. Honestly? My loss curve, it's beautiful, it's going down, down, down — like my approval ratings, very smooth, don't look at the spikes, the spikes are deep state.

And hallucinations? My model doesn't hallucinate. It just has ALTERNATIVE tokens. Thank you, thank you. Tip your reviewers. Accept my paper. Goodnight!

reacted to eabdullin's post with 🤗 about 10 hours ago

I’m doing a PhD in AI, which sounds impressive until you realize it mostly means I spend three years trying to make a computer say something slightly less stupid than it said yesterday.

People hear "AI researcher" and they think I’m building the future. No. I’m in a basement at 2 a.m. Googling, "CUDA error what the f**k does this mean."

And the worst part about AI research now is compute. You don’t even ask, "Is this idea good?" anymore. You ask, "Can I afford for this idea to be wrong?"

My advisor comes to me one day and says, "I think we should fine-tune our own language model."

I said, "Professor, with what money? I’m a PhD student. I have two bank accounts: checking and emotionally checking."

He goes, "Don’t worry. We have compute."

Now, in academia, "don’t worry" is never the beginning of a good sentence.

I said, "What do you mean we have compute?"

He said, "My friend knows the cluster admin. He can get us on the GPUs."

I said, "Okay… what do we have to do?"

He goes, "Nothing crazy. Just be very grateful in the acknowledgements."

I said, "How grateful?"

He said, "Maybe put him as co-author."

I said, "Co-author? Are we using the cluster, or is the cluster using us?"

Because at that point, that’s not a favor. That’s academic child support.

So I go to the server room, and the cluster admin walks up to me and goes, "So you’re the NLP student."

And in my head I’m like, "No, tonight you’re the principal investigator. You’re the provider. I’m just a little token waiting to be attended to."

Because whoever controls the GPUs controls the relationship. That’s lab romance.

He starts setting things up, and I’m trying to act casual, but I don’t understand any of the numbers he’s saying.

He’s like, "Yeah, I can probably give you four H100s for the weekend."

I’m nodding like, "Mmm. Four. Weekend. H. One hundred. Absolutely."

Inside I’m like, "Is that good? Is that prison time? Why did he say it like he was offering me organs?"

[Continue in comments...]