Submitted by myownskyW7 · 94 · InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output · 27 authors · 5
Submitted by msadat97 · 24 · No Training, No Problem: Rethinking Classifier-Free Guidance for Diffusion Models · 4 authors · 1
Submitted by sunshine-lwt · 23 · TokenPacker: Efficient Visual Projector for Multimodal LLM · 7 authors · 276 · 4
Submitted by akhaliq · 21 · PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation · 4 authors · 43 · 5
Submitted by AaronXyl · 14 · DisCo-Diff: Enhancing Continuous Diffusion Models with Discrete Latents · 5 authors · 93 · 1
Submitted by chaoweihuang · 11 · Investigating Decoder-only Large Language Models for Speech-to-text Translation · 7 authors · 1
Submitted by iliashum · 9 · A False Sense of Safety: Unsafe Information Leakage in 'Safe' AI Responses · 5 authors · 1
Submitted by wzq016 · 8 · Eliminating Position Bias of Language Models: A Mechanistic Approach · 9 authors · 19 · 1