meta-private

Recent Activity

akhaliq

authored a paper about 2 months ago

BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution

Paper • 2510.08697 • Published Oct 9 • 35

mortimerp9

authored a paper 2 months ago

ARE: Scaling Up Agent Environments and Evaluations

Paper • 2509.17158 • Published Sep 21 • 35

akhaliq

posted an update 12 months ago

Post

45978

Google drops Gemini 2.0 Flash Thinking

a new experimental model that unlocks stronger reasoning capabilities and shows its thoughts. The model plans (with thoughts visible), can solve complex problems at Flash speeds, and more.

now available in anychat, try it out: https://huggingface.co/spaces/akhaliq/anychat

5 replies

elbayadm

authored 4 papers 12 months ago

Token-level and sequence-level loss smoothing for RNN language models

Paper • 1805.05062 • Published May 14, 2018

Efficient Wait-k Models for Simultaneous Machine Translation

Paper • 2005.08595 • Published May 18, 2020

Added Toxicity Mitigation at Inference Time for Multimodal and Massively Multilingual Translation

Paper • 2311.06532 • Published Nov 11, 2023

Large Concept Models: Language Modeling in a Sentence Representation Space

Paper • 2412.08821 • Published Dec 11, 2024 • 17

akhaliq

posted an update about 1 year ago

Post

45151

QwQ-32B-Preview is now available in anychat

A reasoning model that is competitive with OpenAI o1-mini and o1-preview

try it out: https://huggingface.co/spaces/akhaliq/anychat

1 reply

akhaliq

posted an update about 1 year ago

Post

5018

New model drop in anychat

allenai/Llama-3.1-Tulu-3-8B is now available

try it here: https://huggingface.co/spaces/akhaliq/anychat

akhaliq

posted an update about 1 year ago

Post

3787

anychat

supports ChatGPT, Gemini, Perplexity, Claude, Meta Llama, and Grok, all in one app

try it out here: https://huggingface.co/spaces/akhaliq/anychat

sravyapopuri388

authored a paper about 1 year ago

Law of the Weakest Link: Cross Capabilities of Large Language Models

Paper • 2409.19951 • Published Sep 30, 2024 • 54

sravyapopuri388

authored a paper over 1 year ago

Investigating Decoder-only Large Language Models for Speech-to-text Translation

Paper • 2407.03169 • Published Jul 3, 2024 • 11

mavlyutovrus

authored a paper over 1 year ago

Investigating Decoder-only Large Language Models for Speech-to-text Translation

Paper • 2407.03169 • Published Jul 3, 2024 • 11

hygong

authored a paper over 1 year ago

Investigating Decoder-only Large Language Models for Speech-to-text Translation

Paper • 2407.03169 • Published Jul 3, 2024 • 11

akhaliq

posted an update over 1 year ago

Post

21187

Phased Consistency Model

Phased Consistency Model (2405.18407)

The consistency model (CM) has recently made significant progress in accelerating the generation of diffusion models. However, its application to high-resolution, text-conditioned image generation in the latent space (a.k.a., LCM) remains unsatisfactory. In this paper, we identify three key flaws in the current design of LCM. We investigate the reasons behind these limitations and propose the Phased Consistency Model (PCM), which generalizes the design space and addresses all identified limitations. Our evaluations demonstrate that PCM significantly outperforms LCM across 1-16 step generation settings. While PCM is specifically designed for multi-step refinement, it achieves even superior or comparable 1-step generation results to previously state-of-the-art specifically designed 1-step methods. Furthermore, we show that PCM's methodology is versatile and applicable to video generation, enabling us to train the state-of-the-art few-step text-to-video generator.

radames

posted an update over 1 year ago

Post

7853

Thanks to @OzzyGT for pushing the new Anyline preprocessor to https://github.com/huggingface/controlnet_aux. You can now use the TheMistoAI/MistoLine ControlNet entirely within Diffusers.

Here's a demo for you: radames/MistoLine-ControlNet-demo
Super resolution version: radames/Enhance-This-HiDiffusion-SDXL

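from controlnet_aux import AnylineDetector
from PIL import Image  # needed for Image.open below

# Load the Anyline line-art preprocessor weights from the MistoLine repo
anyline = AnylineDetector.from_pretrained(
    "TheMistoAI/MistoLine", filename="MTEED.pth", subfolder="Anyline"
).to("cuda")

# Extract a line-art control image from the source picture
source = Image.open("source.png")
result = anyline(source, detect_resolution=1280)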

akhaliq

posted an update over 1 year ago

Post

21380

Chameleon

Mixed-Modal Early-Fusion Foundation Models

Chameleon: Mixed-Modal Early-Fusion Foundation Models (2405.09818)

We present Chameleon, a family of early-fusion token-based mixed-modal models capable of understanding and generating images and text in any arbitrary sequence. We outline a stable training approach from inception, an alignment recipe, and an architectural parameterization tailored for the early-fusion, token-based, mixed-modal setting. The models are evaluated on a comprehensive range of tasks, including visual question answering, image captioning, text generation, image generation, and long-form mixed modal generation. Chameleon demonstrates broad and general capabilities, including state-of-the-art performance in image captioning tasks, outperforms Llama-2 in text-only tasks while being competitive with models such as Mixtral 8x7B and Gemini-Pro, and performs non-trivial image generation, all in a single model. It also matches or exceeds the performance of much larger models, including Gemini Pro and GPT-4V, according to human judgments on a new long-form mixed-modal generation evaluation, where either the prompt or outputs contain mixed sequences of both images and text. Chameleon marks a significant step forward in a unified modeling of full multimodal documents.

radames

posted an update over 1 year ago

Post

7052

At Google I/O 2024, we're collaborating with the Google Visual Blocks team (https://visualblocks.withgoogle.com) to release custom Hugging Face nodes. Visual Blocks for ML is a browser-based tool that allows users to create machine learning pipelines using a visual interface. We're launching nodes with Transformers.js, running models in the browser, as well as server-side nodes running Transformers pipeline tasks and LLMs using our hosted inference. With @Xenova @JasonMayes

You can learn more about it here: https://huggingface.co/blog/radames/hugging-face-google-visual-blocks

Source code for the custom nodes:
https://github.com/huggingface/visual-blocks-custom-components

radames

posted an update over 1 year ago

Post

2165

AI-town now runs on Hugging Face Spaces with our API for LLMs and embeddings, including the open-source Convex backend, all in one container. Easy to duplicate and configure on your own.

Demo: radames/ai-town
Instructions: https://github.com/radames/ai-town-huggingface

9 replies

radames

posted an update over 1 year ago

Post

2654

HiDiffusion SDXL now supports Image-to-Image, so I've created an "Enhance This" version using the latest ControlNet Line Art model, MistoLine. It's faster than DemoFusion.

Demo: radames/Enhance-This-HiDiffusion-SDXL

Older version based on DemoFusion: radames/Enhance-This-DemoFusion-SDXL

New SDXL ControlNet that controls every line: TheMistoAI/MistoLine

HiDiffusion is compatible with diffusers and supports many SD models: https://github.com/megvii-research/HiDiffusion

1 reply

Team members 29
