| Points | Title | Date |
|---|---|---|
| 586 | Uncensor any LLM with abliteration | 2024-06-13 |
| 240 | Microsoft Phi-2 model changes licence to MIT | 2024-01-06 |
| 197 | Space secrets leak disclosure | 2024-06-01 |
| 181 | Best 7B LLM on leaderboards made by an amateur following a medium tutorial | 2024-01-05 |
| 168 | Llama 3 8B is almost as good as Wizard 2 8x22B | 2024-04-19 |
| 167 | Nvidia releases NVLM 1.0 72B open weight model | 2024-10-02 |
| 163 | Explaining the SDXL Latent Space | 2024-02-05 |
| 152 | Hugging Face and Google partner for AI collaboration | 2024-01-25 |
| 131 | A CC-By Open-Source TTS Model with Voice Cloning | 2024-11-04 |
| 127 | FineWeb: Decanting the web for the finest text data at scale | 2024-06-02 |
| 103 | HuggingChat: Chat with Open Source Models | 2024-02-21 |
| 95 | More than 80 AI models from Qualcomm | 2024-02-28 |
| 94 | LLaMA-Pro-8B | 2024-01-06 |
| 82 | Apple/OpenELM: Efficient Open-Source Family Language Models | 2024-04-24 |
| 75 | YouTube-Commons: Audio transcripts of 2,063,066 YouTube videos, CC-By license | 2024-04-18 |
| 66 | Show HN: Simply Reading Analog Gauges – GPT4, CogVLM Can't | 2024-01-22 |
| 58 | MSFT's WizardLM2 models have been taken down | 2024-04-16 |
| 54 | LiteLlama-460M-1T has 460M parameters trained with 1T tokens | 2024-01-07 |
| 52 | Fine-Tuning LLMs to 1.58bit | 2024-09-18 |
| 51 | LLaMA 3 70B Llamafiles | 2024-04-19 |
| 47 | Improving Parquet Dedupe on Hugging Face Hub | 2024-10-08 |
| 46 | Open-LLM performances are plateauing | 2024-06-29 |
| 33 | Mixtral-8x22B on HuggingFace | 2024-04-10 |
| 31 | General OCR Theory: Towards OCR-2.0 via a Unified End-to-End Model | 2024-09-11 |
| 30 | Zephyr 141B, a Mixtral 8x22B fine-tune, is now available in Hugging Chat | 2024-04-12 |
| 30 | OpenFLUX.1 | 2024-10-04 |
| 29 | Mistral 7B v0.2 | 2024-03-31 |
| 28 | Video2Game: Real-Time, Interactive, Realistic Environment from a Single Video | 2024-04-16 |
| 26 | Llama-3.2-3B-Instruct-uncensored | 2024-09-27 |
| 26 | Llama can now see and run on your device – welcome Llama 3.2 | 2024-09-25 |
| 25 | New Phi-3.5 Models from Microsoft, including new MoE | 2024-08-20 |
| 25 | LLM: Transformer Is Linear | 2024-05-24 |
| 425 | Llama-3.3-70B-Instruct | 2024-12-06 |
| 348 | A Replacement for BERT | 2024-12-19 |
| 48 | DeepSeek v3 beats Claude sonnet 3.5 and way cheaper | 2024-12-26 |
| 52 | Train faster static embedding models with sentence transformers | 2025-01-15 |
| 394 | Open-R1: an open reproduction of DeepSeek-R1 | 2025-01-28 |
| 227 | Kokoro WebGPU: Real-time text-to-speech 100% locally in the browser | 2025-02-07 |
| 49 | Janus-Pro: Autoregressive framework unifying multimodal understanding&generation | 2025-01-27 |
| 39 | DeepSeek-R1-Distill-Qwen-1.5B Surpasses GPT-4o in certain benchmarks | 2025-01-20 |
| 38 | Fully autonomous AI agents should not be developed | 2025-02-07 |
| 33 | The Ultra-Scale Playbook: Training LLMs on GPU Clusters | 2025-02-19 |