527 | Three areas where Google Search lags behind competitors: code, cooking, travel | 2022-04-13 |
470 | Is Google Search Deteriorating? Measuring Google's Search Quality in 2022 | 2022-01-11 |
334 | 30% of Google's Emotions Dataset Is Mislabeled | 2022-07-14 |
222 | Evaluation of TikTok vs. Instagram Reels | 2022-09-02 |
212 | Building a no-code toxicity classifier by talking to GitHub Copilot | 2022-03-25 |
183 | Are popular toxicity models simply profanity detectors? | 2022-01-25 |
138 | Generating Children’s Stories Using GPT-3 and DALL·E | 2022-06-29 |
138 | We asked 100 humans to draw the DALL·E prompts | 2022-05-13 |
49 | HellaSwag: 36% of this popular large language model benchmark contains errors | 2022-12-06 |
25 | I wanted burritos. Facebook Search sent me to a dead restaurant 45m away | 2022-06-16 |
25 | We Evaluated ChatGPT vs. Google on 500 Search Queries | 2022-12-26 |
19 | Examples of the Importance of Context-Sensitivity in Data-Centric AI | 2021-11-23 |
15 | Twitter’s Egregious Content Moderation Failures | 2022-11-10 |
13 | Move Over, Google: The TikTokification of Next-Gen Search | 2022-10-26 |
13 | The average number of ads on a Google Search recipe? 8.7 | 2022-04-29 |
13 | DALL·E vs. Imagen, and Evaluating Astral Codex Ten's Bet on AI Progress | 2022-09-30 |
12 | What if social media optimized for human values? A Facebook case study | 2022-02-11 |
11 | Explaining Reinforcement Learning with Human Feedback (RLHF) | 2023-01-05 |
11 | The $250K Inverse Scaling Prize and Human-AI Alignment | 2022-09-28 |
10 | An Analysis of Omicron Tweets: 30% Are Skeptical of the Medical Establishment | 2022-01-21 |
10 | How Good is Hugging Face's BLOOM? Human Evaluation of Large Language Models | 2022-07-21 |
10 | Are the Spammers Winning? Failures in Gmail Spam Detection | 2022-05-24 |
10 | We measured the percentage of Spammy Twitter users | 2022-05-18 |
9 | AI Red Teams for Adversarial Training: Making ChatGPT and LLMs More Robust | 2022-12-13 |
9 | Writing a Super Bowl Worthy Commercial with GPT-3 | 2022-02-16 |
9 | Inter-Annotator Agreement: An Introduction to Krippendorff’s Alpha | 2022-01-06 |
7 | Optimizing Facebook's Algorithms for Human Values Instead of Clicks | 2022-07-29 |
7 | How Good Is Your Chatbot? An Introduction to Perplexity in NLP | 2021-12-10 |
6 | Understanding Cohen's Kappa in Machine Learning | 2021-12-01 |
6 | An Introduction to Language Models in NLP (Part 1: Intuition) | 2021-11-11 |
5 | Building Better Developer Search: How Neeva Measures Search Quality | 2022-07-07 |
4 | How We Built It: OpenAI's GSM8K Dataset of 8,500 Math Problems | 2022-06-15 |
3 | Humans vs. Gary Marcus: The Complexity of Measuring Machine Intelligence | 2022-06-23 |
2 | How TikTok Is Evolving the Next Generation of Search | 2022-11-01 |
2 | Sentiment Analysis Dataset of Social Media Stock Conversations | 2022-06-10 |
1 | The Obscenity List | 2022-01-18 |