The Align AI Research Review covers recent research on using large language models (LLMs) as reference-free metrics for evaluating natural language generation (NLG). The Microsoft Cognitive Services Research team proposed G-EVAL, a framework that prompts large LLMs with chain-of-thought (CoT) reasoning and a form-filling paradigm to evaluate NLG outputs. The framework has three main components: a prompt that specifies the evaluation task and criteria, chain-of-thought instructions, and a scoring function. The researchers also investigated whether LLMs exhibit a preference for their own outputs over human-written summaries. The findings suggest that while LLMs can process large volumes of data efficiently, they are not yet reliable enough to serve as sole evaluators. Align AI therefore recommends a hybrid approach that combines LLMs' computational power with human expert judgment.
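To make the three components concrete, here is a minimal sketch of a G-EVAL-style scorer. The prompt wording, the evaluation steps, and the stubbed score probabilities are illustrative assumptions, not the paper's exact implementation; only the probability-weighted scoring function follows the form described in the research (sum of each candidate score times its token probability).

```python
def build_prompt(criterion, definition, cot_steps, source, summary):
    """Assemble the three G-EVAL elements: task/criteria prompt,
    chain-of-thought steps, and the form to fill with a score."""
    steps = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(cot_steps))
    return (
        f"You will be given a summary written for a source text.\n"
        f"Evaluation criterion: {criterion} (1-5). {definition}\n"
        f"Evaluation steps:\n{steps}\n\n"
        f"Source text:\n{source}\n\nSummary:\n{summary}\n\n"
        f"{criterion} (1-5):"  # the form the LLM fills in
    )

def g_eval_score(score_probs):
    """Scoring function: probability-weighted sum over candidate scores,
    i.e. sum_i p(s_i) * s_i, which yields a fine-grained continuous score."""
    return sum(p * s for s, p in score_probs.items())

# Stubbed probabilities for the score tokens "1".."5" -- in practice these
# would come from the evaluator LLM's token log-probabilities (assumption).
probs = {1: 0.02, 2: 0.08, 3: 0.25, 4: 0.45, 5: 0.20}
print(round(g_eval_score(probs), 2))  # -> 3.73
```

The weighted sum is what lets G-EVAL output, say, 3.73 instead of an integer rating, reducing ties between candidate outputs.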