Company
Date Published
Jan. 22, 2025
Author
Matt Mastracci, Michael J. Sullivan
Word count
2941
Language
English
Hacker News points
322

Summary

The authors of a Rust-based project saw intermittent crashes on their ARM64 CI runners while running tests for an HTTP fetch feature. The crash appeared only in the CI environment and could not be reproduced locally. The investigation traced it to a data race between threads calling the `libc` functions `setenv` and `getenv` concurrently. When `setenv` has to grow the environment block, it can move the block to a new location, so a `getenv` call running at the same time on another thread may read an address that is no longer valid. The authors pinned this down by analyzing the assembly code, including a disassembly of the `getenv` function. The `setenv` calls came from `openssl-probe`, which `rust-native-tls` uses and which sets environment variables in a way that is not thread-safe. To fix the issue, they decided to migrate from `reqwest`'s `rust-native-tls` backend to `rustls` on Linux, or to hold the Global Interpreter Lock (GIL) while calling `try_init_ssl_cert_env_vars`. The Rust project has already identified this as an issue and plans to make the environment-setter functions unsafe in the 2024 edition. In addition, the glibc project has recently made `getenv` more thread-safe by avoiding the `realloc` and leaking the older environments instead.
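
The locking workaround described above can be illustrated with a minimal Rust sketch. It is not the authors' code: `ENV_LOCK`, `set_env_locked`, and `get_env_locked` are hypothetical names, and a plain `Mutex` stands in for the GIL. The sketch assumes that every environment access in the process goes through these wrappers, which is exactly the assumption that fails when C code calls `getenv` directly.

```rust
use std::env;
use std::sync::{Mutex, MutexGuard};

// Hypothetical process-wide lock. It only helps if *every* reader and
// writer of the environment takes it; C libraries calling getenv()
// directly will not, which is the failure mode the summary describes.
static ENV_LOCK: Mutex<()> = Mutex::new(());

fn lock_env() -> MutexGuard<'static, ()> {
    // The mutex guards no data of its own, so a poisoned lock is still usable.
    ENV_LOCK.lock().unwrap_or_else(|poisoned| poisoned.into_inner())
}

/// Set an environment variable while holding the lock.
// `set_var` is a safe function before the 2024 edition and unsafe from it on;
// the attribute silences the lint on editions where the block is redundant.
#[allow(unused_unsafe)]
pub fn set_env_locked(key: &str, value: &str) {
    let _guard = lock_env();
    // SAFETY (assumption): no other thread reads or writes the environment
    // without holding ENV_LOCK.
    unsafe { env::set_var(key, value) };
}

/// Read an environment variable while holding the same lock.
pub fn get_env_locked(key: &str) -> Option<String> {
    let _guard = lock_env();
    env::var(key).ok()
}
```

Calling `set_env_locked("SSL_CERT_FILE", "/path/to/ca.pem")` from one thread and `get_env_locked("SSL_CERT_FILE")` from another is then serialized (the path is a placeholder). Because the lock is purely cooperative, setting such variables once before any other thread starts remains the more robust option where that is possible.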