FYI, they added a lot more formats to the list after that.
Preferred
1. Platform-independent, character-based formats are preferred over native or binary formats as long as data is complete, and retains full detail and precision. Preferred formats include well-developed, widely adopted, de facto marketplace standards, e.g.
a. Formats using well known schemas with public validation tool available
b. Line-oriented, e.g. TSV, CSV, fixed-width
c. Platform-independent open formats, e.g. .db, .db3, .sqlite, .sqlite3
2. Any proprietary format that is a de facto standard for a profession or supported by multiple tools (e.g. Excel .xls or .xlsx, Shapefile)
3. Character Encoding, in descending order of preference:
a. UTF-8, UTF-16 (with BOM),
b. US-ASCII or ISO 8859-1
c. Other named encoding
---
Acceptable
For data (in order of preference):
1. Non-proprietary, publicly documented formats endorsed as standards by a professional community or government agency, e.g. CDF, HDF
2. Text-based data formats with available schema
For aggregation or transfer:
1. ZIP, RAR, tar, 7z with no encryption, password or other protection mechanisms.
https://www.loc.gov/preservation/resources/rfs/data.html
.7z being there just discredits the entire process. The underlying compression algorithm is a free-hand one and can be anything[0], or contain bugs and exploits[1]. Personally I use only zstd with .7z which is 'non-standard' by the official (Russian) release.
[0]: https://7-zip.org/7z.html
[1]: CVE-2025-0411