Preprint / Version 1

Lossy Compression of LLM Weights in Safetensors Format

Authors

  • Aaron Pinto, High School student

DOI:

https://doi.org/10.58445/rars.3284

Keywords:

lossy compression, pysz, deepseek, huggingface, gpt, gemma, compression, safetensors, artificial intelligence, machine learning, large language model

Abstract

Large language models (LLMs) such as DeepSeek and Google's Gemma require hundreds of gigabytes of storage when shared via Hugging Face. These models are typically split across many safetensors files (a secure, fast format for tensor storage), each often 4-5 GB in size. Downloading or distributing such massive models is time-consuming. Common solutions such as 8-bit or 4-bit quantization reduce size by lowering precision, but they may still yield sizable files or require retraining. In contrast, error-bounded scientific compressors (for example, SZ) can achieve much higher compression ratios with controlled error. This work applies the Python SZ implementation (PySZ) to a DeepSeek-V3 safetensors shard. A compression ratio of approximately 13.33× was obtained (from ~4.4 GB to ~0.33 GB), and the tensor file was successfully reconstructed for potential model use. The methodology, compression statistics, and implications for model fidelity relative to quantization are described.
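As a rough illustration of the pipeline the abstract describes, the sketch below loads one safetensors shard, compresses each tensor with an absolute error bound, and reports the overall compression ratio. The pysz interface shown (an SZ wrapper class with compress/decompress methods and an error-bound mode flag, modeled on the SZ3 Python bindings) is an assumption, as are the shard file name, the 1e-3 error bound, and the shared-library path; none of these values come from the paper.

    # Sketch: error-bounded compression of a safetensors shard with pysz.
    # The SZ class, its constructor argument, and the compress/decompress
    # signatures are assumed from the SZ3 Python bindings and may differ.
    import numpy as np
    from safetensors.numpy import load_file
    from pysz import SZ  # assumed import; pysz wraps the SZ3 C library

    sz = SZ("libSZ3c.so")  # path to the SZ3 shared library (placeholder)

    # Hypothetical shard name; assumes numpy-loadable dtypes (fp16/fp32).
    # A bf16 shard would need safetensors.torch plus a dtype conversion.
    tensors = load_file("model-00001-of-000163.safetensors")

    raw_bytes, compressed_bytes = 0, 0
    for name, tensor in tensors.items():
        data = np.ascontiguousarray(tensor, dtype=np.float32)
        # eb_mode=0 selects an absolute error bound (1e-3 is a placeholder)
        cmpr, ratio = sz.compress(data, 0, 1e-3, 0, 0)
        raw_bytes += data.nbytes
        compressed_bytes += cmpr.nbytes
        # Decompression reconstructs each tensor to within the error bound
        restored = sz.decompress(cmpr, data.shape, data.dtype)

    print(f"overall compression ratio: {raw_bytes / compressed_bytes:.2f}x")

With an absolute bound, every reconstructed weight is guaranteed to lie within the chosen tolerance of the original, which is the property that distinguishes error-bounded compression from fixed-precision quantization.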

References

Hugging Face. safetensors — Hugging Face Documentation. Hugging Face, n.d. Web.

DeepSeek AI. DeepSeek-V3-0324 — Model Files. Hugging Face Model Hub, 2024. Web.

Google. google/gemma-3-27b-it — Model Files. Hugging Face Model Hub, 2024. Web.

PySZ. pysz — Python Package Index (PyPI). Python Software Foundation, 2024. Web.

Lim, Seung Moo, and Seunghyeon W. Jin. “Neural Network Compression Using Error-Bounded Lossy Compression Techniques.” Electronics, vol. 11, no. 6, 2022, article 858. Web.

Hugging Face. Quantization and bitsandbytes — Transformers Documentation. Hugging Face, 2024. Web.

ZipNN. ZipNN: Lossless Compression for AI Models (GitHub). ZipNN Project, 2024. Web.


Posted

2025-10-19