The checks here seem pretty minimal[1]. I'd recommend taking a look at fickling (FD: former employer) for a more general approach to pickle decompilation/analysis[2].
[1]: https://github.com/Lab700xOrg/aisbom/blob/main/aisbom/safety...
When dealing with stuff like php serialization and pickle, the rule is simple: never unpickle anything you didn't pickle yourself. If anything else could possibly touch the serialized bytes, sign it with HMAC and keep that somewhere untouchable.
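A minimal sketch of that sign-then-verify pattern. The key name is illustrative (in practice it comes from a secrets manager, never the repo), and the fixed 32-byte SHA-256 tag prefix is just one reasonable layout:

```python
import hashlib
import hmac
import pickle

# Illustrative only; load a real key from a secrets manager.
SECRET_KEY = b"replace-with-a-real-secret"

def dumps_signed(obj):
    """Pickle obj and prepend an HMAC-SHA256 tag over the payload bytes."""
    payload = pickle.dumps(obj)
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return tag + payload

def loads_signed(blob):
    """Verify the tag BEFORE the bytes ever reach the unpickler."""
    tag, payload = blob[:32], blob[32:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("HMAC mismatch: refusing to unpickle")
    return pickle.loads(payload)
```

The important detail is that verification happens before pickle.loads() is called at all; compare_digest avoids timing side channels on the tag check.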
I somehow doubt this tool is going to be able to pull off what Java bytecode verification could not.
OK, as others noted, the tool in question is hardly a complete solution, but what is? Presented like that, it's pretty crazy that everyone just downloads and runs 5GB executable blobs from Hugging Face. Does anyone review them before they rack up 10K downloads on HF? Or is this really another mind-bogglingly crazy thing happening across the whole world right now, with everybody shrugging and waiting for a planetary-scale security breach?
Don't love how ChatGPT the readme is; the bullet points under "Why AIsbom?" are very, very ChatGPT.
Was there any research into prior art? I recently looked into this space, and there already seem to be a bunch of off-the-shelf open source projects addressing this.
Good job. Pickle has no place in production. Yeah, I said it.
Hi HN,
I’ve been working with ML infrastructure for a while and realized there’s a gap in the security posture: we scan our requirements.txt for vulnerabilities, but blindly trust the 5GB binary model files (.pt) we download from Hugging Face.
Most developers don't realize that standard PyTorch files are just Zip archives containing Python Pickle bytecode. When you run torch.load(), the unpickler executes that bytecode, which allows arbitrary code execution (RCE) from inside the model file itself: what security researchers call a "Pickle Bomb."
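To make the mechanism concrete, here's a harmless sketch of a "Pickle Bomb": the pickle protocol's __reduce__ hook lets an object name any importable callable to be invoked at load time. A real payload would return something like (os.system, ("curl ... | sh",)); this demo substitutes the benign builtin len so nothing dangerous runs:

```python
import pickle

class Payload:
    def __reduce__(self):
        # The unpickler will call this callable with these args at load time.
        # A real attack would name os.system or subprocess.Popen here.
        return (len, ("this string's length is returned at load time",))

blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # len(...) executes during unpickling
```

Note that pickle.loads() doesn't return a Payload at all; it returns whatever the attacker's callable returned, and the side effects have already happened by then.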
I built AIsbom (AI Software Bill of Materials) to solve this without needing a full sandbox.
How it works:
1. It inspects the binary structure of artifacts (PyTorch, Pickle, Safetensors) without loading weights into RAM.
2. For PyTorch/Pickles, it uses static analysis (via pickletools) to disassemble the opcode stream.
3. It looks for GLOBAL or STACK_GLOBAL instructions referencing dangerous modules like os.system, subprocess, or socket.
4. It outputs a CycloneDX v1.6 JSON SBOM compatible with enterprise tools like Dependency-Track.
5. It also parses .safetensors headers to flag "Non-Commercial" (CC-BY-NC) licenses, which often slip into production undetected.
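Step 3 above can be sketched roughly like this with the stdlib's pickletools. This is my own approximation of the approach, not AIsbom's actual safety.py, and resolving STACK_GLOBAL from the last two string opcodes is a simplification that real scanners handle more carefully:

```python
import pickletools

# Illustrative denylist; a real scanner's list would be much longer.
SUSPICIOUS = {
    ("os", "system"), ("posix", "system"), ("nt", "system"),
    ("subprocess", "Popen"), ("subprocess", "run"),
    ("builtins", "eval"), ("builtins", "exec"),
    ("socket", "socket"),
}
STRING_OPS = ("SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE")

def scan_pickle(data: bytes):
    """Walk the opcode stream without executing it; flag dangerous imports."""
    findings = []
    strings = []  # recently pushed string args, for STACK_GLOBAL resolution
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in STRING_OPS:
            strings.append(arg)
        elif opcode.name == "GLOBAL":
            # pickletools reports GLOBAL's two lines as "module name"
            module, _, name = arg.partition(" ")
            if (module, name) in SUSPICIOUS:
                findings.append(f"{module}.{name}")
        elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:
            module, name = strings[-2], strings[-1]
            if (module, name) in SUSPICIOUS:
                findings.append(f"{module}.{name}")
    return findings
```

Because genops never builds any objects, this is safe to run on untrusted files; the trade-off is that purely static denylists can be evaded, which is why the linked fickling-style decompilation approaches go further.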
It’s open source (Apache 2.0) and written in Python/Typer.
Repo: https://github.com/Lab700xOrg/aisbom
Live Demo (Web Viewer): https://aisbom.io
Why I built a scanner: https://dev.to/labdev_c81554ba3d4ae28317/pytorch-models-are-...
I’d love feedback on the detection logic (specifically safety.py) or if anyone has edge cases of weird Pickle protocols that break the disassembler.