Why has CI for open-source projects become so difficult to secure? Where did we, collectively, go wrong?
I suppose it's probably some combination of: CI is configured in-band in the repo, PRs are potentially untrusted, CI uses the latest state of config on a potentially untrusted branch, we still want CI on untrusted branches, CI needs to run arbitrary code, and CI has access to secrets and privileged operations.
Maybe it's too many degrees-of-freedom creating too much surface area. Maybe we could get by with a much more limited subset, at least by default.
I've been doing CI stuff in my last two day jobs. In contrast, we worked only on private repos with private collaborators, and we explicitly designated CI as trusted.
It's a web of danger for sure. Configuring CI in-repo is popular (especially in the GitLab world) and it's admittedly a low-friction way to at least get people to use config control for CI (or use CI for builds at all). I think the number of degrees of freedom is really a footgun.
I remember early GitLab Runner use when I had a (seemingly) standard build for a Docker image. There wasn't any obvious standard way to do that. There were recommendations for dind, just giving shell access, etc. There's so much customization that it's hard to decide what's safe for a protected/main branch vs. user branches.
I don't have a solution. But I think it would be better if, by default, CI engines were a lot less configurable and forced users to adjust their repo and build to match some standard configurations, like:
- Run `make` in a Debian Docker image, after installing some apt packages, and extract the resulting binary file/.deb
- Run `docker build .` and push the image somewhere
- Run `go build` in a standard golang container
And really made you dance a little more to do things like "just run this bash script in the repo". Restrict those kinds of builds to protected branches/special setups.
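
To make that concrete, here's a rough sketch of the policy I have in mind. It's not any real CI engine's API: the preset names, the `Job` type and `is_allowed` are all made up for illustration. Standard presets run on any branch; "just run this script" only runs on protected branches.

```python
from dataclasses import dataclass

# Hypothetical preset names, loosely mirroring the three examples above.
STANDARD_PRESETS = {"make-deb", "docker-build-push", "go-build"}


@dataclass(frozen=True)
class Job:
    kind: str    # a preset name, or "custom-script" for arbitrary repo scripts
    branch: str  # the branch the job would run on


def is_allowed(job: Job, protected_branches: set[str]) -> bool:
    """Presets run anywhere; arbitrary scripts only on protected branches."""
    if job.kind in STANDARD_PRESETS:
        return True
    if job.kind == "custom-script":
        return job.branch in protected_branches
    # Unknown job kinds are rejected by default.
    return False


protected = {"main"}
print(is_allowed(Job("go-build", "feature/x"), protected))
print(is_allowed(Job("custom-script", "feature/x"), protected))
print(is_allowed(Job("custom-script", "main"), protected))
```

The point is the default: anything that isn't a known preset has to justify itself, instead of the other way around.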
Having the CI config in the same source control tree is dangerous and hard to secure. It would probably be better to have some kind of headless branch, like GitHub Pages uses, that is just for CI config.
I think people usually like to blame GitHub Actions' design, but this repository seems not to have done the bare minimum to secure itself, and to have focused more on producing a "state-of-the-art (SOTA)" "YOLO" model instead.
There are a lot of things wrong with format.yml alone. It honestly seems kind of weird that it needs commit access to push a new commit under the PR author's name/email just to format their code. I personally would find this kind of rude as a PR author: I sign all my Git commits, and a bot masquerading as me to submit a Git commit is not appreciated, even for something like code formatting. And of course the author of format.yml didn't seem to know the difference between `pull_request` and `pull_request_target` and just threw both in.
I also think people go way overboard with CI/CD these days, because things that are automated are obviously better, right? I personally do not like any CI pipeline that has the capability to commit directly to the main Git branch without review/signoff (which [this commit](https://github.com/ultralytics/ultralytics/commit/cb260c243f...), the one that removed the author check, did). Things like deploying to PyPI should take more than just a single commit and should involve a human. Yes, it introduces a bit of friction into the process, but if you are maintaining a big piece of open source software, a release you make is going to be deployed to lots of people's computers, so a little bit of annoyance on the maintainer's side is a small price to pay to make sure you get everything right.
I guess I'm weird. I maintain an OSS macOS app and I see other similar apps just upload their private signing keys to GitHub and just let the CI sign everything for them but I still sign my releases offline and never upload my keys to a public service.
What I'm saying is I don't think we want CI to do everything for us, especially powerful actions (e.g. making a release) that then run without human approval. If you do want that, you should think really hard about whether it's actually desirable and whether you want to spend the extra mental energy on all the security ramifications, which might just offset the little bit of time you saved.
GitHub doesn't really seem to prioritise security. I just reported a nasty way to smuggle code[0] into Actions pipelines to them and got a classic "expected behaviour, WONTFIX" response. It's exactly the kind of sneaky behaviour that the Jia Tans out there would use in an attack.
[0] (see end of) https://cedwards.xyz/github-actions-are-an-impending-securit...
> Maybe it's too many degrees-of-freedom creating too much surface area.
I think this is essentially it: there's extraordinary demand for "publicly dispatchable and yet safe" CI/CD, despite those requirements being fundamentally in tension with each other.
All things considered, I don't think GitHub has done the worst job here: the security model for GitHub Actions is mostly intuitive, so long as you stick to triggers like `push`, `pull_request`, etc. The problems only really start when people use triggers that (IMO) GitHub should never have added in the first place, like `pull_request_target` -- those triggers break the basic "in repo privileged, out of repo unprivileged" security assumption and cause the kinds of problems we're seeing here.
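
A toy model of that assumption, in case it helps. This is a deliberate simplification of my understanding of the documented behaviour, not GitHub's implementation: the function names are mine, it only covers three triggers, and it ignores special cases and token permissions entirely.

```python
def secrets_exposed(trigger: str, from_fork: bool) -> bool:
    """Simplified: does a run with this trigger get repository secrets?"""
    if trigger == "push":
        # Pushing needs write access, so the code is already in-repo.
        return True
    if trigger == "pull_request":
        # Fork PRs run without repository secrets; same-repo PRs get them.
        return not from_fork
    if trigger == "pull_request_target":
        # Runs in the base repository's context, even for fork PRs.
        return True
    raise ValueError(f"trigger not covered by this sketch: {trigger}")


def breaks_assumption(trigger: str, from_fork: bool) -> bool:
    """True when out-of-repo input meets a privileged run."""
    return from_fork and secrets_exposed(trigger, from_fork)


for trigger in ("push", "pull_request", "pull_request_target"):
    for from_fork in (False, True):
        if trigger == "push" and from_fork:
            continue  # a fork cannot push to the upstream repo
        print(trigger, from_fork, breaks_assumption(trigger, from_fork))
```

In this model only `pull_request_target` on a fork PR puts untrusted input next to a privileged run. Whether that's actually exploitable then depends on what the workflow does with the PR's contents, e.g. checking out and running the PR head.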