I had a bunch of fun writing about this one, mainly because it was a great excuse to highlight the excellent news about Kākāpō breeding season this year.
(I'm not just about pelicans.)
I'm not sure if I have the right mental model for a "skill". It's basically a context-management tool? Like a skill is a brief description of something, and if the model decides it wants the skill based on that description, then it pulls in the rest of whatever amorphous stuff the skill has, scripts, documents, what have you. Is this the right way to think about it?
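Here's a minimal sketch of the mental model I'm describing, in Python. The folder layout, the "first line of SKILL.md is the one-sentence description" convention, and the call_llm callable are all invented for illustration; this is not Anthropic's actual implementation.

```python
# Sketch of "skill = short description always in context, full body loaded on demand".
# Layout assumed (hypothetical): skills/<name>/SKILL.md, first line = description.
from pathlib import Path
from typing import Callable

def skill_index(skills_dir: Path) -> str:
    """The cheap index the model always sees: one line per skill."""
    lines = []
    for skill in sorted(p for p in skills_dir.iterdir() if p.is_dir()):
        description = (skill / "SKILL.md").read_text().splitlines()[0]
        lines.append(f"- {skill.name}: {description}")
    return "\n".join(lines)

def answer(request: str, call_llm: Callable[[str], str],
           skills_dir: Path = Path("skills")) -> str:
    # Step 1: the model picks a skill (or none) based only on names + descriptions.
    choice = call_llm(
        f"Available skills:\n{skill_index(skills_dir)}\n\n"
        f"Request: {request}\nReply with exactly one skill name, or 'none'."
    ).strip()
    if choice == "none" or not (skills_dir / choice).is_dir():
        return call_llm(request)
    # Step 2: only now does the full skill body (instructions, references to
    # other documents and scripts) get pulled into the context window.
    body = (skills_dir / choice / "SKILL.md").read_text()
    return call_llm(f"{body}\n\nRequest: {request}")
```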
If anyone wants to use skills with any other model or tool (Gemini CLI etc.), I created open-skills, which lets you use skills with any other LLM.
Caveat: needs a Mac to run.
Bonus: it runs locally in a container, not in the cloud and not directly on the Mac.
1. Open-Skills: https://GitHub.com/BandarLabs/open-skills
From a purely technical view, skills are just an automated way to introduce user and system prompt stuffing into the context, right? Not to belittle this, but it seems like a way of reducing the need for AI wrapper apps, since most AI wrappers just do systematic user and system prompt stuffing + potentially RAG + potentially MCP.
This is nice, but the fact that it goes into a vendor-specific .codex/ folder is a bit of a drag.
I hope such things will be standardized across vendors. Now that they founded the Agentic AI Foundation (AAIF) and also contributed AGENTS.md, I would hope that skills become a logical extension of that.
https://www.linuxfoundation.org/press/linux-foundation-annou...
@simonw Thank you for always setting alt text in your images. I really appreciate it.
We just released Anthropic's Skills talk, for those who want more info on the design thinking / capabilities: https://www.youtube.com/watch?v=CEvIs9y1uog&t=2s
I think the future is likely one that mixes the kitchen-sink style MCP resources with custom skills.
Services can provide an MCP-like layer that offers semantic definitions of everything you can do with the service (API + docs).
Skills can then be built that combine some subset of those 3rd-party interfaces, some bespoke code, etc., and surface these more context-focused skills to the LLM/agent.
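As a rough sketch of what one of those context-focused skills could look like (the service, endpoints, fields, and token handling below are invented; a real skill would wrap whatever the MCP-like layer actually documents):

```python
# A skill script that exposes a narrow, task-focused slice of a broad 3rd-party API,
# plus a bit of bespoke logic. Everything about "example-crm" is hypothetical.
import json
import os
import urllib.request

BASE = "https://api.example-crm.invalid/v1"  # hypothetical service

def _get(path: str) -> dict:
    req = urllib.request.Request(
        f"{BASE}{path}",
        headers={"Authorization": f"Bearer {os.environ['EXAMPLE_CRM_TOKEN']}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def overdue_invoices(days: int = 30) -> list[dict]:
    """Bespoke logic the raw API doesn't offer: filter and trim to the fields the agent needs."""
    invoices = _get("/invoices?status=open")["items"]
    return [
        {"id": inv["id"], "customer": inv["customer_name"], "amount": inv["total"]}
        for inv in invoices
        if inv["days_overdue"] >= days
    ]

if __name__ == "__main__":
    print(json.dumps(overdue_invoices(), indent=2))
```

The skill's instructions would then just tell the agent when to run this script, rather than dumping the service's full API documentation into context.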
Couldn’t we just use APIs?
Yes, but not every API is documented in the same way. An “MCP-like” registry might be the right abstraction for 3rd parties to expose their services in a semantic-first way.
> It took just over eleven minutes to produce this PDF,
Incredibly dumb question, but when they say this, what actually happens?
Is it using TeX? Is it producing output using the PDF file spec? Is there some print driver it's wired into?
Curious if anyone has applied this "Skills" mindset to how you build the tool calls for your LLM agent applications?
Say I have a CMS (I use a thin layer on top of the Vercel AI SDK) and I want to let users interact with it via chat: tag a blog post, add an entry, etc. Should these be organized into discrete skill units like that? And how do we go about adding progressive discovery?
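For what it's worth, here's roughly what I've been imagining, as a framework-agnostic sketch rather than actual AI SDK code; the unit names, tools, and two-step selection are all made up:

```python
# Sketch: group CMS actions into skill units. The model first sees only unit
# names + descriptions; a unit's instructions and tools load once it is selected.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class SkillUnit:
    name: str
    description: str                       # the only part in context by default
    instructions: str                      # loaded when the unit is selected
    tools: dict[str, Callable] = field(default_factory=dict)

def tag_post(post_id: str, tag: str) -> str:
    return f"tagged {post_id} with {tag}"  # stand-in for a real CMS call

def create_entry(title: str, body: str) -> str:
    return f"created entry '{title}'"      # stand-in for a real CMS call

UNITS = [
    SkillUnit(
        name="taxonomy",
        description="Tagging and categorising existing blog posts.",
        instructions="Confirm the post exists before tagging it.",
        tools={"tag_post": tag_post},
    ),
    SkillUnit(
        name="authoring",
        description="Creating and editing blog entries.",
        instructions="Draft in the house style and ask before publishing.",
        tools={"create_entry": create_entry},
    ),
]

def discovery_index() -> str:
    """What the chat model sees up front: one cheap line per unit."""
    return "\n".join(f"- {u.name}: {u.description}" for u in UNITS)

def expand(unit_name: str) -> SkillUnit:
    """Once the model picks a unit, only then register its instructions and tools."""
    return next(u for u in UNITS if u.name == unit_name)
```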
This is killing me with complexity. We had agents.md and were supposed to augment the context there. Now back to cursor rules and another md file to ingest.
Hasn’t ChatGPT been supporting skills with a different name for several months now through “agent”?
Back then they gave it folders with instructions and executable files, IIRC.
It is interesting that they are relying on visual reading for document ingestion instead of OCR. Recently I read an article saying that handwriting recognition has matured, and I'm beginning to think visual reading is the approach they are taking for handwriting recognition too.
Does this mean I can point to a code snippet and a link to the related documentation, and have the coding agent refer to it instead of writing "outdated" code?
Some frameworks/languages move really fast unfortunately.
Totally unrelated but what’s up with the word “quietly”? Its usage seems to have gone up 5000%, essentially overnight, as if there’s a contagion. You see the word in the New York Times, in government press releases, in blogs. ChatGPT 5.1 itself used the word in almost every single response, and no amount of custom instructions could get it to stop. That “Google Maps of London restaurants” article that’s going around not only uses the word in the headline, but also twice in the closing passage alone, for example. And now Simon, who’s an excellent writer with an assertive style, has started using it in his headlines. What’s the deal? Why have so many excellent writers from a wide range of subjects suddenly all adopted the same verbal tic? Are these writers even aware that they’re doing it?
It seems to me that skills are:
1. A top level agent/custom prompt
2. Subagents that the main agent knows about via short descriptions
3. Subagents have reference files
4. Subagents have scripts
Anthropic specific implementation:
1. Skills are defined in a filesystem in a /skills folder with a specific subfolder structure of /references and /scripts.
2. Mostly designed to be run via their CLI tool, although there's a clunky way of uploading them to the web interface via zip files.
I don't think the folder structure is a necessary part of skills. I predict that if we stop looking at that, we'll see a lot of "skills-like" implementations. The scripting part is only useful for people who need to run scripts, which, aside from the now built-in document-manipulation scripts, isn't most people.
For example, I've been testing out Gemini Enterprise for use by staff in various (non-technical) positions at my business.
It's got the best implementation of a "skills-like" agent tool I've seen. Basically a visual tree builder, currently only one level deep. So I've set up the "<my company name> agent" and it has subagents/skills for things like marketing/supply chain research/sysadmin/translation etc., each with a separate description, prompt, and knowledge base, although no custom scripts.
Unfortunately, everything else about Gemini Enterprise screams "early alpha, why the hell are you selling this as an actual finished product?".
For example, after I put half a day into setting up an agent and subagents, then went to share this with the other people helping me to test it, I found that... I can't. Literally no way to share agents in a tool that is supposedly for teams to use. I found one of the devs saying that sharing agents would be released in "about two weeks". That was two months ago.
Mini rant over... But my point is that skills are just "agents + auto-selecting sub-agents via a short description" and we'll see this pattern everywhere soon. Claude Skills have some additional sandboxing but that's mostly only interesting for coders.
Something important to keep in mind: different "skills" implementations shouldn't be assumed to work the same way.
Welcome to the world of imitation of value and semantics.
Can or should skills be used for managing the documentation of dependencies in a project and the expertise in them?
I’ve been playing with doing this, but it doesn’t feel like the most natural fit.
It’s impressive how every iteration gets further from pretending actual AGI is anywhere close, when we are basically writing library functions in the worst DSL known to man: markdown-with-English.
It’s crazy how Anthropic keeps coming up with sticky “so simple it seems obvious” product innovations and OpenAI plays catch up. MCP is barely a protocol. Skills are just md files. But they seem to have a knack for framing things in a way that just makes sense.