Hacker News

A Social Filesystem

400 points by icy yesterday at 8:18 AM | 163 comments

Comments

swyx yesterday at 10:24 PM

> Apps may come and go, but files stay—at least, as long as our apps think in files.

yes: https://www.swyx.io/data-outlasts-code-but

all lasting work is done in files/data (can be parsed permissionlessly, still useful if partially corrupted), but economic incentives keep pushing us to keep things in code (brittle, dies basically when one of maintainer|buildtools|hardware substrate dies).

when standards emerge (forcing code to accept/emit data), that is worth so much to a civilization. a developer ecosystem tipping the incentive scales such that companies like the Google/Msft/OpenAI/Anthropics of the world WANT to contribute/participate in data standards rather than keep things proprietary is one of the most powerful levers we as a developer community collectively hold.

(At the same time we should also watch out for companies extending/embracing/extinguishing standards... although honestly outside of Chrome I struggle to think of a truly successful example)

theturtletalks yesterday at 9:41 PM

POSSE and AT Protocol can be understood as interoperable marketplaces. Platforms like Reddit and Instagram already function this way: the product is user content, the payment is attention, and the platform’s cut is ads or behavioral data. Dan argues that this structure is not inevitable. If social data is treated as something people own and store themselves, applications stop being the owners of social graphs and become interfaces that read from user-controlled data instead.

I am working on a similar model for commerce. Sellers deploy their own commerce logic such as orders, carts, and payments as a hosted service they control, and marketplaces integrate directly with seller APIs rather than hosting sellers. This removes platform overhead, lowers fees, and shifts ownership back to the people creating value, turning marketplaces into interoperable discovery layers instead of gatekeepers.

skybrian yesterday at 6:35 PM

This article goes into a lot of detail, more than is really needed to get the point across. Much of that could have been moved to an appendix? But it's a great metaphor. Someone should write a user-friendly file browser for PDSes so you can see it for yourself.

I'll add that, like a web server that's just serving up static files, a Bluesky PDS is a public filesystem. Furthermore it's designed to be replicated, like a Git repo. Replicating the data is an inherent part of how Bluesky works. Replication is out of your control. On the bright side, it's an automatic backup.

So, much like with a public git repo, you should be comfortable with the fact that anything you put there is public and will get indexed. Random people could find it in a search. Inevitably, AI will train on it. I believe you can delete stuff from your own PDS but it's effectively on your permanent record. That's just part of the deal.

So, try not to put anything there that you'll regret. The best you could do is pick an alias not associated with your real name and try to use good opsec, but that's perilous.

motoxpro yesterday at 7:54 PM

I've always thought walled gardens are the effect of consumer preferences, not the cause.

The effect of the internet (everything open to everyone) was to create smaller pockets around a specific idea or culture. Just like you have group chats with different people, that's what IG and Snap are. Segmentation all the way down.

I am so happy that my IG posts aren't available on my HN or that my IG posts aren't being easily cross-posted to a service I don't want to use like Truth Social. If you want it to be open, just post it to the web.

I think I don't really understand the benefit of data portability in this situation. It feels like in crypto when people said I want to use my Pokémon in-game item in Counter-Strike (or any game); like, how and why would that even be valuable without the context? Same with a Snap post on HN or an HN post on some yet-to-be-created service.

japanuspus today at 8:45 AM

> Identity -- This is a difficult problem.

My hope is that in 5 years, I will not have anything in my feeds that has not been signed in a way that I can assign a trust level to.

Here in the Nordics, we are already seeing messaging apps such as [hudd] that require government issued ID to sign in. I want this to spread to everything from podcasts and old-school journalism to the soccer-club newsletter, so that I can always connect a piece of information back to a responsible source.

[hudd]: https://about.hudd.dk/

christophilus yesterday at 11:02 PM

I’ve been reading “The Unix Programming Environment”. It’s made me realize how much can be accomplished with a few basic tools and files (mostly plain text). I want to spend some time thinking of what a modern equivalent would look like. For example, what would Slack look like if it was file (and text) oriented and UNIXy? Well, UNIX had a primitive live chat in the form of live inter-user messaging. I’d love to see a move back to simpler systems that composed well.

skeledrew yesterday at 8:17 PM

I've been thinking of this for some time, conceptually, but perhaps from a more fundamental angle. I think the idea of "files" is pretty dated and can be thrown out. Treat everything as data blobs (inspired by Perkeep [0]) addressed by their hashes, and many of the issues described in the article just aren't even a thing. If it really makes sense, or for compatibility's sake, relevant blobs can be exposed through a filesystem abstraction.
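To make "blobs addressed by their hashes" concrete, here is a minimal Python sketch of a content-addressed store. It is loosely in the spirit of Perkeep but is not Perkeep's API; the `BlobStore` class and its `put`/`get` methods are hypothetical.

```python
import hashlib


class BlobStore:
    """Toy content-addressed store: every blob is keyed by the SHA-256 of its bytes."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the content itself, so identical
        # content always lands at the same key (deduplication for free).
        digest = hashlib.sha256(data).hexdigest()
        self._blobs[digest] = data
        return digest

    def get(self, digest: str) -> bytes:
        data = self._blobs[digest]
        # Re-hash on read: a corrupted or tampered blob no longer matches its address.
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError("blob does not match its address")
        return data


store = BlobStore()
addr = store.put(b"a life update for interested parties")
assert store.get(addr) == b"a life update for interested parties"
assert store.put(b"a life update for interested parties") == addr
```

A filesystem view, as the comment suggests, could be layered on top by mapping human-readable paths to these addresses.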

Also, users don't really want apps. What users want are capabilities. So not Bluesky, or YouTube for example, but the capability to easily share a life update with interested parties, or the capability to access yoga tutorial videos. The primary issue with apps is that they bundle capabilities, but many times particular combinations of capabilities are desired, which would do well to be wired together.

Something in particular that's been popping up fairly often for me is that I'm in a messaging app, and I'd like to look up certain words in some of the messages, then perhaps share something relevant from it. Currently I have to copy those words over to a browser app for that lookup, then copy content and/or a URL and return to the messaging app to share. What I'd really love is the capability to do lookups in the same window where I'm chatting with others. Like, it'd be awesome if I could embed browser controls with the lookup material alongside the message bubbles, and optionally make some of those controls directly accessible to the other part(y|ies), which might even lead to some kind of ad hoc content collaboration as they make their own updates.

It's time to break down all these barriers that keep us from creating personalized workflows on demand. Both at the intra-device level where apps dominate, and at the inter-device level where API'd services do.

[0] https://perkeep.org/

Jonovono yesterday at 6:18 PM

I can’t remember how many times I’ve read an article and enjoyed it so much and then looked and saw it was written by Dan ;) always a pleasure !

echoangle today at 1:21 AM

Is there anything stopping me from backdating my own records? Since the createdAt is just an arbitrary field, I can just write whatever I want in there, right? Is there a way for the viewing application to verify when the record was created (and not modified since), maybe by looking at the mentioned signing?
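A minimal sketch of why signing alone does not settle the backdating question. HMAC from the Python standard library stands in for a real asymmetric signature here (with a real scheme the viewer would verify with the author's public key instead of a shared secret). The key, record fields, and helper names are hypothetical, and this is not how any particular protocol signs records.

```python
import hashlib
import hmac
import json

# Stand-in for a real private signing key; HMAC is used only because it is in the stdlib.
AUTHOR_KEY = b"hypothetical-author-secret"


def sign(record: dict) -> str:
    # Canonicalize before signing so field order doesn't change the signed bytes.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(AUTHOR_KEY, payload, hashlib.sha256).hexdigest()


def verify(record: dict, signature: str) -> bool:
    return hmac.compare_digest(sign(record), signature)


# The author can put any timestamp they like into createdAt...
backdated = {"text": "called it", "createdAt": "2009-01-01T00:00:00Z"}
sig = sign(backdated)

# ...and the signature still verifies: it shows who produced these bytes and
# that they haven't changed since signing, not *when* the signing happened.
assert verify(backdated, sig)

# Any later edit to the record, including the timestamp, breaks verification.
edited = dict(backdated, createdAt="2024-01-01T00:00:00Z")
assert not verify(edited, sig)
```

To bound the creation time from outside, a viewer would need some independent observation, such as when a third party first saw the signed record; the record by itself cannot prove it.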

sroerick today at 1:38 AM

Interesting - I just spent all day on this on an app which I'm using. My architecture is a little different (probably worse).

The app lives on a single OpenBSD server. All user data is stored in /srv/app/[user]. Authentication is done by accessing OpenBSD Auth helper functions.

Users can access their data through the UI normally. Or they can use a web based filesystem browser to edit their data files. Or, alternately, they can ssh into the server and have full access to their files with all the advantages this entails. Hopefully, this raises the ceiling a bit for what power users of the system can accomplish.

I wanted to unify the OS ecosystem and the web app ecosystem and play around with the idea of what happens if those things aren't separate. I'm sure I'm introducing all kinds of security concerns which I'm not currently aware of.
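One concrete risk with a web-based file browser over `/srv/app/[user]` is path traversal: a request for `../otheruser/...` reaching another user's data. Below is a minimal guard sketch, assuming Python 3.9+ and that every web request is resolved through a helper like the hypothetical `user_file`. It does not cover ssh access, which depends on ordinary OS permissions.

```python
from pathlib import Path

# Root directory taken from the comment; the helper below is a hypothetical sketch.
DATA_ROOT = Path("/srv/app")


def user_file(user: str, relative: str) -> Path:
    # Reject usernames that could themselves walk out of DATA_ROOT.
    if not user or "/" in user or user in {".", ".."}:
        raise ValueError(f"bad username: {user!r}")
    home = (DATA_ROOT / user).resolve()
    # resolve() collapses "..", follows symlinks, and lets an absolute
    # "relative" path replace home entirely, so check the result afterwards.
    target = (home / relative).resolve()
    if not target.is_relative_to(home):
        raise PermissionError(f"{relative!r} escapes {home}")
    return target
```

Checking after `resolve()` rather than scanning the string for `..` also catches symlinks inside a user's directory that point elsewhere.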

Another commenter brought up Perkeep, which I think is very interesting. Even though I love Plan 9 conceptually, I do sort of wonder if "everything is a file" was a bit of a wrong turn. If I had my druthers, I think building on top of an OS which had DB and blob storage as the primary concept would be interesting and perhaps better.

If anybody cares, it's the POOh stack: Postgres, OCaml, OpenBSD, and htmx.

camgunz yesterday at 11:44 PM

I'm skeptical of these kind of like, self-describing data models. Like, I generally like at proto--because I like IPFS--but I think the whole "just add a lexicon for your service and bickety bam, clients appear" is a leap too far.

For example, gaze upon dev.ocbwoy3.crack.defs [0] and dev.ocbwoy3.crack.alterego [1]. If you wanted to construct a UI around these, realistically you're gonna need to know wtf you're building (it's a twitter/bluesky clone); there simply isn't enough information in the lexicons to do a good job. And the argument can't be "hey you published a lexicon and now people can assume your data validates", because validation isn't done on write, it's done on read. So like, there really is no difference between this and like, looking up the docs on the data format and building a client. There are no additional guarantees.
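The "validation on read" point can be sketched like this: nothing stops a writer from storing anything, so each reader filters records against its own expectations. The field table below is hypothetical and far simpler than a real lexicon; it only shows where the check happens.

```python
from typing import Any

# Hypothetical expectations for a post-like record: field name -> required type.
# This is NOT a real lexicon; it only mimics "check the shape when you read it".
EXPECTED_FIELDS: dict[str, type] = {"text": str, "createdAt": str}


def is_valid(record: Any) -> bool:
    if not isinstance(record, dict):
        return False
    # A missing field yields None from .get(), which fails the isinstance check.
    return all(isinstance(record.get(name), kind) for name, kind in EXPECTED_FIELDS.items())


def read_posts(raw_records: list[Any]) -> list[dict]:
    # The writer was never stopped; the reader decides what it accepts.
    return [r for r in raw_records if is_valid(r)]


records = [
    {"text": "hello", "createdAt": "2025-01-01T00:00:00Z"},
    {"text": 42, "createdAt": "2025-01-01T00:00:00Z"},  # wrong type, dropped
    {"createdAt": "2025-01-01T00:00:00Z"},             # missing text, dropped
]
assert read_posts(records) == [records[0]]
```

As the comment argues, two independent clients can write different filters here. Publishing a schema does not by itself give readers any extra guarantee about what is actually stored.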

Maybe there's an argument for moving towards some kind of standardization, but... do we really need that? Like are we plagued by dozens of slightly incompatible scrobbling data models? Even if we are, isn't this the job of like, an NPM library and not a globally replicated database?

Anyway, I appreciate that, facially, at proto is trying to address lock in. That's not easy, and I like their solution. But I don't think that's anywhere near the biggest problem Twitter had. Just scanning the Bluesky subreddit, there's still problems like too much US politics and too many dick pics. It's good to know that some things just never change I guess.

[0]: https://lexicon.garden/lexicon/did:plc:s7cesz7cr6ybltaryy4me...

[1]: https://lexicon.garden/lexicon/did:plc:s7cesz7cr6ybltaryy4me...

clnhlzmn yesterday at 7:04 PM

Seems similar to remoteStorage [0]. What happened to that anyway?

[0]: https://remotestorage.io/

hollowonepl yesterday at 10:55 PM

Interesting concept for all new social platforms that already live in federated, distributed environments that share communication protocols and communication data formats.

I bet it's more difficult to get existing commercial platforms to even consider it.

That would make marketing tools for managing social communications and posting across popular social media much easier. Nevertheless, social marketing tools have already come up with a similar approach just to give people control over their own content and feedback across instances and networks.

We still live in a world where some would say Bluesky is the future and some would say Mastodon… while everybody still has Facebook and Instagram, and youngsters TikTok too. Those are closed platforms where only tools to hack them persist, not standards.

metabagel yesterday at 6:15 PM

How does this relate to the SOLID project?

https://solidproject.org/

noelwelsh yesterday at 6:40 PM

This, Local-first Software [1], the Humane Web Manifesto [2], etc. make me optimistic that we're moving away from the era of "you are the product" dystopian enshittification to a more user-centric world. Here's hoping.

[1]: https://www.inkandswitch.com/essay/local-first/

[2]: https://humanewebmanifesto.com/

viraptor today at 7:54 AM

I like the write-up of this idea. It's well presented. But I'd change one aspect: "We could leave author: 'dril' in the JSON but this is unnecessary too." Kind of. What the post lacks is a record of the identity at the time. What the user's username and avatar were at the time can change the meaning of the post entirely. To really preserve the message, you need to record the displayed identity that was used to post it, not just the account ID.

There are a number of famous accounts that do this continuously. For example, popehat today is "Fucking Bitch Hat" but will change to something else soon that may be related to current events.
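A minimal sketch of the suggestion above, assuming a record format that embeds a snapshot of the author's displayed identity next to the stable account ID. The field names (`shown_as`, `avatar_ref`) and the `did:example` identifier are illustrative only, not part of any real schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class IdentitySnapshot:
    handle: str
    display_name: str
    avatar_ref: str  # e.g. a content hash of the avatar image


@dataclass(frozen=True)
class Post:
    author_did: str             # stable account id; survives renames
    shown_as: IdentitySnapshot  # how the author appeared when posting
    text: str


def to_record(post: Post) -> str:
    # asdict() recurses into the nested dataclass, so the snapshot travels inline.
    return json.dumps(asdict(post), sort_keys=True)


at_post_time = IdentitySnapshot("alice.example", "Topical Joke Name", "sha256:aaa")
post = Post("did:example:alice", at_post_time, "only funny under this name")
record = json.loads(to_record(post))

# Even if the live profile is renamed later, the stored record keeps the old name.
assert record["shown_as"]["display_name"] == "Topical Joke Name"
assert record["author_did"] == "did:example:alice"
```

The trade-off is duplication: every post carries its own copy of the profile fields, and readers must choose whether to show the snapshot or the current profile.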

nonethewiser yesterday at 7:42 PM

But how do you get people to actually want this? This stuff is pretty niche even within tech.

geokon yesterday at 6:32 PM

This was a nice intro to AT (though I feel it could have been a bit shorter)

The whole thing seems a bit overengineered, with poor separation of concerns.

It feels like it'd be smarter to flatten the design and embed everything in the Records. And then other layers can be built on top of that

Make every record include the author's public key (or signature?). Anything you need to point at, you'd either just give its hash, or its hash + the author's public key. This way you completely eliminate this goofy filesystem hierarchy. Everything else is embedded in the Record.

Lexicons/Collections are just a field in the Record. Looking up a hash in reverse to find what it refers to is also a separate problem.
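The flattened design described above can be sketched roughly as follows, assuming records are hashed over a canonical JSON encoding and that a plain string stands in for the author's public key (no real signing happens here). The collection names and helper functions are hypothetical.

```python
import hashlib
import json
from typing import Optional


def make_record(author_key: str, collection: str, body: dict,
                refs: Optional[list] = None) -> dict:
    # Everything lives in the record: who wrote it, what kind it is, what it points at.
    return {
        "author": author_key,      # placeholder for the author's public key
        "collection": collection,  # the lexicon/collection is just another field
        "body": body,
        "refs": refs or [],        # references are [hash, author_key] pairs
    }


def record_hash(record: dict) -> str:
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(canonical).hexdigest()


post = make_record("pk:alice", "app.example.post", {"text": "hello"})
post_id = record_hash(post)

# A like needs no path or directory hierarchy, only the target's hash and author.
like = make_record("pk:bob", "app.example.like", {}, refs=[[post_id, "pk:alice"]])

# A flat hash -> record index; discovering *which* hashes exist is a separate layer.
index = {record_hash(r): r for r in (post, like)}
assert index[like["refs"][0][0]]["body"]["text"] == "hello"
```

Because a record's hash covers its own references, a record cannot be edited in place without getting a new address, which is the usual trade-off of this kind of design.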

ahussain yesterday at 9:01 PM

It seems like the biggest downside of this world is iteration speed.

If the AT Instagram wants to add a new feature (e.g. posts now support video!), can they easily update their "file format"? How do they update it in a way that is compatible with every other company that depends on the same format, without the underlying record becoming a mess?
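One common way to evolve a shared format without breaking other apps is to add only optional fields and have readers ignore fields they don't understand. The thread doesn't say whether this is exactly how AT Protocol schemas are meant to evolve; the field names and render functions below are hypothetical.

```python
from typing import Optional


def render_v1(post: dict) -> str:
    # Old reader: knows about "text" and "images" and ignores any other field.
    images = post.get("images", [])
    return f"{post['text']} [{len(images)} image(s)]"


def render_v2(post: dict) -> str:
    # New reader: also understands an optional "video" field, but never requires it.
    video: Optional[dict] = post.get("video")
    return render_v1(post) + (" [video]" if video else "")


old_post = {"text": "sunset", "images": ["sha256:img1"]}
new_post = {"text": "sunset", "images": [], "video": {"ref": "sha256:vid1"}}

# Because the change is purely additive, every reader can handle every post.
assert render_v1(old_post) == "sunset [1 image(s)]"
assert render_v1(new_post) == "sunset [0 image(s)]"  # old app silently drops the video
assert render_v2(old_post) == "sunset [1 image(s)]"
assert render_v2(new_post) == "sunset [0 image(s)] [video]"
```

The catch, which is the comment's worry, is visible in the second assertion: older apps keep working but silently miss the new content, and additive fields only stay clean if every app agrees on what each new field means.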

itmitica yesterday at 11:21 PM

To share is to lose control. Once something is shared, it can't be undone. You can't retract a published novel. You can't retract a broadcast song or show. What makes you think you can do it over the internet?

yladiz yesterday at 8:27 PM

I know this is somewhat covered in another comment, but the concepts described in the post could have been condensed quite a bit, no offense Dan. While I like the writing generally, I would consider writing, letting it sit for a few days, rereading, and then cutting the chaff (editing). This feels like a great first draft that never got feedback, and it could have greatly benefited from an editing process. I don't think the argument that you want to put out something for others to take and refine is a strong one… a bit more time and refinement could have made a big difference here (something I would keep in mind, given that you have a decently sized audience).

jrm4 yesterday at 7:13 PM

The more I read and consider Bluesky and this protocol, the more pointless -- and perhaps DANGEROUS -- I find the idea.

It really feels like no one is addressing the elephant in the room: okay, someone who makes something like this is interested in "decentralized" or otherwise bottom-up-ish levels of control.

Good goal. But then, when you build something like this, you're actually helping build a perfect decentralized surveillance record.

This is why I say that most of Mastodon's limitations and bugs in this regard (leaving everything to the "servers") are actually features. The ability to forget and delete et al. is actually important, and this makes that HARDER.

I'm just kind of like, JUST DO MASTODON'S MODEL, like email. It's better, and the kinks are better thought through and/or solved.

nonethewiser today at 4:28 AM

Ironically, DID is the perfect vehicle for age verification.

diceduckmonk today at 1:48 AM

Git is the API.

Github/Gitlab would be a provider of the filesystem.

The problem is app developers like Google want to own your files.

air217 today at 12:35 AM

nostr protocol and the client/relay model is one simple way to separate apps (clients) from the data (relays)

elbci yesterday at 10:30 AM

agree! Social-media contributions as files on your system: owned by you, served to the app. Just as the .svg specification allows editing in Inkscape or Illustrator, a post on my computer would be portable to Mastodon or Bluesky or a fully distributed p2p network.

jadbox yesterday at 9:43 PM

How do people view AT Protocol vs Nostr? Why choose one over the other? Which has a better chance at replacing X?

hahahahhaah today at 6:47 AM

I have always thought open file format > open source. In my ideal web, everyone has their own web file storage (get it from anywhere, e.g. an email provider) and web apps use that to store things. Team collab etc. is built on top of that, e.g. sharing a file means a share-and-accept-edits type flow. Everyone owns their files.

James_K yesterday at 8:22 PM

AT Proto seems very overengineered. We already have websites with RSS feeds, which more or less covers the publishing end in a way far more distributed and reliable than what AT offers. Then all you need is a kind of indexer to provide people with notifications and discovery and you're done. But I suppose you can't sell that to shareholders because real decentralised technology probably isn't going to turn as much of a profit as a Twitter knockoff with a vague decentralised vibe to it that most users don't understand or care about.
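The "indexer" half of that proposal can be sketched as merging many RSS feeds into one newest-first timeline. This is a minimal sketch using only the Python standard library, assuming RSS 2.0 feeds whose items carry a `pubDate` in the RFC 822 format; fetching, polling, and notifications are left out, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def parse_items(feed_xml: str, source: str) -> list[dict]:
    # RSS 2.0 layout: <rss><channel><item>...</item></channel></rss>
    channel = ET.fromstring(feed_xml).find("channel")
    return [
        {
            "source": source,
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            # Assumes every item has an RFC 822 pubDate.
            "published": parsedate_to_datetime(item.findtext("pubDate")),
        }
        for item in channel.findall("item")
    ]


def build_timeline(feeds: dict[str, str]) -> list[dict]:
    # The "indexer": merge every author's feed into one newest-first timeline.
    merged = [item for source, body in feeds.items() for item in parse_items(body, source)]
    return sorted(merged, key=lambda item: item["published"], reverse=True)


FEED_A = """<rss version="2.0"><channel><title>A</title>
<item><title>Older post</title><link>https://a.example/1</link>
<pubDate>Mon, 06 Jan 2025 10:00:00 +0000</pubDate></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>B</title>
<item><title>Newer post</title><link>https://b.example/1</link>
<pubDate>Tue, 07 Jan 2025 09:00:00 +0000</pubDate></item>
</channel></rss>"""

timeline = build_timeline({"a.example": FEED_A, "b.example": FEED_B})
assert [item["title"] for item in timeline] == ["Newer post", "Older post"]
```

Each author's feed stays on their own website; only the merged view lives with the indexer, which is the separation the comment argues for.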

eduction yesterday at 7:56 PM

Unpopular opinion: this should be done with xml, not json. XML can have types, be self describing, and be extended (the X in XML).

That said it’s a very elegant way to describe AT protocol.

sneak yesterday at 6:55 PM

Losing private keys is much more common than losing domains.

LoganDark yesterday at 10:07 PM

I did a double take at "DID as identity" because Dissociative Identity Disorder shares the same acronym

EGreg yesterday at 8:22 PM

As someone who explicitly designed social protocols since 2011, who met Tim Berners-Lee and his team when they were building SOLID (before he left MIT and got funded to turn it into a for-profit Inrupt) I can tell you that files are NOT the best approach. (And neither is SPARQL by the way, Tim :) SOLID was publishing ACLs for example as web resources. Presumably you’d manage all this with CalDAV-type semantics.

But one good thing did come out of that effort. Dmitri Zagidulin, the chief architect on the team, worked hard at the W3C to get departments together to create the DID standard (decentralized IDs) which were then used in everything from Sidetree Protocol (thanks Dan Buchner for spearheading that) to Jack Dorsey’s “Web5”.

Having said all this… what protocol is better for social? Feeds. Who owns the feeds? Well that depends on what politics you want. Think dat / hypercore / holepunch (same thing). SLEEP protocol is used in that ecosystem to sync feeds. Or remember scuttlebutt? Stuff like that.

Multi-writer feeds were hard to do and were abandoned in hypercore, but you can layer them on top of single-writer feeds. That's where you get into joint ownership and consensus.

ps: Dan, if you read this, visit my profile and reach out. I would love to have a discussion, either privately or publicly, about these protocols. I am a huge believer in decentralized social networking and build systems that reach millions of community leaders in over 100 countries. Most people don’t know who I am and I’m happy with that. Occasionally I have people on my channel to discuss distributed social networking and its implications. Here are a few:

Ian Clarke, founder of Freenet, probably the first decentralized (not just federated) social network: https://www.youtube.com/watch?v=JWrRqUkJpMQ

Noam Chomsky, about Free Speech and Capitalism (met him same day I met TimBL at MIT) https://www.youtube.com/watch?v=gv5mI6ClPGc

Patri Friedman, grandson of Milton Friedman on freedom of speech and online networks https://www.youtube.com/watch?v=Lgil1M9tAXU

catapart yesterday at 5:40 PM

yeah yeah yeah, everyone get on the AT protocol, so that the bluesky org can quickly get all of these filthy users off of their own servers (which costs money) while still maintaining the original, largest, and currently only portal to actually publish the content (which makes money[0]). let them profit from a technical "innovation" that is 6 levels of indirection to mimic activity pub.

if they were decent people, that would be one thing. but if they're going to be poisoned with the same faux-libertarian horseshit that strangled twitter, I don't see any value in supporting their protocol. there's always another protocol.

but assuming I was willing to play ball and support this protocol, they STILL haven't solved the actual problem that no one else is solving either: your data exists somewhere else. until there's a server that I can bring home and plug in with setup I can do using my TV's remote, you're not going to be able to move most people to "private" data storage. you're just going to change which massive organization is exploiting them.

I know, I know: hardware is a bitch and the type of device I'm even pitching seems like a costly boondoggle. but that's the business, and if you're not addressing it, you're not fomenting real change; you're patting yourself on the back for pretending we can algorithm ourselves out of late-stage capitalism.

[0] *potentially/eventually

ninkendo yesterday at 6:14 PM

> When great thinkers think about problems, they start to see patterns. They look at the problem of people sending each other word-processor files, and then they look at the problem of people sending each other spreadsheets, and they realize that there’s a general pattern: sending files. That’s one level of abstraction already. Then they go up one more level: people send files, but web browsers also “send” requests for web pages. And when you think about it, calling a method on an object is like sending a message to an object! It’s the same thing again! Those are all sending operations, so our clever thinker invents a new, higher, broader abstraction called messaging, but now it’s getting really vague and nobody really knows what they’re talking about any more.

https://www.joelonsoftware.com/2001/04/21/dont-let-architect...

doctorflan yesterday at 8:09 PM

I was hoping this was literally just going to be some safe version of BBS/Usenet-style filesharing that was peer-based, kind of like torrents, but simple and straightforward, with no porn, infected warez, ransomware, crypto-mining, racist/terrorist/nazi/maga/communist/etc. crap, where I could just find old computing magazines, homebrew games, recipes, and things like that.

Why can’t we have nice things?

I guess that’s what Internet Archive is for.