This article is a frequent visitor to HN. Tip: if you click on the "past" link under the title (but not the "past" link at the top of the page), you'll trigger a search for previous posts.
https://hn.algolia.com/?query=Parse%2C%20Don%27t%20Validate&...
However, it's more effective to throw quotes into the mix; that reduces false positives.
https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
This is a great article, but people often trip over the title and draw unusual conclusions.
The point of the article is the locality of validation logic in a system. Parsing, in this context, can be thought of as consolidating all the logic that determines the structure and validity of incoming data into one place in the program.
This lets you rely on having valid data in a known structure in all other parts of the program, which then don't have to be crufted up with validation logic.
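The article's running example makes the contrast concrete (lightly adapted here): both functions check the same thing, but only the parser's return type preserves what it learned.

    import Data.List.NonEmpty (NonEmpty(..))

    -- Validate: the result is (), so the knowledge evaporates and
    -- downstream code has to re-check or just trust.
    validateNonEmpty :: [a] -> IO ()
    validateNonEmpty (_:_) = pure ()
    validateNonEmpty []    = fail "list cannot be empty"

    -- Parse: the result type proves non-emptiness to every caller.
    parseNonEmpty :: [a] -> IO (NonEmpty a)
    parseNonEmpty (x:xs) = pure (x :| xs)
    parseNonEmpty []     = fail "list cannot be empty"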
Related, it's worth looking at tools that further improve structure/validity locality like protovalidate for protobuf, or Schematron for XML, which allow you to outsource the entire validity checking to library code for existing serialization formats.
I think, more generally, "push effects to the edges" which includes validation effects like reporting errors or crashing the program. If you, hypothetically, kept all of your runtime data in a big blob, but validated its structure right when you created it, then you could pass around that blob as an opaque representation. You could then later deserialize that blob and use it and everything would still be fine -- you'd just be carrying around the validation as a precondition rather than explicitly creating another representation for it. You could even use phantom types to carry around some of the semantics of your preconditions.
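A hypothetical sketch of that last idea: the bytes never change, but a phantom parameter records that validation already ran.

    import Data.ByteString (ByteString)
    import qualified Data.ByteString as BS

    newtype Blob v = Blob ByteString   -- v is a phantom type

    data Unchecked
    data Checked

    -- The only way to obtain a Blob Checked is through the validator;
    -- same representation, stronger type.
    validateBlob :: Blob Unchecked -> Either String (Blob Checked)
    validateBlob (Blob bs)
      | BS.null bs = Left "empty blob"   -- stand-in validity check
      | otherwise  = Right (Blob bs)

    -- Downstream code states its precondition in the signature.
    useBlob :: Blob Checked -> IO ()
    useBlob (Blob bs) = BS.putStr bs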
Point being: I think the rule is slightly more general, although this explanation is probably more intuitive.
Related. Others?
Parse, Don't Validate (2019) - https://news.ycombinator.com/item?id=41031585 - July 2024 (102 comments)
Parse, don't validate (2019) - https://news.ycombinator.com/item?id=35053118 - March 2023 (219 comments)
Parse, Don't Validate (2019) - https://news.ycombinator.com/item?id=27639890 - June 2021 (270 comments)
Parse, Don’t Validate - https://news.ycombinator.com/item?id=21476261 - Nov 2019 (230 comments)
Parse, Don't Validate - https://news.ycombinator.com/item?id=21471753 - Nov 2019 (4 comments)
I'm sorry, I don't like to title-drop, but I am a Staff Data Engineer, and I find that "type driven" development is an inappropriate world view for many programming contexts that I encounter. I use "world view" deliberately, as it makes a contractual assumption about reality: "give me what I expect". Data processing does not always have the luxury of such imposition. In those contexts a dynamic, introspective world view is more appropriate: "What do we have here?" "What can we use?" In 2019 I would have felt crippled by the use of Haskell in data-processing contexts and have instead done much in Clojure in the intervening years, though now LLM-assisted use of Haskell for such tasks would be a fun spectator sport.
I make great use of value objects in my applications, but there are things I needed to do to make them ergonomic/performant. A "small" application of mine has over 100 value objects implemented as classes. Large apps easily get into the thousands of classes just for value objects. That is a lot of boilerplate. It's a lot of boxing/unboxing. And it'd be a lot more typing than in "stringly typed" programs.
To make it viable, all value objects are code-generated from model schemas and then customized as needed (only about 5% need customization beyond basic data types). I have auto-upcasting on setters so you can code stringly when wanted, but everything is validated (very useful for writing unit tests more quickly). I only parse into types at boundaries or on writes/sets, not on reads/gets (this limits the amount of boxing, particularly when reading large amounts of data). Heavy use of reflection and auto-wiring/dependency injection.
But with these conventions in place, I quite enjoy it. Easy to customize/narrow a type. One convention for all validation. External inputs are secure by default, with nice error messages. One place where all value validation happens (the ./values classes folder).
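In Haskell terms, each of those value objects boils down to a newtype with a smart constructor. A hypothetical sketch of the core pattern (validate once on the write, read for free; the '@' check is a deliberately toy stand-in):

    module Values.Email (Email, mkEmail, emailText) where

    import Data.Text (Text)
    import qualified Data.Text as T

    newtype Email = Email Text

    -- "Parse on writes/sets": the smart constructor is the only way in.
    mkEmail :: Text -> Either String Email
    mkEmail t
      | T.any (== '@') t = Right (Email t)
      | otherwise        = Left ("not an email: " <> T.unpack t)

    -- "No parsing on reads/gets": unwrapping is just a projection.
    emailText :: Email -> Text
    emailText (Email t) = t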
It seems modern statically-typed and even dynamically-typed languages have all adopted this idea, except Go, where the designers decided that zero values always (or mostly) represent valid states.
A sincere question to Go programmers – what's your take on "Parse, Don't Validate"?
Making illegal states unrepresentable sounds like a great idea, and it is, but I see it getting applied without nuance. “Has multiple errors” can be a valid type. Instead of bailing immediately, you can collect all of the errors so that they can be reported all together rather than forcing the user to fix one error at a time.
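A minimal hand-rolled sketch of that; the validation package on Hackage offers a polished version of the same idea:

    data Validation e a = Failure e | Success a

    instance Functor (Validation e) where
      fmap _ (Failure e) = Failure e
      fmap f (Success a) = Success (f a)

    -- Unlike Either, <*> keeps going and merges the errors instead of
    -- stopping at the first Failure.
    instance Semigroup e => Applicative (Validation e) where
      pure = Success
      Failure e1 <*> Failure e2 = Failure (e1 <> e2)
      Failure e  <*> _          = Failure e
      _          <*> Failure e  = Failure e
      Success f  <*> Success a  = Success (f a)

    data User = User { name :: String, age :: Int }

    -- mkUser "" (-1) reports both problems at once.
    mkUser :: String -> Int -> Validation [String] User
    mkUser n a = User <$> checkName n <*> checkAge a
      where
        checkName x | null x    = Failure ["name is empty"]
                    | otherwise = Success x
        checkAge x  | x < 0     = Failure ["age is negative"]
                    | otherwise = Success x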
A great piece.
Unfortunately, it's somewhat of a religious argument about the one true way. I've worked on both sides of the fence, and each field is equally green in its own way. I've used OCaml, with static typing, and Clojure, with maybe-opt-in schema checking. They both work fine for real purposes.
The big problem arrives when you mix metaphors. With typing, you're either in, or you're out - or should be. You ought not to fall between stools. Each point of view works fine, approached in the right way, but don't pretend one thing is the other.
Each repost is worth it.
This, along with John Ousterhout's talk [1] on deep interfaces was transformational for me. And this is coming from a guy who codes in python, so lots of transferable learnings.
> Now I have a single, snappy slogan that encapsulates what type-driven design means to me, and better yet, it’s only three words long
IMHO this is distracting and somewhat vain. It forces this "semantics" perspective onto the reader, just so the author can have a snappy slogan.
Also, not all languages have such freedom in type expressiveness. Some do, but at the cost of terrible trade-offs.
The truth is, if you try to be that expressive in a language that doesn't support it, you'll end up with a horror story. The article fails to mention that, and that "snappy slogan" makes it look like an absolute claim you must internalize, some sort of deep truth that applies everywhere. It isn't.
bonus points for the correct use of "cromulent"
I did a lightning talk on this topic last year, with a concrete example in Yesod.
Semi-tangent, but I am curious: for those with more experience in Python, do you just pass around generic Pandas DataFrames, or do you parse each row into an object and write logic that manipulates those instead?
I'll be honest, as someone not familiar with Haskell, one of my main takeaways from this article is going down a rabbit hole of finding out how weird Haskell is.
The casualness with which the author states things like "of course, it's obvious to us that `Int -> Void` is impossible" makes me feel like I'm being xkcd 2501'd.
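For what it's worth, after going down the same rabbit hole: Void is a type with no values, so any function claiming to return it can never actually return. The only way to write one is to loop forever:

    import Data.Void (Void)

    -- Type-checks, but never terminates; no total Int -> Void exists.
    impossible :: Int -> Void
    impossible n = impossible n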
The author's point here is great, but the post does (imho) a poor job illustrating it.
The tl;dr on this is: stop sprinkling guards and if statements all over your codebase. Convert (parse) the data into truthful objects/structs/containers at the perimeter. The goal is to do that work at the boundaries of your system, so that inside your system you can stop worrying about it and trust the value objects you have.
I think my hangup here is on the use of terms parse vs validate. They are not the right terms to describe this.
Hot take: Static typing is often touted as the be-all and end-all, as if all you need to do is "parse, don't validate" at the edge of your program and everything is fine and dandy.
In practice, I find that staunch static-typing proponents are often middle or junior engineers who want to work with an idealised version of programming in their heads. In reality, what you are looking for is "openness" and "consistency", because no amount of static typing will save you from poorly defined or optimised-too-early types that encode business-logic constraints into programmatic types.
This is also why, in practice, a lot of customer input ends up being passed around as "strings", or kept as a raw copy plus a parsed copy: business logic will move faster than whatever code you can write and fix, and exposing it as just "types" breaks the process for future programmers extending your program.
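A hypothetical sketch of that raw-copy-plus-parsed-copy shape:

    import Data.Text (Text)

    -- Keep the original input verbatim next to whatever today's
    -- business rules could extract, so future code can re-parse
    -- the raw form under new rules.
    data CustomerField a = CustomerField
      { rawInput :: Text
      , parsedAs :: Maybe a
      }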
Maybe I'm missing something, and I'm glad this idea resonates, but it feels like sometime after Java got popular and dynamic languages gained a lot of mindshare, a large chunk of the programming community forgot why strong static type checking was invented and is now having to rediscover it.
In most strong statically-typed languages, you wouldn't often pass strings and generic dictionaries around. You'd naturally gravitate toward parsing/transforming raw data into typed data structures with guaranteed properties, to avoid writing defensive code everywhere: e.g., a Date object whose constructor throws an exception if the given string doesn't validate as a date. (Edit: Changed this from email because email validation is a can of worms as an example.) So there, "parse, don't validate" is the norm, not a tip/idea that would need to gain traction.
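In Haskell the same norm might look like this, a small sketch using Data.Time from the time package:

    import Data.Time (Day)
    import Data.Time.Format (defaultTimeLocale, parseTimeM)

    -- Parse, don't validate: failure is Nothing at the boundary;
    -- success is a real Day that needs no further checks downstream.
    parseDate :: String -> Maybe Day
    parseDate = parseTimeM True defaultTimeLocale "%Y-%m-%d"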