Hacker News

OCR for construction documents does not work, we fixed it

120 points by wcisco17 yesterday at 4:05 PM | 82 comments

So we've built an API and trained models that detect fixtures, extract schedules, and analyze construction documents. Check us out!

More examples: - https://www.getanchorgrid.com/developer/docs/endpoints/drawi...

Main website: - https://www.getanchorgrid.com/developer

Why we did it: https://www.getanchorgrid.com/developer/docs/changelog/const...


Comments

Terr_ yesterday at 5:34 PM

> OCR for construction documents does not work

I'm reminded of the Xerox JBIG2 bug back in ~2013, where certain scan settings could silently replace numbers inside documents, and bad construction-plans were one of the cases that led to it being discovered. [0]

It wasn't overt OCR per se: end users weren't intending to convert pixels to characters or vice versa.

[0] https://www.youtube.com/watch?v=c0O6UXrOZJo&t=6m03s
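For readers unfamiliar with how a compressor can swap digits without any OCR step, here is a toy sketch of lossy symbol-dictionary compression. It is not a real JBIG2 codec and not Xerox's implementation. The 3x5 bitmaps, the `max_diff` threshold, and all function names are assumptions made up for illustration. The idea is that a glyph "close enough" to one already stored is replaced by a reference to the stored glyph.

```python
# Toy illustration only: NOT a real JBIG2 codec. Glyphs are 3x5 bitmaps
# written as 15-character strings of "0"/"1" (hypothetical test data).

def hamming(a, b):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(x != y for x, y in zip(a, b))

def compress(glyphs, max_diff):
    """Store each glyph once; reuse a stored glyph if it differs by <= max_diff pixels."""
    dictionary = []
    indices = []
    for g in glyphs:
        for i, rep in enumerate(dictionary):
            if hamming(g, rep) <= max_diff:
                indices.append(i)  # lossy substitution happens here
                break
        else:
            dictionary.append(g)
            indices.append(len(dictionary) - 1)
    return dictionary, indices

def decompress(dictionary, indices):
    return [dictionary[i] for i in indices]

SIX = "111" "100" "111" "101" "111"
EIGHT = "111" "101" "111" "101" "111"  # differs from SIX by a single pixel

page = [SIX, EIGHT]
lossy = decompress(*compress(page, max_diff=1))
lossless = decompress(*compress(page, max_diff=0))

print(lossy == [SIX, SIX])  # the 8 silently became a 6
print(lossless == page)     # exact matching preserves both digits
```

With a tolerance of one pixel the decoded page contains two sixes, and nothing in the output signals that a substitution took place.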

ichamo today at 1:28 AM

Really interesting to see this space developing. I'm building a masonry-specific quantity takeoff tool (vision model extraction into a parametric domain model that spits out bid-ready quantities) and the "data prison" framing resonates hard.

One thing I've learned going deep in a single trade: the distance between "structured JSON from a drawing" and "numbers an estimator will bid with" is enormous. I've been really impressed with Bobyard and SketchDeck especially.

h317's point about the liability-driven re-counting circus is spot on. Each party in the chain needs to own their numbers. Revit could have solved this a long time ago had this not been the case. An API that makes each individual count faster is valuable but it doesn't collapse the chain.

Would love to talk to anyone else building in this space.

h317 yesterday at 8:44 PM

I cannot wait for the day when tech companies become players in the construction industry because it looks like it is the only way forward to make a change.

To think that everything has been digitalized a long time ago, yet contract law cannot properly deal with delineating responsibilities between GC and Architects, who are still sending 2D drawings to each other.

Imagine, all this information about quantities and door types (and everything else) is already available and produced by the architect's team, BUT they cannot share it! Because if they do, they are responsible for the numbers in case something is wrong.

So now there is this circus of: Arch technologist making the base drawing with doors. GC receives documents, counts doors for verification, and sends them to the sub. Subcontractor looks at these drawings, counts them again, and sends data to the supplier. Guess what, the supplier also looks, counts, confirms, and back we go.

Though I think robotics will change all of that. And when we have some sort of bot assistance, big tech players will have more leverage in this, which will lead to a proper change management architecture.

Anyway, cool product. Anything to help with estimation. Really hope it gets traction.

sreekanth850 yesterday at 6:53 PM

We’re taking a different path, building a parsing engine that converts CAD (DWG/DXF) into fully structured JSON with preserved semantics (no ML in the critical path). We also have a separate GIS parser that extracts vector data (features, layers, geometries) independently. I'd like to know how you handle consistency and reproducibility across runs when using models, and how you make it affordable, especially at scale, because as far as I know CAD and GIS need precision and accuracy.
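One way to make "reproducibility across runs" concrete is to fingerprint each run's structured output and compare the fingerprints. The sketch below is an assumption for illustration, not how either product works. The JSON shape (`doors`, `id`, `x`, `y`), the rounding tolerance `ndigits`, and the function names are all hypothetical.

```python
import hashlib
import json

def fingerprint(result, ndigits=3):
    """Hash a result after rounding floats and sorting keys (hypothetical helper)."""
    def normalize(value):
        if isinstance(value, float):
            return round(value, ndigits)
        if isinstance(value, dict):
            return {k: normalize(v) for k, v in value.items()}
        if isinstance(value, list):
            return [normalize(v) for v in value]
        return value

    canonical = json.dumps(normalize(result), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def is_reproducible(runs, ndigits=3):
    """True if every run produces the same fingerprint."""
    return len({fingerprint(r, ndigits) for r in runs}) == 1

# Hypothetical outputs from three runs over the same drawing.
run_a = {"doors": [{"id": "D1", "x": 10.0004, "y": 5.0}]}
run_b = {"doors": [{"y": 5.0, "x": 10.0001, "id": "D1"}]}  # key order and tiny jitter differ
run_c = {"doors": [{"id": "D1", "x": 10.2, "y": 5.0}]}     # a real coordinate difference

print(is_reproducible([run_a, run_b]))
print(is_reproducible([run_a, run_c]))
```

Key order and sub-tolerance jitter are ignored, while list order still matters, so detections would need a stable sort before hashing. Rounding is also a blunt tolerance: two values that straddle a rounding boundary will still hash differently.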

petee yesterday at 7:45 PM

I ran the example doors given and it missed 9 swinging doors, some of which were in double-swing pairs and a few of which were out on their own, not clustered. Not bad overall, though.

nostrapollo yesterday at 9:46 PM

First off, congrats on the launch! Construction is a tough market to build in. My personal view after being in it for a few years is that there is no shortage of MVPs. In fact there is an MVP for every problem at every level (or at least it feels that way), but construction is /vast/ and the rough edges that seem juicy at first are, in practice, optimizations rather than bottlenecks for constructors.

I hope you succeed because it would be great to have a standard API for this data, but I would advise on one of two directions: become the standard by being close to 100% accurate at finding symbols (one symbol doesn't seem to cut it in our testing) or make a great, comprehensive workflow for a small subset of the market and become standard that way.

In both cases, you cannot do a broad 'market test', you need to spend many hours with a specific sub-set of users in construction.

Disclaimer: I'm a co-founder of Provision.

copypaper today at 12:21 AM

Very interesting. I'm on vacation but will check this out at work next week.

What is the maximum resolution you support for PDFs? The max Gemini will do is 3072x3072. We have plans that are 10x that size.
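A common workaround for a per-image size cap is to split the rasterized sheet into overlapping tiles and run each tile separately. The sketch below only computes tile boxes. The 256-pixel overlap, the 30720x30720 sheet size (reading "10x" as ten times per side), and the function name are assumptions, and this is not a description of how any product in this thread handles large plans.

```python
def tile_boxes(width, height, tile=3072, overlap=256):
    """Return (left, top, right, bottom) boxes that cover a width x height image.

    Neighbouring tiles share at least `overlap` pixels so that a symbol cut
    by one tile edge appears whole in an adjacent tile.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    step = tile - overlap

    def starts(size):
        if size <= tile:
            return [0]
        positions = list(range(0, size - tile, step))
        positions.append(size - tile)  # final tile sits flush with the far edge
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]

boxes = tile_boxes(30720, 30720)
print(len(boxes))  # → 121
print(boxes[0])    # → (0, 0, 3072, 3072)
print(boxes[-1])   # → (27648, 27648, 30720, 30720)
```

Detections from overlapping tiles then have to be shifted back into sheet coordinates and de-duplicated, which is where most of the real work is.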

frogguy yesterday at 6:14 PM

Looks cool! Where are you getting the data to fine-tune the CV models for element extraction? I'm worried there isn't a robust enough dataset to build a detection model that will generalize to all of the slightly different standards each discipline (and each firm, for that matter) uses.

tomedwrds yesterday at 9:40 PM

I have been working on an extension of this problem lately that involves extracting all doors plus any details about those doors to produce quotes. I have found that giving the PDF to Codex works pretty well, as it can take subcrops of the plans to look at noisy areas in more detail. The only downside is that the cost is quite high.

punnerud yesterday at 9:01 PM

“Why we did it”: I would rather have a “How we did it”. The why gave me AI-generated marketing material feelings.

Tailscale’s article about NAT traversal is an example of how to write “how we did it”: https://tailscale.com/blog/how-nat-traversal-works

mmethodz yesterday at 10:19 PM

Do that for Finnish construction documents. My parser is 30000+ lines candidate based but the lack of standards and the Finnish language...

testUser1228 yesterday at 5:37 PM

What do you foresee being the end use case for this (or most valuable use case)?

Iulioh yesterday at 5:05 PM

When will this be available for 30000x8000px electrical diagrams?

I have to make a BOM and oh boy I hate my job

hspraggins77 yesterday at 5:47 PM

Great points raised!

alexeischiopu yesterday at 5:30 PM

Good idea :)

vessenes yesterday at 5:31 PM

cool. What's pricing like?

achillesheels yesterday at 4:57 PM

Love it! Starbucks Venti Macchiato sip

Love to give it to an arc client, not sure who the right person to implement this would be? Hmm…

i18nagentai yesterday at 7:06 PM

[flagged]

ware-intel yesterday at 6:04 PM

Your smart features look like a game changer. Nice job!

fithisux yesterday at 4:48 PM

Of course it is not working. PDF and images are supposed to be tamper resistant. OCR tries to reverse engineer them.
