Hacker News

Efficient Track Anything

152 points by t55, 12/03/2024, 15 comments

Comments

steinvakt2, 12/09/2024

I wish these things were described more clearly. Is this single-object tracking or multi-object tracking? Just a week ago SAMURAI was posted here, which is kind of the same thing, promising SOTA tracking performance using SAM2. But it only allows single-object tracking, which makes it useless for many medical imaging tasks.

ninalanyon, 12/09/2024

Was the abstract written by ChatGPT? It's an unreadable wall of text.

wis, 12/09/2024

It was fun trying out the demo. With the "coffee kettle pouring" video it did really well, segmenting the man's hand and arm and tracking them (segmenting them correctly in every frame). But with the "Find the ball cup game" video it lost track of the selected cup in a strange way: it tracked it correctly while it went behind other cups, but once it was no longer occluded, it switched to another cup.

It's still impressive to me that it kept track of the cup through two occlusions, but strange that it lost track once the cup was no longer occluded.

https://i.imgur.com/hOSQBtw.mp4

brunorsini, 12/09/2024

Does anyone know of a method for plugging the output of models like this one into traditional video editing software like Adobe Premiere?

atoav, 12/09/2024

What I'd love to see is how these tools perform with shallow depth-of-field shots, e.g. one actor in shot standing in front of a street with moving traffic, and another actor out of focus in front of them.

These kinds of "cinematic" shots are where automatic masking tools typically fall apart.

datadrivenangel, 12/09/2024

"On mobile devices such as iPhone 15 Pro Max, our EfficientTAMs can run at ~10 FPS for performing video object segmentation with reasonable quality"

This is pretty impressive! Lowering the compute requirements will make more applications feasible.

thot_experiment, 12/10/2024

Interesting. I saw this (https://arxiv.org/pdf/2411.11922) here a few days back, but I haven't actually read either paper. Would anyone who's looked at both care to give us a TL;DR?