Hacker News
Apple releasing segmentation/pose for humans and animals, embedding for 27 lang (developer.apple.com)
220 points by sumodm on June 6, 2023 | 73 comments



I remember when pose detection was announced, showing an app that corrected your workout movements. I have yet to see an app that actually does that. I'd love to have the equivalent of a personal trainer showing me where I need to adjust my pose in, say, pushups or other simple exercises.

Thus I'm equally sceptical of seeing these APIs used. It seems developers are mostly porting web apps to all platforms, ignoring neat but platform-specific APIs like this.

Please prove me wrong and link some awesome apps that use pose detection.


There are multiple apps in the App Store that do this. I spent last year implementing pose detection in an exercise app and we used both Apple’s pose detection and a 3rd party’s. The pose (each point of the human form) itself was sent to a machine learning backend at around 30 fps, analyzed, and data returned at about the same speed using gRPC. Each exercise had a set of specific feedback for both positioning (“Stand facing the camera with your arms at your side/Stand sideways to the camera/etc”) and form correction (“Raise your right arm higher above your head etc”). Feedback was spoken out loud to the user and there was a relatively complex set of rules governing which feedback got priority and how often feedback was spoken. I also implemented an on-screen “skeleton” of the user’s human form points that rendered on top of the camera view. Pretty fun project from a tech point of view.
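For anyone curious about the detection side, here's a minimal sketch of per-frame body pose extraction with Apple's Vision framework. The gRPC streaming, feedback rules, and skeleton rendering were our app-specific layers on top, and the 0.3 confidence cutoff is just an illustrative value:

    import Vision

    // Extract human body joints from a single camera frame.
    func detectPose(in pixelBuffer: CVPixelBuffer) throws -> [VNHumanBodyPoseObservation.JointName: CGPoint] {
        let request = VNDetectHumanBodyPoseRequest()
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .up)
        try handler.perform([request])

        guard let observation = request.results?.first else { return [:] }

        // Keep only the joints Vision is reasonably confident about.
        var joints: [VNHumanBodyPoseObservation.JointName: CGPoint] = [:]
        for (name, point) in try observation.recognizedPoints(.all) where point.confidence > 0.3 {
            joints[name] = point.location  // normalized (0...1) coordinates, lower-left origin
        }
        return joints
    }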


The signal to noise in fitness apps is high. The mainstream ones don’t do this, or if they do, the implementation is so bad it’s not worth using, and discovery of anything else is fraught with shitware that wants a subscription to “unlock” its unknown potential.


Did you mean to say the signal-to-noise [ratio] is _low_? Meaning that you get way too much noise for the amount of signal. Or did you mean to say it needs to be high (i.e., low noise) to be useful?


For every good fitness app that does what it promises (-> signal) there are at least 50 bad fitness apps that promise too much and let you pre-pay for the (broken) features you wanted, money you'll never get back (-> noise).

The amount of noise in fitness apps is so high that nobody really dares to try out small apps. Therefore, cool implementations from small devs, like workout correction, might stay unnoticed for years.


Right, and SNR is signal divided by noise.


I did mean low, yes :)


Yeah, you mean low


Not sure why you got flagged, so I have vouched - I did indeed mean "low".


Can you name the app?


What’s the best app you’ve tried?


The “there’s an app for that” world, where small to medium-sized teams can build a very iOS-native app that takes advantage of the latest and greatest of the device, is long gone.

It’s kinda weird how Apple doesn’t realize this and continues to build for that world. Maybe it would change if they were willing to shift on their % cut for devs who do build that way, but unless they do, there just isn’t an audience buying apps outright, and the only ways to profit are tricking people into abusive subscriptions or building on ads and their personal data.

Until then, I have no idea why any dev would build just for the Apple ecosystem and not something agnostic.

It’s telling to me that the biggest tech apps of the last 2 years all ran web/desktop first.


There are still a few, e.g. https://halide.cam/

But if you are successful, there is a chance of getting sherlocked, so it's a risky business model.


> But if you are successful, there is a chance of getting sherlocked, so it's a risky business model.

What does this even mean? Watson shows up to help out?


Watson was an independent search application on Mac OS, until Apple basically photocopied it and named theirs Sherlock. Since then it's become a verb for when Apple takes your app and builds it into the OS.

Another blatant example was Dashboard, which copied Konfabulator; Night Shift is a copy of F.lux, etc.


I still use F.lux; it's much better than Night Shift IMO. I just wish they had it for iOS.


When they demonstrated on-desktop widgets the other day, I thought to myself “there goes Konfabulator/Dashboard 2.0”, and then I thought to myself “you’re old enough to remember that and to be caustically cynical”.


Technically, Sherlock predates Watson; it's just that a lot of the useful additional features added by Watson were copied into Sherlock.


Ah, you're right; it was Sherlock 3 that copied the Watson features, mostly related to searching the web for things like eBay listings, recipes, stocks, and software.

So the Watson name was probably inspired by Apple's Sherlock from previous versions, but the Sherlock 3 feature set definitely got cloned from Karelia's.


"Getting sherlocked means that Apple just announced the software, or feature that a developer built their business on."

https://www.howtogeek.com/297651/what-does-it-mean-when-a-co...


> Until then, I have no idea why any dev would build just for the Apple ecosystem and not something agnostic.

The standard reason given is that iPhone users are much more valuable than Android users, in that they're a lot more likely to pay for things. If I'm creating a workout app with a fancy form-correction feature then I might well want to use Apple-platform things that make it quicker to develop, at the cost to me of only slightly restricting my actual market.


It's not just the machine learning stuff; they have a non-portable approach to everything, including the platform's primary programming language. They still seem to live in a world where a significant niche of developers targets Apple platforms and their bespoke APIs only.

The problem with that world view is that (a) everything with a network effect can't target a single platform anymore, and (b) the business model for old-school professional single-user apps was killed by the App Store.


They are relying on people looking into spending statistics by platform and realizing that if they want those sweet sweet $$$, they are forced to deal with Apple and their walled garden.


Unlike Microsoft? Windows APIs are just as "bespoke".


They seem to be doing fairly well as a company, and part of that is not letting themselves be tethered by a standard to allow competitors equal access to their walled garden. Whether you like that or not, it’s the strategy they’ve taken. They would rather not have your app than distort their platform to accommodate its ability to run on another platform.

For developers, the reason to adopt the Apple ecosystem is fairly simple. People willing to pay for an Apple device are likely willing to pay for a subscription. The Apple model is essentially that you buy a subscription to their hardware: they release at a regular clip, they anticipate most customers will refresh, there’s no meaningful upgrade path, etc. As a developer I prefer subscriptions over one-time purchases because they incentivize my maintenance and growth of features for existing customers rather than a never-ending grab for new customers. As a consumer, while my pocketbook certainly prefers paying once, I actually do see the benefit in incentivizing continuous improvements for existing customers. (I do, however, wish that Apple didn’t hide subscription management so deeply; until they make it prominent, it falls into the abusive category IMO.)


> It’s kinda weird how Apple doesn’t realize this and continues to build for that world.

If you’re a hardware manufacturer, I don’t think building for the common denominator of the web browser is a viable strategy. Looking at several of their competitors, that approach certainly brings in less money.

How many people would buy an iPhone that’s basically a “browser device” if, for 50% of its price, they could get something that’s 80% as good (percentages for illustration purposes)?


> It’s telling to me that the biggest tech apps of the last 2 years all ran web/desktop first.

What are the two apps that you are referring to? No snark, just curious. Because the only things I can think of are video calls or social media (which are arguably older than two years).


ChatGPT, Stable Diffusion. Both web/terminal first.

I think 5-8 years ago at least one of them would have been app-first.


There's already a native ChatGPT app by OpenAI. Microsoft also has integrations in the native Edge browser, the SwiftKey keyboard, and the Bing app.

There are also many mobile Stable Diffusion apps, and even the native mobile Discord app, which is the UI for Midjourney.

And since all those apps require a lot of typing and prompt tweaking, they were better suited to desktop first.


The big question is whether it's even capable of making recommendations like that. You'd have to combine it with your own model.

Having read books on strength training and tried to learn stuff like squatting perfectly myself, I'm skeptical it could grasp the nuance.

But for dancing and other stuff where it doesn't matter as much (health/safety-wise, as it does when carrying load), it could be useful.


Could you share some reads?


The standard recommendation for strength training is 'Starting Strength'.


Yes that's what I read and reread.


There's a home gym setup called Tempo that claims to do this. I have the scaled-down mobile app version, so idk if the full setup is more informative, but it doesn't really give a lot of feedback at all. It basically just tracks the movement of the dumbbell plates that they send you to count your reps, and if you aren't moving the weight to the correct position your rep won't count. It definitely doesn't do anything like correct your form based on your body position, like "hey, straighten your back" or anything.


I switched from the Move to the Studio (the freestanding version) and the form feedback triggers __slightly__ more often, but it's still not worth bothering about imo. The rep counting not working bothers me way more.


Hey, I'm the founder of Guru (https://getguru.ai/), a video AI dev platform. Developers, including NFL coaches, are using our movement APIs to build some cool form feedback apps.

- Demo: https://www.formguru.fitness/video/c96fa975-fd9e-4912-8f60-1...

- Blog: https://blog.getguru.ai/guru-sports-powering-the-top-prospec...

- Customers: https://www.cadoo.io/, https://www.breakawaydata.com/, https://pharosfit.com/, https://www.producthunt.com/posts/fitx.

We've trained our own models (and customers can fine-tune them), but they export cleanly to iOS (and Android!).


One of the reasons that I write native is so that I can access stuff like this. It usually takes quite a while for hybrid platforms to catch up.

That said, I don’t have an immediate need for this particular SDK, in the project I’m developing. I just like to have the option to integrate stuff like this.

Also, I’m not a “bleeding edge” developer. I’m still using UIKit/AppKit/WatchKit (as opposed to SwiftUI), and my software supports one OS version back, upon release.


I'm part owner of a company (Halterix) that used pose detection (or, alternatively, a smartwatch accelerometer) and machine learning to quantify how well exercises are performed.

We built a demo app for use in physiotherapy to improve outcomes and ran a few clinical studies. The detection accuracy was excellent and patient reception was warm.

There are a number of competitors, some with multi-sensor systems targeted at pros, some with vision systems, etc.

We met with all the big fitness app makers and found that generally, while they were somewhat interested in pose detection/accuracy assessment and feedback, it's not at the top of their list of priorities to implement (even by incorporating our 3rd-party service).


Proper disclosure: I was formerly involved with them.

Try Kemtai.com.

There's a demo section at https://app.kemtai.com/sample-workouts

We took the workout experience super seriously and, in my biased view, got it to be a usability joy.


For a more immediate and quite practical solution, I've had good results from simply taking videos of various movements and watching them right afterwards.



So hire a personal trainer for a session or two?


> It seems developers are mostly porting web apps to all platforms, ignoring neat but platform-specific APIs like this.

It would be weird to have your social app trying to correct your pose ;)


> It would be weird to have your social app trying to correct your pose ;)

It would be less weird for it to call home so that Z*ck knows to serve you ads for painkillers for your spine...


This comment seems to be unrelated to the subject. Is there a specific reason you mention it (along with the downvote, which I take it is also from you, given your passive-aggressive tone)?

The latent point is that most applications don't need specific APIs for their value propositions, which is why it would not make sense to write them in a native framework that enables these APIs.


> This comment seems to be unrelated to the subject.

Isn't it obvious? It's tech companies and the mobile ecosystem. Everything that can be used for ads and surveillance will be used for ads and surveillance.

Pose detection can provide a lot of insights about the user's overall health. All the unscrupulous players now need is some bullshit reason to convince the user to a) install their app, and b) use the feature. Like, idk, high-fidelity dancing AR avatars to use as stickers on social media (a real thing, btw).

Apple may or may not make this hard, but it's a real thing to be concerned about. Arguably, it's the most obvious use case, given the state of this industry.


> Is there a specific reason you mention it (along with the downvote, which I take it is also from you, given your passive-aggressive tone)?

Yes, the reason is I considered it funny. Sorry if that didn't work.

Also, I didn't downvote your comment, but I just upvoted it to compensate for someone who downvoted you, seemingly without a good reason – at least I agree with it.



Where are the actual bindings? Linking to a page with lots of long videos that are mostly not available (it says "Available on June 6" (or 7, 8, 9)), with an editorialized title that is not even on the page, is below HN standards.


You can download an Xcode project and view the documentation here: https://developer.apple.com/documentation/vision/detecting_a...


Thanks. Where can we find bindings for non-Swift languages? Or does the 'embedding for 27 lang' in the title refer to 'human language recognition' and not 'programming language binding'?


Given that the talk about 27 languages is about natural language processing, you can probably figure out the answer.


OK, this was buried a few links deep from the linked page (can't one link directly?). Here's the actual presentation, which is not available yet:

https://developer.apple.com/videos/play/wwdc2023/10042


Just in case you were wondering, "animals" seems to mean just cats and dogs: https://developer.apple.com/documentation/vision/vnanimalide...
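For the curious, a minimal sketch of what using it looks like with the existing VNRecognizeAnimalsRequest (the identifiers should come back as "Cat" or "Dog", matching VNAnimalIdentifier):

    import Vision

    // Find cats and dogs in a still image.
    func findPets(in image: CGImage) throws {
        let request = VNRecognizeAnimalsRequest()
        try VNImageRequestHandler(cgImage: image).perform([request])
        for observation in request.results ?? [] {
            for label in observation.labels {
                // boundingBox is in normalized image coordinates
                print(label.identifier, label.confidence, observation.boundingBox)
            }
        }
    }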


> static let cat: VNAnimalIdentifier

This feels funny to read, like something from one of those inheritance tutorials on object-oriented programming.


Seems logical; most people interact with their pets, which are mostly cats and dogs.

Horses will probably be next.


Short-legged pets like hamsters and rabbits might be trickier.

vs goldfish, which are easier: the pose is simply "Live" or "Dead".


Schrödinger should have used a goldfish.


I am now going to use goldfish for all of my booleans.


This is entering Monty Python territory.


According to the keynote, the first-party app is going to recognize the family cat/dog as a "person".


We tried their Vision framework for pose; the accuracy was not great compared to other open-source models. Hope they solve the issues with the new release.

@lgrebe: Check XTRAVISION and let me know if that is what you were looking for. Demo: https://demo.xtravision.ai/


Does this mean Apple is making it easier to run models on Macs? I have a fairly powerful Mac Studio, but I've found it very hard to run any model on it.


Why do you feel that's been difficult?


(Feel free to correct me if I am wrong.) My main gripe against mobile ML frameworks (Android too) is that they require the app to embed the ML model (as opposed to the OS storing the model like a shared library).

People with limited storage on low-end devices don't have enough space to store the apps.


CoreML has various models already built in, although as black boxes that accomplish some task like OCR or rectangle detection. There's also a "feature print" model, which I believe is intended to be used as hard-coded features for simple ML tasks. In either case I strongly suspect that when you use them, they're not being embedded in the app.

Another thing to consider is that you don't have to embed the model in your app; at least in CoreML you can download (and update) the model weights over the network.
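Roughly, that download-and-update flow looks like this; a sketch only, where the remote URL is whatever your backend serves, and real code would cache the compiled .mlmodelc instead of recompiling on every launch:

    import CoreML
    import Foundation

    // Fetch an .mlmodel over the network, compile it on-device, and load it.
    func loadRemoteModel(from remoteURL: URL) async throws -> MLModel {
        let (tempURL, _) = try await URLSession.shared.download(from: remoteURL)
        // compileModel(at:) expects an .mlmodel file, so give the download that extension.
        let modelURL = tempURL.appendingPathExtension("mlmodel")
        try FileManager.default.moveItem(at: tempURL, to: modelURL)
        let compiledURL = try MLModel.compileModel(at: modelURL)  // yields a temporary .mlmodelc
        return try MLModel(contentsOf: compiledURL)
    }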


People have small storage (think 64GB iPhone X/iPhone SE) and unreliable internet connections. Downloading five 200MB models to perform five layers of processing (OCR, background removal, object detection, etc.) would take hours and consume too much cellular data.

Sending a 10kB image to the cloud for processing is much faster and more user-friendly.


On Android, you can choose to include the ML model dependency bundled into your app at compile time, or shipped through Google Play Services, which saves many megabytes. The drawback there: if the device doesn't have Play Services, nothing works. Also, on first use it takes some seconds to download before it works.


Is there an app using this for coaching running form, or doing a custom bike fitting? That would be awesome.


There was nothing technically stopping us from doing that even 10 years ago.

MTailor [0] is/was a company that, using your phone camera, could measure you for pants/shorts/shirts/etc. It was a YC company in 2014 and was also on Shark Tank.

I hope the AI hype brings back some of these sorts of use cases that might have been ahead of their time.

[0] https://apps.apple.com/in/app/mtailor-custom-clothing/id8160...


I would be interested to know of a consistent, on-board embeddings model. Reducing latency and dependence on API calls for simple vector database search would go a long way.
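FWIW, the NaturalLanguage framework's NLEmbedding already runs fully on-device, and presumably the new 27-language contextual embeddings slot in alongside it. A minimal sketch of the existing API:

    import NaturalLanguage

    // On-device sentence embeddings: no network, no API calls.
    if let embedding = NLEmbedding.sentenceEmbedding(for: .english) {
        // Fixed-length Double vector, usable as a key for vector search.
        if let vector = embedding.vector(for: "correct my squat form") {
            print(vector.count)
        }
        // Built-in distance for quick similarity checks (cosine by default).
        let distance = embedding.distance(between: "fix my squat", and: "improve squat technique")
        print(distance)
    }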



TL;DR: Good step for the entire market; productization is the harder problem.

I was formerly involved with Kemtai, which built a fantastic physical therapy/fitness experience (in my biased view) using motion tracking.

If anyone's interested, it runs well and quickly over WebGL on a pretty impressive share of regular phones and laptops, across all platforms (not just Apple).

My learning is that the hard part is the productization on top of motion tracking: What constitutes an exercise? What is a "good" performance? How do you build the authoring workflow for the many hundreds to low thousands of exercises necessary to reach a typical user base?

In any case, that's awesome news. There are literally billions of people whose condition is going to be better via motion-tracking-based health and fitness. May it grow there, and quickly!



