
> Recognizes "human" and recognizes "desk". I sit on desk. Does AI mark it as a desk or as a chair?

Not an issue if the image segmentation is advanced enough. You can train the model to understand "human sitting". It may not generalize to other animals sitting but human action recognition is perfectly possible right now.
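One common way "human sitting" recognition is done today is pose estimation followed by simple geometry on the keypoints. A minimal sketch of that second stage, assuming a pose model has already produced 2D hip/knee/ankle coordinates (the 120-degree threshold is an arbitrary illustration, not a standard value):

```python
import math

def angle(a, b, c):
    """Angle at vertex b (degrees) formed by points a-b-c, each an (x, y) pair."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    return math.degrees(math.acos(dot / (math.hypot(*v1) * math.hypot(*v2))))

def is_sitting(hip, knee, ankle, bent_threshold=120.0):
    """Crude 'sitting' heuristic: the knee is sharply bent."""
    return angle(hip, knee, ankle) < bent_threshold

# Standing: hip, knee, ankle roughly collinear (y grows downward in image coords)
print(is_sitting((0, 0), (0, 40), (0, 80)))   # False
# Sitting: thigh horizontal, shin vertical -> roughly a 90-degree knee angle
print(is_sitting((0, 0), (40, 0), (40, 40)))  # True
```

A real system would smooth this over video frames and use more joints, but the point stands: action recognition can be reduced to geometry once segmentation is good enough.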




There are lots of things people sit on that we would not categorize as chairs. For example if someone sits on the ground, Earth has not become a chair. Even if something's intended purpose is sitting, calling a car seat or a barstool a chair would be very unnatural. If someone were sitting on a desk, I would not say that it has ceased to be a desk nor that it is now a chair. At most I'd say a desk can be used in the same manner as a chair. Certainly I would not in general want an AI tasked with object recognition to label a desk as a chair. If your goal was to train an AI to identify places a human could sit, you'd presumably feed it different training data.

Our AI (YOLOv5) can only detect an ergonomic chair. Distinguishing a desk from a kitchen table with AI is probably impossible today.

We can't detect a monitor yet either, because it looks the same as a TV.

AI works great for us only in marketing
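Part of this is a vocabulary problem rather than a vision problem: the stock YOLOv5 weights are trained on COCO, whose 80 class names include "chair", "dining table", and "tv" but no "desk" and no "monitor". A toy sketch of what that forces a detector to do (the fallback mapping here is purely hypothetical, for illustration):

```python
# A COCO-trained detector can only emit COCO class names; "desk" and
# "monitor" are simply not in its label set.
COCO_FURNITURE_AND_SCREENS = {"chair", "couch", "bed", "dining table", "tv", "laptop"}

def nearest_coco_label(thing):
    # Hypothetical mapping used only to illustrate the forced substitutions.
    fallback = {"desk": "dining table", "kitchen table": "dining table",
                "monitor": "tv", "barstool": "chair"}
    return thing if thing in COCO_FURNITURE_AND_SCREENS else fallback.get(thing)

print(nearest_coco_label("desk"))     # dining table
print(nearest_coco_label("monitor"))  # tv
```

So the monitor-vs-TV confusion above is baked in before any pixels are seen: both objects map to the single COCO class "tv".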


But currently humans do that "pattern finding". If you want a model to learn to recognize animals, you give it thousands of images of different animals in all sorts of poses, angles and environments and tell it what those animals are. The "pattern" is essentially created by the humans who compile the dataset with enough examples to cover every situation; the program then uses that pattern to find matches in other images. If you want a human to recognize animals, however, it is enough to show them this picture, and they will then be able to recognize most of these animals in other photos and poses. Humans create the pattern from the image and don't just rely on having tons of data:

https://i.pinimg.com/originals/35/78/47/35784708f8cc9ef2345c...

Edit: In some cases you can write a program to explore a space. For example, you can write a program that plays board games randomly and notes the outcome; that is a data generator. Then you hook that up to a pattern recognizer powered by a data centre, and you have a state-of-the-art gameplay AI. It isn't that simple, since writing the driver and the feedback loop is extremely hard work that an AI can't do; humans have to do it.
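The "data generator" half of that loop really is just a few lines. A minimal sketch using tic-tac-toe as a stand-in for the board game (the pattern-recognizer half, the hard part, is deliberately left out):

```python
import random

def random_game(rng):
    """Play one uniformly random tic-tac-toe game; return (positions, winner)."""
    lines = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]
    board = [" "] * 9
    history = []
    for turn in range(9):
        player = "XO"[turn % 2]
        move = rng.choice([i for i, c in enumerate(board) if c == " "])
        board[move] = player
        history.append(tuple(board))
        if any(board[a] == board[b] == board[c] == player for a, b, c in lines):
            return history, player
    return history, None  # draw

# Generate training data: every position labeled with the game's final outcome.
rng = random.Random(0)
dataset = []
for _ in range(1000):
    history, winner = random_game(rng)
    dataset.extend((state, winner) for state in history)

print(len(dataset))
```

Each (position, outcome) pair is exactly the raw material a value-learning model consumes; the human work is in deciding what to record and how to feed it back.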


How hard would it be to add AI that could distinguish between human and dog/cat motion?

I believe the fact that humans can distinguish a cat from a chair after being shown just a single cat actually demonstrates that humans have much deeper insight into what a "concept" is than current AIs do.

If sensor data were the problem, computers could easily outperform humans since we have sensors that generate much more detailed data than the human senses: High-resolution cameras, multi-spectral and thermal imaging, x-rays, radar, etc.

The actual difference is that when shown a picture and told "this is a cat", humans already know what to look for. Even if a human has never seen a cat before, they will not, for example, examine the background of the photo, or the floor the cat is lying on. They will also instinctively derive analogies from similar animals they already know, and deduce lots of correct information about that "cat" without needing to be told explicitly.


Totally.

But it seems like all these ML models are great at image recognition but not behavior recognition.

What’s the state of the art with that currently?


And at the same time most manual forms of human labor.

Object recognition is AI. A decade ago we were piss-poor at object recognition; today labeling AI is at human level. Being able to recognize your environment is the first step in navigating it effectively.


Good to know. Though thinking it through, any unsupervised learning would still need some reference points, be they hard-coded into the core system or introduced via a form of supervised learning. Otherwise the AI could recognise a cat but would know it by a completely different name, and unless it had been given a good description of a cat to associate with such image forms, or a picture with a label, it would never know what a cat was in the way we know them. Which may be a good or a bad thing. But if the AI refers to cats as 478912's, then we would not know what it was on about, and while it may be intelligent, it would be intelligent in a way we would be unable to understand and relate to. Ironically, I suspect that if you had a top-end AI system and asked it what defined AI, it might very well come back with the answer 42, which many would even understand, though not comprehend.

Fun field of work still, I bet.


I just want to say that the human brain doesn't need to perform billions of matrix products to recognize a dog.

The best AI will always be a meaty human.


I saw a 2 Minute Papers video that talks about the paper [0] I linked below. Basically, the AI segments the human body into pieces.

Using that data you could create a classifier for either pre-defined postures/positions or let it create its own classes (naive classification, I think?)

Source [1] is a different kind of segmentation.

I think your biggest obstacle will be training sets (assuming you want to use AI/ML). Once you've got training sets, the presence of obstacles (sheets/blankets) might not be so much of an issue.

[0] https://arxiv.org/abs/1808.07371 (video: https://www.youtube.com/watch?v=cEBgi6QYDhQ)

[1] https://www.mdpi.com/1424-8220/12/11/15376
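The nearest-centroid flavor of that classifier is simple once the pose model has been reduced to a small feature vector per frame. A minimal sketch, where the two features (torso angle from vertical, hip height as a fraction of body height) are made up for illustration and would come from the segmentation output in practice:

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(xs) / len(xs) for xs in zip(*vectors)]

def classify(sample, centroids):
    """Assign the sample to the class with the nearest centroid."""
    return min(centroids, key=lambda label: math.dist(sample, centroids[label]))

# Hypothetical training features: (torso angle in degrees, relative hip height).
training = {
    "standing": [(2, 0.52), (5, 0.50), (3, 0.55)],
    "lying":    [(88, 0.10), (85, 0.12), (90, 0.08)],
}
centroids = {label: centroid(vecs) for label, vecs in training.items()}

print(classify((4, 0.51), centroids))   # standing
print(classify((87, 0.11), centroids))  # lying
```

Dropping the labels and clustering the same feature vectors instead would give the "create its own classes" variant mentioned above.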


To me, a fundamental question is "is this a problem?"

Humans have similar problems too. Our intelligence is trained by experience and evolution to operate within certain parameters. More concerning to me is the fact that a human can conceptualize "couch" from a single example, while ML algorithms need to see thousands of couches before they can classify them.


Interesting, care to elaborate?

For a lot of NN/ML applications, the magic ingredient is "humans", plus a record of humans doing something enough times to describe it statistically. AI sign recognition, sentence completion, or checkers playing is very often based on estimates of "what would a human do."

"Would a human say this photo contains cats?" is really how a lot of ML interprets the question "where is my cat?"
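That framing can be made literal: given repeated human judgements on the same photo, the "label" is just a vote statistic, which is what many supervised pipelines are effectively approximating. A minimal sketch with hypothetical crowd-worker votes:

```python
from collections import Counter

def human_label_estimate(annotations):
    """Estimate P(label) for one item from repeated human judgements on it."""
    counts = Counter(annotations)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

# Five hypothetical crowd workers label the same photo.
votes = ["cat", "cat", "cat", "dog", "cat"]
probs = human_label_estimate(votes)
print(probs["cat"])  # 0.8
```

A trained model then tries to reproduce these per-item probabilities from pixels alone, which is why its notion of "cat" is bounded by what the annotators would have said.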


> Aren't AI models already better at image recognition than humans?

On some benchmarks, AI models are better at very well-defined tasks like image classification (“label this image from a set of 8 labels you’ve seen before”) or object detection (“draw a box around all instances of class X in this image, where X is a very narrowly defined class”). They’re not even close to being able to understand unseen examples and parse out their meaning in a larger context the way humans can. (“Recognize that this object in the road is a human riding some sort of bizarre unicycle he welded himself, then predict how he’s likely to move given the structure of his custom unicycle thing.”)

The bottleneck in AVs isn’t “perception” in the sense of image classification and object detection, it’s deeper scene understanding and abstract reasoning.


Image recognition is AI.

Image recognition requires AI. It used to be believed that it was simple. A famous AI researcher in the '60s once sent a bunch of grad students to solve it over the summer. They then started to realize just how complex, and at the time intractable, the task was.

60 years later, we have finally made general purpose learning algorithms, vaguely inspired by the brain, which are just powerful enough to do it. And because they are general purpose, they can also do many other things as well. Everything from speech recognition, to translating sentences, or even controlling robots. Image recognition is just one of many benchmarks that can be used to measure progress.


> As a counter anecdote, one can also say that just 10 years ago, no computer systems came close to human at facial recognition or scene recognition for still images. Today's AI systems can caption events in videos. [1]

These "basic" computer vision tasks once seemed so trivial compared to "actual AI" that the original AI researchers famously assigned solving them to a grad student over a summer.


This is human-like intelligence, as the resulting program can accomplish human-like tasks of recognition without the need for context on what these images might mean.

That's a very limited subset of what I mean by "human-like intelligence". And within that specific subset, yes, AI/ML can and has achieved "human level" results. But the same ML model that can recognize cats in vectors of pixels doesn't know anything about falling down. It's never tripped, stumbled, fallen, skinned its palms, and felt the pain and seen the blood that results. It's never known the embarrassment of hearing the other AI kids laughing at it for falling, or the shame of having its AI parent shake its head and look away after it fell down. It's never been in love with the pretty girl AI (or pretty boy AI) and had to wonder "did he/she see me fall and bust my ass?"

Now, giving a computer program some part of the experience of falling is something we could do. We could load the AI into a shell of some sort, and pack it with sensors: GPS receiver, accelerometers, ultrasonic distance detectors, cameras, vibration sensors, microphones, barometric pressure sensor, temperature detector, etc., and then shove it off a shelf. Now it would "know" something about what falling actually is. And that's what I mean by the need for experiential learning in a situated/embodied setting.

While it might be possible in principle to get that knowledge into a program in some other way, my suspicion is that it would be prohibitively difficult to the point of being effectively impossible.


Yes, though AI has been better than humans at those for some time. The real test nowadays is checking whether the input "looks human".

EDIT: I am watching the video now, and made my comment below before I started watching the video. I realize now that my comment is somewhat outside the scope of the video itself, but I'll leave the comment up anyway, because I think it reveals a similar conclusion as the video.

I'll take a stab at what seems to me to be an insurmountable problem for AI: vision. And not just vision, but "seeing". Pretend I am standing next to a table, and the AI successfully identifies me, and identifies a table next to me (which, in itself, is a very difficult, if not impossible, thing for AI to do right now). Now, let's say I sit on the table. Now, we ask the AI, "Am I sitting on a table, or sitting on a chair?"

Further to the point (and a less "human" scenario): Musk has been attempting for years to make a lights-out facility for building cars. As of yet, there are still many things that a robot cannot do, or that a human can do faster and more reliably, in spite of throwing billions of dollars and millions of work-hours at the problem. Another example: shoe manufacturing robots. Another example: brick-laying robots. Another example: pipe-welding robots. None of those can even come close to a human's ability to adapt on the fly to small variations, or to learn new behaviors. AI sees the world in a very low-resolution way, and humans are generally unaware of how much "constructing of the world" our brains do compared to the input they receive from our senses. Replicating this is going to take more than a GPT-3, for example.

