sciencehabit shares a report from Science Magazine: A hacked message in a streamed song makes Alexa send money to a foreign entity. A self-driving car crashes after a prankster strategically places stickers on a stop sign so the car misinterprets it as a speed limit sign. Fortunately, these haven't happened yet, but hacks like this, sometimes called adversarial attacks, could become commonplace — unless artificial intelligence (AI) finds a way to outsmart them. Now, researchers have found a new way to give AI a defensive edge. The work could not only protect the public but also help reveal why AI, notoriously difficult to understand, falls victim to such attacks in the first place. The research suggests that because some AIs are too smart for their own good, spotting patterns in images that humans can't, they are vulnerable to those patterns and need to be trained with that in mind.
To identify this vulnerability, researchers created a special set of training data: images that look to us like one thing, but look to AI like another — a picture of a dog, for example, that, on close examination by a computer, has catlike fur. Then the team mislabeled the pictures — calling the dog picture an image of a cat, for example — and trained an algorithm to learn the labels. Once the AI had learned to see dogs with subtle cat features as cats, the researchers tested it by asking it to recognize fresh, unmodified images. Even though the AI had been trained in this odd way, it could correctly identify actual dogs, cats, and so on nearly half the time. In essence, it had learned to match the subtle features with labels, whatever the obvious features. The training experiment suggests AIs use two types of features: obvious, macro ones like ears and tails that people recognize, and subtle, micro ones that we can only guess at. It further suggests adversarial attacks aren't just confusing an AI with meaningless tweaks to an image; in those tweaks, the AI is smartly seeing traces of something else. An AI might see a stop sign as a speed limit sign, for example, because something about the stickers actually makes it subtly resemble a speed limit sign in a way that humans simply cannot perceive. Engineers could change the way they train AI to help it outsmart adversarial attacks. When the researchers trained an algorithm on images without the subtle features, "their image recognition software was fooled by adversarial attacks only 50% of the time," reports Science Magazine. "That compares with a 95% rate of vulnerability when the AI was trained on images with both obvious and subtle patterns."
Read more of this story at Slashdot.