Seeking a way to prevent AI audio models from being fooled

Warnings have emerged about the unreliability of the metrics used to detect whether an audio perturbation designed to fool AI models can be perceived by humans

Artificial intelligence (AI) is increasingly based on machine learning models trained using large datasets. Likewise, human-computer interaction is increasingly dependent on speech communication, mainly due to the remarkable performance of machine learning models in speech recognition tasks.

[Image caption: Jon Vadillo in his office at the University of the Basque Country. Credit: Nagore Iraola / UPV/EHU]

However, these models can be fooled by "adversarial" examples, in other words, inputs intentionally perturbed to produce a wrong prediction without the changes being noticed by humans. "Suppose we have a model that classifies audio (e.g. voice command recognition) and we want to deceive it, in other words, generate a perturbation that maliciously prevents the model from working properly. When a signal is heard clearly, a person can tell whether it says 'yes', for example. When we add an adversarial perturbation we will still hear 'yes', but the model will start to hear 'no', or 'turn right' instead of left, or any other command we don't want to execute," explained Jon Vadillo, a researcher in the UPV/EHU's Department of Computer Science and Artificial Intelligence.
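The attack Vadillo describes can be sketched in a few lines. The following is a toy illustration, not the study's actual method: in the style of the well-known fast gradient sign method, each audio sample is nudged in the direction that increases the model's loss, but never by more than a small amplitude ε, so that the change ideally stays below what a listener can hear. The waveform and gradient here are hypothetical stand-ins for a real model's values.

```python
import numpy as np

def bounded_perturbation(waveform, gradient, eps=0.005):
    """Toy FGSM-style attack: shift each sample in the direction that
    increases the model's loss, capped at eps per sample so the change
    stays (ideally) imperceptible to a human listener."""
    adversarial = waveform + eps * np.sign(gradient)
    # Keep the signal inside the valid audio range [-1, 1].
    return np.clip(adversarial, -1.0, 1.0)

# Hypothetical clean waveform and loss gradient (stand-ins for a real model).
rng = np.random.default_rng(0)
x = 0.5 * np.sin(np.linspace(0, 8 * np.pi, 16000))
g = rng.standard_normal(16000)

x_adv = bounded_perturbation(x, g, eps=0.005)
print(np.max(np.abs(x_adv - x)))  # never exceeds eps
```

The key property is the hard bound: no single sample moves by more than ε, which is exactly the kind of constraint the distortion metrics discussed below are meant to relate to human perception.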

This could have "very serious implications at the level of applying these technologies to real-world or highly sensitive problems", added Vadillo. It remains unclear why this happens. Why would a model that behaves so intelligently suddenly stop working properly when it receives even slightly altered signals?

Deceiving the model by using an undetectable perturbation

“It is important to know whether a model or a program has vulnerabilities," added the researcher from the Faculty of Informatics. “Firstly, we investigate these vulnerabilities, to check that they exist and because that is the first step in eventually fixing them.” While much research has focused on developing new techniques for generating adversarial perturbations, less attention has been paid to the factors that determine whether these perturbations can be perceived by humans, and to how those factors behave. This issue is important, as the adversarial perturbation strategies proposed only pose a threat if the perturbations cannot be detected by humans.

This study investigated the extent to which the distortion metrics proposed in the literature for audio adversarial examples can reliably measure the human perception of perturbations. In an experiment in which 36 people evaluated adversarial audio perturbations according to various factors, the researchers showed that "the metrics that are being used by convention in the literature are not completely robust or reliable. In other words, they do not adequately represent the auditory perception of humans; they may tell you that a perturbation cannot be detected, but then when we evaluate it with humans, it turns out to be detectable. So we want to issue a warning that due to the lack of reliability of these metrics, the study of these audio attacks is not being conducted very well," said the researcher.
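The metrics in question are typically simple functions of the clean and perturbed waveforms. As a hedged illustration (the article does not list the specific metrics the study evaluated), two measures commonly used in the audio adversarial literature are the maximum per-sample distortion (the L∞ norm) and the signal-to-noise ratio in decibels:

```python
import numpy as np

def linf_distortion(clean, perturbed):
    """Largest absolute per-sample change introduced by the perturbation."""
    return np.max(np.abs(perturbed - clean))

def snr_db(clean, perturbed):
    """Signal-to-noise ratio in dB; higher means a quieter perturbation."""
    noise = perturbed - clean
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))

# Example: a constant perturbation one tenth of the signal amplitude.
clean = np.ones(1000)
perturbed = clean + 0.1
print(round(linf_distortion(clean, perturbed), 3))  # 0.1
print(round(snr_db(clean, perturbed), 1))           # 20.0
```

The study's warning is precisely that values like these do not map cleanly onto what people actually hear: a perturbation with a "good" SNR can still be audible, depending on where in the signal it falls.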

In addition, the researchers have proposed a more robust evaluation method that is the outcome of the "analysis of certain properties or factors in the audio that are relevant when assessing detectability, for example, the parts of the audio in which a perturbation is most detectable". Even so, "this problem remains open because it is very difficult to come up with a mathematical metric that is capable of modeling auditory perception. Depending on the type of audio signal, different metrics will probably be required or different factors will need to be considered. Achieving general audio metrics that are representative is a complex task," concluded Vadillo.

Anomalo, Snowflake partner to help enterprises trust their data

Anomalo has announced a partnership with Snowflake to help customers trust the data they use to make decisions and build products. The combination provides customers with a way to monitor the quality of the data in any table in Snowflake’s platform without writing code, configuring rules, or setting thresholds.

Today’s modern data-powered organizations are using Snowflake’s platform to centralize all of their data and make it easily available for everything from business decision-making to predictive analytics and machine learning.

However, dashboards and data-powered products are only as good as the quality of the data that powers them. Many data-powered companies quickly encounter one unfortunate fact: much of their data is missing, stale, corrupt, or prone to unexpected and unwelcome changes. As a result, companies spend more time dealing with issues in their data rather than unlocking that data’s value.

Anomalo thus addresses the data quality problem by monitoring enterprise data and automatically detecting and root-causing data issues, allowing teams to resolve any hiccups with their data before making decisions, running operations, or powering models. Anomalo leverages machine learning to rapidly assess a wide range of data sets with minimal human input. If desired, enterprises can fine-tune Anomalo’s monitoring through the low-code configuration of metrics and validation rules. This is in contrast to legacy approaches to monitoring data quality that require extensive work writing data validation rules or setting limits and thresholds.
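Anomalo's internals are proprietary, but the general approach it describes — learning what "normal" looks like for a table from its own history and flagging deviations without hand-tuned thresholds — can be illustrated with a minimal sketch. Here a robust z-score (based on the median absolute deviation) flags days whose null rate breaks sharply from the historical pattern; the data and threshold are illustrative, not Anomalo's:

```python
import numpy as np

def flag_anomalous_days(null_rates, threshold=3.5):
    """Flag days whose null rate deviates from the historical median by
    more than `threshold` robust z-scores. The baseline is learned from
    the history itself, so no per-table limit has to be hand-written."""
    rates = np.asarray(null_rates, dtype=float)
    median = np.median(rates)
    mad = np.median(np.abs(rates - median))
    if mad == 0:
        # Perfectly stable history: any change at all is anomalous.
        return [i for i, r in enumerate(rates) if r != median]
    robust_z = 0.6745 * (rates - median) / mad
    return [i for i, z in enumerate(robust_z) if abs(z) > threshold]

# Hypothetical daily null rates for one column: stable around 2%,
# then a sudden spike on the last day.
history = [0.02, 0.021, 0.019, 0.02, 0.022, 0.018, 0.35]
print(flag_anomalous_days(history))  # [6]
```

Using the median and MAD rather than the mean and standard deviation keeps the baseline itself from being dragged around by the very anomalies being hunted.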

As a result, Snowflake customers can now begin monitoring the quality of their data with Anomalo in under five minutes. They simply connect Anomalo’s data quality platform to their Snowflake account and select the tables they wish to monitor. No further configuration or code is required.

Anomalo and Snowflake are used by customers globally:

  • Discover Financial Services is leveraging Anomalo to quickly gain trust in their most critical data. Discover’s Chief Data and Analytics Officer Keith Toney said: “Discover is transforming and expanding how we use data as an enterprise asset to serve our customers better through advanced data analytics. We were looking for a product that would help us maintain a scalable foundation of trusted data in a fast-paced digital environment. We selected Anomalo to fully automate the basis of our data quality monitoring because their machine learning and root cause detection technology identify late, missing, or anomalous data across our petabyte-scale cloud warehouse. Our data stewards use Anomalo’s intuitive UI to tailor monitoring to their business needs. Compared to legacy solutions, Anomalo will help us detect more quality issues with just a fraction of the time invested by our team.”
  • Faire uses Anomalo to monitor the most important tables in their Snowflake account. Daniele Perito, Chief Data Officer and co-founder at Faire, said: “We monitor hundreds of key tables in Snowflake’s platform with Anomalo. I sleep better at night knowing our data is more reliable, and my team loves how easy it is to use and how insightful the notifications are.”
  • Substack uses Anomalo to empower their small team to keep up with an ever-growing collection of data. Mike Cohen, Substack’s Data Manager, said: “With a small data team at Substack, the automated checks that Anomalo provides are like having another data engineer on the team whose primary focus is to ensure data quality and integrity. With these checks, we've caught internal data and production bugs and detected the presence of bad actors internal to our system that might have otherwise gone unnoticed for long periods.”

“Snowflake provides an ideal environment for tools like Anomalo. With its ability to centralize the full set of enterprise data and its unique ability to automatically size query workloads based on their priority and urgency, Snowflake is a perfect partner in helping enterprises trust all of their important data,” said Elliot Shmukler, co-founder and CEO of Anomalo.

“Anomalo offers an easy-to-use way to monitor every table in a customer’s Snowflake account for data quality issues," said Tarik Dwiek, Head of Technology Alliances at Snowflake. "We're excited to offer Snowflake customers the ability to leverage Anomalo to further build trust in the data they are using to develop products and make decisions.”

As part of today’s announcement, Anomalo is a Select Partner within the Snowflake Partner Program.

CfA astronomer Karen Collins uses ML to discover mysterious dusty object orbiting TIC 400799224

The Transiting Exoplanet Survey Satellite, TESS, was launched in 2018 to discover small planets around the Sun’s nearest neighbor stars. TESS has so far discovered 172 confirmed exoplanets and compiled a list of 4703 candidate exoplanets. Its sensitive camera takes images that span a huge field of view, more than twice the area of the constellation of Orion, and TESS has also assembled a TESS Input Catalog (TIC) with over 1 billion objects. Follow-up studies of variable TIC objects have found that their brightness variations result from stellar pulsations, shocks from supernovae, disintegrating planets, gravitational self-lensed binary stars, eclipsing triple star systems, disk occultations, and more.

[Image caption: An optical/near-infrared image of the sky around the TESS Input Catalog (TIC) object TIC 400799224; the crosshair marks the location of the object, and the width of the field of view is given in arcminutes. Astronomers have concluded that the mysterious periodic variations in the light from this object are caused by an orbiting body that periodically emits clouds of dust that occult the star. Credit: Powell et al., 2021]

The Center for Astrophysics | Harvard & Smithsonian (CfA) astronomer Karen Collins was a member of a large team that discovered the mysterious variable object TIC 400799224. The team searched the catalog using machine-learning-based computational tools developed from the observed behaviors of hundreds of thousands of known variable objects; the method has previously found disintegrating planets and bodies that are emitting dust, for example. The unusual source TIC 400799224 was spotted serendipitously because of its rapid drop in brightness, by nearly 25% in just four hours, followed by several sharp brightness variations that could each be interpreted as an eclipse.
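The kind of signal that flagged TIC 400799224 — a sharp, deep drop in brightness — is straightforward to pick out of a light curve once the star's baseline flux is established. A minimal sketch (purely illustrative; the team's actual machine-learning pipeline is far more sophisticated):

```python
import numpy as np

def find_deep_dips(flux, depth=0.25):
    """Return indices where the normalized flux drops by at least
    `depth` (e.g. 0.25 for a 25% dip) below the median baseline."""
    flux = np.asarray(flux, dtype=float)
    baseline = np.median(flux)
    return np.flatnonzero(flux < baseline * (1.0 - depth))

# Hypothetical light curve: flat around 1.0 with one deep dip.
lc = np.array([1.0, 1.01, 0.99, 1.0, 0.74, 0.72, 0.70, 1.0, 1.0])
print(find_deep_dips(lc))  # [4 5 6]
```

Using the median as the baseline makes the detection robust to the dip itself, which would bias a simple mean.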

The astronomers studied TIC 400799224 with a variety of facilities, including some that have been mapping the sky for longer than TESS has been operating. They found that the object is probably a binary star system and that one of the stars varies with a 19.77-day period, probably due to an orbiting body that periodically emits clouds of dust that occult the star. But while the periodicity is strict, the dust occultations of the star are erratic in their shapes, depths, and durations, and are detectable (at least from the ground) only about one-third of the time or less. The nature of the orbiting body itself is puzzling because the quantity of dust emitted is large; if it were produced by the disintegration of an object like the asteroid Ceres in our solar system, it would survive only about eight thousand years before disappearing. Yet remarkably, over the six years that this object has been observed, the periodicity has remained strict and the object emitting the dust has remained intact. The team plans to continue monitoring the object and to incorporate historical observations of the sky to try to determine its variations over many decades.
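A strict periodicity like this is typically confirmed by phase-folding: every observation time is mapped to its phase within the period, so that occultations recur at the same phase even when the individual events look different. A minimal sketch, using the 19.77-day period and hypothetical observation times:

```python
import numpy as np

def phase_fold(times, period, t0=0.0):
    """Map observation times (in days) to orbital phase in [0, 1)."""
    return ((np.asarray(times, dtype=float) - t0) % period) / period

# Events exactly one or two 19.77-day periods apart fold to the same phase.
times = np.array([2.0, 21.77, 41.54, 10.0])
phases = phase_fold(times, period=19.77)
print(np.round(phases, 4))  # first three phases coincide
```

Folding six years of data this way is what lets erratic, often-missed occultations add up into a clear recurring feature at a single phase.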