This week’s list of data news highlights covers July 28 – August 3, 2018, and includes articles about a community using air-quality sensors to fight back against polluters and an AI system that can predict the toxicity of chemicals.
1. Using AI to Fix Wikipedia’s Gender Problem
San Francisco startup Primer has developed software called Quicksilver that helps Wikipedia editors identify gaps in the encyclopedia, particularly the underrepresentation of women in science. Only 18 percent of Wikipedia’s biographies are of women, likely due in part to the fact that between 84 and 90 percent of Wikipedia editors are male, and pages for women in science are frequently missing. Quicksilver uses machine learning to analyze news articles and citations in research papers to identify notable scientists who do not have Wikipedia pages, and it generates draft entries for them, including citations.
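Primer has not published Quicksilver’s internals, but the gap-detection step can be illustrated with a minimal sketch: given candidate names surfaced by an upstream model, check each against Wikipedia’s public REST API. The candidate list and helper function below are hypothetical stand-ins.

```python
# A minimal sketch of the gap-detection step, assuming names have already
# been extracted from news text (Quicksilver's pipeline is far more
# sophisticated); uses Wikipedia's public REST API to check for missing pages.
import requests

def has_wikipedia_page(name: str) -> bool:
    """Return True if an English Wikipedia article with this title exists."""
    url = f"https://en.wikipedia.org/api/rest_v1/page/summary/{name.replace(' ', '_')}"
    return requests.get(url, timeout=10).status_code == 200

# Hypothetical scientists surfaced by an upstream named-entity recognizer.
candidates = ["Marie Curie", "A. Hypothetical Scientist"]
missing = [name for name in candidates if not has_wikipedia_page(name)]
print("Candidates lacking Wikipedia pages:", missing)
```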
2. Helping Self-Driving Cars Fit In by Making Them Stand Out
Autonomous vehicle startup Drive.ai has launched a ride-hailing pilot in Frisco, Texas, for its self-driving cars that are designed to better communicate with pedestrians and other drivers. Drive.ai’s vehicles have four LED screens around the car that display contextually relevant messages to people nearby, such as “waiting for you to cross” when it is stopped at a crosswalk, or “human driver” when in manual mode.
3. Advocating for Better Air Quality with Low-Cost Sensors
Members of Brandywine, an unincorporated community in Maryland located close to large industrial plants, are using low-cost air quality sensors to gather the data they need to advocate for better air quality. The approval process for fossil fuel power plants does not require an assessment of air quality data before construction, and by 2019 there will be three large plants within 2.9 miles of Brandywine, which residents believe threatens public health. Air quality sensors that meet the federal standards the Environmental Protection Agency or the state would accept cost $100,000, so the community partnered with a scientific organization called the Thriving Earth Exchange to deploy $260 sensors that can gather preliminary data, which residents hope to use to advocate for an environmental health assessment and potentially a moratorium on building new plants.
4. Learning to Rotate a Cube in Just 100 Years
AI research nonprofit OpenAI has developed a robotic system called Dactyl that combines a robotic hand, a video camera, and AI to teach itself to manipulate a cube. Dactyl uses a technique called reinforcement learning, which allows it to figure out how to solve problems through trial and error. After the equivalent of 100 years of virtual training time, the robot could successfully rotate a cube 13 out of 50 times; though this is a high error rate, it is impressive for a system that learned entirely on its own.
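Dactyl itself relies on large-scale reinforcement learning in simulation, but the trial-and-error principle can be shown with a toy tabular Q-learning loop. The five-state environment below is an invented stand-in, not OpenAI’s setup.

```python
# A toy illustration of trial-and-error reinforcement learning (tabular
# Q-learning); Dactyl uses large-scale policy optimization, not this method.
import random

n_states, n_actions = 5, 2
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, epsilon = 0.1, 0.9, 0.2  # learning rate, discount, exploration

def step(state, action):
    """Hypothetical environment: action 1 moves toward the goal state."""
    next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward

for episode in range(500):
    state = 0
    while state != n_states - 1:
        # Explore occasionally, otherwise exploit current value estimates.
        action = random.randrange(n_actions) if random.random() < epsilon \
            else max(range(n_actions), key=lambda a: Q[state][a])
        next_state, reward = step(state, action)
        # Update the estimate from the observed outcome (trial and error).
        Q[state][action] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][action])
        state = next_state

print("Learned action per non-terminal state:",
      [max(range(n_actions), key=lambda a: Q[s][a]) for s in range(n_states - 1)])
```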
5. Analyzing Data by Bending Light
Researchers at the University of California, Los Angeles, have developed a method for creating what they call a Diffractive Deep Neural Network, which uses 3D-printed physical layers that diffract light in a way modeled on how the layers of an artificial neural network process data. As an artificial neural network trains on data, its layers gradually become optimized to perform a particular calculation more efficiently and effectively. The researchers mimicked this process by 3D printing layers of transparent material with complex patterns that bend light in precisely calculated ways, so that after passing through several layers, the light has been “processed” as if by a complex function.
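To make the idea concrete, here is a rough numerical sketch of light passing through a stack of phase masks, with free-space propagation approximated by the angular-spectrum method. The random mask values, grid size, and distances are illustrative placeholders, not the trained, 3D-printed patterns from the UCLA work.

```python
# A minimal numerical sketch of computing on light with stacked phase masks:
# each layer is modeled as a fixed phase pattern, and free-space propagation
# between layers uses an angular-spectrum step. All values are illustrative.
import numpy as np

N, wavelength, dx, z = 64, 0.75e-3, 0.4e-3, 3e-3  # grid size, meters

def propagate(field, z):
    """Angular-spectrum free-space propagation over distance z."""
    fx = np.fft.fftfreq(N, d=dx)
    FX, FY = np.meshgrid(fx, fx)
    arg = 1 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2
    H = np.exp(2j * np.pi * z / wavelength * np.sqrt(np.maximum(arg, 0)))
    return np.fft.ifft2(np.fft.fft2(field) * H)

rng = np.random.default_rng(0)
field = np.zeros((N, N), dtype=complex)
field[24:40, 24:40] = 1.0  # a simple input "image" encoded in the beam

for _ in range(5):  # five diffractive layers, as in the UCLA prototype
    phase_mask = np.exp(1j * rng.uniform(0, 2 * np.pi, (N, N)))
    field = propagate(field * phase_mask, z)

intensity = np.abs(field) ** 2  # a detector reads out this pattern as the answer
print("Output intensity peak at:", np.unravel_index(intensity.argmax(), intensity.shape))
```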
6. Studying Chemical Toxicity with AI
Researchers at Johns Hopkins University have developed a machine learning system that can link the molecular structure of chemicals to their potential hazards, including their toxicity. The process of linking molecular structure and biological activity is known as “read-across”; however, it requires expert analysis and is only narrowly useful, making animal testing of chemical toxicity, which is time-consuming, costly, and unreliable, the standard practice. The researchers trained their system on hundreds of thousands of toxicity studies and then analyzed thousands of chemicals detailed in public scientific datasets. The system can correlate molecular structure with 74 types of hazard, such as skin irritation, mutation-causing potential, and harm to the ozone layer.
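The essence of automated read-across can be sketched as a nearest-neighbor vote over structural similarity. The feature sets and hazard labels below are invented placeholders, far simpler than the fingerprints and training data the Johns Hopkins system uses.

```python
# A minimal sketch of automated read-across: predict a hazard label for a
# query chemical from its most structurally similar known neighbors.

def tanimoto(a: set, b: set) -> float:
    """Tanimoto similarity between two sets of structural-feature IDs."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical database: structural features -> known skin-irritation label.
database = [
    ({1, 4, 7, 9}, True),
    ({2, 4, 7}, True),
    ({3, 5, 8}, False),
    ({3, 6, 8, 10}, False),
]

def read_across(query: set, k: int = 3) -> bool:
    """Majority vote of the k most structurally similar known chemicals."""
    neighbors = sorted(database, key=lambda rec: tanimoto(query, rec[0]), reverse=True)[:k]
    return sum(label for _, label in neighbors) > k / 2

print(read_across({1, 4, 7}))  # structurally close to the irritants -> True
```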
7. Training Robots in Unfamiliar Homes
Carnegie Mellon University roboticists are renting Airbnbs to serve as unfamiliar training environments for robotic systems learning to manipulate objects. Researchers frequently train robotic systems to grasp and manipulate objects in standardized environments, as settings with varied backgrounds, lighting, and textures can make it challenging for narrowly trained AI systems to interpret their surroundings. To overcome this challenge, the roboticists are renting Airbnbs with different styles of flooring and having their systems attempt to locate and grasp objects on the ground. In tests, a system trained in the varied Airbnb environments could grasp a novel object in an unfamiliar environment 62 percent of the time, while a lab-trained system succeeded only 18.5 percent of the time.
8. Reducing the African Technology Skills Gap
Google and Facebook have partnered with the African Institute for Mathematical Sciences (AIMS), headquartered in South Africa, to launch the first dedicated master’s degree program in machine intelligence in Africa. The program will be free for qualified participants and will emphasize researching and using machine learning to solve problems relevant to Africa, including improving governance and strengthening economies. AIMS will begin offering courses for the program in September 2018.
9. Revealing Medical Insights in Medieval Texts
Researchers at the University of Pennsylvania and the University of Warwick used data analytics to analyze apothecary recipes from a 15th-century manuscript called the Lylye of Medicynes and found that many stood up to modern medical scrutiny. The text contains 360 recipes for treating 113 different conditions using over 3,000 different ingredients. The researchers mapped the networks between different recipes to reveal common linkages between particular ingredients and different treatments, and then used an algorithm to identify clusters of ingredients commonly used together. By comparing these ingredient clusters to modern medical literature, the researchers found that many of the groupings had valid medicinal uses, such as antibiotic or disinfectant properties.
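The clustering step can be illustrated with a small co-occurrence network. The recipes below are invented stand-ins for the manuscript’s contents, and greedy modularity maximization is just one plausible choice of clustering algorithm, not necessarily the one the researchers used.

```python
# A minimal sketch of the network approach: build an ingredient co-occurrence
# graph from recipes, then find clusters of ingredients used together.
import networkx as nx
from networkx.algorithms import community
from itertools import combinations

recipes = [
    {"honey", "vinegar", "rose water"},
    {"honey", "vinegar", "aloe"},
    {"garlic", "wine", "myrrh"},
    {"garlic", "wine", "aloe"},
]

G = nx.Graph()
for recipe in recipes:
    for a, b in combinations(sorted(recipe), 2):
        # Edge weight counts how many recipes pair these two ingredients.
        if G.has_edge(a, b):
            G[a][b]["weight"] += 1
        else:
            G.add_edge(a, b, weight=1)

clusters = community.greedy_modularity_communities(G, weight="weight")
for i, cluster in enumerate(clusters, 1):
    print(f"Cluster {i}: {sorted(cluster)}")
```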
10. Recreating Sounds from Silent Video
Researchers at Microsoft, Adobe, and the Massachusetts Institute of Technology have developed an algorithm that can analyze silent video footage and recreate its audio, including intelligible speech. The algorithm analyzes the tiny vibrations that sound waves cause in objects visible in the video, such as the subtle movements of a potato chip bag or the surface of a glass of water. The algorithm works best with high-speed footage shot at 2,000 to 6,000 frames per second (fps), but the researchers also had success with a regular digital camera shooting at 60 fps.
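A heavily simplified sketch conveys the principle: synthesize frames whose brightness fluctuates imperceptibly with a tone, then recover the tone from the per-frame averages. The real method uses far more sophisticated multi-scale motion analysis than this mean-intensity proxy.

```python
# A heavily simplified sketch of the visual-microphone idea: recover a 1-D
# signal from tiny frame-to-frame changes in a video of a vibrating object.
import numpy as np

fps, seconds, freq = 2000, 0.05, 440.0  # high-speed capture of a 440 Hz tone
t = np.arange(int(fps * seconds)) / fps
rng = np.random.default_rng(0)

# Each synthetic "frame" is an image whose brightness shifts with the sound.
frames = [0.5 + 1e-3 * np.sin(2 * np.pi * freq * ti)
          + 1e-4 * rng.standard_normal((32, 32)) for ti in t]

# Per-frame average intensity traces the vibration over time.
signal = np.array([f.mean() for f in frames])
signal -= signal.mean()

# The dominant frequency of the recovered signal matches the source tone.
spectrum = np.abs(np.fft.rfft(signal))
recovered = np.fft.rfftfreq(len(signal), d=1 / fps)[spectrum.argmax()]
print(f"Recovered dominant frequency: {recovered:.0f} Hz")
```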
Image: The Medieval Cookbook.