- Leaf fossils dominate the fossil record, but identification lags.
- Machine learning algorithm teaches computers to read cleared leaves.
- Computer vision heat maps teach scientists about botanical history.
If you find a new dinosaur the next time you stick a shovel in the dirt, you’ll be famous. But pity the paleobotanists — they find new leaf fossils every time they dig.
Lack of fame is the least of their problems, though. A central obstacle botanists face is the inability to identify all those fossils. Leaves are naturally complex, with an astounding variety of vein and shape patterns. Comprehensive knowledge and identification are virtually impossible.
This roadblock spells an incomplete evolutionary history of green life. Undaunted, a team of researchers looked to the tools of computer science to assist in the difficult task of decoding leaf information.
“We’re trying to put together a picture of the evolution of our green planet and the plants on which all our lives depend,” says Peter Wilf, a professor of geoscience at Penn State University.
Working with Brown University computational neuroscientist Thomas Serre and other colleagues, Wilf has trained a computer to teach itself to identify leaves. Partially funded by the US National Science Foundation, their breakthrough promises to open a new window onto the last 135 million years, the span over which flowering plants (angiosperms) evolved and came to dominate the earth.
Consider your experience in the woods on any given day in the summer — you see a green world. And just as leaves dominate the landscape, they also dominate the fossil record.
But despite this abundance, attempts to classify and identify leaves have a long history of errors. It wasn’t until the 1970s that scientists realized 19th-century botanists had assigned fossilized leaves from millions of years ago to familiar, living groups (oak, magnolia, etc.), thus misidentifying many of them entirely.
With more than 250,000 species and 400 families of flowering plants, and each leaf bearing hundreds to thousands of vein intersections, the difficulty of identification quickly becomes enormous.
“The evolutionary history of so many plants is hidden,” says Wilf. “It’s sitting right there in the museum drawers, but we can’t access it because we don’t have enough tools to identify most of the fossil leaves.”
Teaching a computer to see
One day in 2007, Wilf came across an article that would change the course of his science. Thomas Serre had just perfected a computer vision program that trained a machine to recognize animals from assorted images. Wilf wondered if a similar approach could recognize plants.
“With computer vision and machine learning, we’re on the cusp of a technological revolution,” says Serre. “Computer vision is enabling the development of super-visual senses. I really believe that it will change the way we do science.”
The basic idea behind machine learning is that the computer learns to associate a sample with a category label: a leaf with a name in the angiosperm taxonomy, for instance. It teaches itself the diagnostic features of each category (oak family, maple family, etc.) and then uses those features to categorize unknown leaves.
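To make the idea of learning a label from examples concrete, here is a minimal sketch in Python. It is not the team's program: it uses a simple nearest-centroid rule, and the two measurements per leaf (vein density and length-to-width ratio), the numbers, and the labels "family A" and "family B" are all made up for illustration. The system described in the article works out its own features from images rather than being handed measurements like these.

```python
from collections import defaultdict
from math import dist


def train(samples):
    """Average the feature vectors of each label into one centroid per label."""
    grouped = defaultdict(list)
    for features, label in samples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(vectors) for column in zip(*vectors))
        for label, vectors in grouped.items()
    }


def classify(centroids, features):
    """Return the label whose centroid lies closest to the unknown sample."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical, made-up measurements: (vein density, length-to-width ratio).
training = [
    ((0.9, 1.2), "family A"),
    ((0.8, 1.0), "family A"),
    ((0.2, 3.1), "family B"),
    ((0.3, 2.9), "family B"),
]

centroids = train(training)
print(classify(centroids, (0.85, 1.1)))  # an "unknown" leaf close to family A
```

The training step summarizes what each labeled group looks like. The classification step then assigns an unseen leaf to whichever group it most resembles.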
Using the computers at the Center for Computation and Visualization, the team presented 7,597 cleared leaf images to the machine. Cleared leaves are specially prepared, bleached or stained so that the veins and their intersections in the leaf body are easier to see. Through this training, the machine learned to associate new images with class labels, and it got them right 72% of the time, about 15 times better than chance.
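The comparison with chance can be unpacked with a little arithmetic. The sketch below takes only the two figures quoted above (72% and "about 15 times") and works backward to the chance level they imply. It assumes that "chance" means guessing uniformly among equally likely classes. The article does not say how many families were involved or how chance was calculated, so the result is a rough reading and not a reported number.

```python
def implied_chance(accuracy, times_better):
    """Chance level implied by an accuracy quoted as a multiple of chance."""
    return accuracy / times_better


reported_accuracy = 0.72  # from the article
reported_multiple = 15    # "about 15 times better than chance"

baseline = implied_chance(reported_accuracy, reported_multiple)
print(f"implied chance level: {baseline:.3f}")
# Under the equal-likelihood assumption, chance = 1 / number_of_classes.
print(f"equivalent number of equally likely classes: {1 / baseline:.1f}")
```

Under that assumption, a random guess would be right a little under 5% of the time, which is what you would expect with roughly twenty equally likely categories.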
What was most surprising to the botanists on the team was that the computer was able to identify leaf images to the family taxonomic level. Family identification is the traditional first step for fossil leaves, which usually represent extinct species. Variation among the hundreds to thousands of species in a family is immense, and computerized identification of leaves above the species level had never been attempted before.
This initial success based on modern leaves gives the researchers great optimism for eventual computer-assisted identification of fossils.
“If you look at different species of plants within a family,” says Serre, “you end up with leaves that are so different and look to the untrained eye like arbitrary rearrangements. Computer vision is enabling us to see things that would be literally invisible to the naked eye.”
Not only have paleobotanists found a way to see the invisible; the computer is also teaching the botanists relationships and connections they had overlooked.
The computer generates heat maps with red spots indicating the tiny parts of the leaf it thinks identify a certain family.
“This gets really exciting because when you start looking at those maps, as a botanist I do see patterns,” says Wilf. “The computer vision tool has become this wonderful assistant for evolutionary botany and paleobotany and now it’s teaching us.”
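The article does not say how the heat maps are computed. One common, generic way to produce a map of this kind is occlusion sensitivity: blank out one small patch of the image at a time and record how much the classifier's score drops. The sketch below illustrates that technique on a tiny 4-by-4 grid. The function names are hypothetical, and `toy_score` is a deliberately simple stand-in for a real classifier, not the team's model.

```python
def occlusion_heat_map(image, score, patch=2):
    """Score drop caused by blanking each patch; a bigger drop means the patch matters more."""
    rows, cols = len(image), len(image[0])
    base = score(image)
    heat = [[0.0] * cols for _ in range(rows)]
    for top in range(0, rows, patch):
        for left in range(0, cols, patch):
            # Copy the image and zero out one patch.
            masked = [row[:] for row in image]
            cells = [
                (r, c)
                for r in range(top, min(top + patch, rows))
                for c in range(left, min(left + patch, cols))
            ]
            for r, c in cells:
                masked[r][c] = 0.0
            # Record how much the score fell without that patch.
            drop = base - score(masked)
            for r, c in cells:
                heat[r][c] = drop
    return heat


def toy_score(image):
    """Hypothetical stand-in for a classifier: only the top-left corner counts."""
    return image[0][0] + image[0][1] + image[1][0] + image[1][1]


image = [[1.0] * 4 for _ in range(4)]
heat = occlusion_heat_map(image, toy_score)
for row in heat:
    print(row)
```

In this toy case only the top-left patch gets a nonzero value, because it is the only region the stand-in score depends on. On a real leaf image, the "hot" patches would be the regions a trained classifier relies on most, which is the kind of pattern a botanist could then inspect.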
Don’t worry: the team’s leaf reader will soon come to a smartphone near you. But more important to these scientists is the huge trove of data unlocked by computer vision and its application to fossil identification.
And because our entire existence is bound up with the fate of plants, the more we learn about what lives and what dies, the more we know about the world that we live in and how it could respond in the future.
“It’s going to help scientists and nonscientists understand the green world better. Our clothes, our medicines, a lot of our building materials, much of our oxygen, and almost everything that we eat — it comes from the flowering plants. This is the architecture of our world.”