One of several interactive workshops offered at Botany 2020 focused on the digital Flora of North America (North of Mexico, to give it its full name; FNA for short), which is available online in beta. The workshop functioned as both an introduction to the project and a primer on using it for semantic searches and for generating output in the form of taxon lists. Many of us are familiar with the print version of the FNA, a massive 30-volume series in the works since 1993, of which 21 volumes have been published so far. When complete, the project will treat more than 20,000 plant species – around 7% of the world’s total – including synonyms, identification keys, descriptions, ranges, illustrations, and more.
The weakness of printed floras, however, is that they can go out of date quickly, as scientific understanding of groups changes. Fern taxonomy, for example, has already changed significantly since the FNA began to be published. Enter FNA Online, a searchable repository that can be updated as needed to remain current. I spoke with Jocelyn Pender, a Biodiversity Data Manager for Agriculture and Agri-Food Canada and facilitator of the workshop, about the goals and challenges of the FNA Online project.
Two of the principal goals of the project are to remain current and to expand the user base by making taxonomic descriptions easily searchable by both humans and machines. “I believe the future of floras is digital and data-centric,” says Pender. “We are building the FNA Online with this in mind. We’d like to extend the utility of the FNA beyond its traditional user group of professional botanists, taxonomists, etc. towards a more expansive user group that includes educators, citizen scientists, hobby botanists, regulators, policymakers, horticulturalists, agronomists, ecologists, molecular biologists, phylogeneticists, etc. This means increasing the number of ways that users can interact with the content. Our vision includes interactive keys available in several expertise levels, on-the-fly checklists for regulators and educators, and downloadable taxon-character matrices for ecologists and molecular biologists.”
A major challenge facing the creators of the digital FNA is making the taxonomic descriptions, written in natural language by many different authors, machine-readable so they can be easily searched and compared. Several aspects of natural language use, and of taxonomic descriptions in particular, make this a difficult task.
First, individual authors have unique description styles and use different vocabulary. “We face challenges in enabling comparison of parsed content across treatments,” explains Pender. “How can we develop an interactive key that allows users to filter plants to petal colour ‘red’ when one author described the petals as ‘fuchsia’ and the other as ‘maroon-auburn’? We’ve been working hard to develop synonymies for terms, but this is labour-intensive and prone to human error and incorrect inferences.”
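To make the synonymy idea concrete, here is a minimal Python sketch of how author-specific colour terms might be mapped to a single searchable term. It is an illustration only, not the FNA Online implementation: the synonym table, the function names, and the placeholder taxa are all assumptions made for the example.

```python
from typing import Optional

# Hypothetical synonym table: each author-supplied colour term maps to one
# canonical search term. The groupings are illustrative, not FNA's own synonymy.
COLOUR_SYNONYMS = {
    "red": "red",
    "fuchsia": "red",
    "maroon-auburn": "red",
    "crimson": "red",
    "yellow": "yellow",
    "golden": "yellow",
}


def canonical_colour(term: str) -> Optional[str]:
    """Return the canonical colour for a term, or None if the term is unmapped."""
    return COLOUR_SYNONYMS.get(term.strip().lower())


def filter_by_colour(descriptions: dict, wanted: str) -> list:
    """Return the taxa whose recorded petal colour normalises to `wanted`."""
    return sorted(
        taxon
        for taxon, colour in descriptions.items()
        if canonical_colour(colour) == wanted
    )


# Placeholder taxa and values, not real FNA records.
petal_colours = {
    "Taxon A": "Fuchsia",
    "Taxon B": "maroon-auburn",
    "Taxon C": "golden",
    "Taxon D": "mauve",  # unmapped term, so it never matches any filter
}

print(filter_by_colour(petal_colours, "red"))  # prints ['Taxon A', 'Taxon B']
```

The sketch also shows why the approach is labour-intensive and error-prone: every new term has to be added to the table by hand, each mapping is a judgment call (is "fuchsia" really "red"?), and any term left unmapped silently drops its taxon from the results.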
Another hurdle lies in the complexity of botanical language. “One term can have two unique, non-overlapping meanings in two families,” says Pender. “Additionally, within some complex groups, there is no strong consensus among botanists on the meaning of words. Lastly, taxonomic descriptions use a particular style of sublanguage that is telegraphic; it omits inessential words that humans easily insert. Machines struggle to make inferences that connect phrases and ideas.”
For these and other reasons, the language parser generates ‘junk’: nonsensical names or values that are difficult to work around and must be cleaned up before the search functions can be fully operational. To date, the team behind the digital flora has parsed all of the descriptions in all of the published volumes of the FNA, but is still working to improve the “cleanliness” and organization of the data. A Canadian team is also in the process of building a dedicated online Flora of Canada, which Pender envisions as “an evolving, data mash-up product, integrating specimen data, occurrence data, parsed trait data from various sources.”
If you’d like to give the FNA Online beta a try, the site offers a guide for composing various queries and output types. Pender hopes a wide variety of users will experiment with it. “[W]e’d love user groups and use cases to emerge that we haven’t yet imagined.”