Everyone knows that it is important to have the right tools for the job, whether collecting plants in the field for scientific research or cooking them in the kitchen for dinner. The same is true for digitisation: converting physical objects into data and images that can be stored digitally. In recent years, there has been a push for natural history and museum collections to be digitised. Biological museum collections hold organisms, some centuries old, gathered by explorers from around the world. While some of these curiosities may be displayed for the public in exhibitions, the vast majority wait behind the scenes for their moment in the spotlight.
Digitisation offers an opportunity to thoroughly catalogue each artefact along with any information that comes with it, such as labels relating to its origin and identity. Digitally documenting a collection of physical objects in this way is a safeguard against catastrophic events, such as the fire that gutted Brazil’s National Museum in 2018, or against objects going missing, as the British Museum experienced in 2023. Aside from the collection oversight this allows, digitisation makes collections accessible to those who cannot travel to see them in person, and it creates new opportunities for research and art on a scale that would be impractical without a repository of digital data and images. So how does one go about digitising a botanical collection?
Botany One spoke to Dr Heather Cole, Biodiversity Data Manager for Agriculture and Agri-Food Canada’s (AAFC) Biological Collections. Cole is actively involved in promoting and supporting the digitisation of AAFC’s collections and bio-resources, including developing and implementing standards for image and data capture, storage, and sharing. During the six-year Biological Collections and Data Mobilization project, Cole and her team photographed up to 2,000 herbarium specimens per day, amounting to over 600,000 specimens imaged during the course of the project.
Digitisation often starts with a deceptively simple question: how do you capture the specimen in the first place? What equipment and software did you use for digitisation?
I consider digitisation in two broad categories: imaging and data transcription. At DAO (AAFC’s National Collection of Vascular Plants), we use two different setups for imaging herbarium specimens. One is a high-throughput conveyor belt system made by Bioshare Digitization, which includes integrated software to automate the conveyor and camera. It uses a 50mm macro lens with either a Canon EOS 5DS R or a Nikon Z7 II camera.
The high-throughput conveyor belt system used by AAFC’s National Collection of Vascular Plants
The other process, based on the New York Botanical Garden’s model, is a manual “lightbox” approach: a camera shoots through a hole in the top of a table-top box, which illuminates the specimen from above and on three sides. For the lightboxes, we use Canon EOS 5D Mark II/III cameras with a 50mm macro lens mounted on a copy stand. Similar setups can be replicated with good surround lighting. Currently, the DAO herbarium uses Specify as its collection management system. Whenever possible, specimen data are transcribed directly into the database; where that is not possible, we use spreadsheets that follow the Darwin Core standard for data structure.
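For readers unfamiliar with Darwin Core, the sketch below shows roughly what a spreadsheet structured this way might look like, written with Python’s standard csv module. The column headers are genuine Darwin Core terms, but the selection is a small subset chosen for illustration, and every value in the example record (including the institution and catalogue codes) is invented; this is not AAFC’s actual template.

```python
import csv
import io

# A small, illustrative subset of Darwin Core term names used as column headers.
DWC_COLUMNS = [
    "institutionCode",
    "collectionCode",
    "catalogNumber",
    "basisOfRecord",
    "scientificName",
    "recordedBy",
    "eventDate",
    "country",
    "stateProvince",
    "locality",
    "decimalLatitude",
    "decimalLongitude",
]


def write_dwc_rows(records):
    """Serialise specimen records (dicts keyed by Darwin Core terms) to CSV text."""
    buffer = io.StringIO()
    # extrasaction="raise" rejects any key that is not one of the agreed terms;
    # terms missing from a record are written as empty cells.
    writer = csv.DictWriter(buffer, fieldnames=DWC_COLUMNS, extrasaction="raise")
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()


# Hypothetical example record: all values are invented for illustration.
example = {
    "institutionCode": "AAFC",
    "collectionCode": "DAO",
    "catalogNumber": "EXAMPLE-000001",
    "basisOfRecord": "PreservedSpecimen",
    "scientificName": "Trillium grandiflorum",
    "recordedBy": "A. Collector",
    "eventDate": "1923-05-17",
    "country": "Canada",
    "stateProvince": "Ontario",
    "locality": "Example locality",
    "decimalLatitude": "45.38",
    "decimalLongitude": "-75.70",
}

csv_text = write_dwc_rows([example])
print(csv_text)
```

Using the standard term names as headers, rather than ad hoc column labels, is what makes such a file straightforward to map onto a database later.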

But taking the image is only the beginning. Once specimens have been photographed or scanned, the real value of the digital collection depends on how the data are structured, checked, linked and made reusable. How did you manage the data, including metadata, and quality assurance?
These aspects of digitisation are challenging. They take tools, expertise and time, which are not always available, and it can be hard to strike a balance between making use of existing resources and working towards a future or ideal state. For our imaging processes, we do a colour calibration at each workstation to ensure that specimen colours are as close to realistic as possible, but to streamline our workflows, we do not adjust the camera settings for individual specimens. A visual review of each image can help detect quality issues. We embed AAFC ownership, copyright, and licensing information into the metadata of each image file, which also records the camera settings used for the picture, the date, and the name of the person logged onto the machine.
For data transcription, we provide training on how to interpret label data, and senior technicians perform spot-checks. When importing data from spreadsheets into our database, quality checks include reviewing the spelling of taxonomic and geographical names, as well as the formatting of dates and geographical coordinates. Alongside the specimen data, we record the name of the person who transcribed it, the date of transcription, and the relevant protocol or project. In our current citizen science workflows, three different volunteers transcribe the information from each specimen, which significantly improves confidence in the data quality.
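The spreadsheet checks described here, and the idea of combining three independent transcriptions, can be sketched in a few lines of Python. This is a minimal illustration under several assumptions: the short ACCEPTED_NAMES list stands in for a proper taxonomic reference, check_record and consensus are hypothetical helper names, and the strict-majority rule is just one plausible way to reconcile transcriptions. The interview does not say how AAFC actually combines them.

```python
from collections import Counter
from datetime import date
import difflib

# Hypothetical reference list; a real workflow would check names against an
# authoritative taxonomic backbone or gazetteer instead.
ACCEPTED_NAMES = ["Trillium grandiflorum", "Trillium erectum", "Acer saccharum"]


def check_record(record):
    """Return a list of problems found in one transcribed record (empty if none)."""
    problems = []

    # Dates: expect an ISO 8601 calendar date such as 1923-05-17.
    try:
        date.fromisoformat(record.get("eventDate"))
    except (TypeError, ValueError):
        problems.append(f"eventDate is not an ISO 8601 date: {record.get('eventDate')!r}")

    # Coordinates: decimal degrees within the valid ranges.
    for term, limit in (("decimalLatitude", 90), ("decimalLongitude", 180)):
        try:
            value = float(record.get(term))
        except (TypeError, ValueError):
            problems.append(f"{term} is missing or not a number")
            continue
        if not -limit <= value <= limit:
            problems.append(f"{term} is out of range: {value}")

    # Taxonomy: flag unrecognised names and suggest the closest accepted spelling.
    name = record.get("scientificName") or ""
    if name not in ACCEPTED_NAMES:
        matches = difflib.get_close_matches(name, ACCEPTED_NAMES, n=1)
        hint = f" (did you mean {matches[0]!r}?)" if matches else ""
        problems.append(f"unrecognised scientificName {name!r}{hint}")

    return problems


def consensus(transcriptions):
    """Merge independent transcriptions (dicts of strings) of one specimen.

    A value is accepted only if a strict majority of transcribers agree on it;
    otherwise the field is set to None so that a curator can review it.
    """
    merged = {}
    for field in sorted(set().union(*transcriptions)):
        votes = Counter(t.get(field, "").strip() for t in transcriptions)
        value, count = votes.most_common(1)[0]
        merged[field] = value if count > len(transcriptions) / 2 else None
    return merged


# Invented example: three volunteers transcribe the same label.
volunteers = [
    {"scientificName": "Trillium grandiflorum", "eventDate": "1923-05-17"},
    {"scientificName": "Trillium grandiflorum", "eventDate": "1923-05-11"},
    {"scientificName": "Trilium grandiflorum", "eventDate": "1923-05-17"},
]
merged = consensus(volunteers)
print(merged)  # {'eventDate': '1923-05-17', 'scientificName': 'Trillium grandiflorum'}
print(check_record(merged))  # coordinates were not transcribed, so two problems are listed
```

Here each volunteer made a different slip, but because no two made the same one, the majority vote recovers a clean record; fields where all three disagree are left empty for expert review rather than guessed.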

Fluent in both English and French, Cole is enthusiastic about advancing data-driven decision-making and fostering a culture of innovation within her group. What is one thing you wish you had known before you started digitising your collection?
Don’t use spreadsheets for data transcription! It may seem like the simplest solution, but unless you have dedicated people and software for reviewing, cleaning, and importing the data into a database, it quickly becomes a problem. There are many free collection management systems available, and even a general-purpose database can often be configured to suit your needs. Even though choosing, setting up, and learning to use a proper system may take longer initially, the benefits are enormous. Purpose-built databases or collection management systems offer more reliable, consistent, and scalable data management functionality, making it easier to maintain and access your data as your digitisation efforts grow.
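One reason a database helps is that it can refuse malformed values at the point of entry, whereas a spreadsheet usually accepts whatever is typed. The sketch below illustrates that idea with SQLite through Python’s built-in sqlite3 module, standing in for the “general-purpose database” mentioned above. It is not Specify, and the table layout, constraints and add_specimen helper are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for "a general-purpose database". The table and
# column names are hypothetical, loosely modelled on Darwin Core terms.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE specimen (
        catalogNumber    TEXT PRIMARY KEY NOT NULL,
        scientificName   TEXT NOT NULL,
        -- Format check only (YYYY-MM-DD); NULL is allowed, and impossible
        -- dates such as 1923-02-30 would still pass.
        eventDate        TEXT CHECK (eventDate GLOB
                             '[0-9][0-9][0-9][0-9]-[0-1][0-9]-[0-3][0-9]'),
        decimalLatitude  REAL CHECK (decimalLatitude BETWEEN -90 AND 90),
        decimalLongitude REAL CHECK (decimalLongitude BETWEEN -180 AND 180)
    )
    """
)


def add_specimen(row):
    """Insert one record; return None on success, or the database's error message."""
    try:
        with conn:  # commits on success, rolls back if the insert fails
            conn.execute(
                "INSERT INTO specimen VALUES (:catalogNumber, :scientificName, "
                ":eventDate, :decimalLatitude, :decimalLongitude)",
                row,
            )
    except sqlite3.IntegrityError as err:
        return str(err)
    return None


# Invented example rows: the second has the kind of date a spreadsheet accepts silently.
valid = {"catalogNumber": "EX-0001", "scientificName": "Trillium grandiflorum",
         "eventDate": "1923-05-17", "decimalLatitude": 45.4, "decimalLongitude": -75.7}
invalid = dict(valid, catalogNumber="EX-0002", eventDate="17/05/1923")

print(add_specimen(valid))    # None: accepted
print(add_specimen(invalid))  # a CHECK constraint error message: rejected
```

A duplicate catalogue number is refused in the same way by the primary key, which is exactly the kind of silent inconsistency that tends to accumulate in spreadsheets.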

With all this experience to hand, Cole advocates for best practices for digital data management and has implemented successful citizen science projects which continue to enhance access to AAFC’s biodiversity information. Her efforts in managing copyright and licensing considerations have further advanced responsible stewardship of AAFC’s data assets. Furthermore, her commitment to Open Data and Open Science initiatives reflects her dedication to transparency and collaboration in scientific research.
Cole’s advice brings digitisation back to its foundations: good images matter, but good systems matter just as much. A successful digitisation project needs equipment that fits the collection, data standards that make records reusable, and workflows that can keep growing without collapsing into a maze of spreadsheets. Done well, digitisation is not just about photographing specimens. It is about making collections easier to manage, share, study and protect for the long term.
Guest Writer Profile
Magda (she/her) is a British/Polish ecologist currently based in London, UK. Trained as a field ecologist, she pivoted to herbarium digitisation and curation in 2023. This has taken her from the Royal Botanic Gardens, Kew to the South London Botanical Institute, and she will be joining Trinity College Dublin this autumn.
Cover image: A specimen of Dombeya burgessiae from Kew’s herbarium that underwent digitisation (http://specimens.kew.org/herbarium/K004979670)
