Digitization & Imaging Processes and Products
Eighty-eight percent of our proposed budget is devoted to digitizing specimen labels and imaging specimens; the remaining 12% provides for logistical and technical support, informatics development, custom data sets, travel, education, and outreach. LepNet protocols are set and ready to be implemented. They extend existing, successful digitization pipelines and socio-cultural networks (e.g., SCAN, Tri-Trophic) and add protocols developed at leading institutions (e.g., Yale Peabody, Harvard MCZ). During the initial LepNet “all-hands” meeting, to be held at ASU in August 2016, we will finalize the implementation of these protocols to ensure that all collaborators can comply. We are prepared for technological advances that may emerge and increase our capacity to digitize and incorporate data into research-ready datasets (e.g., Beyond the Box II, OCR technology).
A. Digitizing Specimen Label Data
A.1 Digitizing Specimens de Novo. We will capture label data from 2,082,878 pinned adult specimens in 29 research collections (see Figure 1 for individual museum contributions), using trained students, collection staff, and supervised volunteers. Seven member collections will digitize 43,280 larval vials using a high-throughput workflow modified from scanning protocols developed by LepNet member Zaspel (Purdue). This will be the first concerted effort to digitize North American lepidopteran larvae, which are particularly valuable because 95% of larval records also have host plant data. Of the 3 million lepidopteran records currently in GBIF, only three are from larvae! Although most adult specimens are not reared from host plants, we expect them to provide an additional 40,000 host records.
We set a network-wide rate of $1.00 per specimen (direct cost) to capture all specimen label data, georeference records, and obtain >90% species-level identifications. This rate includes all training and supervisory costs, project-specific curation required during the project, and the cost of annotating records throughout the project period. The rate is based on data from two existing TCNs (SCAN, Tri-Trophic) with over three years of experience digitizing all arthropod taxa, as well as assessment data collected at Yale that included 263,000 Lepidoptera specimens digitized by scores of students over several years. For these projects, specimen labels were transcribed directly by students, volunteers, or technicians. An estimated 78% of LepNet specimens are identified to species, yet <10% of labels have latitude-longitude coordinates. Efficient georeferencing workflows are therefore a priority. Georeferencing in LepNet will follow best practices developed through iDigBio. Most georeferencing will be performed online through Symbiota-GeoLocate workflows. We have protocols for using the georeferenced records in iDigBio, greatly enhancing our ability to assign coordinates for even the most obscure locations.
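As a rough check on the scale these figures imply, the sketch below multiplies the de novo pinned specimen count from A.1 by the $1.00 direct cost rate and estimates a lower bound on the number of records that will need coordinates, given that fewer than 10% of labels carry them. Applying the rate to pinned adults only is an assumption made for illustration, and the function and constant names are hypothetical.

```python
# Back-of-envelope estimate using figures stated in the text.
# Assumption: the $1.00 rate is applied to pinned adult specimens only.
PINNED_ADULTS = 2_082_878        # de novo pinned adult specimens (A.1)
COST_PER_SPECIMEN_CENTS = 100    # $1.00 direct cost per specimen
MAX_PCT_WITH_COORDS = 10         # fewer than 10% of labels have coordinates


def direct_cost_dollars(specimens: int, cents_per_specimen: int) -> float:
    """Total direct cost in dollars for a given specimen count."""
    return specimens * cents_per_specimen / 100


def min_records_to_georeference(specimens: int, max_pct_with_coords: int) -> int:
    """Lower bound on records lacking coordinates (integer arithmetic)."""
    return specimens * (100 - max_pct_with_coords) // 100


if __name__ == "__main__":
    print(direct_cost_dollars(PINNED_ADULTS, COST_PER_SPECIMEN_CENTS))
    print(min_records_to_georeference(PINNED_ADULTS, MAX_PCT_WITH_COORDS))
```

Under these assumptions the direct cost is about $2.08 million, and at least roughly 1.87 million records will need coordinates assigned.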
Certain collections image specimen labels as part of their digitization process, with transcription driven partly or wholly by external volunteers. While Symbiota fully supports this workflow from skeletal records to crowd-sourced annotations, analyses of the SCAN and Tri-Trophic TCN projects indicate that the cost of digitization doubled when label imaging was included. OCR for insect labels is improving but remains highly challenging without human intervention because of the small size of labels, the extensive use of context-contingent abbreviations, and in particular the handwritten format of many labels. Human error can occur in both direct transcription and transcription from label images, and in each case specimens can be re-examined when data validity concerns arise. LepNet museums can use either method provided they meet their respective goals and cost rate.
A.2 Previously Digitized Specimens. Nine LepNet collections have been integrating >287,000 fully digitized Lepidoptera data records into the SCAN portal. Two collections (Oklahoma, NAU) will contribute 73,000 pinned specimen records to LepNet from existing spreadsheets; these records need only be georeferenced. Through separate funding, we are also coordinating with butterfliesandmoths., which operates an online portal for Lepidoptera observation projects. These databases will initially provide 447,696 vetted observational records to the LepNet portal, and we expect an additional 100,000 records during the project period. These data will augment museum data for developing ecological filters in LepSnap (see below) and for ecological niche modeling (Section 8).
A.3 Taxonomic Thesaurus. A LepNet working group will oversee the maintenance of the primary reference classification for Lepidoptera. We will follow the system of Hodges et al. (1983), as modified by Brown, Lafontaine & Schmidt, Lee et al., Warren et al. (2014) [49-54], and others, with necessary adjustments that include valid and synonymous names. Symbiota can represent alternative classifications by assigning a name more than one valid/invalid status.
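The idea of one name carrying different statuses under different classifications can be sketched as a small lookup structure. This is an illustrative sketch only, not Symbiota's actual schema or API: the class names, the classification labels ("primary", "alternative"), and the placeholder names "Aus bus" and "Aus cus" are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class NameUsage:
    """One scientific name and its status under each classification."""
    name: str
    # classification label -> accepted name under that classification;
    # the name is valid in a classification when it maps to itself.
    accepted_in: Dict[str, str] = field(default_factory=dict)


class Thesaurus:
    """Hypothetical thesaurus: resolves a name within a chosen classification."""

    def __init__(self) -> None:
        self._names: Dict[str, NameUsage] = {}

    def add(self, name: str, classification: str, accepted: str) -> None:
        """Record which name is accepted for `name` in `classification`."""
        usage = self._names.setdefault(name, NameUsage(name))
        usage.accepted_in[classification] = accepted

    def resolve(self, name: str, classification: str) -> Optional[str]:
        """Return the accepted name, or None if the name is not recorded."""
        usage = self._names.get(name)
        if usage is None:
            return None
        return usage.accepted_in.get(classification)


# Placeholder names: "Aus cus" is a synonym of "Aus bus" in the primary
# classification but is treated as valid in an alternative one.
thesaurus = Thesaurus()
thesaurus.add("Aus bus", "primary", "Aus bus")
thesaurus.add("Aus cus", "primary", "Aus bus")
thesaurus.add("Aus cus", "alternative", "Aus cus")
```

With this structure, a specimen record identified as "Aus cus" can be displayed under either classification without altering the identification stored on the record.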
Lepidoptera are very well suited for imaging because wing shape and coloration provide much of what is needed for identification. We will produce a total of 255,000 specimen images (see Figure 1 for individual museum contributions). Nineteen institutions will produce 95,170 high-resolution, species-exemplar image suites (male and female, ventral and dorsal) using imaging systems already present at each collection, except that McGuire Center and CSU will purchase imaging systems. Members of LepNet have agreed upon 15 imaging standards (Table 1). Collections will load images into the LepNet portal via the individual and bulk upload features of Symbiota. We will image exemplar specimens for all species digitized. These whole-body images will have sufficient resolution to characterize the morphology of individual scales. Our high-resolution imaging systems can make automontage images of important specimens (e.g., types and specimens of historical value, from key localities, or of rare taxa) that are not in perfect condition and require dedicated high-end imaging systems. These systems will also facilitate detailed images for select taxa that need images of non-wing structures (e.g., tympana, legs, genitalia) to create comprehensive identification guides and publications. The LepNet portal will maintain a master taxon list that notes key metrics of portal images (e.g., institution, locations, sex) to promote collaborative efforts across institutions. All images will be used for automated taxon identification (LepSnap), geographic documentation for research, and outreach.
Table 1. Imaging Protocol (The 15 standards adopted by LepNet)
1. A dorsal and ventral image of each specimen is required for some Lepidoptera. See Guide to knowing when to image the ventral surface of Lepidopteran wings
2. The specimen should fill most of the image frame
3. Only a dorsal image is required for smaller micros (e.g., Gelechiidae)
4. Dorsal and ventral views can be separated or combined in a composite image
5. The dorsal view is above the ventral view in composite images
6. Calibration for size is required in each image (e.g., insert scale bar)
7. Calibration for color is NOT required, but an 18% gray card for white balance correction is required. You may also include a color chart card in each image (e.g., insert X-rite card)
8. Image background should be a neutral gray
9. Use indirect and/or diffused lighting to reduce shadow effects
10. Images should be at least 6000 pixels on the long dimension
11. Save the image in TIF format and create a JPG version for the web
12. A fully populated database record is required for each specimen image and a unique identifier (e.g., catalog number) should be in the image
13. Images of the labels for a specimen are not required, but can be incorporated
14. Only some specimens require image stacking (e.g., wing profile other than flat)
15. EXIF metadata includes unique identifier, institution, and other basic data
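Several of the standards in Table 1 are mechanically checkable before upload. The sketch below shows one way a collection might screen an image against standards 6, 7, 10, 11, 12, and 15; it is a minimal illustration, not part of the LepNet or Symbiota tooling. The record fields, function name, and example file names are hypothetical, and the pixel dimensions and calibration flags are assumed to be supplied by the imaging workflow rather than read from the file.

```python
from dataclasses import dataclass
from pathlib import PurePath
from typing import List

MIN_LONG_EDGE_PX = 6000  # standard 10


@dataclass
class ImageSubmission:
    """Hypothetical summary of one specimen image prepared for upload."""
    width_px: int
    height_px: int
    master_file: str      # archival file name, expected TIF (standard 11)
    web_file: str         # web derivative, expected JPG (standard 11)
    catalog_number: str   # unique identifier (standards 12 and 15)
    has_scale_bar: bool   # size calibration present (standard 6)
    has_gray_card: bool   # 18% gray card present (standard 7)


def check_submission(img: ImageSubmission) -> List[str]:
    """Return a list of problems; an empty list means no problems found."""
    problems: List[str] = []
    if max(img.width_px, img.height_px) < MIN_LONG_EDGE_PX:
        problems.append("long dimension below 6000 px (standard 10)")
    if PurePath(img.master_file).suffix.lower() not in {".tif", ".tiff"}:
        problems.append("master file is not TIF (standard 11)")
    if PurePath(img.web_file).suffix.lower() not in {".jpg", ".jpeg"}:
        problems.append("web derivative is not JPG (standard 11)")
    if not img.catalog_number.strip():
        problems.append("missing unique identifier (standards 12, 15)")
    if not img.has_scale_bar:
        problems.append("no size calibration (standard 6)")
    if not img.has_gray_card:
        problems.append("no 18% gray card (standard 7)")
    return problems


if __name__ == "__main__":
    # Hypothetical example: a compliant dorsal image.
    example = ImageSubmission(6000, 4000, "ASU0001_d.tif", "ASU0001_d.jpg",
                              "ASU0001", True, True)
    print(check_submission(example))
```

Standards that depend on visual judgment (framing, background, lighting, stacking) are left to the imaging technician and are not covered by this check.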
31. Cobb, N., K.C. Seltmann, and N.M. Franz, The current state of arthropod biodiversity data: Addressing impacts of global change. 2014.
47. AIBS, Beyond the Box Digitization Competition. 2014.
48. Hodges, R.W., et al., Check list of the Lepidoptera of America North of Mexico, including Greenland. 1983: E.W. Classey Ltd.
49. Brown, J.W., World catalogue of insects. Volume 5: Tortricidae (Lepidoptera). 2005: Apollo Books. 741 p.
50. Lafontaine, J.D. and B.C. Schmidt, Annotated check list of the Noctuoidea (Insecta, Lepidoptera) of North America north of Mexico. ZooKeys, 2010. 40: p. 1-239.
51. Lafontaine, J.D. and B.C. Schmidt, Additions and corrections to the check list of the Noctuoidea (Insecta, Lepidoptera) of North America north of Mexico. ZooKeys, 2011. 149: p. 145-161.
52. Lafontaine, J.D. and B.C. Schmidt, Additions and corrections to the check list of the Noctuoidea (Insecta, Lepidoptera) of North America north of Mexico. ZooKeys, 2013. 264: p. 227.
53. Lee, S., R.W. Hodges, and R.L. Brown, Checklist of Gelechiidae (Lepidoptera) in America north of Mexico. Zootaxa, 2009: p. 1-39.
54. Warren, A.D., et al., Interactive listing of American butterflies. 2014.
55. McMillan, W.O., A. Monteiro, and D.D. Kapan, Development and evolution on the wing. Trends in Ecology & Evolution, 2002. 17(3): p. 125-133.