Digitizing Efforts to Date: Impetus for a Large Lepidopteran TCN

In the process of assessing TCNs focused on extant arthropods (Tri-Trophic, SCAN, & InvertNet), we reviewed all data for digitized arthropod specimens (1850-2014) in North American research collections [31]. Among the 225 museums that have significant arthropod holdings (>20,000 specimens), only 6 million specimen records of arthropods have been digitized and made publicly available (Figure 2). The current rate of digitization (i.e., transcribing specimen label data) with ADBC funding is about 2 million specimens per year. We estimate that 3-4 million specimens per year are being added to the existing 250 million specimens housed in North American collections. Thus, increased effort is needed to digitize 10% of existing research collections by 2050.

A TCN project based on the largest group of herbivores is an excellent model to lead this effort and address what is a grand challenge for museum digitization. There are 465,000 North American lepidopteran museum records publicly available, with specimens from the USA (62%) being the best represented, followed by Canada (25%) and Mexico (3%). UNAM, the largest collection for Mexico, has over 3 million curated, papered specimens being digitized and will soon release an additional 88,000 digitized records. In preparation for our proposed project, we have already assembled 287,364 lepidopteran records and 4,504 images through direct connections from museums to our data portal, the Symbiota Collections of Arthropods Network (SCAN). Additionally, all 29 LepNet museums have established collections within the SCAN database portal that will serve to house LepNet data.

An unusually large source of information that can complement these museum records in creating research data sets is already available for Lepidoptera: an estimated 600,000 observational records with vetted coordinate data can be integrated with LepNet specimen data for use in biodiversity research. In particular, the Butterfly and Moth Information Network (BAMIN) aggregates both butterfly and moth observation data [32, 33]. We are specifically integrating these data into the LepNet portal (see Section 7). The Moth Photographers Group (MPG) [34] is the primary resource for identification of North American moths and currently hosts images of 5,557 species, spread images of 9,862 species, larval images of 1,203 species, and more than 500 images of diagnostic genitalia. The Pacific Northwest Moths [35] serves as a standard for providing high-quality data products, including maps and background biology. The Lepidoptera Barcode of Life [36] provides DNA barcodes for >22,000 species and has an active North American campaign. The recently funded NSF DEB GoLife ButterflyNet project will yield an extensive worldwide phylogeny and centralize butterfly trait data through the nascent ButterflyNet dashboard. However, ButterflyNet will not digitize specimens in museums; LepNet will capture museum-specimen trait data that can be ingested into ButterflyNet, and vice versa. Although these initiatives and virtual environments are individually highly valuable and jointly reflect the importance of Lepidoptera, they critically lack integration and do not make vouchered species occurrence data fully accessible for research. Existing information from our large research collections needs to be synthesized and integrated in order to obtain broad-scale taxonomic and geographic coverage for this mega-diverse lineage and to facilitate comparative analyses based on a scalable network solution. This is the goal of LepNet.

Figure 2. Estimated number of specimens in North American museums. The blue and red columns represent all specimens and those collected in North America, respectively. Nested green columns represent Lepidoptera.

