Data Management Plan

Symbiota Application: LepNet data will be aggregated for dissemination using an instance of the Symbiota database and portal software (http://symbiota.org). Symbiota is currently the most widely used database platform among the Thematic Collections Networks (TCNs) and provides many tools for immediate data use and analysis. Like the rest of the Symbiota codebase, our modifications will be made available for community reuse on GitHub (the code is currently hosted on SourceForge).

LepSnap Application: The LepSnap project will collaborate with the Visipedia project to produce a web interface extension for Symbiota and mobile apps for Android and iPhone. These will interface with the Visipedia algorithms to identify unidentified material and to expand the Visipedia reference library, further building its identification capabilities. The LepSnap apps will facilitate rapid production of high-quality images and efficient delivery of those images, together with their associated Audubon Core metadata, to the LepNet portal and the Visipedia reference library. As an extension to the Symbiota portal software, the web interface will be available to a large number of museums and projects, allowing the wider museum community to adopt this workflow.

Principal Data Types: The principal data types collected by content providers will be occurrence records (from both observations and museum specimens) and images of museum specimens. Observation data harvested from aggregators will be stored in Symbiota as observation records.

Data Standards: The Symbiota model for occurrence records is well tested for both observation and specimen data. Metadata follow current best practices, using Darwin Core (http://terms.tdwg.org/wiki/Darwin_Core) for occurrence metadata and Audubon Core (http://terms.tdwg.org/wiki/Audubon_Core) for image metadata. LepNet will provide data using an extended version of the Darwin Core Archive format, intended to include detailed information on specimen interactions (see Project Description, Section 9 for details).
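To make the two principal record types concrete, the following Python sketch shows how a specimen record and an observation record might be expressed with Darwin Core terms, distinguished by the `basisOfRecord` term, and checked for a few core fields. The term names (`occurrenceID`, `basisOfRecord`, `scientificName`, `catalogNumber`, `eventDate`) are standard Darwin Core terms; the choice of "required" terms, the `validate_occurrence` helper, and the example values are hypothetical and do not reproduce Symbiota's own validation.

```python
from typing import Dict, List

# A subset of the Darwin Core basisOfRecord vocabulary covering LepNet's
# two principal record types (museum specimens and observations).
BASIS_OF_RECORD = {"PreservedSpecimen", "HumanObservation", "MachineObservation"}

# Assumption for this sketch: the minimal set of terms treated as mandatory.
REQUIRED_TERMS = ("occurrenceID", "basisOfRecord", "scientificName")


def validate_occurrence(record: Dict[str, str]) -> List[str]:
    """Return a list of problems found in one Darwin Core occurrence record."""
    problems = [f"missing {term}" for term in REQUIRED_TERMS if not record.get(term)]
    basis = record.get("basisOfRecord")
    if basis and basis not in BASIS_OF_RECORD:
        problems.append(f"unexpected basisOfRecord: {basis}")
    return problems


# Hypothetical records: identifiers are placeholders, not real catalog numbers.
specimen = {
    "occurrenceID": "urn:example:specimen:0001",
    "basisOfRecord": "PreservedSpecimen",
    "scientificName": "Danaus plexippus",
    "catalogNumber": "EXAMPLE-0001",
}
observation = {
    "occurrenceID": "urn:example:observation:0001",
    "basisOfRecord": "HumanObservation",
    "scientificName": "Danaus plexippus",
    "eventDate": "2015-07-04",
}

for rec in (specimen, observation):
    # An empty problem list means the record passed these illustrative checks.
    print(rec["occurrenceID"], validate_occurrence(rec) or "ok")
```

Because both record types share one set of terms, observations harvested from aggregators and digitized specimens can sit in the same table and be told apart by `basisOfRecord` alone.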

Data Aggregation: We will work with content providers individually to integrate the new metadata methods at their home institutions, and to encourage adoption of the semantically enabled specimen interaction relationships.

Physical and Cyber Resources: In most cases, LepNet (via SCAN) will serve as the primary repository of LepNet data. Some home institutions that use a database other than Symbiota will remain the primary repository of their own data; for these collections, LepNet will serve as a mirror and aggregator of the data, as well as a resource for dissemination and metadata augmentation. LepNet will be a customized view of thematically relevant data (Lepidoptera) stored within the SCAN database (http://symbiota4.acis.ufl.edu/scan/portal/index.php). LepNet/SCAN is an instance of Symbiota hosted at the University of Florida by iDigBio, and the University of Florida/iDigBio has committed to supporting Symbiota, SCAN, and LepNet. LepNet will act as a gateway for sharing data with iDigBio and EOL (Encyclopedia of Life). Northern Arizona University will provide access to a biodiversity server and cluster computing for researchers who want to explore ecological niche modeling and do not have access to such resources elsewhere.

Dissemination Methods: Data and images will be made available as a publicly available Darwin Core Archive (DwC-A) and as downloadable dataset files (CSV). All LepNet participants have established collections in the data portal (SCAN). Fifteen museums will enter data directly into SCAN; ten museums will initially use other databases, batch upload their data into the SCAN portal, and make complete copies of their data available to aggregators (e.g., iDigBio) through DwC-A. Four institutions with IPTs (Integrated Publishing Toolkit installations) will serve data to SCAN and directly to aggregators via their IPT. By default, Symbiota redacts locality details from public access and publishing, although collection managers and other approved users retain full access to this information while logged in to the SCAN portal.
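As an illustration of the archive format, the sketch below packages occurrence records as a minimal Darwin Core Archive: a zip file containing a `meta.xml` descriptor and a tab-delimited `occurrence.txt` core file. Before export it blanks locality fields and records this in the Darwin Core `informationWithheld` term. The column list, the redaction rule, and the helper names (`redact`, `clean`, `meta_xml`, `build_archive`) are assumptions for illustration only; Symbiota's actual export and redaction logic is configured within the portal and is not reproduced here. A production archive would normally also include an `eml.xml` dataset metadata document.

```python
import io
import zipfile
from typing import Dict, Iterable

DWC = "http://rs.tdwg.org/dwc/terms/"

# Assumed column order for the core file; column 0 is the record identifier.
COLUMNS = [
    "occurrenceID", "basisOfRecord", "scientificName",
    "locality", "decimalLatitude", "decimalLongitude", "informationWithheld",
]
# Hypothetical redaction rule: these fields are blanked in public exports.
REDACTED = ("locality", "decimalLatitude", "decimalLongitude")


def redact(record: Dict[str, str]) -> Dict[str, str]:
    """Return a public copy of a record with locality details removed."""
    public = dict(record)
    if any(public.get(field) for field in REDACTED):
        for field in REDACTED:
            public[field] = ""
        public["informationWithheld"] = "locality details withheld"
    return public


def clean(value: str) -> str:
    # The descriptor declares no field enclosure, so values must not contain tabs or newlines.
    return str(value).replace("\t", " ").replace("\r", " ").replace("\n", " ")


def meta_xml() -> str:
    """Build a minimal meta.xml that describes occurrence.txt as the core file."""
    fields = "\n".join(
        f'    <field index="{i}" term="{DWC}{name}"/>' for i, name in enumerate(COLUMNS)
    )
    return (
        '<archive xmlns="http://rs.tdwg.org/dwc/text/">\n'
        '  <core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"'
        f' fieldsEnclosedBy="" ignoreHeaderLines="1" rowType="{DWC}Occurrence">\n'
        "    <files><location>occurrence.txt</location></files>\n"
        '    <id index="0"/>\n'
        f"{fields}\n"
        "  </core>\n"
        "</archive>\n"
    )


def build_archive(records: Iterable[Dict[str, str]]) -> bytes:
    """Return the bytes of a zip file containing meta.xml and occurrence.txt."""
    lines = ["\t".join(COLUMNS)]  # header row, skipped by readers via ignoreHeaderLines="1"
    for record in records:
        public = redact(record)
        lines.append("\t".join(clean(public.get(col, "")) for col in COLUMNS))
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("meta.xml", meta_xml())
        archive.writestr("occurrence.txt", "\n".join(lines) + "\n")
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical record; all values are placeholders.
    data = build_archive([{
        "occurrenceID": "urn:example:specimen:0001",
        "basisOfRecord": "PreservedSpecimen",
        "scientificName": "Danaus plexippus",
        "locality": "Example locality",
        "decimalLatitude": "35.0",
        "decimalLongitude": "-111.0",
    }])
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        print(sorted(archive.namelist()))  # ['meta.xml', 'occurrence.txt']
```

Keeping redaction as a separate step before serialization means the full record can stay intact in the portal for logged-in collection managers, while every public export passes through the same filter.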

User Management, Access Control: The central LepNet repository will use the user management and access control already built into Symbiota. This will ensure that only approved users can modify database records, and Symbiota also keeps track of editorial changes. Monitoring of imaging and data processing activities at each collaborating collection will be the responsibility of that institution, under the supervision of the project data manager. We envision that most editing of specimen records will be done through the LepNet repository; however, some collections may choose to perform data management within their local database tools.

Data Use Tracking: Data-use tracking will be provided by the Symbiota web portal’s Google Analytics extensions. Statistics for each collection will show the number of searches against its records as well as the number of downloads. General website access will also be tracked using Google Analytics.

Data Quality Control and Assurance: Quality Assurance (QA) and Quality Control (QC) will be implemented at all nodes in as standard a way as possible, with the understanding that more than one method can achieve the desired results. Data QA/QC will occur at three levels: the local repository (various database tools), the LepNet repository (Symbiota), and Filtered Push (http://wiki.filteredpush.org/wiki/). Symbiota’s controlled taxonomy tables and its automatic population of data entry forms with people, entities, and locations already in the database will be used to reduce errors in these fields. Once an entity, person, or location has been added to the database and approved by a collection’s manager, it can be offered to data editors as a pull-down selection, as can taxon information and localities. The LepNet project website (http://scan-all-bugs.org/) will host how-to guides, recorded webinars, and network policy papers.
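The pick-list approach can be illustrated with a small Python sketch: a name typed by a data editor is accepted only if it exactly matches an entry in a controlled taxonomy table; otherwise the closest approved names are offered instead. The in-memory `APPROVED_TAXA` set and the `check_taxon` helper are hypothetical stand-ins for Symbiota's database-backed taxonomy tables, and the fuzzy suggestions use Python's standard `difflib.get_close_matches`, a technique chosen here for illustration rather than one the portal is known to use.

```python
from difflib import get_close_matches
from typing import List, Optional, Set, Tuple

# Hypothetical controlled taxonomy table (in Symbiota this would be a database table).
APPROVED_TAXA: Set[str] = {"Danaus plexippus", "Papilio glaucus", "Manduca sexta"}


def check_taxon(name: str, approved: Set[str] = APPROVED_TAXA) -> Tuple[Optional[str], List[str]]:
    """Return (accepted_name, []) on an exact match, else (None, suggestions)."""
    # Collapse stray whitespace so "Danaus  plexippus" still matches.
    cleaned = " ".join(name.split())
    if cleaned in approved:
        return cleaned, []
    # Offer up to three approved names that closely resemble the input.
    return None, get_close_matches(cleaned, sorted(approved), n=3, cutoff=0.8)


print(check_taxon("Danaus plexipus"))  # (None, ['Danaus plexippus'])
```

Rejecting free text and returning suggestions mirrors the pull-down behavior described above: the editor can only commit a value that already exists in the approved list.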

Data Access and Usage Rights: By default, the Creative Commons CC0 1.0 “No Rights Reserved” Public Domain Dedication (http://creativecommons.org/publicdomain/) is applied to all LepNet data types (records, datasets, and images) shared through the portal. Some collections may choose more restrictive licenses, but all collections agree to make their data publicly available for non-profit use under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 license (CC BY-NC-SA 4.0; http://creativecommons.org/licenses/by-nc-sa/4.0/) or a less restrictive one. Every effort will be made to retain institution, project, and individual attribution information with these records. All LepNet collaborators understand and accept this data policy. LepNet database records are available for public download, and the new Symbiota Custom Research Datasets and Data Blitz are available for public use once a login has been created.

Roles and Responsibilities: Cobb and Brandt are jointly responsible for communicating with contributing institutions and iDigBio. They are also responsible for maintaining appropriate taxonomy, training data entry personnel from all institutions to use the new interfaces effectively, resolving taxonomic conflicts, monitoring the general quality of the records, and reformatting legacy specimen interaction data to fit the new model. Caitlin Chapman and Anne Barber will assist Brandt with Symbiota modifications.

Contingency Plan: This is an ongoing digitization effort, with dedicated institutional support at Northern Arizona University (NAU) and Arizona State University (ASU). Both institutions are committed to providing support for SCAN and LepNet.
