Five years ago, researchers at the Johns Hopkins University Applied Physics Laboratory (APL), in Laurel, Maryland, created the Brain Observatory Storage Service & Database (BossDB) — a scalable, cloud-native data ecosystem for high-resolution volumetric neuroscience datasets — with an eye towards making data open, accessible and easy to use.
Now, an expanded collaboration between BossDB and the Amazon Web Services (AWS) Open Data Sponsorship Program will further enable the storage and accessibility of ever-expanding neuroimaging datasets generated by the neuroscience research community.
“This relationship will catalyze the democratization of data access and accelerate scientific exploration by researchers and members of the public,” said Will Gray-Roncal, a co-investigator in APL’s Research and Exploratory Development Department.
Sandy Hider, the lead APL developer for the BossDB team, agreed. “With this collaboration, APL hopes to provide an avenue for more individuals and organizations to participate in creative research in neuroscience, with potential downstream benefits to us all.”
BossDB was initially developed to facilitate data sharing as part of the Machine Intelligence from Cortical Networks (MICrONS) program, funded by the Intelligence Advanced Research Projects Activity (IARPA) to reverse-engineer brain algorithms. The team is currently supported by the National Institutes of Health (NIH) as part of its BRAIN Initiative Informatics Program, created to build communities and infrastructure around shared data.
The BossDB ecosystem was designed with scale in mind to support increasingly large contiguous electron microscopy (EM) datasets. It lives in the AWS ecosystem and uses numerous AWS resources and serverless components, such as S3, DynamoDB, Lambda, and SQS, that enable high ingest speeds. It also offers a variety of on-demand data access tools to support visualization, image processing, annotation, and analysis.
Integration with community tools and resources has been a key enabler for data sharing and follow-on discoveries. To facilitate collaboration, BossDB provides a scalable application programming interface (API) and a Python-based software development kit (SDK) called intern. It also offers data visualization through tools like Neuroglancer and syGlass, and leverages the Scalable Analytics for Brain Exploration Research (SABER) analytics platform for image processing and annotation.
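As a rough illustration of how the intern SDK is typically used, the sketch below parses a BossDB address into its collection, experiment, and channel parts and then requests a small cutout through intern's `array` interface. The helper names `parse_bossdb_uri` and `fetch_cutout` are hypothetical, and the example path (`Dataset_1/em` under the `witvliet2020` project) is an assumption rather than something stated in this article. The download step needs the `intern` package and network access, so it is kept in a separate function that is not called here.

```python
from typing import NamedTuple


class BossPath(NamedTuple):
    """The three levels of a BossDB address."""
    collection: str
    experiment: str
    channel: str


def parse_bossdb_uri(uri: str) -> BossPath:
    """Split bossdb://<collection>/<experiment>/<channel> into its parts.

    Hypothetical helper for illustration; it is not part of intern.
    """
    prefix = "bossdb://"
    if not uri.startswith(prefix):
        raise ValueError(f"expected a {prefix} URI, got {uri!r}")
    parts = uri[len(prefix):].strip("/").split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected bossdb://<collection>/<experiment>/<channel>")
    return BossPath(*parts)


def fetch_cutout(uri: str, z: slice, y: slice, x: slice):
    """Download a small subvolume as a NumPy array (ZYX index order).

    Requires `pip install intern` and network access to BossDB.
    """
    from intern import array  # lazy import so the parser works without intern

    parse_bossdb_uri(uri)  # fail early on a malformed address
    volume = array(uri)    # lazy handle; data is fetched only when sliced
    return volume[z, y, x]


# Assumed example address; the experiment and channel names are not from the article.
EXAMPLE_URI = "bossdb://witvliet2020/Dataset_1/em"
print(parse_bossdb_uri(EXAMPLE_URI))
```

With intern installed, a call such as `fetch_cutout(EXAMPLE_URI, slice(100, 110), slice(0, 512), slice(0, 512))` would request a ten-slice block of imagery. Requesting small cutouts rather than whole volumes is what makes this kind of access practical for datasets at this scale.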
“With the cloud-based ecosystem provided by BossDB, we can easily work with teams over distributed regions to share insights and collaboratively process data to accelerate scientific discovery,” said Eva Dyer, an assistant professor in the Department of Biomedical Engineering at Georgia Tech, and director of the Neural Data Science (NerDS) Lab.
The technology underlying the BossDB ecosystem originated as part of the NeuroData project in a collaboration with Joshua Vogelstein and Randal Burns, researchers at the Johns Hopkins Whiting School of Engineering. Since then, the data stored within BossDB has tripled in size and continues to grow at a rapid pace.
“With the support of computer scientists, engineers, and neuroscientists at APL, BossDB currently hosts over 10 petavoxels of data consisting of dozens of public and private datasets, including large amounts of complex multidimensional data from over 30 collaborators,” said Brock Wester, the principal investigator for the APL BossDB team.
“This enables anyone with internet access to visualize image data from different technologies to generate hypotheses or plan new experiments,” he continued. “If investigators wanted to download data or code, they are able to access and analyze disparate data with the same functionality and syntax, which allows for faster comparisons and scientific discoveries.”
“The power of the BossDB ecosystem is the diversity of our community datasets and passion of our scientists – all leveraged within a common ecosystem,” said APL’s Jordan Matelsky, a “big data” computational neuroscientist working on the team.
BossDB currently supports dozens of geographically distributed academic partners in the neuroscience community, and hosts data from a wide variety of imaging modalities, including X‑ray, MRI, light microscopy, and electron microscopy. The BossDB technology enables the community to take part in new research on shared data, such as:
- Investigating circuit mechanisms underlying locomotor behavior through a synapse-resolution dataset of the ventral nerve cord of an adult female Drosophila melanogaster (https://bossdb.org/project/phelps_hildebrand_graham2021)
- Composing eight high-resolution electron microscopy volumes of C. elegans brains at different stages of development, spanning from birth to adulthood (https://bossdb.org/project/witvliet2020)
- Leveraging X-ray tomography to image millimeter-scale volumes with nanometer resolution in both fly and mouse nervous tissue (https://bossdb.org/project/kuan_phelps2020)
- Building an X-ray tomography dataset that spans multiple brain areas, enabling high-resolution probing of the microstructures therein (https://bossdb.org/project/prasad2020)
The AWS Open Data Sponsorship Program covers the cost of storage for publicly available, high‑value, cloud-optimized datasets, and the program’s vision aligns closely with the APL BossDB team’s commitment to making neuroscience data and tools available to the world.
“Many of the world’s most important data sets are open source and hosted on platforms like the AWS Cloud,” said Wester. “As neuroscience continues to advance with anticipated exponential growth of shared datasets over the next few years, cloud-native data ecosystems like BossDB will be critical for neuroscientists to scale their work, driving new scientific discoveries.”
The Applied Physics Laboratory, a not-for-profit division of The Johns Hopkins University, meets critical national challenges through the innovative application of science and technology. For more information, visit jhuapl.edu.