By admin, 19 July 2021 #AOScienceNow
Through the Big Data Program at the Arecibo Observatory (AO), we are developing the Arecibo Archives Data Catalog to facilitate access to AO's projects, observations, datasets, and attributes. Approximately half of the AO database is currently available in the catalog: https://www.naic.edu/datacatalog/
The purpose of the Data Catalog is to provide a user-friendly portal where users can browse, query, and explore the projects observed at Arecibo for more than 55 years. This catalog consolidates multiple data sources that have been built throughout AO's operation. The main component of the Data Catalog is the Projects Catalog, which provides all of the technical information about a proposal or project. This is essentially what the scientists would submit as a proposal to receive Arecibo observing time. The Data Catalog is complemented by the Observations Log, a Files Catalog and an Attributes Catalog. The Observations Log provides a detailed log recorded by the observing scientists for each project. The Files and Attributes catalogs contain all of the raw data files that were captured in the observations as well as key metadata of those files.
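The relationship between the four components can be pictured as four linked tables. The following is a minimal sketch only, using Python's built-in sqlite3 module; every table name, column name, and helper function here is a hypothetical illustration and not the actual AO catalog schema.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative
# assumptions, not the real Arecibo catalog design.
SCHEMA = """
CREATE TABLE projects (
    project_id TEXT PRIMARY KEY,
    title      TEXT NOT NULL
);
CREATE TABLE observations (
    obs_id     INTEGER PRIMARY KEY,
    project_id TEXT NOT NULL REFERENCES projects(project_id),
    log_entry  TEXT
);
CREATE TABLE files (
    file_id    INTEGER PRIMARY KEY,
    project_id TEXT NOT NULL REFERENCES projects(project_id),
    path       TEXT NOT NULL,
    size_bytes INTEGER NOT NULL
);
CREATE TABLE attributes (
    file_id INTEGER NOT NULL REFERENCES files(file_id),
    name    TEXT NOT NULL,
    value   TEXT
);
"""


def build_catalog():
    """Create an empty in-memory catalog with the four tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def files_for_project(conn, project_id):
    """Return (path, size_bytes) for every file of one project, sorted by path."""
    rows = conn.execute(
        "SELECT path, size_bytes FROM files WHERE project_id = ? ORDER BY path",
        (project_id,),
    )
    return rows.fetchall()
```

In this sketch the project identifier is the key that ties the log, the files, and the attributes back to the originating proposal, which is what would let a user start from a project and drill down to its data.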
To build this catalog, the Big Data team first worked to identify and catalog all of the projects that have been carried out at Arecibo. This was no easy task, since the data was stored in many formats over the years. For each format, the team created scripts that scraped or extracted all of the technical information from the documents and saved it into a database. This first step is the foundation of the Data Catalog.
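The per-format approach described above can be sketched as one small parser per format, all feeding a common record layout. This is an illustration under stated assumptions: the two formats shown (plain "Key: value" text and JSON), the field names `id`, `title`, and `pi`, and the `projects` table are all hypothetical, since the article does not name the actual formats or fields.

```python
import json
import sqlite3


def parse_keyvalue(text):
    """Parse 'Key: value' lines (a hypothetical plain-text proposal format)."""
    record = {}
    for line in text.splitlines():
        if ":" in line:
            # Split on the first colon only, so values may contain colons.
            key, value = line.split(":", 1)
            record[key.strip().lower()] = value.strip()
    return record


def parse_json(text):
    """Parse a flat JSON object (a second hypothetical format)."""
    return {key.lower(): str(value) for key, value in json.loads(text).items()}


# One parser per source format; supporting a new format means adding an entry.
PARSERS = {"txt": parse_keyvalue, "json": parse_json}


def make_db():
    """Create an in-memory database with an illustrative projects table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE projects (project_id TEXT PRIMARY KEY, title TEXT, pi TEXT)"
    )
    return conn


def ingest(conn, fmt, text):
    """Normalize one document with the parser for `fmt` and store it."""
    record = PARSERS[fmt](text)
    conn.execute(
        "INSERT OR REPLACE INTO projects (project_id, title, pi) VALUES (?, ?, ?)",
        (record["id"], record.get("title"), record.get("pi")),
    )
```

Keeping the parsers separate from the storage step means each legacy format only has to be understood once, while the database always receives records in the same shape.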
In a similar way, the team extracted and compiled the Observations Log using log information that existed in different locations. Most of the observations were already saved in a database, which made them easier to integrate into the catalog. The Files Catalog is being built as the datasets are copied to the Texas Advanced Computing Center. Once a dataset is copied, the team catalogs it and creates a record for it within the Catalog Database, keeping track of the file location, corresponding project, and size. Finally, the Attributes Catalog is being actively populated by extracting headers, metadata, and attributes from the raw files. This is done using scripts that navigate through the server's paths and extract the attributes from each file. These attributes are cataloged and saved into a database that keeps a record of all scientific attributes, including the related file name and project.
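As a rough illustration of the file-cataloging step, the sketch below walks a directory tree and builds one record per file with its location, project, and size. It assumes, purely for the example, that the first directory below the root is the project code; the real layout on the server is not described in this article, and `catalog_files` is a hypothetical name. Header and attribute extraction is omitted because it depends on each raw file format.

```python
import os


def catalog_files(root):
    """Walk `root` and return one record per file, sorted by path."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            parts = os.path.relpath(path, root).split(os.sep)
            # Assumption: the first directory under root is the project code.
            # Files sitting directly in root have no project.
            project = parts[0] if len(parts) > 1 else None
            records.append(
                {
                    "path": path,
                    "project": project,
                    "size_bytes": os.path.getsize(path),
                }
            )
    return sorted(records, key=lambda record: record["path"])
```

Each returned dictionary corresponds to one row that a script could then write into the catalog database.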
This catalog's importance is incalculable. It is the stepping stone toward making Arecibo's datasets accessible to the community and to curious minds. The Data Catalog project is a computing strategy that will make the necessary data and resources widely available to the scientific community, continuing the Arecibo Observatory’s legacy of enabling groundbreaking new results about our atmosphere, our Solar System, and our universe.
Article written by Eng. Julio Alvarado Negrón
Big Data Manager
Keywords: observatory, arecibo, data, big, data, catalog, texas, TACC, advanced, computing