July 2010
Over the last decade, significant progress has been made in archiving, distribution, access, interoperability, visualization, and other aspects of data and information systems for remotely sensed Earth science data. In this context, the DAD TC community has identified key questions to be addressed by research and development activities in the near future. They are grouped below into four major areas: Archiving and Preservation, System Interoperability, Access, and On-Line Services and Enabling Analysis. The questions are extracted from a paper presented at IGARSS 2010, “ADVANCES IN SPATIAL DATA INFRASTRUCTURE, ACQUISITION, ANALYSIS, ARCHIVING & DISSEMINATION,” by H. K. Ramapriyan, G. L. Rochon, R. Duerr, R. Rank, S. Nativi, and E. F. Stocker.
1. Archiving and Preservation
- Data Readability and Integrity: How long can current data formats be expected to survive, and will data stored in them still be readable after two or three newer versions of the format have been released, or after other formats have become more popular? How can we ensure that critical data survive this technological evolution with their integrity intact?
- Data Availability: How can we ensure that data remain accessible for reasonable periods of time, regardless of what happens to the site archiving them? What is the minimum period for which public access should be maintained?
- Data Identity: How do we know that two files contain the same data even if the formats are different? That is, how do we ensure that two data sets are “scientifically identical”? How do we find the data used in a particular publication? How can we uniquely and unambiguously identify a particular piece of data no matter which copy a user has? How can we provide online citation technology in a consistent and interoperable way?
- Provenance: How do we define the appropriate levels of provenance information and ensure that they are included along with data during production and in the archive?
- Data Encoding and Compression: How can we encode data in an interoperable, flexible, scalable, and efficient way that maximizes the likelihood that the data will still be understandable decades into the future? This includes data compression for exchange over networks such as the Web.
- Validation of Data Properties: What are the appropriate methods and frequencies with which data object properties should be validated in an archiving system that is subject to hardware and software failures, operational errors, natural disasters, or malicious attacks?
- Transparent Technology Refreshment: What techniques should be used to ensure “transparent technology refreshment”, i.e., upgrading to new generations of hardware and software while maintaining high levels of operational availability, addressing the dynamic and evolving archive environment, and maximizing the application of limited resources?
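One common building block for validating data object properties is fixity checking: record a cryptographic checksum for each file at ingest, then recompute it periodically and flag files whose checksum no longer matches. The sketch below is a minimal Python illustration under stated assumptions: the archive is a local directory tree, SHA-256 is the chosen digest, and the function names (`sha256_of`, `build_manifest`, `audit`) are hypothetical, not part of any particular archive system. It detects silent corruption and missing files; it does not answer how often to audit, and it cannot establish that two files in different formats are "scientifically identical", because reformatting changes the bytes and therefore the checksum.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large granules need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a reference checksum for every file under root (done once, at ingest)."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Recompute checksums and classify each manifest entry as ok, corrupted, or missing."""
    report: dict[str, list[str]] = {"ok": [], "corrupted": [], "missing": []}
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file():
            report["missing"].append(rel)
        elif sha256_of(p) != expected:
            report["corrupted"].append(rel)
        else:
            report["ok"].append(rel)
    return report


if __name__ == "__main__":
    import tempfile

    # Hypothetical two-file "archive" in a temporary directory.
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        (root / "granule_a.bin").write_bytes(b"\x00\x01\x02")
        (root / "granule_b.bin").write_bytes(b"\x03\x04")
        manifest = build_manifest(root)
        (root / "granule_b.bin").write_bytes(b"\x03\xff")  # simulate bit rot
        print(audit(root, manifest))
```

In a real archive the manifest itself would also need protection (for example, replicated copies), since a damaged manifest would report intact files as corrupted.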
2. System Interoperability
- Data Discovery: How can we provide online discovery for disparate (i.e., heterogeneous and distributed) datasets?
- Hardware Technology Trends: What are the continuing trends in the evolution and cost of processing, storage, and network bandwidth technologies (à la Moore’s law)? What are their implications for the overall architecture of end-to-end systems?
- Standardization: Are current standards adequate, or are new standards needed to eliminate or reduce the impacts of heterogeneity? Standardization is essential for interoperability and for managing information heterogeneity; it also facilitates evolvability and helps reduce costs. Standardization efforts apply to:
  - people, primarily in the form of terminology standards
  - information, primarily in the form of structural and semantic representation standards
  - systems, primarily in the form of interface and communication standards
- Conceptual Composability (System of Systems): How do we introduce the “interoperability arrangements” needed to implement a complex System of Systems, in which task-oriented, autonomous systems pool their resources to form a more complex ‘meta-system’ (e.g., GEOSS)?
3. Access
- Security: How do we strike a balance between open access to data and the need to protect data from malicious or inadvertent corruption? How do service providers protect their systems from “denial of service” attacks and other improper uses of the data and services?
- Standards: What standards should be developed or adopted to facilitate access to data? Which basic processing functions should they include (e.g., domain/co-domain subsetting or transformations)?
4. On-Line Services and Enabling Analysis
- Data Visualization and Analysis: How can we provide on-line visualization and analysis tools that help users identify meaningful subsets within large datasets?
- Data, Algorithms, and Services: How do we associate data with the services that act on them and with the algorithms that create them? How do we make distributed data and associated services discoverable without requiring users to learn multiple search tools? To support the data and information needs of application communities, is it possible to determine which products have the most socioeconomic value and which algorithms are needed to produce them? Can such lists be updated dynamically as sensors and applications continue to evolve? For example, how can we best make the user community aware that digital elevation models (DEMs) can be produced accurately by Synthetic Aperture Radar (SAR) interferometry, that sea ice surface temperatures are now available from infrared (IR) channels, or that a new vegetation index has been produced from Moderate Resolution Imaging Spectroradiometer (MODIS) data?
- Data Evaluation: How do we evaluate datasets, including their quality, content, and constraints? Data quality issues are especially important in the present Web era, where global viewers help non-expert users fuse and visualize heterogeneous, distributed datasets, potentially in scientifically erroneous ways because little information is available about data uncertainty and error propagation.
- Standards: How will uncertainty and error propagation description and management affect existing standards?

