1st International Workshop on Searching and Mining Large Collections of Geospatial Data (GeoSearch)

This workshop brings together the art of search engine construction with geospatial data modeling, processing, and management to provide a forum for researchers and practitioners interested in GeoSearch.


General Information

The amount of location data being generated, and the number of models being developed, are increasing quickly. Remote sensing provides exabytes of Earth observation data; sensor networks generate measurements with unprecedented velocity; and social networks, autonomous cars, smart cities, and the Internet of Things (IoT) add to these collections. For example, the Worldview-3 satellite observes the world at a resolution of 31 cm per pixel, which translates into 10.4 million pixels per square kilometer, and covers 680,000 square kilometers per day, resulting in more than 7 trillion pixels per day. Traditionally, geospatial data management is based on curating datasets and catalogue services that allow users to filter datasets by size, location, and thematic focus.
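As a quick arithmetic check of the Worldview-3 figures quoted above (an illustrative sketch, not part of the workshop materials):

```python
# Sanity-check of the quoted Worldview-3 numbers:
# 31 cm resolution and 680,000 km^2 of coverage per day.
resolution_m = 0.31                            # ground sample distance in meters
pixels_per_km2 = (1000 / resolution_m) ** 2    # pixels needed to tile 1 km^2
daily_coverage_km2 = 680_000
daily_pixels = pixels_per_km2 * daily_coverage_km2

print(round(pixels_per_km2 / 1e6, 1))   # ~10.4 million pixels per km^2
print(round(daily_pixels / 1e12, 1))    # ~7.1 trillion pixels per day
```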

Our ability to develop models that can recognize objects in a given image has improved tremendously in the last decade, allowing us to monitor a region to detect flooding or forest fires using high-resolution imagery and videos collected by unmanned aerial vehicles. A limiting factor for such approaches is that it is difficult to search the huge collections for interesting patterns. Not only does one need to know where to look to find objects of interest, but also which model to use for the task. What if a forest fire breaks out in an area that is not monitored? What if prior efforts have already created models for an exact or very similar task? How should users search for such models? When models are available, how should they be stored? Many applications become possible if we manage to make data collections and models searchable by content, metadata, and application task.

Application users would like to solve such challenges: knowing which model to use, which task a model is relevant for, and finding all objects of a certain type in a huge data cube or a large point cloud. Users will want to search broadly, interactively, and fast, using different or even mixed modalities. For example, a user may want to search with a text query and retrieve images from a satellite data collection, or retrieve models from a database of existing models. Similarly, one may want to search with an image for locations on Earth that have a certain similarity, or to monitor broad areas and rapidly identify changes such as emergencies or disasters in order to alert and guide rescue teams. Users on the go might want to search with an audio description of what they aim to find, and to search across all geospatial data representations (vector, raster, text, object, fields, etc.).
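Cross-modal search of this kind is commonly implemented by embedding queries and items into a shared vector space and ranking by similarity. A minimal sketch follows; the embedding vectors and tile names are illustrative placeholders, and a real system would use a learned text/image encoder:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(query_vec, items, top_k=2):
    """Rank items (id -> embedding) by similarity to the query embedding."""
    ranked = sorted(items, key=lambda item_id: cosine(query_vec, items[item_id]),
                    reverse=True)
    return ranked[:top_k]

# Toy shared embedding space: satellite image tiles embedded by an
# (assumed) image encoder, queried with a text embedding.
images = {
    "tile_fire":  [0.9, 0.1, 0.0],
    "tile_water": [0.0, 0.2, 0.9],
    "tile_urban": [0.3, 0.8, 0.1],
}
query = [0.8, 0.2, 0.1]  # placeholder text embedding for "burned forest"
print(search(query, images, top_k=1))  # ['tile_fire']
```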

This 1st International Workshop on Searching and Mining Large Collections of Geospatial Data (GeoSearch) brings together the art of search engine construction with geospatial data modeling, processing, and management.

Program (Eastern Time)

16:00-16:05 : Opening Remarks : Foreword

16:05-17:00 : Keynote Talk

Manil Maskey (NASA)

The anticipated growth of the NASA Earth science data in both the data ingest rate, as well as the overall archive volume, pose new challenges in data management and analysis. This talk will introduce the barriers, opportunities, and way forward in exploiting increasingly large volumes of NASA Earth science data.  Specifically, advanced data-driven technology such as artificial intelligence and machine learning (AI/ML) to extract knowledge, scale analysis, and enhance data discovery will be discussed.

17:00-17:10 : Break

17:10-17:40 : FAIR Interfaces for Geospatial Scientific Data Searches

Ranjeet Devarakonda, Kavya Guntupally, Michele Thornton, Yaxing Wei, Debjani Singh and Dalton Lunga

Several factors must be considered in designing highly accurate, reliable, expansive, and user-friendly geospatial data search interfaces. This paper examines four critical questions that ought to be considered during the design phase: (1) Is the search interface or API that provides the search capability usable by both humans and machines? (2) Are the results consistent and reliable? (3) Is the output response format free to use, community-defined, and non-proprietary? (4) Does the API clearly state the usage clauses? This paper discusses how certain data repositories at the US Department of Energy’s Oak Ridge National Laboratory apply FAIR data principles to enable geospatial searches and address the above-mentioned questions.

17:40-18:10 : gtfs2vec – Learning GTFS Embeddings for comparing Public Transport Offer in Microregions

Piotr Gramacki, Szymon Woźniak and Piotr Szymański

We selected 48 European cities and gathered their public transport timetables in the GTFS format. We utilized Uber’s H3 spatial index to divide each city into hexagonal micro-regions. Based on the timetable data, we created features describing the quantity and variety of public transport availability in each region. Next, we trained an auto-associative deep neural network to embed each of the regions. Having such prepared representations, we then used a hierarchical clustering approach to identify similar regions. To do so, we utilized an agglomerative clustering algorithm with Euclidean distance between regions and Ward’s method to minimize in-cluster variance. Finally, we analyzed the obtained clusters at different levels to identify a number of clusters that qualitatively describe public transport availability. We showed that our typology matches the characteristics of the analyzed cities and allows successful search for areas with similar public transport schedule characteristics.
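The hierarchical step described above can be sketched in pure Python. This is an illustrative re-implementation of agglomerative clustering with Ward's criterion (using the centroid form of the merge cost), not the authors' code, and the toy "region embeddings" are made up:

```python
def ward_cluster(points, k):
    """Agglomerative clustering: repeatedly merge the pair of clusters whose
    union gives the smallest increase in total within-cluster variance
    (Ward's criterion, computed from cluster centroids and sizes)."""
    clusters = [[i] for i in range(len(points))]
    centroids = [list(p) for p in points]
    sizes = [1] * len(points)
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d2 = sum((a - b) ** 2 for a, b in zip(centroids[i], centroids[j]))
                # Ward cost: increase in variance if clusters i and j merge
                cost = sizes[i] * sizes[j] / (sizes[i] + sizes[j]) * d2
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        n = sizes[i] + sizes[j]
        centroids[i] = [(sizes[i] * a + sizes[j] * b) / n
                        for a, b in zip(centroids[i], centroids[j])]
        clusters[i] = clusters[i] + clusters[j]
        sizes[i] = n
        del clusters[j], centroids[j], sizes[j]
    return clusters

# Four toy micro-region embeddings: two transit-rich, two transit-poor.
regions = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(ward_cluster(regions, k=2))  # [[0, 1], [2, 3]]
```

In practice one would use a library implementation (e.g. SciPy's hierarchical clustering) on the learned region embeddings; the sketch only shows the merge criterion.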

18:10-18:20 : Break

18:20-18:50 : An AI-based Spatial Knowledge Graph for Enhancing Spatial Data and Knowledge Search and Discovery

Zhe Zhang, Zhangyang Wang, Angela Li, Xinyue Ye, E. Lynn Usery and Diya Li

Geospatial knowledge graphs have been widely used in GIS to address challenges in various application domains such as disaster response, agriculture risk management, environmental planning, and water resources protection. The need to ensure AI-readiness in data searching compounds the challenge of facilitating accurate and efficient spatial data sharing and communication across different domains and stakeholders. Existing data search platforms such as the NASA Open Data Portal and the USGS Geo Data Portal require users to enter a data code and/or a subject name as keywords for data searching. However, from a cognitive perspective, a user is more familiar with place names and subject domains than with a data code or coordinate points when searching for spatial data. Another challenge lies in building meaningful semantics from spatiotemporal similarities among different data sets, so that a user can find all the relevant data or information related to the keywords within the study area. We developed a novel AI-based graph embedding algorithm to build semantic relationships between different spatial data to enable efficient and accurate data search. We applied the graph embedding algorithm to 30,0000 NASA metadata records to develop a spatial knowledge graph for data searching. Finally, we visualized the knowledge graph using the Neo4j database graphical user interface to demonstrate the performance.
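To illustrate the kind of place-name-centred search a spatial knowledge graph enables, here is a toy triple store with made-up node and relation names (not the authors' schema or algorithm):

```python
# Tiny in-memory knowledge graph as (subject, relation, object) triples.
# Dataset names and relations are hypothetical examples.
triples = [
    ("flood_extent_2021",   "covers", "Houston"),
    ("precipitation_daily", "covers", "Houston"),
    ("land_cover_map",      "covers", "Austin"),
    ("flood_extent_2021",   "about",  "flooding"),
    ("precipitation_daily", "about",  "rainfall"),
]

def datasets_for_place(place):
    """Find datasets linked to a place name, so users can search with
    familiar place names rather than data codes or coordinates."""
    return sorted(s for s, r, o in triples if r == "covers" and o == place)

print(datasets_for_place("Houston"))  # ['flood_extent_2021', 'precipitation_daily']
```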

18:50-19:20 : Joining Street-View Images and Building Footprint GIS Data

Yoshiki Ogawa, Takuya Oki, Shenglong Chen and Yoshihide Sekimoto

This paper proposes a new method to join building footprint GIS data with the relevant buildings in a street-view image taken by a vehicle-mounted camera. This is achieved by segmenting buildings in the street-view images and identifying the relevant building coordinates in the image. The building coordinates in the image are then estimated from the building vertices in the building footprint GIS data and the vehicle trajectory history. Finally, the target building is identified, and the relevant building attributes corresponding to each building image are linked together. This method enables the development of building image datasets with associated building attributes. The building image data, when linked to the relevant building attributes, could contribute to many innovative urban analyses, such as urban monitoring, the development of three-dimensional (3D) city models, and image datasets for training with annotated building attributes.
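Estimating where a footprint vertex falls in a street-view image amounts to projecting a map coordinate through the camera pose. A minimal 2D sketch under a pinhole-camera assumption; the camera position, heading, and focal length are illustrative values, not taken from the paper:

```python
import math

def vertex_to_pixel_column(cam_xy, heading_rad, vertex_xy, focal_px, image_width):
    """Project a building vertex (map coordinates) to a horizontal pixel
    column, assuming a pinhole camera looking along `heading_rad`
    (measured clockwise from the +y / north axis)."""
    dx = vertex_xy[0] - cam_xy[0]
    dy = vertex_xy[1] - cam_xy[1]
    bearing = math.atan2(dx, dy)      # angle to vertex, clockwise from north
    rel = bearing - heading_rad       # angle relative to the view direction
    if abs(rel) >= math.pi / 2:
        return None                   # behind the camera / outside the view
    return image_width / 2 + focal_px * math.tan(rel)

# Camera at the origin looking north: a vertex straight ahead projects to
# the image center, one 45 degrees to the right lands at the right edge
# when the focal length is half the image width.
center = vertex_to_pixel_column((0, 0), 0.0, (0, 10), focal_px=500, image_width=1000)
edge = vertex_to_pixel_column((0, 0), 0.0, (10, 10), focal_px=500, image_width=1000)
print(center)  # 500.0
print(edge)    # ~1000
```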

19:20-19:55 : Panel/Discussions/Opportunities for Collaboration

19:55-20:00 : Closing Remarks


Organizers
Gabriele Cavallaro (Forschungszentrum Jülich)

Dora B. Heras (University of Santiago de Compostela)

Dalton Lunga (Oak Ridge National Laboratory)

Martin Werner (TU Munich)

Andreas Züfle (George Mason University)


Download the proceedings