SEISMOLOGY AND GEOLOGY, 2023, Vol. 45, Issue (6): 1432-1451. DOI: 10.3969/j.issn.0253-4967.2023.06.011

• Review

STATE OF ART AND PERSPECTIVE ON DATABASE CONSTRUCTION FOR LOW-TEMPERATURE THERMOCHRONOLOGY

DAI Meng-yao1), WANG Ping1,2)*, LI An-bo1,2), DING Lu1), LIU Pin-qin1), DAI Jin-gen3,4), ZHANG Hui-ping5), LIU Shao-feng3,4)

  1) School of Geography, Nanjing Normal University, Nanjing 210023, China
  2) Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing 210023, China
  3) School of Earth Sciences and Resources, China University of Geosciences, Beijing 100083, China
  4) State Key Laboratory of Geological Processes and Mineral Resources, China University of Geosciences (Beijing), Beijing 100083, China
  5) State Key Laboratory of Earthquake Dynamics, Institute of Geology, China Earthquake Administration, Beijing 100029, China
  • Received: 2023-01-17; Revised: 2023-07-06; Online: 2023-12-20; Published: 2024-01-16

  • Corresponding author: WANG Ping, male, born in 1981, Ph.D., associate professor, mainly engaged in tectonic geomorphology and sedimentology. E-mail: tigerwp@njnu.edu.cn
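  • About the author:

    DAI Meng-yao, female, born in 1999, is a master's student in Cartography and Geographic Information Systems at the School of Geography, Nanjing Normal University, mainly engaged in research on geochronological databases. E-mail: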

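  • Funding:
    Seed Fund of the Deep-time Digital Earth (DDE) International Big Science Program (GJ-C03-2023-002); National Key Research and Development Program of China (2018YFE0204204); National Natural Science Foundation of China (42272114); Natural Science Foundation of Jiangsu Province (BK20211270)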

Abstract:

Low-temperature thermochronology is a key technique for studying neotectonics and landscape evolution. However, it differs intrinsically from other geochronological methods in how its data are expressed, analyzed and interpreted. In recent years, with the widespread adoption of low-temperature thermochronology techniques, the volume of data has grown continuously, giving rise to many big-data studies of tectonic and geomorphic evolution. These data, however, are mostly scattered across literature from different sources, with inconsistent formats and contents and varying quality, which to some extent hampers innovative big-data research. Specialized databases are therefore needed to cope with the growing volume of low-temperature thermochronology data and to meet the demands of innovative big-data research.

In this paper, four conventional geochronological databases (the National Geochronological Data Base (NGDB), Geochron, Petlab and DataView) and two recent databases (AusGeochem and Sparrow) are reviewed and compared in terms of data sources, data volume, data storage structure, completeness of data content, data entry methods, data retrieval methods, coverage areas, update patterns, and data analysis tools. In the conventional geochronological databases, thermochronological data make up only a small part and are generally stored in databases devoted to related or broader subjects, such as radioisotope chronology, geochronology, petrology and mineralogy, or geological analysis databases. These databases emphasize the commonalities between disciplines and therefore present data only at the level of the sample. They are not suited to big-data research, because all data are managed in relational databases with strictly structured tables and come from limited sources. We found that conventional geochronological database designs are generally suitable for absolute-age data. Low-temperature thermochronology, however, differs from conventional geological dating methods in that its ages record only cooling time. The more geologically significant cooling history is derived from numerical modeling based on elevation profiles, track lengths, and diffusion kinetics models of the (U-Th)/He system. In addition, innovations in experimental techniques impose new requirements on the construction of thermochronology databases.
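To make the contrast concrete, the minimal sketch below models, with Python dataclasses, what a flat conventional age record typically holds versus what a low-temperature thermochronology sample may need to carry (single-grain ages, confined track lengths, elevation) so that its cooling history can later be modeled. All class names, field names and values are hypothetical illustrations; they are not the schema of NGDB, Geochron, Petlab, DataView or any other database reviewed here.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Optional


@dataclass
class ConventionalAgeRecord:
    """Hypothetical flat record: one sample, one age, as in a strictly structured table."""
    sample_id: str
    method: str
    age_ma: float
    error_ma: float


@dataclass
class GrainAge:
    """Hypothetical single-grain result (e.g. one apatite (U-Th)/He aliquot)."""
    grain_id: str
    age_ma: float
    error_ma: float


@dataclass
class ThermochronSample:
    """Hypothetical thermochronology sample that keeps the data needed for thermal modeling."""
    sample_id: str
    method: str
    elevation_m: float
    grains: List[GrainAge] = field(default_factory=list)
    track_lengths_um: List[float] = field(default_factory=list)  # confined fission-track lengths
    pooled_age_ma: Optional[float] = None  # sample-level age, if the authors report one

    def mean_track_length(self) -> Optional[float]:
        # Mean confined track length, or None when no lengths were measured.
        return mean(self.track_lengths_um) if self.track_lengths_um else None

    def to_flat_record(self, error_ma: float) -> ConventionalAgeRecord:
        # Collapsing to a conventional record keeps only the sample age and
        # drops the grains, track lengths and elevation.
        if self.pooled_age_ma is None:
            raise ValueError("no sample-level age to report")
        return ConventionalAgeRecord(self.sample_id, self.method, self.pooled_age_ma, error_ma)
```

The `to_flat_record` method illustrates the point made above: once a sample is reduced to a single age row, the track-length and grain-level information needed to model a cooling history is no longer available.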

Compared with conventional geochronology databases, recent databases focus more on low-temperature thermochronological data and support both structured and unstructured data from varied sources, which makes them more comprehensive and specialized. They are flexible and extensible, especially in adding new dating and experimental methods, storing big data, and linking laboratories to the database. By using different types of database platforms and associated APIs, they can manage both relational and non-relational data for query, analysis and visualization. However, these recent databases are still at a preliminary, exploratory stage, and ensuring continuous data growth remains a challenge. Establishing a flexible numbering system that provides sustainable, extensible unique identifiers for samples and data is another important task for them. Finally, beyond raw data, published fission-track papers contain abundant thermal-history information. These interpretations or inversion results constitute interpretive data, which are crucial for reconstructing cooling histories or tectonic uplift, so how to incorporate such data must also be considered during database design.
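Interpretive data such as an inverted thermal history are often reported as a time-temperature (t-T) path. As a minimal sketch, assuming a path is stored as (time in Ma, temperature in °C) points ordered from oldest to youngest, the code below shows one way such a record could be kept and turned into segment cooling rates. The representation, the function name and the example path are hypothetical.

```python
from typing import List, Tuple

# Hypothetical interpretive record: a t-T path as (time_ma, temp_c) points,
# ordered from oldest (largest time) to youngest (time 0 = present).
TTPath = List[Tuple[float, float]]


def segment_cooling_rates(path: TTPath) -> List[float]:
    """Return the cooling rate of each path segment in °C/Myr (positive = cooling)."""
    if len(path) < 2:
        raise ValueError("a t-T path needs at least two points")
    rates = []
    for (t_old, temp_old), (t_young, temp_young) in zip(path, path[1:]):
        duration = t_old - t_young  # Myr elapsed between the two points
        if duration <= 0:
            raise ValueError("points must be ordered from oldest to youngest")
        rates.append((temp_old - temp_young) / duration)
    return rates


# Example: slow cooling from 20 to 10 Ma, then faster cooling to the present.
example_path: TTPath = [(20.0, 120.0), (10.0, 100.0), (0.0, 10.0)]
print(segment_cooling_rates(example_path))  # [2.0, 9.0]
```

Storing the path itself, rather than only a derived rate, lets later users recompute quantities such as cooling or exhumation rates under their own assumptions.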

The key to sustaining a database lies in the users it serves. Considering the needs of professional users for research management, experimental analysis and innovative big-data research, and in view of the problems in current databases, we put forward the following suggestions for the future construction of low-temperature thermochronology databases.

First, several measures can keep a dedicated low-temperature thermochronology database active. From a technical perspective, artificial intelligence techniques such as natural language processing and other machine learning algorithms should be used to extract information from papers semi-automatically or automatically, helping users quickly extract relevant information and understand the literature. Platforms such as Semantic Scholar, GeoDeepDive and DeepShovel provide interactive data-mining features in which extracted data are normalized and entered into a database automatically according to user-specified rules, greatly reducing the labor and time costs of data acquisition. In terms of philosophy, the open-sharing academic ecosystem has given rise to open platforms such as the arXiv preprint server, data repositories such as Pangaea, and the Deep-time Digital Earth integrated online research platform, drastically shortening the cycle from research and experimentation to publication. This makes it easier to incorporate the latest research data into databases and greatly expands data sources. Regarding user volume, academic social networks are effective at tracking and disseminating research; they break down academic hierarchies, promote exchange and cooperation, and attract more users.
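Full NLP or machine-learning extraction is beyond a short example, but the idea of normalizing text into database records by user-specified rules can be sketched with regular expressions. The code below is a simplified rule-based stand-in, not how Semantic Scholar, GeoDeepDive or DeepShovel work. It assumes ages appear in the text as, e.g., "12.3 ± 1.1 Ma", maps method names to normalized codes through a hypothetical rule table (`METHOD_RULES`), and writes the results to an in-memory SQLite table with a hypothetical layout.

```python
import re
import sqlite3

# Hypothetical user-specified rules: text pattern -> normalized method code.
METHOD_RULES = {
    r"\bAFT\b|apatite fission[- ]track": "AFT",
    r"\bZFT\b|zircon fission[- ]track": "ZFT",
    r"\bAHe\b|apatite \(U-Th\)/He": "AHe",
    r"\bZHe\b|zircon \(U-Th\)/He": "ZHe",
}
# An age with its uncertainty, e.g. "12.3 ± 1.1 Ma" or "12.3 +/- 1.1 Ma".
AGE_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(?:±|\+/-)\s*(\d+(?:\.\d+)?)\s*Ma")


def extract_ages(text):
    """Return (method, age_ma, error_ma) tuples, at most one per sentence.

    Only the first matching rule and the first age in each sentence are used;
    sentences without both a method and an age are skipped.
    """
    records = []
    for sentence in re.split(r"(?<=[.;])\s+", text):
        method = next((code for pattern, code in METHOD_RULES.items()
                       if re.search(pattern, sentence, re.IGNORECASE)), None)
        age = AGE_PATTERN.search(sentence)
        if method and age:
            records.append((method, float(age.group(1)), float(age.group(2))))
    return records


def load_into_db(records):
    """Write normalized records into a new in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE age (method TEXT, age_ma REAL, error_ma REAL)")
    conn.executemany("INSERT INTO age VALUES (?, ?, ?)", records)
    return conn
```

For example, `extract_ages("Apatite fission-track (AFT) ages are 12.3 ± 1.1 Ma.")` yields `[("AFT", 12.3, 1.1)]`. Real literature is far less regular, which is why the platforms above let users review and correct extracted values interactively.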

Second, finer-grained data storage and simpler data operations help improve the extensibility of a database. Most existing geochronological databases are relational, storing data in a strictly structured way; their typical form is the two-dimensional table, which is well suited to logically structured geological data. Non-relational databases, by contrast, are not built on tables; they are designed for both structured and unstructured storage needs and fill gaps left by relational databases. In practice, the strengths of both types can be combined to hold both basic geological information and interpretive information, in the spirit of NewSQL.
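As a minimal sketch of combining the two storage styles, assuming SQLite as the relational store and JSON text as a stand-in for a document (non-relational) store, fixed sample attributes go into typed columns while variable interpretive content goes into a free-form JSON field. This only imitates the idea in a single file; it is not a NewSQL system, and the table layout and function names are hypothetical.

```python
import json
import sqlite3


def create_store():
    conn = sqlite3.connect(":memory:")
    # Structured part: typed columns that every sample has.
    # Unstructured part: the "interpretation" column holds a JSON document
    # whose content (t-T paths, model notes, etc.) may vary between samples.
    conn.execute("""CREATE TABLE sample (
        sample_id TEXT PRIMARY KEY,
        method TEXT NOT NULL,
        latitude REAL,
        longitude REAL,
        elevation_m REAL,
        interpretation TEXT
    )""")
    return conn


def add_sample(conn, sample_id, method, lat, lon, elev, interpretation=None):
    doc = json.dumps(interpretation) if interpretation is not None else None
    conn.execute("INSERT INTO sample VALUES (?, ?, ?, ?, ?, ?)",
                 (sample_id, method, lat, lon, elev, doc))


def samples_above(conn, min_elev):
    """Relational filter on a structured column; JSON documents are decoded in Python."""
    rows = conn.execute(
        "SELECT sample_id, interpretation FROM sample "
        "WHERE elevation_m >= ? ORDER BY sample_id",
        (min_elev,))
    return [(sid, json.loads(doc) if doc else None) for sid, doc in rows]
```

The design choice is that queries over location, elevation or method stay fast and strictly typed, while interpretive results of any shape can be attached without changing the table schema.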

Third, the database should highlight what is distinctive about the discipline. Both the sample-level ages and the single-grain data from which they are derived are significant. Standardizing the coding of samples and single-grain data that are not registered with IGSN would effectively distinguish low-temperature thermochronology from similar disciplines and highlight the characteristics of its data.
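One possible way to standardize codes for samples and single grains that lack an IGSN is a hierarchical identifier in which each grain ID extends its parent sample ID, so the link between single-grain data and the sample age is preserved in the code itself. The format below (lab prefix, year, sequence number, zero-padded grain suffix) and all names in it are purely hypothetical and do not describe any existing standard; where an IGSN exists it would normally be recorded alongside.

```python
import re

# Hypothetical format: <LAB>-<YEAR>-<SEQ>, e.g. "XLAB-2023-0042";
# single grains append "-G<NN>", e.g. "XLAB-2023-0042-G03".
SAMPLE_RE = re.compile(r"[A-Z]{2,8}-\d{4}-\d{4}")
GRAIN_RE = re.compile(r"([A-Z]{2,8}-\d{4}-\d{4})-G(\d{2,3})")


def sample_id(lab: str, year: int, seq: int) -> str:
    """Build and validate a hypothetical sample identifier."""
    sid = f"{lab.upper()}-{year:04d}-{seq:04d}"
    if not SAMPLE_RE.fullmatch(sid):
        raise ValueError(f"invalid sample id: {sid}")
    return sid


def grain_id(sample: str, grain_no: int) -> str:
    """Derive a single-grain identifier from its parent sample identifier."""
    if not SAMPLE_RE.fullmatch(sample):
        raise ValueError(f"invalid sample id: {sample}")
    if not 1 <= grain_no <= 999:
        raise ValueError("grain number must be between 1 and 999")
    return f"{sample}-G{grain_no:02d}"


def parent_sample(grain: str) -> str:
    """Recover the parent sample identifier from a grain identifier."""
    match = GRAIN_RE.fullmatch(grain)
    if match is None:
        raise ValueError(f"invalid grain id: {grain}")
    return match.group(1)
```

For instance, `grain_id(sample_id("xlab", 2023, 42), 3)` gives `"XLAB-2023-0042-G03"`, and `parent_sample` maps it back to `"XLAB-2023-0042"`.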

Finally, by combining the strengths of conventional and recent databases, embracing open, shared scholarship, leveraging advanced information mining and dissemination technologies, and storing structured and unstructured data together, a future database could largely meet the comprehensive needs of its users, from laboratories to scientists to data consumers.

Key words: low-temperature thermochronology, big data, artificial intelligence, relational database, non-relational database

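Abstract (Chinese version):

Low-temperature thermochronology is an important technique for studying neotectonics and landscape evolution. It differs markedly from conventional geochronological methods in data expression, analysis and interpretation, so dedicated databases are needed to cope with its growing data and to meet the demands of innovative big-data research. This paper selects four representative conventional databases and two new-generation databases for comparative analysis. Conventional databases (e.g., NGDB) rely on a single data source, present sample data as structured tables, and manage data in relational databases. New-generation low-temperature thermochronology databases (e.g., AusGeochem), by contrast, draw on a wide range of sources, contain both structured and unstructured data, and are highly extensible, able to accommodate new methods and big-data analysis; they manage data by combining flexible database types with application programming interfaces (APIs), and provide data query, analysis and visualization. Addressing the problems of existing databases, the paper offers perspectives on future database construction with respect to sustained data growth, database extensibility and data-numbering management, with the aim of providing a foundation for big-data research on neotectonics and landscape evolution.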
