The Python OpportunityFinder® algorithm can now automatically detect fossil names and their associated Lithostratigraphic Units and Geological Ages without a prior list of names.
This is useful because it is not always possible to predefine every name and variation you are likely to come across in text. Furthermore, the way names are used in text can differ from reference lists of names. This capability can support both industry and academic needs.
Mining palaeontological data from large amounts of text has led to new scientific discoveries (Peters et al., 2014).
Text mining algorithms were used to discover hidden geo-resource (metals, elements, minerals) associations in reports, maps, sketches and logs from the archives of the Geological Survey of Queensland in Australia. The Geological Survey of Queensland have recently made a number of excellent improvements that increase the accessibility of these data.
A subset of report packages from the past 40 years, manually tagged to hydrocarbons, was analysed using the OpportunityFinder® algorithm, equating to over 2 million sentences. Using Natural Language Processing (NLP), Knowledge Engineering and Machine Learning, locational information, Chronostratigraphy and Lithostratigraphy were automatically extracted along with co-occurrence data.
Indicator mineral evidence (as well as direct evidence) for critical resources such as Rare Earth Elements (REE), Gold, Silver, Copper and Nickel was detected, despite these resources not being mentioned in the petroleum report package metadata. The findings were ranked by several factors, including speculation.
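To make the idea of co-occurrence data concrete, here is a minimal Python sketch of sentence-level co-occurrence counting. It is not the OpportunityFinder® implementation, which is not described in this text: the tiny `LEXICON`, the example sentences and the function names are all hypothetical, and simple word matching stands in for the NLP pipeline.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical mini-lexicon for illustration only (assumption, not the real lexicon).
LEXICON = {"gold", "silver", "copper", "nickel", "monazite"}


def terms_in(sentence, lexicon=LEXICON):
    """Return the set of lexicon terms found in one sentence (case-insensitive)."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    return tokens & lexicon


def cooccurrences(sentences, lexicon=LEXICON):
    """Count how often each unordered pair of lexicon terms shares a sentence."""
    counts = Counter()
    for sentence in sentences:
        found = sorted(terms_in(sentence, lexicon))
        counts.update(combinations(found, 2))
    return counts


# Invented example sentences (assumption), written in the style of well reports.
sentences = [
    "Minor gold and copper were noted in cuttings.",
    "Copper staining with traces of gold in the core.",
    "Heavy mineral sands contain monazite.",
]
pair_counts = cooccurrences(sentences)
```

In this toy example the pair `("copper", "gold")` is counted twice, while monazite appears alone and so contributes no pair.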
The Geological Survey of Queensland (GSQ) is the state’s custodian of geoscience knowledge and data. GSQ collects and provides geoscience data, information and advice about Queensland’s mineral and energy resources and resource potential. https://geoscience.data.qld.gov.au/
Infoscience Technologies Limited is an Artificial Intelligence tech start-up, extracting geoscience knowledge from unstructured text. www.infosciencetechnologies.com
The heatmap chart below shows co-occurrences between minerals derived from the text, clustered automatically using Pearson's correlation. This may reveal interesting associations that warrant further investigation.
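As an illustration of clustering by Pearson's correlation, the sketch below orders minerals so that those with similar profiles sit next to each other, as the rows of a clustered heatmap would. It makes several assumptions: each mineral is represented by invented per-report mention counts, the distance is taken as 1 minus the correlation, and a simple average-linkage merge is written out by hand. In practice a library such as SciPy or seaborn would normally do this step; the pure-Python version is only meant to show the logic.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # assumption: treat constant profiles as uncorrelated
    return cov / (sx * sy)


def cluster_order(names, profiles):
    """Average-linkage agglomerative clustering on distance 1 - r.

    Returns the names in merged (leaf) order, so similar items end up adjacent.
    """
    def distance(i, j):
        return 1.0 - pearson(profiles[i], profiles[j])

    def linkage(a, b):
        return sum(distance(i, j) for i in a for j in b) / (len(a) * len(b))

    clusters = [[i] for i in range(len(names))]
    while len(clusters) > 1:
        pairs = [(p, q) for p in range(len(clusters))
                 for q in range(p + 1, len(clusters))]
        # Merge the two clusters with the smallest average distance.
        p, q = min(pairs, key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]))
        merged = clusters[p] + clusters[q]
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)] + [merged]
    return [names[i] for i in clusters[0]]


# Invented mention counts per report (columns) for each mineral (rows).
names = ["gold", "silver", "copper", "nickel"]
profiles = [
    [5, 4, 0, 1],  # gold
    [4, 5, 1, 0],  # silver
    [0, 1, 6, 5],  # copper
    [1, 0, 5, 6],  # nickel
]
order = cluster_order(names, profiles)
```

With these made-up counts, gold and silver end up adjacent, as do copper and nickel, because each pair has strongly correlated profiles across the reports.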
The OpportunityFinder® algorithm has now exceeded 50,000 terms in its lexicon for detecting petroleum systems automatically in text. This is combined with hundreds of thousands of labelled examples for machine learning. Together these can support narrowly focused ("laser-like") tasks, improve search and discovery, generate insights, enable knowledge mining and support the tuning of very large language models.
We are using Deep Learning to leverage the 45,000 unique petroleum-system-related textual clues in OpportunityFinder®.
Designed for automation, the clues, combined with auto-annotation of millions of sentences, allow a deep learning model to generalise (learn). This enables the detection of valid clues in geoscience text (reports, presentations, papers) that are not present in the original lexicon.
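The auto-annotation step can be pictured with a small Python sketch: lexicon phrases are matched in sentences and turned into labelled spans, which could then serve as training data for a model. Everything here is an assumption made for illustration, including the three lexicon entries, the label names and the function names; the actual annotation scheme and model are not described in this text.

```python
import re

# Hypothetical lexicon entries: clue phrase -> label (assumption, for illustration).
LEXICON = {
    "oil stain": "SHOW",
    "gas chimney": "MIGRATION",
    "source rock": "SOURCE",
}


def auto_annotate(sentence, lexicon=LEXICON):
    """Return sorted (start, end, label) spans for each lexicon phrase found."""
    spans = []
    for phrase, label in lexicon.items():
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for match in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)


def to_training_examples(sentences, lexicon=LEXICON):
    """Pair each sentence with its spans; sentences with no hits keep an empty list."""
    return [(s, auto_annotate(s, lexicon)) for s in sentences]


# Invented example sentences (assumption).
sentences = [
    "A mature source rock is present below the seal.",
    "Seismic shows a gas chimney above the fault; an oil stain was seen in core.",
    "The well was drilled in 1984.",
]
examples = to_training_examples(sentences)
```

A model trained on spans like these sees each clue in its surrounding context, which is what would let it recognise similar phrasing that the lexicon itself does not contain.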
Whilst an algorithm will never read text like a Geoscientist, we can teach it some elements. Its advantage is that it reads differently from us and can process larger volumes than any person could read.
This allows us to detect evidence, join the dots and stimulate business ideas we may not have had without the assistance of the algorithm.
OpportunityFinder® is being tested within a renewables geothermal project in collaboration with the British Geological Survey. BGS are investigating mine water in underground abandoned coal mines as a low carbon sustainable heat source for housing and manufacturing, and have several other potential use cases for knowledge extraction from their data archives to meet the challenges of decarbonisation and resource management.
Example of autoclassification output from GeoClassifier® for a selection of public domain geoscience documents. Topic proportions are clustered in a Pearson dendrogram heatmap. Values above the mean are shown in red and those below the mean in dark blue, relative to the corpus/collection. It is easy to see clusters of documents that are predominantly about certain topics and to spot ‘anomalies’, which can be interesting to read further.
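The above/below-mean colouring can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the document names and topic proportions are invented, the function names are hypothetical, and a value exactly equal to the corpus mean is labelled "neutral", a case the description does not cover.

```python
def centre_on_corpus_mean(doc_topics):
    """Subtract each topic's corpus-wide mean proportion from every document."""
    topics = sorted({t for props in doc_topics.values() for t in props})
    n_docs = len(doc_topics)
    means = {
        t: sum(props.get(t, 0.0) for props in doc_topics.values()) / n_docs
        for t in topics
    }
    return {
        doc: {t: props.get(t, 0.0) - means[t] for t in topics}
        for doc, props in doc_topics.items()
    }


def colour(deviation):
    """Map a deviation from the corpus mean to the heatmap colour described."""
    if deviation > 0:
        return "red"
    if deviation < 0:
        return "dark blue"
    return "neutral"  # assumption: exactly at the mean


# Invented topic proportions for two hypothetical documents.
doc_topics = {
    "report_a": {"geothermal": 0.7, "petroleum": 0.3},
    "report_b": {"geothermal": 0.1, "petroleum": 0.9},
}
deviations = centre_on_corpus_mean(doc_topics)
cell_colours = {
    doc: {topic: colour(value) for topic, value in row.items()}
    for doc, row in deviations.items()
}
```

Here `report_a` is red for geothermal and dark blue for petroleum, and `report_b` is the reverse, so each document stands out for the topic it covers more heavily than the collection as a whole.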