The Python-based OpportunityFinder Natural Language Processing (NLP) algorithm has been extended to detect clues for porphyry copper in text.
Launched in early 2020 and used by organisations for petroleum and native hydrogen exploration, the algorithm uses hundreds of thousands of lexicons, taxonomies and labelled data for machine learning models. The novel patented method combines these, placing a geological lens over unstructured information and turning it into structured information that can be visualised.
This can help the geoscientist ‘read’ hundreds of thousands or even millions of notes, papers, reports, presentations, logs, maps and sketches for clues to potentially new, hidden opportunities. Some ideas and opportunities may only become apparent by combining clues from many documents.
GeoClassifier® can automatically classify sentences, paragraphs and documents into geoscience categories and detect well/borehole names in text. To achieve this, GeoClassifier® uses over 250,000 labelled public geoscience sentences to train deep learning models. When an organisation licenses the algorithm, it also receives the actual training data, so it can build and train its own ML models for prediction if it wishes.
Due to the performant way the patented algorithm has been designed, it can check through millions of permutations in every sentence in a document extremely quickly. In a large corpus of text this equates to trillions of permutations.
Run on nothing more sophisticated than a standard high-street i7 laptop, the algorithm processed 1 million documents for extractions in 26 hours. An organisation with more resources than a single laptop could complete the same job substantially faster.
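As a rough back-of-the-envelope check, the quoted figures can be turned into a throughput rate. The short Python sketch below does that arithmetic. The `estimated_hours` helper and its assumption of perfectly linear scaling across machines are illustrative only and are not taken from the product. Real-world scaling would depend on hardware, document sizes and I/O.

```python
# Figures quoted in the text: 1 million documents in 26 hours on one laptop.
DOCUMENTS = 1_000_000
HOURS = 26

# Implied average throughput on the single laptop.
docs_per_second = DOCUMENTS / (HOURS * 3600)


def estimated_hours(documents: int, machines: int = 1) -> float:
    """Estimate wall-clock hours, ASSUMING perfectly linear scaling (hypothetical)."""
    return documents / (docs_per_second * machines * 3600)


print(f"{docs_per_second:.1f} documents per second on one laptop")
print(f"{estimated_hours(DOCUMENTS, machines=4):.1f} hours on four comparable machines")
```

On these numbers the single laptop averages a little under 11 documents per second.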
These permutations may include clues for petroleum plays, conditions for ore bodies/minerals, evidence for hydrogen as well as empirical evidence, generic Geobody, Lithostratigraphic, Chronostratigraphic, Lithological, Environmental and Structural evidence.
Infoscience Technologies was delighted to guest author an article on Natural Language Processing (NLP) in the Geosciences for Halliburton’s September issue of Subsurface Insights magazine. This month’s issue is a Minerals special.
Detecting entities such as well names in unstructured text can be useful for many aspects of information discovery.
Lookup lists from corporate databases and regular expression pattern rules can be useful. They do have limitations, though: it can be difficult to predict what may lie within thousands of old reports and documents.
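To make the lookup-list and pattern-rule approach concrete, here is a minimal Python sketch. It is not the GeoClassifier® method. The well names, the `KNOWN_WELLS` list and the `WELL_PATTERN` rule (a quadrant/block style such as "21/10-1") are hypothetical examples chosen for illustration.

```python
import re

# Hypothetical lookup list, standing in for names exported from a corporate database.
KNOWN_WELLS = {"Exampleton A1", "Demo Ridge 2"}

# Illustrative pattern rule for quadrant/block-style identifiers, e.g. "21/10-1" or "48/19b-4".
WELL_PATTERN = re.compile(r"\b\d{1,3}/\d{1,2}[a-z]?-\d{1,3}[A-Za-z]?\b")


def find_wells(text: str) -> list[str]:
    """Return names matched by the pattern rule or the lookup list, in order of appearance."""
    hits = [(m.start(), m.group()) for m in WELL_PATTERN.finditer(text)]
    for name in KNOWN_WELLS:
        hits += [(m.start(), m.group()) for m in re.finditer(re.escape(name), text)]
    return [name for _, name in sorted(hits)]


sample = "Well 21/10-1 was drilled in 1975; Exampleton A1 and the Kestrel-3 sidetrack followed."
print(find_wells(sample))  # ['21/10-1', 'Exampleton A1']
```

The made-up name "Kestrel-3" is missed because it is neither in the list nor in the expected format. This is the kind of gap that is hard to anticipate across thousands of legacy documents.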
Having a machine learning model tuned and trained on thousands of public domain examples may help and support existing digitalisation activities. This new capability was added to the GeoClassifier® algorithm recently.
The screenshot shows some examples in oil & gas, geothermal, hydrogeology, mining and carbon capture sectors.
When tested on several hundred UK License Reports, the model achieved 96% accuracy in detecting 718 well names.
As well as detecting Geo-resources in unstructured text reports, papers and logs, OpportunityFinder can detect and disambiguate all kinds of geological concepts. High level lithology groupings in the Williston Basin are shown above in the Beeswarm chart.
Oxford, UK 18th June 2021: Infoscience Technologies Ltd, the pioneer in extracting geoscience and subsurface knowledge from text, is delighted to announce the United States Patent and Trademark Office has granted a patent for the OpportunityFinder® technology.
The patent award is for a Natural Language Processing System which suggests Geo-Resource Ideas & Plays from Unstructured Text. The invention uses a novel method to move beyond documents, sentences and entities – to detect interesting patterns in text relevant to the geo-resource sector.
The new patent reinforces that Infoscience Technologies is on the leading edge of digital transformation in the geoscience and subsurface sectors.
The OpportunityFinder® algorithm is currently used by organisations in the Petroleum Exploration, Geothermal, Mining and Hydrogen sectors.
The Python OpportunityFinder® algorithm can now automatically detect fossil names and their associated Lithostratigraphic Units and Geological Ages without a prior list of names.
This can be useful because it is not always possible to predefine all the names and variations you are likely to come across in text. Furthermore, the way names are used in text can differ from reference lists of names. This capability can support the needs of both industry and academia.
Mining palaeontological data from large amounts of text has led to new scientific discoveries (Peters et al., 2014).
Text mining algorithms were used to discover hidden geo-resource (metals, elements, minerals) associations in reports, maps, sketches and logs from the archives of the Geological Survey of Queensland in Australia. The Geological Survey of Queensland has recently made a number of excellent improvements, increasing the accessibility of these data.
A subset of report packages from the past 40 years, manually tagged to hydrocarbons and equating to over 2 million sentences, was analysed using the OpportunityFinder® algorithm. Using Natural Language Processing (NLP), Knowledge Engineering and Machine Learning, locational information, Chronostratigraphy and Lithostratigraphy were automatically extracted along with co-occurrence data.
Indicator mineral evidence (as well as direct evidence) for critical resources such as Rare Earth Elements (REE), Gold, Silver, Copper and Nickel was detected, despite these resources not being mentioned in the petroleum report package metadata. The findings were ranked by several factors, including speculation.
The Geological Survey of Queensland (GSQ) is the state’s custodian of geoscience knowledge and data. GSQ collects and provides geoscience data, information and advice about Queensland’s mineral and energy resources and resource potential. https://geoscience.data.qld.gov.au/
Infoscience Technologies Limited is an Artificial Intelligence tech start-up, extracting geoscience knowledge from unstructured text. www.infosciencetechnologies.com
The heatmap chart below shows co-occurrences between minerals derived from the text, clustered automatically using Pearson’s correlation. This may reveal interesting associations that warrant further investigation.
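For readers unfamiliar with this kind of chart, the Python sketch below shows one plausible way such values could be computed. It is an assumption for illustration, not the OpportunityFinder® implementation: the toy `sentences` data is invented, co-occurrence is counted at sentence level, and Pearson's correlation is computed on simple presence/absence vectors.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Hypothetical toy input: the set of minerals detected in each sentence.
sentences = [
    {"gold", "silver"},
    {"gold", "silver", "copper"},
    {"copper", "nickel"},
    {"gold", "silver"},
    {"nickel", "copper"},
]
minerals = sorted(set().union(*sentences))

# Count how often each pair of minerals appears in the same sentence (heatmap cells).
pair_counts = Counter()
for found in sentences:
    for a, b in combinations(sorted(found), 2):
        pair_counts[(a, b)] += 1

# Presence/absence vector per mineral, one entry per sentence.
presence = {m: [1 if m in s else 0 for s in sentences] for m in minerals}


def pearson(x: list[int], y: list[int]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sd_x = sqrt(sum((xi - mean_x) ** 2 for xi in x))
    sd_y = sqrt(sum((yi - mean_y) ** 2 for yi in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


print(pair_counts[("gold", "silver")])  # 3
for a, b in combinations(minerals, 2):
    print(f"{a:>6} vs {b:<6} r = {pearson(presence[a], presence[b]):+.2f}")
```

Pairwise correlations like these could then be passed to a hierarchical clustering step to order the rows and columns of a heatmap, so that minerals with similar occurrence patterns sit next to each other.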