Geoscience Text Mining Presentation, 23rd March 2020, Geological Society of London


Looking forward to presenting at the Finding Petroleum conference on the 23rd March 2020 at the Geological Society of London.


Most of the published literature on text mining in exploration geoscience focuses on extracting data or concepts, typically within a single sentence or document ‘container’. There are no known approaches that look for combinatorial patterns within and across documents which may indicate an opportunity (such as a hydrocarbon play, lead or idea). A data science technique inspired by ‘DNA profiling’, combining machine learning and natural language processing, will be presented. Such techniques may facilitate data-driven ideation for the geoscientist, stimulating a line of thought that would not otherwise have occurred.

Predicting Hydrocarbon Play Types from Unstructured Text

Predicting hydrocarbon plays from text using machine learning and natural language processing. I recently tested the OpportunityFinder Algorithm on a selection of public domain geoscience literature. Only literature published between 1990 and 2010 was used, some time before a major gas discovery was made in the area. The aim was to test whether the algorithm could surface the existence of a ‘play type’, and supporting evidence for it, well in advance of the discovery.

The algorithm surfaced a [RESERVOIR]-[TRAP]-[SEAL] play type in the area: Miocene shallow-water limestones in atoll-like reef structures, capped by thick salt. This is similar in nature to what was subsequently found through exploration. Evidence of gas was picked up through seafloor pockmarks, present where the salt was absent or where gas could escape via faults. There were no pockmarks in the vicinity of the discovery, potentially indicating an effective seal with gas trapped below the salt. This hints at what may be achievable with these types of text mining techniques.
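The OpportunityFinder Algorithm itself is not described here, so the following is not it. As a rough illustration of what a ‘combinatorial pattern across documents’ can mean, the minimal sketch below assumes each document has already been tagged with reservoir, trap and seal terms. The toy corpus and the helper names `pair_support` and `score_triples` are hypothetical. A candidate [RESERVOIR]-[TRAP]-[SEAL] combination is scored only when every pair of its elements co-occurs in at least one document, so a play can surface even though no single document mentions all three elements.

```python
from collections import Counter
from itertools import combinations, product

ROLES = ("reservoir", "trap", "seal")

# Hypothetical toy corpus (invented for illustration, not real data):
# each document is reduced to the play-element terms a tagger might find.
DOCS = [
    {"reservoir": {"miocene limestone"}, "trap": {"reef build-up"}, "seal": set()},
    {"reservoir": set(), "trap": {"reef build-up"}, "seal": {"thick salt"}},
    {"reservoir": {"miocene limestone"}, "trap": set(), "seal": {"thick salt"}},
    {"reservoir": {"deltaic sandstone"}, "trap": {"fault block"}, "seal": set()},
]


def pair_support(docs):
    """Count how many documents contain each pair of (role, term) tags."""
    counts = Counter()
    for doc in docs:
        tags = sorted((role, term) for role, terms in doc.items() for term in terms)
        for a, b in combinations(tags, 2):
            counts[(a, b)] += 1
    return counts


def score_triples(docs):
    """Score reservoir-trap-seal triples whose three pairs each co-occur somewhere."""
    support = pair_support(docs)

    def pair(x, y):
        # Keys were stored in sorted order, so look them up the same way.
        return support[tuple(sorted((x, y)))]

    vocab = {role: set().union(*(d[role] for d in docs)) for role in ROLES}
    scores = {}
    for r, t, s in product(vocab["reservoir"], vocab["trap"], vocab["seal"]):
        R, T, S = ("reservoir", r), ("trap", t), ("seal", s)
        links = [pair(R, T), pair(T, S), pair(R, S)]
        if all(links):  # each pair is supported by at least one document
            scores[(r, t, s)] = sum(links)
    return scores


if __name__ == "__main__":
    print(score_triples(DOCS))
    # → {('miocene limestone', 'reef build-up', 'thick salt'): 3}
```

In this toy corpus the limestone-reef-salt combination is assembled from three different documents, none of which holds all three elements. A real system would need far richer tagging, synonym handling and statistical weighting than simple pair counts.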

No claim is made that algorithms can provide an ‘x marks the spot’ answer. AI is generally unimaginative and lacks the retroductive reasoning of a geoscientist. However, algorithms can ‘read’ more reports and papers than a person could feasibly read in a lifetime, joining the dots to surface subtle, potentially interesting patterns that spark ideas. These suggestions may point geoscientists towards a line of thinking they might not otherwise have had. We may only be scratching the surface of what is possible.

Introducing the company

Infoscience Technologies Ltd provides practical digital transformation advice and algorithms to the geoscience industries.

The tech start-up was founded by Dr Paul Cleverley in November 2018. It is based in Oxford (UK), focuses on Artificial Intelligence (AI) applied to unstructured text, and doubles up as an R&D Innovation Lab.

Target sectors range from geological surveys, petroleum exploration, metals and mining through to hydrogeology, geohealth, renewables and space exploration.

The company’s unique selling point is its deep geoscience domain knowledge, combined with computer science, data science, business analysis and internationally recognised research.
