Example showing autoclassification output from GeoClassifier® for a selection of public domain geoscience documents. The topic proportions are clustered in a Pearson dendrogram heatmap: values above the mean for the corpus/collection are shown in red and those below the mean in dark blue. This makes it easy to see clusters of documents predominantly about certain topics and to spot ‘anomalies’, which can be interesting to explore and read further.
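The two calculations behind such a heatmap can be sketched in Python, assuming each document has already been reduced to a vector of topic proportions: each cell is coloured by whether it is above or below the corpus mean for that topic, and documents are clustered using 1 - r (Pearson correlation) as the distance. This is a minimal sketch with hypothetical document names and values, using naive average linkage; it is not the GeoClassifier® implementation.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.
    Assumes neither sequence is constant (otherwise the denominator is zero)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def heatmap_colours(docs):
    """Colour each document/topic cell red if it is above the corpus mean
    for that topic, otherwise dark blue."""
    n_topics = len(next(iter(docs.values())))
    topic_means = [mean(v[t] for v in docs.values()) for t in range(n_topics)]
    return {
        name: ["red" if v[t] > topic_means[t] else "dark blue" for t in range(n_topics)]
        for name, v in docs.items()
    }

def average_linkage(docs):
    """Naive agglomerative clustering with distance 1 - r and average linkage.
    Returns the merge history, which defines the dendrogram."""
    clusters = {name: [name] for name in docs}

    def dist(a, b):
        pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
        return sum(1 - pearson(docs[i], docs[j]) for i, j in pairs) / len(pairs)

    merges = []
    while len(clusters) > 1:
        keys = sorted(clusters)
        a, b = min(
            ((p, q) for i, p in enumerate(keys) for q in keys[i + 1:]),
            key=lambda pq: dist(*pq),
        )
        merges.append((a, b))
        clusters[a + "+" + b] = clusters.pop(a) + clusters.pop(b)
    return merges

# Hypothetical topic proportions: Mineralogy, Petrology, Geophysics, Stratigraphy
docs = {
    "report_a": [0.50, 0.30, 0.10, 0.10],
    "report_b": [0.45, 0.35, 0.10, 0.10],
    "report_c": [0.05, 0.10, 0.60, 0.25],
}
print(heatmap_colours(docs))
print(average_linkage(docs))
```

Here the two mineralogy/petrology-heavy reports merge first, while the geophysics-heavy report only joins at the final merge, which is the kind of structure a dendrogram heatmap makes visible.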

A Gift to the Geoscience Community: GEOCLASSIFIER® – A Predictive Geological Text Classifier


To welcome in 2021, we are gifting GEOCLASSIFIER®, a machine-learnt geological text classifier, to not-for-profit organisations.

GEOCLASSIFIER® assists information search, filtering and the discovery of geoscience topics in text. Even documents predominantly about one topic often reference other geoscience topics buried within their pages. Automatically surfacing these topics could lead to insights that might otherwise go unnoticed.

Over 125 million words from public geological texts were used to build the models.

The models in GEOCLASSIFIER® enable the automatic classification of text by industry sector: Metals and Mining, Engineering, Environmental, Geothermal, Hydrogeology, Petroleum and Planetary Geology.

They also classify text by topic, including Mineralogy, Petrology, Sedimentology, Igneous, Metamorphic, Lithology, Volcanology, Commercial, Palaeontology, Geophysics, Tectonics, Geochemistry, Diagenesis, Hydrothermal, Glaciology, Geomorphology and Stratigraphy.

#artificialintelligence #machinelearning #textmining #geology #cognitiveassistant

Complex Geoscience Knowledge Graphs from Unstructured Text

The OpportunityFinder® algorithm automatically produces complex geoscience Knowledge Graph networks from unstructured text.

The algorithm uses ‘DNA profiling inspired’ techniques to populate the graph.

This enables interesting patterns and new knowledge to be surfaced that lie beyond the reach of other approaches.
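The ‘DNA profiling inspired’ method itself is not described here, so the sketch below swaps in a much simpler, generic technique to show what a text-derived graph can look like: a sentence-level co-occurrence graph, where nodes are entities already tagged in each sentence and each edge weight counts the sentences in which two entities appear together. The entity sets are hypothetical examples, and this is not the OpportunityFinder® algorithm.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_graph(tagged_sentences):
    """Undirected weighted graph stored as a Counter of sorted entity pairs:
    the weight is the number of sentences in which both entities appear."""
    edges = Counter()
    for entities in tagged_sentences:
        for a, b in combinations(sorted(set(entities)), 2):
            edges[(a, b)] += 1
    return edges

def neighbours(edges, node):
    """Map each entity linked to `node` to the weight of the connecting edge."""
    return {(b if a == node else a): w for (a, b), w in edges.items() if node in (a, b)}

# Hypothetical entities already extracted from three sentences
sentences = [
    {"Kimmeridge Clay", "source rock", "Viking Graben"},
    {"Kimmeridge Clay", "source rock"},
    {"Brent Group", "reservoir", "Viking Graben"},
]
graph = cooccurrence_graph(sentences)
print(graph.most_common(1))
print(neighbours(graph, "Viking Graben"))
```

Sorting each entity set before pairing means every undirected edge has a single canonical key, so counts from different sentences accumulate on the same entry.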

To find out more, or to arrange a presentation or pilot, contact:

OpportunityFinder update – digitally transforming Geoscience Opportunity Generation workflows using Natural Language Processing (NLP)

OpportunityFinder milestone: the Python geoscience algorithm is now trained on 25,000 terms and phrases specifically for identifying clues to source rock, maturation, migration, reservoir, hydrocarbon occurrence, trap and seal in unstructured text (reports, presentations and papers).

These terms feed its one-of-a-kind, pattern-based discovery method, which assists the Geoscientist by surfacing possible leads, opportunities, analogues and plays that may have been overlooked.
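As an illustration of term-and-phrase clue detection, here is a minimal lexicon tagger that matches whole-word phrases case-insensitively and groups the hits by play element. The miniature lexicon is hypothetical (a handful of common terms, not the OpportunityFinder lexicon), and plain regular-expression matching stands in for whatever the real algorithm does.

```python
import re

# Hypothetical miniature lexicon; illustrative terms only.
PLAY_LEXICON = {
    "source rock": ["source rock", "organic-rich shale", "total organic carbon", "toc"],
    "maturation": ["oil window", "vitrinite reflectance", "thermally mature"],
    "migration": ["migration pathway", "carrier bed"],
    "reservoir": ["porosity", "permeability", "reservoir sandstone"],
    "trap": ["anticline", "fault closure", "structural closure"],
    "seal": ["cap rock", "caprock", "evaporite seal"],
    "hydrocarbon occurrence": ["oil show", "gas show", "oil staining"],
}

def compile_lexicon(lexicon):
    """Build one case-insensitive, whole-word regex per play element,
    trying longer phrases first so they win over shorter overlapping ones."""
    patterns = {}
    for element, phrases in lexicon.items():
        alternation = "|".join(re.escape(p) for p in sorted(phrases, key=len, reverse=True))
        patterns[element] = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
    return patterns

def tag_sentence(sentence, patterns):
    """Return {play element: [matched phrases]} for every element with a match."""
    hits = {}
    for element, pattern in patterns.items():
        found = pattern.findall(sentence)
        if found:
            hits[element] = [f.lower() for f in found]
    return hits

patterns = compile_lexicon(PLAY_LEXICON)
print(tag_sentence("Oil staining was observed above the anticline, where TOC values exceed 2%.", patterns))
```

A real system would also need to handle inflections, negation (e.g. "no oil shows") and the subtler non-obvious clues mentioned above, which simple phrase matching does not capture.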

OpportunityFinder®: A codebreaker for geoscience unstructured text


Bletchley Park Bombe (replica of the original Bombe). Image: Antoine Taveneaux, CC BY-SA 3.0

Over the past few years, geoscience and data science knowledge was used to label over one million diverse geoscience sentences from public domain Internet sources (papers, reports, presentations etc.).

The purpose was to identify clues to source rock, maturation, migration, hydrocarbon occurrence, reservoir, trap and seal as mentioned in unstructured text, in such a way that they could be used for automated inference. This included both obvious, explicit terms and phrases and more subtle, non-obvious textual clues.

These data are used with Natural Language Processing, Machine Learning and a novel, first-of-a-kind ‘DNA inspired’ method to create a predictive classifier. The algorithm (OpportunityFinder®) can surface non-obvious patterns of interest that may be useful to an Exploration Geoscientist.

These patterns may be contained within any repository of reports, documents or text too large for a person ever to read. Such repositories may include old hardcopy reports that have since been scanned and digitized, reports in different languages, and sources both external and internal to an organization. The resulting patterns, surfaced from trillions of permutations, can be displayed in time and space to assist the Geoscientist with discovery and ideation.

Fig 1 below is a simulation of data extracted from a large body of reports and geo-referenced using OpportunityFinder®. The pie charts represent the different play elements discovered in the text (e.g. potential trap/seal clues in green).


Fig 1 – Thematic play elements from text (public domain WMS Basins data) 

The pie charts allow the Geoscientist to drill down into more detail. This raw ‘DNA’ is used by the data-driven pattern algorithm in OpportunityFinder® to surface potential plays, leads and opportunities that may not be obvious. The Geoscientist can browse these, stimulating lines of thought that might not otherwise have occurred without the algorithm.

This may give organizations a ‘fast start’ and aid companies with geoscience exploration. The algorithm (written in Python) can plug in to the existing search and discovery approaches used by organizations, which can also fork their own version of OpportunityFinder® if required.

There are also opportunities to target a variety of geological themes not currently addressed, should that be of interest.
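The per-location pie charts described for Fig 1 amount to normalising counts of play-element mentions at each geo-referenced location. A minimal sketch, assuming sentences have already been geotagged and tagged with a play element; the basin names and counts are hypothetical, and the real extraction and geo-referencing steps are not shown.

```python
from collections import Counter, defaultdict

def pie_chart_proportions(records):
    """records: iterable of (location, play_element) pairs, one per tagged mention.
    Returns {location: {play_element: share}}, with shares summing to 1 per location."""
    counts = defaultdict(Counter)
    for location, element in records:
        counts[location][element] += 1
    result = {}
    for location, element_counts in counts.items():
        total = sum(element_counts.values())
        result[location] = {el: n / total for el, n in element_counts.items()}
    return result

# Hypothetical geotagged mentions
records = [
    ("Basin A", "reservoir"),
    ("Basin A", "reservoir"),
    ("Basin A", "trap/seal"),
    ("Basin A", "source rock"),
    ("Basin B", "hydrocarbon occurrence"),
]
print(pie_chart_proportions(records))
```

The output is plain structured data, so it could be loaded into any mapping application that draws pie-chart symbols from attribute fields.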

Update on OpportunityFinder – detecting petroleum exploration geoscience opportunities in text


The Python-based algorithm has been further developed over the past three months, with significant expansion in several areas:

  • Machine Learning: Refinement of the machine learning models using over 4,000 labelled sentences
  • Taxonomy: Expansion of the petroleum geoscience play element lexicon/taxonomy (source rock, maturation, migration, reservoir, trap, seal and hydrocarbon occurrence) to over 15,000 terms and phrases, the most extensive in the entire industry for this specific area.
  • Natural Language Processing (NLP): State-of-the-art inductive, data-driven geotagging of geoscience and geographical entities using NLP rules, and state-of-the-art inductive, data-driven lithostratigraphy extraction using NLP rules.
  • Ranking: An 11-dimension ranking method for the ‘non-obviousness’ or ‘interestingness’ of potential opportunities, tested on Geoscientists and driven by the philosophy of showing Geoscientists ‘something they don’t already know’.
  • Patent-pending method: Further optimized in Python with a unique hierarchical hash-table approach. On one public domain petroleum geoscience collection of several hundred thousand sentences, the algorithm finished in just under 5 minutes on an inexpensive $500 i5 laptop after completing over 50 billion possible permutations/computations.
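The patent-pending hierarchical hash-table method is not published, so the following is only a generic illustration of the underlying data structure: a two-level dictionary (location, then play element, then a set of sentence ids) that lets combinations of play elements at a location be enumerated with constant-time lookups rather than rescanning the text. The function names and data are hypothetical, and this is not the OpportunityFinder® implementation.

```python
from collections import defaultdict
from itertools import combinations

def build_index(tagged):
    """tagged: iterable of (sentence_id, location, play_elements).
    Builds a two-level hash table: location -> play element -> set of sentence ids."""
    index = defaultdict(lambda: defaultdict(set))
    for sentence_id, location, elements in tagged:
        for element in elements:
            index[location][element].add(sentence_id)
    return index

def element_combinations(index, location, size=2):
    """Yield each combination of play elements recorded at a location, together with
    the number of supporting sentences for each member, using only dictionary lookups."""
    by_element = index.get(location, {})  # .get avoids creating empty entries
    for combo in combinations(sorted(by_element), size):
        yield combo, tuple(len(by_element[e]) for e in combo)

# Hypothetical tagged sentences: (id, location, play elements found)
tagged = [
    (1, "Basin A", ["reservoir", "seal"]),
    (2, "Basin A", ["reservoir"]),
    (3, "Basin A", ["source rock"]),
    (4, "Basin B", ["trap"]),
]
index = build_index(tagged)
for combo, support in element_combinations(index, "Basin A"):
    print(combo, support)
```

Grouping by location first keeps each combinatorial enumeration local to one place, which is one common way to keep the number of permutations examined manageable.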

These developments enable any company to take a collection of reports and papers (any unstructured text) and almost immediately create high-quality structured data (potential knowledge) that can be displayed spatially on a map using existing applications (Fig 1). This complements the “long list” of results produced by a traditional search engine.


Fig 1 – Fast delivery of value: exploiting unstructured information.

The information above (Fig 1) has been created entirely from unstructured text in geoscience reports. Areas in red indicate hydrocarbon occurrence, black is source rock, greens are trap/seal, yellow is reservoir and greys indicate evidence for migration/maturation. These data are one of three outputs that OpportunityFinder produces.

Whilst there is plenty of scope to build bespoke user interfaces, the advantage of OpportunityFinder is that it delivers data from unstructured text that is ready to load into existing company applications. It does not just extract data; it surfaces patterns that could lead to new business opportunities.

OpportunityFinder is available under a yearly license or a one-off perpetual license that allows a company to fork its own version, which it then owns. Components of OpportunityFinder can also be plugged into existing technologies a company may have licensed or developed, to supercharge initiatives to make search ‘intelligent’.

Quick Proofs of Concept (PoCs) can be carried out remotely to give a ‘fast start’ to a company’s existing digital transformation initiatives.

Next steps include creating a version of OpportunityFinder for the mining industry.

For more information, please contact:

About Infoscience Technologies Ltd

The tech start-up was founded by Dr Paul Cleverley in November 2018 and is based in Oxford (UK), with a focus on Artificial Intelligence (AI) applied to unstructured text.

The company can also work as a remote R&D lab with existing in-house innovation teams, building proprietary algorithms and technologies.

Target sectors range from geological surveys, petroleum exploration, metals and mining through to hydrogeology, geohealth, renewables and space exploration.

The company’s unique selling point is its deep geoscience domain knowledge, combined with computer science, data science, business analysis and internationally recognised research.