The Python-based algorithm has been further developed over the past three months, with significant expansion in a number of areas:
- Machine Learning: Refinement of the machine learning models using over 4,000 labelled sentences
- Taxonomy: Expansion of the petroleum geoscience play element lexicon/taxonomy (source rock, maturation, migration, reservoir, trap, seal and hydrocarbon occurrence) to over 15,000 terms and phrases, the most extensive in the industry for this specific area.
- Natural Language Processing (NLP): State-of-the-art inductive, data-driven geotagging of geoscience and geographical entities, and state-of-the-art inductive, data-driven lithostratigraphy extraction, both using NLP rules.
- Ranking: An 11-dimension method for ranking potential opportunities by ‘non-obviousness’ or ‘interestingness’, tested with geoscientists. The method is driven by the philosophy of showing geoscientists ‘something they don’t already know’.
- Patent-pending method: Further optimized in Python with a unique hierarchical hash-table approach. On one public-domain petroleum geoscience collection of several hundred thousand sentences, the algorithm completed over 50 billion possible permutations/computations in just under 5 minutes on an inexpensive $500 i5 laptop.
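The taxonomy and NLP-rule items above can be illustrated with a minimal sketch. It is not the patent-pending hierarchical hash-table method, whose details are not described here; it is a generic nested-dictionary (trie) lookup for multi-word lexicon terms, plus one simple regular-expression rule for lithostratigraphic unit names. The lexicon entries, the function names `build_trie`, `tag_play_elements` and `extract_lithostratigraphy`, and the regex are all hypothetical.

```python
import re

# Hypothetical mini-lexicon (phrase -> play element). Illustrative only;
# this is a generic trie lookup, NOT the patent-pending method.
LEXICON = {
    "source rock": "source rock",
    "total organic carbon": "source rock",
    "oil window": "maturation",
    "migration pathway": "migration",
    "porous sandstone": "reservoir",
    "anticline": "trap",
    "mudstone seal": "seal",
    "oil show": "hydrocarbon occurrence",
}

END = "__end__"  # key marking a complete phrase; stores its play element


def build_trie(lexicon):
    """Nest dicts token by token so multi-word phrases share prefixes."""
    root = {}
    for phrase, element in lexicon.items():
        node = root
        for token in phrase.split():
            node = node.setdefault(token, {})
        node[END] = element
    return root


def tag_play_elements(sentence, trie):
    """Return (phrase, element) pairs using greedy longest-match lookup."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    matches = []
    i = 0
    while i < len(tokens):
        node, j, best = trie, i, None
        # Walk the trie as far as the tokens allow, remembering the
        # longest complete phrase seen so far.
        while j < len(tokens) and tokens[j] in node:
            node = node[tokens[j]]
            j += 1
            if END in node:
                best = (j, node[END])
        if best:
            end, element = best
            matches.append((" ".join(tokens[i:end]), element))
            i = end
        else:
            i += 1
    return matches


# Hypothetical rule: capitalised words followed by a unit rank term.
LITHOSTRAT = re.compile(r"\b((?:[A-Z][a-z]+ )+(?:Formation|Group|Member))\b")


def extract_lithostratigraphy(sentence):
    """Return lithostratigraphic unit names matched by the simple rule."""
    return LITHOSTRAT.findall(sentence)
```

For example, on "An oil show was recorded in the porous sandstone of the Brent Group, sealed by the Kimmeridge Clay Formation." the tagger returns the "oil show" and "porous sandstone" phrases, and the rule returns "Brent Group" and "Kimmeridge Clay Formation". The rule is deliberately naive: it would also capture a capitalised sentence-initial word, as in "The Brent Group".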
This enables any company to take a collection of reports and papers (any unstructured text) and almost immediately create high-quality structured data (potential knowledge) that can be displayed spatially on a map using existing applications (Fig 1). This complements the “long list” of document results produced by a traditional search engine.
Fig 1 – Fast delivery of value: exploiting unstructured information.
The information above (Fig 1) has been created entirely from unstructured text in geoscience reports. Red areas indicate hydrocarbon occurrence, black indicates source rock, shades of green indicate trap/seal, yellow indicates reservoir and shades of grey indicate evidence for migration/maturation. These data are one of three outputs that OpportunityFinder produces.
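Because the extracted data are intended to be loaded into existing map applications, one plausible exchange format is GeoJSON (RFC 7946). The sketch below is an assumption, not OpportunityFinder's documented output format: it takes hypothetical (longitude, latitude, play element, evidence sentence) tuples, such as a geotagger might produce, and builds a FeatureCollection whose colour property follows the Fig 1 legend. The `to_geojson` helper and the hex colour values are hypothetical, and a single shade is used where the figure uses several greens or greys.

```python
import json

# Hypothetical hex colours approximating the Fig 1 legend.
PLAY_ELEMENT_COLOURS = {
    "hydrocarbon occurrence": "#ff0000",  # red
    "source rock": "#000000",             # black
    "trap": "#2e8b57",                    # green
    "seal": "#2e8b57",                    # green
    "reservoir": "#ffd700",               # yellow
    "migration": "#808080",               # grey
    "maturation": "#808080",              # grey
}


def to_geojson(records):
    """Convert (lon, lat, play_element, sentence) tuples into a GeoJSON
    FeatureCollection of points (hypothetical helper)."""
    features = []
    for lon, lat, element, sentence in records:
        features.append({
            "type": "Feature",
            # GeoJSON positions are [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {
                "play_element": element,
                # Unknown elements fall back to a neutral light grey.
                "colour": PLAY_ELEMENT_COLOURS.get(element, "#cccccc"),
                "evidence": sentence,
            },
        })
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    sample = [(1.75, 60.5, "reservoir", "Porous sandstone was encountered.")]
    print(json.dumps(to_geojson(sample), indent=2))
```

Keeping the source sentence as an `evidence` property lets a map user click a point and see the text it was derived from, rather than only its category.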
Whilst there is plenty of scope to build bespoke user interfaces, the advantage of OpportunityFinder is that it delivers data from unstructured text that is ready to load into existing company applications. It does not just extract data; it surfaces patterns that could lead to new business opportunities.
OpportunityFinder is available under a yearly license or a one-off perpetual license that allows a company to fork its own version, which it then owns. Components of OpportunityFinder can also be plugged into technologies a company has already licensed or developed, to supercharge initiatives to make search ‘intelligent’.
Quick proofs of concept (PoCs) can be carried out remotely for a ‘fast start’ that supports a company’s existing digital transformation initiatives.
Next steps include creating a version of OpportunityFinder for the mining industry.
For more information please contact:
About Infoscience Technologies Ltd
The tech start-up was founded by Dr Paul Cleverley (www.paulhcleverley.com) in November 2018. Based in Oxford, UK, it focuses on Artificial Intelligence (AI) applied to unstructured text.
The company can also work as a remote R&D lab with existing in-house innovation teams, building proprietary algorithms and technologies.
Target sectors range from geological surveys, petroleum exploration, metals and mining through to hydrogeology, geohealth, renewables and space exploration.
The company’s unique selling point is its deep geoscience domain knowledge, combined with computer science, data science, business analysis and internationally recognised research.