File system migration to a document management system, support for acquisitions and mergers: GeoClassifier® – A new way of automatically organising geoscience, subsurface and wells documentation

The GeoClassifier® Python algorithm, launched in December 2020 for the petroleum, mining and renewables sectors, can automatically read the ‘body text’ of geoscience, subsurface and wells documentation (PDF, PPT, Word, Excel, etc.) and:

Classify by document type
Classify by document category
Classify by chronostratigraphy
Classify by lithostratigraphy
Classify by well / borehole name

Classify by prospect / lead name
Classify by survey name
Classify by deposit / orebody name
Classify by reservoir / aquifer name

Classify by field name
Classify by block / license name
Classify by play name

Classify by basin / geobody name
Classify by area / region name
Classify by country and region
Categorise by discipline/topics
Extract dates, people’s names and company names (out-of-the-box models)
Classify to machine learnt topics (custom geoscience model)
Also extract all of the names above that occur in the document if required
Extract drilling & operation problems
Many more features...

The resulting tags can help organise records and document management, and improve search and discovery of geoscience, subsurface and wells documents.

The GeoClassifier algorithm achieves this in a unique and novel way by combining several techniques:

– Knowledge Engineering (a taxonomy with thousands of clues for document types and categories)

– Machine Learning (an ensemble model trained on 250,000 labelled topic examples, plus custom spaCy NER models)

– Natural Language Processing (NLP) state-of-the-art geoscience name extraction
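
As a rough illustration of the knowledge engineering idea only, the sketch below scores a document's body text against a tiny clue taxonomy. The clue phrases, the `CLUES` dictionary and the `classify_by_clues` function are hypothetical and invented for this example. They are not GeoClassifier's taxonomy or code, which combines thousands of clues with the machine learning and NLP techniques listed above.

```python
from collections import Counter

# Hypothetical clue taxonomy: document type -> phrases that hint at it.
# These entries are illustrative assumptions, not GeoClassifier's taxonomy.
CLUES = {
    "Final Well Report": ["final well report", "end of well report", "total depth"],
    "Core Analysis": ["core analysis", "porosity", "permeability"],
    "Seismic Interpretation": ["seismic interpretation", "two-way time", "horizon"],
}


def classify_by_clues(body_text, clues=CLUES):
    """Return (best document type or None, clue counts per document type)."""
    text = body_text.lower()
    scores = Counter()
    for doc_type, phrases in clues.items():
        for phrase in phrases:
            # Count every occurrence of the clue phrase in the body text.
            scores[doc_type] += text.count(phrase)
    if not scores or max(scores.values()) == 0:
        return None, dict(scores)  # no clue matched at all
    best = max(scores, key=scores.get)
    return best, dict(scores)


sample = "Core analysis results: porosity and permeability were measured on 12 plugs."
doc_type, scores = classify_by_clues(sample)
print(doc_type, scores)
```

A simple phrase count like this breaks down quickly on real documents (ambiguous phrases, boilerplate, scanned text), which is one reason a production system would layer trained models on top of the clues.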

Taking a taxonomy or thesaurus built for manual tagging of documents and then trying to apply it automatically to text brings many limitations and problems. Unlike traditional methods (and taxonomies), GeoClassifier® was built from the ‘get-go’ for automated rather than manual document tagging, supporting digital transformation.

The Python algorithm can be applied immediately to diverse documentation from any geographical location, without prior lists of names. Lists of names (e.g. well names) can optionally be added to improve detection.
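
To make the "no prior lists, optional lists" idea concrete, here is a minimal sketch that finds well or borehole names from a textual cue alone and then, if a list of known names is supplied, adds any of those that also occur. The regular expression, the `extract_well_names` function and the sample names are assumptions made up for this example; GeoClassifier's actual name extraction is not described in this post and is certainly more sophisticated.

```python
import re

# Illustrative pattern only: a token that follows the cue word "well" or
# "borehole" and contains at least one digit, e.g. "99/1-2A" or "BH-12".
CUE_PATTERN = re.compile(
    r"\b(?:well|borehole)\s+([A-Z0-9][A-Za-z0-9/\-]*\d[A-Za-z0-9/\-]*)",
    re.IGNORECASE,
)


def extract_well_names(body_text, known_names=None):
    """Return sorted candidate names; known_names is an optional helper list."""
    found = {match.group(1) for match in CUE_PATTERN.finditer(body_text)}
    for name in known_names or []:
        # Match a listed name only when it stands alone as a whole token.
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        if re.search(pattern, body_text, re.IGNORECASE):
            found.add(name)
    return sorted(found)


# Made-up text and names, for illustration only.
text = "Well 99/1-2A was sidetracked. Logs were compared with Falcon-2 and borehole BH-12."
without_list = extract_well_names(text)
with_list = extract_well_names(text, known_names=["Falcon-2"])
print(without_list)
print(with_list)
```

In this made-up text, "Falcon-2" has no cue word in front of it, so it is only picked up once it appears in the supplied list, which mirrors how an added name list can improve detection.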

The algorithm can run standalone against files on a file system, and/or a company can take parts of it and embed them in existing tooling that may be more tightly integrated with SharePoint, EDMS and search systems.
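
A standalone run against a file system could be organised roughly as below: walk a folder tree, keep the supported file types and pass each file to a tagging function. The `find_documents` and `tag_tree` functions, the extension list and the stand-in tagger are hypothetical, chosen for this sketch rather than taken from GeoClassifier. Passing the tagger in as a parameter is one way the same loop could be reused inside existing tooling.

```python
import tempfile
from pathlib import Path

# Assumed set of file extensions for this sketch; adjust to the estate at hand.
SUPPORTED = {".pdf", ".ppt", ".pptx", ".doc", ".docx", ".xls", ".xlsx"}


def find_documents(root):
    """Yield supported files under root, recursively, in a stable order."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in SUPPORTED:
            yield path


def tag_tree(root, tagger):
    """Map each document's path (relative to root) to the tags from tagger(path)."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): tagger(path)
        for path in find_documents(root)
    }


# Demo on a throwaway folder; the lambda stands in for a real classifier.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "wells").mkdir()
    (root / "wells" / "report.PDF").write_bytes(b"")
    (root / "notes.tmp").write_bytes(b"")  # unsupported type, skipped
    result = tag_tree(root, lambda path: {"extension": path.suffix.lower()})

print(result)
```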

The algorithm also uses an automatic document scoring system, based on a number of criteria, to identify the documents that are likely to be ‘most important’ from a search and a document and records retention point of view. This can aid file system migration projects, as well as acquisitions, divestments and mergers.
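
The scoring criteria are not listed in this post, so the following is only a sketch of what a criteria-based score might look like. Every criterion, weight and name in it (`DocumentFacts`, `HIGH_VALUE_TYPES`, `importance_score`) is an assumption made for illustration, not GeoClassifier's scoring system.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical set of document types treated as high value in this sketch.
HIGH_VALUE_TYPES = {"Final Well Report", "Core Analysis"}


@dataclass
class DocumentFacts:
    name: str
    doc_type: Optional[str]   # e.g. the result of a document type classifier
    named_entities: int       # number of wells, fields, basins, etc. detected
    page_count: int
    is_duplicate: bool


def importance_score(facts):
    """Combine a few assumed criteria into one integer score."""
    score = 0
    if facts.doc_type in HIGH_VALUE_TYPES:
        score += 30                           # recognised high-value type
    score += min(facts.named_entities, 10) * 2  # capped at 20 points
    if facts.page_count >= 20:
        score += 10                           # substantial document
    if facts.is_duplicate:
        score -= 20                           # duplicates matter less
    return score


docs = [
    DocumentFacts("memo_copy.docx", None, 1, 2, True),
    DocumentFacts("well_report.pdf", "Final Well Report", 7, 45, False),
]
ranked = sorted(docs, key=importance_score, reverse=True)
for doc in ranked:
    print(doc.name, importance_score(doc))
```

Ranking a file share this way gives a migration or merger project a first-pass order for what to move, review or retain, which a records manager can then adjust.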

More: contact@infosciencetechnologies.com

Patented next-generation algorithms: The GeoClassifier® algorithm disrupts traditional document classification and extraction, whilst OpportunityFinder® disrupts traditional business ideation processes, targeting associative extraction of petroleum, mining and renewables concepts and opportunities.
