Infoscience Technologies Ltd is a tech start-up providing Text Analytics Research & Development to companies seeking to extract, discover and exploit geoscience knowledge from text.
Target industries range from geological surveys, petroleum exploration, economic mining through to geohealth and space exploration.
The company’s unique selling point is its deep geoscience domain knowledge, combined with computer science, data science, business analysis and international award-winning research.
The Company currently licenses two methods (algorithms), GEOSAPIEN and GEOCLASSIFIER, both related to predicting geoscience entities or classes in unstructured text.
GEOSAPIEN® Patent Pending.
A Method and System for Generating Geological Lithostratigraphical Analogues using Theory-Guided Machine Learning from Unstructured Text.
Over 60% of geoscientists believe the identification of global analogues reduces risk and improves decision making in areas of sparse data (Sun and Wan 2002). Analogues also help to challenge bias (Rose 2016). This applies in sectors such as petroleum, economic mining, geohealth and space exploration.
Analogues can uncover subtle opportunities that may not be apparent from any other technique or technology, with the ability to make global comparisons “casting a wider net than just what is close by” identified as particularly useful. Traditional Information Retrieval (IR) search engines are not well suited to find similar cases or analogues (Mukherjee et al 2013).
Manually curated analogue databases are useful but, as Greenberg (2011) puts it, manually created taxonomies and knowledge representations can blind us to new information discoveries. With report volumes too vast for geoscientists to read (Big Data), data-driven techniques that can scan through vast amounts of text may surface differentiating insights.
Geoscience information has high dimensionality; geoscience bodies are not as clear cut as objects in some other domains; rare events are of interest; and geoscience processes can show long-memory characteristics in time (Karpatne et al 2017). Some research (Cleverley 2017) points to issues where unsupervised machine learning from text overfits and ignores the wealth of knowledge we have in geoscience. Anchoring unsupervised machine learning frameworks with high-level knowledge of what geoscientists find most useful for analogues may be preferable to avoid such overfitting issues.
The GEOSAPIEN® method uses a number of Geoscience Theories to inform the automated manipulation of text using Natural Language Processing (NLP) techniques in company reports and/or external literature, ‘boosting the geoscience signal’ before input into a text embedding model (such as word2vec, GloVe, fastText, ELMo or BERT). In effect, these models use word co-occurrence and deep learning to represent each word as a vector, defined by the words it co-occurs with, the words that co-occur with those, and so forth. A key aspect is that GEOSAPIEN does this without significantly biasing or blinding us to new discoveries.
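To make the idea of defining a word by its co-occurring words concrete, here is a minimal sketch that counts neighbouring words within a small window. It is an illustration only, not the GEOSAPIEN method and not how word2vec or BERT are actually trained; the function name `cooccurrence_vectors`, the window size and the two-sentence corpus are all hypothetical.

```python
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """For each word, count the words found within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:  # a word is not its own neighbour
                    vectors[word][tokens[j]] += 1
    return vectors


# Hypothetical toy corpus, invented for illustration only.
corpus = [
    "muddy contourites show color changes",
    "muddy turbidites show graded bedding",
]

vectors = cooccurrence_vectors(corpus)
print(dict(vectors["muddy"]))
```

Each word ends up described by a sparse vector of neighbour counts. Embedding models learn dense vectors from signals of this kind over far larger volumes of text.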
This allows a geoscientist to ask questions such as “What are good analogues for color changes in muddy contourites?” The system does not return lists of documents, but instead returns answers (in the form of geological lithostratigraphic units) ranked by similarity to the geoscientist’s question. It does this by learning from company/external papers and reports that most closely match this context using vectors (not keywords).
In another use case, a geoscientist may be working on a particular geological formation (such as the Tasraft Formation) and wish to know “What other formations around the world are good analogues for this formation? And I don’t want Palaeozoic ones.” The system can return the most similar lithostratigraphic units it can find, ranked by similarity to the Tasraft Formation vector (subtracting the vector for Palaeozoic).
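Ranking units against a question by vector similarity, rather than keywords, can be sketched as follows. This is a simplified illustration under stated assumptions: the question is represented as the mean of its word vectors, similarity is cosine similarity, and every vector and unit name below ("Unit A" and so on) is invented. The source does not say how GEOSAPIEN builds its query or unit vectors.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def rank_units(query_terms, word_vectors, unit_vectors):
    """Return (unit, similarity) pairs, most similar first."""
    known = [word_vectors[t] for t in query_terms if t in word_vectors]
    if not known:
        return []
    query = mean_vector(known)
    scored = [(name, cosine(query, vec)) for name, vec in unit_vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical 3-dimensional vectors; real embeddings have hundreds of dimensions.
word_vectors = {
    "muddy": [0.9, 0.1, 0.0],
    "contourites": [0.8, 0.3, 0.1],
    "color": [0.6, 0.2, 0.3],
}
unit_vectors = {
    "Unit A": [0.8, 0.2, 0.1],
    "Unit B": [0.1, 0.9, 0.2],
    "Unit C": [0.0, 0.2, 0.9],
}

ranking = rank_units(["muddy", "contourites", "color"], word_vectors, unit_vectors)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

The result is a ranked list of units, not documents, which mirrors the answer-style output described above.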
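The subtraction step can be illustrated with simple vector arithmetic. The sketch below assumes the excluded concept is removed by subtracting its vector from the target formation's vector before ranking by cosine similarity. The numbers are invented, "Formation X" and "Formation Y" are hypothetical names, and the toy vector given to the Tasraft Formation is not derived from any real data.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def analogues_for(target, exclude, unit_vectors, concept_vectors):
    """Rank other units by similarity to (target vector - excluded concept vector)."""
    query = [
        t - e for t, e in zip(unit_vectors[target], concept_vectors[exclude])
    ]
    scored = [
        (name, cosine(query, vec))
        for name, vec in unit_vectors.items()
        if name != target  # the target is not its own analogue
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy vectors; the second dimension loosely stands for
# "Palaeozoic-ness" purely for the sake of the example.
unit_vectors = {
    "Tasraft Formation": [0.7, 0.2, 0.6],
    "Formation X": [0.6, 0.9, 0.5],  # strongly associated with the Palaeozoic
    "Formation Y": [0.7, 0.1, 0.5],
}
concept_vectors = {"palaeozoic": [0.0, 1.0, 0.0]}

ranking = analogues_for(
    "Tasraft Formation", "palaeozoic", unit_vectors, concept_vectors
)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Subtracting the concept vector pushes units associated with that concept down the ranking without filtering on keywords.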
The more text put into the model, the more accurate it is likely to be, with the GEOSAPIEN method providing a geoscience framework for machine learning – producing more useful geoscience analogues than pure unsupervised methods.
There are a number of ways to use and exploit GEOSAPIEN:
- Companies can license the Intellectual Property (IP) method and build their own technology systems (models and user interfaces) in their pipelines (or incorporate it into what they already have), with advice from Infoscience Technologies if needed.
- Companies can ask Infoscience Technologies to supply the technology and apply it to their text (and/or third-party subscription text the company has the rights to perform text and data mining on), thereby licensing both the technology and the IP.
- Variants on the above.
For more information contact: email@example.com
Unlocking the hidden codes in geoscience unstructured text