Enhancing output from ChatGPT

We have been comparing ChatGPT responses generated using statistical similarity versus knowledge representations in the automated text-selection step.

Generative AI tools such as ChatGPT have token limits, which makes the selection of a subset of text from a search index critical. While statistical similarity of sentences, or chunks of sentences, that match the initial question will produce reasonable-sounding responses and answers, it can be useful to see what may be missed: what might be termed a ‘false negative’ of sorts, based on expectations.
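To make the statistical-similarity selection concrete, here is a minimal sketch of ranking text chunks against a question using bag-of-words cosine similarity. The function names, sample chunks, and scoring scheme are illustrative assumptions for this post, not a description of any particular product's retrieval method:

```python
from collections import Counter
import math

def cosine(a, b):
    # Cosine similarity between two bag-of-words term-count vectors.
    common = set(a) & set(b)
    num = sum(a[t] * b[t] for t in common)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def top_chunks(question, chunks, k=2):
    # Rank candidate chunks by statistical similarity to the question
    # and keep only the top-k, as a token budget would force us to do.
    q = Counter(question.lower().split())
    ranked = sorted(chunks,
                    key=lambda c: cosine(q, Counter(c.lower().split())),
                    reverse=True)
    return ranked[:k]

chunks = [
    "The source rock is a Jurassic marine shale with high TOC.",
    "Drilling operations resumed after the monsoon season.",
    "Source rock maturity was assessed from vitrinite reflectance.",
]
selected = top_chunks("provide a source rock summary", chunks, k=2)
```

A chunk with no lexical overlap with the question (the drilling sentence above) is dropped entirely, even if a domain expert would consider it relevant context: this is exactly the kind of silent omission discussed below.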

In the comparison attached, for the question “provide a source rock summary” over the same document results, the top response was produced using statistical similarity and the bottom using a large petroleum-systems knowledge representation employed by Infoscience’s NLP algorithms.

Highlighted in yellow is the response information unique to each method. We won’t comment on which is better.

One takeaway is the extent of the differences, and how sensitive outcomes can be to the original input before the LLM is even applied.

If a geoscientist or engineer does not know what they are missing, they may be obliviously satisfied with results churned out by ChatGPT when it has in fact left out, or smoothed over, some interesting and useful information. Note that this “smoothing” may have occurred during the text-selection process from a company’s search index, before the ChatGPT API has even been applied.

From a prompt engineering perspective, we may be able to nudge these systems into providing more of the types of responses we want by teaching them about the domain as part of the workflow.
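One simple form such nudging could take is injecting domain vocabulary and the retrieved excerpts into the prompt itself. The template below is a hypothetical illustration; the function name, wording, and glossary terms are assumptions, not Infoscience's actual workflow:

```python
def build_prompt(question, context_chunks, domain_terms):
    # Hypothetical prompt template: prepend domain concepts so the model
    # is steered toward petroleum-systems vocabulary when answering,
    # and restrict it to the retrieved excerpts.
    glossary = ", ".join(domain_terms)
    context = "\n".join(f"- {c}" for c in context_chunks)
    return (
        f"You are assisting a petroleum geoscientist. "
        f"Pay attention to these domain concepts: {glossary}.\n"
        f"Using only the excerpts below, {question}\n"
        f"Excerpts:\n{context}"
    )

prompt = build_prompt(
    "provide a source rock summary.",
    ["The source rock is a Jurassic marine shale with high TOC."],
    ["TOC", "thermal maturity", "kerogen type"],
)
```

The resulting string would then be sent as the user or system message via the ChatGPT API; the point is only that domain knowledge can be made part of the input rather than left to the model to infer.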

Infoscience’s algorithms exploit language models for narrow extraction tasks as part of the overall method. In addition, the outputs from Infoscience’s algorithms can be used to enhance the experience of applying ChatGPT to a corpus of documents.

More: contact@infosciencetechnologies.com
