University of Essex

Brief

The University of Essex selected Magora, an AI development company, to build a critical AI Orchestrator module for managing secure data processing.

The core requirement was a specialised tool that analyses the files in a specified directory and automatically generates a structured .yaml settings file. This file records the confidentiality level determined for each file together with rich semantic metadata, so that other modules in the AI Orchestrator pipeline can process the files appropriately.
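To make the target output concrete, here is a minimal sketch of how such a settings file might be rendered. The field names (path, confidentiality, tags) and the example level names are assumptions for illustration only; the actual schema is not described in this case study. The sketch writes a small YAML subset by hand using only the standard library, where a production module would more likely rely on a YAML library.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    """Hypothetical per-file entry; the real schema is not published."""
    path: str
    confidentiality: str  # assumed level names, e.g. "public", "restricted"
    tags: list[str] = field(default_factory=list)


def quote(value: str) -> str:
    """Double-quote a scalar so paths or tags with special characters stay valid YAML."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def render_settings(records: list[FileRecord]) -> str:
    """Render the records as a YAML document with one list item per file."""
    lines = ["files:"]
    for rec in records:
        lines.append(f"  - path: {quote(rec.path)}")
        lines.append(f"    confidentiality: {quote(rec.confidentiality)}")
        if rec.tags:
            lines.append("    tags:")
            lines.extend(f"      - {quote(tag)}" for tag in rec.tags)
        else:
            lines.append("    tags: []")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_settings([
        FileRecord("data/a.txt", "restricted", ["person_name"]),
        FileRecord("data/b.txt", "public"),
    ]))
```

Quoting every scalar keeps the emitter simple and avoids YAML's implicit typing, where an unquoted value such as no or 1.0 would not be read back as a string.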

Given the highly sensitive nature of the data, cloud-based AI services were prohibited, so the system had to run entirely on-premises.

Solution

Our team architected and developed a secure, self-contained AI Orchestrator module from the ground up, utilising a sophisticated two-model approach to address the project challenges.

The solution employs two specialised local models:

  1. A large language model (LLM), Mistral 7B, analyses text dynamically and proposes new semantic tags and confidentiality rules. This removes the dependence on a static ruleset: the system can discover and propose new tags from the data itself, and every proposal is flagged for human review to maintain precision.

  2. A Named Entity Recognition (NER) model, a fine-tuned FacebookAI/roberta-large, applies these rules with high precision. The model was fine-tuned on a real, annotated dataset to recognise and extract the approved semantic tags from text, and its performance was validated by F1 score against a benchmark dataset.
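As an illustration of the F1 validation mentioned above, the following sketch computes precision, recall and F1 over extracted entities. The case study does not say which F1 variant was used; exact-match, entity-level scoring over (start, end, label) spans is an assumption here, and the function name is hypothetical.

```python
Span = tuple[int, int, str]  # (start offset, end offset, tag label)


def entity_f1(gold: set[Span], predicted: set[Span]) -> dict[str, float]:
    """Exact-match entity-level scores: a prediction counts only if
    its boundaries and label both match a gold annotation."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    gold = {(0, 5, "PERSON"), (10, 15, "ORG"), (20, 25, "DATE")}
    predicted = {(0, 5, "PERSON"), (10, 15, "ORG"), (30, 35, "ORG"), (40, 45, "DATE")}
    print(entity_f1(gold, predicted))
```

In this example two of the four predictions are correct and two of the three gold entities are found, so precision is 1/2, recall is 2/3 and F1 is 4/7.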

The module's workflow is designed for continuous improvement:

1) Files are processed to extract semantic information

2) The LLM suggests new rules

3) A human expert reviews and annotates samples with these new tags

4) The NER model is periodically retrained on the expanded dataset
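The human review gate in this loop can be sketched as follows. This is an illustrative model only, not Magora's implementation: the class and function names are hypothetical, and the assumption is that a tag proposed by the LLM enters the NER label set only after an expert has approved it.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class TagProposal:
    """A tag suggested by the LLM, awaiting an expert decision."""
    tag: str
    rationale: str
    status: Status = Status.PENDING


def review(proposal: TagProposal, approve: bool) -> TagProposal:
    """Record the human expert's decision on a proposal."""
    proposal.status = Status.APPROVED if approve else Status.REJECTED
    return proposal


def labels_for_retraining(current: set[str], proposals: list[TagProposal]) -> set[str]:
    """Extend the label set with approved tags only; pending and
    rejected proposals never reach the retraining step."""
    approved = {p.tag for p in proposals if p.status is Status.APPROVED}
    return current | approved


if __name__ == "__main__":
    proposals = [
        TagProposal("grant_id", "recurring identifier pattern"),
        TagProposal("room_number", "appears in timetables"),
    ]
    review(proposals[0], approve=True)
    print(sorted(labels_for_retraining({"person_name"}, proposals)))
```

Keeping proposals in an explicit pending state means an unreviewed suggestion cannot influence the NER model by accident.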

The entire system processes files locally on specified paths, combining the LLM's exploratory power with the NER model's precision to output a standardised .yaml file containing confidentiality levels and semantic metadata for seamless consumption by subsequent modules.

Result

The collaboration resulted in a secure and highly adaptive AI Orchestrator module that fundamentally automated and secured the data classification process for the University of Essex.

The product significantly enhanced data governance and security protocols. It successfully balanced the need for automated discovery with the requirement for high-precision, reliable extraction.

This hybrid approach reduced the risk of human error in manual classification and avoided the data exposure risks associated with third-party cloud APIs.

The local implementation ensured data sovereignty and compliance with the strictest security policies. The project delivered a tailored, scalable foundational component that grows more capable over time, fully meeting the client's technical and security objectives.
