AI Tools & Software in ONTOX

This section highlights AI tools and software used or developed within the ONTOX project.

Integrating AI tools and advanced software solutions represents a paradigm shift in chemical safety assessment.

Sysrev

Sysrev is a human-in-the-loop evidence review platform that accelerates systematic literature review and data extraction workflows. The platform enables teams to collaboratively screen, annotate, and extract structured data from scientific literature with integrated quality control and validation features. Sysrev has supported over 20,000 users across academic, government, and industry applications. Within ONTOX, Sysrev facilitated collaborative evidence synthesis, automated data extraction using large language models, and standardized data curation workflows. The platform supported 62 ONTOX projects including publications on genetic susceptibility in Parkinson’s disease risk assessment and drug-induced fatty liver disease ontogeny.

Website: https://sysrev.com

ToxIndex

ToxIndex is an agentic platform for regulatory toxicology that integrates data sources, predictive models, and analytical workflows to automate risk assessment processes. The system orchestrates multiple computational tools to address chemical safety evaluation challenges, supporting applications from hazard identification through regulatory dossier generation. ToxIndex combines curated toxicology data infrastructure with AI-powered workflow automation to reduce assessment timelines while maintaining scientific rigor and audit trails suitable for regulatory review. Within ONTOX, ToxIndex capabilities supported data gap analysis, automated identification of relevant testing strategies, and demonstration of the OPRA (ONTOX Probabilistic Risk Assessment) framework at the ECETOC-VHP4Safety-ONTOX workshop in Brussels (October 2025).

Website: https://toxindex.com

ToxTransformer

ToxTransformer is a proprietary multi-property prediction model for chemical toxicity assessment. The model predicts multiple toxicological properties simultaneously from molecular structure, enabling conditional predictions where known properties improve accuracy for unknown endpoints. Trained on the ChemHarmony dataset (117 million chemicals, 254 million chemical activity records), ToxTransformer predicts over 4,000 toxicological, ADMET, and environmental properties. The model’s architecture enables cross-property transfer learning and counterfactual analysis to support testing prioritization and read-across approaches. Within ONTOX, ToxTransformer was integrated into the ToxIndex platform to support data gap filling, hazard prediction, and identification of relevant OECD in vitro assays for missing endpoints.

DockTox

DockTox is an online molecular docking platform designed to automatically screen small molecules against a predefined set of proteins associated with molecular initiating events (MIEs) for several toxicity endpoints, including liver steatosis and cholestasis, nephrotoxicity, neural tube closure, and cognitive functional defects in the brain. DockTox is not an AI-based tool but a deterministic computational workflow that automates a series of predefined docking calculations. The server accepts small molecules encoded in SMILES format, automatically generates conformers, and performs molecular docking against the selected protein targets. The workflow provides predicted binding energies, lists of interacting residues, interaction fractions, and interaction maps describing ligand-protein interactions.
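
The kind of per-target summary described above can be illustrated with a minimal sketch. It assumes that an interaction fraction is the share of docked poses in which a residue contacts the ligand; that definition, the function names, the residue labels, and the energies are all hypothetical and are not taken from DockTox itself.

```python
from collections import Counter

def interaction_fractions(poses):
    """Return, per residue, the fraction of poses in which it contacts the ligand.

    `poses` is a list with one collection of residue labels per docked pose.
    This definition of "interaction fraction" is an assumption for illustration.
    """
    if not poses:
        return {}
    # set(pose) ensures a residue is counted at most once per pose.
    counts = Counter(res for pose in poses for res in set(pose))
    return {res: n / len(poses) for res, n in counts.items()}

def best_pose_index(energies):
    """Index of the pose with the lowest (most favourable) predicted energy."""
    return min(range(len(energies)), key=energies.__getitem__)

# Hypothetical docking output for one ligand against one target.
poses = [
    {"ARG120", "TYR355"},
    {"ARG120"},
    {"ARG120", "SER530"},
    {"TYR355"},
]
energies = [-7.9, -7.1, -8.4, -6.5]  # hypothetical values, kcal/mol

fractions = interaction_fractions(poses)
best = best_pose_index(energies)
print(fractions, best)
```

Residues with a high fraction across poses would be the first candidates to inspect in the interaction maps.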

Website: https://chemopredictionsuite.com/DockTox

ECCS-DT

The ECCS-DT tool is a computational workflow developed to classify small molecules according to their predominant clearance mechanism following the Extended Clearance Classification System (ECCS) described by Varma et al., 2015. This workflow follows a predefined rule-based decision tree which uses pH-dependent ionisation, molecular weight, and permeability of molecules. When experimental values are not available, these properties are calculated, including QSAR predictions for permeability, to complete the classification.
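
A rule-based decision tree of this kind can be sketched in a few lines. The cut-offs below (molecular weight of 400 Da, permeability of 5 × 10^-6 cm/s) and the class-to-mechanism mapping are recalled from the published ECCS framework and should be read as assumptions; the ECCS-DT tool may use different thresholds or extra rules, and the `Molecule` type and `eccs_class` function are hypothetical names.

```python
from dataclasses import dataclass

# Assumed cut-offs, recalled from the ECCS publication; not confirmed for ECCS-DT.
MW_CUTOFF_DA = 400.0
PERMEABILITY_CUTOFF = 5.0  # expressed in units of 1e-6 cm/s

@dataclass
class Molecule:
    name: str
    ionisation: str      # "acid", "base", "neutral" or "zwitterion" at physiological pH
    mol_weight: float    # Da
    permeability: float  # 1e-6 cm/s, experimental or QSAR-predicted

def eccs_class(mol):
    """Return (ECCS class, predominant clearance mechanism) for a molecule."""
    if mol.ionisation not in ("acid", "base", "neutral", "zwitterion"):
        raise ValueError(f"unknown ionisation state: {mol.ionisation!r}")
    acidic = mol.ionisation in ("acid", "zwitterion")
    high_permeability = mol.permeability >= PERMEABILITY_CUTOFF
    small = mol.mol_weight <= MW_CUTOFF_DA

    if high_permeability:
        if acidic:
            return ("1A", "metabolism") if small else ("1B", "hepatic uptake")
        return ("2", "metabolism")
    if acidic:
        return ("3A", "renal") if small else ("3B", "hepatic uptake or renal")
    return ("4", "renal")

# Hypothetical input values, for illustration only.
example = Molecule("example acid", "acid", 300.0, 10.0)
print(eccs_class(example))
```

Keeping the thresholds as named constants makes it easy to swap in the values actually used by a given implementation.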

Website: https://chemopredictionsuite.com/ECCS-DTpredictor

Link to publication: Predicting Clearance Mechanism in Drug Discovery: Extended Clearance Classification System (ECCS)

QSAR Physchem - Models for Physicochemical Property Prediction

QSAR models were used to predict physicochemical properties required for downstream modelling workflows. These models rely on machine learning approaches trained on experimental datasets to estimate molecular properties directly from chemical structure. Within ONTOX, these predictions were used as part of a benchmarking exercise evaluating different widely used prediction platforms which provide QSAR-based estimations of physicochemical and ADMET-related properties (Gadaleta et al., 2024). The results supported data gap filling and enabled the integration of predicted parameters into subsequent computational workflows.

Link to publication: Comprehensive benchmarking of computational tools for predicting toxicokinetic and physicochemical properties of chemicals

The prediction platforms included in the benchmarking are:

ADMETlab: https://admetmesh.scbdd.com
OCHEM: https://ochem.eu/home/show.d
OPERA: https://ntp.niehs.nih.gov/whatwestudy/niceatm/comptox/ct-opera/opera
ProtoPRED: https://protopred.protoqsar.com/
VEGA: https://www.vegahub.eu/portfolio-item/vega-qsar/
vNNADMET: https://vnnadmet.bhsai.org/vnnadmet/
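
As an illustration of how predictions from several platforms might be compared against experimental values, the sketch below computes two common regression metrics, RMSE and the coefficient of determination (R²). The choice of metrics, the platform labels, and all numbers are assumptions for illustration; they are not taken from the benchmarking publication.

```python
import math

def rmse(observed, predicted):
    """Root-mean-square error between paired observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical experimental values (e.g. logP) and predictions from two
# anonymised platforms; none of these numbers are real results.
observed = [1.0, 2.0, 3.0, 4.0]
predictions = {
    "platform_A": [1.0, 2.0, 3.0, 4.0],
    "platform_B": [2.0, 3.0, 4.0, 5.0],
}

scores = {
    name: (rmse(observed, pred), r_squared(observed, pred))
    for name, pred in predictions.items()
}
for name, (err, r2) in scores.items():
    print(f"{name}: RMSE={err:.2f}, R2={r2:.2f}")
```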

VEGA QSAR

VEGA QSAR is a tool within the VEGA HUB suite of programs that provides computational models for predicting chemical properties and toxicological endpoints. Within ONTOX, new machine learning models were developed and implemented in VEGA to predict the interaction of chemicals with 25 protein targets associated with molecular initiating events upstream of the project’s adverse outcome pathways. These models have been applied in the ONTOX workflows to support data gap filling and to complement experimental evidence, providing mechanistic insights that help strengthen in vitro results in the project case studies.

Website: https://www.vegahub.eu/portfolio-item/vega-qsar/

Llemy - LLM-based system to explore molecular maps

Llemy is an online agentic system designed for the exploratory analysis and summarisation of molecular interaction maps. The platform addresses the complexity of navigating large biomedical knowledge repositories by providing an LLM-based interface to structured diagrams hosted on the MINERVA Platform and shared via MINERVA Net. Llemy utilises a multi-agent architecture where a retrieval agent fetches map data (nodes, edges, and annotations) via the MINERVA API, and a synthesis agent (GPT-4.1) processes this information alongside user prompts to generate answers. A key feature of the system is its transparency; it provides clickable, verifiable links to specific map elements and literature references, allowing users to audit the data supporting each response. Within ONTOX, in collaboration with VHP4Safety, the Disease Maps community and Elixir Luxembourg, Llemy was developed through a user-driven process starting with a hackathon in Liège to facilitate the exploration of liver lipid and bile metabolism maps, known as physiological maps. Evaluation by domain experts showed that the tool is particularly effective for summarising maps and identifying pathway connections, with users reporting that the system saved them time in their research workflows. The system is available as a public web service and as a containerised application for local deployment.
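
The retrieve-then-synthesise pattern described above can be sketched without any network access. Everything in the block is a hypothetical stand-in: the toy map, the keyword-matching retrieval, and the prompt layout are illustrative assumptions and do not reproduce Llemy's code, the MINERVA API, or the prompts sent to the language model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MapElement:
    element_id: int
    name: str
    annotation: str

# Toy stand-in for map data that a retrieval agent would fetch from a map server.
TOY_MAP = [
    MapElement(1, "CYP7A1", "bile acid synthesis"),
    MapElement(2, "FXR", "bile acid sensing"),
    MapElement(3, "SREBP1c", "lipogenesis"),
]

def retrieve(question, elements):
    """Retrieval step: keep elements whose name or annotation words occur in the question."""
    words = set(question.lower().replace("?", " ").split())
    return [
        e for e in elements
        if e.name.lower() in words or words & set(e.annotation.lower().split())
    ]

def build_prompt(question, hits):
    """Input for the synthesis step: each context line carries an element id,
    so that an answer can cite the map elements it relies on."""
    context = "\n".join(f"[{e.element_id}] {e.name}: {e.annotation}" for e in hits)
    return (
        "Answer using only the map elements below and cite their ids.\n"
        f"{context}\nQuestion: {question}"
    )

question = "Which elements relate to bile acid synthesis?"
hits = retrieve(question, TOY_MAP)
prompt = build_prompt(question, hits)
print(prompt)
```

Carrying element ids through to the prompt is what would let each statement in a generated answer be traced back to a specific map element.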

Website: https://llemy.vhp4safety.nl
Code Repository: https://github.com/ontox-project/Llemy
Preprint: https://www.biorxiv.org/content/10.64898/2026.03.10.710813v2
MINERVA tutorial: https://www.youtube.com/watch?v=CKKpAvSq560

NeuroTox-KPGT

NeuroTox-KPGT is an automated AI pipeline that converts raw ChEMBL bioactivity data into optimized deep learning models for predicting molecular initiating events (MIEs). Early identification of chemical activity on MIE-relevant protein targets supports first-line toxicity assessment and helps researchers prioritize mechanisms for subsequent experimental investigation. The pipeline builds on the Knowledge-Guided Pre-training of Graph Transformer (KPGT) framework, which represents chemical structures as knowledge-enriched molecular graphs, and integrates data curation, molecular graph generation, and model training and tuning. Users can therefore build target-specific prediction models reproducibly, from initial data to a deployable model. In a neural tube defect (NTD) case study, fine-tuned KPGT models outperformed traditional Support Vector Machine models with a radial basis function kernel (SVM-RBF) in predicting MIEs linked to developmental toxicity. By providing an end-to-end, data-to-model workflow, the pipeline lowers the technical barrier to using modern graph-based neural architectures in toxicology and supports AOP development, endpoint and compound prioritization, and early-stage chemical safety evaluation.
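
The data curation stage of such a pipeline can be illustrated with a minimal sketch that turns bioactivity records into binary labels for one target. The rules used here (dropping incomplete records, taking the median of replicate measurements, and calling a compound active at pChEMBL ≥ 6, i.e. 1 µM) are assumptions for illustration and are not taken from the NeuroTox-KPGT repository; the function name and records are hypothetical.

```python
from collections import defaultdict
from statistics import median

ACTIVITY_THRESHOLD = 6.0  # assumed pChEMBL cut-off, equivalent to 1 µM

def curate(records):
    """Collapse (smiles, pchembl_value) records into one binary label per compound.

    Records with a missing structure or value are dropped; replicate
    measurements of the same structure are summarised by their median.
    """
    grouped = defaultdict(list)
    for smiles, value in records:
        if smiles and value is not None:
            grouped[smiles].append(value)
    return {
        smiles: int(median(values) >= ACTIVITY_THRESHOLD)
        for smiles, values in grouped.items()
    }

# Hypothetical records for a single protein target.
records = [
    ("CCO", 4.0),
    ("CCO", 5.0),
    ("c1ccccc1O", 6.5),
    ("c1ccccc1O", 7.1),
    ("", 8.0),       # missing structure, dropped
    ("CCN", None),   # missing value, dropped
]

labels = curate(records)
print(labels)
```

The resulting structure-to-label mapping is the kind of input that a graph-generation and model-training stage would consume.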

Code repository: https://github.com/MerelFlorian/NeuroTox-KPGT

Link to publication: A pipeline for developing AI-driven models to predict molecular initiating events: a case study on neural tube defects

Acknowledgments

Sysrev, ToxIndex and ToxTransformer tools were developed by Insilica with US federal grant funding (NSF and NIH SBIR).

The DockTox and ECCS-DT tools were developed by ProtoQSAR. The QSAR modelling for physicochemical property prediction was carried out by Istituto di Ricerche Farmacologiche Mario Negri (IRFMN) and ProtoQSAR.

Llemy was developed in collaboration with VHP4Safety and the Disease Maps community; user testing was supported and funded by Elixir Luxembourg.

These tools were refined using ONTOX datasets and made available to consortium partners to support the project’s research objectives in next-generation risk assessment.