Name: From LLM vulnerabilities to AI agent red teaming & continuous evaluation
Start: 2026-06-30
End: 2026-06-30
Location: From LLM vulnerabilities to AI agent red teaming & continuous evaluation

Webinar Description

Key Takeaways

Explores real-world vulnerabilities in large language models (LLMs) used as customer-facing agents
Highlights the risks of prompt injection, hallucination, and bias in regulated industries
Demonstrates the limitations of standard benchmarking for LLM security
Showcases red teaming and continuous evaluation as practical solutions for risk mitigation
Features a live demonstration of the Giskard Hub workflow for vulnerability scanning and test suite development

Giskard AI’s live session, led by CTO Matteo Dora, delves into the operational risks and security challenges facing organizations deploying large language models as customer-facing agents. The event is tailored for technical leaders and decision-makers in regulated sectors, where compliance and reputational stakes are high.

Understanding LLM Vulnerabilities in Production

As generative AI systems become more deeply embedded in business operations, their vulnerabilities have moved from theoretical to practical concerns. Issues such as prompt injection, hallucination, and bias are no longer just technical curiosities—they represent real threats that can lead to compliance failures, security breaches, and reputational incidents, particularly in industries with strict regulatory oversight.

Why Standard Benchmarking Falls Short

Traditional benchmarking methods often provide a false sense of security. While benchmarks can measure performance under controlled conditions, they rarely expose the nuanced weaknesses that emerge in live, customer-facing environments. This gap leaves organizations vulnerable to risks that only become apparent after deployment.

Red Teaming and Continuous Evaluation: Closing the Gap

The session introduces red teaming as a proactive approach to uncovering hidden vulnerabilities in LLM-based systems. By leveraging automated probes and frameworks like the OWASP LLM Top 10, red teaming simulates real-world attacks and edge cases that standard tests miss. Continuous evaluation, meanwhile, ensures that security is not a one-time effort but an ongoing process, adapting to new threats as they arise.

Live Demonstration: Giskard Hub in Action

Attendees are guided through a hands-on demonstration of the Giskard Hub platform. The workflow includes launching vulnerability scans, interpreting results, and building reusable test suites for ongoing evaluation. This practical segment offers a clear view of how technical teams can operationalize AI security and governance within their organizations.

Industry Context and Audience Relevance

The event sits at the intersection of AI security, governance, and application risk management. It is particularly relevant for Heads of AI, product leads, and technical managers responsible for deploying LLM-based applications in sectors where compliance and trust are non-negotiable. The session not only addresses immediate technical challenges but also frames them within the broader context of organizational risk and industry standards.

Event Format and Experience

This virtual session combines technical insight with practical demonstration, offering both thought leadership and actionable guidance. The inclusion of a live Q&A segment encourages direct engagement and knowledge sharing among peers facing similar challenges in AI deployment.