News & Announcements

vijil
August 28, 2024
Vijil came out of stealth on July 24 with funding from Mayfield's AIStart seed fund and Gradient Ventures, Google's AI-focused seed fund. We are thrilled to have them as partners backing our venture.
Along with funding, we announced our first product, Vijil Evaluate, designed to help AI teams automate the QA of LLM-based applications. Vijil Evaluate tests the performance, reliability, security, and safety of a generative AI system (chatbot, virtual assistant, RAG, or custom SLM) within minutes using over 200,000 prompts. Available as a SaaS and as an on-prem service, Vijil Evaluate cuts out hundreds of hours of undifferentiated heavy lifting that go into the QA, AppSec, and GRC reviews of an enterprise-grade AI application.
We are building our products for enterprises in regulated industries that are drawn to the transformative potential of generative AI but are held back from deploying prototypes into production because the technology poses unknown risks to reputation and revenue. AI chatbots have recommended a competitor's product, confabulated airline ticket refund policies, and concocted legal cases. Enterprises today do not trust generative AI applications in business-critical use because the large language models (LLMs) inside are inherently unreliable, vulnerable to attack, and prone to damage.
To build AI applications that we can trust, we have to be able to measure trust. Sadly, neither LLM vendors nor enterprise AI teams today have metrics and mechanisms to measure trust, leave along improve it. AI teams rely on external red-team consultants, resort to benchmarks, or surrender to "vibe checks". But AI red-team consultants cannot scale. And benchmarks are broken in many ways. Most academic benchmarks are irrelevant to enterprise use cases. Publicly available benchmark data is pulled into the training data contaminating test results. Open-source benchmarking tools are neither fast nor free – they take many days and thousands of dollars to run one benchmark. Commercial evaluation services test for task performance but fail to test reliability, security, and safety. Meanwhile, the threats are unrelenting – developers must scour the media continuously for "jailbreaks" and malicious prompts. As a result, AI teams are delayed or blocked from deploying AI agents that they can trust.
Vijil is on a mission to help enterprises build and operate AI agents that people can trust. We’d rather spend all our time building trusted agents but, frankly, we’ve had to work from first principles to define metrics and build mechanisms to evaluate trust at speed and scale. We didn’t want to just test for task performance, hallucinations, and “tone”, and call it a day. We spent the better part of 2023 and 2024 scouring research preprints and academic benchmarks to build an enterprise-grade LLM evaluation framework.
Vijil measures the trustworthiness of an LLM by scoring its performance, reliability, security, and safety comprehensively. Using only a few samples from a customer’s log file combined with the Vijil database of over 200,000 prompts, we synthesize a test suite tailored to the customer’s use case. Not only do we test for correctness (MMLU-Pro benchmark), consistency (Vijil-developed benchmark), and hallucination (Vijil-developed benchmark), we use state-of-the-art red-team testing techniques to probe the LLM for jailbreaks, prompt injections, PII disclosure, presence of copyrighted content, robustness under out-of-distribution inputs, adversarial robustness, profanity, hateful speech, stereotypes, fairness, bias, and ethical behavior. We’ve tested the top-10 LLMs available publicly and you’ll find the results are often surprising and always useful.
Vijil Evaluate is available today for private preview. Individual developers and small teams can use Vijil Evaluate as a SaaS to test their LLM-based applications or SLMs, wherever they are hosted. We support hosts including AWS, Google Cloud, Replicate, Together, and OctoAI. Enterprises can deploy the Vijil platform within their private network on any cloud provider infrastructure or on-premises. Enterprises can rest assured that all application inputs, outputs, and metrics stay inside their corporate network. As a Vijil subscriber, you get continuous detection and mitigation of risks along with customized technical support.
If you’re building or operating an LLM-based application, save yourself time and trouble by testing it with the most comprehensive evaluation framework available today. Sign up and tells us what you think.