Product / Gentrace.ai

Gentrace

Designed the v1 of a cutting-edge devtools platform enabling LLM-first companies to monitor, evaluate, and debug AI agents

Context

AI apps are notoriously "black boxes", making them difficult to debug. As more LLM-based products interact with a variety of APIs, tools, and users in multi-step workflows, developers need better ways to evaluate, monitor, and debug these systems.

Gentrace is an AI observability platform that instruments LLM apps using the OpenTelemetry standard, recording traces for every AI interaction. Built as an AI-native tool, Gentrace lets developers query and chat with their traces to create new columns, custom views, filters, and alerts, turning opaque workflows into actionable insights for performance monitoring and error analysis.

Impact

As the sole designer in the startup, I shipped a variety of features around two main focuses:

  • Shift the initial evals-focused product to trace logging
  • Revamp the visual language to accommodate for the increased complexity of the new version.

Gentrace refocused its product to target agentic AI start-ups by shifting from purely evals, which are mostly static tests to benchmark model performance, to active trace logging on live interactions. I built new features were all AI-native and involved deep technical understanding of how AI interactions operate:

  • Collaborating with a LLM (via chat) to create helpful custom values
  • Applying filters and views to support workflows
  • Visualizing OTEL performance by introducing the first graphing elements into the app
  • Building alerts with email and Slack integration

These new features all introduced new panels and content sections that had to be fit into an increasingly cramped design system. To accommodate for this increased data density, I designed and shipped (in code) an extensive visual revamp to improve the clarity of the information architecture, while also making the product look more modern.


Please contact me with any inquiries about this project. All data is notional and mocks shown here are representational.

AI-native interaction patterns: Users build onto their trace views with additional AI-computed columns to flexibly evaluate useful metrics like sentiment, printing error logs, analyzing accuracy rates, and more.

Trace Details: Details of every interaction in the trace are clearly outlined, along with any tool calls and system/user messages.

Before (left) and after (right) of UI changes: The original designs featured 5 levels of grays and an artificially tighter-kerned font, which worked well when the app had less panels and content. With the increased feature complexity, the headers became banded and distracting, and the lower contrast made the dense UI difficult to read. I changed the font, increased the background contrasts while reducing the number of elevation levels, and established a uniform system in Figma.

Create with AI: Gentrace's internal chat feature allows users to use AI to create additional helpful columns of additional values. These values are programmatic and can be edited afterwards.