Posted by Alice Yuan, Developer Relations Engineer at Google, Arti Arutiunov, Product Manager at Datadog and Nikita Ogorodnikov, Staff Software Engineer at Datadog
Performance regressions are notoriously hard to reproduce, making regressions a massive bottleneck for mobile developers. Although signals like ANR rates indicate what issues occur in production, pinpointing the specific line of code that resulted in the performance issue has historically necessitated exhaustive manual reproduction or speculative trial-and-error experimentation.
Datadog collaborated with Google to mitigate this frustration by integrating the ProfilingManager API (available on Android 15+ devices) into its Real User Monitoring (RUM) and Continuous Profiling platforms. This integration transforms the debugging workflow, allowing developers to move beyond surface-level symptoms to being able to detect the why behind a performance bottleneck.
By leveraging this system-level API, Datadog now processes millions of production profiles weekly across the globe according to Datadog internal data of June 2026. It provides engineering teams with a new level of visibility into real-world performance, all while maintaining a low runtime overhead for production-scale performance monitoring.
The impact of ProfilingManager
ProfilingManager is a system service introduced in Android 15 that enables apps to programmatically collect performance data such as call stack samples, field traces and memory heap dumps directly from production environments. This capability shifts the engineering paradigm from reactive manual reproduction to proactive field analysis.
For example, a Google communications app used field traces to investigate why its cold start times were slower on newer, more powerful hardware. By diving into the field-collected traces and comparing traces across different device types, the engineer discovered a hidden scheduling issue: a background text-to-speech service was unnecessarily being prewarmed during app startup. The traces revealed that this background process was monopolizing the device’s highest-performing big CPU core, forcing the app’s main thread to sleep while the prewarm occurred.
Solving the Android code-level visibility challenge
Prior to the implementation of ProfilingManager, Datadog’s Real User Monitoring (RUM) focused on high-level application health and session-level telemetry to assess the user journey. Engineering teams could monitor Android performance signals like time to initial display, ANR rates, CPU load, and frozen frames. These insights extended to granular interactions, such as network latency, touch events, and main thread hangs. However, while this data effectively highlighted which performance bottlenecks were surfacing in the field, it provided no clear path to identifying the root cause of these failures.
To address this, Datadog needed a profiling engine capable of capturing Android traces directly from devices in production with minimal performance impact. After evaluating alternative approaches, such as writing their own trace processor using Android Debug APIs, the team selected ProfilingManager because it is the most performant solution of the profiling options they evaluated and offloads the sampling decisions overhead to the OS.
ProfilingManager supports a wide range of collection methods, including CPU traces, call stack sampling, memory analysis through Java heap dumps and native heap profiles. It enables developers to profile production builds, upload trace files to external storage, and review them in the Perfetto trace analyzer UI. As a SaaS provider, Datadog uploads, visualizes, and analyzes these profiles collected via its SDK, providing a unified view of application health.
By centralizing high-fidelity telemetry within a unified observability API, ProfilingManager empowers Datadog and its clients to proactively monitor, investigate, and remediate complex Android performance regressions through key technical advantages:
- Granular session diagnostics: ProfilingManager enhances debuggability by delivering direct OS-level trace data, overcoming the visibility and alignment challenges typical of custom logging with system services. To dive deeper, developers can download these traces from Datadog to investigate further in visualization tools like the Perfetto UI.
- Automated telemetry triggers: By leveraging native system events to initiate trace recordings at key optimization points, Datadog reduces the need to build custom collection logic. While the initial rollout focuses on the APP_FULLY_DRAWN signal, there are already plans to expand this observability to include ANR, OOM, and COLD_START triggers.
- Proactive trace snapshots: By interfacing directly with the system-level Perfetto service (traced), ProfilingManager utilizes a proactive background recording model designed to capture unpredictable issues. This ensures that developers receive a precise visualization of the events leading up to a performance anomaly, offering a level of insight that exceeds what is possible through manual instrumentation.
- Bottleneck detection at scale: Datadog is able to synthesize telemetry from across Datadog’s global customer base to uncover regressions that only emerge under unique hardware configurations and variable network environments.
- System-enforced resource stability: The API leverages sampling trace collection to ensure performance and user experience impacts remain unnoticeable.
- On-device data controls: ProfilingManager filters out irrelevant information from other processes on-device before the profile is delivered to the app. This minimizes file sizes and ensures that only data relevant to the app’s processes is provided.
Processing millions of weekly profiles to optimize real-world apps
Integrating a system-level profiling API into a global monitoring SDK required solving infrastructure challenges. Because ProfilingManager generates highly detailed performance traces, the Datadog engineering team had to build a pipeline capable of parsing and analyzing these profiles on the server side at scale. Beyond profile collection, Datadog also emphasizes the importance of balancing sampling frequency with collecting enough data to generate meaningful insights about your application. Datadog relies on ProfilingManager’s built-in rate limiting as a critical stability safeguard, preventing excessive telemetry requests from overburdening user devices.
The team has been profiling Datadog’s own native Android application and a number of early adopters’ applications for months, gathering millions of profiles to ensure a fast, error-free launch experience and to refine their performance-detection algorithms. Today, the production integration seamlessly scales across a variety of Android devices.
Conclusion
By integrating Android’s ProfilingManager API, Datadog successfully closed the visibility gap between backend systems and mobile client applications for their customers. By processing millions of profiles weekly with negligible device overhead, Datadog equips Android developers with the code-level insights necessary to diagnose complex performance bugs instantly, helping developers build smoother applications and improve their app’s performance signals in the Play Store. To adopt the ProfilingManager API directly into your performance observability framework, check out our documentation.
In the future, Datadog aims to make Android profiling data a first-class input for coding agents to autonomously resolve performance bottlenecks, closing the feedback loop between detection and remediation. Datadog is working toward making Android profiling broadly accessible to developers.
To get started using the Datadog real user monitoring feature powered by ProfilingManager, visit Datadog Mobile Real User Monitoring.

