Posted by Niharika Arora, Developer Relations Engineer
In Part 1 and Part 2 of our “Optimizing for Android Go” blog series, we discussed why you should consider building for Android Go and how to optimize your app to perform well on Go devices. In this post, we will talk about the tools that helped Google optimize the performance of its own apps.
Tools
Monitoring Memory
Analyze Memory Footprint
Resident Set Size (RSS): The number of shared and non-shared pages used by the app
Proportional Set Size (PSS): The number of non-shared pages used by the app plus an even share of the shared pages (for example, if three processes are sharing 3MB, each process gets 1MB in PSS)
Note: Proportional Set Size (PSS) = Private memory + (shared memory / the number of processes sharing).
Unique Set Size (USS): The number of non-shared pages used by the app (shared pages are not included)
PSS is useful for the operating system when it wants to know how much memory is used by all processes since pages don’t get counted multiple times. PSS takes a long time to calculate because the system needs to determine which pages are shared and by how many processes. RSS doesn’t distinguish between shared and non-shared pages (making it faster to calculate) and is better for tracking changes in memory allocation.
So, which method should you choose? The choice depends on the usage of shared memory.
For example, if the shared memory is used only by the application, then we should use the RSS approach. If, on the other hand, the shared memory is taken by Google Play Services, then we should use the USS approach. For more understanding, please read here.
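To make the three metrics concrete, here is a minimal sketch of the arithmetic behind them. The class and method names are illustrative only, and the 2048 kB of private memory is an assumed figure; only the "3MB shared by three processes" case comes from the example above.

```java
public final class MemoryMetrics {
    private MemoryMetrics() {}

    /** RSS: every resident page counts in full, shared or not. */
    public static long rssKb(long privateKb, long sharedKb) {
        return privateKb + sharedKb;
    }

    /** PSS: private pages plus an even share of the shared pages. */
    public static long pssKb(long privateKb, long sharedKb, int sharingProcesses) {
        if (sharingProcesses < 1) {
            throw new IllegalArgumentException("at least one process must map the shared pages");
        }
        return privateKb + sharedKb / sharingProcesses;
    }

    /** USS: private pages only; shared pages are not included. */
    public static long ussKb(long privateKb) {
        return privateKb;
    }

    public static void main(String[] args) {
        long privateKb = 2048; // assumed private memory, for illustration only
        long sharedKb = 3072;  // 3MB of shared memory, as in the example above
        int sharers = 3;       // three processes share those pages

        System.out.println("RSS = " + rssKb(privateKb, sharedKb) + " kB");
        System.out.println("PSS = " + pssKb(privateKb, sharedKb, sharers) + " kB");
        System.out.println("USS = " + ussKb(privateKb) + " kB");
    }
}
```

With these inputs, each process is charged 1024 kB of the shared pages in PSS, while RSS charges the full 3072 kB to every process that maps them.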
2. Take a heap dump and analyze how much memory is used by the running processes. Follow these steps:
The “Don’t keep activities” developer option must be OFF.
Use recent release builds for testing.
Execute the user journey you desire to measure.
Run the following command: adb shell am dumpheap <Your Android App Process ID> <output-file-name>
3. Understand low-memory killer
Android has a process called the low memory killer. When the device is running low on RAM, it picks a process on the device and kills it, which frees all the memory that process was using. The thresholds can be tuned by OEMs.
But what if the low memory killer kills the process that the user cares about?
Tools
Perfetto is one of the best tools for finding where all of your app’s memory is consumed. Use it to get information about memory management events from the kernel. Deep dive and understand how to profile native and Java heap here.
The Memory Profiler is a component in the Android Profiler that helps you identify memory leaks and memory churn that can lead to stutter, freezes, and even app crashes. It shows a real time graph of your app’s memory use and lets you capture a heap dump, force garbage collections, and track memory allocations. To learn more about inspecting performance, please check MAD skills videos here.
adb shell dumpsys meminfo <package_name|pid> [-d]
Java heap – memory allocated by Java code
Native heap – memory allocated by native code, that is, allocations made by the application from C or C++ code using malloc or new. These are best understood using malloc debug.
Code – memory used for Java, native code and some resources, including dex bytecode, shared libraries and fonts
Stack – memory used for both native and Java stacks. This usually has to do with how many threads your application is running.
Graphics – memory used for graphics buffer queues to display pixels on the screen, GL surfaces, textures, and the like.
Private Other – uncategorized private memory
System – memory shared with the system or otherwise under the control of the system.
Key memory terms:
Private – Memory used only by the process.
Shared – System memory shared with other processes.
Clean – Memory-mapped pages that can be reclaimed when under memory pressure.
Dirty – Memory-mapped pages modified by a process. These pages may be written to file/swap and then reclaimed when under memory pressure.
Note:
The Debug class is very useful and provides different methods for Android applications, including tracing and allocation counts. You can read about usage here.
For deeper understanding and tracking allocations for each page, read about page owner here.
$ adb root
$ adb shell pgrep <process>
Output – process id
$ adb shell showmap <process id>
Sample output:
 virtual                     shared   shared  private  private
    size      RSS      PSS    clean    dirty    clean    dirty object
-------- -------- -------- -------- -------- -------- -------- ------------------------------
    3048      948      516      864        0       84        0 /data/app/……
    2484     2088     2088        0        0     2084        4 /data/app/……..
     144       72        2       68        4        0        0 /data/dalvik-cache/arm64/system@framework@<…>.art
     216      180        5      176        4        0        0 /data/dalvik-cache/arm64/system@framework@<…>.art
     168      164        8      136       24        0        4 /data/dalvik-cache/arm64/system@framework@<…>.art
      12        8        0        4        4        0        0 /data/dalvik-cache/arm64/system@framework@<…>.art
    1380     1300       73     1100      164        0       36 /data/dalvik-cache/arm64/system@framework@<…>.art
…
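Reading a long showmap listing by eye gets tedious, so it can help to total the columns programmatically. The following is a rough sketch that assumes each data row has seven numeric columns (in kB) followed by the object name, as in the sample output above. The class name, the helper names, and the mapping paths used in main are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public final class ShowmapSummary {
    private ShowmapSummary() {}

    /** One parsed showmap row; all sizes are in kB. */
    public static final class Mapping {
        public final long virtualSize, rss, pss;
        public final long sharedClean, sharedDirty, privateClean, privateDirty;
        public final String object;

        Mapping(long[] n, String object) {
            this.virtualSize = n[0]; this.rss = n[1]; this.pss = n[2];
            this.sharedClean = n[3]; this.sharedDirty = n[4];
            this.privateClean = n[5]; this.privateDirty = n[6];
            this.object = object;
        }
    }

    /** Parses one row, or returns null for headers, separators and other non-data lines. */
    public static Mapping parseRow(String line) {
        // Limit of 8 keeps object names that contain spaces in one piece.
        String[] parts = line.trim().split("\\s+", 8);
        if (parts.length < 8) {
            return null;
        }
        long[] numbers = new long[7];
        try {
            for (int i = 0; i < 7; i++) {
                numbers[i] = Long.parseLong(parts[i]);
            }
        } catch (NumberFormatException e) {
            return null; // header or separator line
        }
        return new Mapping(numbers, parts[7]);
    }

    /** Parses every data row found in the given showmap output. */
    public static List<Mapping> parse(String output) {
        List<Mapping> rows = new ArrayList<>();
        for (String line : output.split("\n")) {
            Mapping m = parseRow(line);
            if (m != null) {
                rows.add(m);
            }
        }
        return rows;
    }

    /** Sums the PSS column across all parsed rows. */
    public static long totalPss(List<Mapping> rows) {
        long total = 0;
        for (Mapping m : rows) {
            total += m.pss;
        }
        return total;
    }

    public static void main(String[] args) {
        // Numbers taken from the first three sample rows; the paths are hypothetical.
        String output = String.join("\n",
                "size RSS PSS clean dirty clean dirty object",
                "3048 948 516 864 0 84 0 /data/app/example-a",
                "2484 2088 2088 0 0 2084 4 /data/app/example-b",
                "144 72 2 68 4 0 0 /data/dalvik-cache/example.art");
        List<Mapping> rows = parse(output);
        System.out.println("mappings: " + rows.size() + ", total PSS: " + totalPss(rows) + " kB");
    }
}
```

In each sample row, RSS equals the sum of the four clean and dirty columns, which is a handy sanity check when parsing.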
Common memory mappings are:
[anon:libc_malloc] – Allocations made from C/C++ code using malloc or new.
*boot*.art – The boot image. A Java heap that is pre-initialized by loading and running static initializers where possible for common frameworks classes.
/dev/ashmem/dalvik-main space N – The main Java heap.
/dev/ashmem/dalvik-zygote space – The main Java heap of the zygote before forking a child process. Also known as the zygote heap.
/dev/ashmem/dalvik-[free list ] large object space – Heap used for Java objects larger than ~12KB. This tends to be filled with bitmap pixel data and other large primitive arrays.
*.so – Executable code from shared native libraries loaded into memory.
*.{oat,dex,odex,vdex} – Compiled dex bytecode, including optimized dex bytecode and metadata, native machine code, or a mix of both.
5. Analyze native memory allocations using malloc debug
ASan runs on both 32-bit and 64-bit ARM, plus x86 and x86-64. ASan’s CPU overhead is roughly 2x, code size overhead is between 50% and 2x, and the memory overhead is large (dependent on your allocation patterns, but on the order of 2x). To learn more, read here.
The Camera team at Google used ASan and automated the process so that it would run and alert them to any ASan issues. They found it a really convenient way to fix memory issues missed during code authoring and review.
Monitoring Startup
Analyze Startup
1. Measure and analyze time spent in major operations
Once you have a complete app startup trace, look at the trace and measure the time taken for major operations like bindApplication, activityStart, etc.
Look at the overall time spent to:
Identify which operations take more time than expected.
Identify which operations cause the main thread to be blocked.
2. Analyze and identify different time consuming operations and their possible solutions
Identify all time consuming operations.
Identify any operations which are not supposed to be executed during startup (often there is a lot of legacy code that we are not aware of and that is not easily visible when looking at our app code for performance).
Identify which operations are absolutely needed and which could be delayed until your first frame is drawn.
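Once the slices have been read off a startup trace, the triage described above can be sketched in a few lines. This is only an illustration under stated assumptions: the Slice type, the durations, and the names legacyAnalyticsInit and imageCacheWarmup are hypothetical, and whether an operation is needed before the first frame is something you decide by hand, not something the trace reports.

```java
import java.util.ArrayList;
import java.util.List;

public final class StartupSlices {
    private StartupSlices() {}

    /** A hand-transcribed trace slice (hypothetical type, not a Perfetto API). */
    public static final class Slice {
        public final String name;
        public final long durationMs;
        public final boolean onMainThread;
        public final boolean neededBeforeFirstFrame;

        public Slice(String name, long durationMs, boolean onMainThread,
                     boolean neededBeforeFirstFrame) {
            this.name = name;
            this.durationMs = durationMs;
            this.onMainThread = onMainThread;
            this.neededBeforeFirstFrame = neededBeforeFirstFrame;
        }
    }

    /** Total time the listed slices keep the main thread busy. */
    public static long mainThreadMs(List<Slice> slices) {
        long total = 0;
        for (Slice s : slices) {
            if (s.onMainThread) {
                total += s.durationMs;
            }
        }
        return total;
    }

    /** Main-thread slices that are long enough to matter and not needed for the first frame. */
    public static List<Slice> deferralCandidates(List<Slice> slices, long minDurationMs) {
        List<Slice> result = new ArrayList<>();
        for (Slice s : slices) {
            if (s.onMainThread && !s.neededBeforeFirstFrame && s.durationMs >= minDurationMs) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Slice> slices = new ArrayList<>();
        // Durations are made up for illustration.
        slices.add(new Slice("bindApplication", 180, true, true));
        slices.add(new Slice("activityStart", 120, true, true));
        slices.add(new Slice("legacyAnalyticsInit", 90, true, false)); // hypothetical
        slices.add(new Slice("imageCacheWarmup", 60, false, false));   // hypothetical

        System.out.println("main thread busy: " + mainThreadMs(slices) + " ms");
        for (Slice s : deferralCandidates(slices, 50)) {
            System.out.println("could be delayed: " + s.name + " (" + s.durationMs + " ms)");
        }
    }
}
```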
Measure the overall time taken by operations such as measure, draw, inflate, animate, etc.
Look at frame drops.
Identify layouts taking a long time to render or measure.
Identify assets taking a long time to load.
Identify layouts not needed but still getting inflated.
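When looking at frame drops, a common first pass is to count the frames that took longer than the display's frame budget. The sketch below assumes a 60 Hz display (roughly 16.7 ms per frame) and a hypothetical array of per-frame durations that you have already extracted from a trace. The class and method names are illustrative.

```java
public final class FrameDrops {
    private FrameDrops() {}

    /** Frame budget on a 60 Hz display; an assumption, since refresh rates vary by device. */
    public static final double FRAME_BUDGET_MS_60HZ = 1000.0 / 60.0;

    /** Counts frames whose duration exceeds the given budget. */
    public static int countSlowFrames(double[] frameDurationsMs, double budgetMs) {
        int slow = 0;
        for (double duration : frameDurationsMs) {
            if (duration > budgetMs) {
                slow++;
            }
        }
        return slow;
    }

    public static void main(String[] args) {
        // Hypothetical frame times in milliseconds.
        double[] frames = {8.0, 12.5, 16.0, 22.0, 35.0, 9.0};
        int slow = countSlowFrames(frames, FRAME_BUDGET_MS_60HZ);
        System.out.println(slow + " of " + frames.length + " frames exceeded the budget");
    }
}
```

A frame that exceeds the budget has missed its deadline; the layouts and assets involved in those frames are the first ones to inspect.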
Tools
Perfetto is the best tool for understanding CPU usage, thread activity, and frame rendering time.
Record a trace using either the command line or UI tools like Perfetto. Add the app package name with the -a flag to filter data for your app. Some ways to capture a trace:
Capture a system trace through the command line
Record using a quick settings tile
The trace produces a report combining data from the Android kernel, such as the CPU scheduler, disk activity, and app threads.
It works best when combined with custom tracing, which shows which method or part of the code is taking how long, so that the developer can dig deeper accordingly.
Understand Atrace and ftrace while analyzing traces through Perfetto.
2. App Startup library
The App Startup library provides a straightforward, performant way to initialize components at application startup. Both library developers and app developers can use App Startup to streamline startup sequences and explicitly set the order of initialization. Instead of defining separate content providers for each component you need to initialize, App Startup allows you to define component initializers that share a single content provider. This can significantly improve app startup time. To find how to use it in your app, refer here.
Baseline Profiles are a list of classes and methods included in an APK used by Android Runtime (ART) during installation to pre-compile critical paths to machine code. This is a form of profile guided optimization (PGO) that lets apps optimize startup, reduce jank, and improve performance for end users. Profile rules are compiled into a binary form in the APK, in assets/dexopt/baseline.prof.
During installation, ART performs Ahead-of-time (AOT) compilation of methods in the profile, resulting in those methods executing faster. If the profile contains methods used in app launch or during frame rendering, the user experiences faster launch times and/or reduced jank. For more information on usage and advantages, refer here.
Method and function traces: For each thread in your app process, you can find out which methods (Java) or functions (C/C++) are executed over a period, and the CPU resources each method or function consumes during its execution.
5. Debug API + CPU Profiler
Debug.startMethodTracing("sample") – Starts recording a trace log with the name you provide
Debug.stopMethodTracing() – Stops recording. The system buffers the generated trace data until the application calls this method.
Usage
Added in API level 30 and supported in the latest Studio Bumblebee preview (2021.1)
Uses simpleperf with customized build scripts for profiling.
Simpleperf supports profiling Java code on Android >M.
Profiling a release build requires one of the following:
Device to be rooted
Android >=O: use a wrap.sh script and set android:debuggable="true" to enable profiling.
Android >=Q: add the profileable flag in the manifest.
<profileable android:shell=["true" | "false"] android:enabled=["true" | "false"] />
The Jetpack Microbenchmark library allows you to quickly benchmark your Android native code (Kotlin or Java) from within Android Studio. The library handles warmup, measures your code performance and allocation counts, and outputs benchmarking results to both the Android Studio console and a JSON file with more detail. Read more here.
Monitoring App size
No user wants to download a large APK that might consume most of their network or Wi-Fi bandwidth and, most importantly, storage space on their device.
The size of your APK has an impact on how fast your app loads, how much memory it uses, and how much power it consumes. Reducing your app’s download size enables more users to download your app.
Tools
Use the Android Size Analyzer
Note: Libraries that you add to your code may include unused resources. Gradle can automatically remove resources on your behalf if you enable shrinkResources in your app’s build.gradle file.
For more details on the API, refer to the API reference and the sample on GitHub.
Note: Please check the Android developer documentation for all the useful tools that can help you identify and fix such performance issues.
Recap
This part of the blog captures the tools used by Google to identify and fix performance issues in their apps. They saw great improvements in their metrics. Most Android Go apps could benefit from applying the strategies described above. Optimize and make your app delightful and fast for your users!
Optimize for Android (Go edition): Lessons from Google apps – Part 1
Optimize for Android (Go edition): Lessons from Google apps – Part 2
Optimize for Android (Go edition): Lessons from Google apps – Part 3