Client
A media and entertainment company aimed to build a real-time voice changer app for Android. They wanted:
- Live voice effects
- Fast audio processing
- Pitch shifting and tempo modification
- High-quality MP3 recording
- Minimal latency
- Compatibility with different Android devices
- Smooth UI and instant playback
Their existing prototype used basic AudioTrack/AudioRecord pipelines with Java processing, which resulted in:
- High latency
- Poor quality effects
- CPU overload on mid-range devices
- Inconsistent behavior across Android versions
They needed a native, high-performance audio processing engine.
Project Overview
We engineered a real-time voice processing engine using:
- FFmpeg for decoding, filtering, and mixing
- libmp3lame for high-quality MP3 encoding
- SoundTouch for real-time pitch and tempo manipulation
- Native C/C++ code (called through JNI) for a low-latency audio pipeline
- OpenSL ES / AAudio for fast audio I/O
The final app provides instant voice effects during recording or playback.
Key Challenges
1. Achieving Real-Time Processing
Applying pitch, tempo, and filter effects without delay required native audio processing pipelines.
2. Audio Latency Issues
Typical Java-based audio APIs created latency that made live voice changing unusable.
3. Cross-Device Compatibility
Different Android devices use:
- Different native sample rates
- Different buffer sizes
- Different audio hardware paths
Ensuring consistent performance was crucial.
4. High-Quality Encoding
The client required high-quality MP3 export, not raw PCM.
Our Solution
1. Native Audio Pipeline Using C++ & JNI
We built:
- A native audio engine
- Real-time audio buffer queues
- Separate threads for input, processing, and output
This ensured minimal latency and smoother playback.
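The buffer queues between the input, processing, and output threads can be illustrated with a single-producer/single-consumer ring buffer. This is a minimal sketch, not the project's actual engine: the class name `SpscRingBuffer` and its interface are assumptions made for illustration, and it handles mono float samples only.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical lock-free queue: exactly one thread calls write() (e.g. the
// input thread) and exactly one other thread calls read() (e.g. processing).
class SpscRingBuffer {
public:
    // Capacity must be a power of two so index wrapping is a cheap bit mask.
    explicit SpscRingBuffer(std::size_t capacityPow2)
        : buf_(capacityPow2), mask_(capacityPow2 - 1) {
        assert(capacityPow2 > 0 && (capacityPow2 & mask_) == 0);
    }

    // Copies up to n samples in; returns how many were actually stored.
    std::size_t write(const float* src, std::size_t n) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        const std::size_t freeSpace = buf_.size() - (head - tail);
        if (n > freeSpace) n = freeSpace;
        for (std::size_t i = 0; i < n; ++i) buf_[(head + i) & mask_] = src[i];
        head_.store(head + n, std::memory_order_release);  // publish samples
        return n;
    }

    // Copies up to n samples out; returns how many were actually read.
    std::size_t read(float* dst, std::size_t n) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        const std::size_t avail = head - tail;
        if (n > avail) n = avail;
        for (std::size_t i = 0; i < n; ++i) dst[i] = buf_[(tail + i) & mask_];
        tail_.store(tail + n, std::memory_order_release);  // free the slots
        return n;
    }

private:
    std::vector<float> buf_;
    std::size_t mask_;
    std::atomic<std::size_t> head_{0};  // total samples ever written
    std::atomic<std::size_t> tail_{0};  // total samples ever read
};
```

Because neither side takes a lock, an audio callback that writes into such a queue is never blocked by the processing thread; when the queue is full the writer simply stores fewer samples and the caller decides how to handle the overflow.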
2. SoundTouch for Real-Time Pitch & Tempo Manipulation
We integrated SoundTouch with custom optimizations:
- Pitch shifting
- Tempo changes
- Voice deepening and high-pitch effects
- Robot, chipmunk, monster, and echo effects
Optimizations included:
- SIMD acceleration where available
- Reduced buffer copies
- Custom tuning for responsiveness
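To make the pitch and tempo controls concrete, the sketch below converts a semitone offset into a frequency ratio (one octave, 12 semitones, doubles the frequency) and a tempo percentage into a playback rate. SoundTouch accepts values of this kind through `setPitchSemiTones` and `setTempoChange`, but the sketch does not link the library. The `VoicePreset` struct, `presetFor`, and every preset value are hypothetical and chosen only for illustration.

```cpp
#include <cmath>
#include <string>

// Hypothetical preset record; the numbers below are illustrative guesses,
// not the values tuned for the app.
struct VoicePreset {
    std::string name;
    double pitchSemitones;  // positive raises the voice, negative lowers it
    double tempoChangePct;  // percent change in tempo, 0 = unchanged
};

// Equal-temperament conversion: +12 semitones doubles the frequency.
inline double semitonesToRatio(double semitones) {
    return std::pow(2.0, semitones / 12.0);
}

// A tempo change of +25% corresponds to a playback rate of 1.25.
inline double tempoPercentToRate(double percent) {
    return 1.0 + percent / 100.0;
}

// Looks up an assumed preset by name; unknown names return a neutral preset.
inline VoicePreset presetFor(const std::string& effect) {
    if (effect == "chipmunk") return {"chipmunk", 7.0, 10.0};
    if (effect == "monster")  return {"monster", -8.0, -10.0};
    if (effect == "deep")     return {"deep", -4.0, 0.0};
    return {"neutral", 0.0, 0.0};
}
```

For example, `semitonesToRatio(presetFor("chipmunk").pitchSemitones)` gives the frequency multiplier that the assumed chipmunk preset applies to the voice.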
3. FFmpeg for Audio Filters and Pre/Post Processing
FFmpeg was compiled with:
- libswresample
- Audio filters (libavfilter)
- libavcodec / libavutil
Used for:
- Equalizer effects
- Reverb, chorus, echo
- Noise reduction
- Format conversion
- Mixing background audio
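FFmpeg describes a linear chain of filters as a comma-separated string such as `highpass=f=120,aecho=0.8:0.88:60:0.4`. The sketch below only assembles that text; it does not call libavfilter, and the helper names `buildFilterChain` and `equalizerBand` are assumptions introduced for illustration. The filter names used (`equalizer`, `aecho`, `highpass`, `anull`) are standard FFmpeg audio filters, while the parameter values are placeholders.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: one peaking EQ band with a fixed Q of 1.
// Produces text like "equalizer=f=1000:t=q:w=1:g=3".
inline std::string equalizerBand(int freqHz, int gainDb) {
    return "equalizer=f=" + std::to_string(freqHz) +
           ":t=q:w=1:g=" + std::to_string(gainDb);
}

// Hypothetical helper: joins filter specs into a linear filtergraph string.
// Empty entries are skipped; an empty chain falls back to "anull", FFmpeg's
// pass-through audio filter, so the caller always gets a valid description.
inline std::string buildFilterChain(const std::vector<std::string>& filters) {
    std::string chain;
    for (const std::string& f : filters) {
        if (f.empty()) continue;
        if (!chain.empty()) chain += ',';
        chain += f;
    }
    return chain.empty() ? std::string("anull") : chain;
}
```

A string built this way could then be handed to a filter graph parser on the native side, which keeps the effect selection in the UI decoupled from the FFmpeg setup code.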
4. libmp3lame for High-Quality MP3 Export
Many voice changer apps export low-quality audio.
We enabled:
- 128 kbps / 192 kbps MP3
- CBR or VBR modes
- Efficient real-time streaming into the encoder
This kept exported recordings clear and free of the artifacts typical of low-bitrate exports.
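Two small calculations help when streaming PCM into an MP3 encoder. The first sizes the output buffer for one encode call using the worst-case estimate given in LAME's header comments (1.25 × samples per channel + 7200 bytes). The second estimates the size of a CBR file from its bitrate and duration. The function names are assumptions, and the size estimate ignores tags and does not apply to VBR.

```cpp
#include <cstddef>
#include <cstdint>

// Worst-case bytes of MP3 output for one encode call, following the
// estimate documented in lame.h: 1.25 * num_samples + 7200.
// samplesPerChannel counts samples in one channel, not the interleaved total.
constexpr std::size_t mp3WorstCaseBufferBytes(std::size_t samplesPerChannel) {
    return (5 * samplesPerChannel + 3) / 4 + 7200;  // 1.25x, rounded up
}

// Approximate size of a constant-bitrate MP3 stream, excluding ID3 tags.
// MP3 bitrates use 1 kbps = 1000 bits per second.
constexpr std::uint64_t cbrFileSizeBytes(int bitrateKbps, int durationSeconds) {
    return static_cast<std::uint64_t>(bitrateKbps) * 1000u / 8u *
           static_cast<std::uint64_t>(durationSeconds);
}
```

Under these assumptions, one minute of audio takes roughly 960,000 bytes at 128 kbps and 1,440,000 bytes at 192 kbps, which is the trade-off between the two export settings.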
5. Real-Time Input/Output via OpenSL ES or AAudio
Depending on the device's Android version:
- OpenSL ES for older Android versions
- AAudio for Android 8.0 (API level 26) and later
Benefits:
- Low-latency recording
- Smooth playback
- Less jitter
- Stable buffer flow
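The backend choice and its effect on latency can be sketched as two small functions. This is an illustration under stated assumptions: `AudioBackend`, `chooseBackend`, and `bufferLatencyMs` are hypothetical names, the selection uses only the API level (AAudio was introduced in API level 26), and a production engine may also consider device-specific quirks.

```cpp
// Hypothetical backend selector, not the app's actual logic.
enum class AudioBackend { OpenSLES, AAudio };

constexpr int kFirstAAudioApiLevel = 26;  // Android 8.0

// Picks AAudio where the platform provides it, OpenSL ES otherwise.
constexpr AudioBackend chooseBackend(int apiLevel) {
    return apiLevel >= kFirstAAudioApiLevel ? AudioBackend::AAudio
                                            : AudioBackend::OpenSLES;
}

// Delay contributed by one buffer: frames divided by frames per second,
// expressed in milliseconds.
constexpr double bufferLatencyMs(int bufferFrames, int sampleRateHz) {
    return 1000.0 * bufferFrames / sampleRateHz;
}
```

For instance, a 192-frame buffer at 48 kHz adds 4 ms per stage, while a 480-frame buffer adds 10 ms, which is why smaller buffers matter for live monitoring.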
6. Custom Audio Mixer & Effects Layer
We built a flexible effects engine that lets users:
- Chain multiple effects
- Adjust effect intensity with sliders
- Preview changes in real time
- Apply filters to pre-recorded clips
Effects included:
- Pitch shift
- Tempo change
- Echo
- Reverb
- Distortion
- Radio effect
- Background music mixing
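Chaining effects with a per-effect intensity slider can be modelled as a list of processors, each blended with its unprocessed input. The following is a minimal sketch under assumptions: `EffectChain`, `makeGain`, and `makeHardClip` are hypothetical names, the intensity is a linear dry/wet mix, and the two sample effects stand in for the real ones.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// An effect processes one block of mono float samples in place.
using Effect = std::function<void(std::vector<float>&)>;

class EffectChain {
public:
    // intensity: 0 = effect bypassed, 1 = fully processed signal.
    void add(Effect fx, float intensity) {
        slots_.push_back(Slot{std::move(fx), clamp01(intensity)});
    }

    // Would be called when the user moves the slider for effect `index`.
    void setIntensity(std::size_t index, float intensity) {
        if (index < slots_.size()) slots_[index].intensity = clamp01(intensity);
    }

    // Runs every effect in order, blending dry and wet signal per slot.
    // Copying into `wet` allocates; a real-time version would reuse a
    // preallocated scratch buffer instead.
    void process(std::vector<float>& block) const {
        for (const Slot& s : slots_) {
            if (s.intensity <= 0.0f) continue;
            std::vector<float> wet = block;
            s.fx(wet);
            const std::size_t n = std::min(block.size(), wet.size());
            for (std::size_t i = 0; i < n; ++i)
                block[i] = (1.0f - s.intensity) * block[i] + s.intensity * wet[i];
        }
    }

private:
    struct Slot { Effect fx; float intensity; };
    static float clamp01(float v) { return std::max(0.0f, std::min(1.0f, v)); }
    std::vector<Slot> slots_;
};

// Stand-in effects for illustration only.
inline Effect makeGain(float gain) {
    return [gain](std::vector<float>& b) { for (float& x : b) x *= gain; };
}
inline Effect makeHardClip(float limit) {
    return [limit](std::vector<float>& b) {
        for (float& x : b) x = std::max(-limit, std::min(limit, x));
    };
}
```

Because each slot only stores a function and a mix level, switching or re-ordering effects does not require reprocessing earlier audio; the next block simply runs through the updated chain.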
7. Cross-Device Compatibility Handling
We added:
- Automatic sample rate detection (44.1 kHz / 48 kHz)
- Dynamic buffer negotiation
- Fallback paths for low-end devices
- Graceful degradation when hardware is limited
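Sample rate detection and buffer negotiation can be sketched as a pure function over the values a device reports. This is an assumption-laden illustration: `DeviceAudioCaps`, `StreamConfig`, and `negotiateStream` are hypothetical, the policy shown (open the stream at the device's native rate, size the buffer as a whole number of bursts, clamp to the device maximum) is one common approach rather than the project's confirmed logic, and the caller is assumed to request more bursts on low-end devices.

```cpp
#include <algorithm>

// Values a device is assumed to report; field names are hypothetical.
struct DeviceAudioCaps {
    int nativeSampleRateHz;  // e.g. 44100 or 48000
    int framesPerBurst;      // frames the device moves per hardware callback
    int maxBufferFrames;     // largest buffer the stream allows
};

struct StreamConfig {
    int sampleRateHz;      // rate to open the stream with
    int bufferFrames;      // always a whole number of bursts
    bool needsResampling;  // engine rate differs from the device rate
};

// burstsWanted: 1-2 for fast devices, higher as a fallback on slow ones.
inline StreamConfig negotiateStream(const DeviceAudioCaps& caps,
                                    int engineSampleRateHz,
                                    int burstsWanted) {
    StreamConfig cfg{};
    // Opening at the native rate avoids an extra resampler in the OS path.
    cfg.sampleRateHz = caps.nativeSampleRateHz;
    cfg.needsResampling = (caps.nativeSampleRateHz != engineSampleRateHz);

    const int burst = std::max(1, caps.framesPerBurst);
    const int maxBursts = std::max(1, caps.maxBufferFrames / burst);
    const int bursts = std::min(std::max(1, burstsWanted), maxBursts);
    cfg.bufferFrames = bursts * burst;
    return cfg;
}
```

When `needsResampling` is true, the engine would convert between its internal rate and the device rate (for example with libswresample) before handing audio to the stream.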
Architecture Diagram (Text Version)
Results & Impact
Real-Time Effects
Effects applied instantly during recording and preview.
Low Latency
End-to-end latency dropped to a level low enough for interactive, live use.
High Audio Quality
MP3 export produced clear, distortion-free audio.
Smooth UI and Workflow
Users can switch effects without pauses or reprocessing.
Broad Device Compatibility
Stable performance on mid-range phones, low-end devices, and newer flagships.
Efficient Performance
Native processing reduced CPU load by 40–60% compared to the Java-based prototype.
Conclusion
By combining FFmpeg, libmp3lame, SoundTouch, and a native audio engine, we developed a high-performance real-time voice changer app for Android. The solution provides fast audio processing, smooth live previews, and professional-quality output, which makes it well suited to entertainment apps, content creators, and voice-based tools.

Written by Oliver Thomas