Collecting audio data with AudioRecord and synthesizing it into MP4

This article introduces the use of AudioRecord in Android audio and video development. The example builds on the previous article on recording MP4 with MediaCodec, using AudioRecord to capture audio data and synthesize it into the MP4 file. The articles in the Android audio and video series are as follows:

The main content of this article is as follows:

  1. Introduction to AudioRecord
  2. AudioRecord lifecycle
  3. Reading audio data from AudioRecord
  4. Direct buffer and byte order (optional)
  5. Using AudioRecord

Introduction to AudioRecord#

AudioRecord is Android's audio recording tool for recording audio from hardware devices. Audio data is obtained from it in a pull fashion (by repeatedly calling read), generally as raw audio data in PCM format. It allows recording and playback to happen at the same time, so it is often used for real-time processing of audio data.

The parameters and descriptions for creating AudioRecord are as follows:

// Create AudioRecord
public AudioRecord (int audioSource, 
                int sampleRateInHz, 
                int channelConfig, 
                int audioFormat, 
                int bufferSizeInBytes)
  • audioSource: Indicates the audio source, defined in MediaRecorder.AudioSource, such as the common audio source main microphone MediaRecorder.AudioSource.MIC, etc.
  • sampleRateInHz: Indicates the sampling rate in hertz, meaning the number of samples per channel per second. Among common sampling rates, only 44100Hz can ensure normal use on all devices. The actual sampling rate can be obtained through getSampleRate. This sampling rate is not the playback sampling rate of the audio content; for example, audio with a sampling rate of 8000Hz can be played on a device with a sampling rate of 48000Hz, and the corresponding platform will automatically handle the sampling rate conversion, so it will not play at 6 times the speed.
  • channelConfig: Indicates the number of channels, defined in AudioFormat. Among common channels, only mono AudioFormat.CHANNEL_IN_MONO can ensure normal use on all devices. Others, such as AudioFormat.CHANNEL_IN_STEREO, indicate dual channels, which is stereo.
  • audioFormat: Indicates the format of the audio data returned by AudioRecord. For linear PCM, it reflects the size of each sample (8, 16, 32 bits) and representation (integer, floating-point). Audio formats are defined in AudioFormat, and among common audio data formats, only AudioFormat.ENCODING_PCM_16BIT can ensure normal use on all devices. Formats like AudioFormat.ENCODING_PCM_8BIT cannot guarantee normal use on all devices.
  • bufferSizeInBytes: Indicates the size of the buffer for writing audio data. This value cannot be less than the size of getMinBufferSize, which is the minimum buffer size required by AudioRecord; otherwise, it will cause AudioRecord initialization to fail. This buffer size does not guarantee smooth recording under load, and a larger value can be chosen if necessary.
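Putting these parameters together, a minimal creation sketch in Kotlin (the variable names are illustrative, and the RECORD_AUDIO permission is assumed to already be granted):

// android.media.AudioRecord / AudioFormat / MediaRecorder
val sampleRate = 44100                            // guaranteed to work on all devices
val channelConfig = AudioFormat.CHANNEL_IN_MONO   // mono is guaranteed to work on all devices
val audioFormat = AudioFormat.ENCODING_PCM_16BIT  // 16-bit PCM is guaranteed to work on all devices
// minimum buffer size required by AudioRecord for this configuration
val minBufferSize = AudioRecord.getMinBufferSize(sampleRate, channelConfig, audioFormat)
val audioRecord = AudioRecord(
    MediaRecorder.AudioSource.MIC, // main microphone
    sampleRate,
    channelConfig,
    audioFormat,
    minBufferSize * 2              // a value above the minimum gives some headroom under load
)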

AudioRecord Lifecycle#

The lifecycle states of AudioRecord include STATE_UNINITIALIZED, STATE_INITIALIZED, RECORDSTATE_RECORDING, and RECORDSTATE_STOPPED, corresponding to uninitialized, initialized, recording, and stopped recording, as shown in the diagram below:

(State diagram: creation takes AudioRecord from STATE_UNINITIALIZED to STATE_INITIALIZED; startRecording moves it to RECORDSTATE_RECORDING; stop moves it to RECORDSTATE_STOPPED; release returns it to STATE_UNINITIALIZED.)

A brief explanation:

  1. Before creation or after release, AudioRecord enters the STATE_UNINITIALIZED state.
  2. When creating AudioRecord, it enters the STATE_INITIALIZED state.
  3. Calling startRecording enters the RECORDSTATE_RECORDING state.
  4. Calling stop enters the RECORDSTATE_STOPPED state.

So how do you obtain the state of AudioRecord? getState returns the initialization state and getRecordingState returns the recording state. To ensure correct usage, check the state before operating on the AudioRecord object.
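A minimal sketch of such checks, assuming audioRecord is the instance created earlier:

// getState: initialization state; getRecordingState: recording state
if (audioRecord.state == AudioRecord.STATE_INITIALIZED &&
    audioRecord.recordingState == AudioRecord.RECORDSTATE_STOPPED) {
    audioRecord.startRecording()
}
// ... read audio data while recording ...
if (audioRecord.recordingState == AudioRecord.RECORDSTATE_RECORDING) {
    audioRecord.stop()
}
audioRecord.release() // back to STATE_UNINITIALIZED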

Reading Audio Data from AudioRecord#

AudioRecord provides three ways to read audio data, as follows:

// 1. Read audio data, audio format is AudioFormat#ENCODING_PCM_8BIT
int read(@NonNull byte[] audioData, int offsetInBytes, int sizeInBytes)
// 2. Read audio data, audio format is AudioFormat#ENCODING_PCM_16BIT
int read(@NonNull short[] audioData, int offsetInShorts, int sizeInShorts)
// 3. Read audio data into a direct ByteBuffer, see the direct buffer section below
int read(@NonNull ByteBuffer audioBuffer, int sizeInBytes)

On success, the return value is the number of bytes or shorts actually read, which is greater than or equal to 0. The common error codes when reading audio data are as follows:

  1. ERROR_INVALID_OPERATION: the AudioRecord is not properly initialized.
  2. ERROR_BAD_VALUE: the parameters do not resolve to valid data and indexes.
  3. ERROR_DEAD_OBJECT: the object is no longer valid and needs to be recreated; if some audio data has already been transferred in the current call, the error code is not returned immediately but at the next read.

The above three read functions read audio data from hardware audio devices. The main difference between the first two is the audio format, which is 8-bit and 16-bit, corresponding to quantization levels of 2^8 and 2^16.

The third read function records audio data into a direct buffer (DirectBuffer). If the buffer passed in is not a direct buffer, this method always returns 0, so the audioBuffer parameter of the third read function must be a DirectBuffer, otherwise audio data cannot be read correctly. After the call the buffer's position remains unchanged; the format of the audio data in the buffer is the format specified when creating the AudioRecord, and the bytes are stored in native byte order.

Direct Buffer and Byte Order#

The two concepts of direct buffer and byte order have been mentioned above, and here is a brief explanation:

Direct Buffer#

DirectBuffer comes from Java NIO. Here is a brief look at the differences between an ordinary (heap) buffer and a direct buffer.

  • Ordinary Buffer
ByteBuffer buf = ByteBuffer.allocate(1024);
public static ByteBuffer allocate(int capacity) {
    if (capacity < 0)
        throw new IllegalArgumentException();
    return new HeapByteBuffer(capacity, capacity);
}

As can be seen, an ordinary buffer is a byte buffer allocated on the JVM heap and managed by the JVM, which means it can be garbage-collected when appropriate. GC is accompanied by memory copying and compaction, which can affect performance to some extent.

  • Direct Buffer
ByteBuffer buf = ByteBuffer.allocateDirect(1024);
public static ByteBuffer allocateDirect(int capacity) {
    // Android-changed: Android's DirectByteBuffers carry a MemoryRef.
    // return new DirectByteBuffer(capacity);
    DirectByteBuffer.MemoryRef memoryRef = new DirectByteBuffer.MemoryRef(capacity);
    return new DirectByteBuffer(capacity, memoryRef);
}

The above is the Android implementation of allocateDirect. As can be seen, the buffer is allocated from native memory. Allocating and releasing this kind of buffer is relatively expensive, but it can reside outside the garbage-collected heap. It is generally used for large, long-lived buffers, and should only be chosen when it brings a measurable performance improvement. Whether a buffer is a DirectBuffer can be determined with isDirect.
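A quick way to confirm which kind of buffer you have (a small sketch using java.nio.ByteBuffer):

val heapBuf = ByteBuffer.allocate(1024)
val directBuf = ByteBuffer.allocateDirect(1024)
println(heapBuf.isDirect)   // false: heap buffer, managed by the JVM and the GC
println(directBuf.isDirect) // true: direct buffer, allocated outside the GC heap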

Byte Order#

Byte order refers to the order in which the bytes of a multi-byte value are stored in memory. It falls into two categories, BIG-ENDIAN and LITTLE-ENDIAN, which are commonly referred to as network byte order and native byte order respectively:

  • Native byte order: on most platforms, including Android devices, this is LITTLE-ENDIAN (little-endian, low byte first), meaning low-order bytes are stored at the low address end of memory and high-order bytes at the high address end. Its counterpart is network byte order.
  • Network byte order generally refers to the byte order used by the TCP/IP protocol suite, which defines BIG-ENDIAN at every layer, so network byte order normally means BIG-ENDIAN.
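Tying the two concepts back to AudioRecord, here is a minimal sketch of the third read overload, assuming the audioRecord instance and bufferSize from the earlier examples (java.nio.ByteBuffer and ByteOrder are used; the data comes back in native byte order):

// the buffer must be a direct buffer, otherwise read always returns 0
val audioBuffer = ByteBuffer.allocateDirect(bufferSize)
    .order(ByteOrder.nativeOrder()) // AudioRecord fills it in native byte order
val read = audioRecord.read(audioBuffer, bufferSize)
if (read > 0) {
    // the buffer's position is unchanged after read, so copy from index 0
    val pcm = ByteArray(read)
    audioBuffer.get(pcm, 0, read)
    audioBuffer.clear()
    // ... process or encode pcm ...
}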

Using AudioRecord#

In the previous article, Camera2 and MediaCodec recording MP4, only video was recorded, with the focus on the use of MediaCodec. Here, on top of video recording, AudioRecord is used to add audio recording and synthesize the audio into the same MP4 file. The key steps are as follows:

  1. Start a dedicated thread and use AudioRecord to read audio data from the hardware; doing this on a separate thread avoids stutter. The complete code example is provided at the end of the article, see AudioEncode2:
/**
 * Audio reading Runnable
 */
class RecordRunnable : Runnable{
    override fun run() {
        val byteArray = ByteArray(bufferSize)
        // Recording state -1 indicates default state, 1 indicates recording state, 0 indicates stop recording
        while (recording == 1){
            val result = mAudioRecord.read(byteArray, 0, bufferSize)
            if (result > 0){
                val resultArray = ByteArray(result)
                System.arraycopy(byteArray, 0, resultArray, 0, result)
                quene.offer(resultArray)
            }
        }
        // Custom stream end data
        if (recording == 0){
            val stopArray = byteArrayOf((-100).toByte())
            quene.offer(stopArray)
        }
    }
}

One thing worth mentioning here: if AudioRecord alone is used to record audio, the data that is read can simply be written straight to a file as raw PCM.
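A rough sketch of that simpler case, reusing recording, mAudioRecord and bufferSize from the sample above (java.io.File and FileOutputStream; context and the record.pcm path are just illustrative):

// raw PCM output; it can be played back after adding a WAV header or importing into an audio tool
FileOutputStream(File(context.filesDir, "record.pcm")).use { fos ->
    val byteArray = ByteArray(bufferSize)
    while (recording == 1) {
        val result = mAudioRecord.read(byteArray, 0, bufferSize)
        if (result > 0) {
            fos.write(byteArray, 0, result)
        }
    }
}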

  2. To synthesize the captured audio into the MP4 file it must be encoded first. The audio encoder is configured as follows:
// Audio data encoder configuration
private fun initAudioCodec() {
    L.i(TAG, "init Codec start")
    try {
        val mediaFormat =
            MediaFormat.createAudioFormat(
                MediaFormat.MIMETYPE_AUDIO_AAC,
                RecordConfig.SAMPLE_RATE,
                2
            )
        mAudioCodec = MediaCodec.createEncoderByType(MediaFormat.MIMETYPE_AUDIO_AAC)
        mediaFormat.setInteger(MediaFormat.KEY_BIT_RATE, 96000)
        mediaFormat.setInteger(
            MediaFormat.KEY_AAC_PROFILE,
            MediaCodecInfo.CodecProfileLevel.AACObjectLC
        )
        mediaFormat.setInteger(MediaFormat.KEY_MAX_INPUT_SIZE, 8192)
        mAudioCodec.setCallback(this)
        mAudioCodec.configure(mediaFormat, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE)
    } catch (e: Exception) {
        L.i(TAG, "init error:${e.message}")
    }
    L.i(TAG, "init Codec end")
}

Encoding is again done with MediaCodec; for details you can refer to the previous two articles in this series.

Here the asynchronous processing mode of MediaCodec is used to encode the audio data. The full code is not pasted here, but one point to note: when filling and releasing buffers, the state must be checked. If input buffers are filled but never released back to the codec, eventually no input buffer will be available and audio encoding will fail; the end of the stream also has to be handled explicitly. A rough sketch of the input side is shown below.
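This sketch is not the article's full implementation; it only illustrates how onInputBufferAvailable in the MediaCodec.Callback might consume the PCM queue that RecordRunnable fills, assuming quene is a LinkedBlockingQueue<ByteArray> and the 16-bit, 2-channel format configured above:

// tracks how many PCM bytes have been queued, used to derive the presentation time
private var totalPcmBytes = 0L

override fun onInputBufferAvailable(codec: MediaCodec, index: Int) {
    // take() blocks the callback thread until RecordRunnable offers more data
    val data = quene.take()
    val inputBuffer = codec.getInputBuffer(index) ?: return
    // presentation time derived from bytes queued so far (16-bit samples, 2 channels)
    val presentationTimeUs = totalPcmBytes * 1_000_000L / (RecordConfig.SAMPLE_RATE * 2 * 2)
    if (data.size == 1 && data[0] == (-100).toByte()) {
        // custom end-of-stream marker from RecordRunnable: signal EOS to the encoder
        codec.queueInputBuffer(index, 0, 0, presentationTimeUs, MediaCodec.BUFFER_FLAG_END_OF_STREAM)
    } else {
        inputBuffer.clear()
        inputBuffer.put(data)
        totalPcmBytes += data.size
        codec.queueInputBuffer(index, 0, data.size, presentationTimeUs, 0)
    }
}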

  3. MediaMuxer is used to synthesize the file. MediaMuxer must not be started until both the video track and the audio track have been added.
override fun onOutputFormatChanged(codec: MediaCodec, format: MediaFormat) {
    L.i(TAG, "onOutputFormatChanged format:${format}")
    // Add audio track
    addAudioTrack(format)
    // Start MediaMuxer only if both audio and video tracks are added
    if (RecordConfig.videoTrackIndex != -1) {
        mAudioMuxer.start()
        RecordConfig.isMuxerStart = true
        L.i(TAG, "onOutputFormatChanged isMuxerStart:${RecordConfig.isMuxerStart}")
    }
}
// Add audio track
private fun addAudioTrack(format: MediaFormat) {
    L.i(TAG, "addAudioTrack format:${format}")
    RecordConfig.audioTrackIndex = mAudioMuxer.addTrack(format)
    RecordConfig.isAddAudioTrack = true
}
// ...
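Finally, when recording ends the components need to be torn down. A hedged sketch of the order, using the names from the samples above (exactly where this happens depends on how the project is structured):

// stop the reading loop: RecordRunnable exits and offers the custom end-of-stream marker
recording = 0
// stop and release the recorder
mAudioRecord.stop()
mAudioRecord.release()
// once the encoder has processed the end-of-stream flag, stop and release it
mAudioCodec.stop()
mAudioCodec.release()
// stop the muxer only after all tracks have received their end-of-stream samples
mAudioMuxer.stop()
mAudioMuxer.release()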

The usage of AudioRecord is basically as described above.
