This article introduces the use of `AudioRecord` in Android audio and video development. The example builds on the previous article's `MediaCodec` MP4 recording, using `AudioRecord` to capture audio and mux it into the `MP4` file. This article is part of a series on Android audio and video development.
The main content of this article is as follows:
- Introduction to AudioRecord
- AudioRecord lifecycle
- Reading audio data from AudioRecord
- Direct buffer and byte order (optional)
- Using AudioRecord
Introduction to AudioRecord#
`AudioRecord` is Android's audio recording tool for capturing audio from hardware devices. The application pulls audio data from it, generally as raw `PCM` data. Because it supports recording and reading at the same time, it is often used for real-time processing of audio data.
The parameters for creating an `AudioRecord` are as follows:
// Create AudioRecord
public AudioRecord (int audioSource,
int sampleRateInHz,
int channelConfig,
int audioFormat,
int bufferSizeInBytes)
- audioSource: the audio source, defined in `MediaRecorder.AudioSource`, for example the common main-microphone source `MediaRecorder.AudioSource.MIC`.
- sampleRateInHz: the sampling rate in hertz, i.e. the number of samples per channel per second. Among common sampling rates, only 44100 Hz is guaranteed to work on all devices; the actual sampling rate can be obtained through `getSampleRate`. This sampling rate is not the playback sampling rate of the audio content: for example, audio sampled at 8000 Hz can be played on a device running at 48000 Hz, because the platform automatically handles the sample-rate conversion, so it will not play at 6x speed.
- channelConfig: the channel configuration, defined in `AudioFormat`. Among common configurations, only mono `AudioFormat.CHANNEL_IN_MONO` is guaranteed to work on all devices; others, such as `AudioFormat.CHANNEL_IN_STEREO`, indicate two channels, i.e. stereo.
- audioFormat: the format of the audio data returned by `AudioRecord`. For linear `PCM` it reflects the size of each sample (8, 16, or 32 bits) and its representation (integer or floating point). Audio formats are defined in `AudioFormat`; among the common ones, only `AudioFormat.ENCODING_PCM_16BIT` is guaranteed to work on all devices, while formats like `AudioFormat.ENCODING_PCM_8BIT` are not.
- bufferSizeInBytes: the size of the buffer that audio data is written into. It must not be smaller than the value returned by `getMinBufferSize`, the minimum buffer size required by `AudioRecord`; otherwise `AudioRecord` initialization fails. The minimum size does not guarantee smooth recording under load, so a larger value can be chosen if necessary, as in the sketch below.
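Putting these parameters together, here is a minimal sketch of creating an `AudioRecord` with the "safe" values described above (44100 Hz, mono, 16-bit PCM). It assumes the `RECORD_AUDIO` permission has already been granted; the local names are only for illustration:
```kotlin
import android.media.AudioFormat
import android.media.AudioRecord
import android.media.MediaRecorder

// Values that are guaranteed to work on all devices, per the list above
val sampleRate = 44100
val channelConfig = AudioFormat.CHANNEL_IN_MONO
val audioFormat = AudioFormat.ENCODING_PCM_16BIT

// bufferSizeInBytes must be at least getMinBufferSize(...)
val minBufferSize = AudioRecord.getMinBufferSize(sampleRate, channelConfig, audioFormat)

val audioRecord = AudioRecord(
    MediaRecorder.AudioSource.MIC, // main microphone
    sampleRate,
    channelConfig,
    audioFormat,
    minBufferSize * 2 // larger than the minimum for safer recording under load
)
```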
AudioRecord Lifecycle#
The lifecycle states of `AudioRecord` are `STATE_UNINITIALIZED`, `STATE_INITIALIZED`, `RECORDSTATE_RECORDING`, and `RECORDSTATE_STOPPED`, corresponding to uninitialized, initialized, recording, and stopped, as shown in the diagram below:
A brief explanation:
- Before creation and after `release`, `AudioRecord` is in the `STATE_UNINITIALIZED` state.
- When an `AudioRecord` is created, it enters the `STATE_INITIALIZED` state.
- Calling `startRecording` moves it to the `RECORDSTATE_RECORDING` state.
- Calling `stop` moves it to the `RECORDSTATE_STOPPED` state.
So how do you obtain the state of an `AudioRecord`? Its state can be queried through `getState` and `getRecordingState`. To ensure correct usage, check the state before operating on an `AudioRecord` object.
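As a hedged sketch of that state check, assuming the `audioRecord` instance created earlier:
```kotlin
// Only start recording once the object reports it was initialized successfully
if (audioRecord.state == AudioRecord.STATE_INITIALIZED) {
    audioRecord.startRecording()
    // getRecordingState reflects whether recording actually started
    if (audioRecord.recordingState == AudioRecord.RECORDSTATE_RECORDING) {
        // safe to read audio data here
    }
}
```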
Reading Audio Data from AudioRecord#
`AudioRecord` provides three ways to read audio data, as follows:
// 1. Read audio data, audio format is AudioFormat#ENCODING_PCM_8BIT
int read(@NonNull byte[] audioData, int offsetInBytes, int sizeInBytes)
// 2. Read audio data, audio format is AudioFormat#ENCODING_PCM_16BIT
int read(@NonNull short[] audioData, int offsetInShorts, int sizeInShorts)
// 3. Read audio data into a ByteBuffer, see the next section
int read(@NonNull ByteBuffer audioBuffer, int sizeInBytes)
On success, `read` returns a value greater than or equal to 0 (the amount of data read). The common error codes are as follows:
- ERROR_INVALID_OPERATION: the `AudioRecord` is not properly initialized.
- ERROR_BAD_VALUE: the parameters are invalid.
- ERROR_DEAD_OBJECT: the object is no longer valid. If some audio data was already transferred, no error code is returned for that call; the error code will be returned on the next `read`.
The three `read` functions above read audio data from the hardware audio device. The main difference between the first two is the audio format: 8-bit versus 16-bit samples, corresponding to 2^8 and 2^16 quantization levels.
The third `read` function records audio data into a direct buffer (`DirectBuffer`). If the buffer passed in is not a direct buffer, the call always returns 0; in other words, the `audioBuffer` parameter of the third `read` must be a `DirectBuffer`, otherwise audio data cannot be read correctly. Note that the `position` of the `Buffer` remains unchanged after the call, the format of the data in the buffer depends on the format specified when creating the `AudioRecord`, and the bytes are stored in native byte order.
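A hedged sketch of the third overload, reusing the `audioRecord` and `minBufferSize` from earlier:
```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// The buffer must be direct, or read() always returns 0
val audioBuffer = ByteBuffer.allocateDirect(minBufferSize).order(ByteOrder.nativeOrder())
val read = audioRecord.read(audioBuffer, minBufferSize)
if (read > 0) {
    // position is unchanged after read(), so the valid data is [0, read)
    val pcm = ByteArray(read)
    audioBuffer.get(pcm, 0, read)
    audioBuffer.clear()
}
```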
Direct Buffer and Byte Order#
Direct buffers and byte order were both mentioned above; here is a brief explanation of each:
Direct Buffer#
`DirectBuffer` comes from Java NIO. Here is a brief look at the differences between ordinary buffers and direct buffers.
- Ordinary Buffer
ByteBuffer buf = ByteBuffer.allocate(1024);
public static ByteBuffer allocate(int capacity) {
if (capacity < 0)
throw new IllegalArgumentException();
return new HeapByteBuffer(capacity, capacity);
}
As can be seen, an ordinary buffer is allocated from the heap and managed by the JVM, which means it can be garbage collected at an appropriate time. GC is accompanied by memory compaction and copying, which can affect performance to some extent.
- Direct Buffer
ByteBuffer buf = ByteBuffer.allocateDirect(1024);
public static ByteBuffer allocateDirect(int capacity) {
// Android-changed: Android's DirectByteBuffers carry a MemoryRef.
// return new DirectByteBuffer(capacity);
DirectByteBuffer.MemoryRef memoryRef = new DirectByteBuffer.MemoryRef(capacity);
return new DirectByteBuffer(capacity, memoryRef);
}
The above is the implementation of `DirectBuffer` in Android. As can be seen, it is allocated from native memory. Allocating and releasing this kind of buffer is relatively expensive, but it resides outside the garbage-collected heap. It is generally intended for large, long-lived buffers, and should only be used when it brings a measurable performance improvement. Whether a buffer is a `DirectBuffer` can be checked with `isDirect`.
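A tiny sketch contrasting the two allocation paths and the `isDirect` check:
```kotlin
import java.nio.ByteBuffer

val heapBuffer = ByteBuffer.allocate(1024)         // JVM heap, garbage collected
val directBuffer = ByteBuffer.allocateDirect(1024) // native memory, outside the GC heap

println(heapBuffer.isDirect)   // false
println(directBuffer.isDirect) // true
```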
Byte Order#
Byte order refers to the way multi-byte values are stored in memory. There are two kinds, BIG-ENDIAN and LITTLE-ENDIAN, commonly referred to as network byte order and native byte order respectively:
- Native byte order on most platforms Android runs on (ARM, x86) is LITTLE-ENDIAN (low-byte first): the low-order byte is placed at the low address end of memory and the high-order byte at the high address end. The corresponding concept is network byte order.
- Network byte order generally refers to the byte order used in the TCP/IP protocol suite, which defines byte order as BIG-ENDIAN in each of its layers, so network byte order usually means BIG-ENDIAN.
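In code, the native order can be queried through `java.nio.ByteOrder`; a small sketch:
```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// LITTLE_ENDIAN on typical ARM/x86 Android devices
println(ByteOrder.nativeOrder())

// A direct buffer used with AudioRecord.read should be set to native order,
// since AudioRecord stores samples in native byte order
val buffer = ByteBuffer.allocateDirect(4096).order(ByteOrder.nativeOrder())
```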
Using AudioRecord#
In the previous article, Camera2, MediaCodec recording mp4, only video was recorded, with the focus on the use of `MediaCodec`. Here, `AudioRecord` is added on top of the video recording to capture audio and mux it into the `MP4` file. The key steps are as follows:
- Start a thread that uses `AudioRecord` to read audio data from the hardware; a dedicated thread avoids stuttering. The full code example is provided at the end of the article, see `AudioEncode2`:
/**
 * Audio reading Runnable
 */
class RecordRunnable : Runnable {
    override fun run() {
        val byteArray = ByteArray(bufferSize)
        // Recording state: -1 is the default state, 1 is recording, 0 is stopped
        while (recording == 1) {
            val result = mAudioRecord.read(byteArray, 0, bufferSize)
            if (result > 0) {
                val resultArray = ByteArray(result)
                System.arraycopy(byteArray, 0, resultArray, 0, result)
                quene.offer(resultArray)
            }
        }
        // Custom end-of-stream marker
        if (recording == 0) {
            val stopArray = byteArrayOf((-100).toByte())
            quene.offer(stopArray)
        }
    }
}
It is worth mentioning that if `AudioRecord` alone is used to record audio, the data read can be written directly to a file, as sketched below.
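A hedged sketch of that direct-to-file path, assuming `quene` is a blocking queue such as a `LinkedBlockingQueue<ByteArray>` and `pcmFile` is a destination chosen by the caller (the output is headerless raw PCM, not a playable WAV):
```kotlin
import java.io.File
import java.io.FileOutputStream

fun drainToFile(pcmFile: File) {
    FileOutputStream(pcmFile).use { out ->
        while (true) {
            val chunk = quene.take() // blocks until the recording thread offers data
            // Stop on the custom end-of-stream marker from RecordRunnable
            if (chunk.size == 1 && chunk[0] == (-100).toByte()) break
            out.write(chunk)
        }
    }
}
```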
- To mux the captured audio data into the `MP4` file, it must first be encoded. The audio encoder is configured as follows:
// Audio data encoder configuration
private fun initAudioCodec() {
L.i(TAG, "init Codec start")
try {
val mediaFormat =
MediaFormat.createAudioFormat(
MediaFormat.MIMETYPE_AUDIO_AAC,
RecordConfig.SAMPLE_RATE,
2
)
mAudioCodec = MediaCodec.createEncoderByType(MediaFormat.MIMETYPE_AUDIO_AAC)
mediaFormat.setInteger(MediaFormat.KEY_BIT_RATE, 96000)
mediaFormat.setInteger(
MediaFormat.KEY_AAC_PROFILE,
MediaCodecInfo.CodecProfileLevel.AACObjectLC
)
mediaFormat.setInteger(MediaFormat.KEY_MAX_INPUT_SIZE, 8192)
mAudioCodec.setCallback(this)
mAudioCodec.configure(mediaFormat, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE)
} catch (e: Exception) {
L.i(TAG, "init error:${e.message}")
}
L.i(TAG, "init Codec end")
}
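Once configured, the encoder still has to be started before the asynchronous callbacks begin to fire. A minimal sketch of driving its lifecycle, assuming the `mAudioCodec` from above:
```kotlin
initAudioCodec()     // configure as shown above
mAudioCodec.start()  // callbacks registered via setCallback begin firing now
// ... recording ...
mAudioCodec.stop()
mAudioCodec.release()
```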
Encoding again means using `MediaCodec`; for details you can refer to the previous two articles in this series. Here the asynchronous processing mode of `MediaCodec` is used to encode the audio data. That code is not repeated in full, but one point deserves attention: when filling and releasing buffers, the state must be checked. If input buffers are not released back continuously, the codec eventually has no available input buffer and audio encoding fails; the end of the stream must also be handled, as in the sketch below.
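A hedged sketch of those two callbacks, reusing `quene`, `mAudioMuxer`, and `RecordConfig` from the surrounding code; `presentationTimeUs` is only a placeholder for however the timestamps are actually computed:
```kotlin
override fun onInputBufferAvailable(codec: MediaCodec, index: Int) {
    val inputBuffer = codec.getInputBuffer(index) ?: return
    // Placeholder: in real code derive this from bytes consumed and sample rate
    val presentationTimeUs = 0L
    val data = quene.poll()
    when {
        data == null -> {
            // No PCM ready yet: return the buffer empty so the codec keeps cycling
            codec.queueInputBuffer(index, 0, 0, presentationTimeUs, 0)
        }
        data.size == 1 && data[0] == (-100).toByte() -> {
            // Custom end-of-stream marker sent by the recording thread
            codec.queueInputBuffer(index, 0, 0, presentationTimeUs,
                MediaCodec.BUFFER_FLAG_END_OF_STREAM)
        }
        else -> {
            inputBuffer.clear()
            inputBuffer.put(data)
            codec.queueInputBuffer(index, 0, data.size, presentationTimeUs, 0)
        }
    }
}

override fun onOutputBufferAvailable(codec: MediaCodec, index: Int, info: MediaCodec.BufferInfo) {
    val outputBuffer = codec.getOutputBuffer(index)
    if (outputBuffer != null && info.size > 0 && RecordConfig.isMuxerStart) {
        // Write the encoded AAC sample to the muxer on the audio track
        mAudioMuxer.writeSampleData(RecordConfig.audioTrackIndex, outputBuffer, info)
    }
    // Always release the output buffer back, or the codec runs out of buffers
    codec.releaseOutputBuffer(index, false)
}
```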
- Files are muxed with `MediaMuxer`. Before `MediaMuxer` is started, both the video track and the audio track must have been added.
override fun onOutputFormatChanged(codec: MediaCodec, format: MediaFormat) {
L.i(TAG, "onOutputFormatChanged format:${format}")
// Add audio track
addAudioTrack(format)
// Start MediaMuxer only when both tracks are added; the audio track was
// just added above, so only the video track needs checking here
if (RecordConfig.videoTrackIndex != -1) {
mAudioMuxer.start()
RecordConfig.isMuxerStart = true
L.i(TAG, "onOutputFormatChanged isMuxerStart:${RecordConfig.isMuxerStart}")
}
}
// Add audio track
private fun addAudioTrack(format: MediaFormat) {
L.i(TAG, "addAudioTrack format:${format}")
RecordConfig.audioTrackIndex = mAudioMuxer.addTrack(format)
RecordConfig.isAddAudioTrack = true
}
// ...
That covers the basic usage of `AudioRecord`.