The first two articles mainly introduced the basic usage of OpenGL ES and its coordinate system mapping. Next, we will use MediaPlayer and OpenGL ES to implement basic video rendering and video frame correction, with the main content as follows:
- SurfaceTexture
- Rendering Video
- Frame Correction
SurfaceTexture#
SurfaceTexture was introduced in Android 3.0. It does not directly display image streams; instead, it captures frames from the image stream as external textures for OpenGL. The image stream mainly comes from camera previews and video decoding, which allows secondary processing of the stream, such as filters and effects. SurfaceTexture can be understood as a combination of a Surface and an OpenGL ES texture.
The Surface created by SurfaceTexture is the producer of data, while SurfaceTexture itself is the corresponding consumer. The Surface receives media data and sends it to the SurfaceTexture. When updateTexImage is called, the content of the texture object created by SurfaceTexture is updated to the latest image frame, converting the frame into a GL texture and binding it to the GL_TEXTURE_EXTERNAL_OES texture target. updateTexImage must be called on the thread that owns the OpenGL ES context, typically in onDrawFrame.
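In code, the relationship looks roughly like this (a minimal sketch, not the article's implementation; oesTextureId is assumed to be an OES texture created on the GL thread):

// Consumer: SurfaceTexture wraps an OES texture created on the GL thread
val surfaceTexture = SurfaceTexture(oesTextureId)
// Producer: a Surface backed by that SurfaceTexture, handed to MediaPlayer (or the camera)
val surface = Surface(surfaceTexture)

// Later, on the GL thread (e.g. at the start of onDrawFrame):
surfaceTexture.updateTexImage() // latch the newest frame into the OES texture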
Rendering Video#
Everyone should be very familiar with how MediaPlayer plays videos, so I won't elaborate here. With the introduction to SurfaceTexture in the previous section, implementing video rendering with OpenGL ES is very simple. The vertex coordinates and texture coordinates are defined as follows:
// Vertex coordinates
private val vertexCoordinates = floatArrayOf(
    1.0f, 1.0f,
    -1.0f, 1.0f,
    -1.0f, -1.0f,
    1.0f, -1.0f
)

// Texture coordinates
private val textureCoordinates = floatArrayOf(
    1.0f, 0.0f,
    0.0f, 0.0f,
    0.0f, 1.0f,
    1.0f, 1.0f
)
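These arrays are typically uploaded through direct, native-order FloatBuffers before being passed to glVertexAttribPointer; a minimal helper sketch (the extension function name is an assumption, not the project's utility):

import java.nio.ByteBuffer
import java.nio.ByteOrder
import java.nio.FloatBuffer

// Wrap a float array in a direct buffer with native byte order, as OpenGL ES requires
fun FloatArray.toFloatBuffer(): FloatBuffer =
    ByteBuffer.allocateDirect(size * 4) // 4 bytes per float
        .order(ByteOrder.nativeOrder())
        .asFloatBuffer()
        .apply {
            put(this@toFloatBuffer)
            position(0)
        }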
The texture coordinates must correspond to the vertex coordinates. Briefly, the vertex coordinates use the OpenGL coordinate system, whose origin is at the center of the screen, while the texture coordinates correspond to screen coordinates, whose origin is at the top-left corner. Generate the texture ID and activate the binding as follows:
/**
 * Generate texture ID
 */
fun createTextureId(): Int {
    val tex = IntArray(1)
    GLES20.glGenTextures(1, tex, 0)
    if (tex[0] == 0) {
        throw RuntimeException("create OES texture failed, ${Thread.currentThread().name}")
    }
    return tex[0]
}
/**
 * Activate and bind the OES texture
 * The OES target handles the conversion from YUV format to RGB automatically
 */
fun activeBindOESTexture(textureId: Int) {
    // Activate texture unit
    GLES20.glActiveTexture(GLES20.GL_TEXTURE0)
    // Bind texture ID to the texture target of the texture unit
    GLES20.glBindTexture(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, textureId)
    // Set texture filtering and wrapping parameters
    GLES20.glTexParameterf(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, GL10.GL_TEXTURE_MIN_FILTER, GL10.GL_NEAREST.toFloat())
    GLES20.glTexParameterf(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, GL10.GL_TEXTURE_MAG_FILTER, GL10.GL_LINEAR.toFloat())
    GLES20.glTexParameterf(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, GL10.GL_TEXTURE_WRAP_S, GL10.GL_CLAMP_TO_EDGE.toFloat())
    GLES20.glTexParameterf(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, GL10.GL_TEXTURE_WRAP_T, GL10.GL_CLAMP_TO_EDGE.toFloat())
    Log.d(TAG, "activeBindOESTexture: texture id $textureId")
}
Bind the texture ID to the texture target of the texture unit. The texture target chosen here is GL_TEXTURE_EXTERNAL_OES, which automatically completes the conversion from YUV format to RGB. Now let's look at the shaders. The vertex shader receives the texture coordinates and saves them to vTextureCoordinate for use by the fragment shader, as follows:
// Vertex shader
attribute vec4 aPosition;   // Vertex coordinates
attribute vec2 aCoordinate; // Texture coordinates
varying vec2 vTextureCoordinate;
void main() {
    gl_Position = aPosition;
    vTextureCoordinate = aCoordinate;
}

// Fragment shader
#extension GL_OES_EGL_image_external : require
precision mediump float;
varying vec2 vTextureCoordinate;
uniform samplerExternalOES uTexture; // OES texture
void main() {
    gl_FragColor = texture2D(uTexture, vTextureCoordinate);
}
The project's code for shader compilation, program linking, and usage is omitted here, as it was covered in previous articles; you can also check the source code at the end.
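As a reminder of what that omitted step looks like, here is a minimal compile-and-link sketch (the ShaderHelper name and structure are assumptions for illustration, not the project's actual utilities):

import android.opengl.GLES20
import android.util.Log

object ShaderHelper {
    private const val TAG = "ShaderHelper"

    // Compile a single shader of the given type (GL_VERTEX_SHADER / GL_FRAGMENT_SHADER)
    fun compileShader(type: Int, source: String): Int {
        val shader = GLES20.glCreateShader(type)
        GLES20.glShaderSource(shader, source)
        GLES20.glCompileShader(shader)
        val status = IntArray(1)
        GLES20.glGetShaderiv(shader, GLES20.GL_COMPILE_STATUS, status, 0)
        if (status[0] == 0) {
            Log.e(TAG, "compile failed: ${GLES20.glGetShaderInfoLog(shader)}")
            GLES20.glDeleteShader(shader)
            return 0
        }
        return shader
    }

    // Link a vertex shader and a fragment shader into a program
    fun linkProgram(vertexShader: Int, fragmentShader: Int): Int {
        val program = GLES20.glCreateProgram()
        GLES20.glAttachShader(program, vertexShader)
        GLES20.glAttachShader(program, fragmentShader)
        GLES20.glLinkProgram(program)
        val status = IntArray(1)
        GLES20.glGetProgramiv(program, GLES20.GL_LINK_STATUS, status, 0)
        if (status[0] == 0) {
            Log.e(TAG, "link failed: ${GLES20.glGetProgramInfoLog(program)}")
            GLES20.glDeleteProgram(program)
            return 0
        }
        return program
    }
}

With the shaders compiled and linked, the renderer is defined as follows: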
class PlayRenderer(
    private var context: Context,
    private var glSurfaceView: GLSurfaceView
) : GLSurfaceView.Renderer,
    VideoRender.OnNotifyFrameUpdateListener, MediaPlayer.OnPreparedListener,
    MediaPlayer.OnVideoSizeChangedListener, MediaPlayer.OnCompletionListener,
    MediaPlayer.OnErrorListener {

    companion object {
        private const val TAG = "PlayRenderer"
    }

    private lateinit var videoRender: VideoRender
    private lateinit var mediaPlayer: MediaPlayer
    private val projectionMatrix = FloatArray(16)
    private val viewMatrix = FloatArray(16)
    private val vPMatrix = FloatArray(16)

    // For calculating the video aspect ratio, see below
    private var screenWidth: Int = -1
    private var screenHeight: Int = -1
    private var videoWidth: Int = -1
    private var videoHeight: Int = -1

    override fun onSurfaceCreated(gl: GL10?, config: EGLConfig?) {
        L.i(TAG, "onSurfaceCreated")
        GLES20.glClearColor(0f, 0f, 0f, 0f)
        videoRender = VideoRender(context)
        videoRender.setTextureID(TextureHelper.createTextureId())
        videoRender.onNotifyFrameUpdateListener = this
        initMediaPlayer()
    }

    override fun onSurfaceChanged(gl: GL10?, width: Int, height: Int) {
        L.i(TAG, "onSurfaceChanged > width:$width,height:$height")
        screenWidth = width
        screenHeight = height
        GLES20.glViewport(0, 0, width, height)
    }

    override fun onDrawFrame(gl: GL10) {
        L.i(TAG, "onDrawFrame")
        gl.glClear(GL10.GL_COLOR_BUFFER_BIT or GL10.GL_DEPTH_BUFFER_BIT)
        videoRender.draw(vPMatrix)
    }

    override fun onPrepared(mp: MediaPlayer?) {
        L.i(OpenGLActivity.TAG, "onPrepared")
        mediaPlayer.start()
    }

    override fun onVideoSizeChanged(mp: MediaPlayer?, width: Int, height: Int) {
        L.i(OpenGLActivity.TAG, "onVideoSizeChanged > width:$width ,height:$height")
        this.videoWidth = width
        this.videoHeight = height
    }

    override fun onCompletion(mp: MediaPlayer?) {
        L.i(OpenGLActivity.TAG, "onCompletion")
    }

    override fun onError(mp: MediaPlayer?, what: Int, extra: Int): Boolean {
        L.i(OpenGLActivity.TAG, "error > what:$what,extra:$extra")
        return true
    }

    private fun initMediaPlayer() {
        mediaPlayer = MediaPlayer()
        mediaPlayer.setOnPreparedListener(this)
        mediaPlayer.setOnVideoSizeChangedListener(this)
        mediaPlayer.setOnCompletionListener(this)
        mediaPlayer.setOnErrorListener(this)
        mediaPlayer.setDataSource(Environment.getExternalStorageDirectory().absolutePath + "/video.mp4")
        mediaPlayer.setSurface(videoRender.getSurface())
        mediaPlayer.prepareAsync()
    }

    // Notify a render request
    override fun onNotifyUpdate() {
        glSurfaceView.requestRender()
    }

    fun destroy() {
        mediaPlayer.stop()
        mediaPlayer.release()
    }
}
The VideoRender in the above code mainly handles the actual rendering; that code is quite similar to the previous article, so it is not pasted here, but a rough sketch is given below. When using OpenGL ES to render video, the updateTexImage method of SurfaceTexture must be called to update the image frame, and it must be called within the OpenGL ES context. You can set the rendering mode of GLSurfaceView to RENDERMODE_WHEN_DIRTY to avoid continuous drawing, and only call requestRender when onFrameAvailable fires, meaning a new frame is available, to reduce unnecessary work.
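A rough sketch of what such a VideoRender can look like follows. The names setTextureID, getSurface, onNotifyFrameUpdateListener, and draw are taken from how PlayRenderer uses them above, but the bodies are assumptions for illustration, not the article's actual implementation:

class VideoRender(private val context: Context) : SurfaceTexture.OnFrameAvailableListener {

    interface OnNotifyFrameUpdateListener {
        fun onNotifyUpdate()
    }

    var onNotifyFrameUpdateListener: OnNotifyFrameUpdateListener? = null

    private var textureId = 0
    private lateinit var surfaceTexture: SurfaceTexture
    private lateinit var surface: Surface

    // Called from onSurfaceCreated with the OES texture id created on the GL thread
    fun setTextureID(id: Int) {
        textureId = id
        TextureHelper.activeBindOESTexture(textureId)
        surfaceTexture = SurfaceTexture(textureId)
        surfaceTexture.setOnFrameAvailableListener(this)
        surface = Surface(surfaceTexture)
    }

    // Handed to MediaPlayer.setSurface() as the producer side
    fun getSurface(): Surface = surface

    // A new frame was produced: ask the GLSurfaceView to render (RENDERMODE_WHEN_DIRTY)
    override fun onFrameAvailable(st: SurfaceTexture) {
        onNotifyFrameUpdateListener?.onNotifyUpdate()
    }

    // Called from onDrawFrame on the GL thread
    fun draw(mvpMatrix: FloatArray) {
        // Latch the latest video frame into the OES texture
        surfaceTexture.updateTexImage()
        // ... use the program, pass mvpMatrix / aPosition / aCoordinate, and draw the quad ...
    }
}

On the activity side, the GLSurfaceView is typically configured with setEGLContextClientVersion(2), the renderer, and renderMode = GLSurfaceView.RENDERMODE_WHEN_DIRTY, so that requestRender drives the drawing.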
Here is the original video rendering effect image:
Frame Correction#
The video above is played full screen, but the screen resolution and the video resolution differ, so the video frame is stretched. We therefore need to calculate an appropriate frame size from the screen resolution and the video resolution. The earlier article on coordinate mapping already covered the basic adaptation for triangle deformation, and video is similar: a video frame is essentially a rectangle.
Projection mainly includes orthographic projection and perspective projection. Orthographic projection is generally used for rendering 2D images, such as ordinary video rendering, while perspective projection has the characteristic of making near objects appear larger and distant objects smaller, typically used for 3D image rendering, such as VR rendering. Therefore, we use orthographic projection to correct the image.
First, let's look at the modifications to the shader, mainly the changes in the vertex shader, as follows:
attribute vec4 aPosition;
attribute vec2 aCoordinate;
uniform mat4 uMVPMatrix;
varying vec2 vTextureCoordinate;
void main() {
    gl_Position = uMVPMatrix * aPosition;
    vTextureCoordinate = aCoordinate;
}
The key is to calculate the matrix uMVPMatrix, which is the product of the projection matrix and the view matrix. In OpenGL ES on Android, matrix operations are performed with the Matrix class; for orthographic projection, Matrix.orthoM generates the projection matrix, calculated as follows:
// Calculate the video scaling ratio (projection matrix)
val screenRatio = screenWidth / screenHeight.toFloat()
val videoRatio = videoWidth / videoHeight.toFloat()
val ratio: Float
if (screenWidth > screenHeight) {
    if (videoRatio >= screenRatio) {
        ratio = videoRatio / screenRatio
        Matrix.orthoM(
            projectionMatrix, 0,
            -1f, 1f, -ratio, ratio, 3f, 5f
        )
    } else {
        ratio = screenRatio / videoRatio
        Matrix.orthoM(
            projectionMatrix, 0,
            -ratio, ratio, -1f, 1f, 3f, 5f
        )
    }
} else {
    if (videoRatio >= screenRatio) {
        ratio = videoRatio / screenRatio
        Matrix.orthoM(
            projectionMatrix, 0,
            -1f, 1f, -ratio, ratio, 3f, 5f
        )
    } else {
        ratio = screenRatio / videoRatio
        Matrix.orthoM(
            projectionMatrix, 0,
            -ratio, ratio, -1f, 1f, 3f, 5f
        )
    }
}
The code above chooses the appropriate orthographic projection parameters based on the screen aspect ratio and the original video aspect ratio. The calculation is similar to image scaling, and you can work through it yourself; the principle is that the video frame must be displayed completely within the screen. The ratio above is the boundary of the orthographic projection frustum. Taking my phone as an example: for convenience, let the screen width equal the video width, with a screen resolution of 1080 * 2260 and a video resolution of 1080 * 540; then ratio is 2260 / 540, approximately 4.18. Clearly, if we instead took the screen height as the reference, a video height of 2260 would mean a video width of 4520, far exceeding the screen width, so we adapt based on the video width. Now let's look at the camera position settings:
// Set the camera position (view matrix)
Matrix.setLookAtM(
    viewMatrix, 0,
    0.0f, 0.0f, 5.0f, // Camera position
    0.0f, 0.0f, 0.0f, // Target position
    0.0f, 1.0f, 0.0f  // Camera up direction
)
The positive z-axis points out of the screen, and the camera position (0, 0, 5) means the camera sits 5 units in front of the screen along the z-axis. This distance must fall between the near and far planes of the frustum, otherwise nothing will be visible; in this case, it should be between 3 and 5. The target position (0, 0, 0) is the screen itself, i.e. the plane formed by the x and y axes, and the camera up direction (0, 1, 0) is the positive y-axis. Finally, we compute the combined projection and view transformation, merging projectionMatrix and viewMatrix into vPMatrix through matrix multiplication:
// Calculate projection and view transformation
Matrix.multiplyMM(vPMatrix, 0, projectionMatrix, 0, viewMatrix, 0)
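For reference, the matrix setup above can be grouped into a single helper that PlayRenderer calls once both the screen size and the video size are known. The name initMatrix and the call site are assumptions for illustration, and the two identical outer branches of the snippet above are collapsed here:

// Assumed helper inside PlayRenderer; call it once screenWidth/screenHeight (from
// onSurfaceChanged) and videoWidth/videoHeight (from onVideoSizeChanged) are all set.
private fun initMatrix() {
    if (screenWidth <= 0 || screenHeight <= 0 || videoWidth <= 0 || videoHeight <= 0) return

    val screenRatio = screenWidth / screenHeight.toFloat()
    val videoRatio = videoWidth / videoHeight.toFloat()
    if (videoRatio >= screenRatio) {
        // Video is relatively wider than the screen: extend the frustum vertically
        val ratio = videoRatio / screenRatio
        Matrix.orthoM(projectionMatrix, 0, -1f, 1f, -ratio, ratio, 3f, 5f)
    } else {
        // Video is relatively taller than the screen: extend the frustum horizontally
        val ratio = screenRatio / videoRatio
        Matrix.orthoM(projectionMatrix, 0, -ratio, ratio, -1f, 1f, 3f, 5f)
    }

    Matrix.setLookAtM(viewMatrix, 0, 0f, 0f, 5f, 0f, 0f, 0f, 0f, 1f, 0f)
    Matrix.multiplyMM(vPMatrix, 0, projectionMatrix, 0, viewMatrix, 0)
}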
To correct the image, we need the original size of the video, which can be obtained in the onVideoSizeChanged callback of MediaPlayer and used to initialize the matrix data, as in the sketch above. Now let's look at the effect after frame correction:
At this point, video rendering with OpenGL ES is complete. You can get the source code with the keyword 【RenderVideo】.