The first two articles mainly introduced the basic usage of OpenGL ES and its coordinate system mapping. Next, we will use MediaPlayer and OpenGL ES to implement basic video rendering and video frame correction, with the main content as follows:
- SurfaceTexture
- Rendering Video
- Frame Correction
SurfaceTexture#
SurfaceTexture was introduced in Android 3.0. It does not directly display image streams; instead, it captures frames from the image stream as external textures for OpenGL. The image stream mainly comes from camera previews and video decoding, which allows secondary processing of the stream, such as filters and effects. SurfaceTexture can be understood as a combination of a Surface and an OpenGL ES texture.
The Surface created by SurfaceTexture is the producer of data, while SurfaceTexture itself is the corresponding consumer. The Surface receives media data and sends it to the SurfaceTexture. When updateTexImage is called, the content of the texture object created by SurfaceTexture is updated to the latest image frame, converting the frame into a GL texture and binding it to the GL_TEXTURE_EXTERNAL_OES texture target. updateTexImage must be called on the thread that owns the OpenGL ES context, typically in onDrawFrame.
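In code, the relationship looks roughly like this (a minimal sketch, not the article's implementation; oesTextureId is assumed to be an OES texture created on the GL thread):

// Consumer: SurfaceTexture wraps an OES texture created on the GL thread
val surfaceTexture = SurfaceTexture(oesTextureId)
// Producer: a Surface backed by that SurfaceTexture, handed to MediaPlayer (or the camera)
val surface = Surface(surfaceTexture)

// Later, on the GL thread (e.g. at the start of onDrawFrame):
surfaceTexture.updateTexImage() // latch the newest frame into the OES texture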
Rendering Video#
Everyone should be very familiar with how MediaPlayer plays videos, so I won't elaborate here. With the introduction to SurfaceTexture in the previous section, implementing video rendering with OpenGL ES is very simple. The vertex coordinates and texture coordinates are defined as follows:
// Vertex coordinates
private val vertexCoordinates = floatArrayOf(
    1.0f, 1.0f,
    -1.0f, 1.0f,
    -1.0f, -1.0f,
    1.0f, -1.0f
)

// Texture coordinates
private val textureCoordinates = floatArrayOf(
    1.0f, 0.0f,
    0.0f, 0.0f,
    0.0f, 1.0f,
    1.0f, 1.0f
)
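These arrays are typically uploaded through direct, native-order FloatBuffers before being passed to glVertexAttribPointer; a minimal helper sketch (the extension function name is an assumption, not the project's utility):

import java.nio.ByteBuffer
import java.nio.ByteOrder
import java.nio.FloatBuffer

// Wrap a float array in a direct buffer with native byte order, as OpenGL ES requires
fun FloatArray.toFloatBuffer(): FloatBuffer =
    ByteBuffer.allocateDirect(size * 4) // 4 bytes per float
        .order(ByteOrder.nativeOrder())
        .asFloatBuffer()
        .apply {
            put(this@toFloatBuffer)
            position(0)
        }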
The texture coordinates must correspond to the vertex coordinates. Briefly, the vertex coordinates use the OpenGL coordinate system, whose origin is at the center of the screen, while the texture coordinates correspond to screen coordinates, whose origin is at the top-left corner. Generate the texture ID and activate the binding as follows:
/**
 * Generate texture ID
 */
fun createTextureId(): Int {
    val tex = IntArray(1)
    GLES20.glGenTextures(1, tex, 0)
    if (tex[0] == 0) {
        throw RuntimeException("create OES texture failed, ${Thread.currentThread().name}")
    }
    return tex[0]
}
/**
 * Activate and bind the OES texture
 * The OES target handles the conversion from YUV format to RGB automatically
 */
fun activeBindOESTexture(textureId: Int) {
    // Activate texture unit
    GLES20.glActiveTexture(GLES20.GL_TEXTURE0)
    // Bind texture ID to the texture target of the texture unit
    GLES20.glBindTexture(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, textureId)
    // Set texture filtering and wrapping parameters
    GLES20.glTexParameterf(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, GL10.GL_TEXTURE_MIN_FILTER, GL10.GL_NEAREST.toFloat())
    GLES20.glTexParameterf(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, GL10.GL_TEXTURE_MAG_FILTER, GL10.GL_LINEAR.toFloat())
    GLES20.glTexParameterf(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, GL10.GL_TEXTURE_WRAP_S, GL10.GL_CLAMP_TO_EDGE.toFloat())
    GLES20.glTexParameterf(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, GL10.GL_TEXTURE_WRAP_T, GL10.GL_CLAMP_TO_EDGE.toFloat())
    Log.d(TAG, "activeBindOESTexture: texture id $textureId")
}
Bind the texture ID to the texture target of the texture unit. The texture target chosen here is GL_TEXTURE_EXTERNAL_OES, which automatically completes the conversion from YUV format to RGB. Now let's look at the shaders. The vertex shader receives the texture coordinates and saves them to vTextureCoordinate for use by the fragment shader, as follows:
// Vertex shader
attribute vec4 aPosition;   // Vertex coordinates
attribute vec2 aCoordinate; // Texture coordinates
varying vec2 vTextureCoordinate;
void main() {
    gl_Position = aPosition;
    vTextureCoordinate = aCoordinate;
}

// Fragment shader
#extension GL_OES_EGL_image_external : require
precision mediump float;
varying vec2 vTextureCoordinate;
uniform samplerExternalOES uTexture; // OES texture
void main() {
    gl_FragColor = texture2D(uTexture, vTextureCoordinate);
}
The project's code for shader compilation, program linking, and usage is omitted here, as it was covered in previous articles; you can also check the source code at the end.
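As a reminder of what that omitted step looks like, here is a minimal compile-and-link sketch (the ShaderHelper name and structure are assumptions for illustration, not the project's actual utilities):

import android.opengl.GLES20
import android.util.Log

object ShaderHelper {
    private const val TAG = "ShaderHelper"

    // Compile a single shader of the given type (GL_VERTEX_SHADER / GL_FRAGMENT_SHADER)
    fun compileShader(type: Int, source: String): Int {
        val shader = GLES20.glCreateShader(type)
        GLES20.glShaderSource(shader, source)
        GLES20.glCompileShader(shader)
        val status = IntArray(1)
        GLES20.glGetShaderiv(shader, GLES20.GL_COMPILE_STATUS, status, 0)
        if (status[0] == 0) {
            Log.e(TAG, "compile failed: ${GLES20.glGetShaderInfoLog(shader)}")
            GLES20.glDeleteShader(shader)
            return 0
        }
        return shader
    }

    // Link a vertex shader and a fragment shader into a program
    fun linkProgram(vertexShader: Int, fragmentShader: Int): Int {
        val program = GLES20.glCreateProgram()
        GLES20.glAttachShader(program, vertexShader)
        GLES20.glAttachShader(program, fragmentShader)
        GLES20.glLinkProgram(program)
        val status = IntArray(1)
        GLES20.glGetProgramiv(program, GLES20.GL_LINK_STATUS, status, 0)
        if (status[0] == 0) {
            Log.e(TAG, "link failed: ${GLES20.glGetProgramInfoLog(program)}")
            GLES20.glDeleteProgram(program)
            return 0
        }
        return program
    }
}

With the shaders compiled and linked, the renderer is defined as follows: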
class PlayRenderer(
    private var context: Context,
    private var glSurfaceView: GLSurfaceView
) : GLSurfaceView.Renderer,
    VideoRender.OnNotifyFrameUpdateListener, MediaPlayer.OnPreparedListener,
    MediaPlayer.OnVideoSizeChangedListener, MediaPlayer.OnCompletionListener,
    MediaPlayer.OnErrorListener {

    companion object {
        private const val TAG = "PlayRenderer"
    }

    private lateinit var videoRender: VideoRender
    private lateinit var mediaPlayer: MediaPlayer
    private val projectionMatrix = FloatArray(16)
    private val viewMatrix = FloatArray(16)
    private val vPMatrix = FloatArray(16)

    // For calculating the video aspect ratio, see below
    private var screenWidth: Int = -1
    private var screenHeight: Int = -1
    private var videoWidth: Int = -1
    private var videoHeight: Int = -1

    override fun onSurfaceCreated(gl: GL10?, config: EGLConfig?) {
        L.i(TAG, "onSurfaceCreated")
        GLES20.glClearColor(0f, 0f, 0f, 0f)
        videoRender = VideoRender(context)
        videoRender.setTextureID(TextureHelper.createTextureId())
        videoRender.onNotifyFrameUpdateListener = this
        initMediaPlayer()
    }

    override fun onSurfaceChanged(gl: GL10?, width: Int, height: Int) {
        L.i(TAG, "onSurfaceChanged > width:$width,height:$height")
        screenWidth = width
        screenHeight = height
        GLES20.glViewport(0, 0, width, height)
    }

    override fun onDrawFrame(gl: GL10) {
        L.i(TAG, "onDrawFrame")
        gl.glClear(GL10.GL_COLOR_BUFFER_BIT or GL10.GL_DEPTH_BUFFER_BIT)
        videoRender.draw(vPMatrix)
    }

    override fun onPrepared(mp: MediaPlayer?) {
        L.i(OpenGLActivity.TAG, "onPrepared")
        mediaPlayer.start()
    }

    override fun onVideoSizeChanged(mp: MediaPlayer?, width: Int, height: Int) {
        L.i(OpenGLActivity.TAG, "onVideoSizeChanged > width:$width ,height:$height")
        this.videoWidth = width
        this.videoHeight = height
    }

    override fun onCompletion(mp: MediaPlayer?) {
        L.i(OpenGLActivity.TAG, "onCompletion")
    }

    override fun onError(mp: MediaPlayer?, what: Int, extra: Int): Boolean {
        L.i(OpenGLActivity.TAG, "error > what:$what,extra:$extra")
        return true
    }

    private fun initMediaPlayer() {
        mediaPlayer = MediaPlayer()
        mediaPlayer.setOnPreparedListener(this)
        mediaPlayer.setOnVideoSizeChangedListener(this)
        mediaPlayer.setOnCompletionListener(this)
        mediaPlayer.setOnErrorListener(this)
        mediaPlayer.setDataSource(Environment.getExternalStorageDirectory().absolutePath + "/video.mp4")
        mediaPlayer.setSurface(videoRender.getSurface())
        mediaPlayer.prepareAsync()
    }

    // Notify a render request
    override fun onNotifyUpdate() {
        glSurfaceView.requestRender()
    }

    fun destroy() {
        mediaPlayer.stop()
        mediaPlayer.release()
    }
}
The VideoRender in the above code mainly handles the actual rendering; that code is quite similar to the previous article, so it is not pasted here, but a rough sketch is given below. When using OpenGL ES to render video, the updateTexImage method of SurfaceTexture must be called to update the image frame, and it must be called within the OpenGL ES context. You can set the rendering mode of GLSurfaceView to RENDERMODE_WHEN_DIRTY to avoid continuous drawing, and only call requestRender when onFrameAvailable fires, meaning a new frame is available, to reduce unnecessary work.
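A rough sketch of what such a VideoRender can look like follows. The names setTextureID, getSurface, onNotifyFrameUpdateListener, and draw are taken from how PlayRenderer uses them above, but the bodies are assumptions for illustration, not the article's actual implementation:

class VideoRender(private val context: Context) : SurfaceTexture.OnFrameAvailableListener {

    interface OnNotifyFrameUpdateListener {
        fun onNotifyUpdate()
    }

    var onNotifyFrameUpdateListener: OnNotifyFrameUpdateListener? = null

    private var textureId = 0
    private lateinit var surfaceTexture: SurfaceTexture
    private lateinit var surface: Surface

    // Called from onSurfaceCreated with the OES texture id created on the GL thread
    fun setTextureID(id: Int) {
        textureId = id
        TextureHelper.activeBindOESTexture(textureId)
        surfaceTexture = SurfaceTexture(textureId)
        surfaceTexture.setOnFrameAvailableListener(this)
        surface = Surface(surfaceTexture)
    }

    // Handed to MediaPlayer.setSurface() as the producer side
    fun getSurface(): Surface = surface

    // A new frame was produced: ask the GLSurfaceView to render (RENDERMODE_WHEN_DIRTY)
    override fun onFrameAvailable(st: SurfaceTexture) {
        onNotifyFrameUpdateListener?.onNotifyUpdate()
    }

    // Called from onDrawFrame on the GL thread
    fun draw(mvpMatrix: FloatArray) {
        // Latch the latest video frame into the OES texture
        surfaceTexture.updateTexImage()
        // ... use the program, pass mvpMatrix / aPosition / aCoordinate, and draw the quad ...
    }
}

On the activity side, the GLSurfaceView is typically configured with setEGLContextClientVersion(2), the renderer, and renderMode = GLSurfaceView.RENDERMODE_WHEN_DIRTY, so that requestRender drives the drawing.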
Here is the original video rendering effect image:
Frame Correction#
The video above is played full screen, but the screen resolution and the video resolution differ, so the video frame is stretched. We therefore need to calculate an appropriate frame size from the screen resolution and the video resolution. The earlier article on coordinate mapping already covered the basic adaptation for triangle deformation, and video is similar: a video frame is essentially a rectangle.
Projection mainly includes orthographic projection and perspective projection. Orthographic projection is generally used for rendering 2D images, such as ordinary video rendering, while perspective projection has the characteristic of making near objects appear larger and distant objects smaller, typically used for 3D image rendering, such as VR rendering. Therefore, we use orthographic projection to correct the image.
First, let's look at the modifications to the shader, mainly the changes in the vertex shader, as follows:
attribute vec4 aPosition;
attribute vec2 aCoordinate;
uniform mat4 uMVPMatrix;
varying vec2 vTextureCoordinate;
void main() {
    gl_Position = uMVPMatrix * aPosition;
    vTextureCoordinate = aCoordinate;
}
The key is to calculate the matrix uMVPMatrix, which is the product of the projection matrix and the view matrix. In OpenGL ES on Android, matrix operations are performed with the Matrix class; for orthographic projection, Matrix.orthoM generates the projection matrix, calculated as follows:
// Calculate the video scaling ratio (projection matrix)
val screenRatio = screenWidth / screenHeight.toFloat()
val videoRatio = videoWidth / videoHeight.toFloat()
val ratio: Float
if (screenWidth > screenHeight) {
    if (videoRatio >= screenRatio) {
        ratio = videoRatio / screenRatio
        Matrix.orthoM(
            projectionMatrix, 0,
            -1f, 1f, -ratio, ratio, 3f, 5f
        )
    } else {
        ratio = screenRatio / videoRatio
        Matrix.orthoM(
            projectionMatrix, 0,
            -ratio, ratio, -1f, 1f, 3f, 5f
        )
    }
} else {
    if (videoRatio >= screenRatio) {
        ratio = videoRatio / screenRatio
        Matrix.orthoM(
            projectionMatrix, 0,
            -1f, 1f, -ratio, ratio, 3f, 5f
        )
    } else {
        ratio = screenRatio / videoRatio
        Matrix.orthoM(
            projectionMatrix, 0,
            -ratio, ratio, -1f, 1f, 3f, 5f
        )
    }
}
The code above chooses the appropriate orthographic projection parameters based on the screen aspect ratio and the original video aspect ratio. The calculation is similar to image scaling, and you can work through it yourself; the principle is that the video frame must be displayed completely within the screen. The ratio above is the boundary of the orthographic projection frustum. Taking my phone as an example: for convenience, let the screen width equal the video width, with a screen resolution of 1080 * 2260 and a video resolution of 1080 * 540; then ratio is 2260 / 540, approximately 4.18. Clearly, if we instead took the screen height as the reference, a video height of 2260 would mean a video width of 4520, far exceeding the screen width, so we adapt based on the video width. Now let's look at the camera position settings:
// Set the camera position (view matrix)
Matrix.setLookAtM(
    viewMatrix, 0,
    0.0f, 0.0f, 5.0f, // Camera position
    0.0f, 0.0f, 0.0f, // Target position
    0.0f, 1.0f, 0.0f  // Camera up direction
)
The positive z-axis points out of the screen, and the camera position (0, 0, 5) means the camera sits 5 units in front of the screen along the z-axis. This distance must fall between the near and far planes of the frustum, otherwise nothing will be visible; in this case, it should be between 3 and 5. The target position (0, 0, 0) is the screen itself, i.e. the plane formed by the x and y axes, and the camera up direction (0, 1, 0) is the positive y-axis. Finally, we compute the combined projection and view transformation, merging projectionMatrix and viewMatrix into vPMatrix through matrix multiplication:
// Calculate projection and view transformation
Matrix.multiplyMM(vPMatrix, 0, projectionMatrix, 0, viewMatrix, 0)
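For reference, the matrix setup above can be grouped into a single helper that PlayRenderer calls once both the screen size and the video size are known. The name initMatrix and the call site are assumptions for illustration, and the two identical outer branches of the snippet above are collapsed here:

// Assumed helper inside PlayRenderer; call it once screenWidth/screenHeight (from
// onSurfaceChanged) and videoWidth/videoHeight (from onVideoSizeChanged) are all set.
private fun initMatrix() {
    if (screenWidth <= 0 || screenHeight <= 0 || videoWidth <= 0 || videoHeight <= 0) return

    val screenRatio = screenWidth / screenHeight.toFloat()
    val videoRatio = videoWidth / videoHeight.toFloat()
    if (videoRatio >= screenRatio) {
        // Video is relatively wider than the screen: extend the frustum vertically
        val ratio = videoRatio / screenRatio
        Matrix.orthoM(projectionMatrix, 0, -1f, 1f, -ratio, ratio, 3f, 5f)
    } else {
        // Video is relatively taller than the screen: extend the frustum horizontally
        val ratio = screenRatio / videoRatio
        Matrix.orthoM(projectionMatrix, 0, -ratio, ratio, -1f, 1f, 3f, 5f)
    }

    Matrix.setLookAtM(viewMatrix, 0, 0f, 0f, 5f, 0f, 0f, 0f, 0f, 1f, 0f)
    Matrix.multiplyMM(vPMatrix, 0, projectionMatrix, 0, viewMatrix, 0)
}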
To correct the image, we need the original size of the video, which can be obtained in the onVideoSizeChanged callback of MediaPlayer and used to initialize the matrix data, as in the sketch above. Now let's look at the effect after frame correction:
At this point, video rendering with OpenGL ES is complete. You can get the source code with the keyword 【RenderVideo】.