Audio/Video Learning (12): Building a Universal Audio Player with FFmpeg + OpenSL ES · Audio/Video Development Engineer

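## Audio/Video Learning (12): Building a Universal Audio Player with FFmpeg + OpenSL ES

## Preface

> A quick aside:
>
> To be honest, the bar for Android developers keeps rising, and it feels as if junior developers are being squeezed out. If you want to stay on the Android path, going deep into **audio and video** is a good choice. Cross-platform **Flutter** is popular right now, but in the end it is still a tool for writing UI (my personal view), and a programmer cannot keep writing UI forever as the years go by. Going deep into **C/C++** is different: it lets you provide low-level SDKs and technical support to mobile platforms, and even to **Flutter**. What's not to like? So if you want to learn audio and video, or are getting ready to, this article can help you get started quickly.

The May Day holiday is almost over, and I hope you all enjoyed it. I took one day of the holiday to climb the Great Wall. As the saying goes, "he who has not been to the Great Wall is not a true man", and I had never had the chance in all my years in Beijing. I then spent the remaining 4 days writing an audio processing library, which currently includes the following features:

| Feature | Done |
| --- | --- |
| Read audio streams of any format | Yes |
| Decode audio to PCM with FFmpeg | Yes |
| Native audio rendering with OpenSL ES | Yes |
| Volume control | Yes |
| Seek control | Yes |
| Channel switching | Yes |
| Pitch and tempo change | Yes |
| Voice changing | No |
| Clip audio and export as MP3/PCM, etc. | PCM only |
| Record while playing | No |
| Encode audio to AAC, MP3, WAV | No |

A search on GitHub turns up plenty of audio processing libraries, so why write my own? Because I don't want to be someone who only takes what others have built. If I did that every time, how would my own skills improve? A library I wrote myself is also easier to modify and extend with special features. Below I will roughly explain how each feature is implemented and what kind of architecture an audio processing library needs.

I did not start this library on a whim, either. I have a small goal:

* Build a universal audio player. Whether it is a live stream or a network/local source, give me a path and I can play it.
* Add some special processing, such as pitch and tempo change, voice changing, clipping...
* Anything is possible.

Here is what it looks like:

*(demo image)*

## Introduction

**Development environment**

* FFmpeg: 4.2.2
* NDK: 17c
* OS: macOS

**Implementation flow:**

![](https://user-gold-cdn.xitu.io/2020/5/5/171e57b2f418c049?imageView2/0/w/1280/h/960/format/webp/ignore-error/1)

**Rough architecture:**

![](https://user-gold-cdn.xitu.io/2020/5/5/171e57b2bf3fe014?imageView2/0/w/1280/h/960/format/webp/ignore-error/1)

**[kotlin](https://github.com/JetBrains/kotlin):** The **Kotlin** language is developed by **JetBrains**. Development began in 2010 and it was first unveiled in 2011; at Google I/O 2017, Google officially announced first-class support for Kotlin as an Android development language.

**[FFmpeg](https://github.com/FFmpeg/FFmpeg):** FFmpeg is a suite of open-source programs for recording and converting digital audio and video and turning them into streams. It is licensed under the LGPL or GPL, and provides a complete solution for recording, converting, and streaming audio and video.

**[SoundTouch](https://gitlab.com/soundtouch/soundtouch):** Changes the pitch and tempo of raw PCM audio streams.

**[OpenSLES](https://developer.android.google.cn/ndk/guides/audio/opensl-for-android):** **OpenSL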
ES** (the **Open Sound Library for Embedded Systems**) is a royalty-free, cross-platform, hardware-accelerated [C-language](https://en.wikipedia.org/wiki/C_(programming_language)) audio [API](https://en.wikipedia.org/wiki/Application_programming_interface) for 2D and 3D audio. It provides access to features such as [3D positional audio](https://en.wikipedia.org/wiki/3D_audio) and [MIDI](https://en.wikipedia.org/wiki/MIDI) playback. It is designed for developers in the mobile and gaming industries and aims to make applications easy to port across platforms.

## FFmpeg initialization

To compile FFmpeg, see my earlier article [Audio/Video Learning (6): One-Click Build of 32/64-bit FFmpeg 4.2.2](https://juejin.im/post/6844904048303276045).

The APIs are introduced here in the order of the FFmpeg initialization flow:

~~~
// 1. Allocate an AVFormatContext.
AVFormatContext *avformat_alloc_context(void);
// The matching release call.
void avformat_free_context(AVFormatContext *s);

// 2. Open the input stream and read the header. It usually describes the audio
//    and video streams and may also describe subtitle (bullet-comment) streams.
int avformat_open_input(AVFormatContext **ps, const char *url, ff_const59 AVInputFormat *fmt, AVDictionary **options);
// The matching close call; it releases the context and everything it owns.
void avformat_close_input(AVFormatContext **s);

// 3. Read packets from the media file to get stream information. Returns >= 0 on success.
int avformat_find_stream_info(AVFormatContext *ic, AVDictionary **options);

// 3.1 Number of streams; usually audio, video, and possibly subtitles.
int number = (*pFormatCtx)->nb_streams;

// 3.2 Iterate over the streams and pick the one you need.
for (int i = 0; i < number; ++i) {
    if ((*pFormatCtx)->streams[i]->codecpar->codec_type == AVMEDIA_TYPE_VIDEO) {
        // video stream
    } else if ((*pFormatCtx)->streams[i]->codecpar->codec_type == AVMEDIA_TYPE_AUDIO) {
        // audio stream
    }
}

// Other stream types.
enum AVMediaType {
    AVMEDIA_TYPE_UNKNOWN = -1,  ///< Usually treated as AVMEDIA_TYPE_DATA
    AVMEDIA_TYPE_VIDEO,
    AVMEDIA_TYPE_AUDIO,
    AVMEDIA_TYPE_DATA,          ///< Opaque data information usually continuous
    AVMEDIA_TYPE_SUBTITLE,
    AVMEDIA_TYPE_ATTACHMENT,    ///< Opaque data information usually sparse
    AVMEDIA_TYPE_NB
};

// 4. Look up a registered decoder by AVCodecID.
AVCodec *avcodec_find_decoder(enum AVCodecID id);

// 5. Allocate an AVCodecContext.
AVCodecContext *avcodec_alloc_context3(const AVCodec *codec);
// The matching release call.
void avcodec_free_context(AVCodecContext **avctx);

// 6. Copy the stream parameters into the decoder context.
int avcodec_parameters_to_context(AVCodecContext *codec, const AVCodecParameters *par);

// 7.
// Open the decoder; returns 0 on success.
int avcodec_open2(AVCodecContext *avctx, const AVCodec *codec, AVDictionary **options);
~~~

If all 7 steps above succeed, the decoder has been opened and you can move on to the next step.

## Reading audio frames with FFmpeg

Again, here are the APIs involved:

~~~
// 1. Allocate an AVPacket.
AVPacket *av_packet_alloc(void);
// The result must be freed.
void av_packet_free(AVPacket **pkt);

// 2. Read the next packet to be decoded.
int av_read_frame(AVFormatContext *s, AVPacket *pkt);
~~~

Yes, it is that simple: just these 3 APIs, called in a loop, with each packet pushed into the queue of packets waiting to be decoded.

## Decoding audio to PCM with FFmpeg

This stage takes packets from that queue and decodes them into PCM data.

~~~
// 1. Send an AVPacket to the decoder; returns 0 on success.
int avcodec_send_packet(AVCodecContext *avctx, const AVPacket *avpkt);

// 2. Allocate an AVFrame to receive the decoded data.
AVFrame *av_frame_alloc(void);
// The matching release call.
void av_frame_free(AVFrame **frame);

// 3. Receive the decoded data; returns 0 on success.
int avcodec_receive_frame(AVCodecContext *avctx, AVFrame *frame);

// Resample the decoded PCM to one fixed format, so that inconsistent
// formats do not cause rendering problems.
// 4. Allocate a SwrContext from the given parameters.
struct SwrContext *swr_alloc_set_opts(struct SwrContext *s,
                                      int64_t out_ch_layout, enum AVSampleFormat out_sample_fmt, int out_sample_rate,
                                      int64_t in_ch_layout, enum AVSampleFormat in_sample_fmt, int in_sample_rate,
                                      int log_offset, void *log_ctx);
// 4.1 Initialize the SwrContext.
int swr_init(struct SwrContext *s);
// 4.2 Resample.
int swr_convert(struct SwrContext *s, uint8_t **out, int out_count, const uint8_t **in, int in_count);
~~~

## Rendering PCM with OpenSL ES

Again, the APIs are introduced in the order they are called:

~~~
// 1. Create the playback engine.
result = slCreateEngine(&engineObject, 0, NULL, 0, NULL, NULL);
result = (*engineObject)->Realize(engineObject, SL_BOOLEAN_FALSE);
result = (*engineObject)->GetInterface(engineObject, SL_IID_ENGINE, &engineEngine);

// 2.
// Create the output mix.
const SLInterfaceID mids[1] = {SL_IID_ENVIRONMENTALREVERB};
const SLboolean mreq[1] = {SL_BOOLEAN_FALSE};
// The output mix object has to be created before it can be realized.
result = (*engineEngine)->CreateOutputMix(engineEngine, &outputMixObject, 1, mids, mreq);
result = (*outputMixObject)->Realize(outputMixObject, SL_BOOLEAN_FALSE);
result = (*outputMixObject)->GetInterface(outputMixObject, SL_IID_ENVIRONMENTALREVERB,
                                          &outputMixEnvironmentalReverb);
if (SL_RESULT_SUCCESS == result) {
    result = (*outputMixEnvironmentalReverb)->SetEnvironmentalReverbProperties(
            outputMixEnvironmentalReverb, &reverbSettings);
    (void) result;
}
SLDataLocator_OutputMix outputMix = {SL_DATALOCATOR_OUTPUTMIX, outputMixObject};
SLDataSink audioSnk = {&outputMix, 0};

// 3. Configure the PCM format.
SLDataLocator_AndroidSimpleBufferQueue android_queue = {SL_DATALOCATOR_ANDROIDSIMPLEBUFFERQUEUE, 2};
SLDataFormat_PCM pcm = {
        SL_DATAFORMAT_PCM,                                     // play PCM data
        2,                                                     // 2 channels (stereo)
        static_cast<SLuint32>(getCurSampleRate(sample_rate)),  // sample rate, e.g. 44100 Hz (OpenSL ES expects milliHertz)
        SL_PCMSAMPLEFORMAT_FIXED_16,                           // 16 bits per sample
        SL_PCMSAMPLEFORMAT_FIXED_16,                           // container size, same as bits per sample
        SL_SPEAKER_FRONT_LEFT | SL_SPEAKER_FRONT_RIGHT,        // stereo (front left + front right)
        SL_BYTEORDER_LITTLEENDIAN                              // byte order: little-endian
};
SLDataSource slDataSource = {&android_queue, &pcm};
const SLInterfaceID ids[3] = {SL_IID_BUFFERQUEUE, SL_IID_VOLUME, SL_IID_MUTESOLO};
const SLboolean req[3] = {SL_BOOLEAN_TRUE, SL_BOOLEAN_TRUE, SL_BOOLEAN_TRUE};
result = (*engineEngine)->CreateAudioPlayer(engineEngine, &pcmPlayerObject, &slDataSource, &audioSnk,
                                            sizeof(ids) / sizeof(ids[0]), ids, req);

// 4. Initialize the player.
result = (*pcmPlayerObject)->Realize(pcmPlayerObject, SL_BOOLEAN_FALSE);
result = (*pcmPlayerObject)->GetInterface(pcmPlayerObject, SL_IID_PLAY, &pcmPlayerPlay);

// 5. Get the buffer queue interface and register the callback.
(*pcmPlayerObject)->GetInterface(pcmPlayerObject, SL_IID_BUFFERQUEUE, &pcmBufferQueue);
(*pcmBufferQueue)->RegisterCallback(pcmBufferQueue, pcmBufferCallBack, this);

// 6. Set the play state.
(*pcmPlayerPlay)->SetPlayState(pcmPlayerPlay, SL_PLAYSTATE_PLAYING);

// 7.
// Invoke the callback once by hand to get the buffer queue going.
pcmBufferCallBack(pcmBufferQueue, this);
~~~

Those are the 7 main initialization steps. The rendering itself happens inside `pcmBufferCallBack`, so let's go straight to the code:

~~~
void pcmBufferCallBack(SLAndroidSimpleBufferQueueItf bf, void *pVoid) {
    auto audioPlayer = static_cast<BaseAudioChannel *>(pVoid);
    if (!audioPlayer) return;
    if (audioPlayer->status && audioPlayer->status->exit)
        LOGE("looper pcmBufferCallBack start");
    // Fetch the raw PCM data.
    int size = audioPlayer->getPCMData();
    // Apply the tempo and pitch changes to the PCM.
    size = audioPlayer->setSoundTouchData();
    ...
    // 8. Enqueue the buffer to start playing sound.
    (*audioPlayer->pcmBufferQueue)->Enqueue(audioPlayer->pcmBufferQueue, audioPlayer->out_pcm_buffer, size);
    ...
}
~~~

Yes, that's right: step 8 is where the PCM is actually placed into the OpenSL ES buffer queue. Note that you must wait until the previous buffer has finished playing before enqueuing the next chunk of PCM data.

## Feature implementation

### Channel selection

Channel switching works directly on an OpenSL ES interface. The API is as follows:

~~~
// 1. Get the mute/solo interface used to control the audio channels.
(*pcmPlayerObject)->GetInterface(pcmPlayerObject, SL_IID_MUTESOLO, &pcmChannelModePlay);

// 2. Set the audio channel.
/**
 * Set the audio channel.
 * @param channelMode 0 = right channel, 1 = left channel, 2 = stereo
 */
void BaseAudioChannel::setChannelMode(int channelMode) {
    this->mChannelMode = channelMode;
    if (pcmChannelModePlay != NULL) {
        if (channelMode == 0) {  // right channel: mute channel 0 (left)
            (*pcmChannelModePlay)->SetChannelMute(pcmChannelModePlay, 1, false);
            (*pcmChannelModePlay)->SetChannelMute(pcmChannelModePlay, 0, true);
        } else if (channelMode == 1) {  // left channel: mute channel 1 (right)
            (*pcmChannelModePlay)->SetChannelMute(pcmChannelModePlay, 1, true);
            (*pcmChannelModePlay)->SetChannelMute(pcmChannelModePlay, 0, false);
        } else if (channelMode == 2) {  // stereo, i.e. the AV_CH_LAYOUT_STEREO layout set when resampling
            (*pcmChannelModePlay)->SetChannelMute(pcmChannelModePlay, 1, false);
            (*pcmChannelModePlay)->SetChannelMute(pcmChannelModePlay, 0, false);
        }
    }
}
~~~

### Volume control

Volume control is also built on an OpenSL ES interface. The API is as follows:

~~~
// 1. Get the volume interface.
(*pcmPlayerObject)->GetInterface(pcmPlayerObject, SL_IID_VOLUME, &pcmVolumePlay);

// 2.
// Set the volume.
/**
 * Set the current volume smoothly.
 * @param percent volume as a percentage, 0 to 100
 */
void BaseAudioChannel::setVolume(int percent) {
    this->curVolume = percent;
    if (pcmVolumePlay != NULL) {
        // Pick a steeper attenuation factor as the volume gets lower.
        int factor;
        if (percent > 30) factor = -20;
        else if (percent > 25) factor = -22;
        else if (percent > 20) factor = -25;
        else if (percent > 15) factor = -28;
        else if (percent > 10) factor = -30;
        else if (percent > 5) factor = -34;
        else if (percent > 3) factor = -37;
        else if (percent > 0) factor = -40;
        else factor = -100;
        // SetVolumeLevel takes the level in millibels (0 is full volume).
        (*pcmVolumePlay)->SetVolumeLevel(pcmVolumePlay, (100 - percent) * factor);
    }
}
~~~

### Pitch and tempo

Pitch and tempo changes use the open-source [SoundTouch](https://gitlab.com/soundtouch/soundtouch) library. The implementation is as follows:

~~~
int BaseAudioChannel::setSoundTouchData() {
    int num = 0;
    while (status && !status->exit) {
        if (finished) {
            finished = false;
            if (this->mBufSize > 0 && this->out_pcm_buffer) {
                pthread_mutex_lock(&mutexSpeed);
                // Feed the decoded PCM to SoundTouch, then read the processed samples back.
                soundTouch->putSamples(reinterpret_cast<const SAMPLETYPE *>(this->out_pcm_buffer), this->oldSize);
                num = soundTouch->receiveSamples(reinterpret_cast<SAMPLETYPE *>(this->out_pcm_buffer),
                                                 this->mBufSize / 4);
                pthread_mutex_unlock(&mutexSpeed);
            } else {
                soundTouch->flush();
            }
        }
        if (num == 0) {
            finished = true;
            continue;
        }
        // Samples per channel -> bytes: 2 channels x 2 bytes per 16-bit sample.
        return num * 2 * 2;
    }
    return 0;
}
~~~

### Seek: play from a given position

The seek feature calls the FFmpeg API directly:

~~~
void BaseDecodec::seek(int number) {
    if (duration <= 0) {
        return;
    }
    // Only seek to a position inside the media: 0 <= number <= duration (seconds).
    if (number >= 0 && number <= duration) {
        // Widen to 64 bits before multiplying so long files do not overflow an int.
        int64_t rel = static_cast<int64_t>(number) * AV_TIME_BASE;
        avcodec_flush_buffers(this->avCodecContext);
        avformat_seek_file(this->avFormatContext, -1, INT64_MIN, rel,
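                           INT64_MAX, 0);
    }
}
~~~

### Clipping PCM

The principle behind clipping PCM is simple. Say an audio file is 500 s long and I want the 300 s to 400 s segment: I first call seek(300) to set the starting point, and as soon as the timestamp of a decoded frame reaches 400 I stop. Simple, isn't it? Here is how to obtain the times:

~~~
// The stream's time base: the unit, in seconds, in which frame timestamps are expressed.
this->time_base = (*pFormatCtx)->streams[i]->time_base;

// 1. Total duration in seconds, available once FFmpeg has read the stream header.
//    AVFormatContext::duration is in AV_TIME_BASE units; AVStream::duration is in
//    time_base units, so it would need av_q2d(time_base) instead.
int audioDuration = (*pFormatCtx)->duration / AV_TIME_BASE;

// 2. Time of a packet that has been read but not yet decoded.
int readCurAudioTime = avPacket->pts * av_q2d(time_base);

// 3. Time of a decoded frame.
int decodeAudioCurTime = avFrame->pts * av_q2d(time_base);
~~~

## Summary

That wraps up the walkthrough of our audio processing library. If you are interested in audio and video, you can use it as study material. I personally don't enjoy UI work; otherwise I could pair a NetEase Cloud Music style UI with my own audio processing library to build an audio app. You are of course welcome to do exactly that. All the code in this article has been uploaded to [GitHub](https://github.com/yangkun19921001/AudioManager).

## About me

* Email: yang1001yk@gmail.com
* Personal blog: [www.devyk.top](https://www.devyk.top/)
* GitHub: [github.com/yangkun1992…](https://github.com/yangkun19921001)
* Juejin blog: [juejin.im/user/336855…](https://juejin.im/user/3368559355637566)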
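
One more note on the seek and clipping sections: the time arithmetic is easy to get wrong, so here is a minimal, FFmpeg-free sketch of it. Everything in it is hypothetical and for illustration only: `Rational` stands in for FFmpeg's `AVRational`, the constant `1000000` mirrors `AV_TIME_BASE`, and `ptsToSeconds`, `secondsToSeekTarget`, `ClipRange`, `isValidClip` and `reachedClipEnd` are names I made up, not functions from the library.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for FFmpeg's AVRational (num/den). Hypothetical, so that this
// sketch compiles without the FFmpeg headers.
struct Rational {
    int num;
    int den;
};

// Same arithmetic as pts * av_q2d(time_base): timestamp ticks -> seconds.
double ptsToSeconds(int64_t pts, Rational timeBase) {
    return static_cast<double>(pts) * timeBase.num / timeBase.den;
}

// Seconds -> microseconds (the unit a seek target is expressed in when the
// stream index is -1). The cast widens to 64 bits before multiplying, so
// positions beyond about 2147 s do not overflow a 32-bit int.
int64_t secondsToSeekTarget(int seconds) {
    assert(seconds >= 0);
    return static_cast<int64_t>(seconds) * 1000000;
}

// Hypothetical clip window, in seconds.
struct ClipRange {
    int startSec;
    int endSec;
};

// A clip is usable when 0 <= start < end <= duration.
bool isValidClip(ClipRange clip, int durationSec) {
    return clip.startSec >= 0 && clip.startSec < clip.endSec &&
           clip.endSec <= durationSec;
}

// True once a decoded frame's timestamp has reached the end of the clip,
// which is the point where the decode loop should stop writing PCM.
bool reachedClipEnd(int64_t framePts, Rational timeBase, ClipRange clip) {
    return ptsToSeconds(framePts, timeBase) >= clip.endSec;
}
```

For the 500 s example, the idea would be to check `isValidClip({300, 400}, 500)`, seek to `secondsToSeekTarget(300)`, and then call `reachedClipEnd` on every decoded frame until it returns true. The real library may structure this differently; treat it as a sketch of the arithmetic only.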