A Local Speech-to-Text System Based on Baidu PaddlePaddle

Published 2022-11-13, updated 2024-11-09, category: Computers, AI Recognition

The setup mostly follows an article written by Transistor_Red on CSDN: https://blog.csdn.net/qq_42859445/article/details/126172504

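I am very much a beginner, and not knowing some basic things sent me on a lot of detours: I installed Python eight times just to get the version right. The whole project is Python-based.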

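The official Baidu PaddlePaddle installation steps essentially list the official environment requirements. My machine has an Nvidia GT 1030 graphics card and runs 64-bit Windows 10. As of 2022-11-12 the newest supported Python version was 3.9; I had installed a newer Python, which is why the installation kept failing with errors.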

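The cp39-cp39 in the name of the downloaded file means that it supports Python 3.9. Because I did not understand this, I downloaded many files without making sense of them, for example paddlepaddle_gpu-2.4.0rc0.post117-cp39-cp39-win_amd64.whl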
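To make the file-name convention concrete, here is a minimal sketch that splits a wheel name into its parts and compares the Python tag with the running interpreter. The helper names `wheel_tags` and `matches_running_python` are made up for illustration, and the comparison is deliberately simplified: it only handles a single CPython tag such as cp39.

```python
import sys


def wheel_tags(filename):
    """Split a wheel file name into name, version and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    stem = filename[:-len(".whl")]
    # The last three dash-separated fields are the Python tag, ABI tag and platform tag.
    rest, python_tag, abi_tag, platform_tag = stem.rsplit("-", 3)
    # What remains is name-version, optionally followed by a build tag.
    name, version = rest.split("-")[:2]
    return {
        "name": name,
        "version": version,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def matches_running_python(python_tag):
    """True if a tag like 'cp39' matches the interpreter running this code."""
    expected = "cp{}{}".format(sys.version_info.major, sys.version_info.minor)
    return python_tag == expected


if __name__ == "__main__":
    tags = wheel_tags("paddlepaddle_gpu-2.4.0rc0.post117-cp39-cp39-win_amd64.whl")
    print(tags)
    print("matches this Python:", matches_running_python(tags["python"]))
```

Running it under the Python you plan to use shows at a glance whether a wheel was built for that version.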

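Another issue: when you install with pip, it may tell you that your pip version is too old and ask you to upgrade it. Because the default Python package servers are overseas, the upgrade kept failing for me even though the download is only about 200 KB. Once you understand the cause, the installation is actually simple. I installed 64-bit Python 3.9 (only 64-bit Python is supported here) and upgraded pip through the Tsinghua mirror; all the dependencies installed later were also pulled from the Tsinghua mirror.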

Installation

Official link

Temporarily use the Tsinghua mirror to upgrade pip

python -m pip install -i https://pypi.tuna.tsinghua.edu.cn/simple --upgrade pip

Set the Tsinghua mirror as the default pip index

python -m pip install --upgrade pip
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple

The official site describes these steps in detail, so I will not go into them further.

With the correct Python version installed and pip upgraded to the latest version, you can happily run the official install command.

You can get the graphics card's CUDA version with the command **_nvidia-smi_**

For example, mine shows CUDA 11.8, but the newest build offered on the official site, the one for CUDA 11.6, works as well.
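The version check can also be scripted. The sketch below reads the "CUDA Version" field from the nvidia-smi header and compares it with the CUDA version of a PaddlePaddle build. It assumes that the version reported there is the highest CUDA version the driver supports, which is why an 11.6 build can run on an 11.8 driver. The function names are made up for illustration.

```python
import re
import shutil
import subprocess


def parse_cuda_version(smi_output):
    """Return the 'CUDA Version' from nvidia-smi's header as (major, minor), or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def query_cuda_version():
    """Run nvidia-smi if it is on PATH and parse its output; None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    completed = subprocess.run(
        ["nvidia-smi"], capture_output=True, text=True, check=False)
    return parse_cuda_version(completed.stdout)


def build_is_usable(build_cuda, driver_cuda):
    """A build is usable when its CUDA version does not exceed the driver's."""
    return build_cuda <= driver_cuda


if __name__ == "__main__":
    driver = query_cuda_version()
    if driver is None:
        print("nvidia-smi not found or no CUDA version reported")
    else:
        print("driver supports CUDA", driver)
        print("CUDA 11.6 build usable:", build_is_usable((11, 6), driver))
```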

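Then just use the command generated on the official home page to install **PaddlePaddle**. For example, this is the command generated for my CUDA 11.6 choice: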

python -m pip install paddlepaddle-gpu==2.3.2.post116 -f https://www.paddlepaddle.org.cn/whl/windows/mkl/avx/stable.html
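Once the install command has finished, one way to confirm that the GPU build is the one actually being used is a short check like the following. This is a sketch rather than an official procedure; `paddle_summary` is a made-up helper, and it returns None instead of failing when paddle is not installed in the current interpreter.

```python
import importlib.util


def paddle_summary():
    """Return version and device info for paddle, or None if it is not installed."""
    if importlib.util.find_spec("paddle") is None:
        return None
    import paddle
    return {
        "version": paddle.__version__,
        # True only for the paddlepaddle-gpu build
        "compiled_with_cuda": paddle.device.is_compiled_with_cuda(),
        # e.g. 'gpu:0' or 'cpu'
        "device": paddle.get_device(),
    }


if __name__ == "__main__":
    summary = paddle_summary()
    if summary is None:
        print("paddle is not installed in this interpreter")
    else:
        print(summary)
        import paddle
        # Paddle's built-in self-check; prints whether the installation works
        paddle.utils.run_check()
```

If `compiled_with_cuda` is False, the CPU package was installed instead of paddlepaddle-gpu.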

After this you need a C++ build environment; on Windows, that means installing Visual Studio.
Click the link to download

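After installing, open the program, find "Desktop development with C++" in the list of available workloads, and install it; only then is the environment ready.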

With the C++ environment in place, you can go straight to the command that installs **PaddleSpeech**

Installing PaddleSpeech

Install with pip
Official recommendation
We recommend using the Baidu mirror https://mirror.baidu.com/pypi/simple when installing paddlepaddle, and the Tsinghua mirror https://pypi.tuna.tsinghua.edu.cn/simple when installing paddlespeech.

pip install pytest-runner
pip install paddlespeech

Build from source

git clone https://github.com/PaddlePaddle/PaddleSpeech.git
cd PaddleSpeech
pip install pytest-runner
pip install .

The original author built from source, while I installed directly with pip; after the installer had run for a while, it finished.

The original author then ran a test right away and got a CUDA error, so the CUDA compute software also has to be installed.

Installing CUDA and cuDNN

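Here too I followed the tutorial linked by the original author. Some versions did not match; I installed the latest one, 11.8. After downloading and installing, I ran into the same problem as the original author: one more DLL file has to be installed.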

Installing the required library file

I am simply attaching the file here: Download

Note that only the zlibwapi.dll file from the archive goes into C:\windows\system32. I foolishly extracted the whole folder and put that in...
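The mistake described above (the DLL ending up inside a subfolder) is easy to detect with a few lines of Python. This is only a sketch; `locate_dll` is a made-up helper, and it looks just one folder level deep.

```python
from pathlib import Path


def locate_dll(folder, dll_name="zlibwapi.dll"):
    """Report whether dll_name sits directly in folder, one level too deep, or nowhere."""
    folder = Path(folder)
    if (folder / dll_name).is_file():
        return "ok"
    # Catches the case where the whole extracted folder was copied in
    nested = sorted(p for p in folder.glob("*/" + dll_name) if p.is_file())
    if nested:
        return "nested: " + str(nested[0])
    return "missing"


if __name__ == "__main__":
    print(locate_dll(r"C:\Windows\System32"))
```

A result starting with "nested:" means the file has to be moved up one level.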

Test example

After installation, try a basic example

ASR (Automatic Speech Recognition)

from paddlespeech.cli.asr.infer import ASRExecutor
asr = ASRExecutor()
result = asr(audio_file="data/zh.wav")
print(result)

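In this code, data/zh.wav is the location of the audio file; it only works with a very short, small file. The original author, Transistor_Red, later provided a way around this.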

Original article: https://blog.csdn.net/qq_42859445/article/details/126202593
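Since the basic example only copes with short clips, it can help to look at a file's length and sample rate before passing it to ASRExecutor. The sketch below uses only the standard library and works for uncompressed PCM WAV files. The 50-second default for `max_seconds` is an assumption for illustration, not a documented PaddleSpeech limit, and both function names are made up.

```python
import wave


def wav_info(path):
    """Return (duration_in_seconds, sample_rate_in_hz) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
    return frames / float(rate), rate


def is_short_enough(path, max_seconds=50.0):
    """True if the clip is no longer than max_seconds (an assumed limit)."""
    duration, _ = wav_info(path)
    return duration <= max_seconds


if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        duration, rate = wav_info(sys.argv[1])
        print("duration: {:.2f}s, sample rate: {} Hz".format(duration, rate))
        print("short enough:", is_short_enough(sys.argv[1]))
```

Longer files are better split into segments first, which is what the full script does.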

Here is the final code first; after it I describe the problems I ran into.

from paddlespeech.cli.asr.infer import ASRExecutor
import csv
# import moviepy.editor as mp
import auditok
import os
import paddle
from paddlespeech.cli.text.infer import TextExecutor

import soundfile
import librosa
import warnings
import time
warnings.filterwarnings('ignore')

'''
Audio splitting
'''
# The input type is 'audio'
def qiefen(path, ty='audio', mmin_dur=1, mmax_dur=100000, mmax_silence=1, menergy_threshold=55):
    audio_file = path
    audio, audio_sample_rate = soundfile.read(
        audio_file, dtype="int16", always_2d=True)

    audio_regions = auditok.split(
        audio_file,
        min_dur=mmin_dur,  # minimum duration of a valid audio event in seconds
        max_dur=mmax_dur,  # maximum duration of an event
        # maximum duration of tolerated continuous silence within an event
        max_silence=mmax_silence,
        energy_threshold=menergy_threshold  # threshold of detection
    )

    for i, r in enumerate(audio_regions):
        # Regions returned by `split` have 'start' and 'end' metadata fields
        print(
            "Region {i}: {r.meta.start:.3f}s -- {r.meta.end:.3f}s".format(i=i, r=r))

        epath = ''
        file_pre = str(epath.join(audio_file.split('.')[0].split('/')[-1]))

        mk = 'change'
        # Create change/<ty>/<file_pre> if it does not exist yet
        os.makedirs(mk + '/' + ty + '/' + file_pre, exist_ok=True)
        num = i
        # Zero-pad the index to three digits so the files sort in order
        s = '000000' + str(num)

        file_save = mk + '/' + ty + '/' + file_pre + '/' + \
                    s[-3:] + '-' + '{meta.start:.3f}-{meta.end:.3f}' + '.wav'
        filename = r.save(file_save)
        print("region saved as: {}".format(filename))
    return mk + '/' + ty + '/' + file_pre
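'''
Speech to text
Call ASRExecutor directly to convert speech to text.
Note that force_yes=True here forces the audio sample rate to be converted; PaddleSpeech uses a 16000 Hz sample rate. With force_yes=False you have to confirm manually.
'''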
asr_executor = ASRExecutor()
def audio2txt(path):
    # List every file under path
    print(f"path: {path}")
    filelist = os.listdir(path)
    # Sort by the three-digit prefix so the segments stay in their original order
    filelist.sort(key=lambda x: int(os.path.splitext(x)[0][:3]))
    # Transcribe each file in turn, printing its path
    words = []
    for file in filelist:
        print(path + '/' + file)
        text = asr_executor(
            audio_file=path + '/' + file,
            device=paddle.get_device(), force_yes=True) # pay attention to the force_yes argument
        words.append(text)
    return words

'''
Save
'''
def txt2csv(txt_all):
    # newline='' keeps csv.writer from adding blank lines between rows on Windows
    with open('result.csv', 'w+', encoding='utf-8', newline='') as f:
        f_csv = csv.writer(f)
        for row in txt_all:
            f_csv.writerow([row])


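# Add punctuation
#
# Reads the recognized text from source_path and writes the punctuated text to Final.txt
def add_punctuation(source_path='result.csv'):
    texts = ''
    # result.csv was written as UTF-8, so read it back with the same encoding
    with open(source_path, 'r', encoding='utf-8') as f:
        text = f.readlines()
    f_=open("Final.txt","w+", encoding='utf-8')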
    count=0
    text_executor = TextExecutor()
    for i in range(len(text)):
        text[i] = text[i].replace('\n', '')
        if(text[i]):
            count+=1
            texts = texts + text[i]
        if(count>=5):
            print(texts)
            count=0
            result = text_executor(
                text=texts,
                task='punc',
                model='ernie_linear_p3_wudao',
                device=paddle.get_device(),
                # force_yes=True
            )
            texts=''
            f_.write(result+'\n')
    if(texts):
        result = text_executor(
            text=texts,
            task='punc',
            model='ernie_linear_p3_wudao',
            device=paddle.get_device(),
            # force_yes=True
        )
        texts = ''
        f_.write(result)
    f_.close()



if __name__ == '__main__':
    time1=time.time()
    source_path = 'E:/FFOutput/音频1.wav'
    # Split the audio
    path = qiefen(path=source_path, ty='audio',
                  mmin_dur=0.5, mmax_dur=50, mmax_silence=0.5, menergy_threshold=55)
    # Speech to text (needs a GPU)
    txt_all = audio2txt(path)
    # Save to CSV
    txt2csv(txt_all)
    add_punctuation()
    time2=time.time()
    cost=time2-time1
    print("#"*10+"Cost total time:{}s".format(cost)+"#"*10)
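----------------
Copyright notice: this is an original article by CSDN blogger "Transistor_Red", released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.
Original link: https://blog.csdn.net/qq_42859445/article/details/126202593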
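The script depends on one convention that is easy to miss: qiefen gives every segment a three-digit prefix, and audio2txt sorts the files by that prefix. The sketch below reproduces both halves with plain strings so the behaviour can be tried without any audio. `segment_name` and `sort_segments` are illustrative helpers, not part of the original code; in the script itself the start and end times come from the region's metadata when the file is saved.

```python
import os


def segment_name(index, start, end):
    """Build a name like '002-3.100-5.000.wav', mirroring the pattern used in qiefen."""
    prefix = ("000000" + str(index))[-3:]  # keep the last three digits only
    return "{}-{:.3f}-{:.3f}.wav".format(prefix, start, end)


def sort_segments(names):
    """Order file names by their three-digit prefix, as audio2txt does."""
    return sorted(names, key=lambda x: int(os.path.splitext(x)[0][:3]))


if __name__ == "__main__":
    names = [segment_name(10, 12.0, 15.5),
             segment_name(2, 3.1, 5.0),
             segment_name(0, 0.0, 1.25)]
    print(sort_segments(names))
```

One consequence of keeping only three digits is that the prefix wraps around after 999, so a recording that yields 1000 or more segments would no longer sort correctly.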

I ran it in the IDLE that ships with Python (Python 3.9 64-bit).

Running it then produced an error

A search on Baidu showed that some libraries were missing

Installing them with pip from cmd was enough to make the script run successfully

One is auditok; its page is https://www.cnpython.com/pypi/auditok and the command is as follows

pip install auditok

The other is wxPython; its page is https://pypi.org/project/wxPython/#files and the command is as follows

pip install wxPython
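Instead of finding missing libraries one error at a time, you can check all of the script's third-party imports up front. This is a small sketch using the standard library; `missing_modules` is a made-up helper, and the list holds the import names used at the top of the script, which can differ from the pip package name (for example, paddlepaddle-gpu is imported as paddle).

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Third-party modules imported by the script above
REQUIRED = ["paddle", "paddlespeech", "auditok", "soundfile", "librosa"]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("missing modules:", ", ".join(missing))
    else:
        print("all required modules were found")
```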

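Note that the source file in the original author's code is located at E:/FFOutput/音频1.wav; change this to suit your setup. When the run finishes, a **result.csv** file is generated in the Documents folder, and it contains the transcribed text.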