Speech recognition / speech synthesis: changes

#author("2024-03-29T17:59:39+09:00","default:irrp","irrp")
#author("2024-04-03T08:54:22+09:00","default:irrp","irrp")
→Speech processing

→Natural language processing

→Image recognition / detection / tracking

→Image generation

#contents


*Music generation [#gb42fa95]
-[[Suno AIで、音楽を自動生成する。>http://cedro3.com/ai/suno-ai/]] 2023.12

-[[AIで作曲できるツールが想像以上にクオリティ高くて驚く→有名コピペに音を付けられる一方倫理的な問題も - Togetter>https://togetter.com/li/2275669]] 2023.12



* Transcription / speech recognition [#h82fe814]
-[[[電話無人対応] Amazon Bedrock + Whisperで、名前のヒアリング精度を確認してみた[Amazon Connect] | DevelopersIO>https://dev.classmethod.jp/articles/amazon-connect-bedrock-whisper-name/]] 2024.4

-[[話した言葉を文字起こしするアプリの作成 #Python - Qiita>https://qiita.com/_mattu34_/items/fe2b5ccc5ab3b40c832a]] 2024.3

-[[【初心者向け】Pythonで簡単に音声認識精度をチェック！ | ジコログ>https://self-development.info/%e3%80%90%e5%88%9d%e5%bf%83%e8%80%85%e5%90%91%e3%81%91%e3%80%91python%e3%81%a7%e7%b0%a1%e5%8d%98%e3%81%ab%e9%9f%b3%e5%a3%b0%e8%aa%8d%e8%ad%98%e7%b2%be%e5%ba%a6%e3%82%92%e3%83%81%e3%82%a7%e3%83%83/]] 2024.2

-[[[Python] 日本語をローマ字に変換する｜こはた>https://note.com/kohaku935/n/na39ef05d322c]] 2021

-[[ローマ字表記（ヘボン式・日本式・訓令式）の違いについて│旅する応用言語学>https://www.nihongo-appliedlinguistics.net/wp/archives/8135]] 2023.7

-[[自動文字起こしサービスである、OpenAIの「Whisper API」とAWSの「Amazon Transcribe」の精度を比較してみた | DevelopersIO>https://dev.classmethod.jp/articles/openai-whisper-api-amazon-transcribe/]] 2023.10
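--Compared on accuracy alone, Whisper API comes out ahead of Amazon Transcribe.
--However, while Whisper API converts the audio accurately, its output can be hard to read, for example because punctuation is missing.
--Amazon Transcribe, by contrast, inserts punctuation and is easier to read, but some passages of the audio are not transcribed correctly.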
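An accuracy comparison like the one summarized above is often expressed as a character error rate (CER). The following is a minimal sketch, not the method used in the linked article: the function names and the `ignore` parameter are hypothetical. It assumes punctuation should be dropped before scoring, so that a transcript is not penalized only for missing punctuation.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str, ignore: str = "、。,. ") -> float:
    """Character error rate after removing the characters listed in `ignore`."""
    strip = str.maketrans("", "", ignore)
    ref, hyp = ref.translate(strip), hyp.translate(strip)
    if not ref:
        raise ValueError("reference is empty after normalization")
    return edit_distance(ref, hyp) / len(ref)
```

With this normalization, a hypothesis that differs from the reference only in punctuation scores 0.0. Pass `ignore=""` to count punctuation differences as errors.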

-[[ChatGPT, Python, Whisper APIを活用し、動画ファイルから議事録を自動生成 - Qiita>https://qiita.com/haku_104/items/c8e151feb7b53e551f16]] 2023.4

-[[Windowsで音声文字起こし　MyWhisper（マイウィスパー） - umiyuki - BOOTH>https://umiyuki.booth.pm/items/4663311]] 2023.4
--[[umiyuki/MyWhisper: WindowsでWhisper文字起こしできるアプリ>https://github.com/umiyuki/MyWhisper]] 

-[[ChatGPTによる構造化データの音声入力インターフェースが賢すぎる - Qiita>https://qiita.com/miyanaga/items/5538aabc5ac23782a97f]] 2023.3

-[[音声文字起こし技術で業務効率化: Google Text to Speech と OpenAI Whisper の活用 - STORES Product Blog>https://product.st.inc/entry/2023/03/17/153328]] 2023.3

-[[超高精度な国産音声認識AI「ReazonSpeech」が無償公開されたので文字起こし機能を使ってみた - GIGAZINE>https://gigazine.net/news/20230120-reazonspeech/]] 2023.1

-[[アマゾンのAWSで音声の文字起こしサービスを無料で試してみた | Ledge.ai>https://ledge.ai/amazon-transcribe-try/]] 2021.11
-- Amazon Transcribe

-[[アマゾンのAWSでテキストを解析してみた>https://ledge.ai/amazon-comprehend-try/]]
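--Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to find insights and relationships in text. It offers APIs for key phrase extraction, sentiment analysis, entity recognition, topic modeling, and language detection, and can be integrated into applications.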


**Whisper [#mafa5a3a]
-[[AWS Lambda でOpenAI の Whisper API を 認識精度の改善も含めて試してみた | DevelopersIO>https://dev.classmethod.jp/articles/aws-lambda-openai-whisper-api/]] 2023.10

-[[Azure OpenAI Whisperの対応コーデックについて(ogaで起こせたよ) - APC 技術ブログ>https://techblog.ap-com.co.jp/entry/2023/09/22/160125]] 2023.9

-[[OpenAIのWhisperとChatGPTのAPIでGoogle Colab上で簡易なボイスボットを作る | 株式会社AI Shift>https://www.ai-shift.co.jp/techblog/3297]] 2023.3

-[[議事録作成の手間を解消？音声ファイルをChatGPTとWhisperで自動要約 / 開発者向けブログ・イベント | GMO Developers>https://developers.gmo.jp/31939/]] 2023.4

-[[Whisperで文字起こしをした議事録の発話者の名前を自動的に判定する！ - Qiita>https://qiita.com/sakasegawa/items/50d76ead3038e735e4fe]] 2023.4

-[[議事録作成の手間を解消？音声ファイルをChatGPTとWhisperで自動要約 - GMOインターネットグループ グループ研究開発本部（次世代システム研究室）>https://recruit.gmo.jp/engineer/jisedai/blog/eliminating-meeting-minutes-creation-hassles-automatic-summary-of-audio-files-using-chatgpt-and-whisper/]] 2023.3

-[[OpenAIが公開したChatGPTとWhisperのAPIをUnityでサクッと触れるようにした - Synamon’s Engineer blog>https://synamon.hatenablog.com/entry/openai_api_unity]] 2023.3

-[[文字起こしAI「Whisper」を誰でも簡単に使えるようにした超高精度文字起こしアプリ「writeout.ai」使い方まとめ、オープンソースでローカルでも動作OK - GIGAZINE>https://gigazine.net/news/20230309-writeout-ai/]] 2023.3
--[[beyondcode/writeout.ai: Transcribe and translate your audio files - for free>https://github.com/beyondcode/writeout.ai]] 2023.3

-[[OpenAIのWhisper APIの25MB制限に合うような調整を検討する | DevelopersIO>https://dev.classmethod.jp/articles/openai-api-whisper-about-data-limit/]] 2023.3
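One way to stay under the 25 MB upload limit named in the entry above is to split a recording into time ranges before sending it. This is a rough sketch under stated assumptions, not the approach taken in the linked article: `plan_chunks`, `LIMIT_BYTES`, and `margin` are hypothetical names, the limit is treated as 25 MiB, and the audio is assumed to have a roughly constant bitrate so that size scales with duration.

```python
import math

# Assumption: the 25 MB limit is treated as 25 MiB here.
LIMIT_BYTES = 25 * 1024 * 1024


def plan_chunks(total_bytes: int, duration_sec: float,
                limit_bytes: int = LIMIT_BYTES, margin: float = 0.9):
    """Return (start, end) ranges in seconds for equal-length chunks.

    Assumes constant bitrate; `margin` leaves headroom below the limit
    for container overhead and bitrate variation.
    """
    if total_bytes <= 0 or duration_sec <= 0:
        raise ValueError("total_bytes and duration_sec must be positive")
    budget = limit_bytes * margin
    n = max(1, math.ceil(total_bytes / budget))
    step = duration_sec / n
    return [(i * step, min((i + 1) * step, duration_sec)) for i in range(n)]
```

The actual cutting would be done with an audio tool of your choice. Cutting at silences rather than at fixed times is likely to give better transcripts at chunk boundaries.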

-[[Whisperを使ったリアルタイム音声認識と字幕描画方法の紹介 | さくらのナレッジ>https://knowledge.sakura.ad.jp/34497/#OBS_WebSocket]] 2023.2

-[[Whisper + GPT-3 で会議音声からの議事録書き出し&サマリ自動生成をやってみる！ - Qiita>https://qiita.com/sakasegawa/items/3855472a8566ea302a99]] 2023.2

-[[GitHub - ggerganov/whisper.cpp: Port of OpenAI's Whisper model in C/C++>https://github.com/ggerganov/whisper.cpp]] 2022.12

-[[PCで再生中の音声をWhisperでリアルタイムに文字起こしする - TadaoYamaokaの開発日記>https://tadaoyamaoka.hatenablog.com/entry/2022/10/15/175722]] 2022.10

-[[音声認識モデルwhisperの全モデル文字起こし比較 - 毎日がEveryday、日々 Day by Day>https://ysdyt.hatenablog.jp/entry/whisper]] 2022

-[[OpenAIリリースのWhisperをCPUだけで動かすために色々試した話 | DevelopersIO>https://dev.classmethod.jp/articles/openai_whisper_only_local_cpu/]] 2022.10

-[[【Whisper】Webアプリ（GUIデモ）のインストール | ジコログ>https://self-development.info/%e3%80%90whisper%e3%80%91web%e3%82%a2%e3%83%97%e3%83%aa%ef%bc%88gui%e3%83%87%e3%83%a2%ef%bc%89%e3%81%ae%e3%82%a4%e3%83%b3%e3%82%b9%e3%83%88%e3%83%bc%e3%83%ab/]] 2022.10

-[[音声認識モデル”Whisper”をストリーミング処理対応させる方法 | DevelopersIO>https://dev.classmethod.jp/articles/whisper-streaming/]] 2022.10
-[[文字起こしAIで誰でも無料でYoutubeの字幕ファイルを作る方法 - ニートの言葉>https://blog.takuya-andou.com/entry/youtube_whisper2]] 2022.10
-[[OpenAIの音声認識Whisperを使って好きな洋画やアニメから英語教材を自作する - Qiita>https://qiita.com/daipop/items/de7791f49f86097ce4f0]] 2022.10

-[[高精度な文字起こしAIでYoutubeの字幕を作ってみた - ニートの言葉>https://blog.takuya-andou.com/entry/youtube_whisper]] 2022.9

-[[【Python】AI音声認識Whisperを使ったSRT字幕ファイルの自動作成 | ジコログ>https://self-development.info/%e3%80%90python%e3%80%91ai%e9%9f%b3%e5%a3%b0%e8%aa%8d%e8%ad%98whisper%e3%82%92%e4%bd%bf%e3%81%a3%e3%81%9fsrt%e5%ad%97%e5%b9%95%e3%83%95%e3%82%a1%e3%82%a4%e3%83%ab%e3%81%ae%e8%87%aa%e5%8b%95%e4%bd%9c/]] 2022.9

-[[無料でOpenAIの「Whisper」を使って録音ファイルから音声認識で文字おこしする方法まとめ - GIGAZINE>https://gigazine.net/news/20220929-openai-whisper-install-and-usage/]] 2022.9

-[[OpenAIがリリースした高精度な音声認識モデル”Whisper”を使って、オンライン会議の音声を書き起こししてみた | DevelopersIO>https://dev.classmethod.jp/articles/whisper-trial-japanese/]] 2022.9

-[[ほぼ完璧な文字起こしができるAI音声認識Whisperのインストール | ジコログ>https://self-development.info/%e3%81%bb%e3%81%bc%e5%ae%8c%e7%92%a7%e3%81%aa%e6%96%87%e5%ad%97%e8%b5%b7%e3%81%93%e3%81%97%e3%81%8c%e3%81%a7%e3%81%8d%e3%82%8bai%e9%9f%b3%e5%a3%b0%e8%aa%8d%e8%ad%98whisper%e3%81%ae%e3%82%a4%e3%83%b3/]] 2022.9

-[[OpenAIの音声認識Whisperがすごいので，Youtube用に字幕生成してみた - Qiita>https://qiita.com/walnut-pro/items/69864b0a074bd773711f]] 2022.9
-[[OpenAIの音声認識Whisperがすごいので，Google Colabで試してみた（Webアプリを作ってデモ編） - Qiita>https://qiita.com/walnut-pro/items/4b57c3cb7a9446f63c21]] 2022.9
-[[OpenAIの音声認識Whisperがすごいので，Google Colabで試してみた - Qiita>https://qiita.com/walnut-pro/items/0124a5a0c83c9b4e2669]] 2022.9




* Text reading / speech synthesis / spoken language processing / source separation [#q61b527a]
-[[3秒の音声があれば本人そっくりの声で日本語・英語・中国語合成できる「VALL-E X」はやはり脅威。MSが非公開にした技術のOSS版を試して実感した（CloseBox） | テクノエッジ TechnoEdge>https://www.techno-edge.net/article/2023/08/28/1812.html]] 2023.8

-[[Azure : Speech to Text と OpenAI で動画・音声からテキストを生成 - Qiita>https://qiita.com/yoshioterada/items/78f8e80228e790b0b5d2]] 2023.6

-[[【AIボイスチェンジャー】確実にわかるRVCの使い方 | ジコログ>https://self-development.info/%e3%80%90ai%e3%83%9c%e3%82%a4%e3%82%b9%e3%83%81%e3%82%a7%e3%83%b3%e3%82%b8%e3%83%a3%e3%83%bc%e3%80%91%e7%a2%ba%e5%ae%9f%e3%81%ab%e3%82%8f%e3%81%8b%e3%82%8brvc%e3%81%ae%e4%bd%bf%e3%81%84%e6%96%b9/]] 2023.4

-[[無料で自由に使える簡単操作のボーカルリムーバー | ジコログ>https://self-development.info/%e7%84%a1%e6%96%99%e3%81%a7%e8%87%aa%e7%94%b1%e3%81%ab%e4%bd%bf%e3%81%88%e3%82%8b%e7%b0%a1%e5%8d%98%e6%93%8d%e4%bd%9c%e3%81%ae%e3%83%9c%e3%83%bc%e3%82%ab%e3%83%ab%e3%83%aa%e3%83%a0%e3%83%bc%e3%83%90/]] 2023.4

-[[文章から音楽を生成するRiffusionのインストール | ジコログ>https://self-development.info/%e6%96%87%e7%ab%a0%e3%81%8b%e3%82%89%e9%9f%b3%e6%a5%bd%e3%82%92%e7%94%9f%e6%88%90%e3%81%99%e3%82%8briffusion%e3%81%ae%e3%82%a4%e3%83%b3%e3%82%b9%e3%83%88%e3%83%bc%e3%83%ab/]] 2022.12

-[[【藤本健のDigital Audio Laboratory】AIでボーカル・ドラムを取り出す、無料音声分離「Demucs」を試す-AV Watch>https://av.watch.impress.co.jp/docs/series/dal/1460920.html]] 2022.12

-[[AIが音楽に変える！「text2music」でツイートから音楽を作ってみよう - Qiita>https://qiita.com/rayuron/items/b7238b6de52ecab55a21]] 2022.12
-[[img2musicで、画像から音楽を生成する>http://cedro3.com/ai/img2music/]] 2022.10

-[[テキストから音楽を作成するMubert-Text-to-Musicのインストール | ジコログ>https://self-development.info/%e3%83%86%e3%82%ad%e3%82%b9%e3%83%88%e3%81%8b%e3%82%89%e9%9f%b3%e6%a5%bd%e3%82%92%e4%bd%9c%e6%88%90%e3%81%99%e3%82%8bmubert-text-to-music%e3%81%ae%e3%82%a4%e3%83%b3%e3%82%b9%e3%83%88%e3%83%bc%e3%83%ab/]] 2022.10

-[[GitHub - MubertAI/Mubert-Text-to-Music: A simple notebook demonstrating prompt-based music generation via Mubert API>https://github.com/MubertAI/Mubert-Text-to-Music]] 2022.10

-[[AudioGen: Textually Guided Audio Generation>https://felixkreuk.github.io/text2audio_arxiv_samples/]] 2022.9

-[[【Wav2LipによるAI動画編集】動画の人物を無理やりしゃべらせる | ジコログ>https://self-development.info/%e3%80%90wav2lip%e3%81%ab%e3%82%88%e3%82%8bai%e5%8b%95%e7%94%bb%e7%b7%a8%e9%9b%86%e3%80%91%e5%8b%95%e7%94%bb%e3%81%ae%e4%ba%ba%e7%89%a9%e3%82%92%e7%84%a1%e7%90%86%e3%82%84%e3%82%8a%e3%81%97%e3%82%83/]] 2022.9

-[[【Pythonで音声合成（テキスト読み上げ）】gTTSのインストール | ジコログ>https://self-development.info/%e3%80%90python%e3%81%a7%e9%9f%b3%e5%a3%b0%e5%90%88%e6%88%90%ef%bc%88%e3%83%86%e3%82%ad%e3%82%b9%e3%83%88%e8%aa%ad%e3%81%bf%e4%b8%8a%e3%81%92%ef%bc%89%e3%80%91gtts%e3%81%ae%e3%82%a4%e3%83%b3%e3%82%b9/]] 2022.9

-[[AIで音楽をボーカル・ドラム・ベース・その他に分離できる「Demucs」【レビュー】 - 窓の杜>https://forest.watch.impress.co.jp/docs/review/1437871.html]] 2022.9

-[[音声読み上げアプリ作成 PySimpleGUI, gTTS, Python | みやしんのプログラミングスキル通信>https://miyashinblog.com/text_to_speech_appli/]] 2022.8

-[[How To Transcribe Your Podcast with Python - DEV Community>https://dev.to/deepgram/how-to-transcribe-your-podcast-with-python-32i1]] 2022.8

-[[Python の SpeechRecognizer を用いて音声認識（SpeechRecognizer，Python を使用）（Windows 上）>https://www.kkaneko.jp/tools/win/speechrecog.html]] 2022.8

-[[Creating Your Own Voice Assistant in Python - DEV Community>https://dev.to/codesphere/creating-your-own-voice-assistant-in-python-jfm]] 2022.7

-[[[M1] 音声認識ツール Voskを動かす [Node] | DevelopersIO>https://dev.classmethod.jp/articles/vosk/]] 2022.7
-[[日本語音声のマイク入力をオフラインでリアルタイム音声認識：「VOSK」を JavaScript（Node.js）で扱う - Qiita>https://qiita.com/youtoy/items/649dcad9ecccf75a9d01]] 2022.6

-[[ZOOMの日本語音声を無料で英語に翻訳した字幕をつける。 - Qiita>https://qiita.com/shigeshigeshige/items/ffffd4ea9e29895c5135]] 2022.5
--Uses the free tier of Azure Speech translation

-[[VOICEPEAKの音声にほぼドンピシャの字幕ファイルを作成するPythonスクリプト - Qiita>https://qiita.com/mochi_gu_ma/items/a5a9d59865062c7479d3]] 2022.3

-[[入力文字読み上げソフト『VOICEPEAK』を試してみた | DevelopersIO>https://dev.classmethod.jp/articles/tried-using-voicepeak/]] 2022.3

-[[読み上げテキスト>http://www.vector.co.jp/soft/cmt/winnt/art/se201341.html]]

-[[青空ろーどく>http://sites.google.com/site/aozorarohdoku/]]
--Reads Aozora Bunko texts aloud

-http://www35.atwiki.jp/softalk/
--Softalk, a text-to-speech program (also known as the voice of "ゆっくりしていってね！")

-[[ボカロ(作るところから)はじめました>http://d.hatena.ne.jp/yaneurao/20140420#p1]] 2014.4.20
-[[青空文庫や六法のオーディオブックを無料で作る方法>http://denspe.blog84.fc2.com/blog-entry-104.html]]
-[[Microsoft Speech Platform の日本語音声合成エンジン>http://denspe.blog84.fc2.com/blog-entry-103.html]]

-[[Windows10,WSL2でESPNetのVITS学習レシピを実行する【音声合成】 - Qiita>https://qiita.com/seichi25/items/bde466744f9b3190b0d3]] 2022.3
-[[パソコンにしゃべらせてみよう>http://www.geocities.co.jp/dwakahara/synthesizer/synthesizer.htm]]
-[[AquesTalk>http://www.a-quest.com/products/aquestalk.html]]
--The text-to-speech library used by Softalk and similar software


** OpenAI Text-To-Speech [#t4725ec4]
-[[Google Colab で OpenAI API の Text-to-Speech を試す｜npaka>https://note.com/npaka/n/nba4af88eb3cf]] 2023.11

-[[GPTのAPIとText2Speechを組み合わせてAIとの会話体験を実装してみる | DevelopersIO>https://dev.classmethod.jp/articles/gpt-api-and-text2speech-talk-with-gpt/]] 2023.11

-[[OpenAI Text-to-Speech（TTS）API の使い方や料金について｜ChatGPT研究所>https://chatgpt-lab.com/n/n37d40b690344]] 2023.11




** Amazon Polly [#ib3c05d5]
-[[Amazon PollyのSSMLを利用し、住所を自然な日本語の発音になるようチューニングしてみた | DevelopersIO>https://dev.classmethod.jp/articles/amazon-polly-ssml-address/]] 2024.3

-[[ChatGPT + Amazon Polly + Android で AI 音声アシスタントを作り、一番おすすめのうどん屋を聞く - Intelligent Technology's Technical Blog>https://iti.hatenablog.jp/entry/2023/05/24/132328]] 2023.5

-[[[初心者向け] Amazon Polly を使って ChatGPT を体感してみる | DevelopersIO>https://dev.classmethod.jp/articles/introduction-chatgpt-polly-python/]] 2023.3
-[[Amazon Polly に歌わせて VTuber デビューさせてみた - builders.flash☆ - 変化を求めるデベロッパーを応援するウェブマガジン | AWS>https://aws.amazon.com/jp/builders-flash/202301/amazon-polly-vtuber/?awsf.filter-name=*all]] 2023.1
-[[Amazon Pollyを使ってAIに音声を読み上げしてもらおう！ - M&Aクラウド開発者ブログ>https://tech.macloud.jp/entry/2023/01/31/135029]] 2023.1
-[[AI音声のAmazon Pollyを使ってみた! | DevelopersIO>https://dev.classmethod.jp/articles/trying_out_amazon_polly_ai_voice_by_hugo_obuchi/]] 2022.9