#]
#] *********************
#] "$d_SysMaint"'audio/0_audio notes.txt'
# www.BillHowell.ca 13Jul2018 initial
# see also "$d_bin"'audio cut.sh'

#24************************24
# List of operators, generated with :
# $ grep "^#]" "$d_SysMaint"'audio/0_audio notes.txt' | sed 's/^#\]/ /'
# ********************* "$d_SysMaint"'audio/0_audio notes.txt'

+-----+
Tricks
02Apr2022 To [record, extract, convert] audio, use bash scripts :
   "$d_bin"'video production/audio [capture, catenate, convert, ?extract?].sh'
03Apr2022 record voice on Suse (ThinkPad) - too much latency on LMDE (Dell64)
09Jun2023 Transcribe via online speechnotes: Next time specify 10 different speakers, not 4
20Nov2023 cut voice file : see "$d_bin"'audio cut.sh'
+-----+
09Jun2023 Thomy Nilsson - Thinking About Consciousness and your "image-ination" (youTube)
09Jun2023 search "Linux Mint avconv and why isn't it available?"
02Apr2022 command line record mp3 file - DON'T use arecord followed by lame
02Apr2022 use Mpg123 to Play MP3 From Command Line
01Apr2022 search "Linux command line audio player mp3" use vlc from command line, nvlc
01Apr2022 audio format - [mp3, wav] formats NOT supported by LibreOffice Impress !!
31Mar2022 audio concatenate, ffmpeg or just playfileList in audacious
01Dec2021 search "Linux ffmpeg and how do I combine and audio inputs?"
22Oct2021 search "Linux speech recognition software"
22Oct2021 search "Linux and speech to text software for audio files"
   LMDE software manager download : Python-pocketsphinx
21Nov2019 No audio after using alsamixer & pulse audio volume control
04Nov2019 garbled-echoed audio
13Jul2018 Samsung call recordings - .amr file format (Adaptive Multi Rate (AMR) speech codec)

#24************************24
#] +-----+
#] Tricks
#] 02Apr2022 To [record, extract, convert] audio, use bash scripts :
#]    "$d_bin"'video production/audio [capture, catenate, convert, ?extract?].sh'
#] 03Apr2022 record voice on Suse (ThinkPad) - too much latency on LMDE (Dell64)
#] 09Jun2023 Transcribe via online speechnotes: Next time specify 10 different speakers, not 4
#] 20Nov2023 cut voice file : see "$d_bin"'audio cut.sh'
#] +-----+
#24************************24

08********08
#] ??Jun2023

08********08
#] ??Jun2023

08********08
#] ??Jun2023

08********08
#] ??Jun2023

08********08
#] ??Jun2023

08********08
#] ??Jun2023

08********08
#] ??Jun2023

08********08
#] ??Jun2023

08********08
#] 06Jun2023 general audio recordings from browser vids, Edo Kaal
see "$d_bin"'5_google speech-to-text notes.txt'

08********08
#] 09Jun2023 Thomy Nilsson - Thinking About Consciousness and your "image-ination" (youTube)
https://www.youtube.com/watch?v=4xpCBqPTYl4
KEI Network, 18 views, Jun 8, 2023
The KEInetwork host, Perry Kinkaide, interviewed Dr. Thomy Nilsson, Professor Emeritus, University of Prince Edward Island, about consciousness and how the brain's visual system processes sensory input. The discussion, supported by informative illustrations, notes the distinction between being awake - conscious - and attention, with some reference as well to "free will" and a "conscience".
01:04:00 1964 lines, 01:05:47 1968? vision experiment, Infero-Temporal Cortex, spacing of parallel lines & Fourier Transforms
01:18:00 Resonant glutamate-Z9F interaction
Sony digicorder 01:01:58-01:49:47, started recording ~11:29
chop off "dead end" of recording : 48:26
"$d_PROJECTS"'Personal & Events/Sony digicorder/230609_1157 Nilsson, KEI- Thinking About Consciousness.mp3'
LMDE Software Manager download avconv : nyet, ffmpeg only now...
$ ffmpeg -i "$d_PROJECTS"'Personal & Events/Sony digicorder/230609_1157 Nilsson, KEI- Thinking About Consciousness.mp3' -c:a copy -ss 00:00:05 -to 00:48:26 "$d_temp"'230609_1157 Nilsson, KEI- Thinking About Consciousness.mp3'
Transcribe via online speechnotes (Google sign-in), ~10$/70Mb upload
~25min? very long transcribe, ~13:30 to 13:48 (email notification is great!)
(speechnotes.co says roughly 1/2 of upload time...)
>> Howell: pretty good results. Next time specify 10 different speakers, not 4
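The same trim is worth wrapping in a tiny helper so future cuts only need [input, start, stop, output]. "$d_bin"'audio cut.sh' is the working script; below is only a rough sketch of the idea, with an illustrative function name and placeholder file names. Note that -c:a copy avoids re-encoding, so cut points snap to mp3 frame boundaries, not exact milliseconds.

# sketch only - see "$d_bin"'audio cut.sh' for the real script
# usage : audio_cut_sketch fInn.mp3 hh:mm:ss(start) hh:mm:ss(stop) fOut.mp3
audio_cut_sketch() {
   fInn="$1" ; t_start="$2" ; t_stop="$3" ; fOut="$4"
   # stream-copy the selected span; no re-encode, cuts land on mp3 frame boundaries
   ffmpeg -i "$fInn" -c:a copy -ss "$t_start" -to "$t_stop" "$fOut"
}
# example (placeholder names, not actual project paths) :
# audio_cut_sketch 'recording.mp3' 00:00:05 00:48:26 'recording cut.mp3'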
$ ffmpeg -i "$d_PROJECTS"'Personal & Events/Sony digicorder/230609_1157 Nilsson, KEI- Thinking About Consciousness.mp3' -c:a copy -ss 00:00:05 -to 00:48:26 "$d_temp"'230609_1157 Nilsson, KEI- Thinking About Consciousness.mp3' Transcribe via online speechnotes (Google sign-in) ~10$/70Mb upload ~25min? very long transcribe ~13:30 to 13:48 (email notification is great!) (speechnotes.co says roughly 1/2 of upload time...) >> Howell: pretty good results. Next time specify 10 different speakers, not 4 08********08 #] 09Jun2023 search "Linux Mint avconv and why isn't it available?" install WinFF graphical, hope [ffmpeg, avconv] are in package $ ffmpeg -i "$d_PROJECTS"'Personal & Events/Sony digicorder/230609_1157 Nilsson, KEI- Thinking About Consciousness.mp3' -c:a copy -ss 00:00:05 -to 00:48:26 "$d_temp"'230609_1157 Nilsson, KEI- Thinking About Consciousness.mp3' >> worked great +-----+ https://www.linuxquestions.org/questions/showthread.php?p=5398372#post5398372 +-----+ Old 03-07-2015, 08:42 PM #2 SteveM777 LQ Newbie I solved it. sudo add-apt-repository ppa:jon-severinsson/ffmpeg sudo apt-get update sudo apt-get install ffmpeg +-----+ 06-15-2015, 04:25 PM #3 SaintDanBert Senior Member I did a lot of searching online and found another implementation: Code: prompt$ sudo apt-add-repository ppa:samrog131/ppa prompt$ sudo apt-get update prompt$ sudo apt-get install ffmpeg-real prompt$ sudo ln -sf /opt/ffmpeg/bin/ffmpeg /usr/bin/ffmpeg NOTE -- that the PPA is different, and that it wants a different application name, ffmpeg-real. The resulting app reports as follows: 08********08 #] 02Apr2022 command line record mp3 file - DON'T use arecord followed by lame # [.mp3, .wav] formats NOT supported by LibreOffice Impress !! # .mp3 not supported by arecord # so arecord .wav using Yeti mic, then convert to .mp3 with ffmpeg see "$d_bin"'video production/audio capture.sh' : audio_record() { d_wav="$d_webRawe"'Bill Howells videos/220331 Hydrogen future Alberta/voice audio/z_Archive' d_mp3="$d_webRawe"'Bill Howells videos/220331 Hydrogen future Alberta/voice audio/' arecord -t wav -r 19200 "$d_wav$1"'.wav' ffmpeg "$d_wav$1"'.wav' -vn -ar 44100 -ac 2 -b:a 192k "$d_mp3$1"'.mp3' } +-----+ https://jordilin.wordpress.com/2006/07/28/howto-recording-audio-from-the-command-line/ arecord -f cd -t raw | lame -x -r – out.mp3 Arecord captures the audio that goes through your computer and pipes it to the lame encoder, so you encode the audio directly to an mp3 file. You can specify more options to the lame encoder such as the bitrate with lame -x -b bitrate. Without specifying the bitrate it encodes to 128kbps constant bit rate cbr. If you want to record for an specific amount of time then: arecord -f cd -d numberofseconds -t raw | lame -x -r – out.mp3 >> what a mess, just record wav and convert with ffmpeg to mp3 as I did 08********08 #] 02Apr2022 use Mpg123 to Play MP3 From Command Line https://linuxhint.com/play_mp3_files_commandline/ This is a very simple tool to play an MP3 file. It doesn’t come pre-installed with most of the distro. To install it, use your package manager’s search function to find for mpg123. It’s highly likely that you’ll find it by the exact name. Assuming that you have the tool installed, let’s get started. For playing an MP3 file, the command structure for this tool goes like this. $ mpg123 # mpg123 "$d_webRawe"'Bill Howells videos/220331 Hydrogen future Alberta/voice audio/Howell - hydrogen future Alberta, 2.0 hydrogen safey intro.mp3' >> AWESOME! 
08********08
#] 01Apr2022 search "Linux command line audio player mp3" use vlc from command line, nvlc
>> OK, or just play directly from audacious, but that is crappy

08********08
#] 01Apr2022 audio format - [mp3, wav] formats NOT supported by LibreOffice Impress !!
see "$d_bin"'video production/audio convert.sh'
+-----+
https://www.hifiberry.com/docs/software/record-audio-on-your-raspberry-pi/
Record audio on your Raspberry Pi - HiFiBerry
arecord -d 60 -c 2 recording.wav
This will record one minute of audio and save it to a WAV file. While arecord does not support file formats like MP3, OGG or FLAC, you can use additional tools (e.g. SoX) to convert this recording to another format.
+-----+
https://ourcodeworld.com/articles/read/1435/how-to-convert-a-wav-file-to-mp3-using-ffmpeg
$ ffmpeg -i input-file.wav -vn -ar 44100 -ac 2 -b:a 192k output-file.mp3
>> try
$ ffmpeg -i "$d_voice"'Howell - hydrogen future Alberta, 1.0 intro.wav' -vn -ar 44100 -ac 2 -b:a 192k "$d_voice"'Howell - hydrogen future Alberta, 1.0 intro.mp3'

08********08
#] 31Mar2022 audio concatenate, ffmpeg or just playfileList in audacious
https://stackoverflow.com/questions/66803400/how-to-concatenate-a-list-of-wav-files
I have a list of wav files in file.txt :
1.wav
2.wav
...
I used the command below to perform the action :
ffmpeg -f concat -safe 0 -i file.txt -c copy output.wav
However I get :
[concat @ 0x5574a8046900] Line 1: unknown keyword 1.wav: Invalid data found when processing input
Why does this not work?
asked Mar 25, 2021 at 16:08, Joey Joestar
+-+
Based on this wiki page, you need to modify your file.txt as follows:
file '1.wav'
file '2.wav'
# ... etc
The command itself should work fine.
Excerpts from the page:
... This demuxer reads a list of files and other directives from a text file and demuxes them one after the other, as if all their packets had been muxed together. ...
Instructions
Create a file mylist.txt with all the files you want to have concatenated in the following form (lines starting with a # are ignored):
# this is a comment
file '/path/to/file1.wav'
file '/path/to/file2.wav'
file '/path/to/file3.wav'
edited Mar 25, 2021 at 16:35, answered Mar 25, 2021 at 16:28, kevinnls
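To actually use the concat demuxer, it is easiest to generate the list file rather than type it, since every line needs the file '<path>' syntax. Sketch only, assuming all the wav clips sit in one directory, share the same [sample rate, channels] (otherwise -c copy will not join them cleanly), and have no single quotes in their names.

# build the concat-demuxer list, one clip per line, in filename order
rm -f fileList.txt
for fname in *.wav; do
   printf "file '%s'\n" "$fname" >> fileList.txt
done
# -f concat reads the list, -safe 0 permits arbitrary paths, -c copy joins without re-encoding
ffmpeg -f concat -safe 0 -i fileList.txt -c copy output.wav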
08********08
#] 01Dec2021 search "Linux ffmpeg and how do I combine and audio inputs?"
+-----+
https://unix.stackexchange.com/questions/60980/how-to-merge-two-audio-input-source-using-avconv
I routinely do this with ffmpeg, which has similar filters - so maybe this helps you. This assumes screen capture & pc audio in screen.avi, microphone capture in mic.wav.
ffmpeg -i screen.avi -i mic.wav -filter_complex '[0:1][1:0]amix=inputs=2:duration=first[all_audio]' -map 0:0 -map '[all_audio]' -vcodec libx264 -crf 28 -preset slow -acodec mp3 out.avi
I think -map picks channels to go into the output, so if I was debugging your original command line, I think you should lose -map 1:0 -map 2:0 (which is mapping the unmerged inputs into your output) and should instead name the output of the amix plugin (for instance to all_audio as in my example) and have a -map '[all_audio]' (to map the merged audio into your output). But I do not know avconv.
edited Jul 16 '14 at 7:58 Anthon, answered Jul 16 '14 at 7:37 Colin Phipps
+-----+
https://linuxpip.org/ffmpeg-combine-audio-video/
FFmpeg recipe : combine separate audio and video, by diehard
Hey! I'm Daan. I work as a SysAdmin in the Netherlands. Whenever I find free time, I write about IT, Linux, Open Source and hardware on this site.
&&&&&&&&&
01Dec2021 Howell blog comment :
This is fantastic - thanks for the help. I still have to reduce the "choppiness" of the audio, but that will have to wait for another day.
+--+
Combine video with multiple audio using FFmpeg
>> I didn't copy this, could be VERY helpful with video productions!!
+--+
Replace audio in files with another audio stream using FFmpeg
With the -map option, you are able to select any stream, no matter whether it's video, audio, subtitle or metadata, to include in the output file. Just like replacing an audio stream with a separate audio file, you can leverage the -map option to simultaneously extract the audio stream of another video file and combine it with the video stream to make another file. A command line example would look like this :
ffmpeg -i INPUT_FILE1.mp4 -i INPUT_FILE2.mp4 -c:v copy -c:a copy -map 0:v:0 -map 1:a:0 OUTPUT_FILE.mp4
-map 0:v:0 means : select the first input file (INPUT_FILE1.mp4, index number 0), then select its first (0) video stream. The first number (0) is the index of the input file, the latter is the index number of the video stream.
-map 1:a:0, similarly, means : select from the second input file (INPUT_FILE2.mp4, index number 1) the first audio stream (index number 0) to include in the output file.
To re-encode the audio or video stream with a different codec, replace copy with the name of an audio encoder, such as aac or libvorbis.
+--+
Extract video from file and combine with another audio file using FFmpeg
If your video file already contains one or more audio streams and you want to replace the default audio with another (external) audio file, you need to use FFmpeg's -map option. The -map option is used to choose which streams from the input/inputs to include in the output/outputs. The -map option can also be used to exclude specific streams with negative mapping.
Let's suppose we have a video file with audio named video.mp4 and an audio file named audio.aac encoded with the AAC codec. The example will replace the audio in video.mp4 with audio.aac and output OUTPUT_FILE.mp4.
ffmpeg -i INPUT_FILE.mp4 -i AUDIO.aac -c:v copy -c:a copy -map 0:v:0 -map 1:a:0 OUTPUT_FILE.mp4
-map 0:v:0 means : select the first input file (INPUT_FILE.mp4), then select its first (0) video stream. The first number (0) is the index of the input file, the latter is the index number of the video stream.
-map 1:a:0, similarly, means : select from the second file (index number 1) the first audio stream (index number 0) to include in the output file.
+--+
Combine separate video and audio using FFmpeg
If you have separate audio and video files, and the video file contains no audio, you can use this command to combine them into one video file. Please do note that the container (file extension) must accept the video and audio codecs. To ensure compatibility, we recommend using MP4 or MKV as the container.
Let's suppose our video file name is INPUT_FILE.mp4 and the audio file name is AUDIO.wav. If you want to combine them AND re-encode the audio to AAC format, you can use this command :
ffmpeg -i INPUT_FILE.mp4 -i AUDIO.wav -c:v copy -c:a aac OUTPUT_FILE.mp4
-i INPUT_FILE.mp4 specifies INPUT_FILE.mp4 as an input source. Similarly, -i AUDIO.wav tells FFmpeg to take AUDIO.wav as an input source.
-c:v copy is a short form of -codec:v copy, which means copy the video stream from the source files to the destination file.
-c:a aac means select all the audio streams from the source files, then encode them with the AAC encoder.
In case you don't want any audio conversion, just drop the aac part in the command and replace it with copy, so it would look like this :
ffmpeg -i INPUT_FILE.mp4 -i AUDIO.aac -c:v copy -c:a copy OUTPUT_FILE.mp4
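Tying this back to the narration workflow above : a sketch of dropping a recorded voice .wav under an existing video clip, re-encoding the audio to AAC so the .mp4 container accepts it. File names are placeholders, and -shortest is an extra here (it stops the output at whichever input ends first, so the result isn't padded).

# -map 0:v:0 : video stream from the first input (the clip)
# -map 1:a:0 : audio stream from the second input (the narration wav)
# -c:v copy leaves the video untouched; -c:a aac re-encodes the wav for the mp4 container
ffmpeg -i 'clip.mp4' -i 'narration.wav' -map 0:v:0 -map 1:a:0 -c:v copy -c:a aac -shortest 'clip voiced.mp4'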
08********08
#] 22Oct2021 search "Linux speech recognition software"
Hmm, I don't want to do research simply to convert my Sony digicorder files to text.
Look at these, and Google :
Online large files
3) Speechmatics - large vocabulary transcription in the cloud, US and UK English, high accuracy.
   >> https://www.speechmatics.com/product/features/ web congestion? couldn't access?
4) Vocapia Speech to Text API - not very user friendly, but a good technology
   >> https://www.vocapia.com/speech-to-text-services.html
5) Otter Voice Meeting Notes - very good quality of transcription
Try my Samsung cellphone!?
+-----+
https://lilyspeech.com/loc/?pg_title=speech-recognition-linux
Powered by Google's 99.5% accurate Chrome speech to text service and the AutoHotkey language.
https://unix.stackexchange.com/questions/256138/is-there-any-decent-speech-recognition-software-for-linux
https://www.quora.com/What-is-the-best-speech-recognition-software-for-Linux?share=1
+--+
Gary V Deutschmann Sr, CEO - Classic Haus Limited, L.C., answered 8 months ago
The group I talk with most often claim only two are super: DeepSpeech, and wav2letter++. I've never used either of those, nor do I use speech recognition anymore, so I can only go by what others have said in meetings.
+--+
Nickolay Shmyrev, Vosk Developer, updated Sep 14, 2021
What are the top ten speech recognition APIs?
Update September 2021
Looks like Google has big troubles deploying new technology, while everyone else has updated. From my estimation, major providers have won over 30% accuracy, and Google stays in 2019. Even open source libraries have outgrown Google's accuracy. So Google is not recommended anymore. Microsoft and Amazon show very good results. For Chinese, Alibaba has really good recognition rates; one can consult for details.
GitHub - speechio/leaderboard: The project has been moved to www.github.com/SpeechColab/Leaderboard
Update for 2020
My biased list for February 2020 (a bit different from 2017, significantly different from 2015)
Online short utterance
1) Google Speech API - best speech technology. Supports a variety of languages, has speaker separation. Disadvantages - not very stable, can drop words sometimes if it can't recognize them.
2) Microsoft Cognitive Services - Bing Speech API, same from Microsoft, many different nice addons like voice authentication.
There are also offerings from Amazon, Facebook and many others.
Online large files
3) Speechmatics - large vocabulary transcription in the cloud, US and UK English, high accuracy.
4) Vocapia Speech to Text API - not very user friendly, but a good technology
5) Otter Voice Meeting Notes - very good quality of transcription
Offline Proprietary
6) Speech Engine, IFLYTEK CO., LTD. - not a very well known Chinese company, but it continuously excels in competitions.
7) Picovoice/cheetah - amazingly lightweight but accurate system for mobile applications and embedded systems.
Open Source
8) Vosk - offline speech recognition based on Kaldi; due to low resource requirements it can be used on mobile. Supports 7 major languages out of the box. Works on RPi, Android phones, etc. Has speaker identification support.
9) Kaldi - speech recognition toolkit for research. Quite complicated but very powerful.
10) facebookresearch/wav2letter - Facebook is very innovative in speech lately; not very useful in production but very interesting from a research point of view. You need a lot of computing power to use this, though.
There are many others worth attention in the open source domain - espnet, essen, NeMO from Nvidia.

08********08
#] 22Oct2021 search "Linux and speech to text software for audio files"
Sony digicorder
duckduckgo not working today?
#] LMDE software manager download : Python-pocketsphinx
http://cmusphinx.sourceforge.net/
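A quick offline test of pocketsphinx on a digicorder file, once the package is installed. Sketch from memory only - the pocketsphinx_continuous binary may live in a separate pocketsphinx package rather than in Python-pocketsphinx, and its [-infile, -logfn] options should be confirmed against the man page. Pocketsphinx wants 16 kHz, 16-bit, mono wav, hence the ffmpeg conversion first; accuracy will be well below the online services listed above.

# convert the digicorder mp3 to the 16 kHz mono wav that pocketsphinx expects
ffmpeg -i 'recording.mp3' -ar 16000 -ac 1 'recording 16k.wav'
# rough offline transcription to a text file (options recalled from memory - check man pocketsphinx_continuous)
pocketsphinx_continuous -infile 'recording 16k.wav' -logfn /dev/null > 'recording transcript.txt'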
+-----+
https://www.ubuntupit.com/best-open-source-speech-recognition-tools-for-linux/
Mehdi Hasan, ?date?
1. Kaldi, Johns Hopkins University
Kaldi is a special kind of speech recognition software, started as part of a project at Johns Hopkins University. This toolkit comes with an extensible design and is written in the C++ programming language. It provides a flexible and comfortable environment to its users, with a lot of extensions to enhance the power of Kaldi.
2. CMUSphinx, Carnegie Mellon University
CMUSphinx comes with a group of feature-enriched systems with several pre-built packages related to speech recognition. It is an open source program, developed at Carnegie Mellon University. You will get this speaker-independent recognition tool in several languages, including French, English, German, Dutch, and more.
3. DeepSpeech, Mozilla (?) ?link - I forgot?
DeepSpeech uses the TensorFlow framework to make the voice transformation more comfortable. It supports NVIDIA GPUs, which helps to perform quicker inference. You can use DeepSpeech inference in three different ways: the Python package, the Node.JS package, or the command-line client. Each time you want to run this software on your system, you'll need to activate the virtual environment with a Python command. It needs a Linux or Mac environment to run.
4. Wav2Letter++
Wav2Letter++ is a modern and popular speech recognition tool, developed by the Facebook AI Research team. It is another open source program, under the BSD license. This superfast voice recognition software was built in C++ and introduced with a lot of features. It provides the facility of language modeling, machine translation, speech synthesis, and more to its users in a flexible environment.
5. Julius
Julius is a comparatively older open source voice recognition software developed by Lee Akinobu. This tool is written in the C programming language by the developers of Kawahara Lab, Kyoto University. It is a high-performance speech recognition application with a large vocabulary. You can use it in both English and Japanese. It can be a great choice if you want to use it for academic and research purposes.
6. Simon
Simon is a modern and easy-to-use speech recognition software, developed by Peter Grasch. It is another open source program, under the GNU General Public License. You are free to use Simon on both Linux and Windows systems. Also, it provides the flexibility to work with any language you want.
7. Mycroft
Mycroft comes with an easy-to-use open source voice assistant for converting voice to text. It is regarded as one of the most popular Linux speech recognition tools in modern times, written in Python. It allows users to make the best use of this tool in a science project or enterprise software application. Also, it can be used as a practical assistant that can tell you the time, date, weather, and more.
8. OpenMindSpeech
Open Mind Speech is one of the essential Linux speech recognition tools that aims to convert your speech to text for free. It is a part of the Open Mind Initiative, and is aimed especially at developers. This program was introduced under different names like VoiceControl, SpeechInput, and FreeSpeech before getting its present name.
9. SpeechControl
Speech Control is a free speech recognition application, suitable for any Ubuntu distro. It comes with a graphical user interface based on Qt. Though it is still in its early development stage, you can use it for your simple projects.
10. Deepspeech.pytorch
Deepspeech.pytorch is another mentionable open source speech recognition application, ultimately an implementation of DeepSpeech2 for PyTorch. It contains a set of powerful networks based on the DeepSpeech2 architecture. With many helpful resources, it can be used as one of the essential Linux speech recognition tools for research and project development.
+-----+
https://fosspost.org/open-source-speech-recognition/
?author, date?
Project DeepSpeech
https://github.com/mozilla/DeepSpeech
This project is made by Mozilla, the organization behind the Firefox browser. It's a 100% free and open source speech-to-text library that also applies machine learning technology, using the TensorFlow framework, to fulfill its mission. In other words, you can use it to build training models yourself to enhance the underlying speech-to-text technology and get better results, or even to bring it to other languages if you want. You can also easily integrate it into your other machine learning projects on TensorFlow. Sadly, it sounds like the project currently only supports English by default. It's also available in many languages such as Python (3.6).
Kaldi
https://kaldi-asr.org/
Kaldi is an open source speech recognition software written in C++, and is released under the Apache public license. It works on Windows, macOS and Linux. Its development started back in 2009. Kaldi's main feature over some other speech recognition software is that it's extendable and modular: the community provides tons of 3rd-party modules that you can use for your tasks. Kaldi also supports deep neural networks, and offers excellent documentation on its website. While the code is mainly written in C++, it's "wrapped" by Bash and Python scripts. So if you are looking just for the basic usage of converting speech to text, then you'll find it easy to accomplish that via either Python or Bash. You may also wish to check Kaldi Active Grammar, which is a Python pre-built engine with English trained models already ready for usage.
Julius
https://github.com/julius-speech/julius
Probably one of the oldest speech recognition software packages ever, as its development started in 1991 at the University of Kyoto; it then became an independent project in 2005. A lot of open source applications use it as their engine (think of KDE Simon). Julius' main features include its ability to perform real-time STT processing, low memory usage (less than 64MB for 20000 words), the ability to produce N-best/Word-graph output, the ability to work as a server unit, and a lot more. This software was mainly built for academic and research purposes.
It is written in C, and works on Linux, Windows, macOS and even Android (on smartphones). Currently it supports English and Japanese only. The software is probably easy to install from your Linux distribution's repository; just search for the julius package in your package manager.
Wav2Letter++
https://github.com/facebookresearch/wav2letter
If you are looking for something modern, then this one is for you. Wav2Letter++ is an open source speech recognition software that was released by Facebook's AI Research Team just 2 months ago. The code is released under the BSD license. Facebook describes its library as "the fastest state-of-the-art speech recognition system available". The concepts on which this tool is built make it optimized for performance by default; Facebook's also-new machine learning library FlashLight is used as the underlying core of Wav2Letter++.
Wav2Letter++ needs you to first build a training model for the language you desire, yourself, in order to train the algorithms on it. No pre-built support for any language (including English) is available. It's just a machine-learning-driven tool to convert speech to text. It was written in C++, hence the name (Wav2Letter++).
DeepSpeech2
https://github.com/PaddlePaddle/DeepSpeech
Researchers at the Chinese giant Baidu are also working on their own speech-to-text engine, called DeepSpeech2. It's an end-to-end open source engine that uses the "PaddlePaddle" deep learning framework for converting both English & Mandarin Chinese speech into text. The code is released under the BSD license.
The engine can be trained on any model and for any language you desire. The models are not released with the code; you'll have to build them yourself, just like with the other software. DeepSpeech2's source code is written in Python, so it should be easy for you to get familiar with it if that's the language you use.
OpenSeq2Seq
https://nvidia.github.io/OpenSeq2Seq/html/speech-recognition.html
Developed by NVIDIA for sequence-to-sequence model training. While it can be used for much more than just speech recognition, it is a good engine nonetheless for this use case. You can either build your own training models with it, or use the Jasper, Wave2Letter+ and DeepSpeech2 models which are shipped by default. It supports parallel processing using multiple GPUs/multiple CPUs, besides heavy support for some NVIDIA technologies like CUDA and its strong graphics cards.
Fairseq
https://github.com/pytorch/fairseq
Another sequence-to-sequence toolkit. Developed by Facebook and written in Python and the PyTorch framework. Also supports parallel training. Can even be used for translation and more complicated language processing tasks.
Vosk
https://alphacephei.com/vosk/
One of the newest open source speech recognition systems, as its development just started in 2020. Unlike other systems in this list, Vosk is ready to use right after installation, as it supports 10 languages (English, German, French, Turkish...) with portable 50MB-sized models already available for users (there are larger models, up to 1.4GB, if you need them). It also works on Raspberry Pi, iOS and Android devices, and provides a streaming API which allows you to connect to it to do your speech recognition tasks online. Vosk has bindings for Java, Python, JavaScript, C# and NodeJS.
Athena
https://github.com/athena-team/athena
An end-to-end speech recognition engine which implements ASR (Automatic Speech Recognition).
Written in Python and licensed under the Apache 2.0 license. Supports unsupervised pre-training and multi-GPU processing. Built on top of TensorFlow. Visit the Athena source code.
ESPnet
https://espnet.github.io/espnet/
Written in Python on top of PyTorch. Also supports end-to-end ASR. It follows the Kaldi style for data processing, so it would be easier to migrate from Kaldi to ESPnet. The main marketing point for ESPnet is the state-of-the-art performance it gives in many benchmarks, and its support for other language processing tasks such as text-to-speech (TTS), machine translation (MT) and speech translation (ST). Licensed under the Apache 2.0 license.
What is the Best Open Source Speech Recognition System?
If you are building a small application which you want to be portable everywhere, then Vosk is your best option, as it is written in Python and works on iOS, Android and Raspberry Pi too, and supports up to 10 languages. It also provides a huge training dataset if you need it, and a smaller one for portable applications.
If, however, you want to train and build your own models for more complex tasks, then any of Fairseq, OpenSeq2Seq, Athena and ESPnet should be more than enough for your needs, and they are the most modern state-of-the-art toolkits.
As for Mozilla's DeepSpeech, it lags behind its competitors in this list in a lot of features, and isn't really cited much in speech recognition academic research like the others. And its future is concerning after the recent Mozilla restructuring, so one would want to stay away from it for now. Traditionally, Julius and Kaldi are also very much cited in the academic literature.
Alternatively, you may try these open source speech recognition libraries to see how they work for you in your use case.
Comments, M.Hanny Sabbagh, April 11, 2019
Lootosee : All these projects seem pretty useless if they are
Those projects are simply not for regular people; they are for programmers and those who are building a system that requires speech recognition - then they can use those systems instead of the proprietary ones.
+--+
https://www.goodfirms.co/blog/best-free-open-source-speech-recognition-software
wouldn't load ...

08********08
#] 21Nov2019 No audio after using alsamixer & pulse audio volume control
restarted alsamixer & pavuc :
>> didn't help
pavuc : I turned off Built-in audio EMU20k2 [Sound Blaster X-Fi Titanium Series] - Analog Stereo Duplex

08********08
#] 04Nov2019 garbled-echoed audio
https://zillowtech.com/ubuntu-no-sound.html
Now run the following command in the terminal :
alsaloop
sudo alsa force-reload
>> this worked for me. The problem might have been caused by an LMDE update today, which somehow reset alsamixer?

08********08
#] 13Jul2018 Samsung call recordings - .amr file format (Adaptive Multi Rate (AMR) speech codec)
I already have this :
Libopencore-amrwb0 : Adaptive multi-rate - wideband speech codec - shared library
Score: Installed
This library contains an implementation of the 3GPP TS 26.173 specification for the Adaptive Multi-Rate - Wideband (AMR-WB) speech decoder. The implementation is derived from the OpenCORE framework, part of the Google Android project. This package contains the decoder shared library.
$ man
# enddoc