#]
#] *********************
#] "$d_web"'System_maintenance/voice-to-txt/0_voice-to-txt notes.txt' - ???
# www.BillHowell.ca 05Jun2023 initial
# view in text editor, using constant-width font (eg courier), tabWidth = 3

#48************************************************48
#24************************24
# Table of Contents, generate with :
# $ grep "^#]" "$d_web"'System_maintenance/voice-to-txt/0_voice-to-txt notes.txt' | sed "s/^#\]/ /"
#
 *********************
 "$d_web"'System_maintenance/voice-to-txt/0_voice-to-txt notes.txt' - ???
 05Jun2023 search "software for voice -> text?"
 Brian Turner: Best speech-to-text apps of 2023
 Andrea Hernandez: The Best 7 Free and Open Source Speech Recognition Software Solutions
 Mehedi Hasan: Top 10 Best Open Source Speech Recognition Tools for Linux
 DK Bose: Speech recognition in Ubuntu: convert from audio to text
 Franck Dernoncourt: Is there any decent speech recognition software for Linux?

#24************************24
# Setup, ToDos,

#08********08
#] ??Jun2023

#08********08
#] ??Jun2023

#08********08
#] ??Jun2023

#08********08
#] ??Jun2023

#08********08
#] 05Jun2023 SpeechNotes transcription

https://speechnotes.co/files/guide/
18$CAD for 2 hours transcription of voice files
test example: OK, no good with 2nd voice (sounding distant)

#08********08
#] 05Jun2023 search "voice file to text"

+-----+
https://flixier.com/tools/convert-audio-to-text
Sound to text

Are you looking for a way to generate transcripts of your voice overs, podcasts or meetings quickly and easily? Look no further! The Flixier free audio to text converter helps you generate transcripts of your audio recordings and conversations quickly and easily in minutes. And the best part is that it all runs in your web browser so you don’t have to worry about downloading or installing anything to your computer.
Just log in, upload your audio or video file, click the Transcribe button and sit back while our software gives you a perfect transcript of the audio that you can then edit and save to your device!

How to convert audio to text:
1 Upload
   To start converting your audio to text with Flixier, just click the Transcribe or Get Started buttons above. Then, drag your audio (or video!) files over to the browser window or press the “click to upload” button.
2 Transcribe
   After the file has uploaded, just click the “Generate” button. Your file will be processed and the transcription will show up on the left side of the screen. If needed, you can also make changes to the text before you download it.
3 Save
   To download your audio transcript, just click the Download button on the lower left part of the screen. You can choose between downloading a text file or subtitle file from the dropdown above the download button.

>> 300$US/yr!!!

#08********08
#] 05Jun2023 search "software for voice -> text?"

+-----+
https://www.techradar.com/news/best-speech-to-text-app
#] Brian Turner: Best speech-to-text apps of 2023
last updated 29 days ago
Free, paid and online voice recognition apps and services

+--+
Best paid for speech to text apps

1. Dragon Anywhere
   Best mobile speech-to-text app
   Anywhere 1 month US$14.99/mth; Anywhere 12 months US$149.99/year (at Nuance)
   Reasons to buy: +High quality speech recognition +Syncs with desktop Dragon software +Excellent recognition +Fully functional app
   Reasons to avoid: -Dictation limited to within the app

2. Dragon Professional
   Business-grade speech-to-text solution
   Reasons to buy: +Powerful features +Designed for pros +160 wpm dictation
   Reasons to avoid: -Outdated UI -Weak recording transcription

   Should you be looking for a business-grade dictation application, your best bet is Dragon Professional.
   Aimed at pro users, the software provides you with the tools to dictate and edit documents, create spreadsheets, and browse the web using your voice.

3. Otter
   The big little speech to text app
   Reasons to buy: +Free tier +Team collaboration +Export options +Live captioning
   Reasons to avoid: -No live chat support

   Otter is a cloud-based speech to text program aimed especially at mobile use, such as on a laptop or smartphone. The app provides real-time transcription, allowing you to search, edit, play, and organize as required.

4. Verbit
   The smart speech to text service
   Reasons to buy: +Enterprise service +Team working +Smart AI
   Reasons to avoid: -Not always live

   Verbit aims to offer a smarter speech to text service, using AI for transcription and captioning. The service is specifically targeted at enterprise and educational establishments.

5. Speechmatics
   Leading speech recognition technology
   Reasons to buy: +Supports different accents +Media captioning +Keyword triggers
   Reasons to avoid: -No free option -No out-of-the-box solutions

   Speechmatics offers a machine learning solution to converting speech to text, with its automatic speech recognition solution available to use on existing audio and video files as well as for live use. Unlike some automated transcription software which can struggle with accents or charge more for them, Speechmatics advertises itself as being able to support all major British accents, regardless of nationality. That way it aims to cope with not just different American and British English accents, but also South African and Jamaican accents.

6. Braina Pro
   A virtual assistant for your PC
   Reasons to buy: +Powerful digital assistant +Nifty Android app for remotely controlling PC
   Reasons to avoid: -Subscription only (no one-off purchase)

   Braina Pro is speech recognition software which is built not just for dictation, but also as an all-round digital assistant to help you achieve various tasks on your PC. It supports dictation to third-party software in not just English but almost 90 different languages, with impressive voice recognition chops.

7. Amazon Transcribe
   Cloud-based speech to text technology
   Reasons to buy: +Vocabulary editing +Audio for apps +Recognizes speakers and channels
   Reasons to avoid: -Not ideal for consumers

   Amazon Transcribe is a big cloud-based automatic speech recognition platform developed specifically to convert audio to text for apps. It especially aims to provide a more accurate and comprehensive service than traditional providers, such as being able to cope with low-fi and noisy recordings, such as you might get in a contact center.

   Amazon Transcribe uses a deep learning process that automatically adds punctuation and formatting, and it can either process a secure livestream or transcribe speech to text with batch processing. As well as offering time stamping for individual words for easy search, it can also identify different speakers and different channels and annotate documents accordingly.

   There are also some nice features for editing and managing transcribed texts, such as vocabulary filtering and replacement words, which can be used to keep product names consistent and therefore make any following transcription easier to analyze.

   Overall, Amazon Transcribe is one of the most powerful platforms out there, though it’s aimed more at the business and enterprise user than at the individual.

8. Microsoft Azure Speech to Text
   Part of the Azure platform's Cognitive Services
   Reasons to buy: +Real time transcription +Customization for proper nouns +Handles multiple speakers
   Reasons to avoid: -Complicated set-up

   Microsoft's Azure cloud service offers advanced speech recognition as part of the platform's speech services to deliver the Microsoft Azure Speech to Text functionality. This feature allows you to simply and easily create text from a variety of audio sources. There are also customization options available to work better with different speech patterns, registers, and even background sounds. You can also modify settings to handle different specialist vocabularies, such as product names, technical information, and place names.

   Microsoft's Azure Speech to Text feature is powered by deep neural network models and allows for real-time audio transcription that can be set up to handle multiple speakers. As part of the Azure cloud service, you can run Azure Speech to Text in the cloud, on premises, or in edge computing. In terms of pricing, you can run the feature in a free container with a single concurrent request for up to 5 hours of free audio per month.

9. IBM Watson Speech to Text
   Reasons to buy: +Machine learning +Batch conversions +Range of output options +Smart formatting
   Reasons to avoid: -More expensive than AWS/Google -Multi-speaker recognition is hit and miss

   IBM's Watson Speech to Text is the third cloud-native solution on this list, with the feature being powered by AI and machine learning as part of IBM's cloud services. While there is the option to transcribe speech to text in real-time, there is also the option to batch convert audio files and process them through a range of language, audio frequency, and other output options.
   You can also tag transcriptions with speaker labels, smart formatting, and timestamps, as well as apply global editing for technical words or phrases, acronyms, and for number use. As with other cloud services, Watson Speech to Text allows for easy deployment both in the cloud and on-premises behind your own firewall to ensure security is maintained.

   https://www.techradar.com/reviews/watson-speech-to-text-review
   Watson Speech to Text: Plans and pricing
   You can use Watson Speech to Text to process up to 500 minutes of audio for free per month. If you want to convert more than that, you’ll need to pay for each audio minute, and the rate changes based on the duration of audio processed. Costs range from $0.01 to $0.02 per minute, and there’s an add-on charge of $0.03 per minute if you require IBM’s Custom Language Model. Premium quote-only Watson plans are available too, and these grant access to enhanced data privacy features and uptime guarantees.

   Unlike consumer-facing voice-to-text apps, Watson’s services are designed to be accessed through APIs and code embedded in other systems. For this reason, there’s no real Watson “interface”. Instead, Watson can be accessed through three different internet protocols. These are WebSockets, REST API, and Watson Developer Cloud.

+--+
Best free speech to text apps

1. Google Gboard
   Easily accessible speech to text
   Reasons to buy: +Free +Easy to use +Additional features
   Reasons to avoid: -No shortcut commands

   If you already have an Android mobile device and Google Keyboard is not already installed, download it from the Google Play store and you'll have an instant speech-to-text app. Although it's primarily designed as a keyboard for physical input, it also has a speech input option which is directly available. And because all the power of Google's hardware is behind it, it's a powerful and responsive tool.

   If that's not enough then there are additional features.
   Aside from physical input ones such as swiping, you can also trigger images in your text using voice commands. Additionally, it can also work with Google Translate, and is advertised as providing support for over 60 languages.

   Even though Google Keyboard isn't a dedicated transcription tool, as there are no shortcut commands or text editing directly integrated, it does everything you need from a basic transcription tool. And as it's a keyboard, it should be able to work with any software you can run on your Android smartphone, so you can text edit, save, and export using that. Even better, it's free and there are no adverts to get in the way of you using it.

2. Just Press Record (iOS only!)
   A cloud-based transcription tool
   Reasons to buy: +Easy-to-use +Underpinned by the cloud +Multilingual
   Reasons to avoid: -No Android app

   If you want a dedicated dictation app, it’s worth checking out Just Press Record. It’s a mobile audio recorder that comes with features such as one tap recording, transcription and iCloud syncing across devices. The great thing is that it’s aimed at pretty much anyone and is extremely easy to use.

   When it comes to recording notes, all you have to do is press one button, and you get unlimited recording time. However, the really great thing about this app is that it also offers a powerful transcription service. Through it, you can quickly and easily turn speech into searchable text. Once you’ve transcribed a file, you can then edit it from within the app.

   There’s support for more than 30 languages as well, making it the perfect app if you’re working abroad or with an international team. Another nice feature is punctuation command recognition, ensuring that your transcriptions are free from typos.

   This app is underpinned by cloud technology, meaning you can access notes from any device (which is online).
   You’re able to share audio and text files to other iOS apps too, and when it comes to organizing them, you can view recordings in a comprehensive file.

3. Speechnotes (does transcription!! use [lap, desk]top!??)
   Powered by Google technology
   Reasons to buy: +Built-in Google voice recognition tech +Recognizes punctuation marks +Easy to use
   Reasons to avoid: -No iOS app

   Speechnotes is yet another easy to use dictation app. A useful touch here is that you don’t need to create an account or anything like that; you just open up the app and press on the microphone icon, and you’re off.

   The app is powered by Google voice recognition tech. When you’re recording a note, you can easily dictate punctuation marks through voice commands, or by using the built-in punctuation keyboard.

   To make things even easier, you can quickly add names, signatures, greetings and other frequently used text by using a set of custom keys on the built-in keyboard. There’s automatic capitalization as well, and every change made to a note is saved to the cloud.

   When it comes to customizing notes, you can access a plethora of fonts and text sizes. The app is free to download from the Google Play Store, but you can make in-app purchases to access premium features (there's also a browser version for Chrome).

4. Transcribe (iOS only! nyet)
   Artificial intelligence-powered dictation software
   Reasons to buy: +AI tech +Recognizes videos and voice memos +User-friendly
   Reasons to avoid: -No Android option

   Marketed as a personal assistant for turning videos and voice memos into text files, Transcribe is a popular dictation app that’s powered by AI. It lets you make high quality transcriptions by just hitting a button.

   The app can transcribe any video or voice memo automatically, while supporting over 80 languages from across the world. While you can easily create notes with Transcribe, you can also import files from services such as Dropbox.
   Once you’ve transcribed a file, you can export the raw text to a word processor to edit. The app is free to download, but you’ll have to make an in-app purchase if you want to make the most of these features in the long-term. There is a trial available, but it’s basically just 15 minutes of free transcription time. Transcribe is only available on iOS, though.

+--+
Mobile speech to text apps to consider

Aside from what has already been covered above, there are an increasing number of apps available across all mobile devices for working with speech to text, not least because Google's speech recognition technology is available for use.

iTranslate Translator is a speech-to-text app for iOS with a difference, in that it focuses on translating voice languages. Not only does it aim to translate different languages you hear into text for your own language, it also works to translate images such as photos you might take of signs in a foreign country. In that way, iTranslate is a very different app that takes the idea of speech-to-text in a novel direction and, by all accounts, does it well.

ListNote Speech-to-Text Notes is another speech-to-text app that uses Google's speech recognition software, but this time does a more comprehensive job of integrating it with a note-taking program than many other apps. The text notes you record are searchable, and you can import/export with other text applications. Additionally, there is a password protection option, which encrypts notes after the first 20 characters so that the beginning of each note is searchable by you. There's also an organizer feature for your notes, using category or assigned color. The app is free on Android, but includes ads.

Voice Notes is a simple app that aims to convert speech to text for making notes. This is refreshing, as it mixes Google's speech recognition technology with a simple note-taking app, so there are more features to play with here.
You can categorize notes, set reminders, and import/export text accordingly.

SpeechTexter is another speech-to-text app that aims to do more than just record your voice to a text file. This app is built specifically to work with social media, so that rather than typing messages, emails, Tweets, and similar, you can record your voice directly to the social media sites and send. There are also a number of language packs you can download for offline working if you want to use more than just English, which is handy.

+-----+
https://www.goodfirms.co/speech-recognition-software/blog/best-free-open-source-speech-recognition-software
#] Andrea Hernandez: The Best 7 Free and Open Source Speech Recognition Software Solutions
Published on: March 30, 2019

1 Simon
   Simon is considered a very flexible free and open source speech recognition program. It can be customized for any application where speech recognition is required. It can work with any dialect and is not bound to any language. It can replace the mouse and keyboard.

   Simon makes use of KDE libraries, CMU SPHINX or Julius together with the HTK, and it runs on Windows and Linux. One can open URLs and programs, type configurable text snippets, control the mouse and keyboard, and simulate shortcuts. It turns audio into text and allows voice commands. You can check out Simon if you would like to talk to your computer.

   >> Howell: really for voice-commands it seems??

2 Kaldi
   Kaldi is open source speech recognition software that is freely available under the Apache License. Development started at Johns Hopkins University in 2009, at a workshop called “Low Development Cost, High-Quality Speech Recognition for New Languages and Domains.” On May 14, 2011, the code for Kaldi was released after working on the project for a few years. Kaldi quickly gained a reputation for being easy to work with.
   It is written in C++ and is intended to be used mainly for acoustic modeling research.

3 CMUSphinx
   The short form of CMUSphinx is Sphinx. It is a speaker-independent large vocabulary continuous speech recognizer that is released under a BSD-style license. It is a group of speech recognition systems developed at Carnegie Mellon University.

4 Mozilla
   Mozilla has released an open source voice recognition tool that it states is “close to the human level performance.” It is free speech recognition software for developers to plug into their projects. Mozilla Senior Vice President of Emerging Technologies Sean White wrote in a blog post that “We at Mozilla believe technology should be open and accessible to all, and that includes voice.”

5 Julius
   Julius is a free, high-performance, two-pass large vocabulary continuous speech recognition (LVCSR) decoder for speech-related developers and researchers. It carries out multi-model decoding, that is, recognition using several LMs and AMs concurrently in a single processor. At run time it supports the “hot plugging” of arbitrary modules.

6 Dictation Bridge
   Dictation Bridge is a free and open source dictation solution for NVDA and Jaws. It is a gateway between the NVDA and Jaws screen readers and either Dragon NaturallySpeaking or Windows Speech Recognition. Both Windows Speech Recognition and Dragon can be controlled by Jaws users. In Dragon and Windows Speech Recognition (WSR) it can echo back the dictated text. It provides an extensive collection of verbal commands that can control screen readers and perform a variety of other tasks with Dragon products.

+-----+
https://www.ubuntupit.com/best-open-source-speech-recognition-tools-for-linux/
#] Mehedi Hasan: Top 10 Best Open Source Speech Recognition Tools for Linux
Updated: February 25, 2022

1. Kaldi - Kaldi is a special kind of speech recognition software, started as a part of a project at Johns Hopkins University.
   This toolkit comes with an extensible design and is written in the C++ programming language. It provides a flexible and comfortable environment to its users with a lot of extensions to enhance the power of Kaldi. Besides the speech recognition system, it also supports deep neural networks and linear transforms.

2. CMUSphinx - CMUSphinx comes with a group of feature-enriched systems with several pre-built packages related to speech recognition. It is an open source program, developed at Carnegie Mellon University. You will get this speaker-independent recognition tool in several languages, including French, English, German, Dutch, and more.

3. DeepSpeech - DeepSpeech is an open source speech recognition engine to convert your speech to text. It is a free application by Mozilla. To run the DeepSpeech project on your device, you will need Python 3.r or above. It also needs a Git extension, namely Git Large File Storage, which is used for versioning large files. DeepSpeech uses the TensorFlow framework to make the voice transformation more comfortable. It supports NVIDIA GPUs, which helps to perform quicker inference. You can use DeepSpeech inference in three different ways: the Python package, the Node.JS package, or the command-line client. Each time you want to run this software on your system, you’ll need to activate the Python virtual environment. It needs a Linux or Mac environment to run.

4. Wav2Letter++ - Wav2Letter++ is a modern and popular speech recognition tool, developed by the Facebook AI Research team. It is another open source program, under the BSD license. This superfast voice recognition software was built in C++ and introduced with a lot of features. It provides the facility of language modeling, machine translation, speech synthesis, and more to its users in a flexible environment.

5. Julius - Julius is a comparatively older open source voice recognition software developed by Lee Akinobu.
   This tool is written in the C programming language by the developers of Kawahara Lab, Kyoto University. It is a high-performance speech recognition application with a large vocabulary. You can use it in both English and Japanese. It can be a great choice if you want to use it for academic and research purposes.

6. Simon - Simon is a modern and easy-to-use speech recognition program, developed by Peter Grasch. It is another open source program, under the GNU General Public License. You are free to use Simon on both Linux and Windows systems. Also, it provides the flexibility to work with any language you want.

7. Mycroft - Mycroft is an easy-to-use open source voice assistant for converting voice to text. It is regarded as one of the most popular Linux speech recognition tools in modern times, and is written in Python. It allows users to make the best use of this tool in a science project or enterprise software application. Also, it can be used as a practical assistant that can tell you the time, date, weather, and more.

8. OpenMindSpeech - Open Mind Speech is one of the essential Linux speech recognition tools that aims to convert your speech to text for free. It is a part of the Open Mind Initiative and is aimed especially at developers. This program went by different names, like VoiceControl, SpeechInput, and FreeSpeech, before getting its present name.

9. SpeechControl - Speech Control is a free speech recognition application, suitable for any Ubuntu distro. It comes with a graphical user interface based on Qt. Though it is still in its early development stage, you can use it for your simple project.

10. Deepspeech.pytorch - Deepspeech.pytorch is another notable open source speech recognition application, which is an implementation of DeepSpeech2 for PyTorch. It contains a set of powerful networks based on the DeepSpeech2 architecture.
   With many helpful resources, it can be used as one of the essential Linux speech recognition tools for research and project development.

+-----+
https://askubuntu.com/questions/1142862/speech-recognition-in-ubuntu-convert-from-audio-to-text
#] DK Bose: Speech recognition in Ubuntu: convert from audio to text
asked May 13, 2019 at 11:37 by sahir kumar; edited May 13, 2019 at 11:40 by DK Bose

I am using the following commands:

   ffmpeg -i file.mp3 -ar 16000 -ac 1 file.wav
   pocketsphinx_continuous -infile file.wav 2> pocketsphinx.log > result.txt

All things work fine, but I want to know if there is a script of commands to convert all audio files in a loop for example, and if it is possible to support other languages in pocketsphinx such as Arabic and French and how to do that.

>> Howell: pocketsphinx_continuous?? don't see other references to this

+-----+
https://unix.stackexchange.com/questions/256138/is-there-any-decent-speech-recognition-software-for-linux
#] Franck Dernoncourt: Is there any decent speech recognition software for Linux?
asked Jan 18, 2016 at 18:04; modified 17 days ago; viewed 80k times

The short version of the question: I am looking for a speech recognition software that runs on Linux and has decent accuracy and usability. Any license and price is fine. It should not be restricted to voice commands, as I want to be able to dictate text.
More details:

I have unsatisfyingly tried the following:
   CMU Sphinx
   CVoiceControl
   Ears
   Julius
   Kaldi (e.g., Kaldi GStreamer server)
   IBM ViaVoice (used to run on Linux but was discontinued years ago)
   NICO ANN Toolkit
   OpenMindSpeech
   RWTH ASR
   shout
   silvius (built on the Kaldi speech recognition toolkit)
   Simon Listens
   ViaVoice / Xvoice
   Wine + Dragon NaturallySpeaking + NatLink + dragonfly + damselfly
   https://github.com/DragonComputer/Dragonfire: only accepts voice commands

All the above-mentioned native Linux solutions have both poor accuracy and usability (or some don't allow free-text dictation but only voice commands). By poor accuracy, I mean an accuracy significantly below the one the speech recognition software I mentioned below for other platforms have. As for Wine + Dragon NaturallySpeaking, in my experience it keeps crashing, and I don't seem to be the only one to have such issues unfortunately.

On Microsoft Windows I use Dragon NaturallySpeaking, on Apple Mac OS X I use Apple Dictation and DragonDictate, on Android I use Google speech recognition, and on iOS I use the built-in Apple speech recognition.

Baidu Research released yesterday the code for its speech recognition library using Connectionist Temporal Classification implemented with Torch. Benchmarks from Gigaom are encouraging as shown in the table below, but I am not aware of any good wrapper around to make it usable without quite some coding (and a large training data set):

   System            Clean (94)   Noisy (82)   Combined (176)
   Apple Dictation      14.24        43.76          26.73
   Bing Speech          11.73        36.12          22.05
   Google API            6.64        30.47          16.72
   wit.ai                7.94        35.06          19.41
   Deep Speech           6.56        19.06          11.85

   Table 4: Results (%WER) for 3 systems evaluated on the original audio. All systems are scored only on the utterances with predictions given by all systems. The number in the parentheses next to each dataset, e.g. Clean (94), is the number of utterances scored.
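Note on reading the %WER figures above: word error rate is conventionally the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The sketch below is only an illustration of that standard definition; it is not the scoring tool used for the quoted benchmark, the function name `word_error_rate` is made up for this note, and it assumes transcripts are already normalized (case, punctuation) and split on whitespace.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference length.

    Illustrative helper (hypothetical name); assumes both transcripts are
    already normalized and that words are separated by whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the reference words consumed so far
    # and the first j hypothesis words (classic Levenshtein recurrence).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions are needed to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(round(word_error_rate("the cat sat on the mat", "the cat sit on mat"), 2))
```

Lower is better, so in the table a score of 6.56 means roughly 6 or 7 word errors per 100 reference words. Because insertions count as errors, WER can exceed 100%.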
There exist some very alpha open-source projects:
   https://github.com/mozilla/DeepSpeech (part of Mozilla's Vaani project: http://vaani.io (mirror))
   https://github.com/pannous/tensorflow-speech-recognition
   Vox, a system to control a Linux system using Dragon NaturallySpeaking: https://github.com/Franck-Dernoncourt/vox_linux + https://github.com/Franck-Dernoncourt/vox_windows
   https://github.com/facebookresearch/wav2letter
   https://github.com/espnet/espnet
   http://github.com/tensorflow/lingvo (to be released by Google, mentioned at Interspeech 2018)

I am also aware of this attempt at tracking the state of the art and recent results (bibliography) on speech recognition, as well as this benchmark of existing speech recognition APIs.

I am aware of Aenea, which allows speech recognition via Dragonfly on one computer to send events to another, but it has some latency cost.

I am also aware of these two talks exploring Linux options for speech recognition:
   2016 - The Eleventh HOPE: Coding by Voice with Open Source Speech Recognition (David Williams-King)
   2014 - Pycon: Using Python to Code by Voice (Tavis Rudd)

edited Dec 18, 2020 at 20:17 by Roman Riabenko; asked Jan 18, 2016 at 18:04 by Franck Dernoncourt

+-----+
https://www.voicetyper.com/
Windows only

# enddoc