#] *********************
#] "$d_SysMaint"'audio/5_google speech-to-text notes.txt'
# www.BillHowell.ca 06Jan2024 initial
# view in text editor, using constant-width font (eg courier), tabWidth = 3

#48************************************************48
#24************************24
# Table of Contents, generate with :
# $ grep "^#]" "$d_SysMaint"'audio/5_google text-to-speech notes.txt'
#

#24************************24
# Setup, ToDos,

#08********08
#] ??Jan2024

#08********08
#] ??Jan2024

#08********08
#] ??Jan2024

#08********08
#] 07Jan2024 find links to Google transcript setup

example links :

setup Google transcripts for large audio files (>1 minute) :
   Google calls it "Asynchronous Recognition"
   I've lost track of the round-about process to setup [[payment, setting], activate, bucket]
   https://cloud.google.com/speech-to-text/?hl=en
   https://console.cloud.google.com/storage/browser/howell/transcripts;tab=objects?project=prefab-range-410420&pageState=(%22StorageObjectListTable%22:(%22f%22:%22%255B%255D%22))&prefix=&forceOnObjectsSortingFiltering=false

use transcriptions after setup :
   https://console.cloud.google.com/speech/transcriptions/list?project=prefab-range-410420

#08********08
#] 06Jan2024 $ bash "$d_bin"'speech to text.sh'

20:32$ "$d_bin"'speech to text.sh'
/home/bill/web/bin/speech to text.sh: 26: source: not found
/home/bill/web/bin/speech to text.sh: 43: cannot create : Directory nonexistent
/home/bill/web/bin/speech to text.sh: 44: cannot create : Directory nonexistent
/home/bill/web/bin/speech to text.sh: 45: cannot create : Directory nonexistent
/home/bill/web/bin/speech to text.sh: 46: cannot create : Directory nonexistent
>> what?
environment variables OK
>> oops - was missing word 'bash' in command

20:33$ bash "$d_bin"'speech to text.sh'
pJson= /home/bill/web/ProjMini/Kaal- Structured Atom Model/vidAudios/transcripts_Edwin Kaal The Proton-Electron Atom.mp3.cut_transcript_65b68b98-0000-2f49-9bfb-582429cb079c.json
/home/bill/web/bin/speech to text.sh: line 43: : No such file or directory
/home/bill/web/bin/speech to text.sh: line 44: : No such file or directory
/home/bill/web/bin/speech to text.sh: line 45: : No such file or directory
/home/bill/web/bin/speech to text.sh: line 46: : No such file or directory

#08********08
#] 06Jun202 general audio recordings from browser vids, Edo Kaal

see "$d_bin"'video production/audio capture.sh'
   d_wav="$d_web"'ProjMini/Kaal- Structured Atom Model/vidAudios/'
   d_mp3="$d_wav"
   audio_record 'Edwin Kaal: The Proton-Electron Atom'

start simultaneously :
   video
   $ bash "$d_bin"'video production/audio capture.sh' -> browserVid
>> oops - didn't get back to stop in time, must cut

$ bash "$d_bin"'audio cut.sh'
audio_cut() {
   # no change in codec, -to is stop position (time) :
   d_audio="$d_web"'ProjMini/Kaal- Structured Atom Model/vidAudios/'
   pSource="$d_audio"'Edwin Kaal: The Proton-Electron Atom.mp3'
   pOutput="$pSource"'.cut.mp3'
   ffmpeg -i "$pSource" -ss 00:00:00 -to 00:41:25 "$pOutput"
}
>> works OK

"$d_SysMaint"'audio/5_google text-to-speech notes.txt'
   see 06Jan2024 try google speech-to-text on 'Edwin Kaal: The Proton-Electron Atom.mp3.cut.mp3'

#08********08
#] 06Jan2024 try google speech-to-text on 'Edwin Kaal: The Proton-Electron Atom.mp3.cut.mp3'

$ bash "$d_bin"'audio cut.sh'

# use 'https://cloud.google.com/try-speech-to-text' to produce text
https://cloud.google.com/speech-to-text/
uploaded "$d_web"'ProjMini/Kaal- Structured Atom Model/vidAudios/Edwin Kaal: The Proton-Electron Atom.mp3.cut.mp3'
>> error message :
   Edwin Kaal: The Proton-Electron Atom.mp3.cut.mp3: Your audio is longer than the 1 minute limit. Try a shorter file.

What!?!? can't do useful voice file!!! with simple approach

+-----+
https://cloud.google.com/speech-to-text/docs/speech-to-text-requests
Speech requests

Speech-to-Text has three main methods to perform speech recognition. These are listed below:

   Synchronous Recognition (REST and gRPC) sends audio data to the Speech-to-Text API, performs recognition on that data, and returns results after all audio has been processed. Synchronous recognition requests are limited to audio data of 1 minute or less in duration.

   Asynchronous Recognition (REST and gRPC) sends audio data to the Speech-to-Text API and initiates a Long Running Operation. Using this operation, you can periodically poll for recognition results. Use asynchronous requests for audio data of any duration up to 480 minutes.

   Streaming Recognition (gRPC only) performs recognition on audio data provided within a gRPC bi-directional stream. Streaming requests are designed for real-time recognition purposes, such as capturing live audio from a microphone. Streaming recognition provides interim results while audio is being captured, allowing result to appear, for example, while a user is still speaking.

Requests contain configuration parameters as well as audio data. The following sections describe these type of recognition requests, the responses they generate, and how to handle those responses in more detail.

I need "Asynchronous Recognition" - <= 480 minutes

search "google speech-to-text Asynchronous Recognition"
>> nyet

+-----+
https://cloud.google.com/speech-to-text/docs/speech-to-text-requests
All Speech-to-Text API synchronous recognition requests must include a speech recognition config field (of type RecognitionConfig). A RecognitionConfig contains the following sub-fields:

   encoding - (required) specifies the encoding scheme of the supplied audio (of type AudioEncoding). If you have a choice in codec, prefer a lossless encoding such as FLAC or LINEAR16 for best performance. (For more information, see Audio Encodings.)
   The encoding field is optional for FLAC and WAV files where the encoding is included in the file header.

   sampleRateHertz - (required) specifies the sample rate (in Hertz) of the supplied audio. (For more information on sample rates, see Sample Rates below.) The sampleRateHertz field is optional for FLAC and WAV files where the sample rate is included in the file header.

   languageCode - (required) contains the language + region/locale to use for speech recognition of the supplied audio. The language code must be a BCP-47 identifier. Note that language codes typically consist of primary language tags and secondary region subtags to indicate dialects (for example, 'en' for English and 'US' for the United States in the above example.) (For a list of supported languages, see Supported Languages.)

   maxAlternatives - (optional, defaults to 1) indicates the number of alternative transcriptions to provide in the response. By default, the Speech-to-Text API provides one primary transcription. If you wish to evaluate different alternatives, set maxAlternatives to a higher value. Note that Speech-to-Text will only return alternatives if the recognizer determines alternatives to be of sufficient quality; in general, alternatives are more appropriate for real-time requests requiring user feedback (for example, voice commands) and therefore are more suited for streaming recognition requests.

   profanityFilter - (optional) indicates whether to filter out profane words or phrases. Words filtered out will contain their first letter and asterisks for the remaining characters (e.g. f***). The profanity filter operates on single words, it does not detect abusive or offensive speech that is a phrase or a combination of words.

   speechContext - (optional) contains additional contextual information for processing this audio. A context contains the following sub-fields:
      boost - contains a value that assigns a weight to recognizing a given word or phrase.
      phrases - contains a list of words and phrases that provide hints to the speech recognition task. For more information, see the information on speech adaptation.

Audio is supplied to Speech-to-Text through the audio parameter of type RecognitionAudio. The audio field contains either of the following sub-fields:

   content contains the audio to evaluate, embedded within the request. See Embedding Audio Content below for more information. Audio passed directly within this field is limited to 1 minute in duration.

   uri contains a URI pointing to the audio content. The file must not be compressed (for example, gzip). Currently, this field must contain a Google Cloud Storage URI (of format gs://bucket-name/path_to_audio_file). See Passing Audio reference by a URI below.

More information on these request and response parameters appears below.

+-----+
https://cloud.google.com/speech-to-text/docs/async-recognize

+-----+
https://cloud.google.com/storage/docs/creating-buckets
Used : MyAccount, enabled billing
   Bucket name : BillHowell
   Name: howell
   Location: us (multiple regions in United States)
   Location type: Multi-region
   Set a default class: Standard
   Enforce public access prevention on this bucket
   Access control Uniform
   Choose how to protect object data
   Protection tools: None
   Data encryption : Google-managed encryption key
>> it was created
https://console.cloud.google.com/storage/browser/howell;tab=objects?project=prefab-range-410420&prefix=&forceOnObjectsSortingFiltering=false

Objects -> UpLoad files -> auto gives window "Uploads and My First Project operations"
   "$d_web"'ProjMini/Kaal- Structured Atom Model/vidAudios/Edwin Kaal: The Proton-Electron Atom.mp3.cut.mp3'
>> very long upload (39.8 MB) <10 minutes?

+-----+
https://cloud.google.com/speech-to-text/docs/transcribe-console
Audio configuration
1. Open the Speech-to-Text overview.
   https://console.cloud.google.com/speech
2. Click Create transcription.
   : I chose "Try this API"

+--+
https://console.cloud.google.com/storage/browser/howell;tab=objects?project=prefab-range-410420&prefix=&forceOnObjectsSortingFiltering=false
I clicked down-arrow beside filename :
https://ff6ea6f6a3fe32a1a35b7b40346cac236dd4808f2106e47dacb40dc-apidata.googleusercontent.com/download/storage/v1/b/howell/o/Edwin%20Kaal:%20The%20Proton-Electron%20Atom.mp3.cut.mp3?jk=AanfhSD1zev6enWgFjJ4mJwRl5NuZyhvdYg4nNz3yLS8rOqXoSezNKfAhPoiFsdPaCPvLXICkqw2swxUvfZvEULDYfmjyusyWwvHXBzKz7UeoWvZFd9HUEpR6qCjiOnEAg0Yf37jACVJOsU9Wh_wGGxle7AzuQKkCiW0lhZWN7sW_qN81zNKZN9DXB9fIqb32Zmj9Gp_iftySREPevDS_yeTiB4uOf1wiNMY4Y6ivnll7GYHu8IobJ0drqRfjMnze9SVmRlTTD2X3GJrbYysu53rvuDG1Ohl8qICub9p1G803VlMu1icutmbFNx2NZoyIFcLS9VQedi0UfTrz9MfnfoSI4XAByvaIc-hJQwnQsmmpG0wFetW1QS_IoaraNtXoAVIEu63c8811Hv2J3dy3t7mb7hlVnEXKDWV0rmZC31lb_xDtXLSs2VUCxl7Hdsi8GYoAtdtBlOxbu8oGVAMBRhV6LSzUIkZcCBlbjEIqGmclK8Vo-dyYeps6bjKJfeaxVg2pO8x8hYiHxxebu3s0bZ3BnZz618IUUjPMpmVGHoM5ZL9pweu4I9Bl4hm6jlj4x0lw58CUi4jwHMXHJrmR1bbYnRQxTH0n1HM6cqwifv43J5GvTX-QxQU_ETeQhXwNBBQRzYOPMFY8F_JJs45XgV9BlA2he1cU1J9XeGMgOY6mzF8glGt4_H0oaGG3ZoZbCFEHoCEeI6H4wEM5D9XhweaBR9XWSBQV-WW4Vs3_t_GOmMAlopM7mcEHKc1VMkdkiEsMv-B_G1xoiX6WEKbTiQ4lmvtZo-XjnTSj9vd1_PVifkoEMw01ZH7X5WBmwvPYZhR3oWxzybzmImlG3DvPSkX8ZSSC6-hDV7HlfUnlEJo825ElW_BFzQz4ASsa5ccNsPuM0xb-i98tzATyNSG7Z9C1uZtRmRco0SZUsCvRg2Y6eKbRGl1-S_hKTsHVqzp8u2w4DDFSSIhT6E4r9HYD79zxGcfkoFnPEg1JKOUcuLJucGeek5Yb5g3Gbn3OL3oP6kVmJC1bdmUmTFDSCEr34DKbkkcttoKETp_Nh-xO8ZnH-tivdGgFyQwIUsla_Eo4SjcucQQS1iW02QpY8_P_1ka7efBTFlgDl7ZlH3yx619imLW8mcGKH00FA&isca=1
>> audio is very choppy, many dead spots
>> I closed that window

+--+
try to find speech-to-text
https://console.cloud.google.com/speech/overview?project=prefab-range-410420
Advanced transcription, powered by Google’s AI
Accurately convert speech into text and find the best configuration for your audio
>> I activated my cloud
>> Enable API : done
>> Create Transcription: clicked button
>> Create a new workspace
   Selected "howell" folder
   click "Create"
   workspace = howell

+-----+
https://console.cloud.google.com/speech/transcriptions/list?project=prefab-range-410420
1. Audio Configuration
   Choose an audio file : Local upload
   Audio file : selected local "$d_web"'ProjMini/Kaal- Structured Atom Model/vidAudios/Edwin Kaal: The Proton-Electron Atom.mp3.cut.mp3'
      had to switch to cloud, then it accepted
   Enable separate recognition per channel : not selected
2. Transcription options
   API version: 1
   Spoken language: US English
   Transcription model: long
   region: global
   did not use advanced : [3 alternative spoken languages, Filter profanities, Automatic punctuation, etc]
3. Model adaptation
   did not turn this on : COULD be useful in future

14:54 clicked "Submit" or something
14:57 finished "Uploads and My First Project operations"
   transcript file still cycling - presumably transcribing
   should take 1/2 of voice file length 42:02 = 21 minutes
<15:18 seems to be finished?

where are transcripts stored?
hit "refresh" button on bucket details webPage (same as ran transcription)
got 3 new [dir, fil]s :
   audio-files/
   generated_workspace_file.json 2 KB text/plain Jan 6, 2024, 3:17:11 PM Standard Jan 6, 2024, 3:17:11 PM Not public
   transcripts/
click on "transcripts" folder
https://console.cloud.google.com/storage/browser/_details/howell/transcripts/Edwin%20Kaal:%20The%20Proton-Electron%20Atom.mp3.cut_transcript_65b68b98-0000-2f49-9bfb-582429cb079c.json?pageState=(%22StorageObjectListTable%22:(%22f%22:%22%255B%255D%22))&project=prefab-range-410420

what do I do with a json file?
click "download arrow"
save file, no choice, it wrote to
   /home/bill/Downloads/transcripts_Edwin Kaal The Proton-Electron Atom.mp3.cut_transcript_65b68b98-0000-2f49-9bfb-582429cb079c.json
nemo filMgr cut, paste to
   "$d_web"'ProjMini/Kaal- Structured Atom Model/vidAudios/transcripts_Edwin Kaal The Proton-Electron Atom.mp3.cut_transcript_65b68b98-0000-2f49-9bfb-582429cb079c.json'

json contents are way over the top!!
need a script to assemble transcript!!
try
$ cat "$d_web"'ProjMini/Kaal- Structured Atom Model/transcripts_Edwin Kaal The Proton-Electron Atom.mp3.cut_transcript_65b68b98-0000-2f49-9bfb-582429cb079c.json' | sed s'|\"word\":\"\(.*\)\"\,\"confidence\"|\1 |g' >"$d_web"'ProjMini/Kaal- Structured Atom Model/vidAudios/Edwin Kaal: The Proton-Electron Atom, transcript.txt'

https://unix.stackexchange.com/questions/121718/how-to-parse-json-with-shell-scripting-in-linux
How to parse JSON with shell scripting in Linux?
>> other editors, apps, not bash

cut into lines, then pick cherries:
$ cat "$d_web"'ProjMini/Kaal- Structured Atom Model/vidAudios/transcripts_Edwin Kaal The Proton-Electron Atom.mp3.cut_transcript_65b68b98-0000-2f49-9bfb-582429cb079c.json' | tr '{' \\n | sed s'|\"word\":\"\(.*\)\"\,\"confidence\"|\1 |g' >"$d_web"'ProjMini/Kaal- Structured Atom Model/Kaal: The Proton-Electron Atom, transcript.txt'

$ preCde='\"startOffset\":\"36.700s\"\,\"endOffset\":\"36.800s\"\,\"word\":\"'
$ pstCde='\"\,\"confidence\":0.94012266\}\,'
$ cat "$d_web"'ProjMini/Kaal- Structured Atom Model/vidAudios/transcripts_Edwin Kaal The Proton-Electron Atom.mp3.cut_transcript_65b68b98-0000-2f49-9bfb-582429cb079c.json' | tr '{' \\n | sed s"|$preCde\(.*\)$pstCde|\1 |g' >"$d_web"'ProjMini/Kaal- Structured Atom Model/Kaal: The Proton-Electron Atom, transcript.txt'
>> oops, missing an `"

$ cat "$d_web"'ProjMini/Kaal- Structured Atom Model/vidAudios/transcripts_Edwin Kaal The Proton-Electron Atom.mp3.cut_transcript_65b68b98-0000-2f49-9bfb-582429cb079c.json' | tr '{' \\n | sed s"|$preCde\(.*\)$pstCde|\1 |g" >"$d_web"'ProjMini/Kaal- Structured Atom Model/Kaal: The Proton-Electron Atom, transcript.txt'

$ cat "$d_web"'ProjMini/Kaal- Structured Atom Model/vidAudios/transcripts_Edwin Kaal The Proton-Electron Atom.mp3.cut_transcript_65b68b98-0000-2f49-9bfb-582429cb079c.json' | tr '{' \\n | sed s"|\(.*\)\"word\":\"\(.*\)\"\,\"confidence\(.*\)|\2 |g" >"$d_web"'ProjMini/Kaal- Structured Atom Model/Kaal: The Proton-Electron Atom, transcript.txt'
>> good, now make into lines of 20 words at a time, with timestamps?

# enddoc
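
possible next step for "lines of 20 words at a time, with timestamps" : a rough sketch only, not yet run on the real transcript. Assumptions : the json is compact (no spaces after colons) and each word entry looks like {"startOffset":"36.700s","endOffset":"36.800s","word":"...","confidence":0.94...}, as in the preCde/pstCde patterns above; GNU grep (for -o), sed, awk are available. Function names [transcript_words, transcript_lines] and the default of 20 words are made up for illustration. It matches [^"]* rather than .* so each match stops at the next quote, which avoids needing tr '{' \\n first. Word entries that lack a startOffset would be skipped by transcript_lines, and a word containing an escaped quote would be cut short.

```shell
#!/usr/bin/env bash
# Sketch only : pull words (and start times) out of a Speech-to-Text json transcript.
# Assumes compact json with word entries of the form :
#    {"startOffset":"36.700s","endOffset":"36.800s","word":"atom","confidence":0.94}
# Function names are hypothetical, not from any existing script.

# print one word per line
transcript_words() {
   grep -o '"word":"[^"]*"' "$1" | sed 's|^"word":"||; s|"$||'
}

# print nWords words per line (default 20), each line prefixed by
# the startOffset of its first word, eg "[36.700s] word word ..."
transcript_lines() {
   local pJson="$1" nWords="${2:-20}"
   grep -o '"startOffset":"[^"]*","endOffset":"[^"]*","word":"[^"]*"' "$pJson" |
   sed 's|^"startOffset":"\([^"]*\)","endOffset":"[^"]*","word":"\([^"]*\)"$|\1 \2|' |
   awk -v n="$nWords" '
      { time = $1; sub(/^[^ ]+ /, "") }     # split the timestamp off the word
      (NR - 1) % n == 0 { if (NR > 1) print ""; printf "[%s]", time }
      { printf " %s", $0 }
      END { if (NR > 0) print "" }'
}
```

usage would be something like : $ transcript_lines "$pJson" 20 >"$pTranscript" (pTranscript is a placeholder for the output path). If jq happens to be installed, it is probably a sturdier way to read the json than grep and sed.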