"$d_Qndfs""email analysis notes.txt" www.BillHowell.ca 12Jun2021 initial see also "$d_Qndfs""dictionaries/dictionary notes.txt" +-----+ ToDos 16Jun2021 -disrupted words! - simple stats on associations like Robert Hecht-Nielson, but damaged words but : higher-dimesions of context, calculus of wwods, fractional order claculus of words 18Jun2021 getHead_from_blankLines - this only chops up 1st line following [From:, To:, Cc:] ; getHead_from_blankLines - it appears that `, sometimes is the head separator, rather than semi-colon ; 02Jul2021 replace disclaimer with shorter note {|NIAID disclaimer|}, can re-substitute at a later stage 02Jul2021 Data [prep, clean] is vastly more [creative, challenging, dimensional] than linguistics. ALL of linguistics, including [sensory, brain] processes (of which Cognition and Consciousness are only a small part) is only a subset of the data prep challenge (recurrent definitions, undefined concepts and ,...). Perhaps arrogance, and more to the point, stupidity, prevent us seeing that context (or maybe it is just me). This perspective is reminiscent of theat distortion that "statistics explains all". 02Jul2021 First stage protect {|phrases, acronyms, parenthesized|} 02Jul2021 All normal words have capitalised version - more efficient like that 24************************24 08******08 29Jun2021 pDicAll use in pTxt_pDic_extract_pfrag_pFragSubs see "$d_Qndfs""dictionary notes.txt" +-----+ qnial> loaddefs link d_Qndfs 'emails - convert pdfCompilation to text.ndf' qnial> pEmails_doALL 08********08 18Jun2021 The problem seems to be that the initial sedExpr arent doing their jobs!! +-----+ infinite loops from +.....+ fullHeadL := 'From: ' 'Date: ' 'To: ' 'Cc: ' 'Subject: ' ; head := 'From: ' ; ... IF (OR ('From: ' 'To: ' 'Cc: ' EACHLEFT = head) ) ... 
IF ('Subject: ' = head) THEN out_header := l ; ENDIF ; +.....+ >> I had already checked [sed_messyStuff, sed_formatHeads, sed_orgAcronym] sed_messyStuff seemed OK sed_formatHeads not OK sed_orgAcronym notOK >> focus on sed_formatHeads for now, as part of sed_pdftotext NUTSSS!!!! pEmails_doALL change : +.....+ pPdf_convertTo_pTxt p_emailsPdf p_1stClean sed_messyStuff ; +.....+ To : +.....+ pPdf_convertTo_pTxt p_emailsPdf p_1stClean sed_pdftotext ; +.....+ qnial> pEmails_doALL -> pPdf_convertTo_pTxt : >> didn't work Added pattern (I somehow lost this, probably while checking) : sed_orgAcronym := '(NIH/NIAIO)' ';s/[({]N[1IJlTf]H\/N[I1JlTf]A[I1JlTf]O[D0][)}]/(NIH\/NIAIO)' qnial> pEmails_doALL -> pPdf_convertTo_pTxt : sed: -e expression #1, char 106: unknown option to `s' >> I removed sed_formatHeads, same error >> I removed sed_orgAcronym - works with [sed_messyStuff, sed_formatHeads] Somehow, I put an error into sed_orgAcronym I put back into sed_messyStuff 'strange sequences of punctuation' ';s/[ !:-_=~\.\"\*]\{5,}//' >> doesn't work, but leave it in for now sed_orgAcronym problem >> couldn't solve, leave it for later sed_formatHeads isn't removing spaces sed_formatHeads change, 18Jun2021 wouldn't work : +.....+ 'Date: split' ';s/D[ ]\?a[ ]\?t[ ]\?e[ ]\?:/Date: /I' 'Sent: split' ';s/S[ ]\?e[ ]\?n[ ]\?t[ ]\?:/Date: /I' 'From: split' ';s/F[ ]\?r[ ]\?o[ ]\?m[ ]\?:/From: /I' 'To: split' ';s/T[ ]\?o[ ]\?:/To: /I' 'Cc: split' ';s/C[ ]\?c[ ]\?:/Cc: /I' 'Subject: split' ';s/S[ ]\?u[ ]\?b[ ]\?j[ ]\?e[ ]\?c[ ]\?t[ ]\?:/Subject: /I' +.....+ To : +.....+ 'Date: split' ';s/Date:/Date: /I' 'Sent: split' ';s/Sent:/Date: /I' 'From: split' ';s/From:/From: /I' 'To: split' ';s/To:/To: /I' 'Cc: split' ';s/Cc:/Cc: /I' 'Subject: split' ';s/Subject:/Subject: /I' 'Date: split' ';s/Date :/Date: /I' 'Sent: split' ';s/Sent :/Date: /I' 'From: split' ';s/From :/From: /I' 'To: split' ';s/To :/To: /I' 'Cc: split' ';s/Cc :/Cc: /I' 'Subject: split' ';s/Subject :/Subject: /I' +.....+ >> nyet, maybe 
it's the I option? 18Jun2021 search "maximum length of a sed expression" https://www.unix.com/shell-programming-and-scripting/263413-sed-255-character-limitation.html sed 255 Character Limitation for : sed_pdftotext := link sed_messyStuff sed_formatHeads ; qnial> gage shape sed_pdftotext [,] 274 >> OKK, as I suspected I have to split the expressions! qnial> loaddefs link d_Qndfs 'email analysis - Fauci corona virus header.ndf' qnial> EACH (gage shape) sed_messyStuff sed_formatHeads sed_orgAcronym1 sed_orgAcronym2 sed_orgAcronym3 sed_orgAcronym4 qnial> EACH (gage shape) sed_messyStuff sed_formatHeads sed_orgAcronym1 sed_orgAcronym2 sed_orgAcronym3 sed_orgAcronym4 65 234 153 125 86 129 >> combine [sed_orgAcronym2, sed_orgAcronym3] & renumber +-----+ p_messy := link d_temp 'p_messy temp.txt' ; p_formatHeads := link d_temp 'p_formatHeads temp.txt' ; p_orgAcronym1 := link d_temp 'p_orgAcronym1 temp.txt' ; p_orgAcronym2 := link d_temp 'p_orgAcronym2 temp.txt' ; p_orgAcronym3 := link d_temp 'p_orgAcronym3 temp.txt' ; pEmails_doALL IS { NONLOCAL d_emails reference_Fauci title_contacts title_Subjects p_contacts p_emailsPdf p_fixBodys p_fixHeads p_labelDates p_pdftotext p_subjects sed_getContacts sed_dEmails sed_getSubject p_messyStuff p_formatHeads p_orgAcronym1 p_orgAcronym2 sed_messyStuff sed_formatHeads sed_orgAcronym1 sed_orgAcronym2 sed_orgAcronym3 % pPdf_convertTo_pTxt p_emailsPdf p_pdftotext ; % messyStuff p_pdftotext p_messyStuff sed_messyStuff ; % formatHeads p_messyStuff p_formatHeads sed_formatHeads ; orgAcronym1 p_formatHeads p_orgAcronym1 sed_orgAcronym1 ; orgAcronym2 p_orgAcronym1 p_orgAcronym2 p_orgAcronym2 ; orgAcronym3 p_orgAcronym2 p_orgAcronym3 sed_orgAcronym3 ; % pEmails_labelDates_pOut p_orgAcronym3 p_labelDates ; % pEmails_fixHeads_pout p_labelDates p_fixHeads ; % host link 'cp -p "' p_fixHeads '" "' p_emailsClean '"' ; +-----+ qnial> pEmails_doALL -> pPdf_convertTo_pTxt : -> p_messyStuff : sed: -e expression #1, char 65: Invalid range end -> 
p_formatHeads : ?formatHeads error, file unknown, one of :
/media/bill/ramdisk/p_messy temp.txt
/media/bill/ramdisk/p_pdftotext temp.txt
-> p_orgAcronym1 :
-> p_orgAcronym2 : sed: can't find label for jump to `emp.txt'
-> p_orgAcronym3 :

qnial> pEmails_doALL
-> p_messyStuff : sed: -e expression #1, char 65: Invalid range end
-> p_formatHeads : sed: -e expression #1, char 66: unknown option to `s'
-> p_orgAcronym1 :
-> p_orgAcronym2 : sed: can't find label for jump to `emp.txt'
-> p_orgAcronym3 :

qnial> 60 take sed_messyStuff
s/ //;s/<\(.*\)@/<@/;s/<@[ ]*\(.*\)>/<@\1>/;s/[!:-_=~\.\*]\{
qnial> 60 drop sed_messyStuff
5,}//
qnial> 60 drop sed_formatHeads
5,}//s/ //;s/<\(.*\)@/<@/

first, work on messyStuff
'strange sequences of punctuation' ';s/[ !:-_=~\.\"\*]\{5,\}//'
>> I was missing `\ in front of `}

qnial> pEmails_doALL
-> p_messyStuff : sed: -e expression #1, char 68: Invalid range end

qnial> 65 drop sed_messyStuff
\}//

'strange sequences of punctuation' ';s/!:-_=~\{5,\}//'
>> OK, works without [.,*]

qnial> pEmails_doALL
-> p_formatHeads : sed: -e expression #1, char 61: unknown option to `s'

qnial> 60 drop sed_formatHeads
s/ //;s/<\(.*\)@/<@/
>> this is in sed_messyStuff, not sed_formatHeads???
I had 'tble' instead of 'tbl' in several places
>> OK, now it runs

qnial> pEmails_doALL
-> p_orgAcronym1 : sed: -e expression #1, char 63: unknown option to `s'
-> p_orgAcronym2 : sed: can't find label for jump to `emp.txt'
-> p_orgAcronym3 :

qnial> 60 drop sed1_orgAcronym
>> it's empty???? Why? Maybe there is a significant chr length for QNial symbols?
'(NIH/NIAIO)' 's/[({]N[I1JlTf]H\/N[I1JlTf]A[I1JlTf]O[D0][)}]/(NIH\/NIAIO)'
had an extra space at the beginning of the sedExpr from the time they were unified!
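Note on the "unknown option to `s'" errors above: the `5,}//s/ //` seen in `60 drop sed_formatHeads` shows the end of one substitution running straight into the leading `s/ //` of the next with no `;` between them, and GNU sed would read that stray `s` as a flag of the previous command. That is only my reading of the transcript, not a confirmed diagnosis. A minimal sketch of one way to sidestep the problem, assuming GNU sed called from a POSIX shell: pass each substitution as its own -e argument, so nothing has to be glued together and the "expression #N" in an error message should point at the failing one. The function name fix_orgs is hypothetical; the two patterns are adapted from the sed_orgAcronym entries in these notes, with a `g` flag added.

```shell
#!/bin/sh
# Sketch only: fix_orgs is a hypothetical helper, not part of the .ndf files.
# Each substitution is its own -e argument, so no ';' has to be inserted
# between expressions when groups are combined.
fix_orgs() {
    sed \
        -e 's/[({]N[1IJlTf]H\/N[1IJlTf]A[1IJlTf]D[)}]/(NIH\/NIAID)/g' \
        -e 's/[({]N[1IJlTf]H\/CC\/[D0]LM[)}]/(NIH\/CC\/DLM)/g'
}

# Example input: OCR-damaged acronyms of the kind seen in the pdftotext output.
printf '%s\n' '{N1H/NIAlD)' '{NIH/CC/0LM)' | fix_orgs
```

The same idea carries over to the QNial side: keep the sedExprs as a list and emit one -e per item, rather than linking them into a single string.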
qnial> pEmails_doALL -> p_orgAcronym1 : sed: -e expression #1, char 62: unknown option to `s' -> p_orgAcronym2 : -> p_orgAcronym3 : time for a break +-----+ oe code 18Jun2021 search "maximum length of a sed expression" https://www.unix.com/shell-programming-and-scripting/263413-sed-255-character-limitation.html sed 255 Character Limitation for : sed_pdftotext := link sed_messyStuff sed_formatHeads ; qnial> gage shape sed_pdftotext [,] 274 >> OKK, as I suspected I have to split the expressions! IF flag_debug THEN write 'loading organization sed_pdftotext' ; ENDIF ; #] sed_pdftotext reformat, pdftotext % sed_pdftotext := link sed_messyStuff sed_formatHeads sed_orgAcronym ; % sed_pdftotext := link sed_messyStuff sed_orgAcronym ; sed_pdftotext := link sed_messyStuff sed_formatHeads ; 08********08 18Jun2021 doesn't work : getHead_from_blankLines IS OP finn fout +-----+ run a break trace +--+ WHILE (NOR ('Subject: ' = head) (isfault (line := readfile finn)) ) DO IF (NOR (str_isOf_chrSet line chrs_emailBlanks) (str_isBlank line) ) THEN IF (OR (fullHeadL EACHLEFT = line)) THEN IF flag_break THEN BREAK ; ENDIF ; head := line ; ELSE IF (OR ('Date: ' 'To: ' 'Cc: ' EACHLEFT = head) ) THEN lineL := link lineL (EACHRIGHT link head (line str_cutBy_chr `;)) ; IF flag_break THEN BREAK ; ENDIF ; IF (isstring lineL) THEN lineL := link lineL [link head line] ; ENDIF ; ELSE lineL := [link head line] ; ENDIF ; ENDIF ; ENDIF ; ENDWHILE ; +--+ qnial> pEmails_doALL -> pEmails_fixHeads_pout : >> conditions not even triggered +--+ IF flag_break THEN BREAK ; ENDIF ; WHILE (NOR ('Subject: ' = head) (isfault (line := readfile finn)) ) DO IF (NOR (str_isOf_chrSet line chrs_emailBlanks) (str_isBlank line) ) THEN IF (OR (fullHeadL EACHLEFT = line)) THEN IF flag_break THEN BREAK ; ENDIF ; head := line ; ELSE IF (OR ('Date: ' 'To: ' 'Cc: ' EACHLEFT = head) ) THEN lineL := link lineL (EACHRIGHT link head (line str_cutBy_chr `;)) ; IF flag_break THEN BREAK ; ENDIF ; IF (isstring lineL) THEN lineL := 
link lineL [link head line] ; ENDIF ; ELSE lineL := [link head line] ; ENDIF ; ENDIF ; ENDIF ; ENDWHILE ; +--+ qnial> pEmails_doALL -> pEmails_fixHeads_pout : >> conditions still not even triggered problem is in pEmails_fixHeads_pout +--+ WHILE (NOT isfault (line := readfile finn)) DO IF flag_break THEN BREAK ; ENDIF ; IF (fromStd = (9 take line)) THEN IF (str_isOf_chrSet (9 drop line) chrs_emailBlanks) THEN IF (line = fromStd) THEN getHead_from_blankLines finn fout ; ELSE writefile fout line ; ENDIF ; ENDIF ; ELSE writefile fout line ; ENDIF ; ENDWHILE ; +--+ >> problem : getHead_from_blankLines change : +.....+ IF (OR ('Date: ' 'To: ' 'Cc: ' EACHLEFT = head) ) +.....+ To : +.....+ IF (OR ('From: ' 'To: ' 'Cc: ' EACHLEFT = head) ) +.....+ Also put in out_header check if line for Subject: has been used +--+ out_header := o ; % ; IF flag_break THEN BREAK ; ENDIF ; WHILE (NOR out_header (isfault (line := readfile finn)) ) DO IF (NOR (str_isOf_chrSet line chrs_emailBlanks) (str_isBlank line) ) THEN IF (OR (fullHeadL EACHLEFT = line)) THEN head := line ; ELSE % catenate 2nd+ addressees ; IF (OR ('From: ' 'To: ' 'Cc: ' EACHLEFT = head) ) THEN lineL := link lineL (head EACHRIGHT link (line str_cutBy_chr chr_semicolon)) ; ELSE lineL := link lineL [head link line] ; ENDIF ; ENDIF ; IF ('Subject: ' = head) THEN out_header := l ; ENDIF ; ENDIF ; ENDWHILE ; fout EACHRIGHT writefile ' ' ' ' '{|:********:|}' [lineL] ' ' ; +--+ qnial> pEmails_doALL -> pEmails_fixHeads_pout : >> getHead_from_blankLines is STILL not called?!?!? Oops pEmails_fixHeads_pout change : +.....+ IF (str_isOf_chrSet (9 drop line) chrs_emailBlanks) THEN +.....+ To : +.....+ IF ('From: ' = line) THEN +.....+ Hah! my own joke is on me! 
From: originals have not yet been put into stdForm pEmails_fixHeads_pout change : +.....+ IF ('From: ' = line) THEN getHead_from_blankLines finn fout ; +.....+ To : +.....+ IF ('From:' = line) THEN getHead_from_blankLines finn fout ; +.....+ qnial> pEmails_doALL -> pEmails_fixHeads_pout : ^C >> nuts, infinite loop or something... getHead_from_blankLines change out_header : +.....+ out_header := o ; % ; IF flag_break THEN BREAK ; ENDIF ; WHILE (NOR out_header (isfault (line := readfile finn)) ) DO IF (NOR (str_isOf_chrSet line chrs_emailBlanks) (str_isBlank line) ) THEN IF (OR (fullHeadL EACHLEFT = line)) THEN head := line ; ELSE % catenate 2nd+ addressees ; IF (OR ('From: ' 'To: ' 'Cc: ' EACHLEFT = head) ) THEN lineL := link lineL (head EACHRIGHT link (line str_cutBy_chr chr_semicolon)) ; ELSE lineL := link lineL [head link line] ; ENDIF ; ENDIF ; IF ('Subject: ' = head) THEN out_header := l ; ENDIF ; ENDIF ; ENDWHILE ; +.....+ To : +.....+ out_header := o ; % ; IF flag_break THEN BREAK ; ENDIF ; WHILE (NOR out_header (isfault (line := readfile finn)) ) DO IF (NOR (str_isOf_chrSet line chrs_emailBlanks) (str_isBlank line) ) THEN IF (OR (fullHeadL EACHLEFT = line)) THEN head := line ; ELSE % catenate 2nd+ addressees ; IF (OR ('From: ' 'To: ' 'Cc: ' EACHLEFT = head) ) THEN lineL := link lineL (head EACHRIGHT link (line str_cutBy_chr chr_semicolon)) ; ELSE lineL := link lineL [head link line] ; ENDIF ; ENDIF ; IF ('Subject: ' = head) THEN out_header := l ; ENDIF ; ENDIF ; ENDWHILE ; +.....+ >> I forgot that ALL headers are without spaces in raw file : getHead_from_blankLines change : +.....+ fullHeadL := 'From: ' 'Date: ' 'To: ' 'Cc: ' 'Subject: ' ; head := 'From:' ; IF (OR ('From: ' 'To: ' 'Cc: ' EACHLEFT = head) ) IF ('Subject: ' = head) THEN out_header := l ; ENDIF ; +.....+ To : +.....+ fullHeadL := 'From:' 'Date:' 'To:' 'Cc:' 'Subject:' ; head := 'From:' ; IF (OR ('From:' 'To:' 'Cc:' EACHLEFT = head) ) IF ('Subject:' = head) THEN out_header := l ; ENDIF ; +.....+ 
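Since the raw heads arrive as 'From:' 'To:' 'Cc:' etc. without the trailing space (and sometimes split by stray spaces), the QNial comparisons against 'From: ' can only succeed after a normalisation pass. A minimal sketch of such a pass, assuming POSIX sed: it is anchored to begin-of-line so that text in the middle of a body line is left alone, it tolerates spaces between the letters and before the colon, and it maps Sent: to Date: as sed_formatHeads does. The name normalize_heads is hypothetical, and the GNU-only `I` flag is left out, so this version is case-sensitive.

```shell
#!/bin/sh
# Sketch only: normalize_heads is a hypothetical stand-in for sed_formatHeads.
# Every rule is anchored with ^ so only heads at the start of a line change.
normalize_heads() {
    sed \
        -e 's/^[ ]*F[ ]*r[ ]*o[ ]*m[ ]*:[ ]*/From: /' \
        -e 's/^[ ]*S[ ]*e[ ]*n[ ]*t[ ]*:[ ]*/Date: /' \
        -e 's/^[ ]*D[ ]*a[ ]*t[ ]*e[ ]*:[ ]*/Date: /' \
        -e 's/^[ ]*T[ ]*o[ ]*:[ ]*/To: /' \
        -e 's/^[ ]*C[ ]*c[ ]*:[ ]*/Cc: /' \
        -e 's/^[ ]*S[ ]*u[ ]*b[ ]*j[ ]*e[ ]*c[ ]*t[ ]*:[ ]*/Subject: /'
}

# Example: two damaged heads and one body line that must stay untouched.
printf '%s\n' 'Se nt : Thursday, March 5, 2020 9:53 AM' \
              'Cc:Selgrade, Sara (NIH/NIAID) [E]' \
              'please see the To: line' | normalize_heads
```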
>> still nothing!!?? Wow! I totally suck at programming. SHHITE! WRONG! These were correct BUT sed_formatHeads failed! fullHeadL := 'From: ' 'Date: ' 'To: ' 'Cc: ' 'Subject: ' ; head := 'From:' ; IF (OR ('From: ' 'To: ' 'Cc: ' EACHLEFT = head) ) IF ('Subject: ' = head) THEN out_header := l ; ENDIF ; >> go back to pPdf_convertTo_pTxt did ANY of the sedExprs work? chrs_emailBlanks := link ' ;.!:-_=~"[]E' chr_apo chr_tab ; >> I though that I had gotten rid of ["'.]? probably only in getHead_from_blankLines? But chrs_emailBlanks is ONLY used by QNial, not by sed. Leave as is for now. Doesn't look like the the sedExprs worked. +--+ sed_messyStuff : search Example OK 's/ //' OK ';s/<\(.*\)@/<@/' <.+@ nothing OK ';s/<@[ ]*\(.*\)>/<@\1>/' <@.+> <@cepi.net>, <@gl oba ldownsynd rome.org> looks good, but only removes a single space after `@ sed_formatHeads ';s/D[ ]*a[ ]*t[ ]*e[ ]*:/Date: /I' OK 'D[ ]\+a[ ]+t[ ]\+e[ ]\+:' nothing OK 'D ate:' nothing (case sensitive search) sed_orgAcronym ';s/[({]N[1IJlTf]H\/CC\/[D0]LM[)}]/(NIH\/CC\/DLM)/' NO '[({]N[1IJlTf]H\/CC\/[D0]LM[)}]' {NIH/CC/DLM) +--+ >> so it appears that all but sed_orgAcronym worked sed_xtraHeads := 'extra Date:' ' s/|\(.+\)\{5,\}Date: /\1\nDate: /I' 'extra From:' ';s/|\(.+\)\{5,\}From: /\1\nFrom: /I' 'extra To: ' ';s/|\(.+\)\{5,\}To: /\1\nTo: /I' 'extra Cc: ' ';s/|\(.+\)\{5,\}Cc: /\1\nCc: /I' 'extra Subject:' ';s/|\(.+\)\{5,\}Subject: /\1\nSubject: /I' >> these make no sense - AH HAH for leading spaces in lines?? nyet, these are retained sed_formatHeads is NOT specific to begin-of-line dangerous deletion of legitimate content preceding heads!!! >> get rid of it pEmails_fixHeads_pout is even being run? >> failof args? oops - forgot to remove optr change : +.....+ pEmails_fixHeads_pout IS OP pinn pout sedExpr ... 
IF (NOT AND (EACH path_exists ("p_old pout) ("p_new pout))) THEN EACH write '?pEmails_fixHeads_pout error, file unknown, one of : ' pout pout '' ; ELSE host link 'sed "' sedExpr '" "' pout '" >"' pout '" ' ; ENDIF ; +.....+ To : +.....+ pEmails_fixHeads_pout IS OP pinn pout +.....+ qnial> pEmails_doALL -> pEmails_fixHeads_pout : ------------------------------------------------------------- Break debug loop: enter debug commands, expressions or type: resume to exit debug loop executes the indicated debug command current call stack : pemails_doall pemails_fixheads_pout ------------------------------------------------------------- -->[stepv] foff o -->[stepv] resume ^C >> it went into pemails_fixheads_pout but not getHead_from_blankLines Why? Hmm getHead_from_blankLines change : +.....+ IF ('Subject:' = head) THEN out_header := l ; ENDIF ; +.....+ To : +.....+ IF ('Subject:' = head) THEN out_header := l ; ENDIF ; +.....+ qnial> pEmails_doALL -> pEmails_fixHeads_pout : ------------------------------------------------------------- Break debug loop: enter debug commands, expressions or type: resume to exit debug loop executes the indicated debug command current call stack : pemails_doall pemails_fixheads_pout ------------------------------------------------------------- -->[stepv] foff o -->[stepv] resume ^C >> sstoesn't go into getHead_from_blankLines oops - I needed to pEmails_fixHeads_pout change : +.....+ IF ('Subject:' = head) THEN out_header := l ; ENDIF ; +.....+ To : +.....+ IF ('Subject:' = head) THEN out_header := l ; ENDIF ; +.....+ <|:*********14:10 pEmails_doALL >> still won't go into getHead_from_blankLines optr change : +.....+ fullHeadL := 'From: ' 'Date: ' 'To: ' 'Cc: ' 'Subject: ' ; head := 'From:' ; ... IF (OR ('From: ' 'To: ' 'Cc: ' EACHLEFT = head) ) ... IF ('Subject: ' = head) THEN out_header := l ; ENDIF ; +.....+ To : +.....+ fullHeadL := 'From:' 'Date:' 'To:' 'Cc:' 'Subject:' ; head := 'From:' ; ... 
IF (OR ('From:' 'To:' 'Cc:' EACHLEFT = head) ) ... IF ('Subject:' = head) THEN out_header := l ; ENDIF ; +.....+ > I thought that I had fixed 'Subject: '??? >> pEmails_fixHeads_pout temp.txt now 5.9Mb, so it may haorked? *********:|> pEmails_fixHeads_pout temp.txt : +--+ From: Haskins, Melinda (NIH/NIAID) [El Sent: Thursday, March 5, 2020 9:53 AM (b)( ------~= > To: Fauci, Anthony (NIH/NIAID) [E] Cc:Selgrade, Sara (NIH/NIAID) [E] ; Crawford, Chase (NIH/NIAID) [E] (b)(6)>; Conrad, Patricia (NIH/NIAID) [E] (b)(6) Subject: Please review: House Oversight Letter on Coronavirus Diagnost ics +--+ >> spaces are not put in >> OK - at present, headLines are not treated +--+ Se nt : Thursday, March 5, 2020 9:53 AM +--+ >> sed_formatHeads is not fixing heads NUTS!!! reverse all changes <|:*********14:10 ... *********:|> >> it's because I wasn't running [pPdf_convertTo_pTxt, pEmails_labelDates_pOut] The problem seems to be that the initial sedExpr arent doing their jobs!! +-----+ loaddefs link d_Qndfs 'emails - generic optrs.ndf' getHead_from_blankLines optr change : +.....+ +.....+ To : +.....+ +.....+ pEmails_doALL : { NONLOCAL p_emailsClean p_emailsPdf p_fixBodys p_fixHeads p_labelDates p_pdftotext d_emails p_contacts p_subjects sed_dEmails sed_getContacts sed_getSubject sed_messyStuff sed_xtraHeads reference_Fauci title_contacts title_Subjects ; pPdf_convertTo_pTxt p_emailsPdf p_1stClean sed_messyStuff ; pEmails_labelDates_pOut p_1stClean p_labelDates ; pEmails_fixHeads_pout p_labelDates p_fixHeads sed_xtraHeads ; host link 'cp -p "' p_fixHeads '" "' p_emailsClean '"' ; pEmails_get_pContacts p_emailsClean p_contacts sed_getContacts title_contacts reference_Fauci ; pEmails_get_pSubjects p_emailsClean p_subjects sed_getSubject title_Subjects reference_Fauci ; +-----+ olde code IF flag_debug THEN write 'loading sed_xtraHeads' ; ENDIF ; #] sed_xtraHeads - newli\n) for "internal" [Date, To, From, Cc, Subject], pEmails_fixHeads_pout sed_xtraHeads := 'extra Date:' ' 
s/|\(.+\)\{5,\}Date: /\1\nDate: /I' 'extra From:' ';s/|\(.+\)\{5,\}From: /\1\nFrom: /I' 'extra To: ' ';s/|\(.+\)\{5,\}To: /\1\nTo: /I' 'extra Cc: ' ';s/|\(.+\)\{5,\}Cc: /\1\nCc: /I' 'extra Subject:' ';s/|\(.+\)\{5,\}Subject: /\1\nSubject: /I' ; n_cols := 2 ; n_rows := (floor ((gage shape sed_xtraHeads) / 2)) ; sed_xtraHeads := link second cols (n_rows n_cols reshape sed_xtraHeads) ; 08********08 17Jun2021 p_emailsClean not generated +-----+ getHead_from_blankLines : I now want the head to be linked to each "extra" line found - hard to do with anyhead possibly having multiple lines - only Subject: force write of remining intervening lines - but mixups occur in [To, Cc] as well just it! #] +-----+ #] getHead_from_blankLines change : #] +.....+ getHead_from_blankLines IS OP finn fout { LOCAL head headL lineL testHeadL ; NONLOCAL chrs_emailBlanks fullHeadL ; % ; testHeadL := 'Date: ' 'To: ' 'Cc: ' 'Subject: ' ; headL := ['From: '] ; head := 'From: ' ; lineL := null ; % ; WHILE (NOR ('<:done:>' = testHeadL) (isfault (line := readfile finn)) ) DO IF (NOR (str_isOf_chrSet line chrs_emailBlanks) (str_isBlank line) ) THEN IF (OR (in_list := (testHeadL EACHLEFT subStr_in_str line))) THEN % head will be used below for Subject: condition ; head := in_list sublist testHeadL ; headL := link headL head ; testHeadL := (NOT in_list) sublist testHeadL ; ELSE IF (null = headL) THEN i_lastLine := (gage shape lineL) - 1 ; lineL@i_lastLine := link lineL@i_lastLine line ; ELSE lineL := link lineL [line] ; line lineL := [first, rest] lineL ; head headL := [first, rest] headL ; ENDIF ; IF (AND ('Subject: ' EACHRIGHT (NOT in) testHeadL headL)) THEN testHeadL := '<:done:>' ; ENDIF ; ENDIF ; ENDIF ; IF (AND ('Subject: ' EACHRIGHT (NOT in) testHeadL headL)) THEN testHeadL := '<:done:>' ; ENDIF ; ENDIF ; ENDIF ; ENDWHILE ; % ; headLine := headL EACHBOTH link lineL ; % note that these file handles are NOT closed - they will still be used by pEmails_fixHeads_pout, ; % which will call this optr 
getHead_from_blankLines many times ;
% EACH write headLine ; % write ' ' ;
fout EACHRIGHT writefile headLine ;
fout writefile ' ' ;
}
#] +.....+
#] To :
#] +.....+
getHead_from_blankLines IS OP finn fout
{ LOCAL head lineL ;
  NONLOCAL chrs_emailBlanks fullHeadL ;
  % ;
  % 16Jul2021 this no longer tracks "depletion" of testHeadL ;
  head := 'From: ' ;
  lineL := null ;
  % ;
  WHILE (NOR ('Subject: ' = head) (isfault (line := readfile finn)) ) DO
    IF (NOR (str_isOf_chrSet line chrs_emailBlanks) (str_isBlank line) ) THEN
      IF (OR (in_list := (testHeadL EACHLEFT subStr_in_str line))) THEN
        head := line ;
      ELSE
        IF (OR ('Date: ' 'To: ' 'Cc: ' EACHLEFT = head) ) THEN
          lineL := head EACHRIGHT link (line str_cutBy_chr `;) ;
          IF (isstring lineL) THEN lineL := [LineL] ; ENDIF ;
        ELSE
          lineL := [link head line] ;
        ENDIF ;
        % ;
        % EACH write [lineL] ; % write ' ' ;
        fout EACHRIGHT writefile [lineL] ;
        fout writefile ' ' ;
        lineL := null ;
        % ;
        % [Subject: will only capture one non-[blank, messy] line ;
      ENDIF ;
    ENDIF ;
  ENDWHILE ;
}
#] +.....+

>> MUCH simpler now, but will it work?
>> watch out for
"missing Subject:" massive numbers of Subject:-labelled lines! I may have to put in a two-Subject:-line limit or something
double-headers due to multiple-heads-in-a-line, which may cause "switch-back" to an older head
this will be easy to see, maybe easy to fix if it's a problem

qnial> pEmails_doALL
-> pPdf_convertTo_pTxt :
-> pEmails_fixHeads_pout : sed: -e expression #1, char 41: Unmatched \{
>> oops

qnial> pEmails_doALL
-> pPdf_convertTo_pTxt :
-> pEmails_fixHeads_pout :
>> pEmails_labelDates_pOut temp.txt - now 12.3 Mb, up from 4Mb !?? I must have been losing a lot!
>> pEmails_fixHeads_pout temp.txt no content, complete failure

moved to end of getHead_from_blankLines
% EACH write [lineL] ; % write ' ' ;
fout EACHRIGHT writefile [lineL] ;
fout writefile ' ' ;
lineL := null ;

qnial> pEmails_doALL
-> pEmails_fixHeads_pout :
>> still nothing!
might be : sed_xtraHeads 'extra Date:' ' s/|\(.*\)\{5,\}Date: /\1\nDate: /I' this allows NO characters in \(.*\)? how about \(.+\)? >> didn't help (str_isOf_chrSet line chrs_emailBlanks) chrs_emailBlanks := link ' ;.!:-_=~"[]E' chr_apo chr_tab ; >I can't see a problem here 08********08 17Jun2021 sedExpr - proved to be a problem, was missing link +-----+ qnial> pEmails_doALL +--+ -> pPdf_convertTo_pTxt : -> pEmails_labelDates_pOut : ?pdf_convertToCleanTxt_path error, file unknown, one of : /media/bill/ramdisk/210609 Leopold - Anthony Fauci emails, NIH Freedom Of Information Act, 1st clean.txt /media/bill/ramdisk/pEmails_labelDates_pOut temp.txt -> pEmails_fixHeads_pout : -> pEmails_get_pContacts : -> pEmails_get_pSubjects : +--+ >> path problems as usual, sigh,,,, Again, qnial> pEmails_doALL -> pPdf_convertTo_pTxt : >> still have mistakes in paths!!! OK - paths should work, but sed_messyStuff doesn't Why?? un-backslashed [."], sed_messyStuff I added backslashes : 'strange sequences of punctuation' ';s/[ \.!:-_=~\"\t]\{5,}//' >> still doesn't work, sed_messyStuff double-backslash \\t : 'strange sequences of punctuation' ';s/[ \.!:-_=~\"\\t]\{5,}//' >> still doesn't work, sed_messyStuff remove space!, also - forgot to backslash `. in sedExpr! 
: 'strange sequences of punctuation' ';s/[\.!:-_=~\"\\t]\{5,}//' >> still doesn't work, sed_messyStuff get rid of leading space in first sedExpr 'annoying formfeed' 's/ //' >> still doesn't work, sed_messyStuff get rid of tab \\t 'strange sequences of punctuation' ';s/[\.!:-_=~\"]\{5,}//' >> still doesn't work, sed_messyStuff get rid of [\.,\"] 'strange sequences of punctuation' ';s/!:-_=~]\{5,}//' I'm stumped, sed_messyStuff get rid of entire line : 'strange sequences of punctuation' ';s/!:-_=~]\{5,}//' >> still no good - just play with sedExpr $ sed 's/ //;s/<\(.*\)@/<@/;s/<@[ ]*\(.*\)>/<@\1>/' "$p_pdftotext" >"$p_1stClean" $ sed 's///;s/<\(.*\)@/<@/;s/<@[ ]*\(.*\)>/<@\1>/' "$p_pdftotext" >"$p_1stClean" bash: : No such file or directory >> of course, these are QNial variables! qnial> p_pdftotext /media/bill/ramdisk/pdftotext temp.txt qnial> p_1stClean /media/bill/ramdisk/210609 Leopold - Anthony Fauci emails, NIH Freedom Of Information Act, 1st clean.txt >> big OOPS, should be in d_Fauci! 
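Note on the 'strange sequences of punctuation' bracket expression tried in several forms above: inside `[...]` the sequence `:-_` is a range, not three literal characters. In the C locale that range runs from `:` to `_` in ASCII and so includes the uppercase letters, and in other locales it may be what triggers an "Invalid range end" from sed (a guess, I have not confirmed the locale in use). A backslash is also not an escape inside a POSIX bracket expression. A minimal sketch, assuming POSIX sed, that keeps the hyphen literal by placing it last in the set; the name strip_punct_runs is hypothetical and the `g` flag is my addition.

```shell
#!/bin/sh
# Sketch only: strip_punct_runs is a hypothetical stand-in for the
# 'strange sequences of punctuation' entry of sed_messyStuff.
# The hyphen is placed last in the bracket expression so it is a literal
# character, and '.', '*' and '"' need no backslash inside the brackets.
strip_punct_runs() {
    sed 's/[ !:_=~.*"-]\{5,\}//g'
}

# Example: a redaction-line remnant similar to those in the email heads.
printf '%s\n' 'abc ------~= > def' 'NIAID stays' | strip_punct_runs
```

By the same reading, a form with the brackets removed, such as `s/!:-_=~\{5,\}//`, would match only the literal text `!:-_=` followed by five or more `~`, so it would run without error but probably remove nothing.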
comment-out to just do sed : % host link 'pdftotext "' pinn '" "' p_pdftotext '"' ; host link 'sed "' sedExpr '" "' p_pdftotext '" >"' pout '" ' ; fix p_1stClean := link d_Fauci f_1stClean ; try append instead of write : host link 'sed "' sedExpr '" "' p_pdftotext '" >>"' pout '" ' ; $ sed 's/ //;s/<\(.*\)@/<@/;s/<@[ ]*\(.*\)>/<@\1>/' '/media/bill/ramdisk/pdftotext temp.txt' >"$d_webRawe"'Pandemics, health, and the Sun/corona virus/Fauci covid emails/210609 Leopold - Anthony Fauci emails, NIH Freedom Of Information Act, 1st clean.txt' sed: -e expression #1, char 0: no previous regular expression $ sed "s/ //;s/<\(.*\)@/<@/;s/<@[ ]*\(.*\)>/<@\1>/" '/media/bill/ramdisk/pdftotext temp.txt' >"$d_webRawe"'Pandemics, health, and the Sun/corona virus/Fauci covid emails/210609 Leopold - Anthony Fauci emails, NIH Freedom Of Information Act, 1st clean.txt' sed: -e expression #1, char 0: no previous regular expression $ sed "s/ //;s/<\(.*\)@/<@/;s/<@[ ]*\(.*\)>/<@\1>/" '/media/bill/ramdisk/pdftotext temp.txt' >>"$d_webRawe"'Pandemics, health, and the Sun/corona virus/Fauci covid emails/210609 Leopold - Anthony Fauci emails, NIH Freedom Of Information Act, 1st clean.txt' sed: -e expression #1, char 0: no previous regular expression Try cat $ cat '/media/bill/ramdisk/pdftotext temp.txt' | sed "s/ //;s/<\(.*\)@/<@/;s/<@[ ]*\(.*\)>/<@\1>/" >>"$d_webRawe"'Pandemics, health, and the Sun/corona virus/Fauci covid emails/210609 Leopold - Anthony Fauci emails, NIH Freedom Of Information Act, 1st clean.txt' sed: -e expression #1, char 0: no previous regular expression $ cat '/media/bill/ramdisk/pdftotext temp.txt' | sed "s/ //" >>"$d_webRawe"'Pandemics, health, and the Sun/corona virus/Fauci covid emails/210609 Leopold - Anthony Fauci emails, NIH Freedom Of Information Act, 1st clean.txt' sed: -e expression #1, char 0: no previous regular expression >> why isn't it reading the sedExpr???? NUTS!!! - don't use `/ as sed separator!!! IDIOT! 
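Note on "char 0: no previous regular expression": this is what GNU sed reports when the regex of an `s` command is empty and no earlier regex exists, i.e. for `s///`. That fits the 'annoying formfeed' expression if the formfeed character is lost when the command is pasted into the terminal (an assumption on my part), and it would explain why changing the separator made no difference. A minimal sketch, assuming a POSIX shell: build the formfeed with printf so the expression never depends on an invisible character surviving copy-and-paste. The names FF and strip_formfeed are hypothetical.

```shell
#!/bin/sh
# Sketch only: FF and strip_formfeed are hypothetical names.
# printf '\f' produces the formfeed byte explicitly; command substitution
# strips only trailing newlines, so the byte is preserved in FF.
FF=$(printf '\f')

strip_formfeed() {
    sed "s/${FF}//g"
}

# Example: pdftotext separates pages with a formfeed.
printf 'page one\fpage two\n' | strip_formfeed
```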
$ sed 's| ||;s|<\(.*\)@|<@|;s|<@[ ]*\(.*\)>|<@\1>|' '/media/bill/ramdisk/pdftotext temp.txt' >"$d_webRawe"'Pandemics, health, and the Sun/corona virus/Fauci covid emails/210609 Leopold - Anthony Fauci emails, NIH Freedom Of Information Act, 1st clean.txt' sed: -e expression #1, char 0: no previous regular expression >> nyet - there aren't any paths in sedExpr! $ sed 's/ //;s/<\(.*\)@/<@/;s/<@[ ]*\(.*\)>/<@\1>/' '/media/bill/ramdisk/pdftotext temp.txt' >"$d_webRawe"'Pandemics, health, and the Sun/corona virus/Fauci covid emails/210609 Leopold - Anthony Fauci emails, NIH Freedom Of Information Act, 1st clean.txt' take a break... I'm flummoxed missing apo? (I may only just introducd it now?) : 'get rid of spaces in emlAddr(no good!)' ;s/<@[ ]*\(.*\)>/<@\1>/' loading organization sed_pdftotext ?undefined identifier: := LINK SED_FORMAT1 <***> SED_FORMATHEADS SED_ORGACRONYM ; >> I only just noticed this qnial> pEmails_doALL -> pPdf_convertTo_pTxt : it still didn't do pPdf_convertTo_pTxt p_emailsPdf p_1stClean sed_messyStuff ; >> why?? what did I corrupt? at least the pdftotext was running, but not now! qnial> p_1stClean /media/bill/Dell2/Website - raw/Pandemics, health, and the Sun/corona virus/Fauci covid emails/210609 Leopold - Anthony Fauci emails, NIH Freedom Of Information Act, 1st clean.txt >> IDIOT! 
I had commented that out : % host link 'pdftotext "' pinn '" "' p_pdftotext '"' ; -->[nextv] post pinn pout sedExpr +------------------------------------------------------------------------------------------------------------- |/media/bill/Dell2/Website - raw/Pandemics, health, and the Sun/corona virus/Fauci covid emails/210609 Leopold +------------------------------------------------------------------------------------------------------------- |/media/bill/Dell2/Website - raw/Pandemics, health, and the Sun/corona virus/Fauci covid emails/210609 Leopold +------------------------------------------------------------------------------------------------------------- |+-----+---------------+-----------------------+ ||s/ //|;s/<\(.*\)@/<@/|;s/<@[ ]*\(.*\)>/<@\1>/| |+-----+---------------+-----------------------+ +------------------------------------------------------------------------------------------------------------- >> dumkopf! I forgot to link the sedExprs! sed_pdftotext := link sed_messyStuff sed_formatHeads sed_orgAcronym ; >>no I didn't... so why are they still a list? -->[nextv] sed_pdftotext := link sed_messyStuff sed_formatHeads sed_orgAcronym ; -->[nextv] ?invalid host command -->[nextv] -->[nextv] sed_pdftotext := link sed_messyStuff sed_formatHeads sed_orgAcronym ; -->[nextv] ?invalid host command -->[nextv] >> what? missing link in all sedExprs sed_messyStuff := second cols (n_rows n_cols reshape sed_messyStuff) ; >> That was it!! 
hard to see Re-run whole thing +-----+ olde code This was put into pEmails_fixHeads_pout, but in form of sedExpr : #] pEmails_fixExtraDateToFromCcSubject_pout IS OP pinn pout sedExpr - split lines with multiple keyWords pEmails_fixExtraDateToFromCcSubject_pout IS OP pinn pout sedExpr { LOCAL finn fout p_temp ; % ; IF (NOT AND (EACH path_exists ("p_old pinn) ("p_new pout))) THEN EACH write '?pEmails_fixExtraDateToFromCcSubject_pout error, file unknown, one of : ' pinn pout '' ; ELSE finn := open pinn "r ; fout := open pout "w ; WHILE (NOT isfault (line := readfile finn)) DO lineL := line str_splitBy_subStrL key ; fout EACHRIGHT writefile lineL ; ENDWHILE ; EACH close finn fout ; ENDIF ; } write '-> pEmails_xtraHeads_pout : ' ; pEmails_xtraHeads_pout p_pdftotext p_xtraHeads ; # in header : [,un]comment alternate definitions of variables below # pEmails_fixHeads_pout # not useful? i_seq := 'From: ' 'Date: ' 'To: ' 'Cc: ' 'Subject: ' EACHLEFT find_Howell headL ; i_seq := (NOT EACH isfault i_seq) subList i_seq ; headL lineL := i_seq EACHLEFT EACHRIGHT pick headL lineL ; # loaddefs link d_Qndfs 'emails - generic optrs.ndf' IF flag_debug THEN write 'loading pEmails_get_pContacts_test' ; ENDIF ; #] pEmails_get_pContacts_test IS - res ipsa loquitor pEmails_get_pContacts_test IS { LOCAL finn fout ; NONLOCAL p_emailsClean p_contacts reference_Fauci sedExprToFromCC ; % ; title := 'Anthony Fauci (NIH/NIAID) corona virus emails - [From:,To:,CC:] extraction of contacts' ; pEmails_get_pContacts p_emailsClean p_contacts sedExprToFromCC title reference_Fauci ; } p_formatHeads := link d_temp 'consolidate_DateToFromCc temp.txt' ; p_xtraHeads := link d_temp 'pEmails_xtraHeads_pout temp.txt' ; % rearrange [headL, lineL] to standard sequencing ; % f_pdftotext := 'getHead_from_blankLines - test short.txt' ; % f_pdftotext := 'getHead_from_blankLines - test.txt' ; 08********08 15-16Jun2021 [test, correct, improve] getHead_from_blankLines +-----+ olde code % write headL ; % write testHeadL ; 
% EACH write '' '+---+' 'new call of getHead_from_blankLines ' '+---+' ; # olde code host link 'pdftotext "' p_emailsPdf '" "' p_temp '" ' ; host link 'sed "' sedExprFormat1 '" "' p_temp '" >"' p_pdftotext '" ' ; p_sedDateToFromCc := link d_temp 'sed_DateToFromCc temp.txt' ; ELSE lineExtra := link lineExtra line ; IF ('Subject:' = line) THEN write lineExtra ; lineExtra := null ; testHeadL := '<:wait for line:>' ; ELSEIF (null ~= testHeadL) THEN writeHead := ((first headL) find_Howell fullHeadL) pick writeHeadL ; write (link writeHead lineExtra) ; % fout writefile (link writeHead lineExtra) ; % reset for next head or exit ; lineExtra := null ; headL := rest headL ; ELSE null ; ENDIF ; ENDIF ; ELSE lineL := link lineL [line] ; IF (null ~= testHeadL) THEN line lineL := [first, rest] lineL ; head headL := [first, rest] headL ; writeHead := ((head find_Howell fullHeadL) pick writeHeadL ; write (link writeHead (first lineL)) ; % fout writefile (link writeHead (first lineL)) ; % reset for next head or exit ; lineL := rest lineL ; headL := rest headL ; ENDIF ; IF (AND ('Subject: ' = head) ( (subjectLine := last lineL)) THEN testHeadL := '<:done:>' ; ENDIF ; # testHeadL := 'Date:' 'To:' 'CC:' 'Subject:' ELSE testHeadL := (NOT in_list) sublist testHeadL ; IF (~= ['Subject:'] head) THEN testHeadL := testHeadL link ['Subject:'] ; ENDIF ; IF (= ['Subject:'] head) THEN testHeadL := 'done' ; ENDIF ; writeHead := ((first headL) find_Howell fullHeadL) pick writeHeadL ; write (link writeHead (first lineL)) ; % fout writefile (link writeHead (first lineL)) ; % reset for next head or exit ; lineL := rest lineL ; headL := rest headL ; IF (AND ('Subject: ' = head) ('Subject: ' (NOT in) testHeadL)) THEN testHeadL := '<:done:>' ; ENDIF ; # olde code IF (str_isOf_chrSet (5 drop line) chrs_emailBlanks) THEN write line ; IF (line = 'From:') THEN getHead_from_blankLines finn fout ; ELSE % getHead_from_blankLines finn fout line ; ENDIF ; ENDIF ; 08********08 14Jun2021 +-----+ 
https://www.usatoday.com/story/news/factcheck/2021/06/05/fact-check-fauci-emails-hydroxychloroquine-dont-show-he-lied/7544007002/
Fact check: Fauci's emails don't show he 'lied' about hydroxychloroquine
Daniel Funke USA TODAY,Jun2021, updated 06Jun2021

+-----+
https://www.foxnews.com/politics/fauci-emails-shine-new-light-on-early-questions-about-covid-19-origins
Scientist who emailed Fauci about coronavirus possibly being engineered deactivates Twitter account
Fauci has faced increasing criticism from Republicans
07Jun2021

Kristian Andersen, a virologist at California’s Scripps Research Institute who emailed Dr. Anthony Fauci in January 2020 to raise the possibility that the coronavirus may have been engineered, has deactivated his Twitter account.

He wrote that his team was in the early stages of looking critically at the data but found "the genome inconsistent with expectations from evolutionary theory. But we have to look at this much more closely and there are still further analyses to be done, so those options could still change."

+-----+
https://www.dailysignal.com/2021/06/02/11-takeaways-from-faucis-emails-about-covid-19/
11 Takeaways From Fauci’s Emails About COVID-19
Fred Lucas / @FredLucasWH / June 02, 2021

“Fauci’s emails show he suspected early last year that COVID-19 possibly leaked from the Wuhan lab—yet he stayed silent,” Scalise wrote on Twitter. “This is a major cover-up. We need a full congressional investigation into the origins of COVID-19.”

1. Capitol Hill Response
2. ‘Gain of Function’
3. ‘Masks … for Infected People’
4. ‘China … Misled the World’
5. ‘Sequences … Look Engineered’
6. ‘Your Comments Are Brave’ (not Wuhan Engineered)
7. Chinese Official: Let’s ‘Work Together’
8. NFL Season
9. ‘Cult Following’
10. Contacts With Gates, Zuckerberg
11. Republican Correspondents

+-----+
https://www.usatoday.com/story/news/factcheck/2021/06/03/fact-check-email-fauci-doesnt-contain-origin-coronavirus/7511931002/
Fact check: No, email to Fauci doesn't contain origin of a 'coronavirus bioweapon'
Rick Rouan USA TODAY, 03Jun2021

+-----+
https://townhall.com/tipsheet/spencerbrown/2021/06/02/fauci-emails-list-n2590371
Six Bombshell Revelations from Fauci's Emails
Spencer Brown @itsspencerbrown
Posted: Jun 02, 2021 4:35 PM

1. Scientists told Fauci that COVID-19 might be engineered but he ignored them
2. Fauci was circulating articles on "gain of function" research on February 1, 2020
3. Fauci tattled on Ron DeSantis but stayed silent on Cuomo
4. Fauci knew masks didn't work and told people not to wear them
5. The Chinese penpals
6. In March 2020, Fauci thought COVID would stop on its own without a vaccine


08********08
14Jun2021 [clean, contacts] emails

+-----+
# given
extract lines of interest s/^From://I;s/^To://I;s/^CC://I[ ] create common parenthesis ;s/[{(\[]/(/;s/[)}\]]/)/[ ] placeholder for newline ;s/;/\\sed_n/g[ ] get rid of multiple spaces ;s/[ ]\+/ /g problematic lineStart1 ;s/^ //g problematic lineStart2 ;s/^\.\+//g firstname tighten ;s/ \, /\, /g lastname tighten ;s/ \, /\, /g get rid of title problem ;s/Dr. //"

# results

% clean up organisational acronyms ;
% regexpr search1 (.*) replace '\1' ;
% regexpr search2 \t\t replace '\t\t' ;
   '(NIH/NIAID)' ';s/(N[1IJlTf]H\/N[1IJlTf]A[1IJlTf]D)\(.*\)/(NIH\/NIAID)/'
   '(NIH/CC/DLM)' ';s/(NIH\/CC\/DLM)\(.*\)/(NIH\/CC\/DLM)/'
   '(NIH/FIC)' ';s/(NIH\/FIC)\(.*\)/(NIH\/FIC)/'
   '(NIH/NCI)' ';s/(NIH\/NCI)\(.*\)/(NIH\/NCI)/'
   '(NIH/OD)' ';s/(NIH\/OD)\(.*\)/(NIH\/OD)/'
   '(NIH/VRC)' ';s/(NIH\/VRC)\(.*\)/(NIH\/VRC)/'
   '(CDC/DDID/NCIRD/OD)' ';s/(CDC\/[D0][D0][I1][D0]\/NCIRD\/[O0][D0])\(.*\)/(CDC\/DDID\/NCIRD\/OD)/'
   '(CDC/OD)' ';s/(CDC\/[O0]D)\(.*\)/(CDC\/OD)/'
   '(OS/IOS)' ';s/(OS\/10S)\(.*\)/(OS\/IOS)/'
   '(OS/ASPR/IO)' ';s/(OS\/ASPR\/[1I]0)\(.*\)/(OS\/ASPR\/IO)/'

# I need these repeats?
one uses s///g
# YES one for each case - either <> or ()
# ;s/<\(.*\) \(.*\)>/<\1\2>/
# ;s/<\(.*\) \(.*\)>/<\1\2>/
# ;s/(\(.*\) \(.*\)/(\1\2)/
# ;s/(\(.*\) \(.*\)/(\1\2)/
# ;s/(\(.*\) \(.*\)/(\1\2)/
# ;s/(\(.*\) \(.*\)/(\1\2)/
# ;s/,[ ]*/, /
# NOT these :
# ;s/, \(.*\)[ ]*\(.*\)[ ]*(/, \1\2 (/
# ;s/^[ ]*//
# ;s/[ ]\+(/ (/
# ;s/^\(.*\)[ ]\+\(.*\),/\1\2, /
# ;s/<\(.*\) \(.*\)>/<\1\2>/g
# ;s/^[ .]\+//g

+-----+
Corrections

sed won't work :

# using : sedExprAll := link sedExprToFromCC ;
qnial> extract_contacts
sed: -e expression #1, char 2: extra characters after command
qnial> sedExprHost
xrc ie fitrss^rm/Is^o/Is^C/Ilchle o eln;//\e_/gtrdo utpesae;/ \//gtrdo pcswti )s((*)\.\)(12/polmtclnSat;/ /polmtclnSat;/\\/gisnm ihe;/\ \ gatae ihe;/ (\ /gtrdo rai il;/r /

# sedExprAll := link [sedExprToFromCC] ;
qnial> extract_contacts
sh: 1: Syntax error: Unterminated quoted string
qnial> sedExprHost
s/^From://I;s/^To://I;s/^CC://I;s/;/\\sed_n/g;s/[ ]\+/ /g;s/(\(.*\) \(.*\))/(\1\2)/g;s/^ //g;s/^\.\+//g;s/ \, /\, /g;s/, ((/\, (/g;s/Dr. //"

# might be problem :
   'get rid of erratic title' ';s/Dr. //"'
change to :
   'get rid of erratic title' ';s/Dr. //'
>> OK, works now, go back to full original with Dr. fix

# original form
   'extract lines of interest' 's/^From://I;s/^To://I;s/^CC://I'
   'placeholder for newline' ';s/;/\\sed_n/g'
   'get rid of multiple spaces' ';s/[ ]\+/ /g'
   'create common parenthesis' ';s/[{(\[]/(/g;s/[)}\]]/)/g'
   'get rid of spaces within ()' ';s/(\(.*\) \(.*\))/(\1\2)/g'
   'problematic lineStart1' ';s/^ //g'
   'problematic lineStart2' ';s/^\.\+//g'
   'firstname tighten' ';s/ \, /\, /g'
   'lastname tighten' ';s/, (/\, (/g'
   'get rid of erratic title' ';s/Dr. //'

# modified forms to test
   'create common parenthesis' ';s/[{(]/(/g;s/[)}]/)/g'

# error check without these
   'lastname tighten' ';s/, (/\, (/g'

# original form
   '(NIH/CC/DLM)' ';s/(N[1IJlTf]H\/CC\/[D0]LM)/(NIH\/CC\/DLM)/'
   '(NIH/FIC)' ';s/(N[1IJlTf]H\/F[1IJlTf]C)/(NIH\/FIC)/'
   '(NIH/NCI)' ';s/(N[1IJlTf]H\/NC[1IJlTf])/(NIH\/NCI)/'
   '(NIH/OD)' ';s/(N[1IJlTf]H\/[O0][D0])/(NIH\/OD)/'
   '(NIH/VRC)' ';s/(N[1IJlTf]H\/VRC)/(NIH\/VRC)/'
   '(CDC/DDID/NCIRD/OD)' ';s/(C[D0]C\/[D0][D0][I1][D0]\/NCIRD\/[O0][D0])/(CDC\/DDID\/NCIRD\/OD)/'
   '(CDC/OD)' ';s/(C[D0]C\/[O0][D0])/(CDC\/OD)/'
   '(OS/IOS)' ';s/([O0]S\/[1IJlTf]0S)/(OS\/IOS)/'
   '(OS/ASPR/IO)' ';s/([O0]S\/ASPR\/[1IJlTf][O0])/(OS\/ASPR\/IO)/'

# error check without these

# original form
sedExprAll := link sedExprToFromCC sedExprOrgAcronym ;
# error check with modified form
sedExprAll := link sedExprToFromCC ;

+-----+
now back to problem with sed_n, multiple emails, cc
>> do the part to consolidate [From:, To:, CC:, Date:], then come back


08********08
12Jun2021 [clean, contacts] emails

Latika Chugh {W ION)


08********08
12Jun2021 convert pdf to text

$ pdftotext "$d_webRawe"'Pandemics, health, and the Sun/corona virus/210609 Leopold - Anthony Fauci emails, NIH Freedom Of Information Act.pdf' "$d_webRawe"'Pandemics, health, and the Sun/corona virus/210609 Leopold - Anthony Fauci emails, NIH Freedom Of Information Act.txt'
>> OK - great! It's not just photos of pages
>> many spaces in middle of words etc, pretty good

In-Emails typically go through "Cassetti, Cristina v(NIH/NIAID) [E]", who is likely an assistant to Fauci

Cleanup :
   Remove : , extra s,
   Consolidate : sentence (to period,\n, ), paragraphs, emails
   Extractions : To:, From:, CC:, Date:, Subject:

search some terms (spelling?)
: Non-vaccine treatments : Hits
1 hydroxychloroquine, chloroquine
   South Korea has been administering Hydroxl Chloroquine, a treatment for Malaria, to her citizens that have contracted Coronavirus
   From: RJ Claymont -> Cassetti, Cristina -> (b)(
   Date: March 11, 2020 at 4:22:35 AM ED T
   To: "Fauci , Anthony (NIH/N lAID ) [E]" (b)(6)
   Subject: Breakthrough: Chloroquine phosphate has shown apparent efficacy in treatment of COVID-19 associated pneumonia in clinical studies
remdesivir
N-acetyl cysteine (NAC) - has been used to rapidly clear Rotavirus diarrheal infections in children
porcine epidemic diarrhea virus (PEDV) - coronavirus
plaquenil
   Plaquenil Shortage Causing National Health Emergency for Lupus Patients
   (b)(6) When I tried to refill Plaquenil today, I learned that there was a national shortage due to doctors prescribing Plaquenil to their well patients and themselves. Recent news articles have reported its success in preventing and treating the coronavirus .
   I have done some literature searching on potential treatments for the novel coronavirus and stumbled across a few case case reports from China in 2005 at the time of the SARS outbreak. They detailed some successes in treatment of severe cases with chloroquine. I saw a more recent study showing hydroxychloroquine had better in vitro efficacy than chloroquine.
vitamin C, D3
   Blaylock cytokine paper to emphasize the importance of vitamin C, D3 etc.
Instant Hand Sanitiser is an alcoholbased gel which kills bacteria
ivermectin -


08********08
Olde ToDos

14Jun2021 need a dictionary to fix spaces in words

18Jun2021 problems with sed expressions
-> p_orgAcronym1 :
sed: -e expression #1, char 62: unknown option to `s'
-> p_orgAcronym2 :
-> p_orgAcronym3 :

20Jun2021 pTxt_removeAdd_spaces : would it be much more efficient to preset (gage shapes) and use indexing rather than appends ?
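
+-----+
Sketch for the 14Jun2021 ToDo (dictionary to fix spaces in words). This is only a rough Python illustration of one possible approach, not the QNial code in the .ndf files : merge adjacent fragments when their concatenation is a known word, trying the longest merge first. The names rejoin_fragments, vocabulary and max_parts are hypothetical, and the tiny word set is made up for the example. Known weakness : real word pairs that also form a dictionary word (eg "a" + "way") would be merged wrongly, so a real dictionary would need some protection for those.

```python
def rejoin_fragments(text, vocabulary, max_parts=4):
    """Merge adjacent whitespace-separated fragments whose concatenation
    is in `vocabulary` (a set of lower-case words). Longest merge wins."""
    tokens = text.split()
    out = []
    i = 0
    while i < len(tokens):
        merged_len = 0
        # try the longest run of fragments first, down to a pair
        for n in range(min(max_parts, len(tokens) - i), 1, -1):
            candidate = "".join(tokens[i:i + n])
            if candidate.lower() in vocabulary:
                out.append(candidate)  # keep the original capitalisation
                merged_len = n
                break
        if merged_len:
            i += merged_len
        else:
            out.append(tokens[i])  # no merge found, keep fragment as-is
            i += 1
    return " ".join(out)


if __name__ == "__main__":
    # hypothetical mini-dictionary, for illustration only
    words = {"coronavirus", "treatment", "edt"}
    print(rejoin_fragments("the corona virus treat ment", words))
    print(rejoin_fragments("4:22:35 AM ED T", words))
```

>> max_parts limits how many fragments are tried per position, so the cost stays roughly linear in the number of tokens.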
% create QNial list of fragCombines that can be used to search for components and missing spaces ;
% find all subs of frags (potential builders) ;
% cull [fragL, allSubFragL] ;

# enddoc