
25Nov2020 This website has been substantially revamped, with changes to [page, menu, content, link]s. There will be problems with links - probably 10 or so webPages of the 187 total still have to be corrected after coding changes and my own mistakes messed them up.
Many of my web-postings will illustrate my [errors, inconsistencies, ignorance] and some may be upsetting. But I prefer to be seen as a fool rather than as something that I am not? Huh?..., whatever...

Howell's webSite setup & tools for maintaining [menu, header, footer, body] links, TableOfContents

This is an extremely [quick, incomplete] description of a Sep-Nov2020 project to revamp my website following major changes to [file, directory, url]s.

Howell's simple (ugly) website approach

My approach to webSites is to make them entirely in html (no [CSS, Javascript, etc]). I am not very interested in having a [nice, pretty] website that is pleasant to go through, because people will not browse through my website; they will usually zero in on substantive content via web searches (eg duckduckgo.com). Although there are a few webPages that summarize [knowledge, work, insights] on specific issues in [commentary, images, data], the vast majority of my webSite content does NOT occur on webPages - it is in the files scattered over a large number of directories.

I now save nearly all of my non-[confidential, private] work in my d_webRawe "raw directory" so that it is automatically uploaded to my website, without having to think about it too much. That makes my webSite updates much easier to do, which hopefully means that it won't take 5-10 years between substantive updates, because it's not something that I like to spend time on.

In order to understand what my webSite system does, one must first understand the structure of multiple "copies" of the webSite, and what transformations are done at each stage.

There are 2 copies of my webSite on separate drives, which helps to reduce mistakes of over-writing files in the wrong directories, plus a third copy, which is the online webSite.

The first is my "working directory" (d_webRawe). d_webRawe has "bare bones" webPages consisting only of the body of a webPage, along with minimalist "executable embeds" coding in the html webPages that adds the [menu, header, footer, TableOfContents] at a later stage. This "bare-bones" format removes distractions and long page-[up, down] travelling within a webPage as it is being [created, modified, moved to a different sub-directory, etc].

The second copy of my webSite (d_webSite) is mostly a copy of my "working directory" (d_webRawe), except that the webPages (targeted html files) are transformed by inserting the [menu, header, body, footer] links and converting links to relative addressing. Relative addressing is a "must" for being able to easily transfer files between directories and the online webSite (d_webOnln, as described below). d_webRawe and d_webSite are on different USB drives to minimize my recurrent problem of editing the d_webSite files directly, which creates a mess of inconsistencies when I start editing the wrong version of files (I should ONLY be working in d_webRawe, but mistakes do happen).
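
To illustrate the idea (the actual conversion is done by the QNial operators described below; the mount point, backtrack depth, and file names here are made-up placeholders, not my real paths), a link that is absolute in d_webRawe gets rewritten relative to the webPage's own subDirectory, roughly like a sed substitution :

# sketch only - NOT my actual code; d_webRawe and the backtrack depth are hypothetical
# before (d_webRawe, absolute) : <A HREF="/media/bill/RAWE/webRawe/some subDir/some page.html">
# after  (d_webSite, relative) : <A HREF="../../some subDir/some page.html">
sed -e 's|HREF="/media/bill/RAWE/webRawe/|HREF="../../|g' "some webPage.html"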

The third and final copy of my webSite is the uploaded online version that the public can see (d_webOnln). It is merely a copy of d_webSite, which again makes uploads simple. All webPage [menu, header, footer, link]s [appear, work] the same way in d_webSite and d_webOnln, so it is easy to spot [formatting, link function, etc] [problem, opportunity]s on my local d_webSite drive rather than messing around with uploads etc when developing webPages.

To summarize, I only work in d_webRawe, and the progression of file transfers is :

d_webRawe -> (via rsync) -> d_webSite -> (via fileZilla) -> d_webOnln (online webSite)

The file transfers and the automated processing of [menu, header, footer, link]s are done by the software described in the rest of this webPage.
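
As a rough idea of what the first transfer step amounts to, here is a minimal bash sketch in the spirit of "rsync website.sh" (the mount points and excludes are hypothetical placeholders, not the actual script) :

#!/bin/bash
# sketch only - mirror d_webRawe to d_webSite, skipping [z_Archive, z_Old] subDirectories
d_webRawe="/media/bill/RAWE/webRawe/"     # hypothetical mount points
d_webSite="/media/bill/SITE/webSite/"
rsync -av --delete \
    --exclude 'z_Archive/' --exclude 'z_Old/' \
    "$d_webRawe" "$d_webSite"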

webSite [menuHeadFoot, link, TableOfContents] maintenance tools

A hybrid [QNial, bash script, unix command] environment processes the d_webRawe webPages into [final, uploadable] d_webSite webPages. This starts with a text editor (eg [geany, kwrite] in Linux), along with [QNial programs, Unix commands, bash scripts] to automate [augmentation, processing, checks, fixes].

QNial operators

Here is a very brief description of the key QNial operators used :

"$d_webOnln""Software programming & code/Qnial/MY_NDFS/webSite header.ndf" :
*********************
loaddefs link d_Qndfs 'webSite header.ndf'
+-----+
Global variables
+-----+
Find all d_webRawe html files related to webSite [URLs, convert, update]
webSite_extract_pathsSubDirsFnames IS - all relevant webSite paths to p_[all, html]FileList, sorted [fname, all] lists
webSite_readpathsSubDirsFnames IS - read stable [path, dir] lists


"$d_webOnln""Qnial/Software programming & code/MY_NDFS/webSite header.ndf" :
*********************
loaddefs link d_Qndfs 'webSite maintain [menu, header, footer, body] links, TableOfContents.ndf'
+-----+
Change links in the body of an html file to a relative format, exclude 'http' '#' 'mailto:'
webRaw_extract_links IS - generate fileList with %20 in links (p_html_files_pct20)
midIndxsLines_ignoreBads IS OP webPage webSite - remove "bad links" that should not be processed
internalLinks_return_relativePath IS OP backtrack strLeft strRight line - convert links to relative
+-----+
Update webPages[Raws, Sites, All]
webPageRaw_update IS OP flag_test pinn pout d_inn d_out - update pinn to d_out using executeEmbeds
Although built for webSite maintenance, with adaptation this has far more general applicability.
webPageSite_update IS OP flag_test pinn pout d_inn d_out - update webPageRaw-to-Site, executeEmbeds
Although built for webSite maintenance, with adaptation this has far more general applicability.
webAllRawOrSite_update IS OP flag_backup optr_rawOrSite - update webSite to "uploadable" version
+-----+
URLs - check and count, helps for debugging
webURLs_extract IS - extract all link urls from a website [external, internal, menu, pagePosn]
urls_check IS OP linkType - create sublists of internal html files, check links with curl
website_link_counts IS - summarize the counts for links [external, internal, menu, tableOfContent]s

A [close, simple] description of the automated part of the transformation process between d_webRawe and d_webSite (on different USB drives) is provided by :

webSite_doAll IS
{
writeDoStep 'webSite_extract_pathsSubDirsFnames' ;
writeDoStep (link 'host ' chr_apo 'bash "$d_bin""rsync website.sh"' chr_apo) ;
writeDoStep (link 'host ' chr_apo 'bash "$d_bin""webSite check for [z_Archive, z_Old].sh"' chr_apo) ;
writeDoStep 'webSite_extract_pathsSubDirsFnames' ;
writeDoStep (link 'str_replaceIn_pathList l '
chr_apo d_webRawe chr_apo ' '
chr_apo '../../' chr_apo ' '
chr_apo '[\#=; backtrack ;=\#]' chr_apo ' '
'htmlPathsSortedByPath'
) ;
writeDoStep 'webAllRawOrSite_update l "webPageRawe_update' ;
writeDoStep 'webAllRawOrSite_update l "webPageSite_update' ;
writeDoStep 'webURLs_extract' ;
writeDoStep (link 'urls_check ' chr_apo 'intern' chr_apo) ;
% writeDoStep (link 'urls_check ' chr_apo 'extern' chr_apo) ;
writeDoStep 'webSite_link_counts' ;
}

(I had to modify the [\#=; backtrack ;=\#] above to prevent it from being processed.)

By far the longest time is required for the testing of "external" (online) links, which takes about 5-10 minutes, so I often comment that step out during development.
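
For a sense of what the curl-based check boils down to, here is a bash sketch (not the urls_check operator itself, and "urls extern.txt" is a hypothetical one-url-per-line file in the spirit of what webURLs_extract produces) :

#!/bin/bash
# sketch only - fetch each url's HTTP status and flag anything that is not 200 OK
while read -r url ; do
    status=$(curl --silent --output /dev/null --location --max-time 10 --write-out '%{http_code}' "$url")
    [ "$status" = "200" ] || echo "FAIL $status : $url"
done < "urls extern.txt"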

In addition to the operators listed above, many [new, modified] operators were made for this project, most notably in :
"$d_webOnln""Qnial/Software programming & code/MY_NDFS/strings.ndf" :
"$d_webOnln""Qnial/Software programming & code/MY_NDFS/fileops.ndf" :

I won't comment on these, except to point out the fileops.ndf operators that facilitate making BACKUPS of [file, directory]s, which of course is critical!!! (a rough bash sketch follows the list below) :
+-----+
[backup, restore] cart [path, dir]s
path_backupDatedToSameDir IS OP path - backup dated version of a file in same directory
path_backupDated_delete IS OP path - rename a file with date precursor
path_backupDatedTo_dir IS OP path dirBackup - backup dated version of a file to a specified FLAT dir
pathList_backupTo_dir IS OP pathList dirToCreateBackupDir - backup dated files to a FLAT dir
dirBackup_restoreTo_paths IS OP d_backup p_pathList - restore paths listed in a backup (FLAT) dir
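
In bash terms, the core idea of path_backupDatedTo_dir is roughly as follows (a sketch under assumed naming conventions, not the QNial operator; the backup directory in the usage comment is hypothetical) :

#!/bin/bash
# sketch only - copy a file to a FLAT backup dir, prefixing its name with a date-time stamp
# usage : path_backupDatedTo_dir "webSite header.ndf" "/media/bill/backups"
path_backupDatedTo_dir() {
    local path="$1" dirBackup="$2"
    local fname=$(basename "$path")
    cp -p "$path" "$dirBackup/$(date +'%y%m%d %Hh%Mm%Ss') $fname"
}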

A [brutally detailed, sad] documentation of my floundering to develop the coding is in :
"$d_webOnln""Qnial/Software programming & code/MY_NDFS/website updates notes.txt"

Unix commands and bash

As stated above, I use a hybrid [QNial, bash script, unix command] approach, so a fair amount of [Unix, bash] code is integrated into the QNial programs listed above. I will note several bash scripts (a rough sketch of one of them follows the list) :
"$d_webOnln""Qnial/Software programming & code/bin/rsync website.sh"
"$d_webOnln""Qnial/Software programming & code/bin/lftp update www-BillHowell-ca.sh"
"$d_webOnln""Qnial/Software programming & code/bin/webSite check for [z_Archive, z_Old].sh"
"$d_webOnln""Qnial/Software programming & code/bin/website urls.sh"

Testing

I don't have time right now to go into the details of the tests that have been set up. These tests have been crucial to me, allowing fast testing after major (and minor) changes and innovations that affect coding elsewhere.
"$d_webOnln""Software programming & code/Qnial/code develop_test/Website updates- tests.ndf"
"$d_webOnln""Software programming & code/Qnial/code develop_test/file_ops- test.ndf"
"$d_webOnln""Software programming & code/Qnial/code develop_test/strings- tests.ndf"

File uploads to my online webSite

For uploads, I mostly use fileZilla, which does sometimes cause problems by re-uploading unchanged files if you forget to check the settings. In principle the settings are saved; in practice I lose them when playing around. Here is my checklist :

Make sure that upload transfer settings are correct!!!
fileZilla always seems to re-upload files that aren't newer!!!
Menu -> File -> Site manager -> Advanced tab ->
Adjust server timezone offset -> 0 hours (just leave like that!)
ALWAYS set overwrite for each transfer!!!! FileZilla options are NOT stable.
Menu -> Edit -> settings -> Transfers -> File exists action :
Downloads -> Overwrite file if source file newer
Uploads -> Overwrite file if source file newer
Menu -> Edit -> settings -> Logging
Log to file : /media/bill/PROJECTS/bin/filezilla.log
Menu -> View -> Directory listing filters -> check ONLY "Temporary & backup files" for local filters
-> Edit filter rules -> "Temporary & backup files" :
Filename ends with : [~, .bak.]
Filename contains : [References, z_Archive, z_Old, z_References]
uncheck : Conditions are case sensitive
check : Filter applies to : Files, Directories
Click OK to retain changes
Remove remaining transfer queue!!! to make sure you start fresh :
Menu -> Edit -> Clear private data -> check Clear transfer queue box
click in transfer [queued files, failed transfers, successful transfers] windows and [clear, delete] lists after all done

I also use lftp, but not nearly as often, as it seems much slower. It does have the advantages of a scripting approach, with great flexibility and logging.
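
For reference, a minimal lftp mirror script in the spirit of "lftp update www-BillHowell-ca.sh" would look something like this (host, credentials, and directories are placeholders, not the real script) :

#!/bin/bash
# sketch only - upload to the online webSite only those files that are newer locally
d_webSite="/media/bill/SITE/webSite/"     # hypothetical local dir and server path
lftp -u "$FTP_USER","$FTP_PASS" ftp.example.com -e "
    mirror --reverse --only-newer --verbose \"$d_webSite\" /public_html/ ;
    quit "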

Formatting constraints

As one should expect, automated processing of the d_webRawe webPages does impose [constraint, requirement]s on the raw html files. I've run out of time for now, but here is an incomplete list of a few of the requirements.

Link subdirectories do NOT have to be fully specified, but at least the file name (fname) or end subDirectory should be spelled correctly.

[A HREF=, IMG SRC] formats
The full-capitalisation as shown is the REQUIRED format for links (for now), and "SRC=" must follow directly after "IMG ". For example :
These will NOT work :
img style="border-width:0" src="gnu-head-mini.png">
img SRC="gnu-head-mini.png" alt="Creative commons.png" style="border-width:0">
IMG style="border-width:0" src="gnu-head-mini.png">
IMG alt="gnu-head-mini.png" style="border-width:0" SRC="Creative commons.png">

These will work :
IMG SRC="gnu-head-mini.png" style="border-width:0"=>
IMG SRC="gnu-head-mini.png" style="border-width:0" alt="Creative commons.png">
IMG SRC="gnu-head-mini.png" alt="Creative commons.png style="border-width:0" src=>
IMG SRC="gnu-head-mini.png" alt="Creative Commons License" style="border-width:0">

As you can see, 'SRC' must occur ONE space after 'IMG'. Of course, I am being lazy, as a single option in [grep, sed] allows case-insensitive matching, and spacing can also be handled easily.
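
For example, something like the following would relax the capitalisation requirement (a sketch, not part of my current code) :

# sketch only - find IMG links regardless of case with grep -i,
# or upper-case the tag and attribute names in place with GNU sed's 'I' flag
grep --ignore-case --line-number -E '<img[[:space:]]+[^>]*src=' webPage.html
sed --in-place -e 's|<img |<IMG |gI' -e 's| src="| SRC="|gI' webPage.html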

Website stats

The 25Nov2020 status of my webSite links is provided in the file :
"$d_webRawe""Website - raw/webWork files/webSite summary of [fail, unknown, OK,total] links.txt"

As you can see, there are 166 "failed links" (failed targets of links - most with multiple links going to them). I still have some work to do to fix links, although many can be fixed easily by [QNial, bash] tools if I get around to it. URLs to external sites are another matter. I've learned that one cannot rely on amazon links for any length of time. I can understand that - they have to change their webPages continually. But this is a problem with other sites as well.

Website stats for : www.BillHowell.ca 201125 12h50m26s
Summary of the number of links by type [external, internal, menu, tableOfContent] and [OK, bad] :

6977 = count of all links in webSite

1161 = count of all [file, dir, url]s targeted by links in the webSite

Counts below are the number of unique TARGETED [file, dir]s of links
(eg 5+ links per target on average)

Failures :
+--+------------+
|79|errors list |
+--+------------+
|22|extern fails|
+--+------------+
|38|howell list |
+--+------------+
|27|intern fails|
+--+------------+

Unknowns - I haven't written code to really show [OK, fail] :
+---+-----------+
|71 |mailto list|
+---+-----------+
|277|pgPosn list|
+---+-----------+

OKs - these links have been shown to work :
+---+---------+
|239|extern OK|
+---+---------+
|408|intern OK|
+---+---------+

[fail, unknown, OK, total] counts :
+----+-------------+
| 166|failed links |
+----+-------------+
| 348|unknown links|
+----+-------------+
| 647|OK links |
+----+-------------+
|1161|total |
+----+-------------+

Updates: 25Nov2020 initial
