Consistency Checks

☑️ It is interesting how many things can go wrong and rot over time. Better watch out!

Some sources of inconsistency I had anticipated, but others I wasn't aware of until I stumbled across them. So I decided to build some controls and checks into the website.

Dead links should not be a problem, since VitePress warns about them. Yes and no. It took me a while to realize that VitePress does not check the hash part of URLs, and that may well catch you out: if you change the heading of a section, all links to that section are broken.

My solution was, as you can guess, Yet Another Vite Plugin. The dead-link plugin collects the set of all possible internal link targets, i.e. the id attributes of HTML elements. It also collects two sets of URLs from the hyperlinks on the pages: those pointing to external pages and those pointing to internal ones. The plugin then outputs the set difference of the internal URLs minus the internal link targets. If the result is empty, no internal hyperlink is missing its target.
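
The set difference described above can be sketched in a few lines of shell. This is only an illustration, not the plugin's actual code: the function name `missing_targets`, the two input files (one link or target per line), and the sample link in the comment are all assumptions.

```bash
# Hypothetical helper, not part of the dead-link plugin.
# $1: file listing internal links, one per line (e.g. /basics/vitepress#some-id)
# $2: file listing known link targets, one per line, in the same form
missing_targets() {
  # -F: fixed strings, -x: whole-line match, -f: read patterns from file,
  # -v: keep only non-matching lines, i.e. links minus targets.
  # grep exits with 1 when nothing is left over, hence "|| true".
  sort -u "$1" | grep -vxFf "$2" || true
}
```

An empty output then means that every internal link has a matching target.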

The external links are written to a shell script. Each URL is retrieved with curl. If the target page does not exist or is a 404 page, the check fails for this URL.

bash
...
# Print the URL, fetch its <title> with pup, and report FAILED if pup only returns EOF.
echo -n http://ergberg.tk; \
res=$(curl -fLs -m 5 \
  http://ergberg.tk | pup "head > title" text{} 2>&1 ); \
echo -n " [$res] "; \
if [ "$res" = "EOF" ]; \
then echo " FAILED"; else echo " OK"; \
fi
...

The pup command (github:ericchiang/pup) lets us check for content using CSS selectors. I use it to extract the title of the document. In general, the title text itself is not important, only that it exists. The output of the script might look similar to this excerpt:

txt
https://pages.cloudflare.com    [Cloudflare Pages]  OK
https://graphviz.org/           [Graphviz]  OK
http://dot.tk                   [Dot TK - Find a new FREE domain]  OK
https://prismjs.com/index       [Prism]  OK
https://bro.ke.nl.ink           [EOF]  FAILED
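
To let a broken link fail a build step instead of relying on someone reading the report, the FAILED lines can be filtered out. A minimal sketch, assuming the report format shown above; the function name `failed_urls` and the script name `check-links.sh` are hypothetical.

```bash
# Hypothetical helper: read the report on stdin and print the URL
# (first column) of every line that ends in FAILED.
failed_urls() {
  awk '/ FAILED$/ { print $1 }'
}

# Possible use, assuming the generated script is called check-links.sh:
#   broken=$(bash check-links.sh | failed_urls)
#   [ -z "$broken" ] || { echo "$broken"; exit 1; }
```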

Expected External Content

It is good to know that an external link target still exists. Sometimes it would be even better to know if it still shows the same information. This is a generalization of the approach above: To check a web page for a specific piece of information, I state

  • the URL,
  • the selector,
  • the expected result.

And where to store the checks? I decided to put them into the Markdown files, along with the content that depends on them. They can easily be cut out of the markup. They are written to an external script, like the checks described in the previous section. Both scripts are executed during a special prepare build step.

The format I use in the markup files is

bash
§<command>§<expected result>

Here is an example from /basics/vitepress:

bash
§curl -s \
 https://vitepress.dev/guide/what-is-vitepress.html |\
 pup '.vp-doc > div:nth-child(1) > p:nth-child(2) text{}' |\
 tr -d '\n '§VitePressisVuePress&#39;littlebrother,builtontopofVite.

For better readability, the check is shown here on multiple lines. In the Markdown file, it is a single line of text. If the VitePress documentation ever changes its claim that "VitePress is VuePress' little brother, built on top of Vite.", this will throw an error the next time the checks are run.

As with the checks above, the generated shell script is a bit more complex, showing the source markup file of the check and the result of the test.
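
The generated script itself is not reproduced here. As a rough sketch of what evaluating such check lines could look like, the following reads a Markdown file, runs every line that starts with §, and reports the source file with the result. The function names `run_check` and `run_checks_in_file` are hypothetical, and the sketch assumes that neither the command nor the expected result contains a § character.

```bash
# Hypothetical helper: evaluate one "§<command>§<expected result>" line.
# $1: source Markdown file (used for reporting only), $2: the check line.
run_check() {
  rest=${2#§}            # drop the leading §
  cmd=${rest%%§*}        # text up to the second §
  expected=${rest#*§}    # text after the second §
  # Run the command without stdin so it cannot swallow the caller's input.
  actual=$(sh -c "$cmd" </dev/null 2>&1) || true
  if [ "$actual" = "$expected" ]; then
    echo "$1: OK"
  else
    echo "$1: FAILED, expected [$expected] but got [$actual]"
    return 1
  fi
}

# Hypothetical driver: run every check line found in one Markdown file.
# Returns non-zero if at least one check failed.
run_checks_in_file() {
  failed=0
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      §*) run_check "$1" "$line" || failed=1 ;;
    esac
  done < "$1"
  return "$failed"
}
```

Because the command is handed to `sh -c` unchanged, pipelines like the curl and pup example above work without extra quoting.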

Other Checks?

As you can see from the example above, this is not really just about web page content. Any shell command can be placed between the two § characters. Want to check that the file size of the search index is still in the range stated in the text, and that Node.js is still the documented version? Voilà:

bash
§ ls -l --block-size=M docs/index.json \
 | awk '{print $5}' §2M

 § node -v §v18.12.1

Security

I run shell commands that I extract from the Markdown files. The scripts are not part of the normal build. They are not executed during the build on Cloudflare Pages, for example. If you want to trick me into executing commands in my local sandbox build system, I would suggest manipulating my package.json or the node_modules swamp. To me, that seems to be the easier way.

Last Modification Dates

Changes to lines starting with § are ignored when looking for the last modifications in a file.
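
How this filtering is implemented is not shown here. One way to express the idea in shell, purely as an assumption, is to compare two versions of a file with the check lines removed. The function name `content_changed` is hypothetical.

```bash
# Hypothetical helper: succeed (exit 0) only if two versions of a file
# differ in something other than check lines starting with §.
# $1: old version, $2: new version
content_changed() {
  # grep -v drops the check lines; it exits 1 when no line is left,
  # hence "|| true".
  old=$(grep -v '^§' "$1" || true)
  new=$(grep -v '^§' "$2" || true)
  [ "$old" != "$new" ]
}
```

With this, updating only an expected result such as a version number would not count as a modification of the page.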