If you've ever needed to hunt through a massive HTML project for a specific, repetitive clause, there aren't a lot of tools that are really that helpful. I'm sure there are some tools better suited to it, but I found a recipe for gawk that I like.
This recipe starts by setting the gawk record separator, RS, to the close tag for the component I'm looking for. The gsub command replaces all of the newlines with an obscure Unicode character, "∎", the "tombstone," so you might have to change this if you're parsing math proofs, but in general it's rare enough that I can get away with it.
The match looks for the open tag of the HTML component and, somewhere inside it, the tag I'm after. This is a combination layout and display component we use a lot at authentik. Because the record separator consumed the close tag, I append it again, convert the result back to having the proper newlines, and print it with a separator.
Since the results are legitimate Lit-Element references, I can put them all into a .ts file and Prettier will format them into neat, regular columns, regardless of how heavily indented they were in the original.
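# Treat each closing tag as the end of a record, so one record holds one component.
BEGIN { RS = "</ak-form-element-horizontal>" }
{
    # Flatten the record onto one line; "∎" stands in for each newline.
    gsub(/\n/, "∎")
    # The three-argument form of match() is a gawk extension; m[0] is the matched text.
    if (match($0, /\s*<ak-form-element-horizontal[^>]*>.*<input[^>]*type="[^"][^"]*"[^>]*>.*/, m)) {
        # RS consumed the closing tag, so put it back.
        result = m[0] "</ak-form-element-horizontal>"
        gsub(/∎/, "\n", result)  # Restore newlines
        print "<!-- ----------------------------------------------------------------------- -->"
        print result "\n"
    }
}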
As a simple addition, here's how I find all of my horizontal form clauses that do not have an <input> tag, but something else (like <select>, <textarea>, or just some display elements):
BEGIN { RS = "</ak-form-element-horizontal>" }
{
    gsub(/\n/, "∎")
    if (match($0, /\s*<ak-form-element-horizontal[^>]*>.*/, m)) {
        result = m[0] "</ak-form-element-horizontal>"
        # Keep only the clauses with no <input> anywhere inside them.
        if (result !~ /<input/) {
            gsub(/∎/, "\n", result)  # Restore newlines
            print "<!-- ----------------------------------------------------------------------- -->"
            print result "\n"
        }
    }
}
I know you're not supposed to be able to use Regular Expressions to scan HTML, but using Prettier to format individual tags onto their own lines makes it a heck of a lot easier.