muckrights-sans-merde
bonum fabula frat
### cleaning-an-html-mess-0-2
previous version:
=> cleaning-an-html-mess-in-30-minutes.html cleaning-an-html-mess-in-30-minutes
*originally posted:* jun 2021
the original code took 30 minutes; this update took 22.
```
# cancel consecutive repeated text 0.2
# jun 2021
# license: creative commons cc0 1.0 (public domain)
# http://creativecommons.org/publicdomain/zero/1.0/
# "compile" program from code editor:
# fig50 ccrt.fig | tail
#date
#Thu Jun 3 06:30:06 UTC 2021 <- started coding
#logic:
#1. parsing on when it hits: ">"
#2. parsing off when it hits: "</td"
# date
# Thu Jun 3 17:51:30 UTC 2021 0.2 started
# - len / minus 4 / left / rtrim
# - if leading "[", locate "]", len / minus location / right / ltrim
# date
# Thu Jun 3 18:13:52 UTC 2021 0.2 completed
#3. compare to buffer
#4. if non-match, print
#5. copy to buffer
#go!
p arrstdin
prevbuf ""
buf ""
fb 0
forin lin p
forin each lin
fb 1
now buf plus each swap now buf
# parsing on when it hits: ">"
ifequal each ">"
buf "" # reset
next
# parsing off when it hits: "</td"
bufright4 buf right 4 lcase
ifequal bufright4 "</td"
# len / minus 4 / left / rtrim
buflen buf len minus 4
now buf left buflen rtrim swap now buf
# if leading "[", locate "]", len / minus location / right / ltrim
bufleft buf ltrim left 1
ifequal bufleft "["
bufloc instr buf "]"
iftrue bufloc
buflen buf len minus bufloc
now buf right buflen ltrim swap now buf
next
next
# compare to buffer
ifequal buf prevbuf
fb 0
else
# if non-match, print
fb 1
next
# copy to buffer
prevbuf buf
break
next
next
buf ""
iftrue fb
now lin print
next
next
# date
# Thu Jun 3 06:59:43 UTC 2021
# 30 minutes including debugging
# (probably 20-25 without the comments)
```
i noticed that at least one person's duplicates weren't being filtered, because about half of their posts had one trailing space before the closing table data tag ("</td").
the first addition to the logic (cut the last 4 characters off the buffer, then rtrim) should fix that.
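a small python sketch of that first addition, just to show why the trailing space mattered. this is not the fig code itself; the function name strip_close_tag is made up for the example, and it assumes the buffer already ends in "</td" (any letter case), the way it does when the fig program reaches that step.

```python
def strip_close_tag(buf):
    # hypothetical helper: assumes buf ends with "</td" (any case)
    # len / minus 4 / left / rtrim, as in the fig comments
    return buf[:len(buf) - 4].rstrip()

# the same post, once without and once with the stray trailing space
a = strip_close_tag("same post</td")
b = strip_close_tag("same post </TD")

# without the rstrip() these would differ by one space
# and the second one would slip past the dupe filter
print(a == b)
```

with the rstrip() in place both buffers come out as "same post", so the comparison against the previous buffer can match.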
it wasn't necessary to "ignore" (filter) the tr-bridge user. checking for a leading "[" and removing the [*] prefix from the buffer lets the dupe filter catch those lines, and the [*] is still preserved in the output because the program prints the whole unedited line, not the buffer itself.
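here is a rough python sketch of the whole idea, assuming input rows shaped like the made-up ones in the example. it is not a line-for-line translation of the fig program (for instance, it always keeps lines that contain no "</td", and empty lines may behave differently); the names normalize and filter_lines are invented for illustration. the point is that the comparison uses the trimmed buffer, while the output keeps the original line.

```python
def normalize(buf):
    # hypothetical helper: buf is assumed to end with "</td"
    buf = buf[:len(buf) - 4].rstrip()   # drop "</td" and trailing spaces
    if buf.lstrip()[:1] == "[":         # leading "[": drop everything through "]"
        loc = buf.find("]") + 1         # 1-based position, 0 if there is no "]"
        if loc:
            buf = buf[loc:].lstrip()
    return buf

def filter_lines(lines):
    kept = []
    prevbuf = ""
    for lin in lines:
        buf = ""
        keep = True
        for ch in lin:
            buf += ch
            if ch == ">":               # parsing on: start a fresh buffer
                buf = ""
            if buf[-4:].lower() == "</td":   # parsing off
                buf = normalize(buf)
                keep = buf != prevbuf   # only keep when the text changed
                prevbuf = buf
                break
        if keep:
            kept.append(lin)            # the whole unedited line, not buf
    return kept

# made-up sample rows, not taken from the real page
rows = [
    "<tr><td>[a] hello</td></tr>",
    "<tr><td>hello </td></tr>",
    "<tr><td>world</td></tr>",
]
for lin in filter_lines(rows):
    print(lin)
```

the second row is dropped as a repeat of the first, and the first row is printed with its "[a]" intact, because only the buffer used for comparison was edited.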
=> https://muckrights-sans-merde.neocities.org