[PLUG] question on linux tool to clean URLs
David Barr
dafydd at dafydd.com
Wed Feb 6 04:50:32 UTC 2019
Hey, Randall,
To be pedantic, the tracking tags and such all appear after the
question mark that starts the query string of the URL, right?
`https://foo/bar/baz?evil_tag=evil`
The trick, then, is to select only the lines containing question marks,
and then delete from the question mark to the end of the line. Try this:
```
sed -e '/?/ s/?.*$//' <file>
```
Pedantry again: That's "select lines containing a question mark,"
followed by "substitute all characters from and including that question
mark to the end of the line ($) with nothing." The question mark needs
no backslash here: in sed's default basic regular expressions a bare ?
is a literal character, and GNU sed treats \? as a "zero or one"
quantifier.
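One caveat: deleting to the end of the line also throws away any
ordinary text that follows the URL on the same line. A rough sketch of a
narrower variant is below. It assumes a sed that supports -E (GNU and
BSD sed both do), that URLs start with http:// or https://, and that a
URL ends at the next whitespace. The function name strip_query is just
made up for illustration.

```shell
# Hypothetical helper: strip the query string from each http(s) URL on
# stdin, leaving any other text on the line untouched.
# With -E, "https?" means "http" plus an optional "s", and "\?" is a
# literal question mark.
strip_query() {
    sed -E 's#(https?://[^[:space:]?]+)\?[^[:space:]]*#\1#g'
}

printf '%s\n' 'See https://foo/bar/baz?evil_tag=evil for details. Really?' | strip_query
# prints: See https://foo/bar/baz for details. Really?
```

Note that this drops the whole query string, including any parameters
the site actually needs to find the page, and it also swallows
punctuation or a #fragment glued directly onto the end of the query.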
I haven't tested this on a file, so I deserve whatever mockery I get if
I missed something.
Cheers!
David
On 2/5/19 2:48 PM, logical american wrote:
> Hi:
>
> Is there a linux tool which cleans up the URLs in a text file (I
> believe Western unicode encoding) so that all the tracking tags,
> fbclid, etc are removed and the pure URL is left in the text?
>
> In one recent email I received, there were 28 govdelivery.com tags and
> others embedded inside the URLs, and I don't wish the posted material
> to provide an easy access for the website to be tracked.
>
> Thanks
>
> Randall
>