[PLUG] question on linux tool to clean URLs

David Barr dafydd at dafydd.com
Wed Feb 6 04:50:32 UTC 2019


Hey, Randall,

To be pedantic, the tracking tags and such all appear after the question
mark that starts the query string of the URL (as sent in an HTTP GET
request), right? `https://foo/bar/baz?evil_tag=evil`

The trick, then, is to select only the lines containing question marks,
and then delete from the question mark to the end of the line. Try this:

```
sed -e '/?/ s/?.*$//' <file>
```

Pedantry again: That's "select lines containing a question mark,"
followed by "substitute all characters from and including that question
mark to the end of the line ($) with nothing." No backslash is needed:
in sed's default basic regular expressions a bare ? is a literal
character, while \? is a GNU extension meaning "zero or one of the
preceding item."
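
One caveat with the one-liner above: it cuts at the first question mark
on a line, so any text after the URL on that line is lost, and so is
ordinary prose that follows a question mark. A slightly narrower sketch,
assuming the URLs start with http:// or https:// and end at whitespace or
a closing ">", might look like this. The function name strip_query is
made up for illustration, and it relies on sed -E (extended regular
expressions), which GNU and BSD sed both accept:

```sh
# strip_query: hypothetical helper, not part of any standard tool.
# Drops the query string (from "?" up to the next whitespace or ">")
# from http(s) URLs only; question marks in plain prose are left alone.
# Note: a "#fragment" that follows the query string is dropped as well.
strip_query() {
    sed -E 's#(https?://[^[:space:]?>]*)\?[^[:space:]>]*#\1#g' "$@"
}
```

Usage would be something like `strip_query message.txt > cleaned.txt`
(both file names are placeholders). Like the original, this removes
every query parameter, not just the tracking ones, so a link that needs
its parameters to work will break.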

I haven't tested this on a file, so I deserve whatever mockery I get if
I missed something.

Cheers!
David

On 2/5/19 2:48 PM, logical american wrote:
> Hi:
>
> Is there a linux tool which cleans up the URLs in a text file (I
> believe Western unicode encoding) so that all the tracking tags,
> fbclid, etc are removed and the pure URL is left in the text?
>
> In one recent email I received, there were 28 govdelivery.com tags and
> others embedded inside the URLs, and I don't wish the posted material
> to provide an easy access for the website to be tracked.
>
> Thanks
>
> Randall
>



