[PLUG] Identifying white space in text files [RESOLVED]

Ben Koenig techkoenig at protonmail.com
Tue Dec 24 16:58:37 UTC 2024


On Tuesday, December 24th, 2024 at 8:44 AM, Richard Owlett <rowlett at access.net> wrote:

> On 12/24/24 10:29 AM, Ben Koenig wrote:
> 
> > On Tuesday, December 24th, 2024 at 7:04 AM, Richard Owlett rowlett at access.net wrote:
> > 
> > > On 12/24/24 8:43 AM, Rich Shepard wrote:
> > > 
> > > > On Tue, 24 Dec 2024, Rich Shepard wrote:
> > > > 
> > > > > Converting between spaces and tabs is easily done with sed, and that's
> > > > > what my web searches show. But, I don't recall a tool that will tell me
> > > > > whether the white spaces in a text file are spaces or a tab, and that's not
> > > > > showing up in my web search. How's it done?
> > > > 
> > > > Fugeddaboutit, gawk treats all white spaces as a single space so it don't
> > > > matter.
> > > > 
> > > > I have a PDF file with three columns: First, Last, and Company. When I use
> > > > pdftotext I end up with a single column: all first names, a space, all last
> > > > names, a space, all company names rather than rows with three fields. I've
> > > > no idea what software produced the PDF but now I need to figure out how to
> > > > convert the columns to rows. I'm sure emacs' rectangle commands will do
> > > > this so that's what I'll use until I get it right.
> > > > 
> > > > Rich
> > > 
> > > Try looking at the file with GHEX - GNOME Hex editor for files.
> > > HTH
> > 
> > IIRC the regex pattern for whitespace is \s. This matches all whitespace characters, tabs or otherwise.
> 
> 
> So? ;}
> GHEX has two panes. One with characters. One with the actual hex value
> of each location in the file. IIUC OP wants to be able to distinguish
> multiple space characters from a single tab character.
> 
> 
> > -Ben


I'll let the OP tell me if that's not a workable solution, thanks. 

But if you want to be snarky, I'll point out that there are two threads on this subject. In the first, he says he wants to distinguish between tabs and spaces. In the second, he marks the subject as [RESOLVED] and states the following:

<quote>
Fugeddaboutit, gawk treats all white spaces as a single space so it don't
matter.
</quote>

Hmmmm... Maybe I'm reading too much into it, but it sounds like the difference between tabs and spaces was irrelevant from the beginning. If that's truly the case, then he can use his original sed solution, with \s to match any whitespace character rather than just spaces. I can only make assumptions based on the information provided. In this case, the OP told us that he doesn't actually need to distinguish between tabs and spaces. His words, not mine.
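
For what it's worth, here is a minimal Python sketch covering both readings of the question. It is not anything the OP posted. The function names (whitespace_report, split_fields) and the sample text are made up for illustration. The first function reports whether a file's whitespace is tabs, spaces, or a mix. The second splits on runs of any whitespace, which is roughly what \s+ and gawk's default field splitting do.

```python
import re


def whitespace_report(text):
    """Count tabs and spaces, and list the lines that contain both."""
    report = {"tabs": 0, "spaces": 0, "mixed_lines": []}
    for lineno, line in enumerate(text.splitlines(), start=1):
        tabs = line.count("\t")
        spaces = line.count(" ")
        report["tabs"] += tabs
        report["spaces"] += spaces
        if tabs and spaces:
            # Line numbers are 1-based, like an editor would show them.
            report["mixed_lines"].append(lineno)
    return report


def split_fields(line):
    """Split a line on runs of any whitespace (tabs and spaces alike)."""
    return re.split(r"\s+", line.strip())


# Hypothetical sample: tab-separated, space-separated, and mixed rows.
sample = "First\tLast\tCompany\nJane  Doe Acme\nJohn\tSmith  Widgets\n"

print(whitespace_report(sample))
for row in sample.splitlines():
    print(split_fields(row))
```

If the report shows only one kind of whitespace, or if the fields are going to be split on any whitespace anyway, the tab-versus-space question stops mattering.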

Case closed, or would you like to spar some more?
-Ben

