Linux hot take: bash bashing 

The convention of having the command shell replace "*" with a list of all matching files in the current folder[1] is... well.

I understand how it's useful for really basic core file utilities.

For anything that needs to do recursive directory searches, though, it really gets in the way, and it raises the bar for what the user has to know in order to make use of the CLI in Linux.

I just now had a long conversation with an advanced bash user[2], and apparently there really is no way for a program to see the raw, unexpanded arguments without setting an option in bash before running the command (and then presumably unsetting it afterwards, so as not to break other programs).

Just... why.

[1] ...and *only* the current folder... and only including folders that match the same pattern -- e.g. "*.rb" would also match a folder named "foldername.rb", which pretty much never happens

[2] Much thanks to sophia kara for hashing through this with me. I was very grumpy about it.


@woozle 1. What option is this?

2. Globbing is a convenience, but generally *Not Good Programming Practice*.

3. find | read or find | xargs is probably what you want. More specifically:

find . <args> | while read -r file; do echo ">>> $file <<<"; <processing on file>; done

I like to echo the name of the file(s) found, first, both as a verification of the find command/results, and as a progress indicator.



@dredmorbius

#1: I've documented my findings -- htyp.org/bash/globbing
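
(For the record, a minimal sketch of the dance described in the write-up -- assuming the option in question is noglob, and with ./mytool standing in for any hypothetical command:)

$ set -f          # aka "set -o noglob": stop expanding globs
$ ./mytool *.rb   # mytool now receives the literal string "*.rb"
$ set +f          # turn globbing back on for the rest of the session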

#2: Hard agree -- especially when there's no way to access the raw information (without making the user jump through extra hoops to provide it).

#3: I'd consider this an "extra hoop".

It seems to me that bash needs to be patched to provide the information in the execution environment. It already provides all kinds of other information of more dubious value, e.g. the format of the command-prompt, so why not this?

· · Web · 3 · 0 · 0

@dredmorbius Also, re echoing found-files first: I was thinking that, as long as we're stuck with this convention, it might be a good idea to echo the arguments received first (horizontally) -- "here's what I got; is this what you really meant?" -- and then the list of matching files (at least as an option, depending on what we're trying to do).
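
(Something like this at the top of a script would do it -- a sketch; %q prints each argument quoted, so you can see exactly what arrived:)

printf 'got %d argument(s):' "$#"; printf ' %q' "$@"; printf '\n'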

@woozle Readline insert-completions: M-*

linux.die.net/man/1/bashv

E.g.:

$ rm -rf newdir    # delete in case it exists
$ mkdir newdir
$ cd newdir
$ touch foo bar baz

then, WITHOUT hitting <enter>:

$ ls *

Now type <alt>-* (that's what M-* means).

bash now displays:

$ ls foo bar baz

That is your glob expansion.

@woozle Bash is (at least) two things:

1. An interactive command environment.

2. A scripting tool.

The *benefit* of combining these features is that _what you use daily to interact with the system_ is *also* what you can use _for basic system automation tasks_.

In fact you can segue from one to the other through "shell one-liners" and the like. As a consequence, bash is the one programming tool I know best, _simply from daily familiarity_.

The combination also forces compromises.

1/

@woozle And those are well known and many.

The first line of the Bash manpage "BUGS" section acknowledges this: "It's too big and too slow."

The manpage itself is over 110 pages (via 'pr'), which is ... large.

Globbing is not a bash-specific feature but was introduced with earlier shells -- it actually originated as an external utility for the original Thompson shell:

unix.stackexchange.com/questio

#bash #linux #unix #history #globbing #scripting

2/

@woozle The reason for globs is that *when used interactively* they are convenient.

When used *programmatically* (as scripts) ... they're convenient but also dangerous.

And you're looking at decades of legacy, so drastic changes are ... problematic. Many old scripts will break. This can mean difficult-to-understand elements, but also means tools remain stable with time.

Another result is that Unix / Linux end up being a mix of technical domains *and* a social lore. Both matter.

3/

[1/2] @dredmorbius

Working hypothesis:

Globbing was created/designed with the idea that there would be (or are) a lot of Really Simple Utility Programs that couldn't afford to be smart enough to do anything but take input from a single file and do something with it. Globbing therefore allows the user to perform those operations on multiple files without having to type a command for each file.

Problems:

  1. globbing does not handle recursion at all. So if you want to perform the operation recursively, some other mechanism has to be employed (but see the sketch after this list).
  2. ...and of course it prevents more sophisticated applications from doing their own globbing.
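
(Worth a footnote here: bash 4+ actually has an opt-in recursive glob, off by default -- a minimal sketch:)

shopt -s globstar      # bash >= 4: let ** match recursively
ls **/*.rb             # .rb files here and in all subdirectories
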
[2/2] @dredmorbius

Solutions:

  • Backwards-compatible: provide the raw command line (up to the first operator -- pipe, <, >, maybe others, but basically anything that divides {input to the command} from anything else) as an environment variable.
  • Backwards-breaking-ish: turn off globbing and train users to use external utilities for globbing. (This gives the user much more control over how globbing should be interpreted, allowing for things like folder recursion, and also makes it clearer wtf is going on.)
    • Optional backwards-compatibility variation: have a (user-editable) list of legacy apps that expect globbing, and turn it back on when running any of those (see the sketch after this list).
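
(A rough sketch of that last variation -- the wrapper trick and the app list are hypothetical, and whitespace handling is glossed over:)

set -f                                     # globbing off by default
for cmd in ls rm cp mv; do                 # user-editable list of legacy apps
    eval "$cmd() { ( set +f; command $cmd \$@ ); }"
done
ls *.rb                                    # the wrapper re-expands the pattern itself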

@woozle What you describe here is actually historically true. Before globbing was added to the shell, there was a `glob` command which you would feed a glob pattern, and it would expand to all the matching stuff using the libc call. So instead of ls * it would be ls `glob *`. They found that too inconvenient for interactive use, so it was rolled into the shell.

Of course the tradeoff for this convenience is that if it remained a command you could've definitely added, say `glob -r` to recursively glob and such.

@woozle " it prevents more sophisticated applications from doing their own globbing."

False.

You can escape or quote globbing metacharacters and pass them to other processes as is frequently done with find:

find . -name foo\*
find . -name 'bar*'

@woozle "globbing does not handle recursion..."

Could you give examples?

@woozle You might want to consider what the options of Doing Things Differently might be:

- You could have _no_ globbing. Running quick shell commands interactively would be ... tedious.

- You could put globbing elsewhere -- have individual commands glob by their own logic. DOS variants did this, with the obvious result that ... different commands glob differently. By globbing *in the shell*, expansion occurs *before the command runs.* Commands see the expansion, not the glob.

4/

@dredmorbius
"You could put globbing elsewhere -- have individual commands glob by their own logic. DOS variants did this, with the obvious result that ... different commands glob differently."

To my mind, this is correct behavior; individual programs should be able to interpret file-mask characters in ways that are appropriate to context. The system should provide services to reinforce the conventional interpretation, but not to enforce it.

rename *.old *.new could never work in bash (which is part of why the Linux rename command takes a regex as its first argument -- yes, more powerful, but less intuitive) -- bash would interpret the second parameter as {all existing files ending in ".new"} and pass them as arguments, which is worse than useless.

(I started to give an example of non-file-related meanings of wildcard characters, but ran out of space; let me know if that would be useful.)
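
(For completeness, two ways to get the *.old -> *.new effect anyway -- the Perl-flavoured rename(1) found on Debian-ish systems, or a plain loop using parameter expansion:)

rename 's/\.old$/.new/' *.old                        # Perl rename: regex applied per name
for f in *.old; do mv -- "$f" "${f%.old}.new"; done  # pure-bash equivalent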

@woozle @dredmorbius

"more powerful, less intuitive" = guiding principle of all software development

@sydneyfalk @dredmorbius

Except for web and GUI, where the guiding principle is "all features are technical debt", typically followed by "Cut them down. Cut them ALL down."

@sydneyfalk I'll point again to the spectacularly effective UI/UX testing work of the Ellen Degeneres & Associates Usability Labs: invidio.us/watch?v=Gjin8t633pc

Virtually everything is learned.

*Good* powerful systems build on a consistent set of base concepts to deliver power and comprehensibility, preferably with discoverability.

Consistency *over time* is a key aspect of that.

It's harder than it looks but still somewhat attainable.

@woozle

@dredmorbius @woozle

(feel free to drop me, I haven't any useful responses I suspect)

@woozle "...reinforce the conventional interpretation, but not to enforce it.

rename *.old *.new could never work in bash (...)"

mmv(1)

mmv - move/copy/append/link multiple files by wildcard patterns

unix.com/man-page/Linux/1/mmv/

@dredmorbius @woozle The Amiga had globbing and command-line parameter handling in the commands rather than the shell and it was uniform and it was beautiful, because they all used the same functions in dos.library.

@dredmorbius Good question! I think it never came up. AmigaShell scripts have a facility for receiving parameters using the dos.library command-line parser and some rudimentary control flow, but I think at the point where you wanted to do anything beyond simple batch processing you'd switch to ARexx or C.

@clacke @notclacke @dredmorbius wiki.amigaos.net/wiki/AmigaOS_… doesn't even mention that Shell would have loops, so I don't think you'd be able to do anything useful with an expanded wildcard.

@woozle You might arguably have globs match regex patterns. There are ways to achieve this, but the shell itself doesn't.

(There *are* advanced glob patterns, though, should you care to use them. That's an area of bash I'm still not very familiar with.)

Historically, globbing predates regexes, though.

And you could have globs span directory path boundaries. For various reasons, that isn't done. It strikes me as dangerous, especially if you end up with a */../* type pattern.
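
(The "advanced glob patterns" mentioned above are bash's extglob operators; a small taste, assuming a reasonably recent bash:)

shopt -s extglob       # enable extended glob operators
ls !(*.json)           # everything except *.json
ls *.@(jpg|png)        # names ending in .jpg or .png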

5/

@woozle So then you have the case where you want to run a process on a bunch of stuff.

There are two general approaches to this: write a script which explicitly lists the arguments, or generate the commands dynamically. Each can be done and has merits.
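
(A sketch of the second approach -- generate the commands, eyeball them, then run; -printf here assumes GNU find:)

find . -name '*.log' -printf 'gzip "%p"\n' > todo.sh   # generate one command per file
less todo.sh                                           # inspect before committing
sh todo.sh                                             # then run the batch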

The advantage of either is far better control over the supply of arguments, incremental development (testing to see that you're going to get the list you want), and, often, the ability to restart / re-do processing.

And the biggie for me:

6/

@woozle For a very minor bit of up-front setup, you gain the ability to repeat a process tens, hundreds, thousands, millions, ... of times. Monotonously, until completion.

With highly predictable, testable, behaviour.

That's been especially useful to me over the years.

Keep in mind that globs expand *onto the commandline buffer*, which is itself a limited resource (how limited depends on the shell and OS).

Loops ... not so much.
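
(On Linux the ceiling is visible via getconf, for instance:)

getconf ARG_MAX        # maximum bytes of arguments + environment for exec()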

Reading my G+ takeout is a case in point.

11/

@woozle That's 11,000+ json files sitting on a MacOS box. On that machine, a "grep <pattern> *.json" fails with an "argument list too long" error. So instead:

find . -maxdepth 1 -type f -name \*.json -print0 | xargs -0 grep <pattern>

There are numerous variations that can be made to that, including batching out requests. MacOS can't handle 11,000 files at a time, but I _could_ grep 100 easily, so using xargs(1)'s "-n / --max-args" argument reduces the total number of processes.
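
(Something along these lines -- same placeholder pattern as above:)

find . -maxdepth 1 -type f -name \*.json -print0 | xargs -0 -n 100 grep <pattern>   # at most 100 files per grep invocation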

12/

@woozle All of which I write as you seem to be criticising a thing without understanding how it came to be, without suggesting a specific alternative approach, and without considering the possible consequences of doing things differently.

Not that "different" is wrong. But sometimes long-standing methods widely adopted and used ... have valid foundations.

15/end/

@dredmorbius Part of my criticism is intended as an "if there's a good reason for this, then I'd like to know what it is, but I kind of suspect there isn't".

In my researches, I've discovered that glob() is actually a C library function that any application could invoke in a single line of code. The helpfulness of doing it automatically, with no option of retrieving the original data, therefore seems... questionable.

@woozle Coordination problem.

EVERY. SINGLE. BINARY. AND. EXECUTABLE. WOULD. HAVE. TO. DO. THIS. ALWAYS. CORRECTLY. AND. CONSISTENTLY.

Or you build it into the shell.

ONCE.

@dredmorbius The level of complexity and knowledge involved in calling glob() is about the same as that involved in correctly interpreting what is currently passed.

The amount of arbitrariness/counterintuitiveness is, I would posit, slightly less.

However, I see your allcaps and will be happy to accept a backwards-compatible revision to bash instead of doing away with the existing standard altogether.

I can be...merciful.

@dredmorbius

P.S. DOS never seemed to have a problem with coordination... even among 3rd-party utilities.

@hirojin @dredmorbius
you seen https://mywiki.wooledge.org/glob btw?
(adding the link to my small collection here.)

@woozle @dredmorbius "Greg's wiki" is gigantic and includes pretty much all the compatibility notes for the different shells / implementations, and it's sourced from and used by #bash on freenode

which used to be a relatively friendly place (much better to get useful info out of than any of the Linux channels)

@hirojin @dredmorbius

I should probably add it to my interwiki links on htyp.org as a standard reference.
