Linux hot take: bash bashing 

The convention of having the command shell replace "*" with a list of all matching files in the current folder[1], and only there, is... well.

I understand how it's useful for really basic core file utilities.

For anything that needs to do recursive directory-searches, though, it really gets in the way and raises the bar for what the user has to know in order to make use of the CLI in Linux.

I just now had a long conversation with an advanced bash user[2], and apparently there really is no way to get this information without setting an option in bash before running a command (and then presumably unsetting it afterwards, so as not to break other programs).
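
(For reference, the option in question appears to be the noglob flag -- a minimal sketch, with "myprog" as a stand-in for any program that wants the raw pattern:)

set -f             # "noglob": stop bash from expanding patterns
myprog *.rb        # myprog now receives the literal string "*.rb"
set +f             # re-enable globbing so other programs aren't broken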

Just... why.

[1] ...and *only* the current folder... and only including folders that match the same pattern -- like "*.rb" would include a folder named "foldername.rb", which pretty much never happens

[2] Much thanks to sophia kara for hashing through this with me. I was very grumpy about it.


Linux hot take: bash bashing 

@woozle 1. What option is this?

2. Globbing is a convenience, but generally *Not Good Programming Practice*.

3. find | read or find | xargs is probably what you want. More specifically:

find . <args> | while read -r file; do echo ">>> $file <<<"; <processing on file>; done

I like to echo the name of the file(s) found, first, both as a verification of the find command/results, and as a progress indicator.
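
E.g., with gzip standing in for whatever the real processing is:

find . -name '*.log' | while read -r file; do
  echo ">>> $file <<<"    # verify the find results / show progress
  gzip "$file"            # the actual processing
done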

re: Linux hot take: bash bashing 

@dredmorbius

#1: I've documented my findings -- htyp.org/bash/globbing

#2: Hard agree -- especially when there's no way to access the raw information (without making the user jump through extra hoops to provide it).

#3: I'd consider this an "extra hoop".

It seems to me that bash needs to be patched to provide the information in the execution environment. It already provides all kinds of other information of more dubious value, e.g. the format of the command-prompt, so why not this?
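
(A rough approximation seems possible already with a DEBUG trap -- a sketch, where RAW_COMMAND is a name I just made up, and which assumes BASH_COMMAND holds the pre-expansion command text as documented:)

trap 'export RAW_COMMAND=$BASH_COMMAND' DEBUG   # fires just before each command runs
ls *.rb    # ls still gets the expanded list, but could read "ls *.rb" from $RAW_COMMAND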

@dredmorbius Also, re echoing found-files-first -- I was also thinking that, as long as we're stuck with this convention, it might be a good idea to echo the arguments received first (horizontally), aka "here's what I got -- is this what you really meant?", then the list of matching files (at least as an option, depending on what we're trying to do).

@woozle Readline insert-completions: M-*

linux.die.net/man/1/bash

E.g.:

$ rm -rf newdir   # delete it in case it exists
$ mkdir newdir
$ cd newdir
$ touch foo bar baz

then, WITHOUT hitting <enter>:

$ ls *

Now type <alt>-* (that's what M-* means).

bash now displays:

$ ls foo bar baz

That is your glob expansion.

@woozle Bash is (at least) two things:

1. An interactive command environment.

2. A scripting tool.

The *benefit* of combining these features is that _what you use daily to interact with the system_ is *also* what you can use _for basic system automation tasks_.

In fact you can segue from one to the other through "shell one-liners" and the like. As a consequence, bash is the one programming tool I know best, _simply from daily familiarity_.

The combination also forces compromises.

1/

@woozle And those are well known and many.

The first line of the Bash manpage "BUGS" section acknowledges this: "It's too big and too slow."

The manpage itself is over 110 pages (via 'pr'), which is ... large.

Globbing is not a bash-specific feature but was introduced with earlier shells -- actually originally an external utility for the original Thompson shell:

unix.stackexchange.com/questio

#bash #linux #unix #history #globbing #scripting

2/

@woozle The reason for globs is that *when used interactively* they are convenient.

When used *programmatically* (as scripts) ... they're convenient but also dangerous.
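
E.g., two classic script hazards:

touch ./-rf        # a filename that parses as an option
rm *               # the glob sorts "-rf" first: this runs "rm -rf <everything else>"
rm -- *.txt        # with no .txt files present, the pattern stays literal and rm errors out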

And you're looking at decades of legacy, so drastic changes are ... problematic. Many old scripts will break. This can mean difficult-to-understand elements, but also means tools remain stable with time.

Another result is that Unix / Linux end up being a mix of technical domains *and* a social lore. Both matter.

3/

[1/2] @dredmorbius

Working hypothesis:

Globbing was created/designed with the idea that there would be (or are) a lot of Really Simple Utility Programs that couldn't afford to be smart enough to do anything but take input from a single file and do something with it. Globbing therefore allows the user to perform those operations on multiple files without having to type a command for each file.

Problems:

  1. globbing does not handle recursion at all. So if you want to perform the operation recursively, some other mechanism has to be employed. (Though see the sketch after this list.)
  2. ...and of course it prevents more sophisticated applications from doing their own globbing.
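
(Caveat found since writing this: bash 4+ can be made to recurse -- a sketch, assuming a recent bash:)

shopt -s globstar     # bash 4+: "**" matches across directory levels
ls **/*.rb            # every .rb file anywhere in this tree
shopt -u globstar
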
[2/2] @dredmorbius

Solutions

  • Backwards-compatible: provide the raw command-line (up to the first operator -- pipe, <, > maybe others, but basically anything that divides {input to the command} from anything else) as an environment variable.
  • Backwards-breaking-ish: turn off globbing and train users to use external utilities for globbing. (This gives the user much more control over how globbing should be interpreted, allowing for things like folder-recursion, and also makes it clearer wtf is going on -- see the sketch below.)
    • Optional backwards-compatibility variation: have a (user-editable) list of legacy apps that expect globbing, and turn it back on when running any of those.
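
A sketch of that backwards-breaking option using bash's existing noglob flag ("myprog" is hypothetical; the alias trick makes set -f run before the next command's words expand):

myprog() { command myprog "$@"; set +f; }   # run the real program, then re-enable globbing
alias myprog='set -f; myprog'
myprog *.rb        # myprog receives the literal string "*.rb" and can glob it however it likes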

@woozle What you describe here is actually historically true. Before globbing was added to the shell, there was a `glob` command which you would feed a glob pattern and it would expand to all the matching stuff using the libc call. So instead of ls * it would be ls `glob *`. They found that too inconvenient for interactive use so it was rolled into the shell.

Of course the tradeoff for this convenience is that if it remained a command you could've definitely added, say `glob -r` to recursively glob and such.

@woozle " it prevents more sophisticated applications from doing their own globbing."

False.

You can escape or quote globbing metacharacters and pass them to other processes as is frequently done with find:

find . -name foo\*
find . -name 'bar*'

@woozle "globbing does not handle recursion..."

Could you give examples?

@woozle You might want to consider what the options of Doing Things Differently might be:

- You could have _no_ globbing. Running quick shell commands interactively would be ... tedious.

- You could put globbing elsewhere -- have individual commands glob by their own logic. DOS variants did this, with the obvious result that ... different commands glob differently. By globbing *in the shell*, expansion occurs *before the command runs.* Commands see the expansion, not the glob.

4/

@dredmorbius
"You could put globbing elsewhere -- have individual commands glob by their own logic. DOS variants did this, with the obvious result that ... different commands glob differently."

To my mind, this is correct behavior; individual programs should be able to interpret file-mask characters in ways that are appropriate to context. The system should provide services to reinforce the conventional interpretation, but not to enforce it.

rename *.old *.new could never work in bash (which is part of why the Linux rename command takes a regex as its first argument -- yes, more powerful, but less intuitive) -- bash would interpret the second parameter as {all existing files ending in ".new"} and pass them as arguments, which is worse than useless.

(I started to give an example of non-file-related meanings of wildcard characters, but ran out of space; let me know if that would be useful.)
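
(For concreteness -- which rename you have depends on the distro; treat these as sketches:)

rename .old .new *.old           # util-linux rename: literal substring swap
rename 's/\.old$/.new/' *.old    # Perl rename: the regex flavor described above

Note that both still lean on bash to expand *.old -- which works, since those files exist; it's the *.new half that bash can't express.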

@woozle @dredmorbius

"more powerful, less intuitive" = guiding principle of all software development

@sydneyfalk @dredmorbius

Except for web and GUI, where the guiding principle is "all features are technical debt", typically followed by "Cut them down. Cut them ALL down."

@sydneyfalk I'll point again to the spectacularly effective UI/UX testing work of the Ellen Degeneres & Associates Usability Labs: invidio.us/watch?v=Gjin8t633pc

Virtually everything is learned.

*Good* powerful systems build on a consistent set of base concepts to deliver power and comprehensibility, preferably with discoverability.

Consistency *over time* is a key aspect of that.

It's harder than it looks but still somewhat attainable.

@woozle

@dredmorbius @woozle

(feel free to drop me, I haven't any useful responses I suspect)

@woozle "...reinforce the conventional interpretation, but not to enforce it.

rename *.old *.new could never work in bash (...)"

mmv(1)

mmv - move/copy/append/link multiple files by wildcard patterns

unix.com/man-page/Linux/1/mmv/
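
E.g.:

mmv '*.old' '#1.new'    # "#1" is whatever the first wildcard matched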

@dredmorbius @woozle The Amiga had globbing and command-line parameter handling in the commands rather than the shell and it was uniform and it was beautiful, because they all used the same functions in dos.library.

@dredmorbius Good question! I think it never came up. AmigaShell scripts have a facility for receiving parameters using the dos.library command-line parser and some rudimentary control flow, but I think at the point where you wanted to do anything beyond simple batch processing you'd switch to ARexx or C.

@clacke

@notclacke @dredmorbius wiki.amigaos.net/wiki/AmigaOS_… doesn't even mention that Shell would have loops, so I don't think you'd be able to do anything useful with an expanded wildcard.

@woozle You might arguably have globs match regex patterns. There are ways to achieve this, but the shell itself doesn't.

(There *are* advanced glob patterns, though, should you care to use them. That's an area of bash I'm still not very familiar with.)

Historically, globbing predates regexes, though.

And you could have globs span directory path boundaries. For various reasons, that isn't done. It strikes me as dangerous, especially if you end up with a */../* type pattern.
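
(A taste of those advanced patterns, for the curious -- extglob:)

shopt -s extglob
ls !(*.bak)           # everything except *.bak
ls *.@(jpg|png)       # names ending in exactly one of the listed alternatives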

5/

@woozle So then you have the case where you want to run a process on a bunch of stuff.

There are two general approaches to this: write a script which explicitly lists the arguments, or generate the commands dynamically. Each can be done and has merits.

The advantage of either is far better control over the supply of arguments, incremental development (testing to see you're going to get the list you want), often, the ability to restart / re-do processing.
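
To make that concrete, a sketch of the incremental approach:

find . -name '*.json' | head    # preview: is this the list I want?
find . -name '*.json' | wc -l   # and how big is it?
find . -name '*.json' -print0 | xargs -0 grep <pattern>   # then the real run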

And the biggie for me:

6/

@woozle For a very minor bit of up-front setup, you gain the ability to repeat a process tens, hundreds, thousands, millions, ... of times. Monotonously, until completion.

With highly predictable, testable, behaviour.

That's been especially useful to me over the years.

Keep in mind that globs expand *onto the commandline buffer*, which is itself a limited resource (how limited depends on the shell and OS).

Loops ... not so much.
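
(You can ask the OS for that ceiling directly:)

getconf ARG_MAX       # bytes available for argv + environment in exec()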

Reading my G+ takeout is a case in point.

11/

@woozle That's 11,000+ json files sitting on a MacOS box. On that machine, a "grep <pattern> *.json" fails with an "argument list too long" error. So instead:

find . -maxdepth 1 -type f -name \*.json -print0 | xargs -0 grep <pattern>

There are numerous variations that can be made to that, including batching out requests. MacOS can't handle 11,000 files at a time, but I _could_ grep 100 easily, so using xargs(1)'s "-n / --max-args" argument reduces the total number of processes.
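
E.g., batching at most 100 files per grep invocation:

find . -maxdepth 1 -type f -name \*.json -print0 | xargs -0 -n 100 grep <pattern>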

12/

@woozle All of which I write as you seem to be criticising a thing without understanding how it came to be, without suggesting a specific alternative approach, and without considering the possible consequences of doing things differently.

Not that "different" is wrong. But sometimes long-standing methods widely adopted and used ... have valid foundations.

15/end/

@dredmorbius Part of my criticism is intended as an "if there's a good reason for this, then I'd like to know what it is, but I kind of suspect there isn't".

In my researches, I've discovered that glob() is actually a standard library call (glob(3)) that any application could invoke in a single line of code. The helpfulness of doing it automatically and without any option of retrieving the original data therefore seems... questionable.

@woozle Coordination problem.

EVERY. SINGLE. BINARY. AND. EXECUTABLE. WOULD. HAVE. TO. DO. THIS. ALWAYS. CORRECTLY. AND. CONSISTENTLY.

Or you build it into the shell.

ONCE.

@dredmorbius The level of complexity and knowledge involved in calling glob() is about the same as that involved in correctly interpreting what is currently passed.

The amount of arbitrariness/counterintuitivity is, I would posit, slightly less.

However, I see your allcaps and will be happy to accept a backwards-compatible revision to bash instead of doing away with the existing standard altogether.

I can be...merciful.

@dredmorbius

P.S. DOS never seemed to have a problem with coordination... even among 3rd-party utilities.

@hirojin @dredmorbius
Have you seen https://mywiki.wooledge.org/glob btw?
(adds link to small collection here)

@woozle @dredmorbius "Greg's wiki" is gigantic and includes pretty much all the compatibility notes for the different shells / implementations, and it's sourced from and used my #bash on freenode

which used to be a relatively friendly place (much better to get useful info out of than any of the Linux channels)

@hirojin @dredmorbius

I should probably add it to my interwiki links on htyp.org as a standard reference.

Linux hot take: bash bashing 

You're not afraid of filenames containing newlines. Because you don't handle other people's filenames? For fun, create a file with a space in its name in your distro's package cache dir. Last time I tried that it completely stalled apt-get. sh is broken in many ways :^)

@dredmorbius @woozle

Linux hot take: bash bashing 

@RefurioAnachro I'm aware they can exist. For my purposes, that's not _generally_ a problem.

Though "find -print0 | xargs -0" generally addresses _that_ particular case.

(Wrapping the whole thing into a script is ... more work.)

I'm not sure precisely where that problem lies -- it's somewhere between the filesystem, perverse behaviour, and shell field/record delimiting conventions.
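
(For reference, the belt-and-braces form, NUL-delimited end to end:)

find . -type f -print0 | while IFS= read -r -d '' file; do
  printf '>>> %s <<<\n' "$file"    # handles spaces and newlines alike
done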

@woozle

Linux hot take: bash bashing 

I'm not sure what to think of newlines in filenames. Or control characters. I like spaces in filenames, and I hate that android prohibits colons. I remember consciously deciding not to care about dos' limitations, then came android :-/

As nice as it can be, Unicode brings whole new classes of problems, normalization being one that can bite you even if (or especially when) your os has provisions for it.

@dredmorbius @woozle

Linux hot take: bash bashing 

But none of that is really perverse. This is: Did you know that not all linux syscalls check against '/' in filenames?

Anyways, I hit the classical hurdles mostly in conjunction with field splitting.

@dredmorbius @woozle

Linux hot take: bash bashing 

@RefurioAnachro @dredmorbius @woozle The whole problem is that UNIX (and even Plan 9) refuses to define any data structure encodings.
Could just use linear TSV and that would solve most problems. There is no reason to make intermediate data look like natural text. (and TSV is readable enough anyways)

Linux hot take: bash bashing 

Good point, @grainloom! Yes, I believe the os should provide standards for data exchange of all sorts. I really like how json simplifies things. I mean, at least for the js ecosystem. It may end up adding yet another variant and layer of complexity in other contexts.

@dredmorbius @woozle

Linux hot take: bash bashing 

I can haz 'description' field for files? What about user defined header fields? Of course, these would need to survive passing files between systems...

@grainloom @dredmorbius @woozle

@RefurioAnachro Files-with-metadata is actually a major component of a project I'm looking at.

Implementing that as a metadata-aware filesystem offers certain capabilities.

Though it also makes that portability / export issue a bit of a pain.

@grainloom @woozle

@dredmorbius @RefurioAnachro @grainloom

I very much want a means of entering per-file custom metadata. My current design for this involves an app, which could solve the portability problem by exporting data on a per-volume or per-folder basis.

@woozle @dredmorbius @RefurioAnachro not sure if that needs to be a whole new thing. could probably make something good enough with userspace file systems. much like how tag based file systems can still be represented as a directory tree.

@grainloom @dredmorbius @RefurioAnachro

I'm not sure I get what you have in mind.

I think the *main* metadata I want are hierarchical topic tags (the only piece that needs to be stored with the file would be a numeric ID) and a few timestamps (not necessarily the same as the file-creation or file-edited timestamps)... and of course a textual description. ...and there should be a facility for recursively searching files in a folder for metadata that matches a given criterion.

I'm not expecting anything much to happen with this unless I do it, given the current state of GUI file-searching tools.

@woozle @dredmorbius @RefurioAnachro grep-like search or indexed search?
what i was thinking of is just transforming the files into directories or something. it's mostly backwards compatible too. you can use it with tar, zip, etc. merging and diffing remains easily available.
i'll try to elaborate when i have more time.

@woozle @dredmorbius @RefurioAnachro
the only thing i can think of that would be broken by that representation is patterns like: for f in files_in_directory(d) do stuff(f) end.
but that's easy to work around with a wrapper that translates the directories back to files. (so "snow.mp3" was a directory with, idk, id3 tags in it, plus a file named "data", or something, but now it's a file again)

@woozle @dredmorbius @RefurioAnachro this is all pretty easy with Plan 9's bind(1) and related tools, and shouldn't be too difficult on Linux either, with FUSE and stuff.

@grainloom @dredmorbius @RefurioAnachro

Hmm... like, for myfile.jpg, you could have a .myfile.jpg/ folder (or some similar naming-scheme) with all the attributes as individual files underneath it...?

My programmer brain goes "agh, inefficient!" but I don't actually know how inefficient it would be. It would probably be a drop in the bucket.

Next step: need a GUI for managing all those meta-subfiles.

@woozle @dredmorbius @RefurioAnachro that's just a representation, just like how /dev is not really a file system.
the underlying data structure could be anything.

@woozle @dredmorbius @RefurioAnachro (i mean, /dev is a file system, but like, it's not stored anywhere. yknow what i mean.)

@grainloom @dredmorbius @RefurioAnachro

Ah, ok -- so this requires support within the OS or filesystem.

I'm looking for something that can work with existing OSs/filesystems/drives -- though if such a thing appeared in an OS, I'd still be interested in trying it out.

@woozle @dredmorbius @RefurioAnachro well, kinda, but on anything that supports FUSE (so, most relevant UNIX clones, AFAIK) this should work and would be able to interop with everything that uses files. i'm mostly sure that it also wouldn't require root to mount it. the underlying drivers don't really matter to it either.

@woozle @dredmorbius @RefurioAnachro if the target system supports stuff like mounting SFTP shares, then it can do this too.

@grainloom @woozle @dredmorbius @RefurioAnachro git-annex supports both tags and key-value pairs. There's also xattrs, which are supported in various forms by Linux, the BSDs, Mac OS, and Windows. rsync supports them but I'm not sure if any other archivers besides tar do. Adding them to something like 7zip seems like it'd be easier than implementing a filesystem
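
A quick xattr sketch with the Linux attr tools (assumes a filesystem mounted with user xattrs enabled; "photo.jpg" is just an example file):

setfattr -n user.description -v "beach trip, day 2" photo.jpg
getfattr -n user.description photo.jpg    # prints the stored value back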
