Linux hot take: bash bashing 

@dredmorbius Also, re echoing found-files-first -- I was also thinking that, as long as we're stuck with this convention, it might be a good idea to echo the arguments received first (horizontally), aka "here's what I got -- is this what you really meant?", then the list of matching files (at least as an option, depending on what we're trying to do).

@woozle Readline insert-completions: M-*

linux.die.net/man/1/bash

E.g.:

$ rm -rf newdir   # delete it in case it exists
$ mkdir newdir
$ cd newdir
$ touch foo bar baz

then, WITHOUT hitting <enter>:

$ ls *

Now type <alt>-* (that's what M-* means):

bash now displays:

$ ls bar baz foo

That is your glob expansion.

@woozle Bash is (at least) two things:

1. An interactive command environment.

2. A scripting tool.

The *benefit* of combining these features is that _what you use daily to interact with the system_ is *also* what you can use _for basic system automation tasks_.

In fact you can segue from one to the other through "shell one-liners" and the like. As a consequence, bash is the one programming tool I know best, _simply from daily familiarity_.
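
For instance (filenames hypothetical), the same line works typed at the prompt or pasted into a script:

$ for f in *.log; do gzip "$f"; done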

The combination also forces compromises.

1/

@woozle And those are well known and many.

The first line of the Bash manpage "BUGS" section acknowledges this: "It's too big and too slow."

The manpage itself is over 110 pages (via 'pr'), which is ... large.

Globbing is not a bash-specific feature but was introduced with earlier shells -- actually originally an external utility for the original Thompson shell:

unix.stackexchange.com/questio

#bash #linux #unix #history #globbing #scripting

2/

@woozle The reason for globs is that *when used interactively* they are convenient.

When used *programmatically* (as scripts) ... they're convenient but also dangerous.
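
One classic hazard, sketched with contrived filenames (and assuming the usual sort order puts "-rf" first): a file whose name looks like an option is expanded straight into the argument list:

$ touch -- -rf important
$ rm *     # expands to: rm -rf important

The command never sees the glob, only the expansion, so "-rf" is parsed as flags.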

And you're looking at decades of legacy, so drastic changes are ... problematic: many old scripts would break. That legacy can mean difficult-to-understand elements, but it also means tools remain stable over time.

Another result is that Unix / Linux end up being a mix of technical domains *and* a social lore. Both matter.

3/

[1/2] @dredmorbius

Working hypothesis:

Globbing was created/designed with the idea that there would be (or are) a lot of Really Simple Utility Programs that couldn't afford to be smart enough to do anything but take input from a single file and do something with it. Globbing therefore allows the user to perform those operations on multiple files without having to type a command for each file.

Problems:

  1. globbing does not handle recursion at all. So if you want to perform the operation recursively, some other mechanism has to be employed (a common workaround is sketched just after this list).
  2. ...and of course it prevents more sophisticated applications from doing their own globbing.
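
A common workaround for the recursion gap, as a sketch: quote the pattern so the shell leaves it alone, and let find(1) do the walking:

find . -name '*.txt' -exec wc -l {} +

(Bash 4+ also offers an opt-in `shopt -s globstar`, which lets ** match across directories.)
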
[2/2] @dredmorbius

Solutions

  • Backwards-compatible: provide the raw command-line (up to the first operator -- pipe, <, > maybe others, but basically anything that divides {input to the command} from anything else) as an environment variable.
  • Backwards-breaking-ish: turn off globbing and train users to use external utilities for globbing. (This gives the user much more control over how globbing should be interpreted, allowing for things like folder-recursion, and also makes it clearer wtf is going on.)
    • Optional backwards-compatibility variation: have a (user-editable) list of legacy apps that expect globbing, and turn it back on when running any of those.

@woozle What you describe here is actually historically true. Before globbing was added to the shell, there was a `glob` command which you would feed a glob pattern, and it would expand to all the matching stuff using the libc call. So instead of ls * it would be ls `glob *`. They found that too inconvenient for interactive use, so it was rolled into the shell.

Of course the tradeoff for this convenience is that if it had remained a command, you could have added, say, `glob -r` to glob recursively, and such.

@woozle "it prevents more sophisticated applications from doing their own globbing."

False.

You can escape or quote globbing metacharacters and pass them to other processes as is frequently done with find:

find . -name foo\*
find . -name 'bar*'

@woozle "globbing does not handle recursion..."

Could you give examples?

@woozle You might want to consider what the options of Doing Things Differently might be:

- You could have _no_ globbing. Running quick shell commands interactively would be ... tedious.

- You could put globbing elsewhere -- have individual commands glob by their own logic. DOS variants did this, with the obvious result that ... different commands glob differently. By globbing *in the shell*, expansion occurs *before the command runs.* Commands see the expansion, not the glob.
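
A quick way to see exactly what a command receives (printf repeats its format for each argument):

$ touch a.txt b.txt
$ printf '<%s>\n' *.txt
<a.txt>
<b.txt>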

4/

@dredmorbius
"You could put globbing elsewhere -- have individual commands glob by their own logic. DOS variants did this, with the obvious result that ... different commands glob differently."

To my mind, this is correct behavior; individual programs should be able to interpret file-mask characters in ways that are appropriate to context. The system should provide services to reinforce the conventional interpretation, but not to enforce it.

rename *.old *.new could never work in bash (which is part of why the Linux rename command takes a regex as its first argument -- yes, more powerful, but less intuitive) -- bash would interpret the second parameter as {all existing files ending in ".new"} and pass them as arguments, which is worse than useless.
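
You can watch the misfire safely with echo (filenames hypothetical; this assumes bash's default behaviour of passing an unmatched glob through literally):

$ touch a.old b.old
$ echo rename *.old *.new
rename a.old b.old *.new

If any .new files did exist, they would be substituted in instead -- either way, rename never sees the pattern the user meant.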

(I started to give an example of non-file-related meanings of wildcard characters, but ran out of space; let me know if that would be useful.)

@woozle @dredmorbius

"more powerful, less intuitive" = guiding principle of all software development

@sydneyfalk @dredmorbius

Except for web and GUI, where the guiding principle is "all features are technical debt", typically followed by "Cut them down. Cut them ALL down."

@sydneyfalk I'll point again to the spectacularly effective UI/UX testing work of the Ellen DeGeneres & Associates Usability Labs: invidio.us/watch?v=Gjin8t633pc

Virtually everything is learned.

*Good* powerful systems build on a consistent set of base concepts to deliver power and comprehensibility, preferably with discoverability.

Consistency *over time* is a key aspect of that.

It's harder than it looks but still somewhat attainable.

@dredmorbius @woozle

(feel free to drop me, I haven't any useful responses I suspect)

@woozle "...reinforce the conventional interpretation, but not to enforce it.

rename *.old *.new could never work in bash (...)"

mmv(1)

mmv - move/copy/append/link multiple files by wildcard patterns

unix.com/man-page/Linux/1/mmv/
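
A usage sketch (syntax per the mmv manpage; the quotes keep the shell from expanding the patterns before mmv sees them):

mmv '*.old' '#1.new'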

@dredmorbius @woozle The Amiga had globbing and command-line parameter handling in the commands rather than the shell and it was uniform and it was beautiful, because they all used the same functions in dos.library.

@woozle You might arguably have globs match regex patterns. There are ways to achieve this, but the shell itself doesn't do it natively.

(There *are* advanced glob patterns, though, should you care to use them. That's an area of bash I'm still not very familiar with.)
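
A small taste of those extended patterns, assuming bash with extglob enabled:

$ shopt -s extglob
$ ls !(*.bak)         # everything except .bak files
$ ls *.@(jpg|png)     # only .jpg or .png files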

Historically, globbing predates regexes, though.

And you could have globs span directory path boundaries. For various reasons, that isn't done. It strikes me as dangerous, especially if you end up with a */../* type pattern.

5/

@woozle So then you have the case where you want to run a process on a bunch of stuff.

There are two general approaches to this: write a script which explicitly lists the arguments, or generate the commands dynamically. Each can be done and has merits.

The advantage of either is far better control over the supply of arguments, incremental development (testing to see that you're going to get the list you want), and, often, the ability to restart / re-do processing.
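
A minimal sketch of each approach ('process' stands in for whatever real command you'd run; filenames without spaces assumed):

# explicit list: edit the script, re-run at will
for f in alpha.json beta.json; do process "$f"; done

# dynamic: generate the commands, inspect them, then execute
find . -name '*.json' | sed 's/^/process /' > cmds.sh
less cmds.sh     # check you got the list you wanted
sh cmds.sh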

And the biggie for me:

6/

@woozle For a very minor bit of up-front setup, you gain the ability to repeat a process tens, hundreds, thousands, millions, ... of times. Monotonously, until completion.

With highly predictable, testable, behaviour.

That's been especially useful to me over the years.

Keep in mind that globs expand *onto the commandline buffer*, which is itself a limited resource (how limited depends on the shell and OS).

Loops ... not so much.
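
The distinction, sketched: both lines expand the same glob, but the loop's expansion stays inside the shell, while the first must fit every name into a single child process's argument list -- the limit the kernel enforces at exec time:

wc -l *.json                           # one huge argv; can hit the limit
for f in *.json; do wc -l "$f"; done   # one name per invocation; never does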

Reading my G+ takeout is a case in point.

11/

@woozle That's 11,000+ json files sitting on a MacOS box. On that machine, "grep *.json" fails with a too-many-arguments error. So instead:

find . -maxdepth 1 -type f -name \*.json -print0 | xargs -0 grep <pattern>

There are numerous variations that can be made to that, including batching out requests. MacOS can't handle 11,000 files at a time, but I _could_ grep 100 easily, so using xargs(1)'s "-n / --max-args" argument reduces the total number of processes.
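
Batched, that looks roughly like this (GNU xargs assumed; grep -l chosen just for illustration):

find . -maxdepth 1 -type f -name \*.json -print0 | xargs -0 -n 100 grep -l <pattern>

Each grep invocation receives at most 100 filenames, so no single argument list grows too large.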

12/

@woozle All of which I write as you seem to be criticising a thing without understanding how it came to be, without suggesting a specific alternative approach, and without considering the possible consequences of doing things differently.

Not that "different" is wrong. But sometimes long-standing methods widely adopted and used ... have valid foundations.

15/end/

@dredmorbius Part of my criticism is intended as an "if there's a good reason for this, then I'd like to know what it is, but I kind of suspect there isn't".

In my researches, I've discovered that glob() is actually a library call that any application could invoke in a single line of code. The helpfulness of doing it automatically and without any option of retrieving the original data therefore seems... questionable.

@woozle Coordination problem.

EVERY. SINGLE. BINARY. AND. EXECUTABLE. WOULD. HAVE. TO. DO. THIS. ALWAYS. CORRECTLY. AND. CONSISTENTLY.

Or you build it into the shell.

ONCE.

@dredmorbius The level of complexity and knowledge involved in calling glob() is about the same as that involved in correctly interpreting what is currently passed.

The amount of arbitrariness/counterintuitiveness is, I would posit, slightly less.

However, I see your allcaps and will be happy to accept a backwards-compatible revision to bash instead of doing away with the existing standard altogether.

I can be...merciful.

@dredmorbius

P.S. DOS never seemed to have a problem with coordination... even among 3rd-party utilities.

@hirojin @dredmorbius
Have you seen https://mywiki.wooledge.org/glob btw?
(adds link to small collection here.)

@woozle @dredmorbius "Greg's wiki" is gigantic and includes pretty much all the compatibility notes for the different shells / implementations, and it's sourced from and used by #bash on freenode

which used to be a relatively friendly place (much better to get useful info out of than any of the Linux channels)

@hirojin @dredmorbius

I should probably add it to my interwiki links on htyp.org as a standard reference.

@RefurioAnachro Files-with-metadata is actually a major component of a project I'm looking at.

Implementing that as a metadata-aware filesystem offers certain capabilities.

Though it also makes that portability / export issue a bit of a pain.

@dredmorbius @RefurioAnachro @grainloom

I very much want a means of entering per-file custom metadata. My current design for this involves an app, which could solve the portability problem by exporting data on a per-volume or per-folder basis.

@woozle @dredmorbius @RefurioAnachro not sure if that needs to be a whole new thing. could probably make something good enough with userspace file systems. much like how tag based file systems can still be represented as a directory tree.

@grainloom @dredmorbius @RefurioAnachro

I'm not sure I get what you have in mind.

I think the *main* metadata I want are hierarchical topic tags (the only piece that needs to be stored with the file would be a numeric ID) and a few timestamps (not necessarily the same as the file-creation or file-edited timestamps)... and of course a textual description. ...and there should be a facility for recursively searching files in a folder for metadata that matches a given criterion.

I'm not expecting anything much to happen with this unless I do it, given the current state of GUI file-searching tools.

@woozle @dredmorbius @RefurioAnachro grep-like search or indexed search?
what i was thinking of is just transforming the files into directories or something. it's mostly backwards compatible too. you can use it with tar, zip, etc. merging and diffing remains easily available.
i'll try to elaborate when i have more time.

@woozle @dredmorbius @RefurioAnachro
the only thing i can think of that would be broken by that representation is patterns like: for f in files_in_directory(d) do stuff(f) end.
but that's easy to work around with a wrapper that translates the directories back to files. (so "snow.mp3" was a directory with, idk, id3 tags in it, plus a file named "data", or something, but now it's a file again)

@woozle @dredmorbius @RefurioAnachro this is all pretty easy with Plan 9's bind(1) and related tools, and shouldn't be too difficult on Linux either, with FUSE and stuff.

@grainloom @dredmorbius @RefurioAnachro

Hmm... like, for myfile.jpg, you could have a .myfile.jpg/ folder (or some similar naming-scheme) with all the attributes as individual files underneath it...?

My programmer brain goes "agh, inefficient!" but I don't actually know how inefficient it would be. It would probably be a drop in the bucket.

Next step: need a GUI for managing all those meta-subfiles.

@woozle @dredmorbius @RefurioAnachro that's just a representation, just like how /dev is not really a file system.
the underlying data structure could be anything.

@woozle @dredmorbius @RefurioAnachro (i mean, /dev is a file system, but like, it's not stored anywhere. yknow what i mean.)

@grainloom @dredmorbius @RefurioAnachro

Ah, ok -- so this requires support within the OS or filesystem.

I'm looking for something that can work with existing OSs/filesystems/drives -- though if such a thing appeared in an OS, I'd still be interested in trying it out.

@woozle @dredmorbius @RefurioAnachro well, kinda, but on anything that supports FUSE (so, most relevant UNIX clones, AFAIK) this should work and would be able to interop with everything that uses files. i'm mostly sure that it also wouldn't require root to mount it. the underlying drivers don't really matter to it either.

@woozle @dredmorbius @RefurioAnachro if the target system supports stuff like mounting SFTP shares, then it can do this too.

@grainloom @woozle @dredmorbius @RefurioAnachro git-annex supports both tags and key-value pairs. There's also xattrs, which are supported in various forms by Linux, the BSDs, Mac OS, and Windows. rsync supports them but I'm not sure if any other archivers besides tar do. Adding them to something like 7zip seems like it'd be easier than implementing a filesystem
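
For instance, the xattr route on Linux looks roughly like this (user.* namespace; tag values illustrative):

$ setfattr -n user.tags -v 'music,chill' snow.mp3
$ getfattr -n user.tags snow.mp3
# file: snow.mp3
user.tags="music,chill"
$ getfattr -R -n user.tags . 2>/dev/null | grep -B1 chill   # crude recursive search

rsync preserves these with -X; whether a given archiver keeps them is worth checking per tool.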
