sed - behaviour of holdspace



I have (from the sed website http://sed.sourceforge.net/sed1line.txt) this one-liner:

sed -e '/./{H;$!d;}' -e 'x;/AAA/!d;/BBB/!d;/CCC/!d'

Its purpose is to search a paragraph for either AAA, BBB or CCC.

My understanding of the script:

  • '/./' matches every line which is not empty
  • '{}' all commands within the braces are applied to the matched lines
  • 'H' appends the matched line to the holdspace
  • '$!d' delete from patternspace everything but the last line
  • 'x' swaps the pattern- and holdspace
  • '/AAA/!d' search for AAA paragraph and print it

What is not clear to me:

  1. The holdspace should contain several separate lines (for each paragraph), so why am I able to search the whole paragraph? Are the lines in the holdspace merged into one line?
  2. And how does sed know where one paragraph ends and the next begins in the holdspace?
  3. Why do I have to use '$!d', why is '$d' not sufficient? Why am I not able to omit the '-n' and use '$p' instead of '$!d' in this case?

Thank you very much for every comment!

My test data (match every paragraph with XX in it):

YYaaaa
aaa1
aaa2
aXX3
aaa4

YYbbbb
bbb1
bbb2

YYcccc
ccc1
ccc2
ccc3
cXX4
ccc5

YYdddd
ddd1
dXX2

The following command is used:

sed -ne '/./{H;$!d};x;/XX/p' test2

Versions:

$ sed --version
GNU sed-Version 4.2.1
$ bash --version
GNU bash, Version 4.2.10(1)-release (x86_64-pc-linux-gnu)
2012-04-04 19:39
by Oliver



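It collects a paragraph, line by line, into the hold space (H). Each H appends a newline followed by the current line, so the hold space is a single buffer with embedded newlines rather than a set of separate lines. When you hit an empty line, /./ fails and control falls through to the x, which swaps the collected paragraph into the pattern space and leaves the empty line in the hold space, effectively resetting it for the next paragraph.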
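
To see what the hold space actually contains, a small sketch like the following may help. The four-line input is made up for illustration, and the `l` command is used only to make the embedded newlines visible at the moment each paragraph has been swapped into the pattern space.

```shell
# Hypothetical input: paragraph "a", "b", then an empty line, then paragraph "c".
# 'l' prints the pattern space unambiguously: newlines appear as \n, the end as $.
hold_dump=$(printf 'a\nb\n\nc\n' | sed -n '/./{H;$!d;};x;l')
printf '%s\n' "$hold_dump"
# \na\nb$
# \nc$
```

Each paragraph arrives as one string with embedded newlines, which is why a single /XX/ can search all of it. The leading \n comes from H, which always appends a newline before the line it adds.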

In order to correctly handle the final paragraph, it needs to cope with a paragraph which is not followed by an empty line, therefore it falls through from the last line as if it were followed by an empty line. This is a common idiom for scripts which collect something up through a particular pattern (or, to put it differently, it's a common error for such scripts to fail to handle the last collected data at end of file).

So in other words, if we are looking at a non-empty line, add it to the hold space, and unless it's the last line in the file, delete it and start over from the beginning of the script with the next input line. (Perhaps your understanding of d was not complete? This is what $!d means.)
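
The difference between '$!d' and '$d' can be checked with a small experiment. This is only a sketch on a made-up two-line input with no trailing empty line; the variable names are illustrative.

```shell
# Hypothetical input: one two-line paragraph, not followed by an empty line.
input='a
XX'

# $!d: every line except the last is deleted right after H (ending the cycle),
# so only the last line falls through to x with the whole paragraph collected.
with_bang=$(printf '%s\n' "$input" | sed -n '/./{H;$!d;};x;/XX/p')

# $d: only the last line is deleted, so each earlier line falls through to x
# on its own, and the final line never reaches x at all.
without_bang=$(printf '%s\n' "$input" | sed -n '/./{H;$d;};x;/XX/p')

printf 'with $!d: [%s]\n' "$with_bang"
printf 'with $d:  [%s]\n' "$without_bang"
```

With '$!d' the paragraph is printed (preceded by the empty line that H introduces); with '$d' nothing is printed, because the last line is thrown away before the x and the /XX/p are ever reached.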

Otherwise, we have an empty line, or end of file, and the hold space contains zero or more lines of text (one paragraph, possibly empty). Exchange them into the pattern space (the current, empty, line conveniently moves to the hold space) and examine the pattern space. If it fails to match one of our expressions, delete it. Otherwise, the default action is to print the entire pattern space.
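
As a quick check of this walk-through, here is the question's test data run through the same script, assuming the file does not end with an empty line. The variable names and the final grep are additions for illustration; the grep only reduces the output to the first line of each printed paragraph.

```shell
# Test data from the question, held in a variable instead of the file test2.
data='YYaaaa
aaa1
aaa2
aXX3
aaa4

YYbbbb
bbb1
bbb2

YYcccc
ccc1
ccc2
ccc3
cXX4
ccc5

YYdddd
ddd1
dXX2'

# Collect each paragraph in the hold space, swap it in, print it if it contains XX.
matches=$(printf '%s\n' "$data" | sed -n '/./{H;$!d;};x;/XX/p')

# Show only the header line of every paragraph that was printed.
headers=$(printf '%s\n' "$matches" | grep '^YY')
printf '%s\n' "$headers"
# YYaaaa
# YYcccc
# YYdddd
```

The last paragraph (YYdddd) is found even though no empty line follows it, which is exactly the case '$!d' is there to handle.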

2012-04-04 19:45
by tripleee
Thanks for the fast clarification, tripleee. I did indeed misunderstand the meaning of 'd'. So to sum up (correct me if I'm wrong), the script puts lines in the holdspace until it hits an empty line or the end of the file and then proceeds with searching the paragraph (and printing it, if the pattern is found in the paragraph). It starts over again until the last line is reached - Oliver 2012-04-04 20:15
That's right. Although the script you presented will require AAA and BBB and CCC to be present (not or) - tripleee 2012-04-04 20:19
Fine, thanks again! Question answered :-) - Oliver 2012-04-04 20:39
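
To illustrate the "and" versus "or" point from the comments, here is a small sketch. The three-paragraph input is made up, and the "or" script shown is just one possible way to write it (branch to the end of the script on the first match, otherwise delete).

```shell
# Hypothetical input: three paragraphs: "AAA" alone, "DDD" alone, and "AAA BBB CCC".
input='AAA

DDD

AAA
BBB
CCC'

# "and": a paragraph is deleted as soon as one of the three patterns is missing.
and_out=$(printf '%s\n' "$input" | sed -e '/./{H;$!d;}' -e 'x;/AAA/!d;/BBB/!d;/CCC/!d')

# "or": branch past the final d as soon as any one pattern matches.
or_out=$(printf '%s\n' "$input" | sed -e '/./{H;$!d;}' -e 'x;/AAA/b' -e '/BBB/b' -e '/CCC/b' -e d)

# Count printed paragraphs by counting their "AAA" lines.
and_count=$(printf '%s\n' "$and_out" | grep -c '^AAA$')
or_count=$(printf '%s\n' "$or_out" | grep -c '^AAA$')
printf 'and: %s paragraph(s)\nor:  %s paragraph(s)\n' "$and_count" "$or_count"
# and: 1 paragraph(s)
# or:  2 paragraph(s)
```

Only the last paragraph contains all three patterns, so the "and" script prints one paragraph, while the "or" script also prints the paragraph that contains AAA alone. Neither prints the DDD paragraph.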