zip in Python

zip takes any number of iterables and returns a list of tuples: the ith tuple is built from the ith element of each of the iterables.

list_a = [1, 2, 3, 4, 5]
list_b = ['a', 'b', 'c', 'd', 'e']

zipped_list = zip(list_a, list_b)

print zipped_list # [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e')]
zipped_list is a list of tuples where the ith tuple, e.g. (1, 'a'), is created from the ith element of list_a (1) and the ith element of list_b ('a').

If the lengths of the iterables are not equal, zip creates a list of tuples whose length equals that of the smallest iterable.

list_a = [1, 2, 3]
list_b = ['a', 'b', 'c', 'd', 'e']

zipped_list = zip(list_a, list_b)

print zipped_list # [(1, 'a'), (2, 'b'), (3, 'c')]
zip truncates the extra elements of list_b in the output.
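If you need the extra elements instead of truncation, the itertools module provides izip_longest (renamed zip_longest in Python 3), which pads the shorter iterable with a fill value; a small sketch:

from itertools import izip_longest # zip_longest in Python 3

print list(izip_longest(list_a, list_b, fillvalue=None))
# [(1, 'a'), (2, 'b'), (3, 'c'), (None, 'd'), (None, 'e')]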

zip always creates the tuples in the order of the iterables from left to right: list_a will always come before list_b in the output tuples.

Unzip a list of tuples

To unzip a list of tuples, use zip(*list_of_tuples). Unzipping creates a separate sequence for each position.

Example:

zipper_list = [(1, 'a'), (2, 'b'), (3, 'c')]

list_a, list_b = zip(*zipper_list)
print list_a # (1, 2, 3)
print list_b # ('a', 'b', 'c')

Zip in Python3

In Python 3, zip returns a zip object instead of a list. This zip object is an iterator, and iterators are lazily evaluated.

Lazy evaluation, or call-by-need, is an evaluation strategy which delays the evaluation of an expression until its value is needed and which also avoids repeated evaluations (Wikipedia definition).

An iterator returns one element at a time, and the len function cannot be used with iterators. We can loop over the zip object, or pass it to list(), to get the actual list.

Consider the example below:

list_a = [1, 2, 3]
list_b = [4, 5, 6]

zipped = zip(list_a, list_b) # Output: zip object. <zip at 0x4c10a30>
len(zipped) # TypeError: object of type 'zip' has no len()
zipped[0] # TypeError: 'zip' object is not subscriptable
list_c = list(zipped) # Output: [(1, 4), (2, 5), (3, 6)]
list_d = list(zipped) # Output: [] ... empty, because the statement above exhausted the zip object
x, y = zip(*list_c)
print(x, y) # Output: (1, 2, 3) (4, 5, 6)

In the above example, zipped is a zip object, which is an iterator. Using the len function on it or accessing an element by index raises a TypeError.

We convert the zip object to a list with list(zipped). After this we can use all the methods of a list.

Iterators can be evaluated only once. After that they are exhausted, hence the output of list_d is an empty list.

How to run find -exec?

Find the files in the current directory that contain the text "chrome".

find . -exec grep chrome {} \;

or

find . -exec grep chrome {} +

find will execute grep and will substitute {} with the filename(s) found. The difference between ; and + is that with ; a single grep command is executed for each file, whereas with + as many files as possible are passed to grep at once.
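A quick way to see the difference is to swap in echo for grep; the first form runs echo once per filename, while the second batches the filenames into a single invocation (assuming a few .txt files exist):

find . -name "*.txt" -exec echo {} \;   # one echo per file
find . -name "*.txt" -exec echo {} +    # one echo with all files as arguments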


Here's another example of how to use find/exec:

find . -name "*.py" -print -exec fgrep hello {} \;

This searches recursively for all .py files and, for each file, prints the filename and runs fgrep for 'hello' on that file. The output looks like this (from an actual run):

./r1.py
./cgi-bin/tst1.py
print "hello"
./app/__init__.py
./app/views.py
./app/flask1.py
./run.py
./tst2.py
print "hello again"


find accepts multiple -exec portions to the command. For example:

find . -name "*.txt" -exec echo {} \; -exec grep banana {} \;

Note that in this case the second command will only run if the first one returns successfully, as mentioned by @Caleb. If you want both commands to run regardless of their success or failure, you could use this construct:

find . -name "*.txt" \( -exec echo {} \; -o -exec true \; \) -exec grep banana {} \;


What is the meaning of ##* in shell script?

${0##*/}
${0%/*}

Those are not regular expressions; they are examples of Bash's parameter expansion: the substitution of a variable or a special parameter by its value. The Wooledge Wiki has a good explanation.

Basically, in the example you have, ${0##*/} translates as:

for the variable $0 and the pattern */, the two hashes mean: from the beginning of the parameter, delete the longest (greedy) match, up to and including the pattern.

So, where $0 is the name of a file, e.g. $HOME/documents/doc.txt, the parameter would be expanded as: doc.txt

Similarly, for ${0%/*}, the pattern /* is matched against the end of the parameter (the %), with the shortest (non-greedy) match deleted, which in the example above would give you $HOME/documents.

The glob (*) indicates that everything up to and including the pattern will be deleted: # works from the left (the beginning of the parameter), and % works the other way, from the right (the end).

Another example

text="//ABC/REC/TLC/SC-prod/1f9/20/00000000957481f9-08d035805a5c94bf"

echo ${text##*/}
${var#Pattern}

Remove from $var the shortest part of $Pattern that matches the front end of $var.

${var##Pattern}

Remove from $var the longest part of $Pattern that matches the front end of $var.

So ${text##*/} removes from text everything up to and including the last /. It's useful for getting the basename of a path, for instance.

(There are also ${var%Pattern} and ${var%%Pattern}, which remove a pattern matching the back end of $var, as sketched below.)
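A short sketch of the suffix-removal forms, using a hypothetical filename:

file="archive.tar.gz"
echo "${file%.*}"     # archive.tar (shortest match of .* removed from the end)
echo "${file%%.*}"    # archive (longest match of .* removed from the end)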

Example:

eravipo@elxabfk9m32:~/test/autoCompletion$ echo "${PWD}"
/home/eravipo/test/autoCompletion
eravipo@elxabfk9m32:~/test/autoCompletion$ echo "${PWD##*/}"
autoCompletion       # longest match of */ removed from the front
eravipo@elxabfk9m32:~/test/autoCompletion$ echo "${PWD#*/}"
home/eravipo/test/autoCompletion   # shortest match of */ removed: just the first /
eravipo@elxabfk9m32:~/test/autoCompletion$

Purpose of the : (colon) – Bash builtin?

Historically, Bourne shells didn’t have true and false as built-in commands. true was instead simply aliased to :, and false to something like let 0.

: is slightly better than true for portability to ancient Bourne-derived shells.

if command; then :; else ...; fi

Since if requires a non-empty then clause and comments don’t count as non-empty, : serves as a no-op.
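Beyond the then clause, : is also handy as an always-true loop condition; a small sketch (the loop body is made up):

while :        # same as 'while true', but works in old Bourne shells too
do
    date
    sleep 60
done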

Nowadays (that is, in a modern context) you can usually use either : or true. Both are specified by POSIX, and some find true easier to read. However, there is one interesting difference: : is a so-called POSIX special built-in, whereas true is a regular built-in.

Special built-ins are required to be built into the shell; regular built-ins are only "typically" built in, and that isn't strictly guaranteed. There usually shouldn't be a regular program named : with the function of true in the PATH of most systems.

Regular built-ins must be compatible with exec – demonstrated here using Bash:

$ ( exec : )
-bash: exec: :: not found
$ ( exec true )
$

POSIX also explicitly notes that : may be faster than true, though this is of course an implementation-specific detail.

Use it to easily enable/disable variable commands:

#!/bin/bash
if [[ "$VERBOSE" == "" || "$VERBOSE" == "0" ]]; then
    vecho=":"     # no "verbose echo"
else
    vecho=echo    # enable "verbose echo"
fi

$vecho "Verbose echo is ON"

Thus

$ ./vecho
$ VERBOSE=1 ./vecho
Verbose echo is ON

This makes for a clean script. It cannot be done with '#', because a # that comes from variable expansion is not treated as a comment character; the shell would try to run a command named #.


You could use it in conjunction with backticks (``) to execute a command without displaying its output, like this:

: `some_command`

Of course you could just do some_command > /dev/null, but the :-version is somewhat shorter.

That being said I wouldn’t recommend actually doing that as it would just confuse people. It just came to mind as a possible use-case.


Functions

name() { command; ...; command; }

At least one whitespace character must separate the { from the first command, and a semicolon must separate the last command from the closing brace if they occur on the same line.

Functions exist only in the shell in which they are defined; they can't be passed down to subshells.
Since a function is executed in the current shell, changes to the current directory or to variables persist after the function has completed execution.
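A small sketch to illustrate (gohome is a made-up function name):

gohome() { cd $HOME; lastdir=$OLDPWD; }

cd /tmp
gohome
pwd             # prints your home directory: the cd persisted
echo $lastdir   # prints /tmp: the variable assignment persisted too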

You can group the definitions in a file and then execute the file in the current shell with
. myfuncs
This causes any functions defined inside myfuncs to be read in and defined in the current shell.

Once a function is defined, its execution is faster than that of an equivalent shell program file. That's because the shell won't have to search the disk for the program, open the file, and read its contents into memory.

Another advantage is that all related shell programs can be grouped in a single file.

Removing a function definition:
$ unset name
The same command can be used to unset a variable definition.

The return command

To terminate the execution of a function, we can use the return command:
return n
The value n is used as the return status of the function. If it is omitted, the status of the last command executed is returned; that is also what gets returned if there is no return statement in the function at all.
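A minimal sketch of return in action (fileexists is a made-up function name):

fileexists() {
    if [ -f "$1" ]; then
        return 0        # explicit success status
    fi
    return 1            # explicit failure status
}

if fileexists /etc/passwd; then
    echo "found it"
fi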


More on I/O

The standard constructs >, < and >> provide output, input, and output-with-append redirection.

command 2>file # redirects standard error to the file

Redirect the standard output of a command to standard error by writing
command >&2
The >& notation specifies output redirection to the file associated with a file descriptor: descriptor 0 is standard input, 1 is standard output, and 2 is standard error.

A few examples follow.
Standard output and standard error both redirected to a file:
command > foo 2>>foo
This can also be written as 'command > foo 2>&1' to achieve the same effect.
The shell evaluates redirections from left to right on the command line, so the last example cannot be written as
'command 2>&1 > foo': there, standard error is redirected to where standard output currently points (the terminal by default), and only then is standard output redirected to foo.

We can dynamically redirect standard input or output in a program using the exec command:
exec < datafile
exec > datafile
exec 2> /tmp/errors
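After an exec redirection, every subsequent command in the script inherits it; for instance (the log file name is hypothetical):

exec > /tmp/out.log     # from here on, all standard output goes to /tmp/out.log
echo "this line lands in the log, not on the terminal"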

<&- and >&-

The characters >&- have the effect of closing standard output. If preceded by a file descriptor, the associated file is closed instead. So writing 'ls >&-' causes the output from ls to go nowhere, since standard output is closed by the shell before ls is executed.

In-line Input redirection

If the << characters follow a command in the format
command << word
then the shell uses the lines that follow as the standard input for command, up until a line that contains just word is found. Here's a small example at the terminal:

wc -l << ENDOFDATA
heres a line
and another
and yet another line
ENDOFDATA
3
Here the shell fed every line typed into the standard input of wc until ENDOFDATA was encountered, and wc counted 3 lines.

The shell performs parameter substitution on the redirected input data, executes back-quoted commands, and recognizes the backslash character. However, other special characters such as *, | and " are ignored. If you have dollar signs, back quotes, or backslashes in these lines that you don't want interpreted by the shell, you can precede them with a backslash.
$ cat << FOOBAR
>$HOME
>****
> \$foobar
>`date`
FOOBAR

output:
/home/steve
****
$foobar
Mon Aug 5 18:34:12 EDT 2016

Alternatively, if you want the shell to leave the input lines untouched, you can precede the word that follows the << with a backslash:
$ cat <<\FOOBAR
>$HOME
>****
> \$foobar
>`date`
FOOBAR

output:
$HOME
****
\$foobar
`date`

Shell Archives

One of the best uses of the shell's in-line input redirection feature is creating shell archive files.
With this technique, one or more files can be put into a single file and sent across.
E.g.:
$ cat shar
for file
do
    echo "echo Extracting $file"
    echo "cat >$file <<\ENDOFDATA"
    cat $file
    echo "ENDOFDATA"
done
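Usage might look something like this (the file names are hypothetical); running the generated archive through sh recreates the files:

$ sh shar file1 file2 > archive     # pack the files
$ sh archive                        # unpack on the receiving end
Extracting file1
Extracting file2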

eval, wait, trap commands

The eval command

eval command-line
where command-line is a normal command line that you would type at the terminal.
When you put eval in front of it, the net effect is that the shell scans the command line twice before executing it.

Consider the example below, without the use of eval:
$ pipe="|"
$ ls $pipe wc -l
| not found
wc not found
-l not found
$
The errors come from ls.
The shell takes care of pipes and I/O redirection before variable substitution, so it never recognizes the pipe symbol inside pipe. The result is that the three arguments |, wc, and -l are passed to ls.
Putting eval in front of the command gives the desired result:
$ eval ls $pipe wc -l
First the shell scans the command line and replaces $pipe with its value |; then eval causes it to rescan the line, at which point the | is recognized as a pipe and the command is executed as intended.

The eval command is frequently used in shell programs that build up command lines inside one or more variables.

$ cat last
eval echo \$$#
$ last one two three four
four
$ last *    # gets the last file
zoo_report

The first time the shell scans echo \$$#, the backslash tells it to ignore the $ that immediately follows. After that it encounters the special parameter $#, so it substitutes its value on the command line: with four arguments the command becomes echo $4, which the rescan caused by eval expands to the last argument.

The only problem with this approach is that only the first nine positional parameters can be accessed this way; $10 will not work, since the old Bourne shell treats it as $1 followed by a literal 0.

===================================================

wait command

If you submit a command line to the background for execution, it runs in a subshell that is independent of the current shell. At times you may want to wait for a background (child) process to finish before proceeding with the current process.
In such cases the wait command is useful.

wait process-id
where process-id is the ID of the background process you want to wait for. If it is omitted, the shell waits for all child processes to complete execution.

The $! variable:
The shell stores the process ID of the last command executed in the background in this special variable.
So wait $! waits for the last program sent to the background.

prog1 &
pid1=$!

prog2 &
pid2=$!

wait $pid1 # wait for prog1 to finish

=================================================================

The trap command

When you press the DELETE or BREAK key at the terminal during execution of a shell program, the terminal sends a signal to the executing program. The program can specify the action that should be taken upon receipt of the signal. This is done with the trap command:
trap commands signals
Numbers are assigned to the different signals (0 for exit, 1 for hangup, 2 for interrupt, i.e. the DELETE key, 15 for the default kill signal).

The example below shows how to remove some files and then exit if someone tries to abort the program from the terminal by pressing the DELETE key:
trap "rm $WORKDIR/work1$$ $WORKDIR/dataout$$; exit" 2

You can also specify multiple signals, here for hangup and the DELETE key:
trap "rm $WORKDIR/work1$$ $WORKDIR/dataout$$; exit" 1 2

NOTE: The shell scans the command line at the time the trap command is executed, and again when one of the listed signals is received. So in the last example, the values of WORKDIR and $$ are substituted at the time the trap command is executed. If you want the substitution to occur when signal 1 or 2 is received (for example, WORKDIR may not be defined yet), put the commands inside single quotes:
trap 'rm $WORKDIR/work1$$ $WORKDIR/dataout$$; exit' 1 2

trap with no arguments displays any traps that you have changed.

Ignoring signals
trap "" 2
The signal is ignored when it is received, since the action is null.

Resetting traps
If you omit the first argument, the default action is restored upon receipt of the listed signals:
trap 1 2


Wildcards and Character Classes

Summary of wildcards and their meanings:

Wildcard          Meaning
*                 Matches any characters
?                 Matches any single character
[characters]      Matches any character that is a member of the set characters. The set may also be expressed as a POSIX character class, such as one of the following:

POSIX Character Classes:
[:alnum:]   Alphanumeric characters
[:alpha:]   Alphabetic characters
[:digit:]   Numerals
[:upper:]   Uppercase alphabetic characters
[:lower:]   Lowercase alphabetic characters

[!characters]     Matches any character that is not a member of the set characters


Examples of wildcard matching:

Pattern                          Matches
*                                All filenames
g*                               All filenames that begin with the character "g"
b*.txt                           All filenames that begin with the character "b" and end with the characters ".txt"
Data???                          Any filename that begins with the characters "Data" followed by exactly 3 more characters
[abc]*                           Any filename that begins with "a", "b", or "c", followed by any other characters
[[:upper:]]*                     Any filename that begins with an uppercase letter (an example of a character class)
BACKUP.[[:digit:]][[:digit:]]    Any filename that begins with the characters "BACKUP." followed by exactly two numerals
*[![:lower:]]                    Any filename that does not end with a lowercase letter

You can use wildcards with any command that accepts filename arguments.
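For example (the pattern choices are arbitrary):

ls -l *.txt                 # list all .txt files
cp [[:upper:]]* /tmp        # copy files beginning with an uppercase letter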

'set' command & IFS variable

The shell's set command is a dual-purpose command:
1) It sets various shell options.
2) It reassigns the positional parameters $1, $2, ...

The -x option

set -x : Turns on the trace mode in the shell
set +x : Turns off the trace mode

The trace option is not passed down to subshells. But you can trace a subshell's execution either by running the shell with the -x option followed by the name of the program, as in 'sh -x program', or by
inserting a set -x command inside the file itself, as demonstrated below.
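A tiny demonstration of what the trace output looks like (the script name trace_demo is made up):

$ cat trace_demo
set -x
msg=hello
echo $msg
$ sh trace_demo
+ msg=hello
+ echo hello
hello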

set with no arguments
set is a shell builtin that displays all shell variables, not only the environment ones, and also shell functions, which is what you see at the end of the list.
Variables are displayed with a syntax that allows them to be set when the lines are executed or sourced.

From the bash manual page:
If no options or arguments are supplied, set displays the names and values of all shell variables and functions, sorted according to the current locale, in a format that may be reused as input for setting or resetting the currently-set variables.
On other shells the behavior is not necessarily the same; for example, ksh's set doesn't display shell functions.

Using set to reassign positional parameters
If words are given as arguments to set on the command line, the positional parameters $1, $2, ... are assigned to those words.
So 'set a b c' assigns a to $1, b to $2, and c to $3. $# also gets set to 3.
set one two three four
echo $1:$2:$3:$4   # prints one:two:three:four

So after executing set, everything works consistently: $#, $*, and the for loop without a list.
E.g.:
$ set one two three
$ for arg; do echo $arg; done
one
two
three
$

The -- Option

$ cat words
# counts the words on a line
read line
set $line
echo $#

Running it:

$ words
Here's the line for you to count
7

But give words a line that begins with a dash and things go wrong:

$ words
-1 + 5 = 4
words: -1 bad options

After the line was read and assigned to line, the command set $line was executed. After the shell did its substitution, the command looked like this:
set -1 + 5 = 4
When set executed, it took -1 as an option, hence the error message.

Another problem with words arises if you give it a line consisting entirely of whitespace, or a null line.

To protect against these problems, we can use the -- option to set. This tells set not to interpret subsequent arguments on the command line as options. It also prevents set from displaying all of your variables when no other arguments follow, as happens when a null line is read.

With the addition of a while loop and expr, the words program can easily be modified to count the number of words on standard input:
$ cat words
count=0
while read line
do
    set -- $line
    count=`expr $count + $#`
done
echo $count

After each line is read, the set command is executed to take advantage of the fact that $# gets assigned the number of words on the line. The -- option is supplied to set in case any of the lines read begins with a - or consists entirely of whitespace.

set -- $line

set *
echo $#                       # Counts the number of files in the current directory


IFS – Internal Field Separator

The shell uses the value of IFS when parsing its input.
To determine the actual characters stored in this variable, pipe the output from echo into the od (octal dump) command with the -b (byte display) option:

$ echo "$IFS" | od -b
0000000 040 011 012 012
0000004
$

We can change IFS to any character(s) of interest. To change IFS to just a newline, type an open quote, press the RETURN key, then type the closing quote:
IFS='
'
With IFS set this way, leading whitespace is preserved, whereas normally the shell strips it because whitespace is the default delimiter.
$ read line
    Here's a line
$ echo $line
    Here's a line    # leading whitespace preserved

$ line="Hello:World:Master"
$ IFS=:
$ set $line
$ echo $#
3
$ for field; do echo $field; done
Hello
World
Master

Changing IFS is often done in conjunction with execution of the set command.
This technique is a powerful one, and it is very fast compared with the alternative of echoing the value of $line into the tr command, where all colons could be translated into newlines.
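For comparison, the tr alternative mentioned above would look like this; it spawns extra processes where the IFS/set approach uses only shell built-ins:

$ echo "$line" | tr ':' '\n'
Hello
World
Master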