Run script commands in parallel

I have a bash script in which I need to run two commands in parallel.

For example, I'm executing npm install, which takes some time (20-50 seconds),

and I run it in two different folders in sequence: first npm install in the books folder, then in the orders folder. Is there a way to run both in parallel in a shell script?

For example, assume the script is like the following:

#!/usr/bin/env bash

tmpDir=$(pwd)

cd $tmpDir/books/
npm install
grunt
npm prune production

cd $tmpDir/orders/
npm install
grunt
npm prune production

Solution:

You could use & to run the process in the background, for example:

#!/bin/sh

cd $HOME/project/books/
npm install &

cd $HOME/project/orders/
npm install &

# if you want to wait for the processes to finish
wait

To run and wait for nested/multiple processes, you could use a subshell (), for example:

#!/bin/sh

(sleep 10 && echo 10 && sleep 1 && echo 1) &

cd $HOME/project/books/
(npm install && grunt && npm prune production ) &

cd $HOME/project/orders/
(npm install && grunt && npm prune production ) &

# waiting ...
wait

In this case, notice that the commands are within () and use &&, which means the right side is only evaluated if the left side succeeds (exit 0). So for the example:

(sleep 10 && echo 10 && sleep 1 && echo 1) &
  • putting things between () creates a subshell
  • it runs sleep 10 and, if that succeeds (&&), runs echo 10; if that succeeds, runs sleep 1; and if that succeeds, runs echo 1
  • all of this runs in the background because the command ends with &
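
If you also need to know whether each parallel job succeeded, you can capture the PID of each background subshell with $! and wait on it individually; wait PID returns that job's exit status. A minimal sketch (the project paths are just placeholders):

#!/bin/sh

(cd "$HOME/project/books" && npm install && grunt && npm prune production) &
books_pid=$!

(cd "$HOME/project/orders" && npm install && grunt && npm prune production) &
orders_pid=$!

# wait for each job individually so the script can report which one failed
wait "$books_pid" || echo "books build failed" >&2
wait "$orders_pid" || echo "orders build failed" >&2

Doing the cd inside each subshell also keeps the directory changes from affecting the rest of the script.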

Resolve variable from config-file based on output

I have a shell script that consists of two files: one bash file (main.sh) and one file holding all my configuration variables (vars.config).

vars.config

domains=("something.com" "else.something.com")

something_com_key="key-to-something"
else_something_com_key="key-to-something else"

In my code I want to loop through the domains array and get the key for each domain.

#!/usr/bin/env sh
source ./vars.config
key="_key"
for i in ${domains[@]}; 
do
    base="$(echo $i | tr . _)" # this swaps out . to _ to match the vars
    let farmid=$base$key 
    echo $farmid
done

So when I run it I get an error message:

./main.sh: line 13: let: key-to-something: syntax error: operand expected (error token is "key-to-something")

So it actually swaps it out, but I can't save it to a variable.

Solution:

You can expand a variable indirectly (use the value of one variable as the name of another) with ${!var_name}. For example, in your code you can do:

key="_key"
for i in ${domains[@]};
do
    base="$(echo $i | tr . _)" # this swaps out . to _ to match the vars
    farmid=$base$key
    farmvalue=${!farmid}
    echo $farmvalue
done
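
Note that ${!farmid} indirect expansion and arrays are bash features, not POSIX sh, so main.sh should use a bash shebang. A minimal end-to-end sketch, assuming vars.config sits next to the script as shown above:

#!/usr/bin/env bash
source ./vars.config

key="_key"
for i in "${domains[@]}"; do
    base="$(echo "$i" | tr . _)"   # swap . for _ to match the variable names
    farmid="$base$key"             # e.g. something_com_key
    farmvalue="${!farmid}"         # indirect expansion: value of the variable named by farmid
    echo "$farmvalue"
done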

Getting specific Nth words from variable

I have this script

#!/bin/bash

tmpvar="$*"
doit () {
    echo " ${tmpvar[1]} will be installed "
    apt-get install ${tmpvar[2*]}
    echo " ${tmpvar[1]} was installed "
}
doit

I run it with the command ./file.sh word1 word2 word3 word4.
The point is to get the first word for the echo lines and the rest for the installation command.

Example: ./file.sh App app app-gtk
should display the first word in both echo lines and pass the rest to the apt command.
But this is not working.

Solution:

You may use shift here:

doit () {
   arg1="$1"  # take the first word into a var arg1
   shift      # remove the first word from $@

   echo "$arg1 will be installed..."
   # attempt to call apt-get
   if apt-get install "$@"; then
      echo "$arg1 was installed"
   else
      echo "$arg1 couldn't be installed" >&2
   fi
}

and call this function as:

doit "$@"
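
As an alternative to shift, bash can also slice the positional parameters directly: "${@:2}" expands to every argument from the second one onward. A rough sketch of the same function written that way:

doit () {
    echo "$1 will be installed"
    # "${@:2}" is every argument except the first
    if apt-get install "${@:2}"; then
        echo "$1 was installed"
    else
        echo "$1 couldn't be installed" >&2
    fi
}

doit "$@"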

How to kill a range of consecutive processes in Linux?

I am working on a multi-user Ubuntu server and need to run multiprocessing python scripts. Sometimes I need to kill some of those processes. For example,

$ ps -eo pid,comm,cmd,start,etime | grep .py
3457 python          python process_to_kill.py - 20:57:28    01:44:09
3458 python          python process_to_kill.py - 20:57:28    01:44:09
3459 python          python process_to_kill.py - 20:57:28    01:44:09
3460 python          python process_to_kill.py - 20:57:28    01:44:09
3461 python          python process_to_kill.py - 20:57:28    01:44:09
3462 python          python process_to_kill.py - 20:57:28    01:44:09
3463 python          python process_to_kill.py - 20:57:28    01:44:09
3464 python          python process_to_kill.py - 20:57:28    01:44:09
13465 python         python process_not_to_kill.py - 08:57:28    13:44:09
13466 python         python process_not_to_kill.py - 08:57:28    13:44:09

Processes 3457-3464 are to be killed. So far I can only do:

$ kill 3457 3458 3459 3460 3461 3462 3463 3464

Is there a command like $ kill 3457-3464 so I can specify the starting and ending processes and kill all of those within the range?

Solution:

Use the shell’s brace expansion syntax:

$ kill {3457..3464}

which expands to:

$ kill 3457 3458 3459 3460 3461 3462 3463 3464

Or you can kill processes by name with pkill. For example:

$ pkill -f process_to_kill.py
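
If your shell does not have brace expansion (a plain POSIX sh, for example), command substitution with seq produces the same list of PIDs:

$ kill $(seq 3457 3464)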

Are all Linux users present in /etc/passwd?

There is one user "user1" which I can't find in /etc/passwd, but I can execute commands like:

$ touch abc
$ chown user1 abc
$ su user1

These commands run fine, but if I try to chown to a genuinely nonexistent user, the chown and su commands fail.

I was wondering where this user1 is coming from.

Solution:

While logged in as user1 (after su user1), execute:

getent passwd $USER

This fetches passwd entries across the different configured databases. Not all users are necessarily local system users – they can come from LDAP etc.
Check the docs on getent.

Also check your nsswitch.conf to see all sources used to obtain name-service information.
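
For example, to look up the account and see which sources are consulted for user lookups:

$ getent passwd user1                  # look up user1 across all configured databases
$ grep '^passwd' /etc/nsswitch.conf    # list the sources (files, ldap, sss, ...) in lookup order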

./path of file to execute not executing

I am trying to execute MATLAB from the desktop; the path of the file is

      /usr/local/MATLAB/R2017b/bin/matlab

I tried executing ./usr/local/MATLAB/R2017b/bin/matlab,

also tried .//usr/local/MATLAB/R2017b/bin/matlab

and ./ /usr/local/MATLAB/R2017b/bin/matlab.
How does this work?

Solution:

Just run /usr/local/MATLAB/R2017b/bin/matlab to access the binary via its full path. If you put a . before it, you will instead try to run it via a relative path: <CURRENT DIR>/usr/local/MATLAB/R2017b/bin/matlab.

You can also add /usr/local/MATLAB/R2017b/bin/ to your PATH variable in order to be able to execute the command matlab without having to specify its whole path each time.

To keep that change after a reboot, edit your ~/.bashrc file and add PATH=$PATH:/usr/local/MATLAB/R2017b/bin; then you can just run matlab.
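
For example, the line to append to ~/.bashrc:

export PATH="$PATH:/usr/local/MATLAB/R2017b/bin"

After reloading it with source ~/.bashrc (or opening a new terminal), running matlab on its own is enough.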

How can I do full outer join on multiple csv files (Linux or Scala)?

I have 620 csv files and they have different columns and data. For example:

//file1.csv
word, count1
w1, 100
w2, 200

//file2.csv
word, count2
w1, 12
w5, 22

//Similarly fileN.csv
word, countN
w7, 17
w2, 28

My expected output

//result.csv
word, count1, count2, countN
w1,    100,     12,    null
w2,    200 ,   null,    28  
w5,    null,    22,    null
w7,    null,   null,    17

I was able to do it in Scala for two files like this where df1 is file1.csv and df2 is file2.csv:

df1.join(df2, Seq("word"),"fullouter").show()

I need a solution to do this, either in Scala or as a Linux command.

Solution:

Using Spark you can read all your files as DataFrames and store them in a List[DataFrame]. After that you can apply reduce on that list to join all the DataFrames together. The following code uses three DataFrames, but you can extend the same approach to all your files.

//create all three dummy DFs
val df1 = sc.parallelize(Seq(("w1", 100), ("w2", 200))).toDF("word", "count1")
val df2 = sc.parallelize(Seq(("w1", 12), ("w5", 22))).toDF("word", "count2")
val df3 = sc.parallelize(Seq(("w7", 17), ("w2", 28))).toDF("word", "count3")

//store all DFs in a list
val dfList: List[DataFrame] = List(df1, df2, df3)

//apply reduce function to join them together
val joinedDF = dfList.reduce((a, b) => a.join(b, Seq("word"), "fullouter"))

joinedDF.show()
//output
//+----+------+------+------+
//|word|count1|count2|count3|
//+----+------+------+------+
//|  w1|   100|    12|  null|
//|  w2|   200|  null|    28|
//|  w5|  null|    22|  null|
//|  w7|  null|  null|    17|
//+----+------+------+------+

//To write to CSV file
joinedDF.write
  .option("header", "true")
  .csv("PATH_OF_CSV")

This is how you can read all your files and store them in a list (you can then apply the same reduce shown above to dfList):

//declare a ListBuffer to store all DFs
import scala.collection.mutable.ListBuffer
val dfList = ListBuffer[DataFrame]()

(1 to 620).foreach(x=>{
  val df: DataFrame = sqlContext.read
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .load(BASE_PATH + s"file$x.csv")

  dfList += df
})