Merging Columns from the Same File in Linux

I have a text file like this:

Name2 381 3923 1000
Name2 321 323  9000
Name2 121 212  1000
Name3 321 644  4444
Name3 111 342  4234
Name3 342 323  3232

I want to merge the first three columns only into something like this:

Name2:381:3923 1000
Name2:321:323  9000
Name2:121:212  1000
Name3:321:644  4444
Name3:111:342  4234
Name3:342:323  3232

However, I want to preserve any other columns present in the file. I tried with sed and a regex, but the problem is that I need to keep the other columns intact, and simply replacing every space with a colon messes that up.

Solution:

The following awk may help:

awk '{sub(/ /,":");sub(/ /,":")} 1'  Input_file

Second solution, using sed:

sed 's/ /:/1;s/ /:/1'  Input_file
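
A field-based awk rewrite is another option (a sketch, assuming whitespace-separated columns in Input_file; note that it normalizes the spacing between the remaining columns to single spaces):

awk '{out=$1 ":" $2 ":" $3; for(i=4;i<=NF;i++) out=out OFS $i; print out}' Input_file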

Delete certain columns and add horizontally in awk. So many columns that I cannot type each one

I have been struggling for more than a day and I cannot make my script work. Please help.

My txt file extends to 500 columns.

I need to delete columns 5, 9, 13, 17, 21, …, i.e. delete every fourth column starting at column 5.

Then, after removing the columns I mentioned above, I need to add up all of the remaining columns, EXCEPT column 1, which should be kept as it is.
For this I am using:

awk '{print $1,$2+$3+.........}' >> comb.xvg

The thing is that I don’t want to keep adding fields manually until I reach 500.

My final document should have only two columns.

  • The first column, kept exactly as in the original file
  • And a second column that holds the sum of all the other ones (please be aware that I am adding horizontally, not vertically).
    The sum runs horizontally from column 2 to column 500.

Could someone please help me do this? I have tried several attempts using a for loop, but they fail.

I am new at this and also new to Stack Overflow. My apologies if I am not fully clear, but I cannot upload pictures.

Thanks.

Solution:

awk to the rescue!

This script will sum up columns 2, 3, 4, 6, 7, 8, 10, 11, 12, … (that is, skipping columns 5, 9, …, 4k+1, …):

awk '{sum=0; for(i=2;i<=NF;i++) sum+=(i-1)%4?$i:0; print $1,sum}'

Explanation
We’re summing up the elements in the row. If we were to add them all, sum+=$i would do; however, you want to skip the values at indices 4k+1, so we use the ternary operator v=c?a:b, which means if(c) v=a; else v=b. (i-1)%4 is the remainder of (i-1) divided by 4, and it is zero exactly for i=5,9,…,4k+1,…, so those columns contribute nothing to the sum.

Deleting the columns doesn’t seem to be necessary, since you’re not printing the pruned table itself, only the first column and the sum.
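
If you ever do need the pruned table rather than the per-row sums, a sketch along the same lines (the same (i-1)%4 test, just rebuilding the line instead of summing) would be:

awk '{out=$1; for(i=2;i<=NF;i++) if((i-1)%4) out=out OFS $i; print out}'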

To test the summing one-liner:

$ seq 20 | xargs -n 10 | awk ...

prints

1 40
11 110

To verify: 2+3+…+10 = 54, so after removing 5 and 9 you get 40. For 12+13+…+20 each of the seven remaining elements is 10 larger, i.e. 40+7*10=110.

Follow-up question: how to compute separate sums s2 over columns 2, 6, 10, …; s3 over columns 3, 7, 11, …; and s4 over columns 4, 8, 12, …?

awk '{s2=s3=s4=0; 
      for(i=2;i<=NF;i+=4) 
        {s2+=$i; s3+=$(i+1); s4+=$(i+2)}; 
      print $1, s2, s3, s4}' 
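
To sanity-check the follow-up script on a small grid (two rows of 12 columns, so every group of four is complete):

$ seq 24 | xargs -n 12 | awk '{s2=s3=s4=0; for(i=2;i<=NF;i+=4) {s2+=$i; s3+=$(i+1); s4+=$(i+2)}; print $1, s2, s3, s4}'
1 18 21 24
13 54 57 60

Here s2=2+6+10, s3=3+7+11 and s4=4+8+12 for the first row, and each term is 12 larger in the second row.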

Unique values out of two files

I have two separate files, from which I need to make a new one that has the unique values out of the two files.

Example:
File A:

1234567890123456720100603104500 Random text or data.
2345678901234567820100602104500 [New] Random Text.
3456789012345678920100509213849 Earlier \Date.
4567890123456789020100521195058 & InBetween Date 

File B:

    1234567890123456720100603104500 Random text or data altered.
    2345678901234567820100602104500 [New] Random Text.
    3456789012345678920100509213849 Earlier \Date.
    4567890123456789020100521195058 & InBetween Date 

Output:

    1234567890123456720100603104500 Random text or data.
    1234567890123456720100603104500 Random text or data altered.       
    2345678901234567820100602104500 [New] Random Text.
    3456789012345678920100509213849 Earlier \Date.
    4567890123456789020100521195058 & InBetween Date 

sort -u does the job when there is only one file, but what about when we have two, three, etc. files? I would also appreciate an implementation with sed or awk.

Solution:

sort accepts multiple files. Simply run sort -u FILE1 FILE2 ....
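
Since an awk variant was also asked for: the usual awk idiom for dropping duplicate lines accepts any number of files as well, and it keeps the lines in first-occurrence order instead of sorting them (FILE1 and FILE2 stand for your actual file names):

awk '!seen[$0]++' FILE1 FILE2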

Remove all one-character words in a string EXCEPT 'a', 'i' and 'o'

I’m trying to parse a file with the single quotes removed, which leaves behind some stray ‘s’s and the like.

So far I’ve tried:

echo "a b c d e f g h i o omgifack" | grep -o '[^bcdefghjklmnpqrstuvBCDEFGHJKLMNPQRSTUV]\{2,\\}' 
echo "a b c d e f g h i o omgifack" | tr -d '[bcdefghjklmnpqrstuvwxyzBCDEFGHJKLMNPQRSTUVWXYZ]'

The example I based these off of:

echo "a b c d e f g h i o omgifack" | grep -o '[a-z]\{2,\\}'

Desired output:

>a i o omgifack

I can’t seem to figure it out, but I’m sure I’m missing something obvious. Open to solutions using awk, sed, tr, grep… anything that works. Thanks!

Solution:

I’d probably use something like

echo "a b c d e f g h i o omgifack" | grep -wo '[[:alpha:]]\{2,\}\|[AIOaio]'
a
i
o
omgifack

This will isolate all whole words (because of -w) that are either at least two letters long (that’s the [[:alpha:]]\{2,\} part) or one of [AIOaio].

Note that if the text contains umlauts or accented characters (such as ä, é, ß etc.), [[:alpha:]] includes those as well, provided grep is run under a locale that recognizes them as part of its alphabet.
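
If you would rather keep everything on one line, as in your desired output, an awk sketch is to keep any field longer than one character plus the single letters a, i and o (in either case):

echo "a b c d e f g h i o omgifack" | awk '{out=""; for(i=1;i<=NF;i++) if(length($i)>1 || $i ~ /^[AIOaio]$/) out=out (out=="" ? "" : " ") $i; print out}'
a i o omgifack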

Add Double Quotes Around Output Field

I’m trying to surround awk output fields with double quotes, but I keep running into errors. This code runs in a bash script on an Ubuntu system.

Errors

awk: cmd. line:1: BEGIN {print "<table id="\"alert\"",">"} ; {  print "<tr><td class="\"site\"",">" $1 "</td><td class="\"fdate\"",">" $2 "</td><td class="\"prcp\"",">" $3 "</td><td class="\"snow\"",">" $4 "</td><td class="\"tmp2m\"",">" $5 "</td><td class="\"dpt2m\"",">" $6 "</td><td class="\"wchill\"",">" $7 "</td><td class="\"wind10m\"",">" $8 "</td></tr>"} ; END { print "</table>"}
awk: cmd. line:1:                          ^ backslash not last character on line
awk: cmd. line:1: BEGIN {print "<table id="\"alert\"",">"} ; {  print "<tr><td class="\"site\"",">" $1 "</td><td class="\"fdate\"",">" $2 "</td><td class="\"prcp\"",">" $3 "</td><td class="\"snow\"",">" $4 "</td><td class="\"tmp2m\"",">" $5 "</td><td class="\"dpt2m\"",">" $6 "</td><td class="\"wchill\"",">" $7 "</td><td class="\"wind10m\"",">" $8 "</td></tr>"} ; END { print "</table>"}
awk: cmd. line:1:                          ^ syntax error

Attempted Code

awk -F, 'BEGIN {print "<table id="\"alert\"",">"} ; {  print "<tr><td class="\"site\"",">" $1 "</td><td class="\"fdate\"",">" $2 "</td><td class="\"prcp\"",">" $3 "</td><td class="\"snow\"",">" $4 "</td><td class="\"tmp2m\"",">" $5 "</td><td class="\"dpt2m\"",">" $6 "</td><td class="\"wchill\"",">" $7 "</td><td class="\"wind10m\"",">" $8 "</td></tr>"} ; END { print "</table>"}' /home/weather/csv > /home/weather/csv.html

Solution:

You’ve got too many quotes going on, and you probably don’t want to separate the individual items to be printed with commas:

$ awk -F, 'BEGIN {print "<table id=\"alert\"" ">"}'
<table id="alert">

Sometimes it helps to set the double quote as a variable to make things more readable:

$ awk -F, -v q='"' 'BEGIN {print "<table id=" q "alert" q ">"}'
<table id="alert">
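
Applying the same idea to the per-row cells, a sketch of just the first two <td> cells would be (extend the same pattern for the remaining fields):

awk -F, -v q='"' '{print "<tr><td class=" q "site" q ">" $1 "</td><td class=" q "fdate" q ">" $2 "</td></tr>"}' /home/weather/csv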

How can I replace the 'bc' tool in my bash script?

I have the following command in my bash script:

printf '\n"runtime": %s' "$(bc -l <<<"($a - $b)")"

I need to run this script on around 100 servers, and I have found that on a few of them bc is not installed. I am not an admin and cannot install bc on those servers.

In that case, what alternative can I use to perform the same calculation? Please let me know what the new command should look like.

Solution:

If you need a solution that works for floating-point arithmetic, you can always fall back on Awk.

awk -v a="$a" -v b="$b" 'BEGIN { printf "\n\"runtime\": %s", a-b }' </dev/null

Putting the code in a BEGIN block and redirecting input from /dev/null is a common workaround for when you want to use Awk but don’t have a file of lines to loop over, which is what it’s really designed to do.
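
If $a and $b are guaranteed to be integers, you don’t even need an external tool; plain shell arithmetic is enough:

printf '\n"runtime": %s' "$(( a - b ))"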

Print whole line with highest value from one column

I have a little issue right now.
I have a file with 4 columns:

test0000002,10030010330,c_,218
test0000002,10030010330,d_,202
test0000002,10030010330,b_,193
test0000002,10030010020,c_,178
test0000002,10030010020,b_,170
test0000002,10030010330,a_,166
test0000002,10030010020,a_,151
test0000002,10030010020,d_,150
test0000002,10030070050,c_,119
test0000002,10030070050,b_,99
test0000002,10030070050,d_,79
test0000002,10030070050,a_,56
test0000002,10030010390,c_,55
test0000002,10030010390,b_,44
test0000002,10030010380,d_,41
test0000002,10030010380,a_,37
test0000002,10030010390,d_,35
test0000002,10030010380,c_,33
test0000002,10030010390,a_,31
test0000002,10030010320,c_,30
test0000002,10030010320,b_,27
test0000002,10030010380,b_,26
test0000002,10030010320,a_,23
test0000002,10030010320,d_,22
test0000002,10030010010,a_,6

and I want the line with the highest value in the 4th column for each value of the 2nd column.

test0000002,10030010330,c_,218 
test0000002,10030010020,c_,178 
test0000002,10030010330,a_,166 
test0000002,10030010020,a_,151 
test0000002,10030070050,c_,119 
test0000002,10030010390,c_,55 
test0000002,10030010380,d_,41 
test0000002,10030010320,c_,30 
test0000002,10030010390,a_,31 
test0000002,10030010380,c_,33 
test0000002,10030010390,d_,35 
test0000002,10030010320,a_,23 
test0000002,10030010380,b_,26 
test0000002,10030010010,a_,6

Solution:

It appears that your file is already sorted in descending order on the 4th column, so you just need to print lines where the 2nd column appears for the first time:

awk -F, '!seen[$2]++' file
test0000002,10030010330,c_,218
test0000002,10030010020,c_,178
test0000002,10030070050,c_,119
test0000002,10030010390,c_,55
test0000002,10030010380,d_,41
test0000002,10030010320,c_,30
test0000002,10030010010,a_,6

If your input file is not sorted on column 4, then

sort -t, -k4nr file | awk -F, '!seen[$2]++'
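
If you would rather not rely on the file being pre-sorted (and don’t mind the groups coming out in arbitrary order), an awk-only sketch that tracks the maximum of the 4th column per 2nd-column value could look like this:

awk -F, '!($2 in max) || $4+0 > max[$2] {max[$2]=$4+0; line[$2]=$0} END {for (k in line) print line[k]}' file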