test_1 (note the empty line):
YY Female
XX Male
test_2 (note the empty line):
XX Engineer
YY Lawyer
test_join:
#! /bin/bash
# join expects both inputs sorted on the join field (the first field by
# default), so sort the two data files first
sort ./test_1 > test_1.tmp
sort ./test_2 > test_2.tmp
# print the contents of the sorted files
echo =====================
cat ./test_1.tmp
echo =====================
cat ./test_2.tmp
echo =====================
# join assumes both files are already sorted on the join field, and its
# algorithm takes advantage of this. If they are not sorted, join
# complains and the result is unreliable
join ./test_1.tmp ./test_2.tmp
# remove the temporary files
rm test_1.tmp
rm test_2.tmp
terminal:
sort compares whole lines by default, so these files end up ordered by their first field. An empty line compares less than any non-empty line, so it always comes first in the sorted result.
join merges the two sorted files line by line. By default it matches lines on field 1 and prints that common field only once.
aubinxia@aubinxia-VirtualBox:~/Desktop/xxdev$ ./test_join
=====================
XX Male
YY Female
=====================
XX Engineer
YY Lawyer
=====================
XX Male Engineer
YY Female Lawyer
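The temporary files are not strictly required. The following is a minimal sketch, assuming bash (process substitution is not available in plain sh). The scratch directory and the sample files are created only so the example is self-contained, and the empty line is left out to keep the output short.

```shell
#!/bin/bash
# Sketch: join two files without temporary sorted copies,
# using bash process substitution.
# The scratch directory and file contents are made up for this example.
workdir=$(mktemp -d)
printf 'YY Female\nXX Male\n' > "$workdir/test_1"
printf 'XX Engineer\nYY Lawyer\n' > "$workdir/test_2"

# Each <(...) is handed to join as if it were a file holding the sorted output
result=$(join <(sort "$workdir/test_1") <(sort "$workdir/test_2"))
echo "$result"

# remove the scratch directory
rm -r "$workdir"
```

This prints the same two joined lines as the terminal output above. The trade-off is that the sorted intermediate results are no longer available for inspection, which the original script prints for illustration.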
2. join: use a different delimiter
test_1:
:YY: Female
:XX: Male
test_2:
:XX: Engineer
:YY: Lawyer
test_join:
#! /bin/bash
# sort the data files first
sort ./test_1 > test_1.tmp
sort ./test_2 > test_2.tmp
# print the contents of the sorted files
echo =====================
cat ./test_1.tmp
echo =====================
cat ./test_2.tmp
echo =====================
# join uses ':' as the delimiter here, so "XX" and "YY" are the 2nd field of
# each line; the 1st field (before the leading ':') is empty.
# -1 2 means join on the 2nd field of the first file
# -2 2 means join on the 2nd field of the second file
# -o 1.2 means explicitly output the first file's 2nd field
# -o 1.3 means explicitly output the first file's 3rd field
# By listing every output field explicitly, we stop join from printing the
# common field only once. The developer is now in control of the output.
join -t ':' -1 2 -2 2 -o 1.2 -o 1.3 -o 2.2 -o 2.3 ./test_1.tmp ./test_2.tmp
# remove the temporary files
rm test_1.tmp
rm test_2.tmp
terminal:
Since the script specifies the output fields explicitly, the empty line in each file produces a joined line whose output fields are all null, so the result is ":::", which means 4 null fields separated by 3 delimiters.
aubinxia@aubinxia-VirtualBox:~/Desktop/xxdev$ ./test_join
=====================
:XX: Male
:YY: Female
=====================
:XX: Engineer
:YY: Lawyer
=====================
:::
XX: Male:XX: Engineer
YY: Female:YY: Lawyer
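The script above sorts whole lines, which works here because every line starts with the same empty first field. A more defensive variant, shown as a sketch under the same assumptions about the data, sorts on the same delimiter and field that join uses. It also passes the output fields to -o as one comma-separated list and leaves out 2.2 so the key appears only once. The scratch directory and the ".sorted" file names are made up for this example, and the empty line is again left out.

```shell
#!/bin/bash
# Sketch: sort on the same field and delimiter that join will use.
# The scratch directory and file names are made up for this example.
workdir=$(mktemp -d)
printf ':YY: Female\n:XX: Male\n' > "$workdir/test_1"
printf ':XX: Engineer\n:YY: Lawyer\n' > "$workdir/test_2"

# -t ':' sets the field delimiter, -k 2,2 restricts the sort key to field 2
sort -t ':' -k 2,2 "$workdir/test_1" > "$workdir/test_1.sorted"
sort -t ':' -k 2,2 "$workdir/test_2" > "$workdir/test_2.sorted"

# -o accepts a comma-separated list; 2.2 is omitted so the key is printed once
result=$(join -t ':' -1 2 -2 2 -o 1.2,1.3,2.3 \
    "$workdir/test_1.sorted" "$workdir/test_2.sorted")
echo "$result"

# remove the scratch directory
rm -r "$workdir"
```

With this field list each output line has three fields: the key, the value from the first file, and the value from the second file.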