sort list by keywords

0

I have a list of keywords in a keywords.txt file. I have another file, list.txt, with a keyword at the beginning of each line. How can I sort the lines in list.txt into the same order as the keywords appear in keywords.txt?

keywords.txt

house
car
tree
woods
mailbox

list.txt

car bbdfbdfbdfbdf
tree gdfgvsgsgs
mailbox gsgsdfsdf
woods gsgsdgsdgsdgsdgsddsd
house gsdgfsdgsdgsdgsdg

final result in list.txt

house gsdgfsdgsdgsdgsdg
car bbdfbdfbdfbdf
tree gdfgvsgsgs
woods gsgsdgsdgsdgsdgsddsd
mailbox gsgsdfsdf
2012-04-04 16:57
by Blainer
Are we talking Windows batch file scripting here? Or which scripting languages are okay (Python, Perl, Ruby, etc.)? - kiswa 2012-04-04 17:05
I don't even know how to go about this. Windows batch would be fine if possible. - Blainer 2012-04-04 17:06


1

Here is an improved and simplified version of kiswa's answer.

@echo off
(
  for /f "usebackq" %%A in ("keywords.txt") do findstr /bl "%%A" list.txt
)>sorted.txt
REM move /y sorted.txt list.txt

The FINDSTR command only matches lines that begin with the keyword, and it forces the search to be a literal search. (FINDSTR could give the wrong result if the /L option is not specified and the keyword happens to contain a regex meta-character.)

The code to replace the original file with the sorted file is commented out. Simply remove the REM prefix (or the :: prefix used in the later examples) to activate the MOVE statement.

As with kiswa's answer, the above will only output lines from list.txt that match a keyword in keywords.txt.
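For anyone following along in a Unix shell instead of batch, the same idea (literal match, anchored at the start of the line, output in keyword order) can be sketched with awk. This is only an illustration, not part of the batch solution: the function name sort_by_keywords is made up, and it assumes each line of keywords.txt is one whole keyword. awk's index() compares plain text, so regex meta-characters in a keyword are harmless, which is the same job /L does for FINDSTR.

```shell
# sort_by_keywords KEYWORDS LIST   (hypothetical helper name)
# Prints the lines of LIST that begin with a keyword, grouped in the order
# the keywords appear in KEYWORDS. Lines matching no keyword are dropped.
sort_by_keywords() {
  awk '
    { sub(/\r$/, "") }                        # tolerate Windows line endings
    FILENAME == ARGV[1] {                     # first file: remember keywords in order
      if ($0 != "") keys[++n] = $0
      next
    }
    { lines[++m] = $0 }                       # second file: remember every line
    END {
      for (i = 1; i <= n; i++)
        for (j = 1; j <= m; j++)
          if (index(lines[j], keys[i]) == 1)  # literal prefix test, no regex
            print lines[j]
    }
  ' "$1" "$2"
}
```

Usage would be along the lines of: sort_by_keywords keywords.txt list.txt > sorted.txt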

You might have lines in list.txt that do not match a keyword. If you want to preserve those lines at the bottom of the sorted output, then use:

@echo off
(
  for /f "usebackq" %%A in ("keywords.txt") do findstr /bli "%%A" "list.txt"
  findstr /vblig:"keywords.txt" "list.txt"
)>sorted.txt
::move /y sorted.txt list.txt

Note that the /I (case insensitive) option must be used because of a FINDSTR bug dealing with multiple literal search strings of different lengths. The /I option avoids the bug, but it would cause problems if your keywords are case sensitive. See the question "What are the undocumented features and limitations of the Windows FINDSTR command?" for details.
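If case-sensitive keywords matter and a Unix shell is available, the "keep unmatched lines at the bottom" behaviour can be sketched in awk without the /I compromise. Again this is an assumption-laden illustration rather than a drop-in replacement: sort_keep_unmatched is a made-up name, and each line of keywords.txt is taken as one whole keyword.

```shell
# sort_keep_unmatched KEYWORDS LIST   (hypothetical helper name)
# Keyword-ordered lines first, then every line that matched no keyword,
# in its original order. Matching is case sensitive and literal.
sort_keep_unmatched() {
  awk '
    { sub(/\r$/, "") }                        # tolerate Windows line endings
    FILENAME == ARGV[1] {                     # first file: keywords, in order
      if ($0 != "") keys[++n] = $0
      next
    }
    { lines[++m] = $0 }                       # second file: the lines to sort
    END {
      for (i = 1; i <= n; i++)
        for (j = 1; j <= m; j++)
          if (index(lines[j], keys[i]) == 1) {
            print lines[j]
            used[j] = 1                       # mark so it is not repeated below
          }
      for (j = 1; j <= m; j++)                # leftovers go to the bottom
        if (!(j in used)) print lines[j]
    }
  ' "$1" "$2"
}
```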

You might have keywords that are missing from list.txt. If you want to include those keywords without any data following them, then use:

@echo off
(
  for /f "usebackq" %%A in ("keywords.txt") do findstr /bl "%%A" "list.txt" || echo %%A
)>sorted.txt
::move /y sorted.txt list.txt

Obviously you can combine both techniques to make sure you preserve the union of both files:

@echo off
(
  for /f "usebackq" %%A in ("keywords.txt") do findstr /bli "%%A" "list.txt" || echo %%A
  findstr /vblig:"keywords.txt" "list.txt"
)>sorted.txt
::move /y sorted.txt list.txt
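
The same union can be sketched for a Unix shell. This is a rough awk equivalent under stated assumptions, not a translation of the batch file: sort_union is a made-up name, each line of keywords.txt is treated as one whole keyword, and matching is case sensitive. A keyword with no matching line is printed on its own (the "|| echo %%A" part), and lines matching no keyword are appended at the end (the "findstr /v" part).

```shell
# sort_union KEYWORDS LIST   (hypothetical helper name)
sort_union() {
  awk '
    { sub(/\r$/, "") }                        # tolerate Windows line endings
    FILENAME == ARGV[1] {                     # first file: keywords, in order
      if ($0 != "") keys[++n] = $0
      next
    }
    { lines[++m] = $0 }                       # second file: the lines to sort
    END {
      for (i = 1; i <= n; i++) {
        found = 0
        for (j = 1; j <= m; j++)
          if (index(lines[j], keys[i]) == 1) {
            print lines[j]
            used[j] = 1
            found = 1
          }
        if (!found) print keys[i]             # keyword missing from the list
      }
      for (j = 1; j <= m; j++)                # lines that matched no keyword
        if (!(j in used)) print lines[j]
    }
  ' "$1" "$2"
}
```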

All of the above assume the keywords do not contain space or tab characters. If they do, then the FOR /F options and FINDSTR options must change:

@echo off
(
  for /f "usebackq delims=" %%A in ("keywords.txt") do findstr /bic:"%%A" "list.txt" || echo %%A
  findstr /vblig:"keywords.txt" "list.txt"
)>sorted.txt
::move /y sorted.txt list.txt
2012-04-04 18:59
by dbenham


1

$ join -1 2 -2 1 <(cat -n keywords.txt | sort -k2) <(sort list.txt) | sort -k2n | cut -d ' ' -f 1,3-
house gsdgfsdgsdgsdgsdg
car bbdfbdfbdfbdf
tree gdfgvsgsgs
woods gsgsdgsdgsdgsdgsddsd
mailbox gsgsdfsdf
2012-04-04 17:07
by kev
Is this a Windows batch file? - Blainer 2012-04-04 17:10
It's a bash command. - kev 2012-04-04 17:10
OK, awesome. I will test this out. - Blainer 2012-04-04 17:12
You should test it in Linux. (You didn't mention Windows.) - kev 2012-04-04 17:13
I have an Ubuntu virtual machine. I'm about to test this. - Blainer 2012-04-04 17:14
I made a .sh and a .bsh with your script and I get this: root@ubuntu:~/Desktop/sort# bash script.sh script.sh: line 1: $: command not found cut: invalid byte or field list Try `cut --help' for more information. root@ubuntu:~/Desktop/sort# bash script.bsh cut: script.bsh: line 1: $: command not found invalid byte or field list Try `cut --help' for more information. - Blainer 2012-04-04 17:24
You don't need the $ (it's the bash prompt). - kev 2012-04-05 00:39
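
To make the script-file form concrete: the pipeline above can be saved without the leading $ and with each stage spelled out. The version below is a variation rather than kev's exact command. It is a sketch that swaps the bash-only <(...) process substitution for a temp file so it also runs under plain sh, pins the collation with LC_ALL=C so sort and join agree, and sorts on the keyword field only. The name sort_by_join is made up. One caveat that applies to the original too: join re-splits each line on blanks, so runs of spaces inside a line come out as single spaces.

```shell
# sort_by_join KEYWORDS LIST   (hypothetical helper name)
# Stages:
#   1. cat -n numbers the keywords (the number is the desired rank), and the
#      result is sorted by keyword because join needs sorted input.
#   2. The list is sorted by its first field and joined to the ranks,
#      giving "keyword rank rest-of-line".
#   3. A numeric sort on the rank restores the keyword order.
#   4. cut drops the rank column again.
sort_by_join() {
  numbered=$(mktemp) || return 1
  cat -n "$1" | LC_ALL=C sort -b -k2,2 > "$numbered"
  LC_ALL=C sort -k1,1 "$2" |
    LC_ALL=C join -1 2 -2 1 "$numbered" - |
    sort -k2,2n |
    cut -d ' ' -f 1,3-
  rm -f "$numbered"
}
```

Saved in script.sh with a final line such as sort_by_join keywords.txt list.txt, it would be run as: bash script.sh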


0

Here's a Windows batch file. It's probably not the most efficient, but I think it's nicely readable.

@echo off

for /F "tokens=*" %%A in (keywords.txt) do (
    for /F "tokens=*" %%B in ('findstr /i /C:"%%A" list.txt') do (
        echo %%B >> sorted.txt
    )
)

del list.txt

rename sorted.txt list.txt

This creates a sorted file, then removes the list file and renames the sorted file.

2012-04-04 17:47
by kiswa
this deletes some of my lines in the final sorted file. I start with 46 lines and end with 38. I can send you my list and keywords files if need be - Blainer 2012-04-04 18:06
It will only work if the lines all match a keyword in the sort. If you want unsorted items to stay in the list, that's a different thing than you asked originally. Also, empty lines will be removed - kiswa 2012-04-04 18:08
All the lines match a keyword. Empty lines and unmatched lines should not be an issue...I have 46 lines, 46 keywords. here are my files http://www.mediafire.com/?xbuzhe245i8nim - Blainer 2012-04-04 18:11
For some reason the findstr command in Windows will not locate these lines in your list.txt file: ZZZYYYJesus-Christ.html ZZZYYYDeity-Christ.html ZZZYYYwhy-believe-resurrection.html ZZZYYYJesus-crucified.html ZZZYYYJesus-Jew.html ZZZYYYJesus-myth.html ZZZYYYnames-Jesus-Christ.html ZZZYYYstations-cross.html - kiswa 2012-04-04 18:38
Yeah, do you have any idea why this is? - Blainer 2012-04-04 18:42
I can't access the mediafire site from my current location, so I can't be sure. But perhaps there are spaces (or other hidden characters) in the keywords.txt file that are preventing the match. If so, then removing tokens=* from the 1st FOR statement should solve the problem - dbenham 2012-04-04 19:21
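
Whether hidden characters are actually the cause here is unconfirmed, but the guess is easy to check from a Unix shell (for instance the Ubuntu VM mentioned earlier). The sketch below lists every line that ends in a space, tab, or carriage return and makes the invisible part visible. The function name show_trailing_whitespace is made up for illustration.

```shell
# show_trailing_whitespace FILE   (hypothetical helper name)
# Prints "line-number:content$" for each line ending in a space, tab, or
# carriage return. The "$" marks the true end of the line and a carriage
# return is shown as ^M, so trailing blanks cannot hide.
show_trailing_whitespace() {
  awk '
    /[ \t\r]$/ {
      line = $0
      gsub(/\r/, "^M", line)                  # make carriage returns visible
      printf "%d:%s$\n", NR, line
    }
  ' "$1"
}
```

No output from show_trailing_whitespace keywords.txt would mean trailing whitespace can be ruled out as the reason the lines are not found.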