Python Script File Finder

The File Finder

[EDIT] This is the wrong way to find assembly code inside files, since there are too many cases to check, such as binary files and in-line assembly code. Jon Masters has created a script that finds assembly code within source RPM packages by looking at file extensions. It searches for files with the extension .s and for files with C and C++ extensions. If the script finds a C or C++ file, it searches that file for in-line assembly statements. Afterwards, the script attempts to build the package to see whether the build succeeds. I believe that this method is much better than the method below. [EDIT]
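
The extension-based idea described in the note above could be sketched roughly as follows. This is not Jon Masters' script, which is not reproduced here. The function name classify_source, the extension lists, and the regular expression for in-line assembly are all assumptions made for illustration, and the package build step is left out.

```python
import io
import os
import re

# Assumed extension lists; a real tool would likely cover more cases.
ASSEMBLY_EXTENSIONS = {".s"}
C_FAMILY_EXTENSIONS = {".c", ".h", ".cc", ".cpp", ".cxx", ".hpp"}

# Heuristic: matches the asm, __asm and __asm__ keywords as whole words.
# It will also match the word "asm" inside comments or strings.
INLINE_ASM = re.compile(r"\b(?:__asm__|__asm|asm)\b")


def classify_source(path):
    """Return "assembly", "inline-assembly", or None for one file."""
    extension = os.path.splitext(path)[1].lower()
    if extension in ASSEMBLY_EXTENSIONS:
        return "assembly"
    if extension in C_FAMILY_EXTENSIONS:
        # Undecodable bytes are replaced so odd encodings do not abort the scan.
        with io.open(path, encoding="utf-8", errors="replace") as handle:
            if INLINE_ASM.search(handle.read()):
                return "inline-assembly"
    return None
```

Because the check is a plain text match, it can report false positives, so the result is best treated as a list of files to inspect by hand.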

The project that I’ve been working on is: http://zenit.senecac.on.ca/wiki/index.php/ARMv8_Support_Analysis

The first step in this project is to write a Python script that automates the process of detecting assembly code inside files. The script is not a finished product at the moment, but it demonstrates the initial idea that may be used to complete this task.

Note: This script will work with any text file of keywords; it does not need to be used to search only for assembly code. Have fun! (:

This script is run from the command line with 2 or more arguments. You may place any number of directories into the arguments, as long as the last argument is the keywords file.

./filefinder.py [directory] [more-directories...] [keywords-text-file]
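
The argument convention above (directories first, keywords file last) can be illustrated with a small helper. This is only a sketch: split_arguments is a hypothetical name and is not part of the script below, which does the same job inline.

```python
def split_arguments(args):
    """Split a command line into (directories, keywords_file).

    args is expected to look like sys.argv: the script name first,
    then one or more directories, then the keywords file last.
    """
    if len(args) < 3:
        raise ValueError("need at least one directory and a keywords file")
    # Everything between the script name and the last argument is a directory.
    return args[1:-1], args[-1]
```

Called with sys.argv, it hands back the list of directories to search and the path of the keywords file.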

The script uses Python’s os.walk() function to search through all files and sub-directories within the given directory. It checks the permissions on every file and directory; if the script is not allowed into a file or directory, it prints a warning and continues. This is done so that you do not miss a file or directory structure that may contain assembly code.
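
As a minimal sketch of the walk-and-check step described above, the generator below pairs every path with a flag saying whether the current user can read it. The name walk_with_permissions is hypothetical; the script further down prints a fuller rwx-style summary instead.

```python
import os


def walk_with_permissions(root):
    """Yield (path, readable) for every directory and file below root."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # os.access() reports whether the current user may read the path.
            yield path, os.access(path, os.R_OK)
```

Note that os.walk() by default skips directories it cannot list without raising an error, which is why checking each directory explicitly is useful if you want a visible warning.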

The script then opens each file and searches it for words that match the keywords file provided (this could contain the most commonly used assembly code). The keywords file should contain 1 word per line. When a match is found, the script flags the file. Optionally, a commented-out line in the script can be enabled to show every line number on which a match was found inside each file.
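
The matching step can be sketched on its own as follows. The names load_keywords and matching_lines are hypothetical, and the UTF-8 decoding with replacement of undecodable bytes is an assumption made so that odd files do not abort the scan. As in the script, a word only matches if it equals a keyword exactly after splitting on whitespace.

```python
import io


def load_keywords(path):
    """Read one keyword per line into a set, skipping blank lines."""
    with io.open(path, encoding="utf-8", errors="replace") as handle:
        return {line.strip() for line in handle if line.strip()}


def matching_lines(path, keywords):
    """Return the 1-based numbers of lines containing any keyword."""
    hits = []
    with io.open(path, encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, 1):
            # Split on whitespace and compare whole words only.
            if any(word in keywords for word in line.split()):
                hits.append(number)
    return hits
```

Using a set keeps each lookup cheap, and loading the keywords once before scanning avoids re-reading the keywords file for every file that is searched.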

End result: Every file in the directory tree will be searched (if you have permission), and any file containing a keyword will be flagged to tell you that assembly code was found.

I would like to mention that I am fairly new to object-oriented programming, and also rusty with Python. Any suggestions, improvements, or corrections will be greatly appreciated.

#!/usr/bin/env python
# Andrew Oatley-Willis
# This script will be used to search through a directory, subdirectories, and all files.
# It will then check to make sure that all directories and files have the proper permissions
# to be searched. This script will be looking through all the files for keywords, in which
# case it will flag the file that contains the keyword.

# Bugs:
# -This script cannot run through the /dev directory... So don't put /dev as a directory or sub-directory

# Things to do:
# -Make a blacklist file which will check which directories to avoid, this will be useful
# in case you don't want to scan some directories.

import os
from sys import argv

class filefinder:

    # Calls the necessary functions
    def __init__(self):
        if len(argv) >= 3 and os.path.exists(argv[-1]):
            # Everything between the script name and the keywords file is a directory to search
            searchdirs = argv[1:-1]
            print(searchdirs)
            self.startsearch(searchdirs)
        else:
            print("----------[ file finder ]----------\n"
                  "This script will be used to search through directories, subdirectories, and all files within.\n"
                  "It will then check to make sure that all directories and files have the proper permissions to\n"
                  "be searched. This script will be looking through all the files for keywords, in which case it\n"
                  "will flag the files that contain any keywords.\n\n"
                  "Syntax:\n./filefinder.py [directory-to-search] [keywords-file.txt]\n\n"
                  "Show only warnings:\n./filefinder.py [directory-to-search] [keywords-file.txt] | grep Warning\n\n"
                  "Show only assembly warnings:\n./filefinder.py [directory-to-search] [keywords-file.txt] | grep keywords\n")

    # The startsearch() function will search through the specified directory and all
    # subdirectories and print information about them.
    def startsearch(self, searchdirs):
        for search in searchdirs:
            for dirpath, dirnames, filenames in os.walk(search):
                for subdirname in dirnames:
                    searchpath = os.path.join(dirpath, subdirname)
                    print(searchpath + " --> " + self.checkperm(searchpath) + self.scanfile(searchpath))
                for filename in filenames:
                    searchpath = os.path.join(dirpath, filename)
                    print(searchpath + " --> " + self.checkperm(searchpath) + self.scanfile(searchpath))

    # The checkperm() function will check specific permissions on both directories and files to make
    # sure that the files are properly being searched, in case permissions are incorrect.
    def checkperm(self, checkfile):
        if os.access(checkfile, os.R_OK):
            read = "r"
        else:
            if os.path.isdir(checkfile):
                read = "(Warning: Cannot read directory)"
            else:
                read = "(Warning: Cannot read file)"
        if os.access(checkfile, os.W_OK):
            write = "w"
        else:
            write = "-"
        if os.access(checkfile, os.X_OK):
            execute = "x"
        else:
            if os.path.isdir(checkfile):
                execute = "(Warning: Cannot parse directory)"
            else:
                execute = "-"
        return read + write + execute

    # The scanfile() function will scan all files, excluding directories, for specific keywords which it
    # will load in from a file in order to compare. When a match is found, it will display a flag saying
    # that the words were found inside the file.
    def scanfile(self, filename):
        warnings = ""
        if os.path.isdir(filename):
            warnings = "d"
        elif self.checkperm(filename)[0] != "r":
            warnings = ""
        else:
            warnings = "-"
            try:
                # Note: the keywords file is re-read for every file that is scanned
                keywords = self.loadkeywords(argv[-1])
                # "with" makes sure the file is closed once it has been scanned
                with open(filename, "r") as checkfile:
                    count = 0
                    for line in checkfile:
                        count += 1
                        for word in line.split():
                            if word in keywords:
                                # For VERY verbose warnings on assembly code, uncomment the line
                                # below and comment out the assignment that follows it
                                #warnings = warnings + " (Warning: keywords on line " + str(count) + ") "
                                warnings = "(Warning: keywords detected)"
            except Exception:
                warnings = "(Warning: Unknown error in scanfile())"
        return warnings

    # The loadkeywords() function simply opens the keywords.txt file and loads all the keywords from it.
    # These keywords will be checked against all of the files that are searched by the scanfile() function.
    def loadkeywords(self, filename):
        keywords = []
        try:
            # "with" closes the file even if reading fails part-way through
            with open(filename, "r") as keywordsfile:
                for line in keywordsfile:
                    # strip() removes the newline without cutting off the last
                    # character of a final line that has no trailing newline
                    word = line.strip()
                    if word:
                        keywords.append(word)
        except Exception:
            print("Error: Unknown error occurred in loadkeywords()")
        return keywords

if __name__ == '__main__':
    filefinder()

About oatleywillisa

Computer Networking Student