To run this script, you need a Python environment. We recommend installing Anaconda for your operating system.
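This script takes a .txt file (UTF-8 only), counts how many times each hashtag is used in it, and prints the 34 most frequent hashtags. By default it reads txtfile.txt; to use a different file, enter its name at the prompt when the script starts. The text file must be placed in the same directory as the script (strictly speaking, the directory you run the script from, since the file name is opened as a relative path).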
```python
# TAKE A FILE AND EXTRACT HASHTAGS FROM IT (PUBLIC VERSION)
# FILE FORMAT MUST BE UTF-8

name = input("Enter filename - default is txtfile.txt: ")
if len(name) < 1:
    name = "txtfile.txt"

tags = dict()
lst = list()

# Open explicitly as UTF-8 instead of relying on the platform's default encoding
with open(name, encoding="utf-8") as handle:
    for line in handle:
        for word in line.split():
            # Only words starting with '#' are counted
            if not word.startswith('#'):
                continue
            # Strip one trailing comma or closing quote so "#tag," and "#tag" share an entry
            if word.endswith((',', '”')):
                word = word[:-1]
            tags[word.lower()] = tags.get(word.lower(), 0) + 1

# Build (count, hashtag) pairs so that sorting orders by count
for k, v in tags.items():
    lst.append((v, k))

# Keep the 34 most frequent hashtags, highest count first
lst = sorted(lst, reverse=True)[:34]

print('Hashtags and their counts in this file:\n')
for v, k in lst:
    print(k, v)
```
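For comparison, the same counting can be sketched more compactly with `collections.Counter`, swapping the `endswith()` checks for `str.rstrip()` so that any run of trailing punctuation is removed (for example, #hello! and #hello, both become #hello). This is a minimal sketch, not the script's own method: the names `count_hashtags` and `TRAILING` are hypothetical, and the set of stripped characters is an assumption you may want to adjust.

```python
import string
from collections import Counter

# Characters stripped from the end of a hashtag (an assumption):
# ASCII punctuation plus typographic quotation marks.
TRAILING = string.punctuation + "“”‘’"


def count_hashtags(lines):
    """Count hashtags case-insensitively across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        for word in line.split():
            if word.startswith("#"):
                # Remove every trailing punctuation character, then lowercase
                tag = word.rstrip(TRAILING).lower()
                # Skip tokens that were only punctuation, e.g. a lone "#"
                if len(tag) > 1:
                    counts[tag] += 1
    return counts
```

To use it in place of the loop above, pass the open file handle and print `count_hashtags(handle).most_common(34)`. The trade-off is that tags which legitimately end in punctuation, such as #c++, are shortened too (to #c).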
Known bugs and limitations
- If the file is encoded as UTF-16, the script does not work properly, because it does not read the file as UTF-16. Solution: convert the file to UTF-8.
- If a hashtag ends with any character other than a comma [,] or a closing quotation mark [”], that character is not stripped, so the same tag can end up as several separate dictionary entries. For example, #hello and #hello! are counted separately. This can be avoided by adding more characters to the word.endswith() check.
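The UTF-16 to UTF-8 conversion from the first item can also be done in Python rather than in a text editor. A minimal sketch, assuming the source file is UTF-16 with a byte order mark (BOM); the function name `convert_to_utf8` and its parameters are hypothetical.

```python
def convert_to_utf8(src, dst, src_encoding="utf-16"):
    """Re-encode the text file src (in src_encoding) as UTF-8 into dst.

    With "utf-16", Python uses the BOM to choose little- or big-endian;
    a file without a BOM may need "utf-16-le" or "utf-16-be" instead.
    newline="" keeps the original line endings unchanged.
    """
    with open(src, encoding=src_encoding, newline="") as fin:
        text = fin.read()
    with open(dst, "w", encoding="utf-8", newline="") as fout:
        fout.write(text)
```

For example, `convert_to_utf8("tweets_utf16.txt", "txtfile.txt")` (file names hypothetical) writes a UTF-8 copy that the script can read with its default file name.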