Here is a list of external resources that might be of interest
On this page you can find a list of external resources helpful for data collection and data analysis.
Facepager (Data collection from public Facebook pages)
Facepager is a tool for fetching publicly available data from Facebook, YouTube, Twitter and other websites by means of APIs and web scraping. It is freely available, so students as well as researchers can use it at no cost. All data is stored in a SQLite database and can be exported to CSV. To get started, watch the walkthrough video on the Facepager YouTube channel.
Get Twitter bios from handles using IMPORTXML in Google Sheets
This formula retrieves Twitter bios (the text that users write on their profiles) and runs within Google Sheets. There are known limitations, so we recommend dividing the handles over several sheets if you need more than 800 profile bios. The process is quite slow. But hey, it’s free and easy to use. The Google Sheets document needs to stay open in a browser while the formula works. To use it, put the Twitter handle (without the @) in column A and enter the following formula in column B, referencing the corresponding cell in column A (here A1). Depending on your spreadsheet locale, you may need to replace the semicolons with commas. Handles with no bio, or handles that no longer exist, will be shown as “N/A”. Formula to use:
=join(char(10);importxml("https://twitter.com/intent/user?screen_name="&A1;"//*[@class='note']"))
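If you have a long list of handles, it may help to prepare them before pasting them into column A. The following is a minimal Python sketch, not part of the original recipe: it strips the leading @, builds the same URL the formula requests, and splits the list into batches of 800 to match the per-sheet recommendation above. The function names `profile_url` and `split_into_sheets` are made up for this illustration.

```python
from urllib.parse import urlencode

# Same endpoint the spreadsheet formula requests.
BASE_URL = "https://twitter.com/intent/user"


def profile_url(handle):
    """Return the URL the formula would fetch for one handle (the @ is dropped)."""
    return f"{BASE_URL}?{urlencode({'screen_name': handle.lstrip('@')})}"


def split_into_sheets(handles, per_sheet=800):
    """Split a list of handles into batches, one batch per sheet."""
    return [handles[i:i + per_sheet] for i in range(0, len(handles), per_sheet)]


if __name__ == "__main__":
    handles = ["@example_one", "example_two"]  # placeholder handles
    for batch_number, batch in enumerate(split_into_sheets(handles), start=1):
        for handle in batch:
            print(batch_number, handle.lstrip("@"), profile_url(handle))
```

Each batch can then be pasted into column A of its own sheet.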
WORDij (Semantic Network Tools)
WORDij is a family of programs designed to automate a substantial part of content analysis. In other words, you feed WORDij a text file and it analyzes the text. WORDij can analyze word cohesion and links, count frequently used words, extract proper nouns and ontologies, and more. The software runs on Windows (32-bit and 64-bit), Mac (32-bit and 64-bit) and Linux (64-bit). Input files are in UTF-8 (or UTF-16) format, so the programs can handle languages with non-Latin scripts such as Chinese or Russian. The WORDij output files, 8 per run, can be imported into a number of other network analysis programs, such as UCINET, NodeXL, Pajek, Negopy and others. WORDij is free for non-commercial academic research. Commercial licensing is available.
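To give a rough idea of what counting frequent words and word links involves, here is a small Python sketch. It is only an illustration of the general idea and makes no claim about how WORDij itself works; the function names and the `span` parameter are assumptions made up for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def word_counts(text):
    """Count how often each word occurs."""
    return Counter(tokenize(text))


def pair_counts(text, span=2):
    """Count ordered word pairs where the second word is one of the
    `span` words that directly follow the first."""
    tokens = tokenize(text)
    pairs = Counter()
    for i, first in enumerate(tokens):
        for second in tokens[i + 1:i + 1 + span]:
            pairs[(first, second)] += 1
    return pairs


if __name__ == "__main__":
    sample = "Data analysis needs data"
    print(word_counts(sample).most_common(3))
    print(pair_counts(sample).most_common(3))
```

The pair counts can be read as weighted links between words, which is the kind of table that network programs such as those listed above take as input.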
Gephi (Network visualization)
Gephi is the leading visualization and exploration software for all kinds of graphs and networks. Gephi is open-source and free and runs on Mac, Windows or Linux.
Twitter Username / User ID converter
Converts a Twitter handle to a user ID number. The ID number can be used with TCAT-user; please refer to our hosted resources.
OpenRefine
OpenRefine (formerly Google Refine) is a powerful tool for working with messy data: cleaning it, transforming it from one format into another, and extending it with web services and external data. In other words, if you have messy data, OpenRefine can help you clean it up. It only works with structured data, though. Imagine, for instance, a spreadsheet with a column of addresses. Sometimes an address is written as “This is where it is at 1”, and sometimes as “This_is_where_it_is_at_one”. OpenRefine can analyze the column and identify that these two addresses are probably the same.
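The idea of grouping values that are probably the same can be sketched in a few lines of Python. This is a simplified illustration, not OpenRefine's own clustering code: each value is reduced to a normalized key, and values that share a key are grouped together. The `NUMBER_WORDS` mapping is a hypothetical helper added so that “one” and “1” end up with the same key.

```python
import re
from collections import defaultdict

# Hypothetical mapping so that spelled-out numbers match digits; extend as needed.
NUMBER_WORDS = {"one": "1", "two": "2", "three": "3"}


def normalize_key(value):
    """Reduce a string to a simplified key for grouping near-duplicates."""
    # Lowercase, then turn every run of non-alphanumeric characters
    # (spaces, underscores, punctuation) into a single space.
    cleaned = re.sub(r"[^a-z0-9]+", " ", value.lower()).strip()
    tokens = [NUMBER_WORDS.get(token, token) for token in cleaned.split()]
    return " ".join(tokens)


def cluster(values):
    """Group values that share a key; return only groups with more than one member."""
    groups = defaultdict(list)
    for value in values:
        groups[normalize_key(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    addresses = [
        "This is where it is at 1",
        "This_is_where_it_is_at_one",
        "Somewhere else 2",
    ]
    for group in cluster(addresses):
        print(group)
```

In a real cleaning job you would still review each group by hand before merging, since values with the same key are not guaranteed to mean the same thing.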
DMI Toolbase
Not so much a single tool as a collection of tools for gathering or working with data. DMI (the Digital Methods Initiative) is also the group behind TCAT (Twitter Capture and Analysis Toolset), which we offer as a hosted service.
Instaloader
Instaloader is a command-line tool for downloading public Instagram posts along with their captions and metadata. To use it, follow these steps:
- Download Anaconda and set up a new environment with a Python version higher than 3.5.
- Install or open Jupyter Notebook in that environment and open a new terminal from Jupyter Notebook.
- Install Instaloader and its dependencies with pip (not pip3, as that does not work with Anaconda):
pip install instaloader
- Create a new folder in your root directory (typically your documents folder) called, for instance, “instaloader”. Create the folder in a Finder/Explorer window if you aren’t familiar with creating folders in the terminal. This keeps your root folder from being cluttered with scraped data.
- Change to the folder you just created. In the terminal, write:
cd instaloader
- Run Instaloader commands in your terminal. Note that the interface is rudimentary and text-based, but filters can be applied using boolean expressions, for instance:
instaloader "#HASHTAG" --post-filter="date_utc >= datetime(2017,1,1) and date_utc <= datetime(2018,1,1)" --login=USERNAME
We would love to offer this as a hosted service, but we are not likely to do so just now, so please experiment with it yourself. You can also ask for our advice, and we will do our best to help. If you plan to use this tool on a regular basis or for larger datasets, you should probably be prepared to use several user accounts and/or proxies to avoid being banned.
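If you need to run the filtered hashtag command from the steps above for several hashtags or date ranges, it can be assembled from Python instead of being typed by hand. The sketch below is only one possible approach and uses nothing beyond the command and options already shown (`--post-filter` and `--login`). The function names are made up for this example, and `HASHTAG` and `USERNAME` remain placeholders.

```python
import subprocess
from datetime import date


def post_filter(start, end):
    """Build the boolean expression passed to --post-filter."""
    return (
        f"date_utc >= datetime({start.year},{start.month},{start.day}) "
        f"and date_utc <= datetime({end.year},{end.month},{end.day})"
    )


def build_command(hashtag, start, end, username):
    """Return the instaloader command as a list of arguments."""
    return [
        "instaloader",
        f"#{hashtag}",
        f"--post-filter={post_filter(start, end)}",
        f"--login={username}",
    ]


def download(hashtag, start, end, username):
    """Run the command; requires instaloader to be installed and on the PATH."""
    subprocess.run(build_command(hashtag, start, end, username), check=True)


if __name__ == "__main__":
    # Only prints the command; call download(...) to actually run it.
    command = build_command("HASHTAG", date(2017, 1, 1), date(2018, 1, 1), "USERNAME")
    print(command)
```

Passing the arguments as a list means the `#` and the spaces in the filter expression need no extra quoting, which is the reason for the quotation marks in the terminal version.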