
Automatically deriving personality by analysis of OP


Well-Known Member
Local time
Tomorrow 3:51 AM
Apr 19, 2016
I just went through some research papers that seek to establish a correlation between personality and text. It turns out that some traits, such as Neuroticism, Extraversion and Openness, can be strongly predicted by analyzing a person's writing.
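As a toy illustration of the word-frequency approach those papers take, trait scoring could be sketched roughly like this. Note the trait lexicons below are made-up placeholders, not taken from any published study:

```python
from collections import Counter
import re

# Hypothetical trait lexicons -- placeholders, NOT from any published study.
LEXICONS = {
    "Neuroticism":  {"worry", "afraid", "nervous", "sad"},
    "Extraversion": {"party", "friends", "talk", "fun"},
    "Openness":     {"ideas", "imagine", "art", "theory"},
}

def trait_scores(text):
    """Return, per trait, the fraction of words hitting that trait's lexicon."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = sum(counts.values()) or 1
    return {trait: sum(counts[w] for w in lex) / total
            for trait, lex in LEXICONS.items()}

scores = trait_scores("I worry about my friends. I imagine new ideas, but I am nervous.")
print(scores)
```

A real version would swap in validated lexicons (or an NLP model), but the counting skeleton stays the same.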

Also, I cannot find the goddamn website that posted analyses of US Presidents' speeches and used a word-frequency analysis to classify them as 'analytical' or 'intuitive'. That site is actually the inspiration for this post. If you are able to find it, or any related scientific paper, awesome. At the end of this post, I am attaching a small Python helper script (dependencies: selenium, requests and pyperclip from pip, plus geckodriver) which will automate the process of downloading the papers. Just run it with --help once you have the dependencies installed. It uses scihubtw.tw to fetch your files.

This proposed program could include these features:
  1. Getting OPs by user
  2. Doing a word-frequency count to determine the traits. If anyone wants to use NLP, statistics, or both, go ahead. That can also be done.
  3. Logging in as a user and then doing the tasks (I don't know if that is allowed here)
  4. Sorting and filtering (once you get the HTML element, that is not a problem)
My proposed solution is to use selenium to quickly access the OPs of the user in question and carry out the tasks.
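Feature 1 could be sketched as below. The forum URL pattern and the `op-body` CSS class are hypothetical placeholders (every forum lays its markup out differently), and `extract_op_texts` is a made-up helper that just strips tags out of the matching divs:

```python
import re

# Hypothetical markup: assume each OP body sits in <div class="op-body">...</div>.
OP_DIV = re.compile(r'<div class="op-body">(.*?)</div>', re.S)
TAGS = re.compile(r"<[^>]+>")

def extract_op_texts(html):
    """Strip tags from every (assumed) op-body div and return the plain texts."""
    return [TAGS.sub("", m).strip() for m in OP_DIV.findall(html)]

def fetch_user_ops(username):
    """Load the user's thread listing (placeholder URL) and harvest OP texts."""
    # selenium is imported here so the pure-text helper above stays dependency-free
    from selenium import webdriver
    from selenium.webdriver import FirefoxOptions
    options = FirefoxOptions()
    options.headless = True
    driver = webdriver.Firefox(options=options)
    try:
        # Placeholder URL scheme -- adjust to the forum's real routes.
        driver.get("https://example-forum.invalid/members/{}/threads".format(username))
        return extract_op_texts(driver.page_source)
    finally:
        driver.quit()
```

The selectors are the only forum-specific part; once those are pinned down, the harvested texts feed straight into the word-count step.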

This could be an ambitious successor to Serac's post, where he analyzes demographics and provides a word count.

I don't have time to do this alone, so if anyone's interested, PM me and we can try to implement it. It should not be too hard.

PS: I am not well versed in statistics or NLP, but I can help build the core script to make this work.

Download from scihubtw.tw using doi links:
import pyperclip as ppc
from selenium import webdriver
from selenium.webdriver import FirefoxOptions
import os
import sys
import argparse
import re
import subprocess

def downloadFile(url, title, dest):
    # Spoof the headers sci-hub's file host expects
    headers = {
        "Host": "sci.bban.top",
        "Referer": url,
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:85.0)"
    }
    curlString = '''curl "{}" -H "Host: {}" -H "Referer: {}" -H "User-Agent: {}" -o "{}/{}.pdf"'''.format(
        url, headers['Host'], headers['Referer'], headers['User-Agent'], dest, title)

    # Write the curl command to a file and run it
    with open("downloadLink.curl", 'w') as FH:
        FH.write(curlString)
    subprocess.check_output('bash downloadLink.curl'.split())

def download(driver, element, doi, dest):
    # Open the sci-hub page for this doi and pull the pdf link
    # out of the download button's onclick attribute
    driver.get('https://www.scihubtw.tw/' + re.sub('^/', '', doi))
    found = driver.find_element_by_css_selector(element)
    link = re.search("'([^']+)'", found.get_attribute('onclick')).group(1).replace('\\', '')
    downloadFile(link, re.sub("[^a-zA-Z0-9.-]+", "", link), dest)

# CSS selector for the download button on the sci-hub result page
dlElement = "#buttons > ul:nth-child(1) > li:nth-child(2) > a:nth-child(1)"
options = FirefoxOptions()
options.headless = True

parser = argparse.ArgumentParser(description="Download from scihubtw.tw using a doi link")
parser.add_argument("--doi", type=str, help="Pass the doi link. It should contain the substring 'doi'")
parser.add_argument('--auto', action='store_true', default=True,
                    help='Use the clipboard to get the doi link. This is the default option')
parser.add_argument('--dest', default=os.environ['HOME'] + "/Downloads",
                    help="Destination directory [default=$HOME/Downloads]")

args = parser.parse_args()

# Take the doi from --doi if given, otherwise from the clipboard
if args.doi:
    if 'doi' not in args.doi:
        sys.exit("Not a valid doi link")
    link = args.doi
else:
    link = ppc.paste()
    if 'doi' not in link:
        sys.exit("Not a valid doi link")

driver = webdriver.Firefox(options=options)
download(driver, dlElement, link, args.dest)
driver.quit()


This project is in it for the long haul. I hate polluting my threads with info that is too specific; it makes them incoherent.

I will post the github link soon.

