BurnedOut
Beloved Antichrist
I just went through some research papers which seek to establish a correlation between personality and text. It turns out that some traits, such as Neuroticism, Extraversion and Openness, can be strongly predicted by analyzing a person's text.
Also, I cannot find the goddamn website which posted analyses of US Presidents' speeches and used a word-frequency analysis to label them 'analytical' or 'intuitive'. That site is actually the inspiration for this post. Anyway, this is what it says. If you are able to find it or any related scientific paper, awesome. At the end of this post, I am attaching a small Python helper script (dependencies: selenium, requests and pyperclip via pip, plus geckodriver) which automates downloading a paper. Once you have downloaded it and installed the dependencies, just run it with --help. It uses scihubtw.tw to fetch your files.
This proposed program can consist of these features:
- Getting OPs by user
- Doing a word-frequency count to determine the traits. If anyone wants to use NLP, statistics or both instead, go ahead. That can also be done.
- Logging in as a user and then doing the tasks (Don't know if that is allowed here)
- Sorting, filtering stuff (Once you get the HTML element, that is not a problem)
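To make the word-count feature a bit more concrete, here is a rough sketch of how it could look. The trait word lists are made-up placeholders, not taken from any paper, and the names `TRAIT_WORDS` and `trait_scores` are hypothetical. It only counts how often each trait's marker words show up in a user's posts; a real version would need a validated lexicon or a trained model.

```python
import re
from collections import Counter

# Hypothetical marker words per trait; placeholders for illustration only,
# not taken from any published lexicon.
TRAIT_WORDS = {
    "Neuroticism": {"worried", "afraid", "stress", "hate"},
    "Extraversion": {"party", "friends", "we", "fun"},
    "Openness": {"idea", "art", "imagine", "theory"},
}


def trait_scores(posts):
    """Return, per trait, the share of all words that are marker words."""
    words = []
    for post in posts:
        # Lowercase and split into simple word tokens.
        words.extend(re.findall(r"[a-z']+", post.lower()))
    counts = Counter(words)
    total = sum(counts.values())
    if total == 0:
        return {trait: 0.0 for trait in TRAIT_WORDS}
    return {
        trait: sum(counts[w] for w in markers) / total
        for trait, markers in TRAIT_WORDS.items()
    }


posts = ["We had fun at the party with friends", "I imagine a new theory"]
scores = trait_scores(posts)
print(scores)
```

The scores are plain relative frequencies, so they are only comparable between users who are scored with the same word lists.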
This can be an ambitious successor to Serac's post where he analyzes demographics and provides a word count.
I don't have time to do this alone. So, if anyone's interested, PM me and we can try to implement this. It should not be too hard.
PS: I am not well versed in statistics or NLP. However, I can aid in making the core script to make this work.
Download from scihubtw.tw using doi links:
import argparse
import os
import re
import sys

import pyperclip as ppc
import requests
from selenium import webdriver
from selenium.webdriver import FirefoxOptions
from selenium.webdriver.common.by import By


def downloadFile(url, title, dest):
    # Send the headers the mirror expects, then save the PDF as <dest>/<title>.pdf.
    headers = {
        "Host": "sci.bban.top",
        "Referer": url,
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:85.0)",
    }
    response = requests.get(url, headers=headers, timeout=60)
    response.raise_for_status()
    # Overwrite rather than append, so a repeated download cannot corrupt the file.
    with open(os.path.join(dest, title + ".pdf"), "wb") as FH:
        FH.write(response.content)


def download(driver, element, doi, dest):
    driver.get('https://www.scihubtw.tw/' + re.sub('^/', '', doi))
    found = driver.find_element(By.CSS_SELECTOR, element)
    # The download URL is the quoted string in the button's onclick attribute;
    # strip any backslashes from it.
    link = re.search("'([^']+)'", found.get_attribute('onclick')).group(1).replace('\\', '')
    print(link)
    downloadFile(link, re.sub("[^a-zA-Z0-9.-]+", "", link), dest)


dlElement = "#buttons > ul:nth-child(1) > li:nth-child(2) > a:nth-child(1)"

parser = argparse.ArgumentParser(description="Download from scihub using doi link")
parser.add_argument("--doi", type=str, help="Pass the doi link. It should contain the substring 'doi'")
parser.add_argument('--auto', action='store_true', default=True, help='Use the clipboard to get the doi link. This is the default option')
parser.add_argument('--dest', default=os.path.join(os.path.expanduser("~"), "Downloads"), help="Destination directory [default=$HOME/Downloads]")
args = parser.parse_args()

# Prefer an explicit --doi; otherwise fall back to the clipboard.
link = args.doi if args.doi else ppc.paste()
if 'doi' not in link:
    sys.exit("Not a valid doi link")

options = FirefoxOptions()
options.add_argument("-headless")
driver = webdriver.Firefox(options=options)
try:
    download(driver, dlElement, link, args.dest)
finally:
    # Close the browser even if the download fails.
    driver.quit()
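
If you want to check the link-extraction step without launching Firefox, the two regex operations from the script can be tried on their own. This is just a sketch: `extract_link` and `safe_title` are hypothetical helper names, and the sample onclick value is made up for illustration, so the real attribute on the site may look different.

```python
import re


def extract_link(onclick):
    """Pull the first single-quoted string out of an onclick value and drop backslashes."""
    match = re.search(r"'([^']+)'", onclick)
    if match is None:
        return None
    return match.group(1).replace("\\", "")


def safe_title(link):
    """Keep only letters, digits, dots and hyphens, as the script does for the file name."""
    return re.sub(r"[^a-zA-Z0-9.-]+", "", link)


# Made-up attribute value, for illustration only.
sample = r"location.href='https:\/\/example.org\/pdf\/10.1000\/demo.pdf?download=true'"
link = extract_link(sample)
print(link)
print(safe_title(link))
```

Returning None when nothing is quoted makes it easier to print a useful message if the page layout changes, instead of crashing on `.group(1)`.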