
Statistical Analysis of intpf

Local time
Today, 16:36
Joined
Jun 7, 2017
Messages
1,428
Location
Stockholm
#1
In the Coding thread we were discussing writing something that reads intpf threads and analyzes them. I did a little implementation of that and downloaded the last 5000 threads, which goes back to about the middle of 2014.


Some preliminary stats below.



Words written this year:

[bimgx=400]https://intpforum.com/attachment.php?attachmentid=2941&stc=1&d=1511785160[/bimgx]


Vocabulary:

[bimgx=400]https://intpforum.com/attachment.php?attachmentid=2942&stc=1&d=1511785286[/bimgx]




I might add more stuff later, so stay tuned. Also, feel free to suggest any interesting stats you can think of.

 


Local time
Today, 09:36
Joined
Apr 4, 2010
Messages
5,678
Location
subjective
#2
According to the bar graph, I typed about 70,000 words in 2017.

The other bar graph does not have me on the list, so my vocabulary must not be that high.

http://www.speechinminutes.com/

70,000 words at 130 wpm ≈ 538.5 minutes, or just under 9 hours, to read
 
Local time
Today, 16:36
Joined
Jun 7, 2017
Messages
1,428
Location
Stockholm
#3
The other bar graph does not have me on the list, so my vocabulary must not be that high.
Me neither, man. But it should be said that the current implementation is quite crude. It treats misspelled words, for example, as distinct words. I might add filtering for dictionary words later.

Here is a random sample of 50 words from your posts:

is powers games but intelligence it i more hand of intp things thoughtless a with we one capacities faster and flawed when not best hatred i complete the so innateness the wrong to brain on to computers weight of my part abstraction he it gpu knowing then in were iq

washti:

in want infj ex healthy me condescending extentions food back to when thank sweats it about here each general you kinda small that family that meditate to my mine me as every krakow good ambience on very and cute better in in read for is win up chemistry extrovert constantly

Rook:

ahhhhh so whether do mix ecstasy no con hmmmm ash her believe help thingy above here list for pompeii hadeda minority randiness drugs self trotting eat me pulling good trance say who some impulses wild birds life so and booties a the this if termed else fish we soooooooooo vs
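On the point above about misspelled words counting as distinct words: here is a minimal sketch of what a crude vocabulary count and a dictionary-filtered one might look like. The tokenizer, the function names and the tiny `DICTIONARY` set are all assumptions made for illustration; the thread does not say how the actual implementation tokenizes or which word list it would use.

```python
import re

def tokenize(text):
    """Lowercase the text and extract runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def vocabulary(text, dictionary=None):
    """Return the set of distinct words in text.

    If a dictionary (a set of valid words) is given, keep only words
    that appear in it, so misspellings are dropped.
    """
    words = set(tokenize(text))
    if dictionary is not None:
        words &= dictionary
    return words

# Hypothetical toy dictionary; a real run would load a full word list.
DICTIONARY = {"the", "cat", "sat", "on", "mat"}

sample = "The cat sat on teh mat. The catt sat."
crude = vocabulary(sample)                 # counts "teh" and "catt" as words
filtered = vocabulary(sample, DICTIONARY)  # keeps only dictionary words

print(len(crude), len(filtered))
```

With the crude count, every typo inflates the vocabulary; with the filter, rare but valid words missing from the word list would be dropped instead, so the choice of dictionary matters.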
 
Local time
Today, 09:36
Joined
Apr 4, 2010
Messages
5,678
Location
subjective
#4
Here is a random sample of 50 words from your posts:

is powers games but intelligence it i more hand of intp things thoughtless a with we one capacities faster and flawed when not best hatred i complete the so innateness the wrong to brain on to computers weight of my part abstraction he it gpu knowing then in were iq
Looks like a word cloud (and it seems nonrandom, like a weird sentence formation).

Something that could be done is mapping the complexity of word structure, and then generating new sentences with a similar structure rather than just copying the original text.
 
Local time
Today, 09:36
Joined
Apr 4, 2010
Messages
5,678
Location
subjective
#5
I definitely see structure in the random sample.

is powers games but intelligence
it i more hand of intp things thoughtless
a with we one capacities faster and flawed when not best hatred
i complete the so innateness the wrong to brain
on to computers weight of my part
abstraction he it gpu knowing then in were iq
 
Local time
Today, 16:36
Joined
Jun 7, 2017
Messages
1,428
Location
Stockholm
#6
Here is your word cloud for your last 5k words, AK:
[bimgx=400]https://intpforum.com/attachment.php?attachmentid=2945&stc=1&d=1511792145[/bimgx]
 


washti

tellurian
Local time
Today, 17:36
Joined
Sep 11, 2016
Messages
461
#7
I want cloud too! Plzzzz, do me!
liking this - can pretend i'm unique. In this random sample words repeat themself.
if checking for words complexity - certainly Cog, Kuu, AK and Blarraun will be on top
Could you do dictionary word only stat?
also YOU KINDA SMALL :cat:
 
Local time
Today, 16:36
Joined
Jun 7, 2017
Messages
1,428
Location
Stockholm
#8
krakow good ambiance


Here ya go washti:
[bimgx=400]https://intpforum.com/attachment.php?attachmentid=2946&stc=1&d=1511796913[/bimgx]
 


Local time
Today, 16:36
Joined
Jun 7, 2017
Messages
1,428
Location
Stockholm
#9
Here is a list of the number of unique dictionary words used by respective users in their last 5000 words written:

Code:
                      user    N
  1: Perfectly Normal Beast 1665
  2:           Brontosaurie 1614
  3:                   Rook 1552
  4:                    Kuu 1485
  5:                    gps 1467
  6:               Blarraun 1451
  7:               TMills27 1451
  8:              Absurdity 1448
  9:              Nofriends 1445
 10:             EyeSeeCold 1440
 11:                 washti 1438
 12:         DrGregoryHouse 1414
 13:              Lagomorph 1414
 14:                  Fukyo 1409
 15:                 Adaire 1406
 16:           The Grey Man 1402
 17:                  TBerg 1401
 18:               Pyropyro 1398
 19:                 Yellow 1393
 20:      Pressure's Spring 1392
 21:              EditorOne 1391
 22:                 Auburn 1388
 23:            Intolerable 1386
 24:            ProxyAmenRa 1378
 25:            waechter418 1376
 26:              420MuNkEy 1372
 27:                    TAC 1372
 28:                Anktark 1369
 29:            Cherry Cola 1366
 30:              bvanevery 1366
 31:              Cognisant 1360
 32:                zerkalo 1359
 33:                Rualani 1359
 34:                  Rixus 1358
 35:               Analyzer 1354
 36:              manishboy 1353
 37:                Sinny91 1348
 38:       paradoxparadigm7 1347
 39:         Creeping Death 1346
 40:      Interdimensionist 1343
 41:            JimJambones 1339
 42:                   muir 1338
 43:               Bad Itch 1335
 44:               Cogitant 1335
 45:         TheAdditional1 1334
 46:             Jennywocky 1331
 47:           dark+matters 1329
 48:               OmoInisa 1328
 49:                  Happy 1326
 50:              Architect 1323
 51:                 cheese 1316
 52:             Miss spelt 1312
 53:            ENTP lurker 1309
 54:             PaulMaster 1307
 55:                  viche 1304
 56:                 Urakro 1304
 57:                   Milo 1302
 58:              BurnedOut 1297
 59:                Polaris 1296
 60:               INTPWolf 1294
 61:                  Puffy 1291
 62:        TransientMoment 1285
 63:         smithcommajohn 1281
 64:                Helvete 1271
 65:           Lapis Lazuli 1266
 66:                 nanook 1264
 67:             Tannhauser 1264
 68:              SpaceYeti 1257
 69:                 Shieru 1257
 70:                 Sly-fy 1256
 71:               Nebulous 1256
 72:                 PmjPmj 1255
 73:                FlorisV 1253
 74:       A_Scanner_Darkly 1243
 75:          Esurient Fere 1243
 76:             peoplesuck 1242
 77:               Valentas 1242
 78:                  Serac 1240
 79:            WhatWasThat 1238
 80:                   higs 1234
 81:               gilliatt 1232
 82:              Turnevies 1229
 83:      AphroditeGoneAwry 1222
 84:            Hunter Wulf 1221
 85:                crippli 1220
 86:                  AndyC 1218
 87:                  Alias 1216
 88:              Hadoblado 1216
 89:       TheHabitatDoctor 1216
 90:                0neKiwi 1210
 91:        Gather_Wanderer 1209
 92:          Cheeseumpuffs 1207
 93:      Invisible Gorilla 1205
 94:         onesteptwostep 1198
 95:                Grayman 1189
 96:             Inquisitor 1185
 97:           dutchdisease 1182
 98:            Reluctantly 1180
 99:            DrSketchpad 1177
100:               reloaded 1177
101:                 RaBind 1174
102:             rainman312 1170
103:           Shadow Angel 1170
104:                Minuend 1169
105:        Minute Squirrel 1168
106:         Rudolph Mondal 1167
107:                    Lot 1165
108:           TheManBeyond 1161
109:             Nihilmatic 1160
110:          deathvirtuoso 1158
111:                Feather 1155
112:           Seteleechete 1154
113:            Crystabelle 1154
114:                 Bogart 1140
115:             emmabobary 1139
116:                  Sixup 1138
117:                 JR_IsP 1136
118:           Artsu Tharaz 1135
119:                Redfire 1132
120:            computerhxr 1130
121:                   Teax 1130
122:                 Oprale 1128
123:              baccheion 1116
124:    YOLOisonlyprinciple 1115
125:            Sir Eus Lee 1114
126:             Animekitty 1108
127:           scorpiomover 1105
128:              Pizzabeak 1101
129:               WALKYRIA 1096
130:         IndigoViolet11 1092
131:                   E404 1090
132:                   Toro 1079
133:                intp_xp 1067
134:             The Gopher 1063
135:            doncarlzone 1062
136:             QuickTwist 1061
137:                 Nick85 1059
138:              ZenRaiden 1058
139:               redbaron 1057
140:                 HDINTP 1042
141:       Philosophyking87 1037
142:            Infinitatis 1031
143:         The Flycatcher 1027
144:                 Ucenna 1026
145:                   Haim  988
146:            Manipulator  986
147:         louiesgonnadie  975
148:                  iAmMe  959
149:                reckful  940
150:               e.lee.sa  884
151:              ruminator  875
152:                8151147  851
153:           A Son of Two  645
                       user    N
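A ranking like the one above could be produced roughly as follows. This is only a sketch under stated assumptions: posts are taken to be plain strings ordered oldest first, `rank_users` and `last_n_words` are hypothetical names, and the dictionary is a toy set rather than whatever word list was actually used.

```python
import re

def last_n_words(posts, n):
    """Flatten posts (oldest first) into lowercase words and keep the final n."""
    words = []
    for post in posts:
        words.extend(re.findall(r"[a-z']+", post.lower()))
    return words[-n:]

def rank_users(posts_by_user, dictionary, n=5000):
    """Return (user, unique dictionary word count) pairs, highest count first."""
    rows = []
    for user, posts in posts_by_user.items():
        unique = set(last_n_words(posts, n)) & dictionary
        rows.append((user, len(unique)))
    return sorted(rows, key=lambda row: row[1], reverse=True)

# Hypothetical toy data standing in for the scraped threads.
posts_by_user = {
    "alice": ["the cat sat", "on the mat"],
    "bob": ["the the the", "cat cat zzz"],
}
dictionary = {"the", "cat", "sat", "on", "mat"}

ranking = rank_users(posts_by_user, dictionary, n=4)
for user, count in ranking:
    print(f"{user:>10} {count}")
```

Capping every user at the same number of words keeps the comparison fair, since a larger sample would otherwise nearly always contain more distinct words.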
 
Local time
Today, 16:36
Joined
Jun 7, 2017
Messages
1,428
Location
Stockholm
#10
For users with at least 100 posts in 2017, the average number of unique dictionary words per post plotted against the average number of words per post:

[bimgx=400]https://intpforum.com/attachment.php?attachmentid=2947&stc=1&d=1511801961[/bimgx]


So, for example, QT and onestep write posts of about the same average length, but onestep uses slightly more unique dictionary words.
 


Haim

Worlds creator
Local time
Today, 18:36
Joined
May 26, 2015
Messages
691
Location
Israel
#11
It would probably make more sense to use the ratio of unique to non-unique words, since 200 unique words out of 300 is not the same as 200 out of 1000.
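One common way to express this idea is the share of distinct words among all words written, often called the type-token ratio. A minimal sketch, using made-up word lists that reproduce the 200-in-300 and 200-in-1000 cases (the function name `unique_ratio` is an assumption, not something from the thread's implementation):

```python
def unique_ratio(words):
    """Share of distinct words among all words (type-token ratio)."""
    if not words:
        return 0.0
    return len(set(words)) / len(words)

# Hypothetical samples: both contain exactly 200 distinct words.
distinct = [f"w{i}" for i in range(200)]
short_sample = distinct + ["w0"] * 100   # 300 words in total
long_sample = distinct + ["w0"] * 800    # 1000 words in total

print(round(unique_ratio(short_sample), 3))
print(round(unique_ratio(long_sample), 3))
```

This ratio tends to fall as a text gets longer, so it is best compared across samples of equal length, such as each user's last 5000 words.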
 

Hadoblado

I em Hedo I like smell of grass
Local time
Tomorrow, 01:06
Joined
Mar 17, 2011
Messages
5,106
#12
I'm not winning so I propose that this only measures pretentiousness. :P

I like statsy stuff Serac, much appreciate.

Is it possible/plausible to remove non-words from the equation altogether, then have a unique vs. non-unique word ratio like Haim says?
 

gps

INTP 5w4 Iconoclast
Local time
Today, 11:36
Joined
Mar 16, 2010
Messages
200
Location
Upstate NY, USA, Earth
#13
These stats seem to produce something akin to a topo map as contrasted with a geological survey which addresses what's beneath the surface.
Chomsky's distinction between surface structure and deep structure is at issue.

A poet would use fewer words -- and perhaps even common, non-exotic ones -- to invoke deep structural meaning.

Also, I'd like to think that some if not much of the value of my posts comes from the hypertext links I overload onto the words and phrases which your statistical approach has thus far addressed.

It's fairly easy to do lexical analyses, but substantially harder to get at things like deep-structural concepts as well as the separate issue of aesthetics/artistry.
Short of having subjective human raters, graders, and critics assess the same texts your analyses access I don't see how this lexical-statistical approach can allow us to get at what we each experience as `quality' experienced via a given post or even an influential link or reference within a post ... let alone the quality of a thread arising from the interAction of the participants.
Sometimes the valuable and/or interesting feature of a thread is the tango between pairs of participants more than seemingly unrelated solo contributions in which atypical words and phrases are used.

I'm interested in the `code' though ... perhaps a block diagram of it which might be used to re-implement the gist of your efforts via various languages as part of the Code Thread you mentioned and in which we've both participated.
 
Local time
Tomorrow, 01:36
Joined
Aug 26, 2010
Messages
4,621
#14
Okay so who is GPS and how has he written so much I haven't read?

Also I simplify my language for you folks, just saying. :D

(On an greater sombreness bulletin myself for sure does to speak the same subject matter a greatness when I argue)
 

higs

My word is my bond. Gold Bond
Local time
Today, 16:36
Joined
Apr 3, 2012
Messages
1,700
Location
Armchair
#15
Hey serac can you do mine ? :D love this
 

gps

INTP 5w4 Iconoclast
Local time
Today, 11:36
Joined
Mar 16, 2010
Messages
200
Location
Upstate NY, USA, Earth
#18
Okay so who is GPS and how has he written so much I haven't read?
He's some dickweed that showed up lusting after Jennywocky and -- in a fit of self-consciousnesses realized that his shortcomings, once again, would be his undoing -- redirected and rechannelized his Freudian-IDacious energies into generating streams of blather which limited-lexicon spell-checkers would erroneously mislead one to believe were unique.

As for reading any of that blather
 

Jennywocky

guud languager
Local time
Today, 11:36
Joined
Sep 25, 2008
Messages
10,614
Location
Charn
#20
I've heard if you say my name two more times, I actually appear *gargly laugh*

[bimgx=200]http://www.maskerix.com/wp-content/uploads/2017/03/diy-beetlejuice-halloween-costume-idea-4.jpg[/bimgx]

... wait, I'm already here. dammit! :ahh:


--------

I can't imagine what 50K words I posted here, I haven't even been posting much lately. That's, like, a whole book in NaNoWriMo...

Anyway... moah! Posting analysis is always interesting.

finally all that time spent reading thesauruses in the bath has paid off yay
I think I'll stick my Lovecraft reader in there so my word selection will become simultaneously freaky and elegant.
 

baccheion

Active Member
Local time
Today, 11:36
Joined
May 2, 2016
Messages
209
#21
Who are the most and least verbose members (words / post)? Also, how many posts have they written (more posts = more certainty)? Everything posted above could/should be normalized by number of posts.

What are my stats and where do I rank (out of how many)?
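The words-per-post normalization asked about here could be sketched as below. The data layout (a dict mapping each user to a list of post strings) and the name `verbosity` are assumptions for illustration only. The post count is reported next to each average so that averages based on few posts can be read with more caution.

```python
import re

def verbosity(posts_by_user):
    """Return (user, post count, mean words per post), most verbose first."""
    rows = []
    for user, posts in posts_by_user.items():
        if not posts:
            continue  # no posts, so no average to report
        total = sum(len(re.findall(r"[a-z']+", p.lower())) for p in posts)
        rows.append((user, len(posts), total / len(posts)))
    return sorted(rows, key=lambda row: row[2], reverse=True)

# Hypothetical toy data.
posts_by_user = {
    "alice": ["one two three", "four"],
    "bob": ["a b c d e f"],
}

table = verbosity(posts_by_user)
for user, n_posts, mean_words in table:
    print(f"{user:>10} posts={n_posts} words/post={mean_words:.1f}")
```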
 
Local time
Today, 09:36
Joined
Apr 4, 2010
Messages
5,678
Location
subjective
#22
Local time
Today, 16:36
Joined
Jun 7, 2017
Messages
1,428
Location
Stockholm
#23
Alright, folks. The word cloud has been refined now. Now it will reveal the hidden depths of your character by using something called "term frequency–inverse document frequency" to detect which words you tend to use that are different from the rest of the forum. Behold:

Animekitty:
[bimgx=500]https://intpforum.com/attachment.php?attachmentid=2952&stc=1&d=1511899730[/bimgx]


washti:
[bimgx=500]https://intpforum.com/attachment.php?attachmentid=2955&stc=1&d=1511899799[/bimgx]


Perfectly Normal Beast:
[bimgx=500]https://intpforum.com/attachment.php?attachmentid=2954&stc=1&d=1511899799[/bimgx]


higs:
[bimgx=500]https://intpforum.com/attachment.php?attachmentid=2953&stc=1&d=1511899799[/bimgx]
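For anyone curious how term frequency-inverse document frequency picks out a user's characteristic words, here is a minimal sketch of one textbook variant. It assumes each user's combined posts form one "document"; the weighting used for the clouds above may differ, and the toy texts and function names are made up for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and extract runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def tfidf(docs):
    """docs maps user -> combined text. Returns user -> {word: score}.

    tf  = occurrences of the word / total words by that user
    idf = log(number of users / number of users who used the word)
    """
    counts = {user: Counter(tokenize(text)) for user, text in docs.items()}
    doc_freq = Counter()
    for counter in counts.values():
        doc_freq.update(counter.keys())
    n_docs = len(docs)
    scores = {}
    for user, counter in counts.items():
        total = sum(counter.values())
        scores[user] = {
            word: (k / total) * math.log(n_docs / doc_freq[word])
            for word, k in counter.items()
        }
    return scores

def top_words(word_scores, k=2):
    """Return the k highest scoring (word, score) pairs."""
    return sorted(word_scores.items(), key=lambda kv: kv[1], reverse=True)[:k]

# Hypothetical toy corpus: one combined text per user.
docs = {
    "alice": "the cat likes milk milk",
    "bob": "the dog likes bones",
    "carol": "the bird likes seeds",
}
scores = tfidf(docs)
print(top_words(scores["alice"]))
```

Words that everyone uses, such as "the", get a score of zero because their inverse document frequency is log(1), while words used often by one user and rarely by others score highest. Those scores could then drive the font sizes in a cloud.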
 


redbaron

consummate salt-extraction specialist
Local time
Tomorrow, 01:36
Joined
Jun 10, 2012
Messages
6,671
Location
38S 145E
#28
i want a cloud too :mad:
 

Jennywocky

guud languager
Local time
Today, 11:36
Joined
Sep 25, 2008
Messages
10,614
Location
Charn
#32
Hmmm. My articulations clearly are inadequately blistering from an intellectual perspective and so I have determined I must ameliorate my lamentable effectuation as we move forward in order to bequeath a suitable propensity of phraseology for posterity.
 
Local time
Today, 09:36
Joined
Apr 4, 2010
Messages
5,678
Location
subjective
#33
Hmmm. My articulations clearly are inadequately blistering from an intellectual perspective and so I have determined I must ameliorate my lamentable effectuation as we move forward in order to bequeath a suitable propensity of phraseology for posterity.
Write fancier more often so as to look good to readers in the future.
 

cheese

Prolific Member
Local time
Tomorrow, 01:36
Joined
Aug 24, 2008
Messages
3,182
Location
internet/pubs
#35
Cloud please! Or teach us how to do it so we can cloud for a lifetime!

Most of the clouds' biggest words fit the general impression I have of the poster. (Not all though, for sure.)

Is this difficult or time-consuming for you? Also, is it telling you anything interesting?
 
Local time
Tomorrow, 01:36
Joined
Aug 26, 2010
Messages
4,621
#36
Okay so I've looked at my cloud and theorised that it might be using quotes as well as text. Thoughts, Serac?
 
Local time
Today, 16:36
Joined
Jan 9, 2016
Messages
99
#37
The fuck? I didn't even realize I had posted 1000 words.

Yeah....no, I'm nipping this shit in the bud
I'm going back into my hollow
I just want you all to know
I'm watching you
Intimately....:phear:
 

onesteptwostep

Think.. Be... ..buzz buzz :)
Local time
Tomorrow, 00:36
Joined
Dec 7, 2014
Messages
2,785
#38
mine mine mine
 
Local time
Today, 16:36
Joined
Jun 7, 2017
Messages
1,428
Location
Stockholm
#41
Okay so I've looked at my cloud and theorised that it might be using quotes as well as text. Thoughts, Serac?
Based on examples I've looked at, that doesn't seem to be the case. If you have a specific post where you think that might be happening, I can take a look.
 
Local time
Today, 16:36
Joined
Jun 7, 2017
Messages
1,428
Location
Stockholm
#42
Cloud please! Or teach us how to do it so we can cloud for a lifetime!

Most of the clouds' biggest words fit the general impression I have of the poster. (Not all though, for sure.)

Is this difficult or time-consuming for you? Also, is it telling you anything interesting?
It's definitely fascinating to look at this stuff. Also, natural language processing seems like a very interesting field to me in general, which I would like to study more.

Once the data is in, producing clouds is very easy.
 
Local time
Today, 16:36
Joined
Jun 7, 2017
Messages
1,428
Location
Stockholm
#46
If anyone has suggestions for interesting machine-learning or NLP techniques to apply in this setting, btw, I'm all ears.
 
Local time
Today, 17:36
Joined
Jan 1, 2009
Messages
3,694
#49
Nice, mediocrity, just what I was going for

I can guess my word cloud: well, though, maybe, perhaps, guess, some, however, also, might, but, people

All my previous 500 posts are basically the same

i want a cloud too :mad:
Shit and fuck were my first guesses. Not sure what to make of the milk and bowl, though. Hmmmm

What is your word cloud, serac?
 