Uiser:Illandancient/Lexis changes late September

By comparing the word frequency list for the Scots Leid Wikipedia from 20-Sep with the one from 01-Aug, we can identify which words have fallen out of favour. The word frequency lists can be found in a Google Sheet here. Note that words of fewer than two letters have been ignored (a, I, e, n, and o are all fine Scots words, but not here), as have words with only one occurrence.
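The comparison described above can be sketched in Python. This is only an illustration under stated assumptions: the tokenisation rule (runs of letters, lowercased) and the function names `word_frequencies`, `net_changes`, `biggest_fallers` and `biggest_gainers` are hypothetical and are not taken from the sheet. A word missing from a snapshot is treated as having zero occurrences.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count lowercased words, ignoring one-letter words and words seen only once."""
    # Assumption: a "word" is any unbroken run of letters.
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(w for w in words if len(w) >= 2)
    return Counter({w: n for w, n in counts.items() if n > 1})


def net_changes(old, new):
    """Return {word: new count minus old count} for every word in either snapshot."""
    return {w: new.get(w, 0) - old.get(w, 0) for w in set(old) | set(new)}


def biggest_fallers(changes, n=20):
    """The n words with the most negative change, ties broken alphabetically."""
    return sorted(changes.items(), key=lambda kv: (kv[1], kv[0]))[:n]


def biggest_gainers(changes, n=20):
    """The n words with the most positive change, ties broken alphabetically."""
    return sorted(changes.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```

Given the full text of each snapshot, `net_changes(word_frequencies(august_text), word_frequencies(september_text))` would give the figures to rank, where `august_text` and `september_text` are placeholder names for the two dumps.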

The twenty words whose occurrences have decreased the most are as follows:

Word Change in occurrences
is -1764
the -1352
keeng -1279
of -1075
and -1010
in -861
than -614
for -547
years -532
lairge -484
it -479
to -467
commune -413
la -397
built -338
with -335
region -308
by -279
well -256
on -254

The twenty words that have gained the most occurrences are as follows:

Word Change in occurrences Notes
king 1338 Replacing "keeng"
telly 835 Replacing "television" or "TV" perhaps?
dochter 683
wis 593 Replacing English "was"
nor 583 Replacing English "than" in "more than" and "less than"
muckle 579 Replacing "lairge"
year 535 Replacing the English plural "years"
mile 474 Replacing the English plural "miles"
biggit 319 Replacing "built"
creatit 316 Perhaps also replacing "built"
month 302 Replacing English plural "months"
an 299 Replacing English "and"
atween 285 Replacing "between"
tae 282 Replacing English "to"
weel 251 Replacing English "well"
his 250
he 206
prince 196 This is due to the hard work of anonymous IP editors creating many new pages about European nobility
gien 191
princess 189 As above

Perhaps the net change in occurrences is a poor metric to use. If we instead compare each word's rank in the two word frequency snapshots, the twenty words that have fallen the most places are as follows:

Word Notes
published From 607th position to 859th position, a fall of 252 places and -210 occurrences
santiago From 609th position to 826th position, a fall of 217 places and -186 occurrences
built From 452nd position to 660th position, a fall of 208 places and -338 occurrences
iran From 624th position to 817th position, a fall of 193 places and -161 occurrences
commune From 344th position to 522nd position, a fall of 178 places and -413 occurrences
brazilian From 663rd position to 827th position, a fall of 164 places and -123 occurrences
keeng From 128th position to 243rd position, a fall of 115 places and -1279 occurrences
villa From 714th position to 813th position, a fall of 99 places and -69 occurrences
each From 333rd position to 428th position, a fall of 95 places and -243 occurrences
developed From 603rd position to 696th position, a fall of 93 places and -109 occurrences
were From 537th position to 624th position, a fall of 87 places and -124 occurrences
than From 206th position to 288th position, a fall of 82 places and -614 occurrences
one From 830th position to 908th position, a fall of 78 places and -58 occurrences
from From 564th position to 641st position, a fall of 77 places and -109 occurrences
has From 668th position to 742nd position, a fall of 74 places and -65 occurrences
los From 427th position to 501st position, a fall of 74 places and -150 occurrences
being From 763rd position to 835th position, a fall of 72 places and -47 occurrences
with From 233rd position to 299th position, a fall of 66 places and -335 occurrences
force From 525th position to 583rd position, a fall of 58 places and -89 occurrences
range From 366th position to 423rd position, a fall of 57 places and -141 occurrences
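A rank comparison of this kind could be sketched as follows. This is a minimal illustration, not the method actually used for the table: the function names `ranks` and `rank_falls` are hypothetical, ties in frequency are broken alphabetically (an assumption), and only words present in both snapshots are considered.

```python
def ranks(counts):
    """Map word -> 1-based rank, most frequent first, ties broken alphabetically."""
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word: i for i, (word, _) in enumerate(ordered, start=1)}


def rank_falls(old, new, n=20):
    """Rows of (word, old rank, new rank, places fallen, change in occurrences)."""
    old_rank, new_rank = ranks(old), ranks(new)
    falls = [
        (w, old_rank[w], new_rank[w], new_rank[w] - old_rank[w], new[w] - old[w])
        for w in old_rank.keys() & new_rank.keys()
        if new_rank[w] > old_rank[w]  # a larger rank number means a lower position
    ]
    # Largest fall in places first, ties broken alphabetically.
    return sorted(falls, key=lambda row: (-row[3], row[0]))[:n]
```

Here `old` and `new` are dictionaries of word to occurrence count for the two snapshots.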

If we assume that, as the size of the Wikipedia changes, the proportion of each word ought to change in step with it, and if we control for English words being replaced, we can isolate the words that are increasing disproportionately. The following are mostly due to new articles about European nobility:

Word New occurrences
faither 79
princess 189
mairiage 100
queen 36
spaingie 34
wife 49
mairit 76
prince 196
louis 53
swaden 14
marie 42
europe 21
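One way to express the proportionality assumption is to scale each word's old count by the overall growth of the corpus and look at what is left over. The sketch below is only a guess at such a calculation: the function name `disproportionate_gains`, the `exclude` set (standing in for words already explained as replacements of English words) and the ordering by excess are all assumptions, not taken from the sheet.

```python
def disproportionate_gains(old, new, exclude=frozenset(), n=12):
    """Rows of (word, new occurrences, excess over the proportionally expected count)."""
    # Overall growth factor of the corpus between the two snapshots.
    scale = sum(new.values()) / sum(old.values())
    rows = []
    for word, count in new.items():
        if word in exclude:
            continue  # assumed to be explained by an English word being replaced
        expected = old.get(word, 0) * scale
        excess = count - expected
        if excess > 0:
            rows.append((word, count - old.get(word, 0), excess))
    # Largest excess first, ties broken alphabetically.
    return sorted(rows, key=lambda row: (-row[2], row[0]))[:n]
```

For example, with `old = {"the": 100, "prince": 10, "and": 50}` and `new = {"the": 110, "prince": 60, "an": 50}`, excluding `"an"` leaves only `prince`, with 50 new occurrences against an expected count of 13.75.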