{ "cells": [ { "cell_type": "markdown", "id": "chemical-amsterdam", "metadata": {}, "source": [ "TD 4 - Data analysis\n", "----------------------------\n", "\n", "In this notebook, we explore some basic statistical notions using Python libraries." ] }, { "cell_type": "code", "execution_count": 1, "id": "answering-statement", "metadata": {}, "outputs": [], "source": [ "import pandas\n", "import numpy as np\n", "import scipy, scipy.stats\n", "import random\n", "from matplotlib import pyplot as plt\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "id": "noted-amber", "metadata": {}, "source": [ "### Compositionality\n", "\n", "The compositionality dataset below comes from the compositionality prediction experiments described in [this paper](https://aclanthology.org/J19-1001/). We will focus on the column called _compositionality_, which contains average annotations on a scale from 0 to 5, given by about 15 to 20 human judges per compound noun, for a set of 180 French compound nouns. Details on how this dataset was built can be found [here](https://aclanthology.org/P16-2026/). The dataset also contains many other columns that we may explore later, including automatic compositionality predictions.\n", "\n", "##### Reading the data\n", "\n", "We will read the full dataset from a tab-separated file using pandas, a widely used Python library for data analysis." ] }, { "cell_type": "code", "execution_count": 2, "id": "major-orbit", "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n",
"<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>compound_lemma</th>\n", "      <th>compositionality</th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr>\n", "      <th>134</th>\n", "      <td>poule_mouillé</td>\n", "      <td>0.0000</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>127</th>\n", "      <td>pied_noir</td>\n", "      <td>0.1333</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>19</th>\n", "      <td>carte_blanc</td>\n", "      <td>0.2000</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>151</th>\n", "      <td>septième_ciel</td>\n", "      <td>0.2143</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>15</th>\n", "      <td>bouc_émissaire</td>\n", "      <td>0.2308</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>...</th>\n", "      <td>...</td>\n", "      <td>...</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>0</th>\n", "      <td>activité_physique</td>\n", "      <td>4.9333</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>55</th>\n", "      <td>eau_potable</td>\n", "      <td>5.0000</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>170</th>\n", "      <td>téléphone_portable</td>\n", "      <td>5.0000</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>96</th>\n", "      <td>matière_gras</td>\n", "      <td>5.0000</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>52</th>\n", "      <td>eau_chaud</td>\n", "      <td>5.0000</td>\n", "    </tr>\n",
"  </tbody>\n", "</table>\n", "<p>180 rows × 2 columns</p>\n", "</div>" ], "text/plain": [ " compound_lemma compositionality\n", "134 poule_mouillé 0.0000\n", "127 pied_noir 0.1333\n", "19 carte_blanc 0.2000\n", "151 septième_ciel 0.2143\n", "15 bouc_émissaire 0.2308\n", ".. ... ...\n", "0 activité_physique 4.9333\n", "55 eau_potable 5.0000\n", "170 téléphone_portable 5.0000\n", "96 matière_gras 5.0000\n", "52 eau_chaud 5.0000\n", "\n", "[180 rows x 2 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "results_df=pandas.read_csv('superjoined.norm.tsv', sep='\\t')\n", "\n", "results_df[['compound_lemma','compositionality']].sort_values('compositionality')" ] }, { "cell_type": "markdown", "id": "acute-carpet", "metadata": {}, "source": [ "##### Mean, standard deviation\n", "\n", "We will start by looking at some basic statistical descriptors of the _compositionality_ column.\n", "* Is the `std` value reported below the population or the sample standard deviation? To find out, implement both formulas manually and check which one matches the reported value."
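, "\n", "For reference, let $x_1, \\dots, x_N$ be the $N$ values of the column and $\\bar{x}$ their mean. This notation is introduced here for convenience. The two estimators differ only in their normalisation factor:\n", "\n", "$$\\sigma = \\sqrt{\\frac{1}{N}\\sum_{i=1}^{N}(x_i - \\bar{x})^2} \\qquad s = \\sqrt{\\frac{1}{N-1}\\sum_{i=1}^{N}(x_i - \\bar{x})^2}$$\n", "\n", "where $\\sigma$ is the population standard deviation and $s$ is the sample (Bessel-corrected) standard deviation."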
] }, { "cell_type": "code", "execution_count": 11, "id": "tested-suffering", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "count 180.000000\n", "mean 9309.922222\n", "std 13864.223072\n", "min 106.000000\n", "25% 1546.000000\n", "50% 4084.500000\n", "75% 13325.750000\n", "max 108842.000000\n", "Name: freq.w1&w2, dtype: float64" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "comp = results_df['compositionality']\n", "freq = results_df['freq.w1&w2']\n", "freq.describe()" ] }, { "cell_type": "markdown", "id": "seven-applicant", "metadata": {}, "source": [ "##### Histogram\n", "\n", "We can also look at the histogram to have an idea of the distribution of values.\n", "* Does this look like a normal distribution?\n", "* Play with the bin size and observe what happens with the histogram" ] }, { "cell_type": "code", "execution_count": 7, "id": "correct-belfast", "metadata": { "scrolled": false }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXAAAAD4CAYAAAD1jb0+AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8/fFQqAAAACXBIWXMAAAsTAAALEwEAmpwYAAAOoUlEQVR4nO3db4hc133G8efJWiFix5Uc7A6LLLqCGBcjEaUa3BSXsuvUYeuE2oFQIqiRicvmRVxcIihq3jQhDbi0tgsmL6pWRipVPTW1jIydpBWONsKQxN11ZK9kNbVxldaL0SIkK15jWuT8+mLvgljPakZ3/lz9NN8PDDv3zD1zf4fVPnt15ty7jggBAPL5SNUFAADKIcABICkCHACSIsABICkCHACSum6QB7vxxhtjfHy8VN/33ntPo6OjvS3oKseYhwNjHg7djHlubu5sRNy0un2gAT4+Pq7Z2dlSfWdmZjQxMdHbgq5yjHk4MObh0M2Ybf+8VTtTKACQFAEOAEkR4ACQFAEOAEkR4ACQFAEOAEkR4ACQFAEOAEkR4ACQ1ECvxASAKo3veb6yY++f6v2tAzgDB4CkCHAASKptgNv+mO2XbL9i+6TtbxbtW2z/xPYbtv/Z9kf7Xy4AYEUnZ+D/K+nOiPikpO2Spmx/WtJfSnosIj4h6bykB/pWJQDgQ9oGeCxbKjbXFY+QdKekfynaD0i6tx8FAgBac0S038kekTQn6ROSviPpryT9uDj7lu3Nkr4XEVtb9J2WNC1J9Xp9R7PZLFXo0tKSarVaqb5ZMebhwJgHZ37hwsCPuWLLhpHSY56cnJyLiMbq9o6WEUbEB5K2294o6RlJv97pgSNir6S9ktRoNKLsDc25AfxwYMzDoaox31/xMsJej/mKVqFExDuSjkr6LUkbba/8ArhZ0kJPKwMAXFYnq1BuKs68ZXu9pLskndJykH+x2G2XpMN9qhEA0EInUyhjkg4U8+AfkfRURDxn+zVJTdt/Iem
nkvb1sU4AwCptAzwiXpX0qRbtb0q6vR9FAQDa40pMAEiKAAeApAhwAEiKAAeApAhwAEiKAAeApAhwAEiKAAeApAhwAEiKAAeApAhwAEiKAAeApAhwAEiKAAeApAhwAEiKAAeApAhwAEiKAAeApAhwAEiKAAeApAhwAEiKAAeApAhwAEiKAAeApAhwAEiqbYDb3mz7qO3XbJ+0/VDR/g3bC7aPF4+7+18uAGDFdR3sc1HS7oh42fb1kuZsHyleeywi/rp/5QEA1tI2wCPibUlvF8/ftX1K0qZ+FwYAuDxHROc72+OSjknaKulrku6X9AtJs1o+Sz/fos+0pGlJqtfrO5rNZqlCl5aWVKvVSvXNijEPB8Y8OPMLFwZ+zBVbNoyUHvPk5ORcRDRWt3cc4LZrkn4o6dsRcch2XdJZSSHpW5LGIuLLl3uPRqMRs7OzV1y8JM3MzGhiYqJU36wY83BgzIMzvuf5gR9zxf6p0dJjtt0ywDtahWJ7naSnJR2MiEOSFBFnIuKDiPilpL+TdHupygAApXSyCsWS9kk6FRGPXtI+dsluX5B0ovflAQDW0skqlDsk3Sdp3vbxou3rknba3q7lKZTTkr7Sh/oAAGvoZBXKi5Lc4qXv9r4cAECnuBITAJIiwAEgKQIcAJIiwAEgKQIcAJIiwAEgKQIcAJIiwAEgKQIcAJIiwAEgKQIcAJIiwAEgKQIcAJIiwAEgqU7uBw7gGlT1nxdD9zgDB4CkCHAASIoAB4CkCHAASIoAB4CkCHAASIplhAAGbn7hgu6vcBnjtYIzcABIigAHgKTaBrjtzbaP2n7N9knbDxXtH7d9xPbrxdcb+l8uAGBFJ2fgFyXtjojbJH1a0ldt3yZpj6QXIuIWSS8U2wCAAWkb4BHxdkS8XDx/V9IpSZsk3SPpQLHbAUn39qlGAEALjojOd7bHJR2TtFXSf0fExqLdks6vbK/qMy1pWpLq9fqOZrNZqtClpSXVarVSfbNizMOhqjHPL1wY+DFX1NdLZ96v7PCV2LJhpPT3eXJyci4iGqvbOw5w2zVJP5T07Yg4ZPudSwPb9vmIuOw8eKPRiNnZ2SurvDAzM6OJiYlSfbNizMOhqjFXeTfC3dsu6pH54VrFvH9qtPT32XbLAO9oFYrtdZKelnQwIg4VzWdsjxWvj0laLFUZAKCUTlahWNI+Saci4tFLXnpW0q7i+S5Jh3tfHgBgLZ38H+YOSfdJmrd9vGj7uqSHJT1l+wFJP5f0B32pEADQUtsAj4gXJXmNlz/T23IAAJ3iSkwASIoAB4CkCHAASIoAB4CkCHAASIoAB4CkCHAASIoAB4CkhutuMiVVddOf/VOjlRy3SlX9rcTTD39u4McEusUZOAAkRYADQFIEOAAkRYADQFIEOAAkRYADQFIEOAAkRYADQFIEOAAkRYADQFIEOAAkRYADQFLczOoqxo2dAFwOZ+AAkBQBDgBJtQ1w20/YXrR94pK2b9hesH28eNzd3zIBAKt1cga+X9JUi/bHImJ78fhub8sCALTTNsAj4pikcwOoBQBwBbqZA3/Q9qvFFMsNPasIANARR0T7nexxSc9FxNZiuy7prKSQ9C1JYxHx5TX6TkualqR6vb6j2WyWKnRpaUm1Wq1U327NL1yo5Lj19dKZ9wd/3G2bNgz+oIXFcxcY8xCo6t92lbZsGCmdYZOTk3MR0VjdXirAO31ttUajEbOzsx0VvNrMzIwmJiZK9e1WVX/UePe2i3pkfvBL9atcB/74wcOMeQhU9W+7SvunRktnmO2WAV5qCsX22CWbX5B0Yq19AQD90fZXoO0nJU1IutH2W5L+XNKE7e1ankI5Lekr/SsRANBK2wCPiJ0tmvf1oRYAwBXgSkwASCrNpwhV3dgJAK5WnIEDQFIEOAAkRYADQFIEOAAkRYADQFIEOAAklWYZIdBPVd3vRpJ2b6vs0EiOM3AASIoAB4CkCHAASIoAB4CkCHAASIoAB4CkWEaID2FJHZA
DZ+AAkBQBDgBJEeAAkBQBDgBJEeAAkBQBDgBJEeAAkBQBDgBJEeAAkFTbALf9hO1F2ycuafu47SO2Xy++3tDfMgEAq3VyBr5f0tSqtj2SXoiIWyS9UGwDAAaobYBHxDFJ51Y13yPpQPH8gKR7e1sWAKAdR0T7nexxSc9FxNZi+52I2Fg8t6TzK9st+k5Lmpaker2+o9lslip08dwFnXm/VNe06uvFmIcAYx4OWzaMqFarleo7OTk5FxGN1e1d340wIsL2mr8FImKvpL2S1Gg0YmJiotRxHj94WI/MD9fNE3dvu8iYhwBjHg77p0ZVNv/WUnYVyhnbY5JUfF3sXUkAgE6UDfBnJe0qnu+SdLg35QAAOtXJMsInJf1I0q2237L9gKSHJd1l+3VJv1tsAwAGqO0kVETsXOOlz/S4FgDAFeBKTABIigAHgKQIcABIigAHgKQIcABIigAHgKQIcABIigAHgKQIcABIigAHgKQIcABIigAHgKQIcABIigAHgKQIcABIigAHgKQIcABIigAHgKQIcABIigAHgKQIcABIigAHgKQIcABIigAHgKSu66az7dOS3pX0gaSLEdHoRVEAgPa6CvDCZESc7cH7AACuAFMoAJCUI6J8Z/u/JJ2XFJL+NiL2tthnWtK0JNXr9R3NZrPUsRbPXdCZ90uXmlJ9vRjzEGDMw2HLhhHVarVSfScnJ+daTVF3G+CbImLB9q9KOiLpjyPi2Fr7NxqNmJ2dLXWsxw8e1iPzvZjxyWP3touMeQgw5uGwf2pUExMTpfrabhngXU2hRMRC8XVR0jOSbu/m/QAAnSsd4LZHbV+/8lzSZyWd6FVhAIDL6+b/MHVJz9heeZ9/iojv96QqAEBbpQM8It6U9Mke1gIAuAIsIwSApAhwAEiKAAeApAhwAEiKAAeApAhwAEiKAAeApAhwAEiKAAeApAhwAEiKAAeApAhwAEiKAAeApAhwAEiKAAeApAhwAEiKAAeApAhwAEiKAAeApAhwAEiKAAeApAhwAEiKAAeApAhwAEiKAAeApLoKcNtTtn9m+w3be3pVFACgvdIBbntE0nck/Z6k2yTttH1brwoDAFxeN2fgt0t6IyLejIj/k9SUdE9vygIAtOOIKNfR/qKkqYj4o2L7Pkm/GREPrtpvWtJ0sXmrpJ+VrPVGSWdL9s2KMQ8Hxjwcuhnzr0XETasbr+uunvYiYq+kvd2+j+3ZiGj0oKQ0GPNwYMzDoR9j7mYKZUHS5ku2by7aAAAD0E2A/7ukW2xvsf1RSV+S9GxvygIAtFN6CiUiLtp+UNK/ShqR9EREnOxZZR/W9TRMQox5ODDm4dDzMZf+EBMAUC2uxASApAhwAEgqRYAP2yX7tp+wvWj7RNW1DILtzbaP2n7N9knbD1VdU7/Z/pjtl2y/Uoz5m1XXNCi2R2z/1PZzVdcyCLZP2563fdz2bE/f+2qfAy8u2f9PSXdJekvLq192RsRrlRbWR7Z/R9KSpH+IiK1V19NvtsckjUXEy7avlzQn6d5r/HtsSaMRsWR7naQXJT0UET+uuLS+s/01SQ1JvxIRn6+6nn6zfVpSIyJ6fuFShjPwobtkPyKOSTpXdR2DEhFvR8TLxfN3JZ2StKnaqvorli0Vm+uKx9V9NtUDtm+W9DlJf191LdeCDAG+SdL/XLL9lq7xH+5hZntc0qck/aTiUvqumEo4LmlR0pGIuObHLOlvJP2ppF9WXMcghaR/sz1X3FqkZzIEOIaE7ZqkpyX9SUT8oup6+i0iPoiI7Vq+ivl229f0dJntz0tajIi5qmsZsN+OiN/Q8p1bv1pMkfZEhgDnkv0hUMwDPy3pYEQcqrqeQYqIdyQdlTRVcSn9doek3y/mhJuS7rT9j9WW1H8RsVB8XZT0jJanhXsiQ4Bzyf41rvhAb5+kUxHxaNX1DILtm2xvLJ6v1/KH9P9RaVF9FhF/FhE3R8S4ln+OfxARf1hxWX1le7T4YF62RyV9VlLPVpdd9QE
eERclrVyyf0rSU32+ZL9ytp+U9CNJt9p+y/YDVdfUZ3dIuk/LZ2THi8fdVRfVZ2OSjtp+VcsnKUciYiiW1Q2ZuqQXbb8i6SVJz0fE93v15lf9MkIAQGtX/Rk4AKA1AhwAkiLAASApAhwAkiLAASApAhwAkiLAASCp/weIsTvApgNC/wAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "comp.hist(bins=10) # you can play with bin size to see what happens (default=10)\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "sporting-effort", "metadata": {}, "source": [ "##### Central limit theorem - observation\n", "\n", "We would like to test empirically whether the central limit theorem is verified on this data. We randomly subsample $n$ times a subset of $k$ compounds, calculate the average, and then check its distribution with a histogram. \n", "\n", "* Change the values of $n$ to see how larger values look more and more like a normal bell-shaped curve. \n", "\n", "Of course, more objective normality tests would be required to check this intuition, but the visualisation is quite convincing." ] }, { "cell_type": "code", "execution_count": 9, "id": "mexican-finding", "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAEICAYAAACktLTqAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8/fFQqAAAACXBIWXMAAAsTAAALEwEAmpwYAAAWQ0lEQVR4nO3df5BdZ33f8fc3snEcryPZkXOrEQbZM4KpLQUF7VBKMnQ3pEEBgqFlqFRKLX4JiOmPiaaNIRmgMMx4mig0GZJQBXtsJ8RrF2PiOCbF47A1lBFkxRivDRhkI6dWXRlbRmKNx8na3/5xz+Lr9b3au/f3ffR+zdzRuc8595zPHt397rnPOee5kZlIksryE8MOIEnqPYu7JBXI4i5JBbK4S1KBLO6SVCCLuyQVyOIuSQWyuEstRMS5EXFTRDweEQ9ExL8ediapXacNO4A0wv4Q+HugBmwD/ioivpGZ9ww1ldSG8A5V6bki4izgMWBLZn6navtT4EhmXj7UcFIb7JaRmnsRsLhU2CvfAC4eUh5pVSzuUnMTwIllbceBs4eQRVo1i7vU3ALw08vafhr44RCySKtmcZea+w5wWkRsbmh7CeDJVI0FT6hKLUTEDJDAO6lfLXMr8AqvltE48Mhdau3XgTOBh4HrgPda2DUuPHKXpAJ55C5JBbK4S1KBLO6SVCCLuyQVaCQGDlu/fn1u2rRp4Nt9/PHHOeusswa+3U6Zt7/M23/jlnnU8x48ePCRzDyv2byRKO6bNm1ibm5u4NudnZ1lampq4NvtlHn7y7z9N26ZRz1vRDzQap7dMpJUIIu7JBXI4i5JBbK4S1KBLO6SVCCLuyQVyOIuSQWyuEtSgSzuklSgFe9QjYirgNcBD2fmlqrteuDF1SLrgB9k5raI2AR8C7i3mncgM9/T69BSNzZd/ldN26/eMbq3mUur1c7wA1cDnwCuXWrIzH+1NB0R+6h/K/yS+zJzW4/ySZI6sGJxz8w7qiPy54iIAN4M/FKPc0mSutDW1+xVxf2WpW6ZhvZXAr+XmZMNy91D/ZvjTw
C/nZlfarHOPcAegFqttn1mZqbzn6JDCwsLTExMDHy7nTJvb8wfOd60/YK1a0Yybyujun9PZtwyj3re6enpg0v1d7luR4XcRf2Lg5c8BLwgMx+NiO3A5yLi4sw8sfyFmbkf2A8wOTmZwxh5bdRHfFvOvL2x+yR97qOYt5VR3b8nM26Zxy1vo46vlomI04B/AVy/1JaZT2bmo9X0QeA+4EXdhpQkrU43l0L+MvDtzHxwqSEizouINdX0hcBm4P7uIkqSVqudSyGvA6aA9RHxIPChzLwS2Mmzu2QAXgl8JCL+AXgaeE9mHuttZGmwWl06efiK1w44idS+dq6W2dWifXeTthuBG7uPJUnqxkh8zZ40jjyi1yhz+AFJKpDFXZIKZHGXpAJZ3CWpQBZ3SSqQxV2SCmRxl6QCWdwlqUAWd0kqkMVdkgpkcZekAlncJalAFndJKpDFXZIKZHGXpAI5nrtUmT9yvOWXZ0vjxiN3SSqQxV2SCmRxl6QCrVjcI+KqiHg4Iu5uaPtwRByJiDurx2sa5r0/Ig5FxL0R8ep+BZcktdbOkfvVwI4m7R/PzG3V41aAiLgI2AlcXL3mjyJiTa/CSpLas2Jxz8w7gGNtru8SYCYzn8zM7wGHgJd1kU+S1IHIzJUXitgE3JKZW6rnHwZ2AyeAOWBvZj4WEZ8ADmTmn1XLXQl8PjM/02Sde4A9ALVabfvMzEwvfp5VWVhYYGJiYuDb7ZR5e2P+yPGm7bUz4egT3a9/68a13a+kDaO6f09m3DKPet7p6emDmTnZbF6n17n/MfBRIKt/9wFvX80KMnM/sB9gcnIyp6amOozSudnZWYax3U6ZtzdaXcu+d+si++Z7cOvH/ONNmw9f8dru191gVPfvyYxb5nHL26ijq2Uy82hmPpWZTwN/wjNdL0eA8xsWfX7VJkkaoI6Ke0RsaHj6RmDpSpqbgZ0RcUZEXABsBr7WXURJ0mqt+Bk0Iq4DpoD1EfEg8CFgKiK2Ue+WOQy8GyAz74mIG4BvAovAZZn5VF+SS5JaWrG4Z+auJs1XnmT5jwEf6yaUJKk7DhymIm1yADCd4hx+QJIKZHGXpAJZ3CWpQBZ3SSqQJ1Q11jxxKjXnkbskFcgjd40Fj9Cl1fHIXZIK5JG7NCCtPn30erRICTxyl6QiWdwlqUAWd0kqkMVdkgpkcZekAlncJalAFndJKpDFXZIKZHGXpAJZ3CWpQCsW94i4KiIejoi7G9p+JyK+HRF3RcRNEbGuat8UEU9ExJ3V45N9zC5JaqGdI/ergR3L2m4DtmTmzwHfAd7fMO++zNxWPd7Tm5iSpNVYceCwzLwjIjYta/tCw9MDwJt6nEs6ZTigmPohMnPlherF/ZbM3NJk3l8C12fmn1XL3UP9aP4E8NuZ+aUW69wD7AGo1WrbZ2ZmOv0ZOrawsMDExMTAt9upUznv/JHjPVnPydTOhKNP9H0zbdu6ce1J54/b+wHGL/Oo552enj6YmZPN5nU15G9E/BawCHy6anoIeEFmPhoR24HPRcTFmXli+Wszcz+wH2BycjKnpqa6idKR2dlZhrHdTp3KeXcP4Ms69m5dZN/86IyCffgtUyedP27vBxi/zOOWt1HHV8tExG7gdcBbsjr8z8wnM/PRavogcB/woh7klCStQkfFPSJ2AP8ZeH1m/qih/byIWFNNXwhsBu7vRVBJUvtW/AwaEdcBU8D6iHgQ+BD1q2POAG6LCIAD1ZUxrwQ+EhH/ADwNvCczj/Upu1Q0T7SqG+1cLbOrSfOVLZa9Ebix21A6dflF2FJveIeqJBXI4i5JBbK4S1KBLO6SVCCLuyQVyOIuSQWyuEtSgSzuklSg0RklSacUb1aS+ssjd0kqkMVdkgpkcZekAlncJalAFndJKpDFXZIKZHGXpAJ5nbtUCL+5SY08cpekAnnkLo2ZpSP0vVsX2e2dvmrB4q6+cpgBaTja6paJiKsi4uGIuLuh7dyIuC0ivlv9e07VHhHxBxFxKC
LuioiX9iu8JKm5dvvcrwZ2LGu7HLg9MzcDt1fPAX4V2Fw99gB/3H1MSdJqtFXcM/MO4Niy5kuAa6rpa4A3NLRfm3UHgHURsaEHWSVJbYrMbG/BiE3ALZm5pXr+g8xcV00H8FhmrouIW4ArMvPL1bzbgd/MzLll69tD/cieWq22fWZmpjc/0SosLCwwMTEx8O12ahzzfu/4U8OO0bbamXD0iWGnaF+7ebduXNv/MG0ax/fwKOednp4+mJmTzeb15IRqZmZEtPdX4pnX7Af2A0xOTubU1FQvoqzK7Owsw9hup8Yx774vPz7sGG3bu3WRffPjc41Bu3kPv2Wq/2HaNI7v4XHK26ib69yPLnW3VP8+XLUfAc5vWO75VZskaUC6Ke43A5dW05cCf9HQ/m+rq2ZeDhzPzIe62I4kaZXa+gwaEdcBU8D6iHgQ+BBwBXBDRLwDeAB4c7X4rcBrgEPAj4C39TizJGkFbRX3zNzVYtarmiybwGXdhJIkdcexZSSpQBZ3SSqQxV2SCmRxl6QCWdwlqUAWd0kqkMVdkgpkcZekAo3PKEkaac2+cWnv1kV8i0nD4ZG7JBXI4i5JBbK4S1KBLO6SVCCLuyQVyOIuSQWyuEtSgSzuklQgi7skFcjiLkkFsrhLUoE6HvgjIl4MXN/QdCHwQWAd8C7g+1X7BzLz1k63I0lavY6Le2beC2wDiIg1wBHgJuBtwMcz83d7EVCStHq96pZ5FXBfZj7Qo/VJkrrQq+K+E7iu4fn7IuKuiLgqIs7p0TYkSW2KzOxuBRHPA/4vcHFmHo2IGvAIkMBHgQ2Z+fYmr9sD7AGo1WrbZ2ZmusrRiYWFBSYmJga+3U6Nct75I8ef01Y7E44+MYQwHSo179aNa1vOa/b/ttJrujHK7+FmRj3v9PT0wcycbDavF8X9EuCyzPyVJvM2Abdk5paTrWNycjLn5ua6ytGJ2dlZpqamBr7dTo1y3lZf1rFvfny+rKPUvIeveG3Lec3+31Z6TTdG+T3czKjnjYiWxb0X3TK7aOiSiYgNDfPeCNzdg21Iklahq8OUiDgL+OfAuxua/2tEbKPeLXN42TxJ0gB0Vdwz83HgZ5a1vbWrRJKkrnmHqiQVyOIuSQWyuEtSgSzuklQgi7skFcjiLkkFGp/b8SR1pNVdqCqbR+6SVCCLuyQVyOIuSQWyz13Scwx6tEj1nkfuklQgi7skFcjiLkkFsrhLUoEs7pJUIIu7JBXISyFPcd6aLpXJI3dJKpDFXZIK1HW3TEQcBn4IPAUsZuZkRJwLXA9sAg4Db87Mx7rdliSpPb3qc5/OzEcanl8O3J6ZV0TE5dXz3+zRttQB+9alU0u/umUuAa6ppq8B3tCn7UiSmuhFcU/gCxFxMCL2VG21zHyomv5/QK0H25EktSkys7sVRGzMzCMR8bPAbcC/A27OzHUNyzyWmecse90eYA9ArVbbPjMz01WOTiwsLDAxMTHw7Xaqm7zzR473OM3KamfC0ScGvtmOmXdlWzeu7er1p9Lv3CBMT08fzMzJZvO6Lu7PWlnEh4EF4F3AVGY+FBEbgNnMfHGr101OTubc3FzPcrRrdnaWqampgW+3U93kHUaf+96ti+ybH59bKcy7sm6H/D2VfucGISJaFveuumUi4qyIOHtpGvgV4G7gZuDSarFLgb/oZjuSpNXp9s9+DbgpIpbW9eeZ+dcR8bfADRHxDuAB4M1dbkeStApdFffMvB94SZP2R4FXdbNuSVLnvENVkgpkcZekAo3PpQFqi3eiSgKP3CWpSBZ3SSqQxV2SCmRxl6QCeUJVUttanbDvdlgC9Z5H7pJUIIu7JBXI4i5JBbLPXVLX7IsfPR65S1KBLO6SVCCLuyQVyOIuSQWyuEtSgSzuklQgi7skFcjiLkkFsrhLUoE6vkM1Is4HrgVqQAL7M/P3I+LDwLuA71eLfiAzb+02qJ7hV+lJWkk3ww8sAnsz8+sRcTZwMCJuq+Z9PDN/t/t4kqROdF
zcM/Mh4KFq+ocR8S1gY6+CSZI6F5nZ/UoiNgF3AFuA3wB2AyeAOepH9481ec0eYA9ArVbbPjMz03WO1VpYWGBiYmLg2+3UUt75I8eHHaUttTPh6BPDTtE+8/be1o1rn/V8pffw8uWHbdRrxPT09MHMnGw2r+viHhETwP8CPpaZn42IGvAI9X74jwIbMvPtJ1vH5ORkzs3NdZWjE7Ozs0xNTQ18u51ayjsufe57ty6yb358Bh41b/+tlHnURpEc9RoRES2Le1dXy0TE6cCNwKcz87MAmXk0M5/KzKeBPwFe1s02JEmr13Fxj4gArgS+lZm/19C+oWGxNwJ3dx5PktSJbj7T/QLwVmA+Iu6s2j4A7IqIbdS7ZQ4D7+5iG5KkDnRztcyXgWgyy2vaJWnIvENVkgpkcZekAo3XdVSnmOWXPO7dusjuMbkMUtJweeQuSQWyuEtSgeyWkTQyWt19PWp3ro4Dj9wlqUAWd0kqkMVdkgpkn/sIGJdRHiWND4v7AFnEJQ2K3TKSVCCP3LvgZVvSaPJ30yN3SSqSxV2SCmS3TB944lTSsHnkLkkF8si9gSdhJJXilCzuS0Xc8dGl8dDvrs5W69+7dZGpvm65f+yWkaQC9e3IPSJ2AL8PrAE+lZlX9GtbrfTqr70nSKUynEq/y305co+INcAfAr8KXATsioiL+rEtSdJz9evI/WXAocy8HyAiZoBLgG/2Y2On0l9jSaNttfWoXxdsRGb2fqURbwJ2ZOY7q+dvBf5JZr6vYZk9wJ7q6YuBe3seZGXrgUeGsN1Ombe/zNt/45Z51PO+MDPPazZjaFfLZOZ+YP+wtg8QEXOZOTnMDKth3v4yb/+NW+Zxy9uoX1fLHAHOb3j+/KpNkjQA/SrufwtsjogLIuJ5wE7g5j5tS5K0TF+6ZTJzMSLeB/xP6pdCXpWZ9/RjW10aardQB8zbX+btv3HLPG55f6wvJ1QlScPlHaqSVCCLuyQVqLjiHhHnR8QXI+KbEXFPRPyHJstERPxBRByKiLsi4qUN8y6NiO9Wj0tHJO9bqpzzEfGViHhJw7zDVfudETHX77yryDwVEcerXHdGxAcb5u2IiHur/X/5iOT9Tw1Z746IpyLi3GreQPdxRPxkRHwtIr5R5f0vTZY5IyKur/bhVyNiU8O891ft90bEq0ck729U+/+uiLg9Il7YMO+phn0/kAsv2sy8OyK+35DtnQ3zBlonOpKZRT2ADcBLq+mzge8AFy1b5jXA54EAXg58tWo/F7i/+vecavqcEcj7iqUc1Id0+GrDvMPA+hHcx1PALU1euwa4D7gQeB7wjeWvHUbeZcv/GvA3w9rH1ftyopo+Hfgq8PJly/w68MlqeidwfTV9UbVPzwAuqPb1mhHIOw38VDX93qW81fOFQe3bVWbeDXyiyWsHXic6eRR35J6ZD2Xm16vpHwLfAjYuW+wS4NqsOwCsi4gNwKuB2zLzWGY+BtwG7Bh23sz8SpUH4AD1+waGps193MqPh6bIzL8Hloam6JsO8u4CrutnppOp3pcL1dPTq8fyKx8uAa6ppj8DvCoiomqfycwnM/N7wCHq+3yoeTPzi5n5o+rpKLyH29nHrQy8TnSiuOLeqPqo+vPU/yo32gj8n4bnD1ZtrdoH4iR5G72D+qeOJQl8ISIOVkM6DNQKmf9p9bH38xFxcdU20vs4In6K+i/qjQ3NA9/HEbEmIu4EHqZeSFq+hzNzETgO/AxD2r9t5G20/D38kxExFxEHIuINfYz5LG1m/pdVV9JnImLpxsyhvofbVeyXdUTEBPVf0P+YmSeGnWcl7eSNiGnqvxi/2ND8i5l5JCJ+FrgtIr6dmXf0P/GKmb9OfdyLhYh4DfA5YPMgcrXS5nvi14D/nZnHGtoGvo8z8ylgW0SsA26KiC2ZeXc/t9mNdvNGxL8BJoF/1tD8wmr/Xgj8TUTMZ+Z9I5D5L4HrMvPJiHg39U9Kv9TvXL1S5JF7RJxO/Zf405n52SaLtB
oeYSjDJrSRl4j4OeBTwCWZ+ehSe2Yeqf59GLiJPn8Eb8hz0syZeWLpY29m3gqcHhHrGeF9XNnJsi6ZYe3japs/AL7Icz/2/3g/RsRpwFrgUYY89MdJ8hIRvwz8FvD6zHyy4TVL+/d+YJb6J6uBaZU5Mx9tyPkpYHs1PR7Dqwy707/XD+onSq4F/ttJlnktzz6h+rV85kTJ96ifJDmnmj53BPK+gHrf6SuWtZ8FnN0w/RXqo3GOwj7+Rzxzk9zLgL+rXnca9RNQF/DMCdWLh523Wm4tcAw4a5j7GDgPWFdNnwl8CXjdsmUu49knVG+opi/m2SdU76f/J1Tbyfvz1E/ubl7Wfg5wRjW9HvgufT7BvorMGxqm3wgcqKYHXic6eZTYLfMLwFuB+ao/DeAD1AskmflJ4FbqV8wcAn4EvK2adywiPkp9bByAj+SzP54PK+8Hqfen/lH9nBmLWR+prkb94yTUi+afZ+Zf9zlvu5nfBLw3IhaBJ4CdWf/NGMbQFO3khfov8Bcy8/GG1w5jH28Aron6l978BPXCfUtEfASYy8ybgSuBP42IQ9T/IO2sfpZ7IuIG6t+dsAhclvXuh2Hn/R1gAvgf1b78u8x8PfCPgf8eEU9Xr70iM/vyvQ8dZP73EfF66vvxGPWrZ4ZVJ1bN4QckqUBF9rlL0qnO4i5JBbK4S1KBLO6SVCCLuyQVyOIuSQWyuEtSgf4/kU2o1VDt5kcAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "n = 3000 # number of resamples, increase and see what happens\n", "k = 30 # sample size\n", "avg = []\n", "for i in range(n):\n", " subsample = comp.sample(k)\n", " avg.append(subsample.mean())\n", "avgDF = pandas.DataFrame(avg)\n", "avgDF.hist(bins=min(int(n/20),50))\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "medium-priest", "metadata": {}, "source": [ "##### Standardisation\n", "\n", "Let us now standardise the average `compositionality` values sampled above so that they are centered around zero and have unit standard deviation" ] }, { "cell_type": "code", "execution_count": 10, "id": "physical-residence", "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAEICAYAAACktLTqAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8/fFQqAAAACXBIWXMAAAsTAAALEwEAmpwYAAAUI0lEQVR4nO3df4zkd33f8ecrhlDLCzbU7up6ON0gGVR8l1x7K1q1FdotCXEAxVCl1FeLciHhcAttql4VHIhiCrLkNjE0TdK0l9rCNK7XqMbBNU6LS7M1/OEmd8hwNsbUpmfFV/cu/sGZNRbJ2e/+sbOwOc/ezs7M7sx89vmQVp75zHe+39edd1/32c985zupKiRJbfmBUQeQJA2f5S5JDbLcJalBlrskNchyl6QGWe6S1CDLXZIaZLlLa0jyqiS3J3k2yaNJ/v6oM0m9esmoA0hj7DeBPwGmgT3A55J8paoeGGkqqQfxHarSiyU5D3ga2FVV3+iM/UfgeFVdPdJwUg9clpG6ey1weqXYO74CXDqiPNKGWO5Sd1PAM2eMnQJePoIs0oZZ7lJ3S8Arzhh7BfDtEWSRNsxyl7r7BvCSJJesGvtRwBdTNRF8QVVaQ5IFoICfY/lsmbuAv+HZMpoEztyltf0j4FzgJHAL8A8tdk0KZ+6S1CBn7pLUIMtdkhpkuUtSgyx3SWrQWFw47MILL6yZmZmRHPvZZ5/lvPPOG8mx+zVpmSctL5h5K0xaXhi/zEeOHHmiqi7q9thYlPvMzAyHDx8eybEXFxeZm5sbybH7NWmZJy0vmHkrTFpeGL/MSR5d6zGXZSSpQZa7JDXIcpekBlnuktQgy12SGmS5S1KDLHdJapDlLkkNstwlqUHrvkM1yY3A24CTVbWrM3Yr8LrOJhcA36qqPUlmgAeBhzqP3VtVVw07tDSImas/13X82HVv3eIk0ubp5fIDnwR+A/jUykBV/b2V20muZ/lT4Vc8UlV7hpRPktSHdcu9qu7pzMhfJEmAdwJ/e8i5JEkD6Olj9jrlfufKssyq8TcCH6+q2VXbPcDyJ8c/A/xSVX1xjX0eAA4ATE9P711YWOj/TzGApaUlpqamRnLsfk1a5nHLe/T4qa7ju3ee/73b45a5F5OWedLywvhlnp+fP7LSv2ca9KqQ+1j+4OAVjwM/VFVPJtkL/G6SS6vqmTOfWFWHgEMAs7OzNaorrY3bVd56MWmZ
xy3v/rXW3K+c+97tccvci0nLPGl5YbIy9322TJKXAH8HuHVlrKq+W1VPdm4fAR4BXjtoSEnSxgxyKuSPAV+vqsdWBpJclOSczu3XAJcA3xwsoiRpo3o5FfIWYA64MMljwDVVdQNwBX92SQbgjcBHk/wp8AJwVVU9NdzI0tby1ElNol7Oltm3xvj+LmO3AbcNHkuSNIix+Jg9aRI5o9c48/IDktQgy12SGmS5S1KDLHdJapDlLkkNstwlqUGWuyQ1yHKXpAZZ7pLUIMtdkhpkuUtSgyx3SWqQ5S5JDbLcJalBlrskNcjruUsdq6/PfnD36TU/SFuaBM7cJalBlrskNchyl6QGrVvuSW5McjLJ/avGPpLkeJL7Ol9vWfXYLyZ5OMlDSX5is4JLktbWy8z9k8BlXcY/UVV7Ol93ASR5PXAFcGnnOf82yTnDCitJ6s265V5V9wBP9bi/y4GFqvpuVf0f4GHgDQPkkyT1IVW1/kbJDHBnVe3q3P8IsB94BjgMHKyqp5P8BnBvVf1OZ7sbgN+rqv/cZZ8HgAMA09PTexcWFobx59mwpaUlpqamRnLsfk1a5nHLe/T4qXW3mT4XTjzX3/537zy/vycOaNz+ntczaXlh/DLPz88fqarZbo/1e577bwEfA6rz3+uB92xkB1V1CDgEMDs7W3Nzc31GGczi4iKjOna/Ji3zuOXt5fz1g7tPc/3RPn88jj7bdfjYdW/tb389Gre/5/VMWl6YrMx9nS1TVSeq6vmqegH4bb6/9HIcuHjVpq/ujEmStlBf5Z5kx6q77wBWzqS5A7giycuS/DBwCfAHg0WUJG3Uur93JrkFmAMuTPIYcA0wl2QPy8syx4D3AVTVA0k+DXwNOA28v6qe35TkkqQ1rVvuVbWvy/ANZ9n+WuDaQUJJkgbjhcPUpBkv+qVtzssPSFKDLHdJapDlLkkNstwlqUG+oKqJ5gunUnfO3CWpQc7cNRGcoUsb48xdkhrkzF3aImv99rHZV4vU9uTMXZIaZLlLUoMsd0lqkOUuSQ2y3CWpQZa7JDXIcpekBlnuktQgy12SGmS5S1KD1i33JDcmOZnk/lVjv5Lk60m+muT2JBd0xmeSPJfkvs7Xv9vE7JKkNfQyc/8kcNkZY3cDu6rqR4BvAL+46rFHqmpP5+uq4cSUJG3EuhcOq6p7ksycMfb5VXfvBX56yLmkbcMLimkzpKrW32i53O+sql1dHvsvwK1V9Tud7R5geTb/DPBLVfXFNfZ5ADgAMD09vXdhYaHfP8NAlpaWmJqaGsmx+zVpmYeR9+jxU0NK05vpc+HEc1t6yBfZvfP8DW2/Hb8vttq4ZZ6fnz9SVbPdHhvokr9JPgycBm7uDD0O/FBVPZlkL/C7SS6tqmfOfG5VHQIOAczOztbc3NwgUfq2uLjIqI7dr0nLPIy8+7f4wzoO7j7N9UdHe0XsY1fObWj77fh9sdUmKXPfZ8sk2Q+8DbiyOtP/qvpuVT3ZuX0EeAR47RBySpI2oK9yT3IZ8AvAT1XVd1aNX5TknM7t1wCXAN8cRlBJUu/W/b0zyS3AHHBhkseAa1g+O+ZlwN1JAO7tnBnzRuCjSf4UeAG4qqqe2qTsUtN8oVWD6OVsmX1dhm9YY9vbgNsGDaXtyw/ClobDd6hKUoMsd0lqkOUuSQ2y3CWpQZa7JDXIcpekBlnuktQgy12SGjTaKyNp2/LNStLmcuYuSQ2y3CWpQZa7JDXIcpekBlnuktQgy12SGmS5S1KDPM9dasTR46e6fpC4n9y0PTlzl6QGOXOXJsxa7+49uHuLg2isWe7aVDNXf46Du093XS6QtHl6WpZJcmOSk0nuXzX2qiR3J/nfnf++sjOeJP8mycNJvprkr25WeElSd72uuX8SuOyMsauBL1TVJcAXOvcBfhK4pPN1APitwWNKkjaip3KvqnuAp84Yvhy4qXP7JuDtq8Y/VcvuBS5IsmMIWSVJPUpV9bZhMgPcWVW7Ove/VVUXdG4H
eLqqLkhyJ3BdVX2p89gXgA9W1eEz9neA5Zk909PTexcWFobzJ9qgpaUlpqamRnLsfk1S5qPHTzF9Lpx4btRJNqalzLt3nr/1YXowSd/HK8Yt8/z8/JGqmu322FBeUK2qStLbvxLff84h4BDA7Oxszc3NDSPKhi0uLjKqY/drkjLv77ygev3RyXrtvqXMx66c2/owPZik7+MVk5R5kPPcT6wst3T+e7Izfhy4eNV2r+6MSZK2yCDlfgfw7s7tdwOfXTX+Dzpnzfx14FRVPT7AcSRJG9TT751JbgHmgAuTPAZcA1wHfDrJzwKPAu/sbH4X8BbgYeA7wM8MObMkaR09lXtV7VvjoTd12baA9w8SSpI0GK8tI0kNstwlqUGWuyQ1yHKXpAZZ7pLUIMtdkhpkuUtSgyx3SWrQZF0ZSWNrrY9+kzQaztwlqUGWuyQ1yHKXpAZZ7pLUIMtdkhpkuUtSgyx3SWqQ5S5JDbLcJalBlrskNchyl6QG9X1tmSSvA25dNfQa4JeBC4D3An/cGf9QVd3V73EkSRvXd7lX1UPAHoAk5wDHgduBnwE+UVW/OoyAkqSNG9ayzJuAR6rq0SHtT5I0gGGV+xXALavufyDJV5PcmOSVQzqGJKlHqarBdpD8IPB/gUur6kSSaeAJoICPATuq6j1dnncAOAAwPT29d2FhYaAc/VpaWmJqamokx+7XOGY+evzUmo9NnwsnntvCMEPQUubdO89f8zlr/X8723OGZRy/j9czbpnn5+ePVNVst8eGUe6XA++vqjd3eWwGuLOqdp1tH7Ozs3X48OGBcvRrcXGRubm5kRy7X+OY+Wwf1nFw92muPzpZnwvTUuZj1711zees9f/tbM8ZlnH8Pl7PuGVOsma5D2NZZh+rlmSS7Fj12DuA+4dwDEnSBgw0NUlyHvDjwPtWDf+rJHtYXpY5dsZjkqQtMFC5V9WzwJ8/Y+xdAyWSJA3Md6hKUoMsd0lqkOUuSQ2y3CWpQZa7JDXIcpekBk3WW/AkbdjZ3j2sdjlzl6QGWe6S1CDLXZIa5Jq7pBcZ5dUiNRzO3CWpQZa7JDXIcpekBlnuktQgy12SGmS5S1KDPBVym/Ot6VKbnLlLUoMsd0lq0MDLMkmOAd8GngdOV9VsklcBtwIzwDHgnVX19KDHkiT1Zlhr7vNV9cSq+1cDX6iq65Jc3bn/wSEdS31wbV3aXjZrWeZy4KbO7ZuAt2/ScSRJXQyj3Av4fJIjSQ50xqar6vHO7f8HTA/hOJKkHqWqBttBsrOqjif5C8DdwD8G7qiqC1Zt83RVvfKM5x0ADgBMT0/vXVhYGChHv5aWlpiamhrJsfvVT+ajx09tUpr1TZ8LJ54b2eH7Yubudu88f2j72i4/e5tpfn7+SFXNdnts4HL/MztLPgIsAe8F5qrq8SQ7gMWqet1az5udna3Dhw8PLcdGLC4uMjc3N5Jj96ufzKNccz+4+zTXH52st1SYubthXvJ3u/zsbaYka5b7QMsySc5L8vKV28CbgfuBO4B3dzZ7N/DZQY4jSdqYQf+ZnwZuT7Kyr/9UVf81yR8Cn07ys8CjwDsHPI4kaQMGKveq+ibwo13GnwTeNMi+JUn98x2qktQgy12SGjRZpwNoXb4TVRI4c5ekJlnuktQgy12SGmS5S1KDfEFVUs/WesF+mJcl0HA4c5ekBlnuktQgy12SGuSau6SBuRY/fpy5S1KDLHdJapDlLkkNstwlqUGWuyQ1yHKXpAZZ7pLUIMtdkhpkuUtSg/p+h2qSi4FPAdNAAYeq6teSfAR4L/DHnU0/VFV3DRpU33f0+Cn2+3F6ks5ikMsPnAYOVtWXk7wcOJLk7s5jn6iqXx08niSpH32Xe1U9Djzeuf3tJA8CO4cVTJLUv1TV4DtJZoB7gF3APwP2A88Ah1me3T/d5TkHgAMA09PTexcWFgbO0Y+lpSWmpqZGcux+nXzqFCeeG3WK3k2fy0TlBTMPy+6d53cdP3r8VNe8a20/
LsatL+bn549U1Wy3xwYu9yRTwP8Erq2qzySZBp5geR3+Y8COqnrP2fYxOztbhw8fHihHvxYXF5mbmxvJsfv16zd/luuPTs4FPQ/uPj1RecHMW6Fb3nG/iuS49UWSNct9oLNlkrwUuA24uao+A1BVJ6rq+ap6Afht4A2DHEOStHF9l3uSADcAD1bVx1eN71i12TuA+/uPJ0nqxyC/w/1N4F3A0ST3dcY+BOxLsoflZZljwPsGOIYkqQ+DnC3zJSBdHvKcdkkaMd+hKkkNstwlqUGTc97UNrTWhw4f3L3FQSRNHGfuktQgy12SGuSyjKSxsdZS5Li/c3UcOXOXpAZZ7pLUIMtdkhrkmvsYWGudUZL6ZblvIUtc0lZxWUaSGuTMfQCetiWNJ382nblLUpMsd0lqkMsym8AXTiWNmjN3SWqQM/dVfBFGUiu2ZbmvLvGDu0+z32UUaaxt9lLn2fY/qZM7l2UkqUGbNnNPchnwa8A5wH+oqus261hrGda/9r5AKrVhO/0sb8rMPck5wG8CPwm8HtiX5PWbcSxJ0ott1sz9DcDDVfVNgCQLwOXA1zbjYNvpX2NJ422jfbRZa/qpquHvNPlp4LKq+rnO/XcBf62qPrBqmwPAgc7d1wEPDT1Iby4EnhjRsfs1aZknLS+YeStMWl4Yv8x/qaou6vbAyM6WqapDwKFRHX9FksNVNTvqHBsxaZknLS+YeStMWl6YrMybdbbMceDiVfdf3RmTJG2BzSr3PwQuSfLDSX4QuAK4Y5OOJUk6w6Ysy1TV6SQfAP4by6dC3lhVD2zGsYZg5EtDfZi0zJOWF8y8FSYtL0xQ5k15QVWSNFq+Q1WSGmS5S1KDLHcgyceSfDXJfUk+n+QvjjrTepL8SpKvd3LfnuSCUWc6myR/N8kDSV5IMtankiW5LMlDSR5OcvWo86wnyY1JTia5f9RZepHk4iS/n+Rrne+Jnx91prNJ8ueS/EGSr3Ty/otRZ+qFa+5AkldU1TOd2/8EeH1VXTXiWGeV5M3A/+i8eP0vAarqgyOOtaYkfxl4Afj3wD+vqsMjjtRV59IZ3wB+HHiM5TO/9lXVpry7ehiSvBFYAj5VVbtGnWc9SXYAO6rqy0leDhwB3j6uf8dJApxXVUtJXgp8Cfj5qrp3xNHOypk7sFLsHecBY/8vXlV9vqpOd+7ey/J7CcZWVT1YVaN6F/JGfO/SGVX1J8DKpTPGVlXdAzw16hy9qqrHq+rLndvfBh4Edo421dpq2VLn7ks7X2PfEZZ7R5Jrk/wRcCXwy6POs0HvAX5v1CEasRP4o1X3H2OMi2fSJZkB/grwv0Yc5aySnJPkPuAkcHdVjXVe2EblnuS/J7m/y9flAFX14aq6GLgZ+MDZ97Y11svc2ebDwGmWc49UL3mlFUmmgNuAf3rGb89jp6qer6o9LP+G/IYkY7/8tW0+iamqfqzHTW8G7gKu2cQ4PVkvc5L9wNuAN9UYvHiygb/jcealM7ZAZ+36NuDmqvrMqPP0qqq+leT3gcuAsX4Be9vM3M8mySWr7l4OfH1UWXrV+TCUXwB+qqq+M+o8DfHSGZus8wLlDcCDVfXxUedZT5KLVs5GS3Iuyy+2j39HjMGEb+SS3MbyZYdfAB4FrqqqsZ6tJXkYeBnwZGfo3nE+wyfJO4BfBy4CvgXcV1U/MdJQa0jyFuBf8/1LZ1w72kRnl+QWYI7ly9GeAK6pqhtGGuoskvwt4IvAUZZ/5gA+VFV3jS7V2pL8CHATy98PPwB8uqo+OtpU67PcJalBLstIUoMsd0lqkOUuSQ2y3CWpQZa7JDXIcpekBlnuktSg/w9E1XzAuso+LgAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "std_avgDF = (avgDF - avgDF.mean()) / avgDF.std()\n", "std_avgDF.hist(bins=min(int(n/20),50))\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "rational-exchange", "metadata": {}, "source": [ "### Compositionality and frequency\n", "\n", "We would like to study the relationship between compositionality and compound frequency. Let us extract these two variables from the dataset." ] }, { "cell_type": "code", "execution_count": 90, "id": "heard-bishop", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
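<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>compositionality</th>\n", "      <th>freq.w1&amp;w2</th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr>\n", "      <th>0</th>\n", "      <td>4.9333</td>\n", "      <td>13292</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>1</th>\n", "      <td>3.6000</td>\n", "      <td>19681</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>2</th>\n", "      <td>4.6000</td>\n", "      <td>14437</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>3</th>\n", "      <td>3.6364</td>\n", "      <td>540</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>4</th>\n", "      <td>1.3077</td>\n", "      <td>1259</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>...</th>\n", "      <td>...</td>\n", "      <td>...</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>175</th>\n", "      <td>3.8000</td>\n", "      <td>10067</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>176</th>\n", "      <td>4.6923</td>\n", "      <td>5529</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>177</th>\n", "      <td>4.3571</td>\n", "      <td>1395</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>178</th>\n", "      <td>3.9231</td>\n", "      <td>19950</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>179</th>\n", "      <td>3.2000</td>\n", "      <td>859</td>\n", "    </tr>\n",
"  </tbody>\n", "</table>\n", "<p>180 rows × 2 columns</p>\n", "</div>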
" ], "text/plain": [ " compositionality freq.w1&w2\n", "0 4.9333 13292\n", "1 3.6000 19681\n", "2 4.6000 14437\n", "3 3.6364 540\n", "4 1.3077 1259\n", ".. ... ...\n", "175 3.8000 10067\n", "176 4.6923 5529\n", "177 4.3571 1395\n", "178 3.9231 19950\n", "179 3.2000 859\n", "\n", "[180 rows x 2 columns]" ] }, "execution_count": 90, "metadata": {}, "output_type": "execute_result" } ], "source": [ "results_df\n", "compfreq = results_df[['compositionality','freq.w1&w2']]\n", "compfreq" ] }, { "cell_type": "markdown", "id": "moderate-banana", "metadata": {}, "source": [ "##### Scatter plot\n", "\n", "Let's start by visually inspecting the relation between the two quantities with a scatter plot." ] }, { "cell_type": "code", "execution_count": 93, "id": "efficient-corrections", "metadata": { "scrolled": true }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYkAAAD4CAYAAAAZ1BptAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8/fFQqAAAACXBIWXMAAAsTAAALEwEAmpwYAAAhfUlEQVR4nO3df6xc5X3n8ffX10MYSOFCYyG4htraIkeQKHG4S9hlVTWwiw35gZV082O3xRuh8EeS3cBGbk1VifzoKq68W1qkNBIKbKDJBgiwN04hvbECqyrRmnDNNTiGeHGTAh5IcGtfkuBbuL7+7h/zjBnPPc+cc2bOzJmZ+3lJlmeeOWfmOefOPN/z/Dzm7oiIiCRZUXYGRERkcClIiIhIlIKEiIhEKUiIiEiUgoSIiEStLDsDRXvLW97ia9asKTsbIiJDZffu3f/o7qta00cuSKxZs4aZmZmysyEiMlTM7LmkdDU3iYhIlIKEiIhEKUiIiEiUgoSIiEQpSIiISNTIjW4SERkWU7M1tk/v58W5ec4br7Jlwzo2rZ8oO1snUZAQESnB1GyNmx/cy/zCIgC1uXlufnAvwEAFCjU3iYiUYPv0/hMBomF+YZHt0/tLylEyBQkRkRK8ODefK70sChIiIiU4b7yaK70sChIiIiXYsmEd1crYSWnVyhhbNqwrKUfJ1HEtIlKCRue0RjeJiEiiTesnBi4otFJzk4iIRClIiIhIlIKEiIhEKUiIiEiUgoSIiEQpSIiISJSChIiIRClIiIhIVGqQMLM7zexlM/txU9rZZrbTzJ4N/58V0s3MbjOzA2b2lJm9q2mfzWH7Z81sc1P6JWa2N+xzm5lZu88QEZH+yVKT+BqwsSVtK/B9d78Q+H54DnA1cGH4dwPwFagX+MAtwLuBS4Fbmgr9rwCfaNpvY8pniIhIn6QGCXf/O+BwS/K1wF3h8V3Apqb0u71uFzBuZucCG4Cd7n7Y3Y8AO4GN4bUz3H2Xuztwd8t7JX2GiIj0Sad9Eue4+0vh8c+Bc8LjCeCFpu0OhrR26QcT0tt9xhJmdoOZzZjZzKFDhzo4HBERSdJ1x3WoAXgBeen4M9z9dnefdPfJVatW9TIrIiLLSqdB4hehqYjw/8shvQac37Td6pDWLn11Q
nq7zxARkT7pNEjsABojlDYD325Kvy6McroMeCU0GU0DV5nZWaHD+ipgOrz2SzO7LIxquq7lvZI+Q0RE+iT1fhJm9k3gd4G3mNlB6qOUtgH3mdn1wHPAh8PmDwPXAAeAo8DHAdz9sJl9EXg8bPcFd290hn+S+giqKvDd8I82nyEiIn1i9eb+0TE5OekzMzNlZ0NEZKiY2W53n2xN14xrERGJUpAQEZEoBQkREYlSkBARkSgFCRERiVKQEBGRKAUJERGJUpAQEZEoBQkREYlSkBARkSgFCRERiVKQEBGRKAUJERGJUpAQEZEoBQkREYlSkBARkSgFCRERiVKQEBGRKAUJERGJUpAQEZEoBQkREYlSkBARkSgFCRERiVKQEBGRKAUJERGJUpAQEZEoBQkREYlSkBARkaiugoSZ3WRm+8zsx2b2TTM71czWmtljZnbAzO41s1PCtm8Kzw+E19c0vc/NIX2/mW1oSt8Y0g6Y2dZu8ioiIvl1HCTMbAL4L8Cku78NGAM+CvwZcKu7/zZwBLg+7HI9cCSk3xq2w8wuCvtdDGwE/srMxsxsDPgycDVwEfCxsK2IiPRJt81NK4Gqma0ETgNeAq4A7g+v3wVsCo+vDc8Jr19pZhbS73H319z9Z8AB4NLw74C7/9TdXwfuCduKiEifdBwk3L0G/HfgeerB4RVgNzDn7sfCZgeBifB4Angh7HssbP+bzekt+8TSlzCzG8xsxsxmDh061OkhiYhIi26am86ifmW/FjgPOJ16c1Hfufvt7j7p7pOrVq0qIwsiIiOpm+amfwv8zN0PufsC8CBwOTAemp8AVgO18LgGnA8QXj8T+Kfm9JZ9YukiItIn3QSJ54HLzOy00LdwJfA08Cjwe2GbzcC3w+Md4Tnh9Ufc3UP6R8Pop7XAhcCPgMeBC8NoqVOod27v6CK/IiKS08r0TZK5+2Nmdj/wBHAMmAVuBx4C7jGzPw1pd4Rd7gD+2swOAIepF/q4+z4zu496gDkGfMrdFwHM7NPANPWRU3e6+75O8ysiIvlZ/WJ+dExOTvrMzEzZ2RARGSpmttvdJ1vTNeNaRESiFCRERCRKQUJERKIUJEREJKrj0U0iIt2amq2xfXo/L87Nc954lS0b1rFpfeLCClISBQkRKcXUbI2bH9zL/MIiALW5eW5+cC+AAsUAUXOTiJRi+/T+EwGiYX5hke3T+0vKkSRRkBCRUrw4N58rXcqhICEipThvvJorXcqhICEipdiyYR3VythJadXKGFs2rCspR5JEHdciUopG57RGNw02BQkRKc2m9RMKCgNOzU0iIhKlmoSIlEaT6QafgoSIlEKT6YaDmptEpBSaTDccFCREpBSaTDccFCREpBSaTDccFCREpBSaTDcc1HEtIqXQZLrhoCAhIqXRZLrBp+YmERGJUpAQEZEoBQkREYlSkBARkSgFCRERiVKQEBGRqK6ChJmNm9n9ZvYTM3vGzP6VmZ1tZjvN7Nnw/1lhWzOz28zsgJk9ZWbvanqfzWH7Z81sc1P6JWa2N+xzm5lZN/kVEZF8uq1J/CXwt+7+VuAdwDPAVuD77n4h8P3wHOBq4MLw7wbgKwBmdjZwC/Bu4FLglkZgCdt8omm/jV3mV0REcug4SJjZmcDvAHcAuPvr7j4HXAvcFTa7C9gUHl8L3O11u4BxMzsX2ADsdPfD7n4E2AlsDK+d4e673N2Bu5veS0RE+qCbGddrgUPA/zSzdwC7gc8A57j7S2GbnwPnhMcTwAtN+x8Mae3SDyakL2FmN1CvnXDBBRd0fkQiMvR0I6NiddPctBJ4F/AVd18PvMobTUsAhBqAd/EZmbj77e4+6e6Tq1at6vXHiciAatzIqDY3j/PGjYymZmtlZ21odRMkDgIH3f2x8Px+6kHjF6GpiPD/y+H1GnB+0/6rQ1q79NUJ6SIiiXQjo+J1HCTc/efAC2bWWNf3SuBpYAfQGKG0Gfh2eLwDuC6McroMeCU0S00DV5nZWaHD+
ipgOrz2SzO7LIxquq7pvUREltCNjIrX7Sqw/xn4hpmdAvwU+Dj1wHOfmV0PPAd8OGz7MHANcAA4GrbF3Q+b2ReBx8N2X3D3w+HxJ4GvAVXgu+GfiEii88ar1BICgm5k1DmrdxuMjsnJSZ+ZmSk7GyJSgkafRHOTU7Uyxpc++HZ1Xqcws93uPtmarvtJiMjI0I2MiqcgISIjRTcyKpbWbhIRkSgFCRERiVKQEBGRKPVJiMjQ0JIb/acgISJDoXV4a2PJDUCBoofU3CQiQ0FLbpRDQUJEhoKW3CiHgoSIDIXY0hpacqO3FCREZChs2bCOamXspLRqZYwtG9ZF9pAiqONaRIaCltwoh4KEiAwNLbnRf2puEhGRKAUJERGJUpAQEZEoBQkREYlSkBARkSgFCRERiVKQEBGRKAUJERGJUpAQEZEoBQkREYlSkBARkSgFCRERiVKQEBGRKAUJERGJ6jpImNmYmc2a2d+E52vN7DEzO2Bm95rZKSH9TeH5gfD6mqb3uDmk7zezDU3pG0PaATPb2m1eRUQknyJqEp8Bnml6/mfAre7+28AR4PqQfj1wJKTfGrbDzC4CPgpcDGwE/ioEnjHgy8DVwEXAx8K2IiLSJ10FCTNbDbwX+Gp4bsAVwP1hk7uATeHxteE54fUrw/bXAve4+2vu/jPgAHBp+HfA3X/q7q8D94RtRUSkT7qtSfwF8IfA8fD8N4E5dz8Wnh8EGreRmgBeAAivvxK2P5Hesk8sfQkzu8HMZsxs5tChQ10ekoiINHQcJMzsfcDL7r67wPx0xN1vd/dJd59ctWpV2dkRERkZ3dzj+nLgA2Z2DXAqcAbwl8C4ma0MtYXVQC1sXwPOBw6a2UrgTOCfmtIbmveJpYuISB90XJNw95vdfbW7r6He8fyIu/9H4FHg98Jmm4Fvh8c7wnPC64+4u4f0j4bRT2uBC4EfAY8DF4bRUqeEz9jRaX5FRCS/bmoSMX8E3GNmfwrMAneE9DuAvzazA8Bh6oU+7r7PzO4DngaOAZ9y90UAM/s0MA2MAXe6+74e5FdEhtDUbI3t0/t5cW6e88arbNmwjk3rE7stpQtWv5gfHZOTkz4zM1N2NkSkh6Zma9z84F7mFxZPpFUrY3zpg29XoOiQme1298nW9F7UJEREemr79P6TAgTA/MIi26f3j1SQGITakoKEiAydF+fmc6UPo9baUm1unpsf3AvQ10ChtZtEZOicN17NlT6M2tWW+klBQkSGzpYN66hWxk5Kq1bG2LJhXUk5ymZqtsbl2x5h7daHuHzbI0zNxkf1D0ptSc1NIjJ0Gs0tZbfXN0vrP8jbfHTeeJVaQkDod21JQUJkhA1Cx2evbFo/MTDHkiUA5O1s37JhXeIIrn7XlhQkREbUoHR8DrOsQTZLAMjbfDQotSUFCZERtVyGifZKniCbJQB00nw0CLUldVyLjKhB6fgcVnlGF2UZbTWsne0KEiIjajkME+2lrEF2arbGq68dW7JdawDYtH6CL33w7UyMVzFgYrw6FDPE1dwkMqIGpeNzWGVpHkpaHgTgrNMq3PL+i5cEgEFoPspLNQmRETWsV66DIkvzUFKTFMBpp6wcmfOsmoTICBvGK9dBkWV00XLo91GQEBGJSAuygzLhrZfU3CSSUZ4lFWR5GNYRS3moJiGSgSamSZJBmfDWSwoSIhloYprE9LLfZxCWVVGQEMlgOXRQymAZlNqr+iREMtDEtOWtjP4o3U9CZIgshw5KSda4oq/NzeO8cUXf60AxKLVXBQmRDDQxbfkq64p+UGqv6pMQyUgT05ansq7oB2VZFdUkRETaKOuKflBqr6pJiAyhQRgauVyUeUU/CLVXBQmRITMoQyOXi+UwYa4dBQmRIaOJff03CFf0ZVGQkL5Q80hxYh2mtbl51m59aODPr74Lw0VBQnpOzSPFiq08Cpw0jh8G7/zquzB8Oh7dZGbnm9mjZva0me0zs8+E9LPNbKeZPRv+Pyukm5ndZmYHz
OwpM3tX03ttDts/a2abm9IvMbO9YZ/bzMy6OVgpx6DMHB0VSRP7WrU7v2WuZvv57+wr9LuglXl7fw66qUkcAz7r7k+Y2W8Au81sJ/CfgO+7+zYz2wpsBf4IuBq4MPx7N/AV4N1mdjZwCzBJ/UJot5ntcPcjYZtPAI8BDwMbge92kWcpwaDMHB0VrR2pHtku6fyWeSU/NVvjyNGFxNc6+S6MYq0kb1NcP85BxzUJd3/J3Z8Ij38FPANMANcCd4XN7gI2hcfXAnd73S5g3MzOBTYAO939cAgMO4GN4bUz3H2Xuztwd9N7yRAZlJmjo2TT+gl+uPUKfrbtvUzkOL9l1urafUYn34VBrqF2cnXfyfIf/TgHhUymM7M1wHrqV/znuPtL4aWfA+eExxPAC027HQxp7dIPJqQnff4NZjZjZjOHDh3q7mCkcFr3qLfynN8ya3XtPqOT78Kg1lD/ZGovN927J/daT2kFflLg6cc56DpImNmbgQeAG939l82vhRpArDZcGHe/3d0n3X1y1apVvf44yWlQZo6Oqjznt6xa3dRsjRWRLsXxaqWj70KvjqWbNv6p2Rrf2PX8kkIvy9V9uwI/FnjOrFYS9yny79nV6CYzq1APEN9w9wdD8i/M7Fx3fyk0Gb0c0mvA+U27rw5pNeB3W9L/T0hfnbC9DKHlPM68H7Ke3zJmDzeaURZ96fVitTLG5z5wcUfv24tj6baNf/v0/uhVcWxEWkNs1NqplRV8fdfzS9LnFxY5tbKCamWsp3/PbkY3GXAH8Iy7/3nTSzuAxgilzcC3m9KvC6OcLgNeCc1S08BVZnZWGAl1FTAdXvulmV0WPuu6pvcSkQ6UUatLakYBGDPr6rN7cSzdtvG3a+YxaFsrSWo2rKww5heOR/eZO7rQ879nNzWJy4E/APaa2Z6Q9sfANuA+M7seeA74cHjtYeAa4ABwFPg4gLsfNrMvAo+H7b7g7ofD408CXwOq1Ec1aWSTSJf6XauLFZzH3bvOR9HH0m0bf9oclnaz4pOW/zj6+rHoiLDG5/X679lxkHD3H1APjkmuTNjegU9F3utO4M6E9BngbZ3mUUTKFys4e9UP0s2M7k7y2vx5Z1YrVMaMhcXkRqe0YNNa4K/d+lDb7Y++foyp2VpPg4SWCheRnurn6LZu7yKXN6+tnzc3vwAOKyKXz3kDY9r2R44usOVbT7L+C9/r2WQ6BQkR6al+9oN026eQJ69TszU+e9+TSz5v4bhzxqmVxFnxr752LFchnmV2/cJx58jRhZ7dWlVrN4kMmLTmkmFcIK9f/SCx/oC0kUXNWvPaGBLbfL6B6IgtgFfmF7j1I+/k89/Zd1Kfwtz8Qq7RUlln1zcrekVgBQnpiWEsyAZB2hDMUVyKokhjZokF91iHy77FzveplRWJI7YaGh3K26f3L+l4bleIx343jW0v3/ZIpoCXJyimUXOTFK7bduHlLK25ZJCXohgEsSv7WHqa2PluN+KouQ8jz2ipLL+b97w122ThToNiEgUJKZwKss6lFSqDuhTFoIitYxVLT9I84zrvFXnr3I88s8Kz/G4e/Um2ZYc6DYpJFCSkcCrIOpdWqGixxJO1LqHxnreu6mokVevVfB7Vyhj/48PvOKkZqeh1tbL+hvIExTQKElI4FWSdSytUtFjiG5KaZx7YXeNDl0wsGZ0EZFqPKTY7PE1s9njR62pl+Q0V/X1Qx7UUroz1gcpUZCd90qzb5vdLe305iTXPPPqTQ/xw6xUn0vJ09qctqxGrXTTPHk/6PjTnJybL7yZpm8oK482nrmTu6EJPvg8KElK4USzIYoGgF6ON0oaLLqfFEtsF4KzNmu3a+lvPY2zG9cR4lR9uvSI6uqhxhd/N9yHL76aM35aChPTEKBVk7X74eQqgMgzzUOS0AjfrEhppwaT5HI2fVqGywlg4/kadobW5r93Vfuz78Nn7njyR73ay/G76/dtSkBBJ0S4Q9LKTvtsCPstVbVFBpBfBKC0AZ
23WbBdMWs/RkaMLVMaM8WqFV+aXNt+kXcnHRkMtug/tfBYFCRkZndwfOMv2sQK/NjcfnbxVxI1vum3GSitki2oq69UEv7QAnLXppV0wSTpHC4vO6W9ayZ5brkr8/HZX8rHvAyytYQ5LLU9BQkZC3oIqz/axK1EjeTx6EZ307Zotbrp3T6ZCJa2QjX3GjffuYfv0/syFVpFNbs0F54oMAThr88zMc4f55mMvsOjOmBkfuqS+30337kncp9OaYNr8hOYmrmGZOa8hsAJ0d8vGQZB3Al+e7ZOGncZGunR7I52GWCG16J55FnvakMp2BWGeWfJFNbm1DmktKgBPzdZ4YHftxPstuvPA7hpTs7XCh2unzXRuvO8wTThVkJCRWEYjb0GVJz1prHuWoZDdyFJIpRUqaXMq0j4ja6GVtaBNuhBpTktaURXqBW83q8e2K5CLnnfSribR6XIdZVNzkwz8CJ0s8t4sJu/2rc0aaUMhu5XUjp6kXaGS1maf5TOyFFpZOpCTmle23P8kOCdGEsUK2OPu/Gzbe1PzkfcYXpybL3xI6UTke5W0XEc/b8TUDQUJ6fiqZpA63vJO4Ot2wl/S/ka98Lt82yMnDYnMe34a53V+YfFER2haB3m71UNbP7N521MrK1hhcDxyAZyl0MpS0MY6iLPotuBMK5CLHFIa+1611oC2bFjHlm89edJQ28oKG8gJpwoSIyhv4d3pLRsHqeOtXedkbHvo/Aqyef/a3PxJfRS1uXm2fOtJsDcKwqznp/W8LrpTrYzxoUsmeGB3LTGo5flbtG47v3A8mpc8QTOtoO20GaWIQQD9XAEg1/eqtfuiw4Vbe32xZl7gaoGDYHJy0mdmZsrORk+1+1K0FgKQfCXT+n5594k1tzRmpvbT1Gxtyc1dIP0YipJ1jX944/zE/obtzmtjyGaefVr/FlnzOlFwYZPnHDWMVyt87gMXF5KHQar1QnG/n05+uzFmttvdJ1vTVZPoQhlfvLSrxk76Fzq5qh6Ujrc/mdrLN3Y9n9iR3K9+lTzHXJubb/s3TGs/zzOPIyk9S0FtUHigT1xzaMxO6pNodfqbVhY6sa8xN+RzO/Zx4717uPHePZx1WoVb3l9MIMqjk99P0nH1oz9RQaJD3TS3dBNc0r4UnRbeedplp2ZrmcawF6XdukmxANHQj6AVa65LMmbW9m/YSdNf1n2mZmttF6nL8lmdil2IANxY4FyFdr9LYEk/wJGjC/UOdPrbTJr37xw7rtiggyK/9woSHcpyB7HGj+E9b13Foz85xItz85xZrfDq68dyt1U3tJv9e/m2R6IFQFE//MaXtdMx7J3Mit5y/5Mnna/Gj3r79P7UAm+FGWu3PtSTml7jWFr7JNpZdI8GlBfn5rn1I+/M3X6etc09y/nq5Wq9sU70Imetp/0uk2otC4ve95F8eftJYsfVqxn/zRQkOjA1W2t7w/XWiP/1Xc+feH1ufultD7Pc87axBETsR94YWRN7Le22h2n9HGmzYA04tbKCm9rM1u2k9vXHDz61ZBTMwqLz+e/sY67NLSQbGnnNum4RZGt2az2W1jPSLmjEXhs/rdJR01+7fZqPsV2AMOg6kOY5n/Xz91S047zTYNVpTbrfzaTtalaXb3tkyflqN7myWhnraae8Oq5zSuooatZu7ZZ2DJaMBU/7rOZ9s3xic/tr68qXv/7nY0tWvmzcrCVLHlolrXHfCHatYp11U7O1aFNEY79Obi953H1JjQ6S28hja/V30hGbprLC2P7v31HYFW3W70+WztK0GmDSZ1VW2EkjvIATI7X+167niY2rGjNbcoe3rNp1CEP8QqqMARet2nVCt/vtxAY05KWO64K0u3NVa0TPI+s9b1vlKSiPHF3gpnv38K2Z53ni+VdOWvmyVXMVvZNjWjjuJ9630/bTtNm+sbkK//pfnM0P//5w4j6NAJ5Uo0sat590HO3y3I2F48U2e2T5/mRtIkyrASbOg0ho2plfWEztR+pm1
npaM05rnwTULw5aJ/41194X3RmvVjCjZzf2gfSZ4bHjKnKeRxIFiQyyVtm/9MG387kd+xILoDRHXn2Nd37+eyctT5yl8N+yYV3bq+1WDtECtFWjrb0IedpPm3+kMePVN5pmmoe/nlmt8PRLvyoo10uldTB3q8jg0y5/sealpAIy6e/W2kSaJ99ptd4zq5XEJpcssjTXNf9GW0c3Jc1TgZMvKrIsHpn3yr5dE3YvZobnMfBBwsw2An8JjAFfdfdtRX9G3nkHyfmEmecO8+rrxzrKw9GF4xwN7bO1ufnMBf+Wbz3Z0eeVIUv7adbz/bkPXHzi8T83tWt3EqDzinUwF6HIDsdYUB4z4++/dM2S9FgBGWs+bQ4MRQbNV18/duLv2MkkzXZX1mlX3VlqXxDvR0yqdd0UhtvG5p409onpxczwPAZ6gT8zGwO+DFwNXAR8zMwuKvIz0ha3y/qlcYev73o+81IDRYmNMS9Kke/eWKCt3U3h0863Ab9/2QVtmzl67bzx6pJF/846rdL1+xoU2uEYK9xj6XnPZXNAS1oor7LC6v08OZxWWbHkN9TP1VHz1IiStk06h80z8ZMWzkxrwi57qY5Br0lcChxw958CmNk9wLXA00V9QKfzDspWGbO+B6RuZG0/bXe+k67Esv59kjpRWzulX33tWGpNpPlH23osa7Y+lJqPdoMMnGLH6sf6qyYitZU83/XWwquTeRBJYqOd+vU7zFMjSqr1peUzqQbSbp9+rBiQZtCDxATwQtPzg8C7WzcysxuAGwAuuOCCXB+QNmSuV23P3Wi0o+b58TXLOhqqU40RLI25IXnaT9NuRJ91+/FqhdPftDLX8NbWORlJeWh3HFkGETjxZqBY4d2pvGPx077rjZFhsb9n7AIgrX+pod0IpH6tjpp19d3YecxSXrSWOe2+82UHCBj8IJGJu98O3A71IbB59k2b+RgbQZN1DHzrTdU7EftxJq1XBLRd1XMiTO5rXSwubYmEdipjxumnrEy8J3BeRa3mGlvzJ8t8g+bzmmf9oCwFTCPQ9GPBubydne3y3806WFnOS/Px92sxviStCzfmHd2U5VhbA14/FyDsxKAHiRpwftPz1SGtMGl/oKQf2nveuop7f/RC4lC6j/zL85dcQc88dzh12F9Mux/nLe+/eMmVbyMPSSuGNr/P5G+d3XbiU9I8AqgXmu97x7kd1RKyyFuwFT3qo5vOwdYCpvWCobnJrcg8p+Wpk5FBzQVkt4v9xX5D7b5DZS7G14/vQGyfQVmAsNlAT6Yzs5XA/wOupB4cHgf+g7vvi+3TyWS6ToestRtKl+UzILkNN++yFWkzbTv90g3aypnDRudPhuk7EJtMN9BBAsDMrgH+gvoQ2Dvd/b+12345LBUuIlK0oZ1x7e4PAw+XnQ8RkeVooOdJiIhIuRQkREQkSkFCRESiFCRERCRq4Ec35WVmh4DnOtz9LcA/FpidYaBjXh50zKOv2+P9LXdfcneykQsS3TCzmaQhYKNMx7w86JhHX6+OV81NIiISpSAhIiJRChInu73sDJRAx7w86JhHX0+OV30SIiISpZqEiIhEKUiIiEiUgkRgZhvNbL+ZHTCzrWXnp9fM7E4ze9nMflx2XvrBzM43s0fN7Gkz22dmnyk7T71mZqea2Y/M7MlwzJ8vO0/9YmZjZjZrZn9Tdl76wcz+wcz2mtkeMyt0GWz1SVD/QlG/b8W/o36L1MeBj7l7YffSHjRm9jvAr4G73f1tZeen18zsXOBcd3/CzH4D2A1sGvG/sQGnu/uvzawC/AD4jLvvKjlrPWdm/xWYBM5w9/eVnZ9eM7N/ACbdvfDJg6pJ1F0KHHD3n7r768A9wLUl56mn3P3vgMNl56Nf3P0ld38iPP4V8Az1e6iPLK/7dXhaCf9G/qrQzFYD7wW+WnZeRoGCRN0E8ELT84OMeAGynJnZGmA98FjJWem50OyyB3gZ2OnuI3/M1G9S9ofA8ZLz0U8OfM/MdpvZD
UW+sYKELCtm9mbgAeBGd/9l2fnpNXdfdPd3Ur8//KVmNtJNi2b2PuBld99ddl767N+4+7uAq4FPhebkQihI1NWA85uerw5pMkJCu/wDwDfc/cGy89NP7j4HPApsLDkrvXY58IHQRn8PcIWZfb3cLPWeu9fC/y8D/5t6E3ohFCTqHgcuNLO1ZnYK8FFgR8l5kgKFTtw7gGfc/c/Lzk8/mNkqMxsPj6vUB2b8pNRM9Zi73+zuq919DfXf8SPu/vslZ6unzOz0MBgDMzsduAoobNSiggTg7seATwPT1Ds073P3feXmqrfM7JvA/wXWmdlBM7u+7Dz12OXAH1C/stwT/l1TdqZ67FzgUTN7ivqF0E53XxZDQpeZc4AfmNmTwI+Ah9z9b4t6cw2BFRGRKNUkREQkSkFCRESiFCRERCRKQUJERKIUJEREJEpBQkREohQkREQk6v8DuD3PfmFh9xcAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# x: average human compositionality score, y: raw frequency (column freq.w1&w2)\n", "plt.scatter(compfreq['compositionality'],compfreq['freq.w1&w2'])\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "viral-baseline", "metadata": {}, "source": [ "There may be some relation between the two variables, but it is hard to see in this plot. One likely reason is that frequency is very unevenly spread: its distribution is [Zipfian](https://en.wikipedia.org/wiki/Zipf%27s_law), with a few very frequent compounds and a long tail of rare ones, so it is easier to analyse in the log domain.\n", "* Build a scatter plot comparing compositionality with the logarithm of frequency (instead of raw frequency)" ] }, { "cell_type": "markdown", "id": "biblical-wagon", "metadata": {}, "source": [ "##### Correlation\n", "\n", "Pearson's correlation coefficient only measures linear correlation. Spearman's rank correlation coefficient, on the other hand, measures monotonic correlation. In the example below, a cubic function of x is perfectly monotonic but not linear, so Spearman's rho is 1 while Pearson's r is lower." ] }, { "cell_type": "code", "execution_count": 122, "id": "weird-border", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Pearson's r = 0.91, Spearman's rho = 1.00\n" ] }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAX8AAAD4CAYAAAAEhuazAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8/fFQqAAAACXBIWXMAAAsTAAALEwEAmpwYAAAYGklEQVR4nO3df4wc9XnH8c9z63V651Y5I1yKDztGkesI6hgnJ+zUVZVAmoO0YOMQggMpiqK6f5A2adG1drACUSFGvfxoWqVRnZSGCjAYYxYnQXESiBQJ1Q52FnMY4sYlYHsxwSmYRvgE5/PTP27XrNcze7u3OzO7M++XZHl3Zn98j5DPDd/vM8/X3F0AgGzpSXoAAID4Ef4AkEGEPwBkEOEPABlE+ANABs1IegCNOPvss33BggVJDwMAusqePXt+7e5zgs51RfgvWLBAu3fvTnoYANBVzOyFsHNM+wBABhH+AJBBhD8AZBDhDwAZRPgDQAZ1RbUPAGTNhsKoNu86pAl35cy0Ztk83bZqcds+n/AHgA6zoTCqu3cePPV8wv3U83b9AmDaBwA6zOZdh5o6Ph2EPwB0mImQfVbCjk8H4Q8AHaRQLIWey5m17XsIfwDoICM79oeeW7NsXtu+h/AHgA5RKJZUOjYWer6d1T6EPwB0gEKxpPXbRkPPD/T3tvX7CH8A6AAjO/ZrbHwi8FxvPqfhoUVt/T7CHwASNtV0z8bVi7Vq6UBbv5PwB4AEFYolDW/dG3p+oL+37cEvtSn8zexOM3vZzJ6uOnaWmf3QzH5R/nt2+biZ2T+b2QEze8rM3tOOMQBAN7r5oVGNTwTX70cx3VPRriv/b0u6rObYOkmPuvtCSY+Wn0vS5ZIWlv+slfSNNo0BALpKoVjS628Gz/NL0Uz3VLQl/N39J5JeqTm8UtJd5cd3SVpVdfw/fdJOSf1mdm47xgEA3aReTb+kyIJfinbO/xx3P1J+/JKkc8qPByRVN6g4XD52GjNba2a7zWz30aNHIxwmACTjxTqLvFGLZcHX3V1SU00p3H2Tuw+6++CcOYGbzwNAV5tbp3Z/dl8+0u+OMvx/VZnOKf/9cvl4SVL1PcrnlY8BQKYMDy1SvufMfj25HtMtV1wY6XdHGf7bJd1QfnyDpIerjv95uepnuaTXqqaHACAzVi0d0MhHl6i/962r/Nl9eX35o0sine+X2rSZi5ltlvR+SWeb2WFJt0i6Q9IWM/uUpBckXVN++SOSPizpgKTjkj7ZjjEAQDdatXQg8qAP0pbwd/c1IacuDXitS7qxHd8LAN2mUCxpZMd+vXhsTHP7ezU8tKh7wx8AMLVK87ZKD5/SsbFTzdzi/gVAewcAiElQ87ax8Ykp6/2jQPgDQAzqNW9Lot6f8AeAiE3Vq79evX9UCH8AiNgXvrMv1l79jSD8ASBChWJJrx4fDz0fZfO2egh/AIhQvcXcqHr1N4LwB4CITLVDVxLTPRWEPwBEYKpF3v7efGJX/RLhDwCRmGpD9luvjLZx21QIfwCIQL3a/aQWeasR/gAQgbDa/SQXeasR/gAQgeGhRerN5047llRNfxAauwFABCpX953QwTMI4Q8AbRDWqrlTwr4W4Q8ALeqkVs2NYs4fAFrUSa2aG0X4A0ALOq1Vc6MIfwCYpk5s1dwowh8Apmmqu3g7pawzCOEPANPU6Xfx1kO1DwA0aUNhVJt3HZKHnO+Uu3jrIfwBoAkbCqO6e+fB0POdPt1TQfgDQBM27zoUem6gw+7irYfwB4AGFYolTXjYZI/0+LpLYhxNa1jwBYAGTFXWmTOLcTStI/wBoAH1yjolac2yeTGOpnWEPwA0oF5Z5/XL5+u2VYtjHE3rCH8AaEC9zVm6LfglFnwBIFR1m+a39+aVz5nGJ95a8O2Wss4ghD8ABKht03xsbFz5HtPsvryOHR/vuM1ZmkX4A0CAoAXe8ZOuvpkzVPz8hxIaVfsw5w8AAcIWeDu5TXMzuPIHgCqVef6wW7k6uU1
zMwh/ACirneev1c0LvLUiD38ze17SbyRNSDrh7oNmdpak+yUtkPS8pGvc/dWoxwIA9dS7kaub+vY0Iq45/w+4+0XuPlh+vk7So+6+UNKj5ecAkKiw+XzTZN+etAS/lNyC70pJd5Uf3yVpVULjAAAViiWtuOOx1M/zV4tjzt8l/cDMXNK/ufsmSee4+5Hy+ZcknVP7JjNbK2mtJM2fPz+GYQLImkKxpJsfGtXrb4b37EnTPH+1OML/j9y9ZGa/K+mHZvbz6pPu7uVfDKo5vknSJkkaHBwM76EKANNQKJY0vHXvaXfs1krbPH+1yMPf3Uvlv182s4ckXSzpV2Z2rrsfMbNzJb0c9TgAoNrIjv11g78yz59Wkc75m9ksM/udymNJH5L0tKTtkm4ov+wGSQ9HOQ4AqDXVzVppnOevFvWV/zmSHrLJTQ5mSLrX3b9vZk9I2mJmn5L0gqRrIh4HAJxmbn+vSnWqe9I4z18t0vB39+ckLQk4/r+SLo3yuwGgnuGhRaFz/tctn5/Kef5q3OELIBOq2zNXOnKOXL1EX/jOPr16fFyS1N+b161XXpj64JcIfwAZUNu2oXRsTOu3jWrj6sWp6NA5HXT1BJB6QW0bxsYnNLJjf0IjSh7hDyD10t6eeToIfwCpF1a2mfZyznoIfwCpNzy0SL353GnH0tq2oVEs+AJIvUr1Tm21TxaqesIQ/gC6XlAZZ22wr1o6kOmwr0X4A+hqhWJJww/s1fjJyZu1SsfGNPzAXkki7Otgzh9A1yoUS/rs/U+eCv6K8ZOuW7fvS2hU3YHwB9CVKlf8YY6Njcc4mu7DtA+ArlMolnTTlr2acLb6mC7CH0DXKBRL+ty2p3R8/OSUr53dl49hRN2L8AfQFTYURnX3zoMNv/6WKy6McDTdj/AH0PGu++Z/6fH/eaXh11+fgZbMrSL8AXS0DYXRhoN/dl9et1yRjZbMrSL8AXSkyo1bYbttVTNJX/3YRYR+Ewh/AB2ntv/+VLKw81a7Ef4AOkYzV/uS1Jvv0cbV7yb4p4HwB9ARNhRGdc/Og2q0cn/FO8/SPX/xvkjHlGaEP4DEFYqlhoM/Z6Y1y+bptlWLIx9XmhH+ABKxoTCqzbsONXyXbm8+p42rFzPF0yaEP4DYNXvD1gD999uO8AcQu827DjX0Oko4o0NXTwCxa2Sqx0QJZ5S48gfQVoViSV/4zj69evytlsq1d97mzEJ/AZjENosxIPwBtEW9Us1Xj49reOtbu2utWTYvcM7/+uXzqeKJCeEPoCWFYkl//+BTeuNE/TbL4xOukR37tWrpwKmAr1T7UL4ZP8IfQFOqN0t/e29ev3njhCZONlau+WLVnbu3rVpM2CeI8AfQkEKxpJsfGtXrb77Vb6fZrRLn9ve2e1iYJsIfQKBm2y1MJZ8zDQ8tatOnoVWEP4AzKnTyPVIDOyU2jD77nYfwBzJmQ2FU9+46qHrT9O0IfhqvdTbCH0ihoFp7ScqZNNGueZwqs2bmTq0F9PfmdeuVXOV3OsIf6GJBIV9vyiaK4Kc2vzslFv5mdpmkr0nKSfqWu9+R1FiQHdWdJCu15YPvOOtU6eLc/l594F1z9NDPSqdVtUhv3ZVa3WSsevOR2rtWTVLfzJyOvzmh/r683KXXxsZD714tFEu6dfu+UxU0s2bmlM/1hL6nUCxpeOtejdckejvn6uvhCr+7mTfYTrWtX2qWk/Tfkv5E0mFJT0ha4+7PBL1+cHDQd+/eHeMI0amqw9akMypR6l2FhnWS7DHVnf8O0pvP6SPvHdCDe0oNbzVY+/7q9sSFYknDD+zVeJ2B1L5nxR2PNbzjVTvMmpnT7VfRUrmbmNkedx8MOpfUlf/Fkg64+3OSZGb3SVopKTD8kR21NxCZSceOj6uvak65IigmK+Ee9AsgrJNks8EvSWPjE031og96f+VuV0ka2bG/bvAHvefFCIO/L9+jL7I9YqolFf4Dkqr/n3hY0rLqF5jZWklrJWn+/PnxjQy
xC9vUo/oGotrgr2fzrkOB4T/doA7T6udVh3ejQV79urn9vW2/8mcqJzs6dsHX3TdJ2iRNTvskPBy0qHZhshIyu194palNPRoRFsr1OklOR6ufV323a6NBXv2e4aFFgXP+9Uy1joDsSCr8S5LmVT0/r3wMKRJWbihNXtUPP7C37Vfj0mQoBwnrJJnUnH/13a7DQ4samvOvfk8ltGv/GfflezRzRo6AR11Jhf8Tkhaa2fmaDP1rJX08obGgDarn6isVM/c/cajuVelUc9zTtWbZvMDjYZ0kW6n2qby31WqfyuNmqn0q7yPYMR2JVPtIkpl9WNI/abLU8053vz3stVT7dLZCsaT120ZPuwIOqsSJAzXnwFs6sdpH7v6IpEeS+n60pvpKvydg7rvR4O/N92isTmH6rPKVc+UK2iV6vwNt0LELvug8YVU50523z/eYNq5+t3a/8MoZnzvAXDUQKcIfdVXfVNVOvfkebSzXkVfv7AQgHoQ/QrXSz71SDfO9p47U3cgbQDIIfwQqFEtNB3/OTCfdT6tM4Yoe6EyEPwKbk+XMmgr+2r4zADob4Z9xtWWalUXXZhZxWZwFug/hn2GFYkk3bZneXbaUWwLdjfDPmKlaItdjkq7jJiogFQj/jAjqs9NI8Act4gLofoR/Bky3ZJNFXCC9CP+UC9u9aios4gLpRvinVL12ymG40geyg/BPoaAum2Eqi75c6QPZQvinRKFYOq0XfCPYsg/ILsI/BQrF0pS7QNWi7z2QbYR/Cozs2N9w8FOrD0Ai/Ltas+2WmeYBUEH4d6lmavdZzAVQi/DvQo22W87nTCNXLyH0AZyB8O8izUzzsGkKgHoI/y6woTCqe3cdVCNrugP9vXp83SXRDwpAVyP8O1wz7RlM0vDQomgHBCAVepIeAOq7d1fjwX/d8vlM8wBoCFf+HapQLOlz255qeKqHah4AzSD8O1ChWNLw1r0an6if/DRiAzBdTPt0oJEd+xsI/h6CH8C0ceXfgV6copSzN9+jZ//h8phGAyCNuPLvQHP7e0PPmaSNq98d32AApBLh34GGhxYpn7PAc1T0AGgHpn06QOXO3RePjZ3aKH3k6iWn7cRFUzYA7UT4J6x2163SsTGt3zaqjasXq/j5DyU8OgBpxbRPggrFkm7asveM7RbHxic0smN/QqMCkAVc+SegkV49U1X8AEArCP+YNdqrp17FDwC0immfmN3TQK+e3nyOBm0AIhVZ+JvZrWZWMrMny38+XHVuvZkdMLP9ZjYU1Rg6zYbCqHyKXj05M+7cBRC5qKd9vuruX6o+YGYXSLpW0oWS5kr6kZn9vrtPBH1AWlR236qHXj0A4pLEtM9KSfe5+xvu/ktJByRdnMA4YjWyY3/dbRfp1QMgTlGH/6fN7Ckzu9PMZpePDUg6VPWaw+VjpzGztWa228x2Hz16NOJhRq9e9U5fuVcPwQ8gLi2Fv5n9yMyeDvizUtI3JL1T0kWSjkj6cjOf7e6b3H3Q3QfnzJnTyjA7Qlj1jkn6Ir16AMSspTl/d/9gI68zs29K+m75aUnSvKrT55WPpdrw0KLT7uSV2H0LQHKirPY5t+rpVZKeLj/eLulaM3ubmZ0vaaGkn0Y1jqQUiiWtuOMxnb/ue1pxx2OSpI2rF2ugv1emyd23vvqxi3TbqsXJDhRAJkVZ7fOPZnaRJJf0vKS/lCR332dmWyQ9I+mEpBvTVulTr1/P4+suSXh0ABBh+Lv7J+qcu13S7VF9d9JGduwP7dfDFA+ATsAdvhEIq+yhXw+ATkH4RyCssod+PQA6BY3d2qR6Q5b+vrzyPabxqrad9OsB0EkI/zaoXeB99fi48jlTf29er42Nn9qdi/l+AJ2C8G+DoAXe8QnXrLfN0JO3sBsXgM7DnH8bsMALoNtw5d+Cyjx/WMM2FngBdCrCf5pq5/lrscALoJMR/tMUNM9fMcACL4AOR/hPU9h8vkm0cADQ8VjwnSZu5ALQzQj/aRoeWqTefO6
0Y8zzA+gWTPtMU2U+v3JXLzdyAegmhH+Dqts3VAc9YQ+gGxH+DdhQGNU9Ow+equev9OeXRPgD6ErM+U9hQ2FUd1cFf0WlPz8AdCPCv45CsaS7dx4MPU/7BgDdivCv49bt++qep6wTQLci/Os4NjYees4kyjoBdC3CP0ShWKp7/rrl81nsBdC1CP8AlaZtYWbNzOm2VYtjHBEAtBfhH6Be07Z8znT7VQQ/gO5G+AeoV8UzcvUSpnsAdD3CP0BYFc9Afy/BDyAVCP8ANG0DkHa0dyir7d3zkfcO6Mc/P0rTNgCpRPjrzC0ZS8fG9OCekjauXkzgA0glpn0UXN1D7x4AaUb4a/JKPwi9ewCkVebDf0Mh/GYuevcASKtMh/9UXTup7gGQVpkN/0KxpL+9/8m6r2GxF0BaZTb8hx94UifrnM+ZxTYWAIhbJsO/UCxpvF7yS1qzbF48gwGABGQy/Kcq4bx++Xy6dgJItZbC38w+amb7zOykmQ3WnFtvZgfMbL+ZDVUdv6x87ICZrWvl+6erXglnj4ngB5B6rV75Py1ptaSfVB80swskXSvpQkmXSfpXM8uZWU7S1yVdLukCSWvKr41VvRLOjy+bH+NIACAZLYW/uz/r7kFzKCsl3efub7j7LyUdkHRx+c8Bd3/O3d+UdF/5tbEKatwmSSveeRZX/QAyIarePgOSdlY9P1w+JkmHao4vC/oAM1sraa0kzZ/f3qvxSglndSM3GrcByJIpw9/MfiTp9wJO3ezuD7d/SJPcfZOkTZI0ODjo7f78VUsHCHsAmTVl+Lv7B6fxuSVJ1bWS55WPqc5xAEBMopr22S7pXjP7iqS5khZK+qkkk7TQzM7XZOhfK+njEY3hlNpe/UzxAMi6lsLfzK6S9C+S5kj6npk96e5D7r7PzLZIekbSCUk3uvtE+T2flrRDUk7Sne6+r6WfYApBvfrXb5ts5sYvAABZZe5tn05vu8HBQd+9e/e03rvijscCWzYP9Pfq8XWXtDo0AOhYZrbH3QeDzqX+Dt+wG7ro1Q8gy1If/mE3dNGrH0CWpTr8C8WSXn/jxBnHe/M5evUDyLTUbuBeu9BbMbsvr1uuuJDFXgCZltor/6BN2SWpb+YMgh9A5qU2/FnoBYBwqQ1/FnoBIFxqwz+ocycLvQAwKbULvnTuBIBwqQ1/ic6dABAmtdM+AIBwhD8AZBDhDwAZRPgDQAYR/gCQQV3Rz9/Mjkp6oYGXni3p1xEPp1Pxs2cTP3s2Nfqzv8Pd5wSd6Irwb5SZ7Q7buCDt+Nn52bOGn721n51pHwDIIMIfADIobeG/KekBJIifPZv42bOp5Z89VXP+AIDGpO3KHwDQAMIfADIoNeFvZpeZ2X4zO2Bm65IeT1zMbJ6Z/djMnjGzfWb2maTHFDczy5lZ0cy+m/RY4mRm/Wa21cx+bmbPmtn7kh5TXMzsb8r/vj9tZpvN7LeSHlNUzOxOM3vZzJ6uOnaWmf3QzH5R/nt2s5+bivA3s5ykr0u6XNIFktaY2QXJjio2JyTd5O4XSFou6cYM/ewVn5H0bNKDSMDXJH3f3d8laYky8s/AzAYk/bWkQXf/A0k5SdcmO6pIfVvSZTXH1kl61N0XSnq0/LwpqQh/SRdLOuDuz7n7m5Luk7Qy4THFwt2PuPvPyo9/o8kAyMwmBmZ2nqQ/lfStpMcSJzN7u6Q/lvTvkuTub7r7sUQHFa8ZknrNbIakPkkvJjyeyLj7TyS9UnN4paS7yo/vkrSq2c9NS/gPSDpU9fywMhSAFWa2QNJSSbsSHkqc/knS30k6mfA44na+pKOS/qM85fUtM5uV9KDi4O4lSV+SdFDSEUmvufsPkh1V7M5x9yPlxy9JOqfZD0hL+Geemf22pAclfdbd/y/p8cTBzP5M0svuvifpsSRghqT3SPqGuy+V9Lqm8Z/+3ag8v71Sk78A50qaZWbXJzuq5PhkvX7TNftpCf+SpHlVz88rH8sEM8trMvjvcfdtSY8nRis
kXWlmz2tyqu8SM7s72SHF5rCkw+5e+a+8rZr8ZZAFH5T0S3c/6u7jkrZJ+sOExxS3X5nZuZJU/vvlZj8gLeH/hKSFZna+mc3U5OLP9oTHFAszM03O+z7r7l9Jejxxcvf17n6euy/Q5P/mj7l7Jq4A3f0lSYfMbFH50KWSnklwSHE6KGm5mfWV//2/VBlZ7K6yXdIN5cc3SHq42Q9IxQbu7n7CzD4taYcmV/7vdPd9CQ8rLiskfULSqJk9WT72OXd/JLkhISZ/Jeme8gXPc5I+mfB4YuHuu8xsq6SfabLaragUt3ows82S3i/pbDM7LOkWSXdI2mJmn9Jku/trmv5c2jsAQPakZdoHANAEwh8AMojwB4AMIvwBIIMIfwDIIMIfADKI8AeADPp/crizuZn98wAAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# 150 values drawn uniformly from [0, 10)\n", "x = np.random.uniform(0,10,150)\n", "# strictly increasing (monotonic) but non-linear transformation of x\n", "cube = (x-5)**3\n", "r = scipy.stats.pearsonr(x,cube)\n", "rho = scipy.stats.spearmanr(x,cube)\n", "# index 0 of each result is the coefficient, index 1 is the p-value\n", "print(\"Pearson's r = {:.2f}, Spearman's rho = {:.2f}\".format(r[0],rho[0]))\n", "plt.scatter(x,cube)\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "administrative-proposition", "metadata": {}, "source": [ "Check the difference between the two correlation scores on the compositionality vs. frequency data:\n", "* Calculate the Pearson and Spearman correlations between the two variables using Pandas or scipy\n", "* Calculate the Spearman correlation manually, as the Pearson (linear) correlation between the ranks of the two variables, and check that it matches the previous result" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" } }, "nbformat": 4, "nbformat_minor": 5 }