{ "cells": [ { "cell_type": "markdown", "source": [ "## Introduction to Modeling" ], "metadata": { "id": "gIPdySTgL9k7" } }, { "cell_type": "markdown", "source": [ "\n", "\n", "---\n", "\n" ], "metadata": { "id": "eeMKpX2jMDqM" } }, { "cell_type": "markdown", "source": [ "### Demonstrate idea behind MSE" ], "metadata": { "id": "6uZyaJdzL61x" } }, { "cell_type": "markdown", "source": [ "Complete below" ], "metadata": { "id": "RM5qknxdMU8T" } }, { "cell_type": "code", "source": [], "metadata": { "id": "i-e56REtMXSd" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "\n", "\n", "---\n", "\n" ], "metadata": { "id": "OLNuc4ZfMEwG" } }, { "cell_type": "markdown", "metadata": { "id": "BruPxyad0fWj" }, "source": [ "### Linear regression" ] }, { "cell_type": "markdown", "metadata": { "id": "tObKDZrP0fWk" }, "source": [ "**Simple Example with Simulated Data**\n", "\n", "For this example, we are going to keep it simple, stay in 2 dimensions, and use OLS to fit a line to some data." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Tmxldb2C0fWk" }, "outputs": [], "source": [ "import numpy as np\n", "%matplotlib inline\n", "# this accommodates high resolution displays\n", "%config InlineBackend.figure_format = 'retina'\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "tzo-Y8-C0fWl" }, "outputs": [], "source": [ "n = 10\n", "np.random.seed(146)\n", "x = np.random.normal(size=(n,1))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Rixag8Kr0fWl", "collapsed": true }, "outputs": [], "source": [ "noise_strength = 0.5\n", "np.random.seed(147)\n", "noise = np.random.normal(scale=noise_strength, size=(n,1))\n", "y = 1 + 2*x + noise\n", "plt.scatter(x,y, label='Original data', color='k')\n", "plt.xlabel('x')\n", "plt.ylabel('y')\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "zfkOmO1v0fWl" }, "outputs": [], "source": [ "from sklearn.linear_model import LinearRegression as LR\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [], "id": "r2FlKS5w0fWl", "outputId": "b74fc26d-5825-4cb5-a5a5-e2981d7de01f", "colab": { "base_uri": "https://localhost:8080/", "height": 78 } }, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "LinearRegression()" ], "text/html": [ "
LinearRegression()
" ] }, "metadata": {}, "execution_count": 17 } ], "source": [ "lin_reg = LR()\n", "lin_reg.fit(x,y)" ] }, { "cell_type": "code", "source": [ "np.shape(lin_reg.coef_)\n", "print(lin_reg.coef_)" ], "metadata": { "id": "wuYxG8pnAJgu" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "MBNgMrAU0fWm" }, "outputs": [], "source": [ "print('Model coefficient: ', lin_reg.coef_[0][0])\n", "print('Model intercept: ', lin_reg.intercept_[0])\n", "#y = 1.17 + 2.20x" ] }, { "cell_type": "code", "source": [], "metadata": { "id": "3I0cNfP-cvBX" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [], "id": "27KLyRGK0fWm" }, "outputs": [], "source": [ "x" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "gaU9JsPR0fWm" }, "outputs": [], "source": [ "y_pred = lin_reg.predict(x)\n", "y_pred" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "A6fd5KrP0fWm" }, "outputs": [], "source": [ "#What do you notice?\n", "for i in range(len(y)):\n", " print(y[i], y_pred[i])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "b7DDODQq0fWm" }, "outputs": [], "source": [ "x_range = [min(x), max(x)]\n", "y_pred = lin_reg.predict(x_range)\n", "\n", "plt.figure(figsize = (10,6))\n", "plt.scatter(x,y, label='Original data', color='k')\n", "plt.plot(x_range, y_pred, label='Model', color='r')\n", "plt.legend()\n", "plt.xlabel('x')\n", "plt.ylabel('y')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "id": "fNXDdxGL0fWm" }, "source": [ "We can use the model to make predictions for new x values:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "4OvhmoXb0fWm" }, "outputs": [], "source": [ "np.random.seed(201)\n", "new_x = np.random.normal(size=(20,1))\n", "y_pred = lin_reg.predict(new_x)\n", "\n", "plt.scatter(x,y, label='Original data', color='k')\n", "plt.scatter(new_x, y_pred, 
label='Predicted values', color='r')\n", "plt.title('Our randomly generated data')\n", "plt.xlabel('x')\n", "plt.ylabel('y')\n", "plt.legend()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "id": "E7PehEyR0fWm" }, "source": [ "Why did the model pick the line that it did? The goal was to minimize the sum of the squared errors between the model and the data. Let's plot the errors:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "_3fEpRcf0fWn" }, "outputs": [], "source": [ "def VizSquaredErrors(x,y,model):\n", "    # Function will plot x and y, show the best fit line as well as the squared errors, and return the absolute errors\n", "\n", "    # Fit the model and plot the data\n", "    model.fit(x,y)\n", "    yp = model.predict(x)\n", "    errors = abs(y - yp)\n", "    plt.scatter(x,y,color='black',label='Actual Data')\n", "\n", "    # Compute a range of x values to plot the model as a continuous line\n", "    x_rng = np.linspace(min(x),max(x),20)\n", "    y_pred = model.predict(x_rng)\n", "    plt.plot(x_rng,y_pred,color='red',label='Fitted Model')\n", "\n", "    # Draw squares at each data point indicating the squared error\n", "    ax = plt.gca() #get current axis\n", "    for i,xi in enumerate(x):\n", "        r = plt.Rectangle((xi, min(y[i],yp[i])),width=errors[i],height=errors[i],facecolor='blue',fill=True,alpha=0.1)\n", "        ax.add_patch(r) #in this case a square\n", "    plt.axis('equal')\n", "    plt.xlabel('$x$')\n", "    plt.ylabel('$y$')\n", "    plt.legend()\n", "    plt.show()\n", "    return errors" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "34uB7Vzx0fWn" }, "outputs": [], "source": [ "VizSquaredErrors(x,y,lin_reg)" ] }, { "cell_type": "markdown", "metadata": { "id": "RaF2RcYG0fWn" }, "source": [ "The red line is the line that minimizes the sum of the squared errors between the model and the data. That is, **it makes the total area of all the blue squares as small as possible**." 
] }, { "cell_type": "markdown", "source": [ "\n", "\n", "---\n", "\n" ], "metadata": { "id": "bVH4RnZPMOJz" } }, { "cell_type": "markdown", "metadata": { "id": "vKeDwP3o0fWn" }, "source": [ "### Scoring the model" ] }, { "cell_type": "markdown", "metadata": { "id": "X4WVwEPq0fWn" }, "source": [ "The score of a model refers to how well the model fits the data. There is also usually more than one way to score a model!\n", "\n", "**Mean-squared error (MSE)** is one way to score a model like this, and it is pretty easy to compute. It is exactly what it sounds like - it is the mean of the squared errors!\n", "\n", "We could calculate this by hand:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Rsld8xbk0fWn" }, "outputs": [], "source": [ "#We will recreate the model just in case we ran other code in between\n", "lin_reg.fit(x, y)\n", "y_pred = lin_reg.predict(x)\n", "errors = y-y_pred\n", "mse = np.mean(errors**2)\n", "mse" ] }, { "cell_type": "markdown", "metadata": { "id": "9Cf9mNQq0fWn" }, "source": [ "Or use scikit-learn:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "kMa-JSdx0fWn" }, "outputs": [], "source": [ "from sklearn.metrics import mean_squared_error" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "nL37LjY20fWn" }, "outputs": [], "source": [ "y_pred = lin_reg.predict(x)\n", "mse = mean_squared_error(y, y_pred)\n", "print(mse)" ] }, { "cell_type": "markdown", "metadata": { "id": "fLaAsnfW0fWn" }, "source": [ "MSE is useful for comparing between models, but we only have one model with nothing to compare it to!\n", "\n", "The default scoring method that is used when we call lin_reg.score(x,y) is called $R^2$, or the **coefficient of determination**." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "v4_HN3zg0fWn" }, "outputs": [], "source": [ "print('Model score: ', lin_reg.score(x,y))\n", "print('Model MSE: ', mean_squared_error(lin_reg.predict(x),y))" ] }, { "cell_type": "markdown", "metadata": { "id": "vUG1rY3B0fWn" }, "source": [ "**The coefficient of determination is the correlation coefficient squared in simple linear regression**\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "7F6Tq7Ej0fWn" }, "outputs": [], "source": [ "r = np.corrcoef(x,y,rowvar=False)[0][1] #rowvar=False indicates that the input variables are stored as columns, not as rows\n", "r" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "urospfFM0fWn", "collapsed": true }, "outputs": [], "source": [ "print(r**2)" ] }, { "cell_type": "markdown", "source": [ "\n", "\n", "---\n", "\n" ], "metadata": { "id": "SpZx8V_jY7vf" } }, { "cell_type": "markdown", "source": [ "### Complete in Class: Polynomial Regression\n" ], "metadata": { "id": "M7JtQVmhY99-" } }, { "cell_type": "code", "source": [ "b = 10\n", "a1 = 2\n", "a2 = 3\n", "a3=1.5\n", "\n", "n_examples = 100\n", "\n", "X = np.random.uniform(-10,10, n_examples)\n", "\n", "\n", "y = b +a1*X + a2*X**2 + a3*X**3 + np.random.normal(0,50,n_examples)\n", "\n", "plt.scatter(X,y)\n" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 445 }, "id": "N_MYO_9pY847", "outputId": "0bff2877-7c78-4681-8e15-8bdd1d3597ef" }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "" ] }, "metadata": {}, "execution_count": 127 }, { "output_type": "display_data", "data": { "text/plain": [ "
" ], "image/png": "\n" }, "metadata": { "image/png": { "width": 572, "height": 413 } } } ] }, { "cell_type": "code", "source": [ "# let's see if we can fit it with a linear regression using sklearn\n", "\n", "\n", "# declare how many input features for linear model\n", "\n", "\n", "\n" ], "metadata": { "id": "exXYEoTnZhMy" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "poly_reg.intercept_" ], "metadata": { "id": "aeBjBncSauFp" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "poly_reg.coef_\n" ], "metadata": { "id": "9wSurqBEaxvR" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "plt.scatter(X,y)\n", "\n", "# make a dotted curve for fit predictions\n", "Xt = np.linspace(X.min(), X.max(), 1000)\n", "Xt1 = Xt.reshape(-1,1)\n", "Xt2 = Xt1**2\n", "Xt3 = Xt1**3\n", "Xt123 = np.concatenate([Xt1,Xt2,Xt3], axis=1)\n", "plt.plot(Xt, poly_reg.predict(Xt123), color='r', linestyle='--', linewidth=2) #line width\n" ], "metadata": { "id": "_vvVrxhha2nN" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "\n", "\n", "---\n", "\n" ], "metadata": { "id": "4BnWFjIDMP16" } }, { "cell_type": "markdown", "metadata": { "id": "6kjoMbH10fWn" }, "source": [ "### OLS on real data (Multiple linear regression)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "XFi9DLXA0fWn" }, "outputs": [], "source": [ "import pandas as pd\n", "concrete = pd.read_excel('https://archive.ics.uci.edu/ml/machine-learning-databases/concrete/compressive/Concrete_Data.xls')" ] }, { "cell_type": "code", "source": [ "concrete" ], "metadata": { "id": "8jI3om7dJjlT" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "q7EjScES0fWn" }, "outputs": [], "source": [ "concrete.columns = [item.split('(')[0].rstrip().replace(' ','_') for item in concrete.columns]" ] }, { "cell_type": "code", "execution_count": null, 
"metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 201 }, "id": "kgJc0wEa0fWn", "outputId": "973f60df-ec67-4c90-bc6d-67750e4a1c0f" }, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " Cement Blast_Furnace_Slag Fly_Ash Water Superplasticizer \\\n", "0 540.0 0.0 0.0 162.0 2.5 \n", "1 540.0 0.0 0.0 162.0 2.5 \n", "2 332.5 142.5 0.0 228.0 0.0 \n", "3 332.5 142.5 0.0 228.0 0.0 \n", "4 198.6 132.4 0.0 192.0 0.0 \n", "\n", " Coarse_Aggregate Fine_Aggregate Age Concrete_compressive_strength \n", "0 1040.0 676.0 28 79.986111 \n", "1 1055.0 676.0 28 61.887366 \n", "2 932.0 594.0 270 40.269535 \n", "3 932.0 594.0 365 41.052780 \n", "4 978.4 825.5 360 44.296075 " ], "text/html": [ "\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
\n", "
\n", "\n", "
\n", " \n", "\n", " \n", "\n", " \n", "
\n", "\n", "\n", "
\n", " \n", "\n", "\n", "\n", " \n", "
\n", "\n", "
\n", "
\n" ], "application/vnd.google.colaboratory.intrinsic+json": { "type": "dataframe", "variable_name": "concrete", "summary": "{\n \"name\": \"concrete\",\n \"rows\": 1030,\n \"fields\": [\n {\n \"column\": \"Cement\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 104.5071416428718,\n \"min\": 102.0,\n \"max\": 540.0,\n \"num_unique_values\": 280,\n \"samples\": [\n 194.68,\n 480.0,\n 145.4\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Blast_Furnace_Slag\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 86.27910364316895,\n \"min\": 0.0,\n \"max\": 359.4,\n \"num_unique_values\": 187,\n \"samples\": [\n 186.7,\n 212.0,\n 26.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Fly_Ash\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 63.99646938186508,\n \"min\": 0.0,\n \"max\": 200.1,\n \"num_unique_values\": 163,\n \"samples\": [\n 81.8,\n 137.9,\n 107.5\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Water\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 21.355567066911522,\n \"min\": 121.75,\n \"max\": 247.0,\n \"num_unique_values\": 205,\n \"samples\": [\n 164.9,\n 181.1,\n 185.7\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Superplasticizer\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 5.973491650590111,\n \"min\": 0.0,\n \"max\": 32.2,\n \"num_unique_values\": 155,\n \"samples\": [\n 4.14,\n 9.8,\n 6.13\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Coarse_Aggregate\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 77.75381809178927,\n \"min\": 801.0,\n \"max\": 1145.0,\n \"num_unique_values\": 284,\n \"samples\": [\n 852.1,\n 913.9,\n 914.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Fine_Aggregate\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 80.1754273990239,\n 
\"min\": 594.0,\n \"max\": 992.6,\n \"num_unique_values\": 304,\n \"samples\": [\n 698.0,\n 613.0,\n 689.3\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Age\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 63,\n \"min\": 1,\n \"max\": 365,\n \"num_unique_values\": 14,\n \"samples\": [\n 91,\n 100,\n 28\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Concrete_compressive_strength\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 16.705679174867946,\n \"min\": 2.331807832,\n \"max\": 82.5992248,\n \"num_unique_values\": 938,\n \"samples\": [\n 33.398217439999996,\n 56.63355864,\n 25.559564796\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" } }, "metadata": {}, "execution_count": 27 } ], "source": [ "concrete.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "hyj9BDum0fWn" }, "outputs": [], "source": [ "import seaborn as sns\n", "sns.stripplot(data = concrete, orient = 'h')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "nsI8vQvR0fWo" }, "outputs": [], "source": [ "X = concrete.drop(columns = 'Concrete_compressive_strength')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "8uonGP9Y0fWo" }, "outputs": [], "source": [ "y = concrete['Concrete_compressive_strength']" ] }, { "cell_type": "code", "source": [ "from sklearn.preprocessing import StandardScaler\n", "from sklearn.model_selection import train_test_split\n", "\n", "# Splitting the data into training and testing sets\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n", "\n", "# Applying StandardScaler\n", "scaler = StandardScaler()\n", "\n", "# Fit the scaler on the training data and transform the training data\n", "X_train_scaled = scaler.fit_transform(X_train)\n", "\n", "# Transform the test data using the same scaler\n", "X_test_scaled = scaler.transform(X_test)" ], 
"metadata": { "id": "Om1BL3qMV_rX" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "8r5QhVEh0fWo" }, "outputs": [], "source": [ "lin_reg = LR()\n", "lin_reg.fit(X_train_scaled,y_train)\n", "print(lin_reg.score(X_train_scaled,y_train))\n", "print(lin_reg.score(X_test_scaled,y_test))" ] }, { "cell_type": "markdown", "metadata": { "id": "7DenlaKR0fWo" }, "source": [ "We will compare the predicted values to the actual values we had for y. What does it mean to make a prediction?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "HfbuDyrZ0fWo" }, "outputs": [], "source": [ "lin_reg.coef_" ] }, { "cell_type": "code", "source": [ "lin_reg.intercept_" ], "metadata": { "id": "pp7P3oz5R0GX" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [], "id": "Da6A07Es0fWq" }, "outputs": [], "source": [ "X.columns" ] }, { "cell_type": "code", "source": [ "X.head()" ], "metadata": { "id": "q6Wh401uSPej" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "X.iloc[0]" ], "metadata": { "id": "R2uav7dISRu4" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "7bL6vGuV0fWq" }, "outputs": [], "source": [ "obs = np.array(X.iloc[0]).reshape(1,-1)\n", "print(obs)\n", "\n", "obs = scaler.transform(obs)\n", "\n", "print(obs)\n", "print('Coefficients:',lin_reg.coef_)\n", "print('Intercept:', lin_reg.intercept_)\n", "# Manual prediction: sum of (scaled feature * coefficient) over all features, plus the intercept\n", "# np.sum reduces over every element; the built-in sum would only add along the first axis\n", "print(np.sum(obs * lin_reg.coef_) + lin_reg.intercept_)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "tTXXhqVl0fWr", "outputId": "12823515-e2fc-4eda-d86f-8984030a2df2", "colab": { "base_uri": "https://localhost:8080/" } }, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "array([52.43959381])" ] }, "metadata": {}, "execution_count": 120 } ], "source": [ "lin_reg.predict(obs)" ] }, { "cell_type": 
"code", "execution_count": null, "metadata": { "id": "JAsJwEe70fWr" }, "outputs": [], "source": [ "y_pred = lin_reg.predict(X_train_scaled)\n", "\n", "plt.figure(figsize=(8,8))\n", "plt.scatter(y_train, y_pred, alpha=0.5, ec='k')\n", "plt.plot([min(y), max(y)],[min(y),max(y)], ':k')\n", "plt.axis('equal')\n", "plt.xlabel('Actual Compressive Strength', fontsize=14)\n", "plt.ylabel('Predicted Compressive Strength', fontsize=14)\n", "plt.show()" ] }, { "cell_type": "markdown", "source": [ "--\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "--\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ], "metadata": { "id": "EQKcRcYVDU3T" } }, { "cell_type": "markdown", "source": [ "\n", "\n", "---\n", "\n" ], "metadata": { "id": "5yapCk2Isef4" } }, { "cell_type": "markdown", "source": [ "### Digression on Epistemic and Aleatoric" ], "metadata": { "id": "-qNasF3EKy_K" } }, { "cell_type": "code", "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from matplotlib.animation import FuncAnimation, PillowWriter\n", "from IPython.display import Image\n", "\n", "# Enable inline plotting in Jupyter\n", "%matplotlib inline\n", "\n", "# Parameters for the target and darts\n", "target_center = np.array([0, 0]) # True center of the target\n", "\n", "# Simulation parameters\n", "n_darts = 60 # Number of darts to throw\n", "high_precision_var = 0.15 # Low variance for high precision darts\n", "low_precision_var = 0.75 # High variance for low precision darts\n", "high_accuracy_offset = np.array([0, 0]) # No offset for high accuracy\n", "low_accuracy_offset = np.array([2, 2]) # Offset for low accuracy (epistemic uncertainty)\n", "\n", "# Function to generate darts with precision and accuracy\n", "def generate_darts(accuracy_offset, precision_var):\n", " return np.random.randn(n_darts, 2) * precision_var + accuracy_offset\n", "\n", "# 
Suppress initial plot creation by using a context where we don't show output\n", "plt.ioff() # Turn off interactive plotting\n", "\n", "# Set up the figure and axis\n", "fig, ax = plt.subplots(figsize=(6, 6))\n", "ax.set_xlim(-4, 4)\n", "ax.set_ylim(-4, 4)\n", "ax.set_aspect('equal')\n", "\n", "# Target center visualization\n", "target = plt.Circle(target_center, 0.05, color='red', label='True Target')\n", "ax.add_artist(target)\n", "\n", "# Initialize scatter plot for darts\n", "darts_plot = ax.scatter([], [], s=50, c='blue', alpha=0.7)\n", "\n", "# Function to initialize the plot\n", "def init():\n", "    darts_plot.set_offsets(np.empty((0, 2)))\n", "    return darts_plot,\n", "\n", "# Function to update the dart positions for each frame\n", "def update(frame):\n", "    ax.clear()\n", "    ax.set_xlim(-4, 4)\n", "    ax.set_ylim(-4, 4)\n", "    ax.set_aspect('equal')\n", "\n", "    # Vary the frame between different types of darts (precision and accuracy)\n", "    if frame < 30:\n", "        # High Precision, Low Accuracy (Epistemic Uncertainty)\n", "        darts = generate_darts(low_accuracy_offset, high_precision_var)\n", "        ax.set_title('High Precision, Low Accuracy (Epistemic Uncertainty)')\n", "    else:\n", "        # High Accuracy, Low Precision (Aleatoric Uncertainty)\n", "        darts = generate_darts(high_accuracy_offset, low_precision_var)\n", "        ax.set_title('High Accuracy, Low Precision (Aleatoric Uncertainty)')\n", "\n", "    # Redraw the target\n", "    ax.add_artist(plt.Circle(target_center, 0.05, color='red', label='True Target'))\n", "\n", "    # Update the dart positions\n", "    darts_plot = ax.scatter(darts[:, 0], darts[:, 1], s=50, c='blue', alpha=0.7)\n", "\n", "    return darts_plot,\n", "\n", "# Create the animation\n", "ani = FuncAnimation(fig, update, frames=np.arange(60), init_func=init, blit=False, repeat=True)\n", "\n", "# Save the animation as a gif file\n", "ani.save(\"darts_simulation.gif\", writer=PillowWriter(fps=5))\n", "\n", "# Turn back on interactive plotting to ensure the animation shows up 
correctly\n", "plt.ion()\n", "\n", "# Display the gif in the notebook\n", "Image(filename=\"darts_simulation.gif\")\n" ], "metadata": { "id": "ayRY7WzCK7Ff" }, "execution_count": null, "outputs": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.9" }, "colab": { "provenance": [] } }, "nbformat": 4, "nbformat_minor": 0 }