Exploring tabular data
When working with data in tabular form, being able to get a quick overview of the data is essential.
import pandas as pd
Loading CSV files from disk
The CSV file format is commonly used to exchange tabular data between different software packages. We can open such files with pandas.read_csv. The file used here separates fields with semicolons, and its first column holds the row labels, hence the delimiter and index_col arguments.
data = pd.read_csv('../../data/Results.csv', index_col=0, delimiter=';')
data
| | Area | Mean | StdDev | Min | Max | X | Y | XM | YM | Major | Minor | Angle | %Area | Type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 18.0 | 730.389 | 103.354 | 592.0 | 948.0 | 435.000 | 4.722 | 434.962 | 4.697 | 5.987 | 3.828 | 168.425 | 100 | A |
| 2 | 126.0 | 718.333 | 90.367 | 556.0 | 1046.0 | 388.087 | 8.683 | 388.183 | 8.687 | 16.559 | 9.688 | 175.471 | 100 | A |
| 3 | NaN | NaN | NaN | 608.0 | 964.0 | NaN | NaN | NaN | 7.665 | 7.359 | NaN | 101.121 | 100 | A |
| 4 | 68.0 | 686.985 | 61.169 | 571.0 | 880.0 | 126.147 | 8.809 | 126.192 | 8.811 | 15.136 | 5.720 | 168.133 | 100 | A |
| 5 | NaN | NaN | 69.438 | 566.0 | 792.0 | 348.500 | 7.500 | NaN | 7.508 | NaN | 3.088 | NaN | 100 | A |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 387 | 152.0 | 801.599 | 111.328 | 582.0 | 1263.0 | 348.487 | 497.632 | 348.451 | 497.675 | 17.773 | 10.889 | 11.829 | 100 | A |
| 388 | 17.0 | 742.706 | 69.624 | 620.0 | 884.0 | 420.500 | 496.382 | 420.513 | NaN | NaN | 3.663 | 49.457 | 100 | A |
| 389 | 60.0 | 758.033 | 77.309 | 601.0 | 947.0 | 259.000 | 499.300 | 258.990 | 499.289 | 9.476 | 8.062 | 90.000 | 100 | A |
| 390 | 12.0 | 714.833 | 67.294 | 551.0 | 785.0 | 240.167 | 498.167 | 240.179 | 498.148 | 4.606 | 3.317 | 168.690 | 100 | A |
| 391 | 23.0 | 695.043 | 67.356 | 611.0 | 846.0 | 49.891 | 503.022 | 49.882 | 502.979 | 6.454 | 4.537 | 73.243 | 100 | A |
391 rows × 14 columns
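To see what the index_col and delimiter arguments do without needing the original file, here is a minimal sketch that reads a small semicolon-separated text from memory. The content of csv_text and the name small are made up for illustration; only pandas.read_csv and its arguments come from the example above.

```python
import io
import pandas as pd

# Hypothetical semicolon-separated content standing in for a file on disk.
csv_text = """;Area;Mean;Type
1;18.0;730.389;A
2;126.0;718.333;A
3;;;A
"""

# index_col=0 turns the first (unnamed) column into the row index;
# delimiter=';' tells pandas that fields are separated by semicolons.
small = pd.read_csv(io.StringIO(csv_text), index_col=0, delimiter=';')

print(small.shape)           # (number of rows, number of columns)
print(small.index.tolist())  # row labels taken from the first column
print(small.isna().sum())    # empty fields are read as NaN
```

Empty fields become NaN, which is presumably how the missing values in the table above arose.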
Viewing the data
Displaying the data can be tricky, especially when working with large tables. The head and tail methods show only the first or last rows.
data.head(10) # first 10 rows
| | Area | Mean | StdDev | Min | Max | X | Y | XM | YM | Major | Minor | Angle | %Area | Type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 18.0 | 730.389 | 103.354 | 592.0 | 948.0 | 435.000 | 4.722 | 434.962 | 4.697 | 5.987 | 3.828 | 168.425 | 100 | A |
| 2 | 126.0 | 718.333 | 90.367 | 556.0 | 1046.0 | 388.087 | 8.683 | 388.183 | 8.687 | 16.559 | 9.688 | 175.471 | 100 | A |
| 3 | NaN | NaN | NaN | 608.0 | 964.0 | NaN | NaN | NaN | 7.665 | 7.359 | NaN | 101.121 | 100 | A |
| 4 | 68.0 | 686.985 | 61.169 | 571.0 | 880.0 | 126.147 | 8.809 | 126.192 | 8.811 | 15.136 | 5.720 | 168.133 | 100 | A |
| 5 | NaN | NaN | 69.438 | 566.0 | 792.0 | 348.500 | 7.500 | NaN | 7.508 | NaN | 3.088 | NaN | 100 | A |
| 6 | 669.0 | 697.164 | 72.863 | 539.0 | 957.0 | 471.696 | 26.253 | 471.694 | 26.197 | 36.656 | 23.237 | 124.340 | 100 | A |
| 7 | 5.0 | 658.600 | 49.161 | 607.0 | 710.0 | 28.300 | 8.100 | 28.284 | 8.103 | 3.144 | 2.025 | 161.565 | 100 | A |
| 8 | 7.0 | 677.571 | 49.899 | 596.0 | 768.0 | 415.357 | 8.786 | 415.360 | 8.804 | 4.110 | 2.168 | 112.500 | 100 | A |
| 9 | 14.0 | 691.071 | 63.873 | 586.0 | 808.0 | 493.286 | 9.000 | 493.295 | 9.016 | 5.120 | 3.481 | 38.802 | 100 | C |
| 10 | 39.0 | 763.615 | 88.786 | 623.0 | 1016.0 | 157.526 | 12.731 | 157.592 | 12.757 | 8.815 | 5.633 | 46.437 | 100 | C |
data.tail(10) # last 10 rows
| | Area | Mean | StdDev | Min | Max | X | Y | XM | YM | Major | Minor | Angle | %Area | Type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 382 | 45.0 | 734.356 | 68.637 | 575.0 | 867.0 | 171.500 | 494.789 | 171.492 | 494.739 | 14.630 | 3.916 | 95.698 | 100 | B |
| 383 | 94.0 | 746.617 | 85.198 | 550.0 | 1021.0 | 194.032 | 498.223 | 194.014 | 498.239 | 17.295 | 6.920 | 52.720 | 100 | B |
| 384 | 35.0 | 776.257 | 74.746 | 611.0 | 961.0 | 268.957 | 493.586 | 268.977 | NaN | NaN | 5.990 | 111.193 | 100 | A |
| 385 | 35.0 | 739.286 | NaN | 593.0 | 928.0 | 291.871 | 493.843 | 291.871 | 493.806 | NaN | 5.352 | 79.368 | 100 | A |
| 386 | 14.0 | 736.143 | 81.533 | 646.0 | 902.0 | 315.000 | 493.000 | 314.989 | 493.003 | NaN | 3.676 | 45.000 | 100 | A |
| 387 | 152.0 | 801.599 | 111.328 | 582.0 | 1263.0 | 348.487 | 497.632 | 348.451 | 497.675 | 17.773 | 10.889 | 11.829 | 100 | A |
| 388 | 17.0 | 742.706 | 69.624 | 620.0 | 884.0 | 420.500 | 496.382 | 420.513 | NaN | NaN | 3.663 | 49.457 | 100 | A |
| 389 | 60.0 | 758.033 | 77.309 | 601.0 | 947.0 | 259.000 | 499.300 | 258.990 | 499.289 | 9.476 | 8.062 | 90.000 | 100 | A |
| 390 | 12.0 | 714.833 | 67.294 | 551.0 | 785.0 | 240.167 | 498.167 | 240.179 | 498.148 | 4.606 | 3.317 | 168.690 | 100 | A |
| 391 | 23.0 | 695.043 | 67.356 | 611.0 | 846.0 | 49.891 | 503.022 | 49.882 | 502.979 | 6.454 | 4.537 | 73.243 | 100 | A |
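The following sketch shows the same two calls on a small made-up table, so the effect of the row-count argument is easy to check. The DataFrame demo and its values are hypothetical; calling head or tail without an argument returns five rows.

```python
import pandas as pd

# Hypothetical table with 20 rows, labelled 1..20 like the measurements above.
demo = pd.DataFrame({"Area": range(100, 120)}, index=range(1, 21))

first_rows = demo.head()   # no argument: the first 5 rows
last_rows = demo.tail(3)   # the last 3 rows

print(first_rows.index.tolist())
print(last_rows.index.tolist())
print(demo.shape)          # (number of rows, number of columns)
```

Checking shape first is a cheap way to decide how many rows are worth displaying.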
Overview of descriptive statistics
To get an overview of the range of values in a table, we can ask the DataFrame to describe itself with DataFrame.describe(). This reports the count, mean, standard deviation, minimum, maximum and quartiles for each numeric column of the table. The count only includes non-missing values, so it can differ from one column to the next.
data.describe()
| | Area | Mean | StdDev | Min | Max | X | Y | XM | YM | Major | Minor | Angle | %Area |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 389.000000 | 386.000000 | 388.000000 | 388.000000 | 388.000000 | 389.000000 | 388.000000 | 388.000000 | 386.000000 | 383.000000 | 388.000000 | 390.000000 | 391.0 |
| mean | 107.164524 | 743.455565 | 76.575309 | 610.414948 | 962.922680 | 256.419859 | 254.384088 | 256.183338 | 253.353005 | 12.481016 | 9.500662 | 86.598441 | 100.0 |
| std | 241.037082 | 42.252140 | 31.844864 | 57.156709 | 244.897224 | 152.261694 | 155.080074 | 152.380388 | 154.426250 | 11.979176 | 49.714280 | 60.593686 | 0.0 |
| min | 1.000000 | 587.000000 | 0.000000 | 516.000000 | 587.000000 | 3.978000 | 4.722000 | 4.012000 | 4.697000 | 1.128000 | 1.128000 | 0.000000 | 100.0 |
| 25% | 15.000000 | 717.060750 | 63.861000 | 570.750000 | 847.750000 | 127.142000 | 102.875250 | 126.923250 | 103.813750 | 5.098000 | 3.637250 | 34.517250 | 100.0 |
| 50% | 44.000000 | 741.077500 | 74.727000 | 599.000000 | 917.500000 | 243.300000 | 271.490000 | 242.288000 | 271.272000 | 9.374000 | 5.886000 | 89.703500 | 100.0 |
| 75% | 116.000000 | 767.260750 | 86.826500 | 633.250000 | 1014.500000 | 400.167000 | 395.058250 | 400.363500 | 393.800750 | 16.283000 | 9.017250 | 134.617250 | 100.0 |
| max | 2755.000000 | 912.938000 | 377.767000 | 877.000000 | 3880.000000 | 508.214000 | 503.022000 | 508.169000 | 502.979000 | 144.475000 | 981.000000 | 568.000000 | 100.0 |
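The counts in the table above differ between columns because describe() ignores missing values, and the Type column is absent because it is not numeric. A minimal sketch on made-up data illustrates both points; the DataFrame demo and its values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical measurements with a few missing values and one text column.
demo = pd.DataFrame({
    "Area": [18.0, 126.0, np.nan, 68.0],
    "Mean": [730.389, 718.333, np.nan, np.nan],
    "Type": ["A", "A", "A", "C"],
})

summary = demo.describe()

print(summary.loc["count"])      # non-missing values per numeric column
print(summary.columns.tolist())  # 'Type' is left out: it is not numeric
```

A count lower than the number of rows is therefore a quick hint that a column contains NaN entries.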
Sorting tables
We are often interested in the rows that hold the maximum value of a column. For example, sorting by the Area column in descending order puts the largest object in the first row; rows with a missing Area end up at the bottom:
data.sort_values(by="Area", ascending=False)
| | Area | Mean | StdDev | Min | Max | X | Y | XM | YM | Major | Minor | Angle | %Area | Type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 190 | 2755.0 | 859.928 | 235.458 | 539.0 | 3880.0 | 108.710 | 302.158 | 110.999 | 300.247 | 144.475 | 24.280 | 39.318 | 100 | C |
| 81 | 2295.0 | 765.239 | 96.545 | 558.0 | 1431.0 | 375.003 | 134.888 | 374.982 | 135.359 | 65.769 | 44.429 | 127.247 | 100 | B |
| 209 | 1821.0 | 847.761 | 122.074 | 600.0 | 1510.0 | 287.795 | 321.115 | 288.074 | 321.824 | 55.879 | 41.492 | 112.124 | 100 | A |
| 252 | 1528.0 | 763.777 | 83.183 | 572.0 | 1172.0 | 191.969 | 385.944 | 192.487 | 385.697 | 63.150 | 30.808 | 34.424 | 100 | B |
| 265 | 1252.0 | 793.371 | 117.139 | 579.0 | 1668.0 | 262.071 | 394.497 | 262.268 | 394.326 | 60.154 | 26.500 | 50.147 | 100 | A |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 113 | 1.0 | 587.000 | 0.000 | 587.0 | 587.0 | 399.500 | 117.500 | 399.500 | 117.500 | 1.128 | 1.128 | 0.000 | 100 | A |
| 310 | 1.0 | 866.000 | 0.000 | 866.0 | 866.0 | 343.500 | 408.500 | 343.500 | 408.500 | 1.128 | 1.128 | 0.000 | 100 | A |
| 219 | 1.0 | 763.000 | 0.000 | 763.0 | 763.0 | 411.500 | 296.500 | 411.500 | 296.500 | 1.128 | 1.128 | 0.000 | 100 | A |
| 3 | NaN | NaN | NaN | 608.0 | 964.0 | NaN | NaN | NaN | 7.665 | 7.359 | NaN | 101.121 | 100 | A |
| 5 | NaN | NaN | 69.438 | 566.0 | 792.0 | 348.500 | 7.500 | NaN | 7.508 | NaN | 3.088 | NaN | 100 | A |
391 rows × 14 columns
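When only the largest object is of interest, the sorted table can be cut down to its first row. The sketch below uses a small made-up table (demo and its values are hypothetical) and shows two options: head(1) on the sorted result, and DataFrame.nlargest as a shortcut.

```python
import numpy as np
import pandas as pd

# Hypothetical measurements, including one row with a missing Area.
demo = pd.DataFrame(
    {"Area": [18.0, 126.0, np.nan, 68.0], "Type": ["A", "A", "A", "C"]},
    index=[1, 2, 3, 4],
)

# Descending sort: largest Area first; rows with NaN are placed last by default.
by_area = demo.sort_values(by="Area", ascending=False)
print(by_area.index.tolist())

# Two ways to keep only the row with the largest Area.
largest = by_area.head(1)
largest_alt = demo.nlargest(1, "Area")
print(largest)
```

sort_values returns a new DataFrame, so demo itself keeps its original row order.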