Bio-image Analysis Notebooks

Scaling

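When processing data with machine learning algorithms, the value ranges of the parameters matter. Many algorithms, in particular distance-based ones such as k-means clustering, implicitly give parameters with large ranges more weight than parameters with small ranges. To bring different parameters into comparable ranges, scaling may be necessary.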
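To illustrate why the range matters, here is a minimal sketch comparing Euclidean distances between three feature vectors. The two features (a cell area in pixels and a normalized mean intensity) and their values are hypothetical, chosen only for illustration; they are not taken from the datasets used below.

```python
import numpy as np

# Hypothetical feature vectors: [area in pixels (large range), mean intensity (0..1)]
a = np.array([500.0, 0.2])
b = np.array([520.0, 0.9])   # similar area, very different intensity
c = np.array([600.0, 0.2])   # very different area, same intensity

def euclidean(p, q):
    """Euclidean distance between two feature vectors."""
    return float(np.sqrt(np.sum((p - q) ** 2)))

# The area dominates the distance: b appears much closer to a than c does,
# although b differs from a over most of the intensity range.
print(euclidean(a, b))  # about 20.01
print(euclidean(a, c))  # 100.0
```

The intensity difference of 0.7 contributes almost nothing compared with the area difference, so a distance-based algorithm would effectively ignore the intensity feature unless the features are scaled.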

See also

Standardization with scikit-learn

import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# local import; this library is located in the same folder as the notebook
from data_generator import generate_biomodal_2d_data

data1 = generate_biomodal_2d_data()

plt.scatter(data1[:, 0], data1[:, 1], c='grey')


data2 = generate_biomodal_2d_data()
# shrink the second parameter (y-axis) by a factor of 10
data2[:, 1] = data2[:, 1] * 0.1

plt.scatter(data2[:, 0], data2[:, 1], c='grey')

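Before clustering, it can help to check the range of each parameter. The following sketch summarizes the columns of a dataset with pandas `DataFrame.describe()`. Because `generate_biomodal_2d_data` is a local helper, the data here is a hypothetical stand-in drawn from a normal distribution, with the second column shrunk by 0.1 in the same way as `data2`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: 100 samples, 2 parameters
rng = np.random.default_rng(42)
data = rng.normal(loc=5.0, scale=2.0, size=(100, 2))

# Shrink the second parameter by a factor of 10, as for data2
rescaled = data.copy()
rescaled[:, 1] = rescaled[:, 1] * 0.1

# describe() reports count, mean, std, min, quartiles and max per column
summary = pd.DataFrame(rescaled, columns=["x", "y"]).describe()
print(summary.loc[["min", "max", "std"]])
```

The `y` column now spans a range roughly ten times smaller than the `x` column, which is exactly the kind of difference that affects distance-based clustering.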

Clustering data with different ranges

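We will now cluster these two _seemingly similar_ datasets using k-means clustering. The same effect can also be observed with other algorithms. To make sure that we apply the same algorithm with the same configuration to both datasets, we wrap it in a function and reuse it.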

def classify_and_plot(data):
    # Cluster the data into two groups using k-means
    number_of_classes = 2
    classifier = KMeans(n_clusters=number_of_classes)
    classifier.fit(data)
    prediction = classifier.predict(data)

    # Color each data point according to its predicted cluster
    colors = ['orange', 'blue']
    predicted_colors = [colors[i] for i in prediction]

    plt.scatter(data[:, 0], data[:, 1], c=predicted_colors)

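When we apply the same method to both datasets, we can observe that the data points in the center are classified differently. The only difference between the two datasets is their data range: the data points are scaled differently along one axis. Because k-means assigns each point to the nearest cluster center using the Euclidean distance, shrinking the y-axis by a factor of 10 lets the x-axis dominate the distances.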
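The effect on the assignment step of k-means can be sketched with a single point and two fixed cluster centers. The centers and the point are hypothetical values chosen for illustration; the sketch only reproduces the "assign to nearest center" step, whereas real k-means also recomputes the centers iteratively.

```python
import numpy as np

def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance).
    This is the assignment step of k-means."""
    distances = np.linalg.norm(centroids - point, axis=1)
    return int(np.argmin(distances))

# Hypothetical cluster centers and query point (illustrative values only)
centroids = np.array([[0.0, 0.0],
                      [4.0, 4.0]])
point = np.array([3.0, 0.0])

# Shrink the second axis by 0.1, as was done for data2
factor = np.array([1.0, 0.1])

print(nearest_centroid(point, centroids))                    # 0
print(nearest_centroid(point * factor, centroids * factor))  # 1
```

Originally the point is 3 units from the first center and about 4.12 units from the second. After shrinking the y-axis, the distance to the second center drops to about 1.08, so the same point switches clusters.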

classify_and_plot(data1)


classify_and_plot(data2)


Standard scaling

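Standard scaling (standardization) transforms each parameter so that it has a mean of 0 and a standard deviation of 1: every value x is replaced by z = (x - μ) / σ, where μ and σ are the mean and standard deviation of that parameter. This differs from min-max scaling, which maps the data to a fixed range such as [0, 1]. Standard scaling allows us to obtain the same results when processing data that were originally in different ranges.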
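The formula above can be checked against scikit-learn's `StandardScaler` on a small, made-up dataset. For contrast, the sketch also applies `MinMaxScaler`, which is the technique that maps each parameter to the range [0, 1].

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Small made-up dataset: 4 samples, 2 parameters with very different ranges
data = np.array([[100.0, 0.1],
                 [200.0, 0.3],
                 [300.0, 0.2],
                 [400.0, 0.4]])

# Standard scaling with scikit-learn
scaled = StandardScaler().fit_transform(data)

# Manual z-score per column; StandardScaler uses the population
# standard deviation (ddof=0), which is also NumPy's default
manual = (data - data.mean(axis=0)) / data.std(axis=0)

print(np.allclose(scaled, manual))  # True
print(scaled.mean(axis=0))          # approximately [0, 0]
print(scaled.std(axis=0))           # approximately [1, 1]

# Min-max scaling: each column is mapped to approximately [0, 1]
minmax = MinMaxScaler().fit_transform(data)
print(minmax.min(axis=0), minmax.max(axis=0))
```

After standard scaling both columns have the same mean and spread, but unlike min-max scaling their values are not confined to a fixed interval.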

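def scale(data):
    # Fit the scaler: compute mean and standard deviation of each parameter (column)
    scaler = StandardScaler().fit(data)
    # Subtract the mean and divide by the standard deviation, column by column
    return scaler.transform(data)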

scaled_data1 = scale(data1)

classify_and_plot(scaled_data1)


scaled_data2 = scale(data2)

classify_and_plot(scaled_data2)

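Both scaled datasets give the same clustering because standard scaling divides each parameter by its own standard deviation: multiplying a parameter by a positive constant, such as 0.1, cancels out. The sketch below checks this with a copy of the `scale` function from above and hypothetical random stand-in data (`original`, `squeezed`), since the local data generator is not reproduced here.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

def scale(data):
    # Same function as above: standardize each column
    scaler = StandardScaler().fit(data)
    return scaler.transform(data)

# Hypothetical stand-in data with two parameters
rng = np.random.default_rng(0)
original = rng.normal(size=(50, 2))

# Same rescaling as for data2: shrink the second parameter by 0.1
squeezed = original.copy()
squeezed[:, 1] = squeezed[:, 1] * 0.1

print(np.allclose(original, squeezed))                # False
print(np.allclose(scale(original), scale(squeezed)))  # True
```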

By Robert Haase, Guillaume Witz, Miguel Fernandes, Marcelo Leomil Zoccoler, Shannon Taylor, Mara Lampert, Till Korten & add-your-name-here-by-sending-a-pull-request-containing-a-notebook


Copyright: Licensed CC-BY 4.0 and BSD3 unless mentioned otherwise. Contribution and feedback welcome.