ControlNetPlus

ControlNet++: All-in-one ControlNet for image generation and editing!

images_display

We designed a new architecture that supports more than 10 control types in conditional text-to-image generation and can generate high-resolution images visually comparable to Midjourney. The network is based on the original ControlNet architecture, to which we propose two new modules: 1. Extend the original ControlNet so that different image conditions are supported with the same network parameters. 2. Support multiple condition inputs without adding computational overhead, which is especially valuable for designers who want to edit images in detail. Different conditions share the same condition encoder, so no extra computation or parameters are added. We ran thorough experiments on SDXL and achieved superior performance in both control ability and aesthetic score. We release the method and the models to the open-source community so that everyone can enjoy them.

If you find this useful, please give us a star. Thank you very much!!

At 500+ stars, we will release the ProMax version with tiling and inpainting!! At 1000+ stars, we will release the ControlNet++ model for SD3!!

Advantages of the Model

  • Uses bucket training in the style of NovelAI and can generate high-resolution images of any aspect ratio
  • Trained on a large amount of high-quality data (over 10,000,000 images), with the dataset covering a wide variety of scenarios
  • Uses re-captioned prompts in the style of DALL·E 3, with detailed descriptions generated by CogVLM, giving good prompt-following ability
  • Many useful tricks were used during training, including but not limited to data augmentation, multiple losses, and multi-resolution training
  • Uses almost the same number of parameters as the original ControlNet, with no obvious increase in network parameters or computation
  • Supports 10+ control conditions, with no obvious performance drop on any single condition compared with training it independently
  • Supports multi-condition generation; condition fusion is learned during training, with no hyperparameters to set or prompts to design
  • Compatible with other open-source SDXL models such as BluePencilXL and CounterfeitXL, and compatible with other LoRA models

Other Popular Models We Have Released

https://huggingface.co/xinsir/controlnet-openpose-sdxl-1.0
https://huggingface.co/xinsir/controlnet-scribble-sdxl-1.0
https://huggingface.co/xinsir/controlnet-tile-sdxl-1.0
https://huggingface.co/xinsir/controlnet-canny-sdxl-1.0

News

  • [07/06/2024] Released ControlNet++ and the pretrained models.
  • [07/06/2024] Released the inference code (single condition & multi condition).

To-do:

  • Release ControlNet++ for Gradio
  • Release ControlNet++ for ComfyUI
  • Release the training code and the training guide.
  • Release the arXiv paper.

Visual Examples

Openpose

This is one of the most important ControlNet models; we used many tricks when training it. Its performance is comparable to https://huggingface.co/xinsir/controlnet-openpose-sdxl-1.0, the state of the art in pose control. To get the best performance out of the Openpose model, you should replace the draw_bodypose function in the controlnet_aux package (ComfyUI has its own controlnet_aux package); see the inference scripts for details. pose0 pose1 pose2 pose3 pose4

Depth

depth0 depth1 depth2 depth3 depth4

Canny

This is one of the most important ControlNet models. The canny model is trained jointly with lineart, anime lineart, and mlsd, and performs robustly on any thin lines. This model is key to reducing the deformity rate; we recommend redrawing hands/feet with thin lines. canny0 canny1 canny2 canny3 canny4

Lineart

lineart0 lineart1 lineart2 lineart3 lineart4

AnimeLineart

animelineart0 animelineart1 animelineart2 animelineart3 animelineart4

Mlsd

mlsd0 mlsd1 mlsd2 mlsd3 mlsd4

Scribble

This is one of the most important ControlNet models. The scribble model supports any line width and any line type. Its performance is comparable to https://huggingface.co/xinsir/controlnet-scribble-sdxl-1.0, letting everyone become a soul painter. scribble0 scribble1 scribble2 scribble3 scribble4

Hed

hed0 hed1 hed2 hed3 hed4

Pidi(Softedge)

pidi0 pidi1 pidi2 pidi3 pidi4

Teed (512 detect, higher resolution, thinner line)

ted0 ted1 ted2 ted3 ted4

Segment

segment0 segment1 segment2 segment3 segment4

Normal

normal0 normal1 normal2 normal3 normal4

Multi-Control Visual Examples

Openpose + Canny

Note: use the pose skeleton to control the human pose, and use thin lines to draw hand/foot details to avoid deformities. pose_canny0 pose_canny1 pose_canny2 pose_canny3 pose_canny4 pose_canny5

Openpose + Depth

Note: the depth map contains detailed information; we recommend using depth for the background and the pose skeleton for the foreground. pose_depth0 pose_depth1 pose_depth2 pose_depth3 pose_depth4 pose_depth5

Openpose + Scribble

Note: scribble is a strong line model; use it when you want to draw something whose outline need not be strict. Openpose + scribble gives you more freedom when generating the initial image, and you can then edit the details with thin lines. pose_scribble0 pose_scribble1 pose_scribble2 pose_scribble3 pose_scribble4 pose_scribble5

Openpose + Normal

pose_normal0 pose_normal1 pose_normal2 pose_normal3 pose_normal4 pose_normal5

Openpose + Segment

pose_segment0 pose_segment1 pose_segment2 pose_segment3 pose_segment4 pose_segment5

Dataset

We collected a large number of high-quality images. The images were rigorously filtered and annotated, and they cover a wide range of subjects, including photography, anime, nature, Midjourney, and more.

Network Architecture

images

In ControlNet++ we propose two new modules, named Condition Transformer and Control Encoder. We also fine-tuned an old module to enhance its representation ability. In addition, we propose a unified training strategy that realizes both single and multiple controls in one stage.

Control Encoder

For each condition we assign a control type id. For example, openpose is (1, 0, 0, 0, 0, 0) and depth is (0, 1, 0, 0, 0, 0); multiple conditions such as (openpose, depth) become (1, 1, 0, 0, 0, 0). In the Control Encoder, the control type id is converted into control type embeddings (using sinusoidal positional embeddings), and a single linear layer projects the control type embedding to the same dimension as the time embedding. The control type feature is added to the time embedding to indicate different control types. This simple setup helps the ControlNet distinguish between control types, because the time embedding tends to have a global effect on the whole network. Every single condition or combination of conditions has a unique control type id.
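As a rough illustration, the path from control type id to a feature added to the time embedding can be sketched as follows. This is a minimal NumPy sketch, not the repository's implementation: the random matrix stands in for the learned linear projection, and the embedding dimension is arbitrary.

```python
import numpy as np

def sinusoidal_embedding(position: int, dim: int) -> np.ndarray:
    # Standard transformer-style sinusoidal embedding of a scalar position,
    # the same scheme used for diffusion timesteps.
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    angles = position * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])

def control_type_feature(control_type_ids, dim=256, seed=0):
    # Embed each active control type index and sum, then project to the
    # time-embedding dimension with a linear layer (random weights here
    # stand in for the learned projection).
    emb = np.zeros(dim)
    for i, active in enumerate(control_type_ids):
        if active:
            emb += sinusoidal_embedding(i, dim)
    W = np.random.default_rng(seed).standard_normal((dim, dim)) / np.sqrt(dim)
    return emb @ W

# (openpose, depth) --> (1, 1, 0, 0, 0, 0); the projected feature is
# added to the time embedding to signal the active control types.
time_embedding = np.zeros(256)
conditioned = time_embedding + control_type_feature([1, 1, 0, 0, 0, 0])
```

Because the time embedding influences the whole network globally, adding the control type feature there is a cheap way to make every layer aware of which conditions are active.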

Condition Transformer

We extend ControlNet to support multiple control inputs simultaneously with the same network. The Condition Transformer is used to combine the different image condition features. Our method has two major innovations. First, different conditions share the same condition encoder, which makes the network simpler and more lightweight; this differs from mainstream methods such as T2I-Adapter or UniControlNet. Second, we add a transformer layer to exchange information between the original image and the condition images. Instead of using the transformer output directly, we use it to predict a condition bias for the original condition features. This is somewhat like ResNet, and we found experimentally that this setup clearly improves the network's performance.
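The residual-bias idea can be sketched like this. It is a toy illustration only: `predict_bias` stands in for the learned transformer layer, which is not reproduced here.

```python
import numpy as np

def fuse_conditions(image_feat, cond_feats, predict_bias):
    # cond_feats come from the *shared* condition encoder. The transformer
    # layer exchanges information between image and condition features, but
    # its output is used only as a bias added back onto the condition
    # features -- a ResNet-style residual, not a replacement.
    fused = sum(cond_feats)
    bias = predict_bias(image_feat, fused)
    return fused + bias

# Toy stand-in for the learned transformer layer.
predict_bias = lambda img, cond: 0.1 * (img - cond)
out = fuse_conditions(np.ones((4, 4)),
                      [np.zeros((4, 4)), np.ones((4, 4))],
                      predict_bias)
```

The residual form means the network only has to learn a correction on top of the encoder features, which tends to be easier to optimize than predicting the fused features from scratch.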

Modified Condition Encoder

The original condition encoder in ControlNet is a stack of convolution layers with SiLU activations. We keep the encoder architecture and simply increase the convolution channels to get a "fat" encoder, which clearly improves the network's performance. The reason is that all image conditions share the same encoder, so the encoder needs higher representation ability. The original setting may be fine for a single condition, but less so for 10+ conditions. Note that the original setting also works, just with some sacrifice in image generation quality.

Unified Training Strategy

Training with a single condition may be limited by data diversity. For example, openpose requires training images that contain people, and mlsd requires images with straight lines, which may hurt performance when generating unseen objects. Moreover, different conditions are not equally hard to train, so getting all conditions to converge simultaneously while each single condition reaches its best performance is tricky. Finally, we will often want to use two or more conditions at once; multi-condition training makes the fusion of different conditions smoother and increases the network's robustness (since a single condition teaches only limited knowledge). We therefore propose a unified training stage that achieves optimal single-condition convergence and multi-condition fusion at the same time.
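One simple way to realize such a one-stage schedule is to randomly draw either a single condition or a combination at each training step. The sketch below is hypothetical: the sampling ratio and pair size are illustrative, not the values actually used in training.

```python
import random

def sample_training_conditions(all_conditions, p_multi=0.5, rng=None):
    # Each step trains on either one condition or a random pair, so
    # single-condition convergence and multi-condition fusion are
    # optimized within the same stage. p_multi is a hypothetical
    # sampling ratio between the two regimes.
    rng = rng or random.Random()
    if rng.random() < p_multi:
        return rng.sample(all_conditions, 2)
    return [rng.choice(all_conditions)]

conds = ["openpose", "depth", "canny", "scribble"]
batch_conditions = sample_training_conditions(conds, rng=random.Random(0))
```

Mixing regimes this way also lets a hard-to-train condition benefit from gradients of batches where it is paired with an easier one.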

Control Mode

ControlNet++ requires passing a control type id to the network. We merge the 10+ controls into 6 control types, each with the following meaning: 0 -- openpose
1 -- depth
2 -- thick line(scribble/hed/softedge/ted-512)
3 -- thin line(canny/mlsd/lineart/animelineart/ted-1280)
4 -- normal
5 -- segment
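In code, the mapping above amounts to a small lookup table. The condition names here are illustrative labels, not identifiers taken from the repository.

```python
# Each supported condition maps to one of the six control-type slots.
CONTROL_TYPE_INDEX = {
    "openpose": 0,
    "depth": 1,
    "scribble": 2, "hed": 2, "softedge": 2, "ted_512": 2,   # thick line
    "canny": 3, "mlsd": 3, "lineart": 3,
    "anime_lineart": 3, "ted_1280": 3,                      # thin line
    "normal": 4,
    "segment": 5,
}

def control_type_vector(conditions):
    # Build the 6-slot control type id passed to the network.
    vec = [0] * 6
    for name in conditions:
        vec[CONTROL_TYPE_INDEX[name]] = 1
    return vec

pose_depth = control_type_vector(["openpose", "depth"])  # [1, 1, 0, 0, 0, 0]
```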

Installation

We recommend Python >= 3.8. You can set up a virtual environment with the following commands:

conda create -n controlplus python=3.8
conda activate controlplus
pip install -r requirements.txt

Download Weights

You can download the model weights from /xinsir/controlnet-union-sdxl-1.0. New model releases will be posted on Hugging Face; follow /xinsir for the latest model information.

Inference Scripts

We provide an inference script for each control condition. Please refer to them for more details.

There are some preprocessing differences. To get the best openpose-control performance, do the following: find util.py in the controlnet_aux package and replace the draw_bodypose function with the following code:

def draw_bodypose(canvas: np.ndarray, keypoints: List[Keypoint]) -> np.ndarray:
    """
    Draw keypoints and limbs representing a body pose on the given canvas.

    Args:
        canvas (np.ndarray): A 3D numpy array representing the canvas (image) on which to draw the body pose.
        keypoints (List[Keypoint]): A list of Keypoint objects representing the body keypoints to draw.

    Returns:
        np.ndarray: A 3D numpy array representing the modified canvas with the body pose drawn.

    Note:
        The function expects the x and y coordinates of the keypoints to be normalized between 0 and 1.
    """
    H, W, C = canvas.shape

    
    if max(W, H) < 500:
        ratio = 1.0
    elif max(W, H) >= 500 and max(W, H) < 1000:
        ratio = 2.0
    elif max(W, H) >= 1000 and max(W, H) < 2000:
        ratio = 3.0
    elif max(W, H) >= 2000 and max(W, H) < 3000:
        ratio = 4.0
    elif max(W, H) >= 3000 and max(W, H) < 4000:
        ratio = 5.0
    elif max(W, H) >= 4000 and max(W, H) < 5000:
        ratio = 6.0
    else:
        ratio = 7.0

    stickwidth = 4

    limbSeq = [
        [2, 3], [2, 6], [3, 4], [4, 5], 
        [6, 7], [7, 8], [2, 9], [9, 10], 
        [10, 11], [2, 12], [12, 13], [13, 14], 
        [2, 1], [1, 15], [15, 17], [1, 16], 
        [16, 18],
    ]

    colors = [[255, 0, 0], [255, 85, 0], [255, 170, 0], [255, 255, 0], [170, 255, 0], [85, 255, 0], [0, 255, 0], \
              [0, 255, 85], [0, 255, 170], [0, 255, 255], [0, 170, 255], [0, 85, 255], [0, 0, 255], [85, 0, 255], \
              [170, 0, 255], [255, 0, 255], [255, 0, 170], [255, 0, 85]]

    for (k1_index, k2_index), color in zip(limbSeq, colors):
        keypoint1 = keypoints[k1_index - 1]
        keypoint2 = keypoints[k2_index - 1]

        if keypoint1 is None or keypoint2 is None:
            continue

        Y = np.array([keypoint1.x, keypoint2.x]) * float(W)
        X = np.array([keypoint1.y, keypoint2.y]) * float(H)
        mX = np.mean(X)
        mY = np.mean(Y)
        length = ((X[0] - X[1]) ** 2 + (Y[0] - Y[1]) ** 2) ** 0.5
        angle = math.degrees(math.atan2(X[0] - X[1], Y[0] - Y[1]))
        polygon = cv2.ellipse2Poly((int(mY), int(mX)), (int(length / 2), int(stickwidth * ratio)), int(angle), 0, 360, 1)
        cv2.fillConvexPoly(canvas, polygon, [int(float(c) * 0.6) for c in color])

    for keypoint, color in zip(keypoints, colors):
        if keypoint is None:
            continue

        x, y = keypoint.x, keypoint.y
        x = int(x * W)
        y = int(y * H)
        cv2.circle(canvas, (int(x), int(y)), int(4 * ratio), color, thickness=-1)

    return canvas

For single-condition inference, you should provide a prompt and a control image; change the corresponding lines in the python file.

python controlnet_union_test_openpose.py

For multi-condition inference, make sure your input image_list is compatible with your control_type. For example, to use openpose and depth control, set image_list --> [controlnet_img_pose, controlnet_img_depth, 0, 0, 0, 0] and control_type --> [1, 1, 0, 0, 0, 0]. Refer to controlnet_union_test_multi_control.py for more details. In theory, you do not need to set condition scales for the different conditions; the network is designed and trained to fuse them naturally. The default setting is 1.0 for each condition input, which matches multi-condition training. However, if you want to increase the influence of a specific input condition, you can adjust the condition scales in the Condition Transformer module. In that module, the input conditions are added to the source image features together with the predicted bias; multiplying a condition by a specific scale has a strong effect (but may cause some unknown results).

python controlnet_union_test_multi_control.py
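A small helper makes the slot bookkeeping explicit. This is a hypothetical convenience function, not part of the repository, and strings stand in for the actual control images.

```python
def build_multi_control_inputs(pose=None, depth=None, thick=None,
                               thin=None, normal=None, seg=None):
    # Slot order follows the six control types: openpose, depth, thick line,
    # thin line, normal, segment. Unused slots stay 0, matching the layout
    # expected by controlnet_union_test_multi_control.py.
    slots = [pose, depth, thick, thin, normal, seg]
    image_list = [img if img is not None else 0 for img in slots]
    control_type = [1 if img is not None else 0 for img in slots]
    return image_list, control_type

image_list, control_type = build_multi_control_inputs(pose="pose.png",
                                                      depth="depth.png")
# image_list == ["pose.png", "depth.png", 0, 0, 0, 0]
# control_type == [1, 1, 0, 0, 0, 0]
```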

License

This project is released under the Apache License, Version 2.0.
