Automated machine learning (automated ML) builds high quality machine learning models for you by automating model and hyperparameter selection. Bring a labelled dataset that you want to build a model for, and automated ML will give you a high quality machine learning model that you can use for predictions.
If you are new to data science, automated ML helps you get started by simplifying machine learning model building. It abstracts away model selection and hyperparameter selection, and in one step creates a high quality trained model for you to use.
If you are an experienced data scientist, automated ML increases your productivity by intelligently performing model and hyperparameter selection for your training, and it generates high quality models much more quickly than manually specifying several combinations of parameters and running training jobs. Automated ML provides visibility into, and access to, all the training jobs and the performance characteristics of the models, to help you further tune the pipeline if you desire.
Below are the three execution environments supported by automated ML.
To run these notebooks on your own notebook server, use these installation instructions. They install everything you need and then start a Jupyter notebook.
The automl_setup script creates a new conda environment, installs the necessary packages, configures the widget, and starts a Jupyter notebook. It takes the conda environment name as an optional parameter; the default conda environment name is azure_automl. The exact command depends on the operating system; see the specific sections below for Windows, Mac and Linux. It can take about 10 minutes to execute.
Packages installed by the automl_setup script:
For more details, refer to automl_env.yml.
Start an Anaconda Prompt window, cd to the how-to-use-azureml/automated-machine-learning folder where the sample notebooks were extracted and then run:
automl_setup
Install "Command line developer tools" if it is not already installed (you can use the command xcode-select --install).
Start a Terminal window, cd to the how-to-use-azureml/automated-machine-learning folder where the sample notebooks were extracted and then run:
bash automl_setup_mac.sh
cd to the how-to-use-azureml/automated-machine-learning folder where the sample notebooks were extracted and then run:
bash automl_setup_linux.sh
To start your Jupyter notebook manually, use:
conda activate azure_automl
jupyter notebook
or on Mac or Linux:
source activate azure_automl
jupyter notebook
NOTE: Please create your Azure Databricks cluster as v7.1 (high concurrency preferred) with Python 3 (dropdown).
NOTE: You should have at least contributor access to your Azure subscription to run the notebook.
See Configure automated machine learning experiments to learn more about the settings and features available for automated machine learning experiments.
Jupyter notebook provides a File / Download as / Python (.py) option for saving the notebook as a Python file. You can then run this file using the python command. However, on Windows the file needs to be modified before it can be run. The following condition must be added to the main code in the file:
if __name__ == "__main__":
The main code of the file must be indented so that it is under this condition.
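For example, a notebook exported as a .py file can be restructured like this (run_experiment is a placeholder name standing in for the exported notebook code):

```python
# Sketch of an exported notebook modified to run on Windows.
# run_experiment is a placeholder; it represents the code that
# Jupyter's "Download as Python" export produced.

def run_experiment():
    # ... code exported from the notebook goes here ...
    return "done"

if __name__ == "__main__":
    # On Windows the main code must be indented under this guard so
    # that child processes re-importing the file do not re-run it.
    print(run_experiment())
```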
Check that conda is 64-bit by running conda info. The platform should be win-64 for Windows or osx-64 for Mac.
Check the conda version by running conda -V. If you have a previous version installed, you can update it using the command: conda update conda.
If you get the error gcc: error trying to exec 'cc1plus': execvp: No such file or directory, install build essentials using the command sudo apt-get install build-essential.
List your existing conda environments with conda env list and remove unneeded ones with conda env remove -n <environmentname>.
If automl_setup_linux.sh fails on Ubuntu Linux with the error unable to execute 'gcc': No such file or directory, run:
sudo apt-get update
sudo apt-get install build-essential --fix-missing
and then run automl_setup_linux.sh again.
Check that the subscription id has the valid format, for example: subscription_id = "12345678-90ab-1234-5678-1234567890abcd".
Check that the workspace region is one of the supported regions: eastus2, eastus, westcentralus, southeastasia, westeurope, australiaeast, westus2, southcentralus.
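As a quick sanity check before creating the workspace, the format rules above can be verified with a short script. This is an illustrative sketch, not part of the SDK; a subscription id is a GUID, i.e. hex digits in 8-4-4-4-12 groups:

```python
import re

# Regions supported for workspace creation, as listed above.
SUPPORTED_REGIONS = {
    "eastus2", "eastus", "westcentralus", "southeastasia",
    "westeurope", "australiaeast", "westus2", "southcentralus",
}

# A GUID is hex digits in 8-4-4-4-12 groups.
GUID_PATTERN = re.compile(
    r"[0-9a-fA-F]{8}-([0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}")

def valid_subscription_id(subscription_id):
    """Return True if the subscription id looks like a GUID."""
    return GUID_PATTERN.fullmatch(subscription_id) is not None

def valid_region(region):
    """Return True if the region is one of the supported regions."""
    return region in SUPPORTED_REGIONS
```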
There were package changes in automated machine learning version 1.0.76, which require the previous version to be uninstalled before upgrading to the new version.
If you have manually upgraded from a version of automated machine learning before 1.0.76 to 1.0.76 or later, you may get the error:
ImportError: cannot import name 'AutoMLConfig'
This can be resolved by running:
pip uninstall azureml-train-automl
and then
pip install azureml-train-automl
The automl_setup.cmd script does this automatically.
If the call ws = Workspace.from_config() fails:
Make sure that you have run the configuration.ipynb notebook successfully.
If you are running a notebook from a different folder than the one in which you ran configuration.ipynb, copy the folder aml_config and the file config.json that it contains to the new folder. Workspace.from_config reads the config.json for the notebook folder or its parent folder.
If you are switching to a new subscription, resource group, workspace or region, run the configuration.ipynb notebook again. Changing config.json directly will only work if the workspace already exists in the specified resource group under the specified subscription. Note that Workspace.create will not create or update a workspace if it already exists, even if the region specified is different.

If a sample notebook fails with an error that a property, method or library does not exist:
Check that you have selected the correct kernel using the Kernel | Change Kernel menu option. For Azure Notebooks, it should be Python 3.6. For local conda environments, it should be the conda environment name that you specified in automl_setup; the default is azure_automl. Note that the kernel is saved as part of the notebook, so if you switch to a new conda environment, you will have to select the new kernel in the notebook.
Check the SDK version by running azureml.core.VERSION in a Jupyter notebook cell. You can download previous versions of the sample notebooks from GitHub by clicking the Branch button, selecting the Tags tab and then selecting the version.
Some Windows environments see an error loading numpy with the latest Python version 3.6.8. If you see this issue, try with Python version 3.6.7.
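The version check can also be scripted. The helper below is an illustrative sketch (not part of the SDK) that parses the dotted version string reported by azureml.core.VERSION so versions can be compared numerically:

```python
# In the notebook kernel you would run:
#   import azureml.core
#   print(azureml.core.VERSION)
# The helper below parses such a dotted numeric version string.

def version_tuple(version):
    """Parse a dotted version like '1.0.76' into a comparable int tuple."""
    return tuple(int(part) for part in version.split("."))

# 1.0.76 is the release that changed the automated ML packaging,
# so a kernel at or above it needs the new install steps.
print(version_tuple("1.8.0") >= version_tuple("1.0.76"))  # True
```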
Check the tensorflow version in the automated ML conda environment. Supported versions are < 1.13. Uninstall tensorflow from the environment if its version is >= 1.13. To check the version and uninstall: run pip freeze and look for tensorflow; if found, the version listed should be < 1.13. Then run pip uninstall tensorflow in the command shell and enter y for confirmation.

If a new environment was created after 10 June 2020 using SDK 1.7.0 or lower, training may fail with the above error due to an update in the py-cpuinfo package. (Environments created on or before 10 June 2020 are unaffected, as are experiments run on remote compute, since cached training images are used.) To work around this issue, either of the following two steps can be taken:
Update the SDK version to 1.8.0 or higher (this will also downgrade py-cpuinfo to 5.0.0):
pip install --upgrade azureml-sdk[automl]
Downgrade the installed version of py-cpuinfo to 5.0.0:
pip install py-cpuinfo==5.0.0
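To confirm which py-cpuinfo version ended up installed after either step, you can query the package metadata. This helper is an illustrative sketch using only the standard library:

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

# After the downgrade, installed_version("py-cpuinfo")
# should report "5.0.0".
```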
There are several reasons why DsvmCompute.create can fail. The reason is usually stated in the error message, but you have to look at the end of the error message for the details. Some common reasons are:
Compute name is invalid: it should start with a letter, be between 2 and 16 characters, and only include letters (a-zA-Z), numbers (0-9) and '-'. Note that underscore is not allowed in the name.
The requested VM size xxxxx is not available in the current region: you can select a different region or vm_size.
Automated ML uses the SSH protocol to communicate with remote DSVMs, which defaults to port 22. Possible causes for this error are:
This is often an issue with the get_data method.
Check that the get_data method is valid by running it locally.
Make sure that get_data isn't referring to any local files. get_data is executed on the remote DSVM, so it doesn't have direct access to local data files. Instead, you can store the data files with DataStore. See auto-ml-remote-execution-with-datastore.ipynb.
To view the logs, follow the Click here to see the run in Azure portal link, click Back to Experiment, click on the highest run number and then click on Logs.

Automated ML creates files under /tmp/azureml_runs for each iteration that it runs. It creates a folder named with the iteration id, for example AutoML_9a038a18-77cc-48f1-80fb-65abdbc33abe_93. Under this there is an azureml-logs folder, which contains logs. If you run too many iterations on the same DSVM, these files can fill the disk. You can delete the files under /tmp/azureml_runs or just delete the VM and create a new one. If your get_data downloads files, make sure to delete them, or they can use disk space as well. When using DataStore, it is good to specify an absolute path for the files so that they are downloaded just once. If you specify a relative path, it will download a file for each iteration.
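The cleanup described above can be scripted. This is an illustrative sketch, not part of the SDK; make sure no iterations are still running before deleting their working folders:

```python
import shutil
from pathlib import Path

def clean_automl_runs(root="/tmp/azureml_runs"):
    """Delete per-iteration AutoML_* run folders to free disk space.

    Returns the number of folders removed. Folders that do not follow
    the AutoML_<iteration id> naming convention are left untouched.
    """
    base = Path(root)
    if not base.is_dir():
        return 0
    removed = 0
    for folder in base.iterdir():
        if folder.is_dir() and folder.name.startswith("AutoML_"):
            shutil.rmtree(folder)
            removed += 1
    return removed
```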
This can be caused by insufficient memory on the DSVM. Automated ML loads all training data into memory, so the available memory should be more than the training data size. If you are using a remote DSVM, memory is needed for each concurrent iteration. The max_concurrent_iterations setting specifies the maximum concurrent iterations. For example, if the training data size is 8 GB and max_concurrent_iterations is set to 10, the minimum memory required is at least 80 GB. To resolve this issue, allocate a DSVM with more memory or reduce the value specified for max_concurrent_iterations.
This can be caused by too many concurrent iterations for a remote DSVM. Each concurrent iteration usually takes 100% of a core while it is running, and some iterations can use multiple cores, so the max_concurrent_iterations setting should always be less than the number of cores of the DSVM. To resolve this issue, try reducing the value specified for the max_concurrent_iterations setting.
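The two sizing rules above can be expressed as simple arithmetic. These helpers are illustrative only, not part of the SDK:

```python
def min_memory_gb(training_data_gb, max_concurrent_iterations):
    """Each concurrent iteration loads the full training data into memory,
    so the minimum DSVM memory is the data size times the concurrency."""
    return training_data_gb * max_concurrent_iterations

def concurrency_ok(max_concurrent_iterations, dsvm_cores):
    """max_concurrent_iterations should always be less than the DSVM core
    count, since each iteration usually takes 100% of a core."""
    return max_concurrent_iterations < dsvm_cores

# Example from the text: 8 GB of training data, 10 concurrent iterations.
print(min_memory_gb(8, 10))  # 80 (GB minimum)
```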