# scrala

**Repository Path**: OrganizationStudy/scrala

## Basic Information

- **Project Name**: scrala
- **Description**: A web crawler written in Scala
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-05-01
- **Last Updated**: 2020-12-19

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# scrala

[![Codacy Badge](https://api.codacy.com/project/badge/grade/563bbcd12d874610bca7313abe6e6fdd)](https://www.codacy.com/app/gaocegege/scrala) [![Build Status](https://travis-ci.org/gaocegege/scrala.svg?branch=master)](https://travis-ci.org/gaocegege/scrala) ![License](https://img.shields.io/pypi/l/Django.svg) [![scrala published](https://jitpack.io/v/gaocegege/scrala.svg)](https://jitpack.io/#gaocegege/scrala) [![Docker Pulls](https://img.shields.io/docker/pulls/gaocegege/scrala.svg)](https://hub.docker.com/r/gaocegege/scrala/) [![Join the chat at https://gitter.im/gaocegege/scrala](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/gaocegege/scrala?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)

scrala is a web crawling framework for Scala, inspired by [scrapy](https://github.com/scrapy/scrapy).

## Installation

### From Docker

[![](https://images.microbadger.com/badges/image/gaocegege/scrala.svg)](https://microbadger.com/images/gaocegege/scrala "Get your own image badge on microbadger.com")

[gaocegege/scrala on Docker Hub](https://hub.docker.com/r/gaocegege/scrala/)

#### Create a Dockerfile in your project.
```
FROM gaocegege/scrala:latest

# COPY the build.sbt and the src to the container
```

#### Run a single command in docker

```
docker run -v :/app/src -v :/root/.ivy2 gaocegege/scrala
```

### From SBT

**Step 1.** Add the JitPack repository to the end of the resolvers in your build.sbt:

    resolvers += "jitpack" at "https://jitpack.io"

**Step 2.** Add the dependency:

    libraryDependencies += "com.github.gaocegege" % "scrala" % "0.1.5"

### From Source Code

    git clone https://github.com/gaocegege/scrala.git
    cd ./scrala
    sbt assembly

You will get the jar in `./target/scala-/`.

## Example

    import com.gaocegege.scrala.core.spider.impl.DefaultSpider
    import com.gaocegege.scrala.core.common.response.impl.HttpResponse

    class TestSpider extends DefaultSpider {
      // Where the crawl starts
      def startUrl = List[String]("http://www.gaocegege.com/resume")

      // Called with the response of each start URL: follow every link on the page
      def parse(response: HttpResponse): Unit = {
        val links = response.getContentParser.select("a")
        for (i <- 0 until links.size()) {
          request(links.get(i).attr("href"), printIt)
        }
      }

      // Callback for the followed links: print the page title
      def printIt(response: HttpResponse): Unit = {
        println(response.getContentParser.title)
      }
    }

    object Main {
      def main(args: Array[String]): Unit = {
        val test = new TestSpider
        test.begin
      }
    }

As in scrapy, you define `startUrl` to tell the spider where to start and override `parse(...)` to handle the response from the start URL. The `request(...)` function plays the role of `yield scrapy.Request(...)` in scrapy: it takes a URL and the callback that should handle that URL's response.

The example project is in `./example/`.

## For Developers

scrala is under active development. Feel free to contribute documentation, test cases, pull requests, issues, or anything else you like.

I'm a newcomer to Scala, so the code may be hard to read. I'd be glad if someone familiar with Scala coding standards could review the code in this repo :)
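
For readers who want to see the `request(url, callback)` pattern from the Example section in isolation, here is a minimal, self-contained sketch. It is not scrala's implementation: `Page`, `MiniSpider`, and `MiniSpiderDemo` are hypothetical names, the "site" is an in-memory map instead of real HTTP, and skipping already requested URLs is an assumption of this sketch rather than documented scrala behaviour.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a fetched page; not scrala's HttpResponse.
final case class Page(url: String, title: String, links: List[String])

// Hypothetical mini-spider: request(url, callback) queues work, begin() drains the queue.
abstract class MiniSpider(fetch: String => Option[Page]) {
  private val queue = mutable.Queue.empty[(String, Page => Unit)]
  private val seen = mutable.Set.empty[String]

  def startUrl: List[String]
  def parse(page: Page): Unit

  // Queue a URL with the callback that should handle its response.
  // Assumption of this sketch: URLs that were already requested are skipped.
  def request(url: String, callback: Page => Unit): Unit =
    if (seen.add(url)) queue.enqueue((url, callback))

  // Seed the queue with the start URLs, then process requests until none are left.
  def begin(): Unit = {
    startUrl.foreach(url => request(url, parse))
    while (queue.nonEmpty) {
      val (url, callback) = queue.dequeue()
      fetch(url).foreach(callback) // unknown URLs are silently ignored
    }
  }
}

object MiniSpiderDemo {
  // In-memory "site" so the sketch runs without network access.
  val site: Map[String, Page] = Map(
    "/"  -> Page("/", "Home", List("/a", "/b", "/a")),
    "/a" -> Page("/a", "Page A", Nil),
    "/b" -> Page("/b", "Page B", Nil)
  )

  // Crawl from "/" and collect the titles of the linked pages.
  def crawl(): List[String] = {
    val titles = mutable.ListBuffer.empty[String]
    val spider = new MiniSpider(site.get) {
      def startUrl = List("/")
      def parse(page: Page): Unit =
        page.links.foreach(link => request(link, printIt))
      def printIt(page: Page): Unit = titles += page.title
    }
    spider.begin()
    titles.toList
  }

  def main(args: Array[String]): Unit = crawl().foreach(println)
}
```

The shape mirrors the `TestSpider` example: `parse` handles the start page and hands each link to `request` together with a second callback, which is roughly what `yield scrapy.Request(url, callback=...)` does in scrapy.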