GOPA, A Spider Written in Go.
First of all, get it. There are two options: download the pre-built package or compile it yourself.
To download, go to the Release or Snapshot page and pick the right package for your platform.
Note: the Darwin package is for Mac.
To compile it yourself, run
make build
to build Gopa. So far, we have:
gopa, the main program, a single binary.
config/, Elasticsearch-related scripts etc.
gopa.yml, the main configuration for Gopa.
By default, Gopa works out of the box except for indexing. If you want to use Elasticsearch for indexing, follow these steps:
Create the index in Elasticsearch with the script config/elasticsearch/gopa-index-mapping.sh (important settings!). Example:
curl -XPUT "http://localhost:9200/gopa-index" -H 'Content-Type: application/json' -d'
{
"mappings": {
"doc": {
"properties": {
"host": {
"type": "keyword",
"ignore_above": 256
},
"snapshot": {
"properties": {
"bold": {
"type": "text"
},
"url": {
"type": "keyword",
"ignore_above": 256
},
"content_type": {
"type": "keyword",
"ignore_above": 256
},
"file": {
"type": "keyword",
"ignore_above": 256
},
"ext": {
"type": "keyword",
"ignore_above": 256
},
"h1": {
"type": "text"
},
"h2": {
"type": "text"
},
"h3": {
"type": "text"
},
"h4": {
"type": "text"
},
"hash": {
"type": "keyword",
"ignore_above": 256
},
"id": {
"type": "keyword",
"ignore_above": 256
},
"images": {
"properties": {
"external": {
"properties": {
"label": {
"type": "text"
},
"url": {
"type": "keyword",
"ignore_above": 256
}
}
},
"internal": {
"properties": {
"label": {
"type": "text"
},
"url": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"italic": {
"type": "text"
},
"links": {
"properties": {
"external": {
"properties": {
"label": {
"type": "text"
},
"url": {
"type": "keyword",
"ignore_above": 256
}
}
},
"internal": {
"properties": {
"label": {
"type": "text"
},
"url": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"path": {
"type": "keyword",
"ignore_above": 256
},
"sim_hash": {
"type": "keyword",
"ignore_above": 256
},
"lang": {
"type": "keyword",
"ignore_above": 256
},
"screenshot_id": {
"type": "keyword",
"ignore_above": 256
},
"size": {
"type": "long"
},
"text": {
"type": "text"
},
"title": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"version": {
"type": "long"
}
}
},
"task": {
"properties": {
"breadth": {
"type": "long"
},
"created": {
"type": "date"
},
"depth": {
"type": "long"
},
"id": {
"type": "keyword",
"ignore_above": 256
},
"original_url": {
"type": "keyword",
"ignore_above": 256
},
"reference_url": {
"type": "keyword",
"ignore_above": 256
},
"schema": {
"type": "keyword",
"ignore_above": 256
},
"status": {
"type": "integer"
},
"updated": {
"type": "date"
},
"url": {
"type": "keyword",
"ignore_above": 256
},
"last_screenshot_id": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}'
Note: the Elasticsearch version should be >= v5.3.
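The version requirement above can be checked before creating the index. The following is a minimal sketch, not part of Gopa: it assumes you have fetched the JSON body that Elasticsearch returns from its root endpoint (GET http://localhost:9200/), which carries the version under version.number. The names rootInfo, parseVersion and atLeast are hypothetical helpers introduced for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// rootInfo mirrors the part of Elasticsearch's root ("GET /") response
// that carries the version string, e.g. {"version":{"number":"5.3.0"}}.
type rootInfo struct {
	Version struct {
		Number string `json:"number"`
	} `json:"version"`
}

// parseVersion extracts the major and minor numbers from a root response body.
func parseVersion(body []byte) (major, minor int, err error) {
	var info rootInfo
	if err = json.Unmarshal(body, &info); err != nil {
		return 0, 0, err
	}
	parts := strings.SplitN(info.Version.Number, ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("unexpected version %q", info.Version.Number)
	}
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, err
	}
	if minor, err = strconv.Atoi(parts[1]); err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

// atLeast reports whether major.minor is >= wantMajor.wantMinor.
func atLeast(major, minor, wantMajor, wantMinor int) bool {
	return major > wantMajor || (major == wantMajor && minor >= wantMinor)
}

func main() {
	// Sample body; in practice it would come from GET http://localhost:9200/.
	body := []byte(`{"version":{"number":"5.3.0"}}`)
	major, minor, err := parseVersion(body)
	if err != nil {
		fmt.Println("cannot parse version:", err)
		return
	}
	fmt.Println("version ok:", atLeast(major, minor, 5, 3))
}
```

Keeping the parsing separate from the HTTP call makes the check easy to run against a saved response.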
In gopa.yml, update the Elasticsearch settings:
- module: index
  enabled: true
  ui:
    enabled: true
  elasticsearch:
    endpoint: http://localhost:9200
    index_prefix: gopa-
    username: elastic
    password: changeme
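To confirm that these settings point at the index created earlier, you could probe Elasticsearch with the same endpoint and credentials. The sketch below is an illustration only and not Gopa's own code. It assumes that the full index name is index_prefix followed by a short name ("gopa-" + "index", matching the gopa-index used in the curl example); esSettings, indexURL and newIndexRequest are hypothetical names. The request is built but not sent.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// esSettings mirrors the elasticsearch keys shown in gopa.yml.
type esSettings struct {
	Endpoint    string
	IndexPrefix string
	Username    string
	Password    string
}

// indexURL joins endpoint, prefix and a short index name.
// Assumption: the full index name is prefix + name, e.g. "gopa-" + "index".
func indexURL(s esSettings, name string) string {
	return strings.TrimRight(s.Endpoint, "/") + "/" + s.IndexPrefix + name
}

// newIndexRequest builds (but does not send) an authenticated HEAD request.
// Elasticsearch answers HEAD on an index with 200 if it exists, 404 otherwise.
func newIndexRequest(s esSettings, name string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodHead, indexURL(s, name), nil)
	if err != nil {
		return nil, err
	}
	req.SetBasicAuth(s.Username, s.Password)
	return req, nil
}

func main() {
	s := esSettings{"http://localhost:9200", "gopa-", "elastic", "changeme"}
	req, err := newIndexRequest(s, "index")
	if err != nil {
		fmt.Println("bad settings:", err)
		return
	}
	fmt.Println(req.Method, req.URL)
}
```

Passing the request to http.DefaultClient.Do would perform the actual check against a running Elasticsearch.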
Gopa has no external dependencies. Simply run ./gopa
to start the program.
Gopa can be run as a daemon (note: only available on Linux and Mac). Example:
➜ gopa git:(master) ✗ ./bin/gopa --daemon
  ________ ________ __________  _____
 /  _____/ \_____  \\______   \/  _  \
/   \  ___  /   |   \|     ___/  /_\  \
\    \_\  \/    |    \    |  /    |    \
 \______  /\_______  /____|  \____|__  /
        \/         \/                \/
[gopa] 0.10.0_SNAPSHOT
///last commit: 99616a2, Fri Oct 20 14:04:54 2017 +0200, medcl, update version to 0.10.0 ///
[10-21 16:01:09] [INF] [instance.go:23] workspace: data/gopa/nodes/0
[gopa] started.
Also run ./gopa -h to get the full list of command line options. Example:
➜ gopa git:(master) ✗ ./bin/gopa -h
  ________ ________ __________  _____
 /  _____/ \_____  \\______   \/  _  \
/   \  ___  /   |   \|     ___/  /_\  \
\    \_\  \/    |    \    |  /    |    \
 \______  /\_______  /____|  \____|__  /
        \/         \/                \/
[gopa] 0.10.0_SNAPSHOT
///last commit: 99616a2, Fri Oct 20 14:04:54 2017 +0200, medcl, update version to 0.10.0 ///
Usage of ./bin/gopa:
  -config string
        the location of config file (default "gopa.yml")
  -cpuprofile string
        write cpu profile to this file
  -daemon
        run in background as daemon
  -debug
        run in debug mode, gopa will quit with panic error
  -log string
        the log level,options:trace,debug,info,warn,error (default "info")
  -log_path string
        the log path (default "log")
  -memprofile string
        write memory profile to this file
  -pidfile string
        pidfile path (only for daemon)
  -pprof string
        enable and setup pprof/expvar service, eg: localhost:6060 , the endpoint will be: http://localhost:6060/debug/pprof/ and http://localhost:6060/debug/vars
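The usage text has the shape produced by Go's standard flag package. As a rough sketch of how a subset of these options could be declared (this is not Gopa's actual source; the options struct and parseOptions are hypothetical names), the flag names, defaults and descriptions below are copied from the listing:

```go
package main

import (
	"flag"
	"fmt"
)

// options holds a subset of the command line options listed in the usage text.
type options struct {
	Config  string
	Daemon  bool
	Debug   bool
	Log     string
	LogPath string
	PidFile string
}

// parseOptions parses args (without the program name) into options.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("gopa", flag.ContinueOnError)
	fs.StringVar(&o.Config, "config", "gopa.yml", "the location of config file")
	fs.BoolVar(&o.Daemon, "daemon", false, "run in background as daemon")
	fs.BoolVar(&o.Debug, "debug", false, "run in debug mode")
	fs.StringVar(&o.Log, "log", "info", "the log level")
	fs.StringVar(&o.LogPath, "log_path", "log", "the log path")
	fs.StringVar(&o.PidFile, "pidfile", "", "pidfile path (only for daemon)")
	err := fs.Parse(args)
	return o, err
}

func main() {
	o, err := parseOptions([]string{"-daemon", "-log", "debug"})
	if err != nil {
		fmt.Println("bad flags:", err)
		return
	}
	fmt.Printf("%+v\n", o)
}
```

The flag package accepts one or two leading dashes, so -daemon and --daemon are equivalent.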
It is safe to press Ctrl+C to stop a running Gopa.
Gopa will handle the rest and save a checkpoint,
so you may restore the job later; the world is still in your hands.
If you are running Gopa as a daemon, you may stop it like this:
kill -QUIT `pgrep gopa`
Once Gopa is running, these addresses can be opened in a browser:
http://127.0.0.1:9001/
http://127.0.0.1:9001/admin/
You are sincerely and warmly welcome to play with this project. Contributions of any kind are welcome, from UI style to core features, or just a piece of documentation. Let's make it better.
Released under the Apache License, Version 2.0.