• JavaScript Object Creation and Prototype Chains

    There are 4 ways to create new objects in JavaScript:

    1. Object initializers, also known as literal notation
    2. Object.create
    3. Constructors
    4. ES6 classes

    Depending on which method you choose, the newly created object will have a different prototype chain [1].

    1. Object initializers

    let x = { a: 1 }
    
    Object.prototype.isPrototypeOf(x) // true
    

    Objects created in this manner will have Object.prototype as their top-level prototype:

    x => Object.prototype
    

    Arrays and functions also have their own literal notation:

    let y = [1,2,3]
    
    Array.prototype.isPrototypeOf(y) // true
    Object.prototype.isPrototypeOf(Array.prototype) // true
    
    let z = () => {} // ES6 fat arrow syntax
    
    Function.prototype.isPrototypeOf(z) // true
    Object.prototype.isPrototypeOf(Function.prototype) // true
    

    In these cases, y’s and z’s prototype chains will be

    y => Array.prototype => Object.prototype
    

    and

    z => Function.prototype => Object.prototype
    

    respectively.

    2. Object.create

    Object.create takes in an arbitrary object (or null) as its first argument, which will be the prototype of the new object [2].

    let x = {
      a: 1
    }
    let y = Object.create(x)
    
    y.a === 1 // true
    
    x.isPrototypeOf(y) // true
    Object.prototype.isPrototypeOf(x) // true
    

    Thus, y’s prototype chain is:

    y => x => Object.prototype
    

    Object.create is actually quite special because any arbitrary object can be specified as the prototype, so we can do otherwise nonsensical things such as:

    let x = [1,2,3]
    let y = Object.create(x)
    
    y.forEach // is valid, returns function forEach()
    x.isPrototypeOf(y) // true
    

    In this case, y’s prototype chain will be:

    y => x => Array.prototype => Object.prototype
    

    3. Constructors

    When a function [3] Thing is invoked with the new keyword, as in let x = new Thing(), it behaves as a constructor function, which means the following things will happen:

    1. A new, empty object is created, whose prototype is Thing.prototype (the prototype object of the Thing function object)
    2. The body of the function Thing is executed, with its this set to the new empty object
    3. If Thing returns an object, that object becomes the result of the new Thing() expression; otherwise (no return statement, or a primitive return value), the newly created object is returned (illustrated after the snippet below)

    function Thing() {}
    
    let z = new Thing()
    
    Thing.prototype.isPrototypeOf(z) // true
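
    To illustrate step 3, here is a small sketch (nothing beyond standard JavaScript semantics is assumed): an explicit object return value wins, while primitive return values are ignored and the newly created object is returned instead.

    function ReturnsObject() {
      this.a = 1
      return { b: 2 } // an explicit object return overrides `this`
    }

    function ReturnsPrimitive() {
      this.a = 1
      return 42 // primitive return values are ignored
    }

    new ReturnsObject().b // 2
    new ReturnsObject().a // undefined
    new ReturnsPrimitive().a // 1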
    

    To highlight the fact that the prototype property object is distinct from the object to which it belongs, notice the following:

    Function.prototype.isPrototypeOf(Thing) // true
    Object.prototype.isPrototypeOf(Thing) // true
    
    Function.prototype.isPrototypeOf(Thing.prototype) // false
    Object.prototype.isPrototypeOf(Thing.prototype) // true
    

    If we think of Thing.prototype as simply an object, this shouldn’t come as a surprise. In fact, if we were to do something like this:

    Object.prototype.a = 1
    Function.prototype.b = 2
    z.a // 1
    z.b // undefined
    

    Thus, z’s prototype chain looks like:

    z => Thing.prototype => Object.prototype
    

    and not

    z => Thing.prototype => Function.prototype => Object.prototype
    

    4. ES6 Classes

    Prototype chains in ES6 classes behave almost exactly like those of constructors (that is because classes are syntactic sugar around constructors):

    class Thing {
      a() { return 1 }
      b() { return 2 }
    }
    
    class AnotherThing extends Thing {
      b() { return 3 }
      c() { return 4 }
    }
    
    let x = new AnotherThing()
    x.c = () => { return 5 }
    
    x.a() // 1
    x.b() // 3
    x.c() // 5
    
    AnotherThing.prototype.isPrototypeOf(x) // true
    
    Thing.prototype.isPrototypeOf(AnotherThing.prototype) // true
    

    Thus, x’s prototype chain is:

    x => AnotherThing.prototype => Thing.prototype => Object.prototype
    

    And of course, as mentioned earlier, classes really are just syntactic sugar for constructors:

    Thing.isPrototypeOf(AnotherThing) // true
    
    Function.prototype.isPrototypeOf(Thing) // true
    Function.prototype.isPrototypeOf(AnotherThing) // true
    

    See footnote [4] for a little more detail on how subclassing with extends actually works and how it affects the prototype chain between the subclass and the superclass.

    Footnotes
    1. I’ve used isPrototypeOf here for better readability, but you can also check an object’s direct prototype with __proto__ (or, more portably, Object.getPrototypeOf), like

      x.__proto__ === Object.prototype // true

      Note that __proto__ gives only the direct prototype, whereas isPrototypeOf walks the entire chain.

    2. And another optional object as a second argument that specifies property descriptors.
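
      For instance, a minimal sketch of the two-argument form (the property name here is illustrative):

      let x = { a: 1 }
      let y = Object.create(x, {
        b: { value: 2, enumerable: true, writable: false, configurable: false }
      })

      y.a // 1 (inherited from x)
      y.b // 2 (own property defined via the descriptor)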

    3. By “a” function, I mean any arbitrary function. Of course, functions meant to be used as useful constructors should look a certain way.

    4. Part of Babel’s transpiled output for extends includes an _inherits function, the full body of which is below:

      function _inherits(subClass, superClass) { 
        if (typeof superClass !== "function" && superClass !== null) { 
          throw new TypeError("Super expression must either be null or a function, not " + typeof superClass); 
        } 
      
        subClass.prototype = Object.create(superClass && superClass.prototype, {
          constructor: { 
            value: subClass, 
            enumerable: false, 
            writable: true, 
            configurable: true } 
        }); 
      
        if (superClass) subClass.__proto__ = superClass; 
      }
      

      _inherits explicitly creates the subclass’s prototype object using Object.create, specifying the superclass’s prototype as its prototype. It also sets the subclass’s __proto__ property to the superclass.

  • Setting Up a Second Graylog2 Server Node

    Technical Context: Ubuntu 14.04, first Graylog2 IP: 11.11.11.11, second Graylog2 IP: 22.22.22.22

    1. Install Graylog2

    Instructions here.

    (Note that installing the Graylog web interface, graylog-web, is optional.)

    2. MongoDB

    If your MongoDB instance already runs on a separate machine from any of your Graylog2 nodes, all you have to do is adjust the firewall rules for that machine (if any exist) to allow the IP address of the new Graylog2 server node to connect to port 27017 (or whatever custom port you’ve defined for your MongoDB instance).

    Otherwise

    If your MongoDB instance lives on the same machine as an existing Graylog2 node, that means your current configuration (/etc/mongod.conf) will look something like this (it should, or you’re in big trouble):

    #port = 27017
    
    # Listen to local interface only. Comment out to listen on all interfaces.
    bind_ip = 127.0.0.1
    

    This means that your MongoDB instance is only accessible to other processes running on the same machine. If so, you may or may not have authentication set up on your MongoDB instance - it doesn’t really matter.

    You will need to change your MongoDB configuration to listen on a publicly accessible interface. Change bind_ip by either commenting it out, or changing it to 0.0.0.0.
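
    For example, after the change the relevant part of /etc/mongod.conf might look like this (restart MongoDB afterwards for it to take effect):

    # Listen on all interfaces so that the new Graylog2 node can reach MongoDB
    bind_ip = 0.0.0.0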

    Now that your MongoDB instance is publicly accessible, we’re going to have to take necessary security measures.

    MongoDB authentication

    Here, I’ll cover authentication in MongoDB very quickly. Open a MongoDB shell, make sure that you’re using the correct database, then create a new user with read and write privileges:

    $ mongo
    > use graylog2
    > db.createUser({ user:"graylogusername", pwd:"graylogpassword", roles:[{role: "readWrite", db:"graylog2"}] })
    

    Once that’s done, we can tell Graylog2 to use these credentials when connecting to MongoDB. In recent versions of Graylog2, the recommended way to specify the MongoDB connection is the MongoDB connection string URI format, which may look something like this:

    mongodb_uri = mongodb://graylogusername:graylogpassword@127.0.0.1:27017/graylog2
    

    Firewall

    After setting up authentication, you’ll also want to set up appropriate firewall policies. Specifically, you should allow only the second Graylog2 server node to connect to MongoDB. I wrote a comprehensive guide to using APF and BFD here, which you should read. The APF rule for allowing 22.22.22.22 to connect to port 27017 looks like this:

    # from the other graylog node to access MongoDB
    tcp:in:d=27017:s=22.22.22.22
    

    3. Graylog2

    Most of these instructions come straight from the official docs:

    Change is_master to false:

    is_master = false
    

    Copy the password_secret from the existing Graylog2 server node:

    password_secret = KlU1JJYpKeJq9oy5JsWKSA8sf8aJ8anNnisNs1fWEWjAAq7bI246K42idz79r10E5Z1klrGAhtl1Af2fUp4NxNRAAk31lvVX
    

    Change the MongoDB connection credentials (see above).

    Change the Elasticsearch settings to match those of your first Graylog2 server node (most importantly, the elasticsearch_discovery_zen_ping_unicast_hosts setting, which tells Graylog2 which Elasticsearch nodes to connect to).
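
    For example (the addresses below are illustrative - use your own Elasticsearch nodes, whose transport port is 9300 by default):

    elasticsearch_discovery_zen_ping_unicast_hosts = 11.11.11.11:9300,22.22.22.22:9300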

    4. Graylog2 Web Interface

    The web interface runs independently of any Graylog2 server nodes, so all we have to do now is inform it about the additional node that we’re adding [1]:

    $ vim graylog-web-interface.conf
    

    If you were previously running the web interface on the same machine as an existing Graylog server node, then you’d see

    graylog2-server.uris="http://127.0.0.1:12900/"
    

    which you can append to, like so:

    graylog2-server.uris="http://127.0.0.1:12900/,http://22.22.22.22:12900/"
    

    (In case you were wondering, yes, you can run multiple web interfaces for failover purposes, but I’m guessing the web interface is for internal consumption only so this may be overkill.)

    Footnotes
    1. More specifically, we’re pointing the web interface to the Graylog2 server nodes’ REST API, which is open on port 12900 by default.

  • Setting Up Advanced Policy Firewall (APF) and Brute Force Detection (BFD)

    This post is a fairly comprehensive reference to Advanced Policy Firewall (apf-firewall), a user-friendly interface to iptables. We will also cover BFD (bfd), a script that automates IP blocking using APF.

    Technical Context: Ubuntu 14.04, APF v9.7, BFD v1.5-2

    Installation

    $ apt-get install apf-firewall
    
    $ wget http://rfxnetworks.com/downloads/bfd-current.tar.gz
    $ tar xfz bfd-current.tar.gz
    $ cd bfd-1.5-2
    $ ./install.sh
    

    Basic Usage

    apf -s - Start
    apf -f - Stop
    apf -r - Restart
    apf -e - Refresh APF rules
    apf -a <IP> - manually allow IP
    apf -d <IP> - manually block IP
    apf -u <IP> - manually unblock IP (works for BFD too)
    

    What -a actually does is add the IP entry to the allow_hosts.rules file. -d does the same thing for deny_hosts.rules. -u removes the IP entry from either allow_hosts.rules or deny_hosts.rules, if it exists. All three commands will call apf -e as well.

    APF supports CIDR notation for specifying rules for IP blocks, as well as fully qualified domain names (FQDNs) [1].

    There are basically three ways to use APF:

    1. Restrict on a per-IP basis
    2. Restrict on a per-port basis
    3. Restrict on an IP-port combination basis

    Restrict on a per-IP basis

    The most straightforward way to do this is, as mentioned earlier, to use -a, -d and -u. Of course, you can edit allow_hosts.rules or deny_hosts.rules directly as well (specify each IP address on a new line).
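
    For example, a deny_hosts.rules blocking two (hypothetical) addresses would simply contain:

    203.0.113.15
    198.51.100.20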

    Restrict on a per-port basis

    By default, APF blocks a number of known malicious ports (see the main config file for an exhaustive list). To allow all incoming or outgoing connections on a per-port basis, we can edit the IG_TCP_CPORTS or EG_TCP_CPORTS setting respectively in APF’s main config file /etc/apf-firewall/conf.apf:

    # incoming connections
    IG_TCP_CPORTS="22,80,443"
    IG_UDP_CPORTS=""
    
    # outgoing connections
    EG_TCP_CPORTS="21,25,80,443,43"
    EG_UDP_CPORTS="20,21,53"
    

    Notably, these settings are overridden by rules in allow_hosts.rules and deny_hosts.rules.

    Restrict on an IP-port combination basis

    The allow_hosts.rules and deny_hosts.rules files are very well commented regarding the syntax for specifying granular restrictions, so I’ll cover it only briefly here:

    # Syntax:
    # proto:flow:[s/d]=port:[s/d]=ip(/mask)
    # s – source , d – destination , flow – packet flow in/out
    

    For example:

    tcp:in:d=22:s=192.168.2.1
    

    in allow_hosts.rules will allow incoming connections from 192.168.2.1 to port 22.

    Multiple IPs to the same port need to be specified on separate lines:

    tcp:in:d=22:s=192.168.2.1
    tcp:in:d=22:s=192.168.31.4
    ...
    

    APF Configuration

    Some other noteworthy APF configuration settings in /etc/apf-firewall/conf.apf that you should change:

    Development Mode

    DEVEL_MODE="1"
    

    When set to "1", APF will deactivate itself every 5 minutes. This prevents you from setting a bad rule and locking yourself out of a remote machine.

    Remember to set this to "0" once APF is determined to be functioning as desired.

    Monokernel

    SET_MONOKERN="0"
    

    This setting matters in situations where iptables is compiled directly into the kernel rather than available as loadable modules. In those cases, you’ll see something like:

    Unable to load iptables module (ip_tables), aborting.
    

    or

    $ apf -s
    apf(17079): {glob} activating firewall
    apf(17120): {glob} kernel version not equal to 2.4.x or 2.6.x, aborting.
    

    Setting SET_MONOKERN="1" will fix the problem.

    Ban Duration

    RAB_TIMER="300"
    

    I recommend setting this a lot higher than the default of 300 seconds. 21600 (6 hours), maybe?

    Reactive Address Blocking

    RAB="0"
    

    Set this to "1" to activate APF’s reactive address blocking.

    Subscriptions

    APF can subscribe to known lists of bad IP addresses. The below is an abridged portion of the config file that deals with this:

    ##
    # [Remote Rule Imports]
    ##
    # Project Honey Pot is the first and only distributed system for identifying
    # spammers and the spambots they use to scrape addresses from your website.
    # This aggregate list combines Harvesters, Spammers and SMTP Dictionary attacks
    # from the PHP IP Data at:  http://www.projecthoneypot.org/list_of_ips.php
    DLIST_PHP="0"
    
    DLIST_PHP_URL="rfxn.com/downloads/php_list"
    DLIST_PHP_URL_PROT="http"
    
    # The Spamhaus Don't Route Or Peer List (DROP) is an advisory "drop all
    # traffic" list, consisting of stolen 'zombie' netblocks and netblocks
    # controlled entirely by professional spammers. For more information please
    # see http://www.spamhaus.org/drop/.
    DLIST_SPAMHAUS="0"
    
    DLIST_SPAMHAUS_URL="www.spamhaus.org/drop/drop.lasso"
    DLIST_SPAMHAUS_URL_PROT="http"
    
    # DShield collects data about malicious activity from across the Internet.
    # This data is cataloged, summarized and can be used to discover trends in
    # activity, confirm widespread attacks, or assist in preparing better firewall
    # rules. This is a list of top networks that have exhibited suspicious activity.
    DLIST_DSHIELD="0"
    
    DLIST_DSHIELD_URL="feeds.dshield.org/top10-2.txt"
    DLIST_DSHIELD_URL_PROT="http"
    

    BFD Configuration

    BFD barely has any configuration (which is A Good Thing™). The below is pretty much it:

    $ vim /usr/local/bfd/conf.bfd
    

    You can set the threshold for the number of attempts before an IP address is blocked:

    TRIG="15"
    

    The default number of 15 is quite generous - I’d lower it to at most 5 or 6.

    BFD also has email alerts:

    EMAIL_ALERTS="1"
    EMAIL_ADDRESS="wow@example.com"
    

    We can add whitelisted IP addresses in:

    $ vim /usr/local/bfd/ignore.hosts
    

    IP addresses whitelisted by BFD are still subject to APF’s rules - the two lists do not have any influence on each other.

    Finally, and most importantly, BFD is started with:

    $ bfd -s
    

    which will also start a cron job [2] that goes through your access log files every 3 minutes and tells APF to ban any IP addresses that go beyond the threshold specified in TRIG.

    BFD Logs

    BFD logs to /var/log/bfd_log.

    Footnotes
    1. I won’t be demonstrating this here, but this should apply to virtually any setting where an IP address is otherwise expected.

    2. You can verify this by checking /etc/cron.d/bfd.

  • Load Balancing Graylog2 with HAProxy

    This post covers quick and dirty TCP load balancing with HAProxy, and some specific instructions for Graylog2.

    (As an aside, if you’re looking for a gem that can log Rails applications to Graylog2, the current official gelf-rb gem only supports UDP. I’ve forked the repo and merged @zsprackett’s pull request in, which adds TCP support by adding protocol: GELF::Protocol::TCP as an option. I’ll remove this message when the official maintainer for gelf-rb merges @zsprackett’s pull request in.)

    Technical context: Ubuntu 14.04, CentOS 7

    1. Install HAProxy

    On Ubuntu 14.04:

    $ apt-add-repository ppa:vbernat/haproxy-1.5
    $ apt-get update
    $ apt-get install haproxy
    

    On CentOS 7:

    # HAProxy has been included as part of CentOS since 6.4, so you can simply do
    $ yum install haproxy
    

    2. Configure HAProxy

    You’ll probably need root privileges to configure HAProxy:

    $ vim /etc/haproxy/haproxy.cfg
    

    There will be a whole bunch of default configuration settings. You can delete those that are not relevant to you, but there’s no need to do so right now if you just want to get started.

    Simply append to the file the settings that we need:

    listen graylog :12203
        mode tcp
        option tcplog
        balance roundrobin
        server graylog1 123.12.32.127:12202 check
        server graylog2 121.151.12.67:12202 check
        server graylog3 183.222.32.27:12202 check
    

    This directive block named graylog tells HAProxy to:

    1. Listen on port 12203 - you can change this if you want
    2. Operate in TCP (layer 4) mode
    3. Enable TCP logging (more info here)
    4. Use round robin load balancing, in which connections are distributed to the servers in turn. You can even specify weights for servers with different hardware configurations (see the sketch after this list). More on the different load balancing algorithms that HAProxy supports here
    5. Proxy requests to these three backend Graylog2 servers through port 12202, and check their health periodically
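
    For instance, a sketch of what weighted round robin might look like (the weights here are illustrative):

        balance roundrobin
        # the beefier box receives three connections for every one sent to the others
        server graylog1 123.12.32.127:12202 check weight 3
        server graylog2 121.151.12.67:12202 check weight 1
        server graylog3 183.222.32.27:12202 check weight 1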

    3. Create a TCP input on Graylog2

    Creating a TCP input on Graylog2 through the web interface is trivial. We’ll use port 12202 here as an example:

    1. Go to System / Inputs > Inputs
    2. Create a new GELF TCP input
    3. Input (or ignore) your desired settings
    4. Ta-da!

    4. Start HAProxy

    $ service haproxy start
    

    You can test if HAProxy is proxying the requests successfully by sending TCP packets through to HAProxy and checking the number of active connections on Graylog2’s input page.

    # assuming 123.41.61.87 is the IP of the machine running HAProxy
    # run this on your dev machine
    $ nc 123.41.61.87 12203
    

    If it’s working, the number of active connections on the Graylog2 input page will go up. Great success.

    5. Change HAProxy’s health check to Graylog2’s REST API

    The last thing to do, and really, the only part of HAProxy that’s specific to Graylog2, is to change the way HAProxy checks the health of its backend Graylog2 servers.

    Normally, HAProxy defaults to simply establishing a TCP connection.

    However, HAProxy accepts a directive called option httpchk, with which HAProxy will send an HTTP request to a specified URL and check the status of the response: 2xx and 3xx responses are good, anything else is bad.

    Graylog2 exposes a REST API endpoint for the express purpose of allowing load balancers like HAProxy to check its health:

    The status knows two different states, ALIVE and DEAD, which is also the text/plain response of the resource. Additionally, the same information is reflected in the HTTP status codes: If the state is ALIVE the return code will be 200 OK, for DEAD it will be 503 Service unavailable. This is done to make it easier to configure a wide range of load balancer types and vendors to be able to react to the status.

    The REST API is open on port 12900 by default, so you can try the endpoint out:

    # the IP address of one of our Graylog2 servers
    $ curl http://123.12.32.127:12900/system/lbstatus
    ALIVE
    

    (The web interface also exposes the full suite of endpoints that the REST API provides, which you can access via System > Nodes > API Browser.)

    With that, we can indicate in the HAProxy configuration that we want to use Graylog2’s health endpoint:

    listen graylog :12203
        mode tcp
        option tcplog
        balance roundrobin
        option httpchk GET /system/lbstatus
        server graylog1 123.12.32.127:12202 check port 12900
        server graylog2 121.151.12.67:12202 check port 12900
        server graylog3 183.222.32.27:12202 check port 12900
    

    Parting Notes

    Right now, we have HAProxy installed on one instance that load balances requests between multiple instances running Graylog2. However, there’s still a single point of failure (if HAProxy goes down).

    Ideally, the best way to set up what is commonly called a high availability cluster would be to set up several HAProxy nodes, then employ Virtual Router Redundancy Protocol (VRRP). Under VRRP, there is an active HAProxy node and one or more passive HAProxy nodes. All of the HAProxy nodes share a single floating IP. The passive HAProxy nodes will ping the active HAProxy node periodically. If the active HAProxy goes down, the passive HAProxy nodes will elect the next active HAProxy node amongst themselves to take over the floating IP. Keepalived is a popular solution for implementing VRRP.
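
    As a rough sketch (the interface name, router ID, and floating IP below are hypothetical), a Keepalived VRRP instance on the active node might look something like:

    # /etc/keepalived/keepalived.conf
    vrrp_instance haproxy_vip {
        state MASTER              # use BACKUP on the passive nodes
        interface eth0
        virtual_router_id 51
        priority 101              # passive nodes get a lower priority
        virtual_ipaddress {
            10.0.0.100            # the floating IP shared by all HAProxy nodes
        }
    }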

    Sadly, VPS providers such as DigitalOcean do not support multiple IPs per instance, making Keepalived and VRRP impossible to implement (there’s an open suggestion on DO where many users are asking for this feature). To mitigate this issue somewhat, we’ve used Monit to monitor and automatically restart HAProxy if it goes down. It’s not foolproof, and we’ll be on the lookout to improve this setup.

  • Topick - JavaScript NLP library to extract keywords from HTML documents

    I recently wrote Topick, a library for extracting keywords from HTML documents.

    Check it out here!

    The initial use case for it was to be used as part of a Telegram bot which would archive shared links by allowing the user to tag the link with keywords and phrases.

    This blog post details how it works.

    HTML parsing

    Topick uses htmlparser2 for HTML parsing. By default, Topick will pick out content from p, b, em, and title tags, and concatenate them into a single document.

    Cleaning

    That document is then sent for cleaning, using a few utility functions from the textminer library to:

    • Expand contractions (e.g. from I’ll to I will)
    • Remove interpunctuation (e.g. ? and !)
    • Remove excess whitespace between words
    • Remove stop words using the default stop word dictionary
    • Remove stop words specified by the user

    Stop words are common words that are unlikely to be classified as keywords. The stop word dictionary used by Topick is a set union of all six English collections found here.
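
    As a rough illustration (this is not Topick’s actual code, and the word lists are tiny samples), building the stop word set and filtering with it might look like:

    // Union of several stop word collections into a single Set
    const collectionA = ['the', 'a', 'an', 'is']
    const collectionB = ['is', 'are', 'of', 'the']
    const stopWords = new Set([...collectionA, ...collectionB])

    // Drop stop words from a tokenized document
    const tokens = ['the', 'cat', 'is', 'hungry']
    const cleaned = tokens.filter(word => !stopWords.has(word.toLowerCase()))

    cleaned // ['cat', 'hungry']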

    Generating keywords

    Finally, the cleaned document can be used as input for generating keywords. Topick includes three methods of doing so, all of which rely on different combinations of nlp-compromise library functions to generate the final output:

    • n-grams
    • namedentities
    • combined

    The n-grams method relies solely on the generateNGrams method to generate keywords/phrases based on frequency. The generated words or phrases are then sorted by frequency and filtered (those with frequency 1 are discarded).
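
    As a sketch of that last step (not Topick’s actual implementation), sorting candidates by frequency and dropping those that occur only once might look like:

    const counts = { javascript: 4, 'prototype chain': 2, banana: 1 }

    const keywords = Object.keys(counts)
      .filter(k => counts[k] > 1)             // discard candidates with frequency 1
      .sort((a, b) => counts[b] - counts[a])  // most frequent first

    keywords // ['javascript', 'prototype chain']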

    The namedentities method relies on the generateNamedEntitiesString method to guess keywords or phrases that are capitalized/don’t belong in the English language/are unique phrases. There’s also a frequency-based criterion here.

    The combined method runs both n-grams and namedentities and merges their output before sorting and filtering it. This method is the slowest but generally produces the best and most consistent output.

    Custom options

    Topick includes a few options for the user to customize.

    ngram

    { min_count: 3, max_size: 1 }
    

    The ngram option defines settings for n-gram generation.

    min_count is the minimum number of times a particular n-gram should appear in the document before being considered. There should be no need to change this number.

    max_size is the maximum size of n-grams that should be generated (defaults to generating unigrams).

    progressiveGeneration

    This option defaults to true.

    If set to true, progressiveGeneration will progressively generate n-grams with weaker settings until the specified number of keywords set in maxNumberOfKeywords is hit.

    For example: if, with a min_count of 3 and a maxNumberOfKeywords of 10, Topick initially generates only 5 keywords, progressiveGeneration will decrease min_count to 2, and then to 1, until 10 keywords can be generated.

    progressiveGeneration does not guarantee that maxNumberOfKeywords keywords will be generated (for instance, if even at a min_count of 1 the specified maxNumberOfKeywords still cannot be reached).
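
    As a rough sketch (not Topick’s actual code), the progressive generation loop behaves something like this, where generate stands in for a function that produces keywords for a given min_count:

    function progressivelyGenerate(generate, maxNumberOfKeywords, minCount) {
      let keywords = generate(minCount)
      while (keywords.length < maxNumberOfKeywords && minCount > 1) {
        minCount -= 1
        keywords = generate(minCount)
      }
      // may still return fewer than maxNumberOfKeywords if even a min_count of 1 isn't enough
      return keywords
    }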