Creating Your Own Istio (Part 2.5)
In the last post we create our first container decorator, a container that when included into an arbitrary pod enhance the main container. In our particular case we created a container that override the HTTP 404 responses as an introduction, in this post we are going to build upon and develop some functionalities to monitor the performance of a running service.
Revisiting The Code
This is the code from the last post:
let { Ambassador } = require('../node-ambassador/')
const TARGET = process.env['target_port'] || 8087
const PORT = process.env['port'] || 8080
const HTTP404 = `...`
function override_404({service, server}) {
service.on('http:404', () => server.respond(HTTP404))
}
new Ambassador({port: PORT, target: TARGET})
.tunnel({override_404})
console.log(`listening for request in ${PORT} and targeting ${TARGET}`)
This code just connects to any server running in TARGET_PORT
and override their HTTP 404 responses with the content from the HTTP404
string. We are going to use this as our starting point.
Network Profiler
Configuration
We can start by writing a new function to subscribe to the Ambassador::tunnel
method.
function telemetry({service, server}) {}
new Ambassador({port: PORT, target: TARGET})
.tunnel({override_404, telemetry})
The function telemetry
will get called each time a new HTTP request is made by a HTTP client.
Request
The first functionality we want to write is the ability to register the HTTP request details, this will tell us what resources people or other services are looking in our web service.
function telemetry({service, server}) {
server.on('http:data', (header) => {} )
}
We setup a listener for the event http:data
in the server object and we receive a header
object with two fields:
method The HTTP Method
GET, POST, DELETE, PUT,...
.endpoint The resource URL
/Resource/1
.
Now we save the state into a class.
class Stats {
readRequest(header) {
this.method = header.HTTPMethod
this.endpoint = header.HTTPResource
return this
}
}
We create a new class Stats and create the readRequest
method taking saving the fields and returning this
object. By returning this
just make it easy for us to chain calls in the form of stats.a().b()
.
We instantiate the Stats class and bind the readRequest
method to the event http:data
:
class Stats {
readRequest(header) { /*...*/ }
}
let stats = new Stats()
function telemetry({service, server}) {
server.on('http:data', (header) => stats.readRequest(header) )
}
Tracking Responses
To capture responses, we need to listen for the http:data
event but this time from the service object.
service.on('http:data', (header) => {})
We listen the service
object for responses which generates a HTTP response object with the following shape:
{"status":"404","message":"File not found"}
What we do now is save this data:
class Stats {
readResponse(response) {
this.response = response
return this
}
}
We just need to again plug this method:
class Stats {
readRequest(header) { /*...*/ }
readResponse(response) { /*...*/ }
}
let stats = new Stats()
function telemetry({service, server}) {
server.on('http:data', (header) => stats.readRequest(header) )
service.on('http:data', (header) => stats.readResponse(header) )
}
We have information about the request and responses. Next step is to calculate the time it takes for the target container to resolve a request.
Latency
We are going to write two methods to calculate how much it takes for our a service to respond, one method will time the beginning of the service request (startProfile
) and a second method will time the response (endProfile
).
Then we are going to calculate difference and we got our total time:
latency = end_time - start_time
Let's implement this idea.
class Stats {
//...
startProfile(){
this.start = new Date().getTime()
return this
}
endProfile() {
this.end = new Date().getTime() - this.start
return this
}
// ...
}
We plug this two methods one at the start of the request server->startProfile
the other when the response is being delivered service->endProfile
.
class Stats {
readRequest(header) { /*...*/ }
readResponse(response) { /*...*/ }
startProfile(){ /*...*/ }
endProfile() { /*...*/ }
}
let stats = new Stats()
function telemetry({service, server}) {
server.on('http:data', (header) => stats.readRequest(header)
.startProfile())
service.on('http:data', (header) => stats.readResponse(header)
.endProfile())
}
We used the method chaining discussed before, this way we just subscribe once.
Saving State
To make our Stat class useful we are going to persist its state by creating a nice in-memory database.
class Stats {
constructor(){
this.db = {}
}
}
To save the object state in memory we are going to write the method save
and to retrieve the data the method all
.
class Stats {
constructor(){
this.db = {}
}
new(){
let URL = this.endpoint
this.db[URL] = this.db[URL] || {}
this.db[URL] = { /* state */ }
}
get all(){
return this.db
}
/*..*/
}
This would be enough for now for the db, let's focus now on gathering more information.
Resource Type
As you may notice our network profiler doesn't make distinction between a file or a URL. We can solve that by writing a function to detect file extensions.
class Stats {
/*...*/
isFile(endpoint) {
const file_regexp = /\.[0-9a-z]+$/i
return endpoint.search(file_regexp) !== -1
}
/*...*/
}
This is good enough for our purposes, let's persist this information.
class Stats {
/*...*/
save(){
let URL = this.endpoint
this.db[URL] = this.db[URL] || {}
this.db[URL] = { file: this.isFile(URL) }
}
/*...*/
}
Pod Name
In case of problems we would like to know where is happening, so it can be interesting to save the pod name.
If you remember in first post we said that the pod simulates a machine, knowing this we can know the pod name by just looking at the
hostname
which is simulated by the Linux UTS Namespace.
class Stats {
constructor(){
this.os = require('os')
}
/*...*/
host() {
return this.os.hostname()
}
/*...*/
}
We use Node.js os::hostname API to get the hostname.
class Stats {
/*...*/
save(){
let URL = this.endpoint
this.db[URL] = this.db[URL] || {}
this.db[URL] = {
file: this.isFile(URL),
pod: this.host()
}
}
/*...*/
}
Registry
To simplify the diagnose of problems is smart to keep a track record, so we can correlate information and research for obscure runtime errors.
Let's start by writing a new method called history:
history(obj) {
let history = obj.history || []
return history
}
This will read an arbitrary object and will check for a field called history
if its not there, it create a new field with an array.
Timing
We save here the response latency, request and response. This will give us a picture of the transaction.
history(obj) {
let history = obj.history || []
history.push({
request: {endpoint: this.endpoint, method: this.method},
response: this.response,
time: this.end + 'ms',
started: this.start
})
return history
}
This generates the following data structure:
{
"request": {
"endpoint": "/",
"method": "GET"
},
"response": {
"status": "200",
"state": "OK"
},
"time": "9ms",
"started": 1544042305989
}
Container Resource
Another useful information we can extract from the pod is the memory and CPU usage. You know, Linux can kill our container if we exceed the memory constraints, this feature we can keep track of resources in the container.
We are going to create a new method called resources
:
class Stats {
constructor(){
this.os = require('os')
/*...*/
}
/*...*/
resources(){
return {
free_memory: this.os.freemem(),
total_memory: this.os.totalmem(),
cpus: this.os.cpus()
}
}
}
Again here we just use some os functions to get the job done, I think this can be improve by studying the /proc
directory.
{
/*...*/
"resource": {
"free_memory": 31891456,
"total_memory": 8589934592,
"cpu": "cpus": [{
"model": "Intel Xeon...",
"speed": 5300,
"times": {
"user": 153938390,
"nice": 0,
"sys": 73413290,
"idle": 808839530
}
},
/* More Cores... */
}
}
CPU usage time and memory is local to the pod and shared between containers.
Let add our graph to our main report object.
class Stats {
/*...*/
resources(){ /*...*/}
history(obj) {
let history = obj.history || []
history.push({
request: {endpoint: this.endpoint, method: this.method},
response: this.response,
time: this.end + 'ms',
started: this.start,
resource: this.resources()
})
return history
}
new(){
/*..*/
this.db[URL] = {
history: this.history(this.db[URL]),
file: this.isFile(URL),
pod: this.host()
}
}
/*..*/
}
Reporting
Now it's time to plug our new feature into the tunnel
event bus.
class Stats {
readRequest(header) { /*...*/ }
readResponse(response) { /*...*/ }
startProfile(){ /*...*/ }
endProfile() { /*...*/ }
save() { /*...*/ }
isFile(endpoint) { /*...*/}
resources() { /*...*/ }
history(obj) { /*...*/ }
}
let stats = new Stats()
function telemetry({service, server}) {
server.on('http:data', (header) => stats.readRequest(header)
.startProfile())
service.on('http:data', (header) => stats.readResponse(header)
.endProfile()
.save())
}
To make this information available, let's create a 5 seconds refresh to show the data collected through standard output. In the next post we are going to replace this for HTTP calls.
let stats = new Stats()
setInterval(()=> {
console.log('logs -> \n ', JSON.stringify(stats.all))
}, 5000)
function handleConnection(server) {
/*
*/
}
Deploy
To deploy our changes we can reuse the build configuration we have created before.
cd /project
oc start-build bc/decorator --from-dir=. --follow
And we should see our project running.
This looks nice and all, but is very difficult to make sense of that huge block, for that reason in the next post we are going to write a dashboard so we can make sense of our telemetry at real time.
Here is the code for the decorator container, if you find any optimisation or improvement feel free to send a PR.