Catching Compromised Cookies – Slack Engineering
Slack uses cookies to track session state for users on slack.com and the Slack Desktop app. The ever-present cookie banners have made cookies mainstream, but as a quick refresher, a cookie is a small piece of client-side state associated with a website that’s sent up to the web server on every request. Websites use this piece of information to inject state into the inherently stateless HTTP protocol. At Slack, that means every time you sign into a workspace, your cookie (which we call the session cookie) is updated to reflect this.
Since session cookies are frequently used to uniquely identify users in applications across the internet, they’ve become an obvious target for malicious actors looking to gain access to systems. If hackers present a cookie as their own, the website will often grant them access as if they were the original user. Malicious actors often acquire these cookies through malware running on a user’s device, using it to silently steal cookies and other sensitive data and send them to a server controlled by the attackers. This stolen data lets them gain access to a wide range of web applications, from banking services to social media sites. The consequences can be severe, ranging from financial loss and identity theft to the exposure of confidential communications and personal information.
Slack workspaces contain sensitive data and can be an attractive target for attackers. Consider the scenario where a threat actor phishes a user and manages to install malware on their device. The malware could then steal cookies stored in the device’s browser and replay them to impersonate the user. To take a real-world analogy, imagine you left your house key under the mat and someone managed to find it, copy it, and put it back without you having any idea. One way to reduce the risk of a copied key is to change your locks regularly. If you do that, a thief has only a limited window of time to use the key they copied.
In Slack, the analogue of changing your locks is the session duration feature. Admins can configure how long someone’s session lasts before they need to log in again. This helps limit the risk of stolen cookies, but it’s not perfect. Attackers still have a window of time to use their copy of the cookie, and session duration doesn’t tell us when an attacker is active. In addition, users get frustrated when the session duration is too short, because they find themselves having to sign in when they’re just trying to get work done.
Cookies for many sites are frequently compromised by real attackers looking to gain access to company information. Malware operators steal cookies and sell them on dark web marketplaces to the highest bidder. While we can’t guarantee the security of the devices our customers use to access Slack, we wanted to go further to protect our customers’ data. This post describes how we detect when cookies are stolen and alert workspace administrators.
Detecting cookie misuse
The core idea behind our approach is to detect session forking, that is, determining whether a cookie is being used from more than one device at the same time.
To detect session forking, we use several components that look for signals in parallel. These components cover each other’s gaps and improve the accuracy of the system as a whole. The most important component is the last access timestamp.
Last access timestamp
The last access timestamp records when the server set the cookie on the client. We store the timestamp both in the cookie and in the database. On subsequent requests, we compare the timestamp in the incoming cookie with the timestamp in the database. If they don’t match, the client is sending an old version of the cookie.
We regularly refresh the cookie with a newer last access timestamp and update the database accordingly. If a malicious actor obtains a stolen cookie, they’ll most likely get an outdated version with an old timestamp. When they use that cookie to access Slack, we compare the old timestamp in the cookie with the newer value in the database. Since they don’t match, we detect that the session has been forked.
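To make the comparison concrete, here is a minimal Python sketch of the check described above. It assumes a single session whose database row holds only the last access timestamp as an integer; `SessionStore`, `refresh`, and `check` are hypothetical names for illustration, not Slack’s code, and a real system would also need to protect the cookie so clients can’t simply edit the timestamp.

```python
from dataclasses import dataclass


@dataclass
class SessionStore:
    """Hypothetical server-side state for one session."""
    last_access: int  # timestamp most recently written to both cookie and database

    def refresh(self, now: int) -> int:
        """Record a newer timestamp and return the value to put in the new cookie."""
        self.last_access = now
        return now

    def check(self, cookie_last_access: int) -> bool:
        """Return True if the incoming cookie is an older copy (a possible fork)."""
        return cookie_last_access != self.last_access


store = SessionStore(last_access=1_000)
stolen_cookie = 1_000                   # copied by malware before the refresh
user_cookie = store.refresh(now=2_000)  # the legitimate client now holds 2_000
print(store.check(user_cookie))         # False: matches the database
print(store.check(stolen_cookie))       # True: outdated timestamp, session forked
```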
A bad actor might try to avoid this by regularly interacting with Slack using the stolen cookie. In that case, we’d update the last access timestamp in both the bad actor’s cookie and the database. When the original user starts Slack again, they present their old copy of the cookie. We compare that with the newer value in the database and again determine that a session fork has occurred. From the last access time alone, we can’t tell which side of a forked session is legitimate. We can only tell that there are two (or more) copies of the cookie when there should be one.
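The sketch below simulates that scenario under a simplifying, hypothetical assumption: the server refreshes the cookie on every request, rather than periodically. It shows that keeping a stolen cookie fresh only moves the detection onto the legitimate user’s next request, and that the check by itself cannot say which holder is genuine. `Server` and `handle` are illustrative names only.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical single-session server: the database holds one timestamp."""
    db_last_access: int
    detections: list[str] = field(default_factory=list)

    def handle(self, who: str, cookie: int, now: int) -> int:
        """Compare the cookie with the database, then return a refreshed cookie."""
        if cookie != self.db_last_access:
            # We only learn that two copies exist, not which holder is legitimate.
            self.detections.append(who)
        self.db_last_access = now
        return now


server = Server(db_last_access=100)
stolen = 100  # attacker copies the cookie at t=100
user = 100

# The attacker keeps the stolen copy fresh while the user is away.
stolen = server.handle("attacker", stolen, now=200)
stolen = server.handle("attacker", stolen, now=300)

# The user returns with an older copy, so the fork is flagged on their request...
user = server.handle("user", user, now=400)
# ...and the attacker's next request is flagged as well.
stolen = server.handle("attacker", stolen, now=500)
print(server.detections)  # ['user', 'attacker']
```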
Testing
Once we had a basic version of the system working, the next step was to evaluate its effectiveness. Our initial results weren’t perfect. We had a true positive in the form of a coworker who was using their cookie to automate actions in Slack. But in a number of cases, our detection logic produced both false negatives and false positives. For the feature to be a meaningful security improvement, detection has to be reliable enough to act on the signals we generate. Our pilot customers planned to automatically invalidate sessions that might have been forked, which meant our high number of false positives would disrupt their work.
False positives
Our investigation showed that users were triggering detection events while going about their normal day, and we found many different edge cases that caused this. Sometimes we’d try to set a new cookie with an updated timestamp, but the client never received it. The Slack client then had a different last access time from the database, making it look just like an old, stolen cookie. This situation resulted in a false detection event.
So we brought in the IP address. If the last access time is different, but the IP address matches the IP stored in the database alongside the old timestamp, the request is likely coming from the same computer and is therefore unlikely to involve a stolen cookie. This change alone eliminated a significant percentage of the false positives, but it didn’t address some of the key shortcomings in the design.
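A minimal sketch of that refinement follows. It assumes, hypothetically, that the database records which IP address each timestamp was issued to (`issued_ips`); the names and the three outcome labels are illustrative only. The addresses come from the documentation ranges reserved in RFC 5737.

```python
from dataclasses import dataclass, field


@dataclass
class SessionRow:
    """Hypothetical database row for one session."""
    last_access: int
    # Assumption: the IP address recorded when each timestamp was issued.
    issued_ips: dict[int, str] = field(default_factory=dict)


def classify(cookie_ts: int, request_ip: str, row: SessionRow) -> str:
    """Return 'ok', 'suppressed', or 'possible_fork' for one request."""
    if cookie_ts == row.last_access:
        return "ok"
    if row.issued_ips.get(cookie_ts) == request_ip:
        # Stale cookie, but from the IP it was issued to: most likely the same
        # machine that never stored the refreshed cookie.
        return "suppressed"
    return "possible_fork"


row = SessionRow(last_access=200, issued_ips={100: "203.0.113.7", 200: "203.0.113.7"})
print(classify(200, "203.0.113.7", row))    # ok
print(classify(100, "203.0.113.7", row))    # suppressed
print(classify(100, "198.51.100.23", row))  # possible_fork
```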
For the last access timestamp to work, clients need to reliably store the cookies we set. We have several hypotheses for why clients weren’t storing them, such as laptops going to sleep before the server could respond.
We should only update the timestamp in the database once we know the client has stored the new cookie. To accomplish this, we use a two-phase approach in which each request is idempotent. Instead of overwriting the session cookie directly, we set a separate “session candidate” cookie. When we receive a request carrying a newer session candidate cookie, we promote it to the session cookie. We update the timestamp in the database only after the client presents the newer timestamp back to us via the session candidate cookie.
With this approach, if the client doesn’t receive the response to any particular request, we pick up where we left off in the process. If the server tries to set a session candidate cookie but the client doesn’t present one on the next request, we simply set it again. Likewise, if the client doesn’t receive the headers that promote the value in the session candidate cookie to the session cookie, we include those headers again on the next request. When the client provides both the session candidate and session cookies, we accept either timestamp value when comparing with the database timestamp. In the diagram above, the session cookie would match the database, since this is the first request in which the client sends the session candidate cookie. In the last request of the diagram below, the session candidate cookie will match the timestamp in the database.
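The two-phase update can be sketched as a small state machine. This is an illustration under assumptions, not Slack’s implementation: `Server`, `Client`, the 60-second `REFRESH_INTERVAL`, and the `updates` dictionary standing in for `Set-Cookie` headers are all hypothetical. The simulation drops one response in each phase to show that the next request simply resumes the process without a false detection.

```python
from dataclasses import dataclass
from typing import Optional

REFRESH_INTERVAL = 60  # hypothetical: seconds before we offer a newer timestamp


@dataclass
class Server:
    db_ts: int  # last access timestamp stored in the database

    def handle(self, session: int, candidate: Optional[int], now: int) -> tuple[bool, dict]:
        """Process one request; return (forked, cookie updates for the response)."""
        # Either cookie may legitimately match the database mid-process.
        forked = self.db_ts not in (session, candidate)
        updates: dict = {}
        if candidate is not None and candidate >= self.db_ts:
            # Phase 2: the client proved it stored the candidate, so the database
            # can advance. Re-sending the promotion is harmless (idempotent).
            self.db_ts = candidate
            updates = {"session": candidate, "candidate": None}
        elif now - self.db_ts >= REFRESH_INTERVAL:
            # Phase 1: offer a newer timestamp without touching the database.
            updates = {"candidate": now}
        return forked, updates


@dataclass
class Client:
    session: int
    candidate: Optional[int] = None

    def apply(self, updates: dict) -> None:
        """Store cookies from the response; None means delete the cookie."""
        for name, value in updates.items():
            setattr(self, name, value)


server = Server(db_ts=0)
client = Client(session=0)


def request(now: int, dropped: bool = False) -> bool:
    forked, updates = server.handle(client.session, client.candidate, now)
    if not dropped:  # e.g. the laptop slept before the response arrived
        client.apply(updates)
    return forked


flags = [
    request(60, dropped=True),  # candidate offered, response lost
    request(70),                # candidate offered again and stored
    request(80, dropped=True),  # database advanced to 70, promotion lost
    request(90),                # promotion re-sent and applied
    request(100),               # steady state: nothing to update
]
print(flags)                           # [False, False, False, False, False]
print(client.session, server.db_ts)    # 70 70
print(server.handle(0, None, 110)[0])  # True: a copy taken before the refresh
```

On the request at t=80, the session cookie (0) is what matches the database; on the request at t=90, the candidate (70) is, mirroring the two cases described in the paragraph above.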
We have also worked to mitigate race conditions where the client sends a group of API requests in quick succession. We want to avoid the scenario where we update the database on the first request that comes in while other requests are already in flight with the old version of the cookie. If the timestamp in the database was just updated, we no longer have the right old value to compare with the incoming cookie timestamp. To handle this, we ignore the timestamp on requests that arrive within a short window after the database update. A request in that window could theoretically evade detection, but it would be very hard for an attacker to predict exactly when the original client sends the first request that causes the database to be updated. An attacker also can’t make multiple guesses to hit the window, because if any single request falls outside it, we detect that the cookie has been forked. This reduces false positives from in-flight requests without compromising the value of the feature.
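The in-flight window might look like the sketch below. `GRACE_SECONDS` and the `updated_at` field are hypothetical; the post doesn’t say how long the window is or how it is tracked.

```python
from dataclasses import dataclass

GRACE_SECONDS = 2.0  # hypothetical length of the in-flight window


@dataclass
class SessionRow:
    last_access: int   # timestamp currently stored in the database
    updated_at: float  # server clock time when last_access was last written


def check(cookie_ts: int, now: float, row: SessionRow) -> str:
    """Compare timestamps, except just after the database was updated."""
    if now - row.updated_at < GRACE_SECONDS:
        # Requests already in flight may carry the pre-update cookie, and we no
        # longer hold a trustworthy old value to compare, so skip the check.
        return "skipped"
    return "ok" if cookie_ts == row.last_access else "possible_fork"


row = SessionRow(last_access=500, updated_at=1000.0)  # DB moved from 400 to 500 at t=1000
print(check(400, 1000.5, row))  # skipped: in-flight request with the old cookie
print(check(400, 1010.0, row))  # possible_fork: old cookie outside the window
print(check(500, 1010.0, row))  # ok
```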
Risk level measurement
We now have additional information beyond the last access timestamp (for example, information about the device and network) that we can combine. From these signals we algorithmically assess whether a detection is a true or false positive. Using the calculated probability, we categorize the risk as low, medium, or high. For anything determined to be high risk, we send an event to the audit log. We’re continuing to improve the algorithm to further reduce false positives.
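The post doesn’t describe the algorithm, so the following is only a hypothetical illustration of the general shape: a logistic score over a few made-up boolean signals, bucketed into low, medium, and high, where only high-risk detections would go to the audit log. The weights, bias, thresholds, and signal names are all assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Signals:
    """Hypothetical signals gathered alongside a last-access mismatch."""
    ip_changed: bool
    device_changed: bool   # e.g. a different user agent or device identifier
    network_changed: bool  # e.g. a different ISP or autonomous system


# Hypothetical weights; a real model would be fit to labelled detections.
WEIGHTS = {"ip_changed": 1.0, "device_changed": 2.0, "network_changed": 1.5}
BIAS = -2.0


def fork_probability(s: Signals) -> float:
    """Logistic score: estimated chance the detection is a true positive."""
    z = BIAS + sum(w for name, w in WEIGHTS.items() if getattr(s, name))
    return 1.0 / (1.0 + math.exp(-z))


def risk_level(p: float) -> str:
    if p >= 0.8:
        return "high"
    if p >= 0.5:
        return "medium"
    return "low"


def should_audit(s: Signals) -> bool:
    """Only high-risk detections are sent to the audit log."""
    return risk_level(fork_probability(s)) == "high"


for s in (
    Signals(ip_changed=True, device_changed=False, network_changed=False),
    Signals(ip_changed=True, device_changed=False, network_changed=True),
    Signals(ip_changed=True, device_changed=True, network_changed=True),
):
    print(risk_level(fork_probability(s)), should_audit(s))
# low False
# medium False
# high True
```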
Performance considerations
In the diagrams above, we focus on the logic for updating the last access timestamp in the cookie and the database. That’s the most complex interaction in this system, but not the most common. For the vast majority of API requests, we simply compare the timestamp with the existing value and determine whether the request is an anomaly.
Because of Slack’s real-time nature, our clients can be very chatty and send many API requests during even simple user interactions. As described above, the last access timestamp must be read from the database on every request. Adding a new database read to every request would add significant load. While a cache could absorb some of this load, we can simplify further and avoid many of the database reads in the first place.
If the last access time in the cookie is recent, we know the cookie is in active use, since the server just set it. This means that if the session had been forked, we would already have triggered a detection event. We can therefore skip the database read until some time has passed, based on the assumption that attackers don’t steal and sell cookies instantly. When the cookie ages out of that window, we set a fresh cookie. This lets us avoid touching the database on the large majority of API requests. It also suits the usage patterns of Slack users, who typically use Slack in bursts that generate many API requests.
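Here is a sketch of that shortcut, assuming a hypothetical `FRESH_WINDOW` of five minutes and a `FakeDB` stand-in that counts reads. The fork check itself is the same timestamp comparison as before; the point is that a burst of requests carrying a recently set cookie never touches the database.

```python
from dataclasses import dataclass
from typing import Callable

FRESH_WINDOW = 300  # hypothetical: seconds a newly set cookie is trusted without a DB read


@dataclass
class Result:
    forked: bool
    db_read: bool
    refresh: bool  # whether the response should set a fresh cookie


def check_request(cookie_ts: int, now: int, load_db_ts: Callable[[], int]) -> Result:
    if now - cookie_ts < FRESH_WINDOW:
        # The server set this cookie moments ago, so it is in active use.
        # Assumption from the post: attackers don't steal and resell cookies this fast.
        return Result(forked=False, db_read=False, refresh=False)
    db_ts = load_db_ts()  # the only database access on this path
    return Result(forked=cookie_ts != db_ts, db_read=True, refresh=True)


class FakeDB:
    """Stand-in for the session table that counts how often it is queried."""

    def __init__(self, ts: int) -> None:
        self.ts = ts
        self.reads = 0

    def load(self) -> int:
        self.reads += 1
        return self.ts


db = FakeDB(ts=1_000)
burst = [check_request(1_000, t, db.load) for t in range(1_010, 1_200, 10)]
print(len(burst), db.reads)                  # 19 0
print(check_request(1_000, 1_400, db.load))  # Result(forked=False, db_read=True, refresh=True)
print(db.reads)                              # 1
```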
Rollout
As with the other anomaly detections we’ve rolled out, we worked closely with pilot customers to build their understanding of the feature. An anomaly isn’t meant as a clear indicator of malicious behavior so much as something unexpected in an environment that should be investigated as potentially malicious. In some cases, this cookie anomaly can occur for benign reasons, such as a computer being restored from a backup. We worked closely with our pilot customers to validate and improve our detection capabilities.
This limited rollout gave us the opportunity to better understand the performance characteristics of our design and to investigate sources of noise in the data. The information we collected at this stage led to several key improvements, including the two-phase cookie update approach. After reducing the noise to an acceptable level and validating that the feature worked as expected, we gradually rolled out the detection logic to the rest of Slack.
We communicate detection events to customers via Slack’s audit log. Customers can ingest audit logs into their own Security Event Manager, such as Splunk or ELK, and combine them with other data streams to draw conclusions about the security of their users’ data.
Future improvement
Today we deliver detections to customers via the audit log, allowing them to correlate these logs in their internal tools and make appropriate security decisions. In the future, we believe we could further improve the system by automatically invalidating sessions flagged with a high-risk detection. This would sign out both the legitimate user and the attacker. The legitimate user would need to re-authenticate with Slack, while the attacker would lose the connection and the ability to impersonate the user.
Interested in building innovative projects and making developers’ work lives easier? We’re hiring 💼