<?xml version="1.0" encoding="UTF-8"?>
<feed xml:lang="en-US" xmlns="http://www.w3.org/2005/Atom">
  <id>tag:docsbot.instatus.com,2005:/history</id>
  <link rel="alternate" type="text/html" href="https://docsbot.instatus.com"/>
  <link rel="self" type="application/atom+xml" href="https://docsbot.instatus.com/history.atom"/>
  <title>DocsBot AI Status - Incident history</title>
  <updated>2026-02-27T17:00:00.000+00:00</updated>
  <author>
    <name>DocsBot AI</name>
  </author>
  
<entry>
  <id>tag:docsbot.instatus.com,2005:Maintenance/cmm55kuc1003d12bbjwokhvc2</id>
  <published>2026-02-27T17:00:00.000+00:00</published>
  <updated>2026-02-27T17:00:00.000+00:00</updated>
  <link rel="alternate" type="text/html" href="https://docsbot.instatus.com/maintenance/cmm55kuc1003d12bbjwokhvc2"/>
  <title>Vector Database Migration</title>

  <content type="html">
  <![CDATA[
    <p><strong>Type:</strong> Maintenance</p>
    <p><strong>Duration:</strong> 1 day, 2 hours and 43 minutes</p>
    <p><strong>Affected Components:</strong> Website</p>
    <p><small>Feb <var data-var='date'> 27</var>, <var data-var='time'>17:00:00</var> GMT+0</small><br /><strong>Identified</strong> -
  We plan to begin migrating bots to our new Vector DB for better performance and stability. Migrations will happen bot by bot, and are expected to cause no downtime in chatbots. During the migration there may be a short period of a few minutes where managing sources will be disabled on the bot being migrated.</p>
<p><small>Feb <var data-var='date'> 28</var>, <var data-var='time'>19:43:26</var> GMT+0</small><br /><strong>Completed</strong> -
  Maintenance has completed successfully.</p>

        ]]>
  </content>
</entry>

<entry>
  <id>tag:docsbot.instatus.com,2005:Incident/cmlzhbubb00tijn2mjhvv7ew6</id>
  <published>2026-02-23T07:30:00.000+00:00</published>
  <updated>2026-02-23T07:30:00.000+00:00</updated>
  <link rel="alternate" type="text/html" href="https://docsbot.instatus.com/incident/cmlzhbubb00tijn2mjhvv7ew6"/>
  <title>Intermittent Database search errors for subset of bots</title>

  <content type="html">
  <![CDATA[
    <p><strong>Type:</strong> Incident</p>
    <p><strong>Duration:</strong> 2 days, 21 hours and 41 minutes</p>
    <p><strong>Affected Components:</strong> Chatbots</p>
    <p><small>Feb <var data-var='date'> 23</var>, <var data-var='time'>07:30:00</var> GMT+0</small><br /><strong>Identified</strong> -
  We&#039;ve seen increasing error rates with our training database that show up as a database search error or &quot;I couldn&#039;t access the documentation&quot; when chatting with a bot. The problem is intermittent. At its peak, we are seeing as much as 5% of requests triggering the error.   
  
We are in communication with our vector database cloud provider, who is working on fixing this issue on their side.</p>
<p><small>Feb <var data-var='date'> 23</var>, <var data-var='time'>16:30:00</var> GMT+0</small><br /><strong>Monitoring</strong> -
  Error rates have dropped nearly completely, but our cloud provider is still working on fixing the underlying root cause with configuration tweaks.</p>
<p><small>Feb <var data-var='date'> 23</var>, <var data-var='time'>18:43:19</var> GMT+0</small><br /><strong>Monitoring</strong> -
  For a subset of bots, error rates remain higher than 75% when searching docs. Our cloud provider is still working on clearing this issue, which seems to be affecting a subset of our high-availability nodes.</p>
<p><small>Feb <var data-var='date'> 25</var>, <var data-var='time'>23:28:32</var> GMT+0</small><br /><strong>Monitoring</strong> -
  Only one bot remains in the repair queue. The beta release of our DB software that contains the permanent fix is in RCA.</p>
<p><small>Feb <var data-var='date'> 23</var>, <var data-var='time'>22:50:29</var> GMT+0</small><br /><strong>Identified</strong> -
  The cloud team is continuing to work on a fix for this incident. They attempted a full cluster restart, but that did not solve the issue.</p>
<p><small>Feb <var data-var='date'> 23</var>, <var data-var='time'>23:49:09</var> GMT+0</small><br /><strong>Identified</strong> -
  Our cloud provider has identified a performance configuration on the cluster that was preventing entrypoint repairs from completing. They are now rolling out an update to remove this setting and restore normal repair operations.</p>
<p><small>Feb <var data-var='date'> 24</var>, <var data-var='time'>00:15:28</var> GMT+0</small><br /><strong>Identified</strong> -
  We realize that part of the confusion with this outage is that, previously, database issues would return a user-friendly error message in the chat interface. With the architecture we launched a few weeks ago, this was no longer a fatal error, and the AI was instead passed the error message. It would still return an answer to the user, just letting them know that it couldn&#039;t find the sources.   
  
We have pushed a hotfix to our API so that when searching for sources fails, it again returns an actual error instead of responding fully to the user with the potential for hallucination.   
  
We are waiting on a status update from our provider on how those fixes are going.</p>
<p><small>Feb <var data-var='date'> 24</var>, <var data-var='time'>00:38:00</var> GMT+0</small><br /><strong>Monitoring</strong> -
  Our cloud provider implemented a fix and is currently monitoring the results as it gradually repairs affected bots across all database shards.</p>
<p><small>Feb <var data-var='date'> 24</var>, <var data-var='time'>01:26:40</var> GMT+0</small><br /><strong>Monitoring</strong> -
  The number of affected bots is dropping as they are fixed one by one. It&#039;s projected that all bots will be repaired within the hour.</p>
<p><small>Feb <var data-var='date'> 25</var>, <var data-var='time'>15:29:57</var> GMT+0</small><br /><strong>Monitoring</strong> -
  We’ve been working closely with our cloud provider, who has now confirmed the root causes of the recent intermittent vector index errors.

What happened

- When compression was enabled, some tenants were in a lazy-loading state.

- In certain cases, compression could begin before the cache was fully loaded, leading to intermittent shard-level errors.

- Because each tenant runs across multiple shards, responses can come from different shards, which made the issue appear inconsistent.

- This has only affected a small subset of bots at any given time.

Impact

- Of 592 tenants with compressed data (those potentially impacted), only 4 required requantization due to errors.

- Errors have been limited and intermittent rather than systemic.

Actions taken by our cloud provider

- Cluster upgraded to patch version 1.35.10, which includes a fix for the RQ compression behavior.

- Async indexing enabled to immediately compress remaining shards.

- Repair tasks are running to restore any missing vector index elements or entry points.

- A full repair script is in progress to ensure all tenants are fully remediated.

- The fix has been identified and is moving through their QA pipeline.

Current status

- Repair tasks are approximately 50% complete and progressing on the impacted bots.

- A separate issue impacting new tenant creation is under investigation.

- We expect remaining cleanup and stabilization to be completed by midday.

We will continue monitoring closely and provide further updates as remediation completes.</p>
<p><small>Feb <var data-var='date'> 24</var>, <var data-var='time'>03:34:59</var> GMT+0</small><br /><strong>Resolved</strong> -
  This incident has been resolved. The repair script appears to have completed.</p>
<p><small>Feb <var data-var='date'> 24</var>, <var data-var='time'>16:47:35</var> GMT+0</small><br /><strong>Monitoring</strong> -
  It seems that overnight the issue resurfaced with a small handful of bots. The cloud team is still looking into a permanent fix for the root issue that caused it to recur.</p>
<p><small>Feb <var data-var='date'> 26</var>, <var data-var='time'>05:11:28</var> GMT+0</small><br /><strong>Resolved</strong> -
  This incident has been resolved.</p>

        ]]>
  </content>
</entry>

<entry>
  <id>tag:docsbot.instatus.com,2005:Maintenance/cmkfslaxg0eyw129dna3yphqh</id>
  <published>2026-01-24T20:00:00.000+00:00</published>
  <updated>2026-01-24T20:00:00.000+00:00</updated>
  <link rel="alternate" type="text/html" href="https://docsbot.instatus.com/maintenance/cmkfslaxg0eyw129dna3yphqh"/>
  <title>Source Vector Database Upgrade to HA</title>

  <content type="html">
  <![CDATA[
    <p><strong>Type:</strong> Maintenance</p>
    <p><strong>Duration:</strong> 4 hours and 5 minutes</p>
    <p><strong>Affected Components:</strong> Chatbots, Website</p>
    <p><small>Jan <var data-var='date'> 24</var>, <var data-var='time'>20:00:00</var> GMT+0</small><br /><strong>Identified</strong> -
  Our cloud providers will be working on upgrading our Vector databases to a new, more powerful cluster with High Availability. This will provide faster and more reliable service.

This may lead to temporary periods of read-only access that would show error messages on bot creation or during source training or refreshing.

Deployed bots should remain functional for users to chat with.</p>
<p><small>Jan <var data-var='date'> 24</var>, <var data-var='time'>20:00:00</var> GMT+0</small><br /><strong>Identified</strong> -
  Reminder about DB maintenance starting tomorrow. All sources will be read-only and bot creation will be disabled during the migration.</p>
<p><small>Jan <var data-var='date'> 24</var>, <var data-var='time'>21:10:19</var> GMT+0</small><br /><strong>Identified</strong> -
  During the migration we are also experiencing intermittent DB connection errors that may cause chatbot responses to fail.</p>
<p><small>Jan <var data-var='date'> 25</var>, <var data-var='time'>00:05:10</var> GMT+0</small><br /><strong>Completed</strong> -
  Maintenance has completed successfully.</p>

        ]]>
  </content>
</entry>

<entry>
  <id>tag:docsbot.instatus.com,2005:Incident/cmgzf41r102b2bfmyow4k23qd</id>
  <published>2025-10-20T07:30:00.000+00:00</published>
  <updated>2025-10-20T07:30:00.000+00:00</updated>
  <link rel="alternate" type="text/html" href="https://docsbot.instatus.com/incident/cmgzf41r102b2bfmyow4k23qd"/>
  <title>Widespread AWS outage</title>

  <content type="html">
  <![CDATA[
    <p><strong>Type:</strong> Incident</p>
    <p><strong>Duration:</strong> 15 hours and 33 minutes</p>
    <p><strong>Affected Components:</strong> Chatbots, Website</p>
    <p><small>Oct <var data-var='date'> 20</var>, <var data-var='time'>07:30:00</var> GMT+0</small><br /><strong>Investigating</strong> -
  We are having intermittent problems with our website and chat widgets, because our hosting provider has a dependency on AWS (though it is designed for global failover).</p>
<p><small>Oct <var data-var='date'> 20</var>, <var data-var='time'>11:03:00</var> GMT+0</small><br /><strong>Monitoring</strong> -
  We&#039;re only seeing intermittent impacts to customers, such as chat widgets disappearing for some bots and an inability to start web crawls.</p>
<p><small>Oct <var data-var='date'> 20</var>, <var data-var='time'>17:03:00</var> GMT+0</small><br /><strong>Resolved</strong> -
  This incident has been resolved.</p>

        ]]>
  </content>
</entry>

<entry>
  <id>tag:docsbot.instatus.com,2005:Incident/cmfc38k1m001pzgxebjr52r20</id>
  <published>2025-09-09T03:45:00.000+00:00</published>
  <updated>2025-09-09T05:40:06.532+00:00</updated>
  <link rel="alternate" type="text/html" href="https://docsbot.instatus.com/incident/cmfc38k1m001pzgxebjr52r20"/>
  <title>Chat APIs down</title>

  <content type="html">
  <![CDATA[
    <p><strong>Type:</strong> Incident</p>
    <p><strong>Duration:</strong> 2 hours and 35 minutes</p>
    <p><strong>Affected Components:</strong> Chatbots, Website</p>
    <p><small>Sep <var data-var='date'> 9</var>, <var data-var='time'>05:40:06</var> GMT+0</small><br /><strong>Monitoring</strong> -
  We implemented a temporary workaround for DNS resolution while our cloud provider is still experiencing a global outage, and we are currently monitoring the result. Bots seem to be working, though it may take some time for server load and response times to decrease.</p>
<p><small>Sep <var data-var='date'> 9</var>, <var data-var='time'>03:45:00</var> GMT+0</small><br /><strong>Investigating</strong> -
  We are currently investigating this incident. It appears to be DNS related with our API cloud provider.</p>
<p><small>Sep <var data-var='date'> 9</var>, <var data-var='time'>05:12:15</var> GMT+0</small><br /><strong>Identified</strong> -
  Our API cloud provider has confirmed the DNS resolution incident affecting many of their regional datacenters:  
&quot;We are currently experiencing a partial outage in our Chicago, Dallas, Frankfurt, London, Los Angeles, Seoul, Singapore, Sydney, Tokyo, Toronto location. Our network engineers are actively working with our upstream providers to restore connectivity as quickly as possible. This has been effecting DNS resolution and overall network connectivity at this time.&quot;  
&lt;https://status.vultr.com/&gt;  
  
We are awaiting their resolution. Once they are back online, our chat APIs should also be restored.</p>
<p><small>Sep <var data-var='date'> 9</var>, <var data-var='time'>06:19:36</var> GMT+0</small><br /><strong>Resolved</strong> -
  This incident has been resolved thanks to our workaround. We will continue to monitor our cloud provider&#039;s incident to learn the root cause.</p>

        ]]>
  </content>
</entry>

<entry>
  <id>tag:docsbot.instatus.com,2005:Incident/cmelxaay00001kw75hug0smyu</id>
  <published>2025-08-14T19:30:00.000+00:00</published>
  <updated>2025-08-14T19:30:00.000+00:00</updated>
  <link rel="alternate" type="text/html" href="https://docsbot.instatus.com/incident/cmelxaay00001kw75hug0smyu"/>
  <title>Chatbots not responding due to VectorDB provider outage</title>

  <content type="html">
  <![CDATA[
    <p><strong>Type:</strong> Incident</p>
    <p><strong>Duration:</strong> 1 hour and 34 minutes</p>
    <p><strong>Affected Components:</strong> Chatbots, Website</p>
    <p><small>Aug <var data-var='date'> 14</var>, <var data-var='time'>19:30:00</var> GMT+0</small><br /><strong>Investigating</strong> -
  Our vectorDB provider had an outage in their cloud, leading to failures in training and querying bots.</p>
<p><small>Aug <var data-var='date'> 14</var>, <var data-var='time'>21:04:00</var> GMT+0</small><br /><strong>Resolved</strong> -
  This incident has been resolved.</p>

        ]]>
  </content>
</entry>

<entry>
  <id>tag:docsbot.instatus.com,2005:Incident/cmcw0ea6y01ekwcu7ogoghuxn</id>
  <published>2025-07-09T05:30:00.000+00:00</published>
  <updated>2025-07-09T13:11:20.317+00:00</updated>
  <link rel="alternate" type="text/html" href="https://docsbot.instatus.com/incident/cmcw0ea6y01ekwcu7ogoghuxn"/>
  <title>iFrame Embed Issues</title>

  <content type="html">
  <![CDATA[
    <p><strong>Type:</strong> Incident</p>
    <p><strong>Duration:</strong> 7 hours and 41 minutes</p>
    <p><strong>Affected Components:</strong> Chatbots</p>
    <p><small>Jul <var data-var='date'> 9</var>, <var data-var='time'>13:11:20</var> GMT+0</small><br /><strong>Resolved</strong> -
  This incident has been resolved.</p>
<p><small>Jul <var data-var='date'> 9</var>, <var data-var='time'>05:30:00</var> GMT+0</small><br /><strong>Identified</strong> -
  We made a minor configuration change to our security headers to improve site security, but it appears to have overridden our custom rules for the iframe embed URLs, causing iframe embeds of our widget to break.  
  
We have reverted this change and iframe embeds should be working again with no changes needed on your side.  
  
Thank you for your patience and understanding!</p>

        ]]>
  </content>
</entry>

<entry>
  <id>tag:docsbot.instatus.com,2005:Incident/cmbb3kibq00052kpuds5767zz</id>
  <published>2025-05-30T15:50:00.000+00:00</published>
  <updated>2025-05-30T15:50:00.000+00:00</updated>
  <link rel="alternate" type="text/html" href="https://docsbot.instatus.com/incident/cmbb3kibq00052kpuds5767zz"/>
  <title>DDoS Against our Chat APIs</title>

  <content type="html">
  <![CDATA[
    <p><strong>Type:</strong> Incident</p>
    <p><strong>Duration:</strong> 3 hours and 29 minutes</p>
    <p><strong>Affected Components:</strong> Chatbots</p>
    <p><small>May <var data-var='date'> 30</var>, <var data-var='time'>15:50:00</var> GMT+0</small><br /><strong>Investigating</strong> -
  Our chat API is currently under a DDoS attack. We are working on mitigating this traffic. It is causing slow responses or timeouts, primarily for our Agent API.</p>
<p><small>May <var data-var='date'> 30</var>, <var data-var='time'>19:18:36</var> GMT+0</small><br /><strong>Resolved</strong> -
  This incident has been resolved.</p>
<p><small>May <var data-var='date'> 30</var>, <var data-var='time'>18:16:10</var> GMT+0</small><br /><strong>Monitoring</strong> -
  We implemented blocks and are currently monitoring the result.</p>

        ]]>
  </content>
</entry>

<entry>
  <id>tag:docsbot.instatus.com,2005:Incident/cm9ytoyvq0078v4f0oa4vqzhh</id>
  <published>2025-04-26T22:10:00.000+00:00</published>
  <updated>2025-04-26T22:10:00.000+00:00</updated>
  <link rel="alternate" type="text/html" href="https://docsbot.instatus.com/incident/cm9ytoyvq0078v4f0oa4vqzhh"/>
  <title>Chat API outage</title>

  <content type="html">
  <![CDATA[
    <p><strong>Type:</strong> Incident</p>
    <p><strong>Duration:</strong> 8 minutes</p>
    <p><strong>Affected Components:</strong> Chatbots</p>
    <p><small>Apr <var data-var='date'> 26</var>, <var data-var='time'>22:10:00</var> GMT+0</small><br /><strong>Investigating</strong> -
  We are currently investigating this incident affecting chatting with your bots.</p>
<p><small>Apr <var data-var='date'> 26</var>, <var data-var='time'>23:07:12</var> GMT+0</small><br /><strong>Resolved</strong> -
  This incident has been resolved. We apologize for the short outage, which was due to an incomplete automatic weekend software update. We are putting in place additional automated checks to detect this kind of issue more quickly in the future.</p>

        ]]>
  </content>
</entry>

<entry>
  <id>tag:docsbot.instatus.com,2005:Incident/cm9qk2x6300r06jg2dkhdd2q8</id>
  <published>2025-04-21T01:08:00.000+00:00</published>
  <updated>2025-04-23T18:34:26.283+00:00</updated>
  <link rel="alternate" type="text/html" href="https://docsbot.instatus.com/incident/cm9qk2x6300r06jg2dkhdd2q8"/>
  <title>Bot Training DB Errors</title>

  <content type="html">
  <![CDATA[
    <p><strong>Type:</strong> Incident</p>
    <p><strong>Duration:</strong> 2 days, 17 hours and 26 minutes</p>
    <p><strong>Affected Components:</strong> Website</p>
    <p><small>Apr <var data-var='date'> 23</var>, <var data-var='time'>18:34:26</var> GMT+0</small><br /><strong>Resolved</strong> -
  This incident has been resolved.</p>
<p><small>Apr <var data-var='date'> 21</var>, <var data-var='time'>04:26:23</var> GMT+0</small><br /><strong>Resolved</strong> -
  We triggered a restart and minor update of the DB cluster, and that seems to have fixed the write issues. We are currently monitoring the result and awaiting a full postmortem from our cloud provider.</p>
<p><small>Apr <var data-var='date'> 21</var>, <var data-var='time'>01:08:00</var> GMT+0</small><br /><strong>Investigating</strong> -
  We have reports and our monitoring is showing problems with our vector database provider when creating or updating sources.

This may show as timeouts, or strange errors like: class TenantDocument has multi-tenancy enabled, but request was without tenant

We are currently investigating this incident and are in contact with our cloud provider.</p>
<p><small>Apr <var data-var='date'> 21</var>, <var data-var='time'>04:29:40</var> GMT+0</small><br /><strong>Resolved</strong> -
  Note: if any sources were added or refreshed during the outage, they may be temporarily stuck in a failed or queued state. You can click Retry if available, wait a bit for stuck sources to time out, or add the same source again and delete the old one.</p>
<p><small>Apr <var data-var='date'> 23</var>, <var data-var='time'>15:47:59</var> GMT+0</small><br /><strong>Identified</strong> -
  The incident seems to have recurred, impacting training sources. We are communicating with our cloud provider.</p>
<p><small>Apr <var data-var='date'> 23</var>, <var data-var='time'>17:08:10</var> GMT+0</small><br /><strong>Identified</strong> -
  Our cloud provider is having trouble finding the root cause, but they are rebooting the DB for now as a temporary fix (this worked last week for the same issue).</p>
<p><small>Apr <var data-var='date'> 23</var>, <var data-var='time'>17:27:02</var> GMT+0</small><br /><strong>Monitoring</strong> -
  The reboot seems to have solved the DB issues. A small number of new bots created during the outage may be broken; we are recreating those now.</p>

        ]]>
  </content>
</entry>

<entry>
  <id>tag:docsbot.instatus.com,2005:Incident/cm8rkghmq0020u5nqdfl3us30</id>
  <published>2025-03-27T16:00:00.000+00:00</published>
  <updated>2025-03-27T16:27:53.860+00:00</updated>
  <link rel="alternate" type="text/html" href="https://docsbot.instatus.com/incident/cm8rkghmq0020u5nqdfl3us30"/>
  <title>Default OpenAI key 401 errors</title>

  <content type="html">
  <![CDATA[
    <p><strong>Type:</strong> Incident</p>
    <p><strong>Duration:</strong> 50 minutes</p>
    <p><strong>Affected Components:</strong> Website, Chatbots</p>
    <p><small>Mar <var data-var='date'> 27</var>, <var data-var='time'>16:27:53</var> GMT+0</small><br /><strong>Monitoring</strong> -
  We implemented a fix and are currently monitoring the result.</p>
<p><small>Mar <var data-var='date'> 27</var>, <var data-var='time'>16:00:00</var> GMT+0</small><br /><strong>Investigating</strong> -
  Bots using our default OpenAI credentials are returning an API key error message. We are investigating.</p>
<p><small>Mar <var data-var='date'> 27</var>, <var data-var='time'>16:49:44</var> GMT+0</small><br /><strong>Resolved</strong> -
  Our internal monitoring alerted us immediately to the increased error rate and we began investigating.

It appears that the internal OpenAI key we use by default for chatbots that have not added their own key was deleted from our OpenAI account. We have verified account security and that this deletion was inadvertent.  
  
After issuing a new key and deploying it, all services should be back online. We apologize that this affected our customers, and will further restrict production key delete capabilities to make sure this won&#039;t happen again.</p>

        ]]>
  </content>
</entry>

<entry>
  <id>tag:docsbot.instatus.com,2005:Incident/cm4myi3e1000b9kfqnodg1q30</id>
  <published>2024-12-13T16:22:26.035+00:00</published>
  <updated>2024-12-13T16:22:26.035+00:00</updated>
  <link rel="alternate" type="text/html" href="https://docsbot.instatus.com/incident/cm4myi3e1000b9kfqnodg1q30"/>
  <title>403 Errors when training from some document types</title>

  <content type="html">
  <![CDATA[
    <p><strong>Type:</strong> Incident</p>
    <p><strong>Duration:</strong> 32 minutes</p>
    <p><strong>Affected Components:</strong> Website</p>
    <p><small>Dec <var data-var='date'> 13</var>, <var data-var='time'>16:22:26</var> GMT+0</small><br /><strong>Identified</strong> -
  It looks like there is some kind of GitHub outage with a package server we depend on for processing Word, PPT, and txt document files. We are looking at workarounds.</p>
<p><small>Dec <var data-var='date'> 13</var>, <var data-var='time'>16:54:50</var> GMT+0</small><br /><strong>Resolved</strong> -
  This incident has been resolved by adding a workaround from the package server.</p>

        ]]>
  </content>
</entry>

<entry>
  <id>tag:docsbot.instatus.com,2005:Maintenance/cm3epla1e0049ehyzd4xg4jr1</id>
  <published>2024-11-13T10:00:00.000+00:00</published>
  <updated>2024-11-13T10:00:00.000+00:00</updated>
  <link rel="alternate" type="text/html" href="https://docsbot.instatus.com/maintenance/cm3epla1e0049ehyzd4xg4jr1"/>
  <title>Database Maintenance</title>

  <content type="html">
  <![CDATA[
    <p><strong>Type:</strong> Maintenance</p>
    <p><strong>Duration:</strong> 4 hours</p>
    <p><strong>Affected Components:</strong> Website, Chatbots</p>
    <p><small>Nov <var data-var='date'> 13</var>, <var data-var='time'>10:00:00</var> GMT+0</small><br /><strong>Identified</strong> -
  We&#039;re rolling out some key improvements to our VectorDB to keep your experience top-notch and to reduce any potential issues.</p>
<p><small>Nov <var data-var='date'> 13</var>, <var data-var='time'>10:00:01</var> GMT+0</small><br /><strong>Identified</strong> -
  Maintenance is now in progress.</p>
<p><small>Nov <var data-var='date'> 13</var>, <var data-var='time'>14:00:00</var> GMT+0</small><br /><strong>Completed</strong> -
  Maintenance has completed successfully.</p>

        ]]>
  </content>
</entry>

<entry>
  <id>tag:docsbot.instatus.com,2005:Incident/cm1810r7e0015y3prs5n30ztk</id>
  <published>2024-09-18T14:36:00.000+00:00</published>
  <updated>2024-09-18T14:36:00.000+00:00</updated>
  <link rel="alternate" type="text/html" href="https://docsbot.instatus.com/incident/cm1810r7e0015y3prs5n30ztk"/>
  <title>Chat API outage</title>

  <content type="html">
  <![CDATA[
    <p><strong>Type:</strong> Incident</p>
    <p><strong>Duration:</strong> 1 hour and 58 minutes</p>
    <p><strong>Affected Components:</strong> Chatbots</p>
    <p><small>Sep <var data-var='date'> 18</var>, <var data-var='time'>14:36:00</var> GMT+0</small><br /><strong>Identified</strong> -
  We are continuing to work on a fix for this incident.</p>
<p><small>Sep <var data-var='date'> 18</var>, <var data-var='time'>15:37:36</var> GMT+0</small><br /><strong>Monitoring</strong> -
  We implemented a fix and are currently monitoring the result.</p>
<p><small>Sep <var data-var='date'> 18</var>, <var data-var='time'>16:34:26</var> GMT+0</small><br /><strong>Resolved</strong> -
  We&#039;ve confirmed the incident has not recurred, and we are implementing monitoring improvements to automatically deal with this failure mode in the future.</p>

        ]]>
  </content>
</entry>

<entry>
  <id>tag:docsbot.instatus.com,2005:Incident/cm0pd9wjw001mylqn2bt4np2f</id>
  <published>2024-09-05T09:12:00.000+00:00</published>
  <updated>2024-09-05T14:56:56.286+00:00</updated>
  <link rel="alternate" type="text/html" href="https://docsbot.instatus.com/incident/cm0pd9wjw001mylqn2bt4np2f"/>
  <title>DB errors during bot training</title>

  <content type="html">
  <![CDATA[
    <p><strong>Type:</strong> Incident</p>
    <p><strong>Duration:</strong> 11 hours and 9 minutes</p>
    <p><strong>Affected Components:</strong> Website</p>
    <p><small>Sep <var data-var='date'> 5</var>, <var data-var='time'>14:56:56</var> GMT+0</small><br /><strong>Identified</strong> -
  From provider:

About 6 hours ago the pod for the primary docsbot production cluster was moved to a new Kubernetes node. Due to this, it is still going through a startup process where it is cleaning up old/stale data and is in read-only mode. Unfortunately we do not have a good estimate for how long this will take.

Unfortunately we received no notice of this maintenance or that the DB would be unwritable for an extended period of time. We have temporarily disabled all training actions on the site until the DB is ready, to avoid further confusion or accidental deletion of sources.</p>
<p><small>Sep <var data-var='date'> 5</var>, <var data-var='time'>09:12:00</var> GMT+0</small><br /><strong>Investigating</strong> -
  We are currently investigating this incident. It appears that while bots are functional, training bots is triggering a timeout error message &#039;properties&#039;. We are in contact with our cloud provider to look into the status of our cluster.</p>
<p><small>Sep <var data-var='date'> 5</var>, <var data-var='time'>17:39:53</var> GMT+0</small><br /><strong>Identified</strong> -
  We are continuing to work on a fix for this incident. Writes are working again but running too slowly to enable for customers. A backup and upgrade of the DB is being performed right now to see if that can improve the issue with batch imports.</p>
<p><small>Sep <var data-var='date'> 5</var>, <var data-var='time'>18:22:35</var> GMT+0</small><br /><strong>Monitoring</strong> -
  After performing a DB upgrade and restart, training data ingestion seems to be performing well now. We are now re-enabling training in our Dashboard, and will continue to monitor performance.</p>
<p><small>Sep <var data-var='date'> 5</var>, <var data-var='time'>20:20:43</var> GMT+0</small><br /><strong>Resolved</strong> -
  This incident has been resolved.</p>

        ]]>
  </content>
</entry>

<entry>
  <id>tag:docsbot.instatus.com,2005:Maintenance/clzkc6qeq633626hbootue2ch0g</id>
  <published>2024-08-07T21:02:26.576+00:00</published>
  <updated>2024-08-07T22:02:26.576+00:00</updated>
  <link rel="alternate" type="text/html" href="https://docsbot.instatus.com/maintenance/clzkc6qeq633626hbootue2ch0g"/>
  <title>API software upgrades</title>

  <content type="html">
  <![CDATA[
    <p><strong>Type:</strong> Maintenance</p>
    <p><strong>Duration:</strong> 1 hour</p>
    <p><strong>Affected Components:</strong> Chatbots</p>
    <p><small>Aug <var data-var='date'> 7</var>, <var data-var='time'>22:02:26</var> GMT+0</small><br /><strong>Completed</strong> -
  Maintenance has completed successfully.</p>
<p><small>Aug <var data-var='date'> 7</var>, <var data-var='time'>21:02:27</var> GMT+0</small><br /><strong>Identified</strong> -
  Maintenance is now in progress.</p>

        ]]>
  </content>
</entry>

<entry>
  <id>tag:docsbot.instatus.com,2005:Incident/clz8jo3xc135614i3onl8r3j6ea</id>
  <published>2024-07-30T06:57:00.000+00:00</published>
  <updated>2024-07-30T06:57:00.000+00:00</updated>
  <link rel="alternate" type="text/html" href="https://docsbot.instatus.com/incident/clz8jo3xc135614i3onl8r3j6ea"/>
  <title>Code regression impacting &lt;2% of Bots</title>

  <content type="html">
  <![CDATA[
    <p><strong>Type:</strong> Incident</p>
    <p><strong>Duration:</strong> 8 hours and 3 minutes</p>
    <p><strong>Affected Components:</strong> Chatbots</p>
    <p><small>Jul <var data-var='date'> 30</var>, <var data-var='time'>06:57:00</var> GMT+0</small><br /><strong>Investigating</strong> -
  A code regression was deployed that was affecting a small subset of bots that had a specific unexpected metadata format in their training data (a URL field saved as an empty string).  
  
Unfortunately our automated testing and monitoring scripts did not detect this because it did not affect our test bots and only impacted &lt;2% of customer bots. We are working on adding an additional monitoring solution that could detect these kinds of edge-case regressions in the future.</p>
<p><small>Jul <var data-var='date'> 30</var>, <var data-var='time'>15:00:08</var> GMT+0</small><br /><strong>Resolved</strong> -
  A code regression was deployed that was affecting a small subset of bots that had a specific unexpected metadata format in their training data (a URL field saved as an empty string).  
  
Unfortunately our automated testing and monitoring scripts did not detect this because it did not affect our test bots and only impacted &lt;2% of customer bots. We are working on adding an additional monitoring solution that could detect these kinds of edge-case regressions in the future.</p>

        ]]>
  </content>
</entry>

<entry>
  <id>tag:docsbot.instatus.com,2005:Incident/clysaou4i283964hcofgz1akbif</id>
  <published>2024-07-19T02:40:00.000+00:00</published>
  <updated>2024-07-19T02:40:00.000+00:00</updated>
  <link rel="alternate" type="text/html" href="https://docsbot.instatus.com/incident/clysaou4i283964hcofgz1akbif"/>
  <title>Training data VectorDB Cluster Issues</title>

  <content type="html">
  <![CDATA[
    <p><strong>Type:</strong> Incident</p>
    <p><strong>Duration:</strong> 12 hours and 38 minutes</p>
    <p><strong>Affected Components:</strong> Website, Chatbots</p>
    <p><small>Jul <var data-var='date'> 19</var>, <var data-var='time'>02:40:00</var> GMT+0</small><br /><strong>Identified</strong> -
  We have been unable to access our primary production VectorDB, which is managed by our cloud provider. Their status shows it is undergoing unplanned maintenance, and we are waiting for a status report from them on progress toward restoring access. This affects bot creation, source creation/updates, and bot usage. We will update as soon as we have any news.</p>
<p><small>Jul <var data-var='date'> 19</var>, <var data-var='time'>07:42:20</var> GMT+0</small><br /><strong>Identified</strong> -
  Our cloud provider has confirmed they can see where the issue lies and are now liaising with the SRE team on the right approach to repairing the cluster.</p>
<p><small>Jul <var data-var='date'> 19</var>, <var data-var='time'>08:58:00</var> GMT+0</small><br /><strong>Monitoring</strong> -
  Our cloud provider has implemented a fix that appears to have brought the cluster back online, and they are currently monitoring the result. Analysis to follow.</p>
<p><small>Jul <var data-var='date'> 19</var>, <var data-var='time'>09:17:00</var> GMT+0</small><br /><strong>Resolved</strong> -
  The cluster has been running smoothly for many hours now. Our provider reports that the root cause was a pod that failed to schedule: it got stuck while the node it was running on was undergoing a machine upgrade. They have fixed the process internally.

We are working on getting assurances that better monitoring is in place to minimize downtime if something similar happens again. We are also researching whether moving to an HA setup would prevent similar issues in the future.</p>

        ]]>
  </content>
</entry>

<entry>
  <id>tag:docsbot.instatus.com,2005:Incident/clyod4lta1106i2od8nwtqwrc</id>
  <published>2024-07-16T06:00:00.000+00:00</published>
  <updated>2024-07-16T06:00:00.000+00:00</updated>
  <link rel="alternate" type="text/html" href="https://docsbot.instatus.com/incident/clyod4lta1106i2od8nwtqwrc"/>
  <title>DNS Propagation Errors</title>

  <content type="html">
  <![CDATA[
    <p><strong>Type:</strong> Incident</p>
    <p><strong>Duration:</strong> 5 hours and 30 minutes</p>
    <p><strong>Affected Components:</strong> Website, Chatbots</p>
    <p><small>Jul <var data-var='date'> 16</var>, <var data-var='time'>06:00:00</var> GMT+0</small><br /><strong>Investigating</strong> -
  DNS propagation combined with an SSL renewal is causing an infinite redirect on the website.</p>
<p><small>Jul <var data-var='date'> 16</var>, <var data-var='time'>11:30:00</var> GMT+0</small><br /><strong>Resolved</strong> -
  This incident has been resolved. Last night we switched our DNS provider to add more sophisticated DDoS protection to the <a href="http://DocsBot.ai">DocsBot.ai</a> website and API. Unfortunately, in some global regions the DNS propagation caused an SSL redirection loop that made the website, admin API, and widget inaccessible. We apologize profusely for the outage and want to assure our users that this was a one-time maintenance operation that will not need to happen again.</p>

        ]]>
  </content>
</entry>

<entry>
  <id>tag:docsbot.instatus.com,2005:Maintenance/clucwa1vt64686b5orw7ehvjz9</id>
  <published>2024-03-29T16:43:56.141+00:00</published>
  <updated>2024-03-29T23:50:38.072+00:00</updated>
  <link rel="alternate" type="text/html" href="https://docsbot.instatus.com/maintenance/clucwa1vt64686b5orw7ehvjz9"/>
  <title>DB Migrations</title>

  <content type="html">
  <![CDATA[
    <p><strong>Type:</strong> Maintenance</p>
    <p><strong>Duration:</strong> 16 hours and 53 minutes</p>
    <p><strong>Affected Components:</strong> Chatbots</p>
    <p><small>Mar <var data-var='date'> 29</var>, <var data-var='time'>23:50:38</var> GMT+0</small><br /><strong>Identified</strong> -
  Migration is proceeding; no outages so far.</p>
<p><small>Mar <var data-var='date'> 29</var>, <var data-var='time'>16:43:56</var> GMT+0</small><br /><strong>Identified</strong> -
  We are planning scheduled maintenance during this time. It may lead to temporary slowness or outages for some bots during the migration process.</p>
<p><small>Mar <var data-var='date'> 30</var>, <var data-var='time'>16:43:56</var> GMT+0</small><br /><strong>Completed</strong> -
  Maintenance has completed successfully.</p>

        ]]>
  </content>
</entry>

<entry>
  <id>tag:docsbot.instatus.com,2005:Maintenance/clu3bhvy2223298blof2jd0zz5j</id>
  <published>2024-03-26T20:26:00.000+00:00</published>
  <updated>2024-03-26T23:26:00.000+00:00</updated>
  <link rel="alternate" type="text/html" href="https://docsbot.instatus.com/maintenance/clu3bhvy2223298blof2jd0zz5j"/>
  <title>DB Maintenance</title>

  <content type="html">
  <![CDATA[
    <p><strong>Type:</strong> Maintenance</p>
    <p><strong>Duration:</strong> 2 hours and 48 minutes</p>
    <p><strong>Affected Components:</strong> Chatbots</p>
    <p><small>Mar <var data-var='date'> 26</var>, <var data-var='time'>23:26:00</var> GMT+0</small><br /><strong>Completed</strong> -
  Maintenance has completed successfully.</p>
<p><small>Mar <var data-var='date'> 26</var>, <var data-var='time'>20:26:00</var> GMT+0</small><br /><strong>Identified</strong> -
  We are doing DB maintenance and a migration. We expect little, if any, downtime as we migrate bots one by one.</p>
<p><small>Mar <var data-var='date'> 26</var>, <var data-var='time'>20:38:22</var> GMT+0</small><br /><strong>Identified</strong> -
  Maintenance is now in progress.</p>

        ]]>
  </content>
</entry>

<entry>
  <id>tag:docsbot.instatus.com,2005:Incident/clsj6pv5y80342bglnyu810geq</id>
  <published>2024-02-12T17:04:53.048+00:00</published>
  <updated>2024-02-12T22:04:17.291+00:00</updated>
  <link rel="alternate" type="text/html" href="https://docsbot.instatus.com/incident/clsj6pv5y80342bglnyu810geq"/>
  <title>VectorDB Connection issues</title>

  <content type="html">
  <![CDATA[
    <p><strong>Type:</strong> Incident</p>
    <p><strong>Duration:</strong> 5 hours and 20 minutes</p>
    <p><strong>Affected Components:</strong> Website, Chatbots</p>
    <p><small>Feb <var data-var='date'> 12</var>, <var data-var='time'>22:04:17</var> GMT+0</small><br /><strong>Identified</strong> -
  An update from the WCS DB team:

* The cluster was in a crash loop with a fairly regular pattern. We thought it might have been hitting liveness limits.
* However, we then saw it crash live after only ~7 min, so we know it had to be something else.

  
Our engineers are still working on this and we hope to have it resolved as soon as possible. Thank you for your patience.</p>
<p><small>Feb <var data-var='date'> 12</var>, <var data-var='time'>22:24:53</var> GMT+0</small><br /><strong>Resolved</strong> -
  This incident has been resolved. We will follow up with a root cause when we have one.</p>
<p><small>Feb <var data-var='date'> 12</var>, <var data-var='time'>17:04:53</var> GMT+0</small><br /><strong>Investigating</strong> -
  We are currently investigating this incident with our managed database provider.</p>
<p><small>Feb <var data-var='date'> 12</var>, <var data-var='time'>19:26:50</var> GMT+0</small><br /><strong>Identified</strong> -
  They have engaged our WCS engineering team to investigate this issue further. They have now attempted to move to a new node and are waiting for startup. I have escalated this issue and raised a higher-level incident for this ticket.
  
Unfortunately, I don&#039;t have an exact timeframe for when this will be resolved. Rest assured we are prioritizing this with our WCS engineering team and I am actively monitoring progress.</p>
<p><small>Feb <var data-var='date'> 12</var>, <var data-var='time'>17:18:39</var> GMT+0</small><br /><strong>Identified</strong> -
  Our database provider is currently investigating.</p>

        ]]>
  </content>
</entry>

<entry>
  <id>tag:docsbot.instatus.com,2005:Incident/clo5wsjnt48313beopaf2exoz2</id>
  <published>2023-10-25T08:05:00.000+00:00</published>
  <updated>2023-10-25T08:05:00.000+00:00</updated>
  <link rel="alternate" type="text/html" href="https://docsbot.instatus.com/incident/clo5wsjnt48313beopaf2exoz2"/>
  <title>Elevated Chat Error rates</title>

  <content type="html">
  <![CDATA[
    <p><strong>Type:</strong> Incident</p>
    <p><strong>Duration:</strong> 1 day and 16 hours</p>
    <p><strong>Affected Components:</strong> Chatbots</p>
    <p><small>Oct <var data-var='date'> 25</var>, <var data-var='time'>08:05:00</var> GMT+0</small><br /><strong>Investigating</strong> -
  There was a very high error rate from the OpenAI API around that time, and our VectorDB cluster also began having connection issues. It may be that the OpenAI outage was the root cause of the DB issues.

I&#039;m having our DB provider review the health of our cluster again today just to be sure.</p>
<p><small>Oct <var data-var='date'> 25</var>, <var data-var='time'>08:45:00</var> GMT+0</small><br /><strong>Resolved</strong> -
  This incident has been resolved.</p>

        ]]>
  </content>
</entry>

<entry>
  <id>tag:docsbot.instatus.com,2005:Incident/clo4m1o2n14158bloan51oj7kl</id>
  <published>2023-10-24T17:38:35.915+00:00</published>
  <updated>2023-10-24T17:38:35.915+00:00</updated>
  <link rel="alternate" type="text/html" href="https://docsbot.instatus.com/incident/clo4m1o2n14158bloan51oj7kl"/>
  <title>Indexing new sources shows &quot;store is read-only&quot; error</title>

  <content type="html">
  <![CDATA[
    <p><strong>Type:</strong> Incident</p>
    <p><strong>Duration:</strong> 6 hours and 23 minutes</p>
    <p><strong>Affected Components:</strong> Website</p>
    <p><small>Oct <var data-var='date'> 24</var>, <var data-var='time'>17:38:35</var> GMT+0</small><br /><strong>Identified</strong> -
  We are currently investigating this incident.

It appears that our cloud provider needs to increase disk space for our DB cluster. We are working with them now to do this (it should be automated).</p>
<p><small>Oct <var data-var='date'> 24</var>, <var data-var='time'>23:36:07</var> GMT+0</small><br /><strong>Identified</strong> -
  Our cloud provider has increased the disk space and is continuing to work on a fix for this incident.</p>
<p><small>Oct <var data-var='date'> 25</var>, <var data-var='time'>00:01:07</var> GMT+0</small><br /><strong>Resolved</strong> -
  This incident has been resolved.</p>

        ]]>
  </content>
</entry>

<entry>
  <id>tag:docsbot.instatus.com,2005:Incident/clnoynrep11511bcokc2m0yl10</id>
  <published>2023-10-12T22:45:00.000+00:00</published>
  <updated>2023-10-12T22:45:00.000+00:00</updated>
  <link rel="alternate" type="text/html" href="https://docsbot.instatus.com/incident/clnoynrep11511bcokc2m0yl10"/>
  <title>Database connection issues</title>

  <content type="html">
  <![CDATA[
    <p><strong>Type:</strong> Incident</p>
    <p><strong>Duration:</strong> 7 hours and 15 minutes</p>
    <p><strong>Affected Components:</strong> Website, Chatbots</p>
    <p><small>Oct <var data-var='date'> 12</var>, <var data-var='time'>22:45:00</var> GMT+0</small><br /><strong>Identified</strong> -
  Our database provider recently updated our credentials to include multifactor authentication, which unfortunately disrupted our API&#039;s authentication to the database. Rest assured, your data is safe; we just can&#039;t access it at the moment. We are actively trying to contact them to revert to the previous authentication method.

We deeply regret the inconvenience and hope to be back online shortly!</p>
<p><small>Oct <var data-var='date'> 13</var>, <var data-var='time'>06:00:00</var> GMT+0</small><br /><strong>Resolved</strong> -
  This incident has been resolved.</p>

        ]]>
  </content>
</entry>

<entry>
  <id>tag:docsbot.instatus.com,2005:Incident/clnoyudbn15326bcokfkcz6ev1</id>
  <published>2023-10-11T20:50:00.000+00:00</published>
  <updated>2023-10-11T20:50:00.000+00:00</updated>
  <link rel="alternate" type="text/html" href="https://docsbot.instatus.com/incident/clnoyudbn15326bcokfkcz6ev1"/>
  <title>OpenAI API Bug</title>

  <content type="html">
  <![CDATA[
    <p><strong>Type:</strong> Incident</p>
    <p><strong>Duration:</strong> 4 hours</p>
    <p><strong>Affected Components:</strong> Chatbots, OpenAI → API</p>
    <p><small>Oct <var data-var='date'> 11</var>, <var data-var='time'>20:50:00</var> GMT+0</small><br /><strong>Investigating</strong> -
  OpenAI introduced a breaking bug in their streaming API. This is causing about 30% of chat widget responses to return an error message and not save the question to logs. We are waiting for them to resolve this bug.</p>
<p><small>Oct <var data-var='date'> 12</var>, <var data-var='time'>00:50:00</var> GMT+0</small><br /><strong>Resolved</strong> -
  This incident has been resolved.</p>

        ]]>
  </content>
</entry>

</feed>