Writing a Dns Resolver from scratch in Haskell
What is DNS anyway
Every device on internet has an ip address associated to it, which is used to make a connection with it, but ip addresses doesn’t mean more than random numbers separated by decimal for humans so we came up with a method to map ip address of these resources to a human readable key
Components of DNS
Their are three parts to DNS
- Domain name and resource record: This includes the human-readable domain names and corresponding resource records.
- Name Servers: These servers store DNS information and respond to DNS queries.
- Resolvers: Resolvers, typically part of operating systems or network configurations, initiate DNS queries on behalf of clients.
DNS vs. Distributed KV Stores
- Extensibility: DNS supports various data types and query types, making it more adaptable to diverse needs
- Bandwidth Efficiency: Unlike distributed databases, DNS reduces unnecessary bandwidth required for adding, updating, and syncing key values.
- Network Performance: DNS employs UDP as its underlying protocol, which eliminates the need for a handshake, resulting in faster connections.
- Hierarchy: The hierarchical structure of DNS allows for the mapping of multiple resources under a single namespace and different subdomains.
Anatomy of a DNS Packet
DNS uses two types of messages, queries (which we send to name server) and response (which we receive from nameserver), both of them have same format A DNS packet consist of 5 segments,
- DNS header: Contains metadata with a fixed length of 12 bytes, including flags and counts of records in other sections.
- DNS Question: Specifies the query for the name server, including the domain name, query type, and class, can be of variable length
- DNS Answer: Resource records providing answers to the query.
- Authority Section: Resource records indicating authority for the queried domain.
- Additional Section: Resource records holding additional information.
DNS Header
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| ID |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|QR| Opcode |AA|TC|RD|RA| Z | RCODE |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| QDCOUNT |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| ANCOUNT |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| NSCOUNT |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| ARCOUNT |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
headers consist of several flags and number of records in other sections QR, if the packet is a query (0) or response (1) OPCODE can be standard query (0), inverse query (1), or server status request (2) AA, if DNS server is authoritative for the queried host TC, if message is truncated due to excessive length or not RD, if query is a recursive query RA, if nameserver supports recursive query or not Z is reserved for future use RCODE indicates the response code like no error (0), format error (1), server fail (2), domain does not exist (3), etc
DNS Question
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| QNAME |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| QTYPE |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| QCLASS |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
QNAME, name of requested resources QTYPE, types of question, A for IPv4 address, AAAA for IPv6 address, MX for mail exchange, CNAME for canonical name etc, our DNS resolver currently supports A record QCLASS, the class of query being made like internet (1, most widely used), chaos (3, rarely used) and hesiod (4, also rarely used)
DNS Resource Record
Each record has a type, expiration date and type specific data
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| NAME |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| TYPE |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| CLASS |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| TTL |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| RDLENGTH |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--|
| RDATA |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
NAME, domain name belonging to resource record TYPE, type of resource record similar to type of question CLASS, class of data similar to class of question TTL, time to live RDLENGTH, length of data in RR RDATA, contains specific data associated with type of RR, for A record it will contain IPv4 address
Communicating with a Nameserver
Their are a couple of steps we need for implementing any network protocol like,
- whether to use TCP or UDP, on which port we need to send request or listen to
- represent the packet format described for that protocol in the language we are using
- construction the request and send it to server
- receiving and processing the response we will tackle these one by one
DNS protocol specs
So the DNS runs on port 53 and uses UDP as a transport protocol, which is not as reliable as TCP and we might encounter packet loss. But why do DNS chooses to use UDP instead ? The answer is simple, Cost to establish connection. One of the things that makes TCP reliable is a 3 way handshake and it makes TCP connection establishment slower as compared to UDP. It is not acceptable for DNS resolver to do a handshake with nameserver for every fresh query.
Representing the DNS packet
We can easily represent DNS header, question, resource record and the complete DNS packet using custom data types in haskell like shown below
import qualified Data.ByteString.Char8 as BS
import Data.Word (Word16, Word32)
data DNSHeader = DNSHeader
{ dnsHeaderId :: Data.Word.Word16,
dnsHeaderFlags :: Data.Word.Word16, -- QR OPCODE AA TC RD RA Z RCODE
dnsHeaderNumQuestion :: Data.Word.Word16,
dnsHeaderNumAnswer :: Data.Word.Word16,
dnsHeaderNumAuthority :: Data.Word.Word16,
dnsHeaderNumAdditional :: Data.Word.Word16
}
deriving (Show)
data DNSQuestion = DNSQuestion
{ dnsQuestionName :: BS.ByteString,
dnsQuestionType :: Data.Word.Word16,
dnsQuestionClass :: Data.Word.Word16
}
deriving (Show)
data DNSRecord = DNSRecord
{ dnsRecordName :: BS.ByteString,
dnsRecordType :: Data.Word.Word16,
dnsRecordClass :: Data.Word.Word16,
dnsRecordTtl :: Data.Word.Word32,
dnsRecordData :: BS.ByteString
}
deriving (Show)
data DNSPacket = DNSPacket
{ dnsPacketHeader :: DNSHeader,
dnsPacketQuestions :: [DNSQuestion],
dnsPacketAnswers :: [DNSRecord],
dnsPacketAuthorities :: [DNSRecord],
dnsPacketAdditionals :: [DNSRecord]
}
deriving (Show)
Construct the request and sending to server
So a DNS query consist of the domain name we want to resolve and the query type for example we want to resolve IPv4 address for google.com which is a type A query we’ll use a helper function to create a DNS packet for our query
encodeQuery :: String -> Data.Word.Word16 -> IO BS.ByteString
encodeQuery domainName recordType = do
let _id = 1
recursionDesired = 1 `Data.Bits.shiftL` 8
header = DNSHeader _id recursionDesired 1 0 0 0
question = DNSQuestion (encodeDNSName domainName) recordType classIn
queryBytes = headerToBytes header <> questionToBytes question
return queryBytes
the above function will return a bytestring of DNS packet which we will send to the nameserver
Decoding and Processing the response
We will receive a DNS packet in bytestring form from the nameserver which may contain the actual answer we are looking for, we just need to decode and process it.
decodeQuery :: BS.ByteString -> Either String DNSPacket
decodeQuery bs =
case runGetOrFail (getDNSPacket bs) (LBS.fromStrict bs) of
Left (_, _, err) -> Left err
Right (_, _, dnsPacket) -> Right dnsPacket
getDNSPacket :: BS.ByteString -> Get DNSPacket
getDNSPacket input = do
header <- getDNSHeader
questions <- getDNSQuestionNE input (dnsHeaderNumQuestion header)
answers <- getDNSRecordList input (dnsHeaderNumAnswer header)
authorities <- getDNSRecordList input (dnsHeaderNumAuthority header)
additionals <- getDNSRecordList input (dnsHeaderNumAdditional header)
return $ DNSPacket header questions answers authorities additionals
getDNSHeader :: Get DNSHeader
getDNSHeader = DNSHeader <$> getWord16be <*> getWord16be <*> getWord16be <*> getWord16be <*> getWord16be <*> getWord16be
getDNSQuestionNE :: BS.ByteString -> Word16 -> Get [DNSQuestion]
getDNSQuestionNE input x = do
replicateM (fromIntegral x) getDNSQuestion
where
getDNSQuestion :: Get DNSQuestion
getDNSQuestion = DNSQuestion <$> getDomainName input <*> getWord16be <*> getWord16be
getDNSRecordList :: BS.ByteString -> Word16 -> Get [DNSRecord]
getDNSRecordList input count = do
replicateM (fromIntegral count) getDNSRecord
where
getDNSRecord = do
domain <- getDomainName input
type' <- getWord16be
class' <- getWord16be
ttl <- getWord32be
data_len <- getInt16be
data_ <- getRecordData (fromIntegral type') (fromIntegral data_len)
return $ DNSRecord {dnsRecordName = domain, dnsRecordType = type', dnsRecordClass = class', dnsRecordTtl = ttl, dnsRecordData = data_}
getRecordData :: Data.Word.Word16 -> Int -> Get BS.ByteString
getRecordData type_ data_len
| type_ == typeNs = getDomainName input
| type_ == typeA = do
ipBytes <- getByteString $ fromIntegral data_len
return $ BS.pack $ ipToString $ B.unpack ipBytes
| otherwise = getByteString $ fromIntegral data_len
ipToString :: [Word8] -> String
ipToString = intercalate "." . map show
Now the decoding most of the fields are straightforward but, DNS uses domain name compression for space optimization, for example if we query for domain “google.com”, the string “google.com” will be present multiple times in the DNS packet. Instead of storing it multiple times what DNS do is it stores the pointer to that string present previously. So we also need to take this compression in account while decoding domain name.
getDomainName :: BS.ByteString -> Get BS.ByteString
getDomainName input' = do
len <- getInt8
let lengthValue = len Data.Bits..&. 63
getDomainName' input' len lengthValue
where
getDomainName' input len lengthValue
| len == 0 = return BS.empty
| isPointer len = do
d <- getInt8
let offset = fromIntegral $ lengthValue * 256 + fromIntegral d -- converting bits to int
decodeCompressed offset input
| otherwise = do
label <- getByteString $ fromIntegral lengthValue
rest <- getDomainName input
return $
if BS.null rest
then label
else label <> BS.pack "." <> rest
decodeCompressed :: Int -> BS.ByteString -> Get BS.ByteString
decodeCompressed offset input = do
let msg = BS.drop offset input
case runGetOrFail (getDomainName input) (LBS.fromStrict msg) of
Left (_, _, err) -> traceShow ("err: " ++ show err) $ return BS.empty
Right (_, _, domain) -> return domain
isPointer c = Data.Bits.testBit c 7 && Data.Bits.testBit c 6
if the two MSB are set, then it indicates that remaining 14 bits represents pointer to the domain name backwards
Recursive Nature of DNS
See it is not the case that every time we query for a domain name, nameserver will respond with its ip. DNS resolution can involve multiple steps, as it may recursively query different name servers until it obtains the desired information. It can we possible that it may return ip of another nameserver which many have the answer or even the domain name of that nameserver, so our resolve function will look like this
matchRecordType :: Word16 -> [DNSRecord] -> Maybe String
matchRecordType recordType records = do
let matchingRecord = L.find (\x -> dnsRecordType x == recordType) records
(\x -> Just (BS.unpack $ dnsRecordData x)) =<< matchingRecord
getAnswer :: [DNSRecord] -> Maybe String
getAnswer = matchRecordType typeA
getNsIp :: [DNSRecord] -> Maybe String
getNsIp = matchRecordType typeA
getNs :: [DNSRecord] -> Maybe String
getNs = matchRecordType typeNs
resolve :: String -> Data.Word.Word16 -> String -> IO (Maybe String)
resolve domainName recordType nameserver = do
query <- encodeQuery domainName recordType
byteString <- sendUDPRequest nameserver 53 query
case decodeQuery byteString of
Left err -> do
traceShow ("Error parsing DNS packet: " ++ err) (return Nothing)
Right packet -> do
let mIp = getAnswer $ dnsPacketAnswers packet
let mNsIp = getNsIp $ dnsPacketAdditionals packet
let mNsDomain = getNs $ dnsPacketAuthorities packet
case (mIp, mNsIp, mNsDomain) of
(Just ip, _, _) -> return $ Just ip
(_, Just nsIp, _) -> resolve domainName recordType nsIp
(_, _, Just nsDomain) -> do
nameserver <- resolve nsDomain typeA nameserver
resolve domainName recordType $ fromMaybe "" nameserver
(_, _, _) -> traceShow "Error Occured" $ return Nothing
Final Step (testing)
Our poorman’s DNS resolver is finally completed now, you can test it by querying any domain name.
> cabal run haskell-dns-resolver google.com
Just "142.251.42.78"
> dig @8.8.8.8 google.com
; <<>> DiG 9.10.6 <<>> @8.8.8.8 google.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 5539
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;google.com. IN A
;; ANSWER SECTION:
google.com. 227 IN A 142.250.70.78
;; Query time: 49 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Mon Mar 18 19:17:16 IST 2024
;; MSG SIZE rcvd: 55