MESSAGE
DATE | 2004-11-24
FROM | Billy
SUBJECT | Re: [hangout] C/C++ coding problem.
On Tue, Nov 23, 2004 at 03:06:51AM -0500, Ruben Safir Secretary NYLXS wrote:
> On Tue, Nov 23, 2004 at 02:02:17AM -0500, swd wrote:
> >
> > C/C++ Coders,
> > I need to code a function that retrieves the HTML source code
> > from a web site. I want to be able to do this from a command
> > prompt. How the heck do I do this? Thanks for the help.
>
> And what is wrong with wget or LWP?
Well, he did say he needed a FUNCTION.
LWP is libwww-perl. There's also a libwww for C, published by the W3C, which (eventually) does what he wants:
http://www.w3.org/Library/src/
But libcurl appears to be far superior:
http://curl.netmirror.org/libcurl/
Available in Debian via: apt-get install libcurl-dev
> Otherwise you need to open sockets and read the httpd protocols and
> follow them.
Sheesh! Do you think Perl is the only language with libraries?
> Unless this is a homework assignment, it's not worth it.
Sometimes, it's worth it. C can go many places Perl scripts and shell scripts cannot, and compiled C programs are much easier to distribute, because they can be linked so that they depend on very little in the host environment.
Also:
easiest solution to implement != easiest solution to support.
You have forced my hand, sir.. I present a poor-man's lwp-request, which compiles down to a 6k executable on my Debian system.
Most of the code is error checking.. if you didn't care about that, you could probably do this in 10 lines and just dump the URL to stdout.
===== curlbilly.c =====
#include <stdio.h>
#include <stdlib.h>
#include <curl/curl.h>

/* Fetch url and write the response body to fp.
 * Returns 0 on success, -1 if curl could not be initialized,
 * or the libcurl error code if the transfer failed. */
int download(char *url, FILE *fp)
{
    CURLcode res;
    CURL *curl;
    char curlerrbuf[CURL_ERROR_SIZE] = "";

    curl = curl_easy_init();
    if (!curl) {    /* check the handle before setting options on it */
        fprintf(stderr, "curl initialization error\n");
        return -1;
    }

    curl_easy_setopt(curl, CURLOPT_ERRORBUFFER, curlerrbuf);
    curl_easy_setopt(curl, CURLOPT_FAILONERROR, 1L);     /* HTTP >= 400 counts as failure */
    curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);  /* follow redirects */
    curl_easy_setopt(curl, CURLOPT_URL, url);
    curl_easy_setopt(curl, CURLOPT_FILE, fp);            /* newer libcurl calls this CURLOPT_WRITEDATA */

    res = curl_easy_perform(curl);
    if (res) {
        fprintf(stderr, "download failure:\n%s\n", curlerrbuf);
    }

    curl_easy_cleanup(curl);
    return res;
}

int main(int argc, char **argv)
{
    char *url;
    char *ofn;
    FILE *ofile;
    int rc;

    if (argc != 3) {
        fprintf(stderr, "Usage:\n\n\t%s url outfile\n\n", argv[0]);
        exit(1);
    }

    curl_global_init(CURL_GLOBAL_ALL);

    url = argv[1];
    ofn = argv[2];

    if (!(ofile = fopen(ofn, "w"))) {
        fprintf(stderr, "Error: couldn't open '%s' for writing:\n", ofn);
        perror("");
        exit(1);
    }

    fprintf(stderr, "getting url: '%s' as file '%s'\n", url, ofn);
    rc = download(url, ofile);

    fclose(ofile);
    curl_global_cleanup();

    return rc ? 1 : 0;    /* report a failed download in the exit status */
}
===== Makefile =====
CC=`curl-config --cc`
CFLAGS=`curl-config --cflags`
# Libraries go in LDLIBS so make's implicit rule places them after curlbilly.c
LDLIBS=`curl-config --libs`
TESTURL="http://www.nylxs.com"
all: curlbilly
test: nylxs-index.html
nylxs-index.html: curlbilly
	./curlbilly $(TESTURL) nylxs-index.html

____________________________
NYLXS: New Yorker Free Software Users Scene
Fair Use - because it's either fair use or useless....
NYLXS is a trademark of NYLXS, Inc