Wednesday, October 01, 2014

MySQL Date Comparison Performance

As the webmaster of a major IT solutions provider, one of my responsibilities is optimizing queries against our MySQL database to ensure fast page loads. As with most programming languages and technologies, there are often multiple ways to perform an action or produce a result, but only a few of those approaches are efficient, and MySQL is no exception. Hundreds of articles compare the performance of different query strategies, but not every scenario has been covered yet. Today, I will be examining one of them: range comparisons against columns of the DATE datatype in InnoDB.
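
To give a flavor of the kind of comparison involved, here is a minimal, self-contained sketch. It uses Python's built-in SQLite instead of MySQL/InnoDB purely so it runs anywhere; the table and index names are made up for illustration. The idea it demonstrates is general: a direct range predicate on an indexed date column can use the index, while wrapping the column in a function usually cannot.

```python
import sqlite3

# Hypothetical table standing in for the MySQL/InnoDB case discussed above;
# SQLite is used here only so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, event_date DATE)")
conn.execute("CREATE INDEX idx_event_date ON events(event_date)")

def plan(sql):
    # Return the query plan as a single string for inspection.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# A direct range comparison can use the index on event_date...
ranged = plan("SELECT * FROM events WHERE event_date "
              "BETWEEN '2014-01-01' AND '2014-06-30'")

# ...while wrapping the column in a function forces a full table scan.
wrapped = plan("SELECT * FROM events WHERE strftime('%Y', event_date) = '2014'")

print(ranged)   # the plan mentions idx_event_date
print(wrapped)  # the plan is a plain scan of events
```

The same principle applies to MySQL: you can confirm which form an optimizer chooses with EXPLAIN before trusting either one.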

Tuesday, August 28, 2012

New Website Launch and Job Hunting

In case you haven't noticed, the BlakeNet site has been down for quite a while. In its place comes a much more serious website whose scope is beyond BlakeNet's original experimental sandbox and download repository purposes. My new site, Subnet ROOT (SNR), serves to bring professional contacts, as well as personal and casual ones, all the information they might want to know about me. All content previously provided by BlakeNet will still be available through SNR. If there are any articles here that still link to the defunct BlakeNet, please let me know and I will be happy to provide the working alternative.

In the meantime, did I mention I'm looking for employment opportunities? I just finished up my Bachelor of Science in Engineering and Information Sciences (Game and Simulation Programming concentration). Now I'm looking to get my foot in the door in the video game industry or other computer programming careers. Find me on LinkedIn (professional inquiries only, please) or contact me through SNR (professional or personal inquiries).

Edit: I'm no longer actively seeking opportunities, though I am always open to professional networking. Feel free to continue reaching out to me!

Thursday, August 02, 2012

VSFTPD "refusing to run with writable root inside chroot" Work-around

Hey there, everyone! I've been very busy lately with a couple different projects, but I thought I'd take the time to share my latest challenge and how I worked around it. The other day, I updated my Ubuntu servers from 10.04 to 12.04. This update brings with it a new version of VSFTPD (Very Secure FTP Daemon) which boasts some security improvements. Unfortunately, one of these improvements is causing a lot of headaches for me and other server maintainers.

VSFTPD has buffed up security pertaining to chroot'ed users. The (brief) reasoning behind it is that users jailed to directories they have write access to may alter scripts in such a way that would allow them to "break out of" the jail. While I acknowledge this as something to take into consideration, I feel the compromise between security and usability currently being forced onto us is unacceptable. Many configurations that worked perfectly fine before the update now throw an error when a chroot'ed user logs in:

500 OOPS: vsftpd: refusing to run with writable root inside chroot ()

A quick Google search turns up some obvious solutions, like stripping write permissions from the user on their home directory, moving files you want the user to have access to into subfolders (where they can have write access), or using the local_root config directive to jail users to the parent directory instead. You can even replace VSFTPD with the vsftpd-ext package, which supports the allow_writeable_chroot config directive. But these weren't viable solutions for me: I didn't want to remove users' write access to the one place on the system I feel they should be guaranteed full run of, I didn't want to allow nosy users to see what other files and folders were outside of their own area, and I didn't want to install an alternative package just to fix one simple issue. So I came up with my own solution!

My configuration is set up not to jail users by default. This allows accounts like mine and my operating partners' to navigate the entire system, while also allowing me to limit what is visible to specific users for simplicity or to keep them out of places they shouldn't be in.

$ nano /etc/vsftpd.conf
...
chroot_local_user=NO
chroot_list_enable=YES
chroot_list_file=/etc/vsftpd.chroot_list
user_config_dir=/etc/vsftpd_user_conf

$ nano /etc/vsftpd.chroot_list
# List of local users to chroot() jail in their home directory when FTPing
foo
bar
foobar

Above, we can see that my config file calls for local users not to be chroot jailed by default, but to jail specific users listed in /etc/vsftpd.chroot_list, and to look for user-specific config files in the /etc/vsftpd_user_conf directory. The list file is pretty straightforward (one username per line). This next part is where the magic comes in. The objective is to jail each user in a directory where the only thing they could possibly see in their FTP listing is their own home directory. This directory must not have write access, and it must preserve the existing directory structure. There is a great utility for this sort of thing: links! But therein lies a problem. Directories can only be symlinked/softlinked (hard links to directories aren't allowed), and FTP clients typically can't follow symlinks.

Luckily, there is another, much more powerful way to create links to directories. If you are familiar with the mount command, you know that it's used to attach filesystems (usually backed by physical devices) to the directory tree. Did you know it can also graft one area of the filesystem onto another? The magical command uses the syntax mount --bind original_dir link_dir. Now let's put it to work!

This next part will require you to know the superuser password. If you don't know it, but your account is allowed to sudo, skip the su root command and precede each of the other lines with sudo.

$ nano /etc/vsftpd_user_conf/foo
local_root=/srv/ftp/foo

$ su root
# mkdir /srv/ftp/foo
# chown foo:foo /srv/ftp/foo
# chmod ug-w /srv/ftp/foo
 
# mkdir /srv/ftp/foo/foohome
# mount --bind /home/foo /srv/ftp/foo/foohome

In the first part, we change foo's local root directory to /srv/ftp/foo instead of his typical home directory (in this example, /home/foo, though it could be anywhere). I chose the /srv/ftp directory to house these jails, but you can use any directory you'd like as long as it has world read-execute permissions. In the second part, we run a couple commands to create the necessary directories, set the owner and access flags, and finally link the original home directory to the new area.

That's all there is to it! Simply repeat the above process for each jailed user, giving each one their own vsftpd_user_conf entry and /srv/ftp folder. Please note that you will have to execute the mount --bind command again for each user after every system reboot. To avoid this chore, I created a script and referenced it in my /etc/rc.local file. You can find more details about writing scripts and making them execute each time your system starts on plenty of other sites. I hope this has helped!
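
For reference, the boot script I mention might look something like this. This is just a sketch: the usernames and the "<user>home" naming follow the foo example above, so adjust them to your own setup.

```shell
#!/bin/sh
# Called from /etc/rc.local: re-create the per-user bind mounts after each
# reboot. Usernames here match the examples from the chroot list above.
for user in foo bar foobar; do
    jail="/srv/ftp/$user/${user}home"
    # Only mount if the jail directory exists and isn't already bound.
    if [ -d "$jail" ] && ! mountpoint -q "$jail"; then
        mount --bind "/home/$user" "$jail"
    fi
done
exit 0
```

Alternatively, on systems that honor bind entries in /etc/fstab, a line like `/home/foo /srv/ftp/foo/foohome none bind 0 0` makes the mount persistent without any script at all.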

Saturday, April 21, 2012

SDS Status Report: April 2012

Hello! I have a couple things here to talk about. First off, as I often do, I must apologize for my lack of attention to the SDS project lately. I was involved with my senior project and another project studying the society of Second Life, the two of them taking up pretty much all of my time. I've since graduated and restarted work on some projects I had abandoned during my educational pursuits, SDS included. So, I have some bad news, good news, and more bad news involving the fate of the project.

The bad news first. To put it bluntly, I discovered that what I promised the final version of SDS would do just isn't possible right now. Looking back on my proposals, I had adopted two mentalities about all of this: "this project is absolutely possible; it's just a matter of making things run fast enough to be convenient to the user" and "this shares concepts with GPS technology, and GPS is fairly fast, therefore this project can be made to be as fast as GPS." I found out empirically that it just can't be made to run fast enough (given the tools immediately available to me), and that it's very different from GPS. More on the GPS bit later (in a future article). In conclusion, this project will never reach the state I assumed it would, and so I must insist that SixDegreeSteam be retired.

There's good news too. I'm determined to leave this thing in as much of a working state as can be managed. My intention is to abandon the database approach, create an easy-to-follow graphical output, and have it accomplish the original objective I set out to meet, which was to outline connectivity of one user to the rest of the community. I've already gotten a lot of work in on these points in the past couple days. There's a lot more to go, but at this rate, it'll be done before anyone knows it.

Now the final piece of bad news. Parts of the Steam Community API got updated recently, throwing a generously sized monkey wrench into what used to work in SDS. This slows progress down because I now have to go back and rethink what I already had down. In conjunction, I contacted the Valve employee I understand to be the lead developer on the Steam Community to ask for any information pertinent to the project, some added public API functionality, or anything else to help me out. I was met with a shockingly uninterested response I interpreted as "we've made tools available already; we have no more time to spend helping you datamine," which was understandable albeit surprising given Valve's vested interest in other community members and modders. The odds have been stacked against this project's success from the beginning, and I was really hoping to catch a second wind from Valve.

To finish off this status report, I'd like to reiterate just how sorry I am to have taken so long with all of this, only to let everyone waiting patiently on it down. I've appreciated the interest, comments, feedback, and time each person has given to SixDegreeSteam. It's with a heavy heart and a shameful disposition that I must break this news to you. SixDegreeSteam will soon be concluded. After this next client release, it'll all be put away until I can acquire better means to meet all my promises. If you want to keep an ear in it so you can be aware of such a day, I suggest subscribing to this blog (for news on SDS and my other projects) or staying a member of the SDS Steam group. Again, sorry.

Thursday, January 05, 2012

56 Days of Second Life

Coming back from a long hiatus (wow, has it really been a year?), I went back to school earlier this week. My new classes have officially started, and boy are they monsters! The good news is this is the last stretch for me before graduation. The bad news is that these classes are going to eat up every ounce of my spare time.

One of the two classes I'm enrolled in now is an Emerging Technologies course. Along with a couple other assignments, we have to work on a project that demonstrates, well, emerging technologies! I've decided to do an 8-week-long project focusing on the social interactions that take place in Second Life. It was between that or a robotics project, and I don't feel confident enough with physical electronics to have my academic success ride upon it. At any rate, this project should be really interesting! If you'd like to follow it, I'm required to maintain a blog detailing my findings and analysis of what I observe. You can find it over here.

On a related note, expect an update on SDS pretty soon. It'll probably upset a lot of people, but it's high time I speak up about it. See you in a day or two!

Saturday, January 22, 2011

SixDegreeSteam 1.6.6 "Pure Client" Beta Live

I'm very proud to announce the release of SixDegreeSteam version 1.6.6 ("Pure Client"), available at the SDS site. This is a closed beta that anyone can download and use, but it will only work with Steam profiles that are members of the SDS Steam group. This version is NOT meant to be representative of SDS, which is to say this is just a technical demo. It doesn't include all planned features, is expected to run pretty slowly (it's dependent on your processor, memory, and internet speed), and will most likely not find relationships longer than 5 profiles (at least not with the default user limit; even if you up the limit, it'll still take a really long time). So what DOES it do? Here's the list:

  • DOES find your relationship to closely related users (5 or fewer recommended)
  • DOES print found relationships in a list structure
  • DOES show the potential of the project, and why the dependence on the server is so important (seriously, do you really want to have to use a program this slow?)
  • DOES NOT generate links using groups or private profiles
  • DOES NOT have a graphical user interface (unless you count the command prompt as graphical)
  • DOES NOT print the relationships in a very user-friendly manner (yet... I'm working on that)
  • DOES NOT find relationships unless at least one of the given profiles is a member of the SDS Steam group

This is really only meant to show the current standing of the technology and give a sense of why the project is taking so long to complete. Trying to get something that is inherently very slow to work fast enough to be usable is not an easy task. Just remember that this is only the start of a potentially great project. Please be patient.

In the meantime, I hope everyone who tries the beta understands that it isn't, nor is it meant to be, very good or user-friendly. There is still a lot to be done. If you have comments, please let me know. I love hearing from you all!

Thanks so very much for your gracious and humbling interest!
- Blake (ROOT)

Saturday, July 03, 2010

Getting Past OnLive's LAN Requirement

I was lucky enough to be part of the OnLive Founding Members promotion since I pre-ordered way back when the service first came around. If you were too, you might know about the current restriction in place on WiFi connections. Basically, it's a temporary means to limit the ways players can connect to the service in order to rule out connection reliability as a factor during these early stages when troubleshooting is inevitable. As far as I know, this is only meant to help cut down on the number of calls for help from the technologically inept. Imagine every noob using the service via a wireless connection flooding support forums, inboxes, and phones with problems pertaining to their network connection speed, which is totally not OnLive's problem/fault.


However, I don't think it's all that fair to limit those of us with solid networking experience who happen to use wireless connections. So, I found a way past it. WARNING: Using this method of connecting to the OnLive service will most likely make you ineligible for support from OnLive until either wireless connections are supported or you actually connect through LAN. YOU HAVE BEEN WARNED!

The fix is very simple and can be done on Windows Vista or 7 (perhaps other versions as well, but I haven't been able to verify). Follow these steps:

  1. Right click your network connection icon in the notification area, then choose Open Network and Sharing Center.
    1. OR open your Control Panel, click the Network and Internet category, then Network and Sharing Center.
  2. Click Change adapter settings in the left pane.
  3. Select the wireless network connection you use to connect to the Internet and an unused local area connection (click one, hold CTRL, then click the other).
  4. Right click one of the selected items and choose Bridge Connections. A Network Bridge item should be created.

Enjoy! The only real side effect I noticed with this procedure is that your network connection icon may display the disabled/unconnected icon even when you are fully connected to the network and Internet. If you don't mind that, this work-around is pretty solid. When OnLive adds WiFi support, or if you decide you want your LAN connection back, just go back into the adapter settings and delete the network bridge item.

PS: Add me as a friend or spectate me sometime! My gamertag is Blakeo_x.

Monday, March 01, 2010

SDS Status Report: February 10

The past month has been very dull for the SixDegreeSteam project, unfortunately. A lot of personal baggage has bullied progress to the bottom of the stack. However, with a new month comes a renewed working spirit. February wasn't completely without change for SDS, though. The general crawling was halted early in the month. The decision was based on the fact that an accurate backbone has already been developed (the purpose of the general crawling) and that crawling on a relational chain basis, which is much more time- and resource-friendly, will likely replace a huge portion of the general crawling dataset upon client release. The SDS crawler has also reached a new version. SDS Local Server 1.6.1 reduces the resident memory used by the SteamUser class, features a slightly modified database querying scheme to [fractionally] speed up present and future data acquisition, and utilizes a new community page wherein friend data is stored in XML format, speeding up crawling and reducing bandwidth consumption.

March is likely to be a very productive month for SixDegreeSteam. With any luck, the client will be ready for public use soon!

Sunday, January 31, 2010

SDS Status Report: January 10

A full month of crawling has passed, leaving the database pretty full and everyone involved with the project ecstatic to see everything coming together. A lot of progress has been made since the last report, so there is a lot to report on here. Strap in!

On January 3, this discussion took place:

January 3, 2010 chat between Blake "ROOT" and Harry "BlackSyte":

Blake: SDS is running a lot slower now that things are filling up, and the partitioning [of the database in an attempt to make things faster] didn't go over too well. My next attempt will be to do manual partitioning, but I don't want to do it right now because I just worked pretty hard to repair the damage the automatic partitioning did... It slowed things down to 1 process per minute.

With the current table loads and everything back to normal, it's about 50 profile processes per minute. It's not THAT bad, but it isn't fast enough. Something exciting, though: SDS has reached the edge nodes. The very first Steam user has been crawled, and the very last Steam user (as of this writing) has been crawled. All that is left is everyone in between! It also goes to show that my theory might hold weight after all: it suggests that someone, and thereby everyone, in the SDS Steam group is connected to both the minimum and maximum edges.

On January 18, SixDegreeSteam Local Server 1.5.87C (the current version) was implemented. It is thus far the most stable release, with many improvements over the initial release from December. Family 1.5 brought in better error handling, a logging mechanism, parameter-based execution, a "prioritize child nodes" option for high-priority queue entries, and some programmatic fine-tuning. Release 1.5.87 introduced multi-threading for running multiple local servers (crawlers) concurrently and fixed some profile tracking issues that had previously been impossible to reproduce.

On January 21, the client algorithm was successfully executed. The interface is still a couple weeks out, but it's coming along quickly. Work on the interface was stalled after the 21st due to coursework and may continue to be stalled for 4 more weeks.

As of this writing, the database reports these statistics:

- Users crawled: 2,518,356
- Groups crawled: 748,140
- Profiles pending: 5,988,765
- Total discovered users: 7,791,431
- Total discovered groups: 748,141
- Average time between discovery and crawling: 9 days

Bonus: This is a discussion that took place on January 5. It doesn't do much to prove progress of the project, but does offer some interesting food for thought:

January 5, 2010 chat between Blake "ROOT", Dallas, and Harry "BlackSyte":

Blake: It's amazing how quickly the graph edges progressed.
Dallas: But the surface has only been barely scratched.
Blake: Exactly! I mean, we aren't even halfway through all the users, yet a streamlined backbone has emerged. That raises a concern... Perhaps there are more profiles with a betweenness centrality less than or equal to their degree centrality than I thought. Since a rigid, well-defined backbone has already formed so early in the project, yet there are relatively few new user discoveries, it makes me think[...]

There are two possible conditions under which this would happen, guessing that the crawler itself is not at fault, and my confidence in that is fairly high as of 1.5C. First [possibility], there are a shitload of people who form "cliques" [or] small collections of friends who do not join groups or befriend "outsiders"... and by shitload, I mean over 80% of the entire demographic. That's hard to believe, but not improbable.

The second case, which is even more unlikely but is very possible given a generational standpoint, is that there are sections of the entire demographic who are only friends with other members of the same section. So, you have 4 million users in segment A who are friends with other users in that same section, but none of them are friends with users from section B. It's incredibly unlikely, but it's a valid portrayal of an existing graph theory concept called generational demography -- newer users tend to be friends with other newer users, while older users tend to be friends with other older users, and never shall the two meet.

Here's another interesting observation I've made. The queue timeframe is currently 9 days... Now, what that means is that there is a 9-day waiting period between discovery and crawling. It's a common occurrence in the crawler that a profile will become invalidated within those 9 days; a wildly common occurrence, in fact. People are deleting their profiles or changing their profile names way too often, somewhere between 1 and 9 days after discovery!
Harry: Does that hamper the crawling process?
Blake: In the first release, yes. The crawler would actually crash with a fatal error because it was expecting the profile to be there, but it wasn't. That would happen in the first couple days of launch before I patched it, which is pretty funny because that means the profiles were becoming invalidated within 1-3 days!

Thursday, January 14, 2010

SixDegreeSteam: Synopsis

I was curious as to a means that people like myself could come to an understanding on roughly how this program works... I understand what the goal of it is, but how do you intend to do this?...

- TopRaman

While I have explained a lot of SixDegreeSteam's abstract mechanics in previous articles, I feel they were not programmatically detailed enough to give a good picture of how exactly the program operates. So, I'll try to fill that gap here. There are three components to the project; I'll try to discuss all of them as simply and cleanly as possible.

WARNING: Previous articles have focused on previous versions of the project. Likewise, this article will focus on the current version of the project as of this writing (1.5C). I feel this current version is the most efficient, so it is likely to stick around for a long time. But, this is a disclaimer just in case the project does change again substantially.

The first and (thus far) most time-consuming component is the crawler. Without it, the project would not have a dataset. The crawler has the straightforward job of collecting links to other profiles from one profile, saving (queueing) those links, then opening them back up sequentially to collect more links. The process continues until all links are collected and thereby all profiles are analyzed. This process of collecting, storing, reading, and repeating is called crawling. The specific crawling logic will not be described here for the sake of brevity, but all the details can be found in an earlier post titled SixDegreeSteam: Challenges. Program-wise, though, the crawler downloads the pages containing the links, extracts the links using a combination of regular expressions and XML parsing, and inserts them into a SQL database table called the crawler queue. The crawler also extracts some basic profile information, such as SteamID, profile name, and avatar, in a similar manner and inserts it all into another table called the user dataset (or group dataset). The complete list of data stored per user is SteamID, last crawl time (to prevent recrawling a user too frequently), profile name, avatar, friends, and group memberships. The list stored per group is similar: SteamID, last crawl time, group name, avatar, and members.
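
The collect-store-read-repeat loop can be sketched roughly like this. This is a toy illustration only, not the actual SDS code: the canned pages, the regular expressions, and the in-memory SQLite tables all stand in for the real downloads and database.

```python
import re
import sqlite3

# Toy stand-ins for the real SDS database tables described above.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE crawler_queue (steamid TEXT PRIMARY KEY)")
db.execute("CREATE TABLE user_dataset (steamid TEXT PRIMARY KEY, name TEXT)")

# Canned "profile pages" per user instead of real HTTP downloads.
FAKE_PAGES = {
    "100": "<name>Alice</name> friends: /profiles/200 /profiles/300",
    "200": "<name>Bob</name> friends: /profiles/100",
    "300": "<name>Carol</name> friends: /profiles/100 /profiles/200",
}

FRIEND_LINK = re.compile(r"/profiles/(\d+)")
NAME_TAG = re.compile(r"<name>(.*?)</name>")

def crawl(seed):
    db.execute("INSERT INTO crawler_queue VALUES (?)", (seed,))
    while True:
        row = db.execute("SELECT steamid FROM crawler_queue LIMIT 1").fetchone()
        if row is None:
            break  # queue drained: every discovered profile has been crawled
        steamid = row[0]
        db.execute("DELETE FROM crawler_queue WHERE steamid = ?", (steamid,))
        page = FAKE_PAGES[steamid]  # in the real crawler this is a download
        name = NAME_TAG.search(page).group(1)
        db.execute("INSERT OR IGNORE INTO user_dataset VALUES (?, ?)",
                   (steamid, name))
        for friend in FRIEND_LINK.findall(page):
            # Queue newly discovered profiles that haven't been crawled yet.
            seen = db.execute("SELECT 1 FROM user_dataset WHERE steamid = ?",
                              (friend,)).fetchone()
            if seen is None:
                db.execute("INSERT OR IGNORE INTO crawler_queue VALUES (?)",
                           (friend,))

crawl("100")
```

Seeded with one profile, the loop discovers and records all three toy users before the queue empties, which is the whole crawling process in miniature.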

Given that the crawler does its job correctly, we are left with a database filled (and I mean FILLED) with information about the users and groups of the Steam Community. Oddly enough, the information is pretty inconsequential without a way to harness the datasets. So, our second component, aptly named Pathfinder, is arguably as important as the crawler. Pathfinder has the sole purpose of using the information available through the database to calculate a lowest-cost path between two given nodes. To do this, I've opted for an object-oriented rendition of the breadth-first search graph algorithm. Once again, for the sake of brevity, I won't go into detail about the algorithm. If you are curious about it, follow the link or run a Google search; there are tons of articles that cover it far better than I can. After the algorithm is applied, a list of the users and groups that connect profile A to profile B the quickest is displayed, with links to each profile and avatars for recognition purposes.
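
For the curious, the core of a breadth-first search like the one Pathfinder uses fits in a few lines. This sketch runs on a toy in-memory friend graph; the actual SDS implementation is object-oriented and pulls its adjacency data from the database.

```python
from collections import deque

# Toy friend graph; in SDS the adjacency comes from the user dataset.
friends = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

def shortest_path(start, goal):
    """Breadth-first search: returns one fewest-hop path from start to goal."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for friend in friends.get(node, []):
            if friend not in visited:
                visited.add(friend)
                queue.append(path + [friend])
    return None  # no connection found

print(shortest_path("A", "E"))  # → ['A', 'B', 'D', 'E']
```

Because BFS explores the graph one hop-distance at a time, the first path that reaches the goal is guaranteed to be a shortest one, which is exactly the "lowest-cost path" property Pathfinder needs.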

The third and newest addition to the component set is a sort of network browser. Using a graphical, navigable web of the dataset, users will be able to quickly traverse the entire Steam Community social network, allowing them to get a better idea of just where they fit into it all. This component is simply an auxiliary to the project and is not a real focus. As such, it will be the last to be implemented and will only even enter the development picture after the crawler and Pathfinder are thoroughly completed.

As was said about Pathfinder, the information gathered by the crawler is pretty useless to the end user until a method of putting it to work is created. This explains why the crawler has been operational for nearly a month now, yet the site is still empty.